## Working with LangChain



### Requirements and Imports


In [None]:
!pip install openai langchain langchain_community

In [None]:
from langchain.chains import LLMChain
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

### LangChain

LangChain (https://www.langchain.com/) is a framework for developing applications powered by language models. It takes care of the boilerplate code we would otherwise have to write by hand to query an LLM properly.

We will start by creating an **llm** instance, defined by the URL where the LLM API can be queried and a few parameters that are applied to the model. For example, `max_tokens` caps the answer at 512 tokens (words or parts of words). `temperature`, set very low here, makes sampling almost deterministic: the model sticks to its most likely tokens and does not try to be too "creative". After all, we're not trying to write a fancy poem here!

In [None]:
# LLM Inference Server URL
infer_endpoint = "https://model-vllm.apps.clusterx.sandboxx.opentlc.com"
model_name = "served-model-name"
api_key = "EMPTY"

# LLM definition
llm = VLLMOpenAI(
    openai_api_key=api_key,
    openai_api_base=f"{infer_endpoint}/v1",  # be sure to use /v1
    model_name=model_name,
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
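To build some intuition for the `temperature` parameter, here is a minimal sketch of temperature-scaled softmax in plain Python. It is only an illustration of the general idea, not the inference server's actual sampling code; the function name and the three example scores are made up for the demonstration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a low temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.01):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops toward 0.01, almost all of the probability mass ends up on the highest-scoring token, which is why the answers become nearly deterministic.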

We also need a **template** that is applied to every request we send to the model (the "Prompt").

When querying a model, you almost never want to send exactly what the user typed. Around the user's input, you need to give the model proper instructions so that it knows how to handle it: what to answer and how, what NOT to answer, which tone to use, and so on.

In [None]:
# This prompt follows the Granite 3 chat format. Replace it with the format your LLM expects.
template="""<|start_of_role|>system<|end_of_role|>
I am Granite 3 8B Instruct, an AI language model.
My primary function is to be a chat assistant.
<|start_of_role|>user<|end_of_role|>
Answer the following question.
Question:
{input}
Answer:
<|start_of_role|>assistant<|end_of_role|>
"""
PROMPT = PromptTemplate(input_variables=["input"], template=template)
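To see what the model actually receives, the substitution can be reproduced by hand. The sketch below uses plain `str.format` on a shortened, hypothetical template with a single placeholder, named `input` to match `input_variables`. For simple placeholders like this one it should behave much like `PromptTemplate` with its default f-string format, but treat that equivalence as an assumption rather than a description of LangChain internals. The `render` helper is made up for this example.

```python
# A shortened, hypothetical template with one placeholder named "input"
template = (
    "<|start_of_role|>user<|end_of_role|>\n"
    "Question:\n"
    "{input}\n"
    "Answer:\n"
    "<|start_of_role|>assistant<|end_of_role|>\n"
)

def render(template, **values):
    """Fill the placeholders; a missing value raises KeyError."""
    return template.format(**values)

prompt_text = render(template, input="What is Artificial Intelligence?")
print(prompt_text)
```

Note that the placeholder name in the template and the keyword passed when rendering must match exactly, otherwise the substitution fails.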

LangChain now lets us easily "stitch" these elements together into a **conversation** object that we will use to query the model.

In [None]:
conversation = LLMChain(llm=llm,
                        prompt=PROMPT,
                        verbose=False
                        )

We are now ready to query the model!

In [None]:
query = "What is Artificial Intelligence?"

conversation.predict(input=query);  # the trailing ";" hides the final output (a repetition of the streamed answer)
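For comparison, here is a rough sketch of the kind of boilerplate the chain saves us from writing. It only assembles the URL, headers, and JSON body for an OpenAI-style text completion call and does not send anything. It assumes the wrapper targets the `/v1/completions` route of the OpenAI-compatible API; the `build_completion_request` helper is hypothetical, and the parameter values are copied from the `llm` definition above.

```python
import json

def build_completion_request(base_url, model, prompt, api_key="EMPTY"):
    """Assemble (url, headers, body) for an OpenAI-style completion call."""
    url = f"{base_url.rstrip('/')}/v1/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = {
        "model": model,
        "prompt": prompt,  # the already-rendered prompt text
        "max_tokens": 512,
        "temperature": 0.01,
        "top_p": 0.92,
        "presence_penalty": 1.03,
        "stream": True,
    }
    return url, headers, json.dumps(body)

url, headers, body = build_completion_request(
    "https://model-vllm.apps.clusterx.sandboxx.opentlc.com",
    "served-model-name",
    "Question:\nWhat is Artificial Intelligence?\nAnswer:\n",
)
print(url)
print(body)
```

On top of this, a hand-written client would still need to send the request, parse the streamed chunks, and handle errors, which is the part LangChain hides behind `conversation.predict`.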

You can come back to this notebook at section 3.7 for some optional exercises if you want.