## Querying a vLLM inference server with Mistral-7B, LangChain, and a custom prompt

### Set the inference server URL (replace with your own address)

In [1]:
inference_server_url = "https://vllm-mistral-7b-predictor-vllm-mistral-7b.apps.ocp-ia.jccm.es"

## Preparation

In [2]:
!pip install -q langchain==0.1.9 openai==1.13.3


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
# Imports
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

#### Create the LLM instance

In [10]:
# LLM definition
llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base= f"{inference_server_url}/v1",
    model_name="/mnt/models/",
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
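
As a rough illustration of what the settings above translate to, the following sketch builds (but does not send) the kind of request that an OpenAI-compatible completions endpoint expects. It assumes the server exposes vLLM's OpenAI-compatible `/v1/completions` route; the address `https://vllm.example.com` is a placeholder, and the exact request that `VLLMOpenAI` issues may differ in detail.

```python
import json
import urllib.request

# Placeholder address (an assumption); substitute your own inference server URL.
inference_server_url = "https://vllm.example.com"

# Same sampling settings as the LLM definition, expressed as a JSON payload.
payload = {
    "model": "/mnt/models/",
    "prompt": "Describe Paris in 100 words or less.",
    "max_tokens": 512,
    "temperature": 0.01,
    "top_p": 0.92,
    "presence_penalty": 1.03,
    "stream": False,
}

# Build the POST request without sending it.
request = urllib.request.Request(
    url=f"{inference_server_url.rstrip('/')}/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer EMPTY"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a live server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["text"])
print(request.full_url)
```

Building the request by hand like this can help when debugging connectivity or parameter issues independently of LangChain.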

#### Create the Prompt

In [11]:
template="""<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
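
To see how the two placeholders are filled in, here is a minimal sketch that uses plain `str.format` on a shortened copy of the template. This only approximates what `PromptTemplate` does for simple `{name}` placeholders; the abbreviated system text and the sample history are illustrative assumptions, not output from the model.

```python
# Shortened stand-in for the notebook's template (system text abbreviated for brevity).
template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""

# Illustrative history, in the "Human: ... / AI: ..." layout the chain prints in verbose mode.
history = "Human: Describe Paris in 100 words or less.\nAI: Paris, the City of Light, ..."

# Fill both placeholders, the way the chain does before each call to the model.
formatted = template.format(history=history, input="Is there a river?")
print(formatted)
```

On the first turn `history` is an empty string, so the line after "Current conversation:" is simply blank.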

#### Finally, add some memory for the conversation

In [12]:
memory=ConversationBufferMemory()
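
Conceptually, a buffer memory just accumulates each exchange and renders it as text for the `{history}` placeholder. The class below, `SimpleBufferMemory`, is a hypothetical stand-in written for illustration; it is not LangChain's implementation, and it only mimics the "Human: ... / AI: ..." layout visible in the verbose output of this notebook.

```python
class SimpleBufferMemory:
    """Hypothetical stand-in for a conversation buffer; not LangChain's implementation."""

    def __init__(self):
        # Each entry is a (human_message, ai_message) pair.
        self.turns = []

    def save_turn(self, human, ai):
        """Record one completed exchange."""
        self.turns.append((human, ai))

    def history(self):
        """Render all stored exchanges as a single string."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def clear(self):
        """Forget everything, so the next conversation starts fresh."""
        self.turns.clear()


memory_sketch = SimpleBufferMemory()
memory_sketch.save_turn("Describe Paris in 100 words or less.", "Paris, the City of Light, ...")
print(memory_sketch.history())
memory_sketch.clear()
```

Because the whole history is replayed in every prompt, long conversations will eventually approach the model's context limit.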

## First example, fully verbose mode

#### Create the Chain using the different components

In [13]:
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
conversation = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=True,
                                 memory=memory
                                )

#### Let's talk...

In [14]:
first_input = "Describe Paris in 100 words or less."
conversation.predict(input=first_input);



> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:

Human: Describe Paris in 100 words or less.
AI:
[/INST]
Paris, the City of Light, is an enchanting metropolis renowned for its iconic landmarks, rich history, and exquisite art scene. Known as the "City of Love," it's home to the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and Montmartre. Parisian streets are filled with charming cafes, patiss

In [15]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation.predict(input=second_input);



> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
Human: Describe Paris in 100 words or less.
AI: Paris, the City of Light, is an enchanting metropolis renowned for its iconic landmarks, rich history, and exquisite art scene. Known as the "City of Love," it's home to the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and Montmartre. Parisian streets are filled with charming cafes, patisseries, and bou

## Second example, no verbosity

In [16]:
# Verbose mode is set to False here, so only the model's answer is printed
# Note: this chain reuses the same memory object as the first example
conversation2 = ConversationChain(llm=llm,
                                  prompt=PROMPT,
                                  verbose=False,
                                  memory=memory
                                 )

In [17]:
# Let's clear the previous conversation first
memory.clear()

In [18]:
first_input = "Can you describe London in 100 words?"
conversation2.predict(input=first_input);

London, the vibrant capital city of the United Kingdom, is a global cultural hub teeming with history, art, and modern innovation. Known for iconic landmarks like the Tower of London, Buckingham Palace, and the British Museum, it's a city rich in heritage. The River Thames winds through its heart, offering scenic views and transportation via the iconic red double-decker buses and the Tube. London's diverse neighborhoods, from Soho's bustling streets to the tranquil Regent's Park, showcase its multicultural population and world-class cuisine. Its energy and diversity make it an unforgettable destination.

In [19]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation2.predict(input=second_input);

Yes, the River Thames runs through London. It's an essential part of the city's landscape and history, providing scenic views and transportation via various watercraft.