# RAG example with vLLM (Mistral-7b-Instruct-v0.2), Milvus and LangChain

Prerequisites:
- A vLLM inference endpoint serving the model
- A Milvus instance with an existing, populated collection (e.g. the RHOAI documentation)
- Milvus credentials (`MILVUS_USERNAME`, `MILVUS_PASSWORD`) configured in the Workbench environment through the RHOAI dashboard
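
The notebook later reads the Milvus credentials from the `MILVUS_USERNAME` and `MILVUS_PASSWORD` environment variables. A minimal sketch of an early check that they are set might look like this; the helper name `missing_env_vars` is hypothetical and not part of any library:

```python
import os

def missing_env_vars(names, env=None):
    """Return the names in `names` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# The variable names match the ones read later in this notebook.
missing = missing_env_vars(["MILVUS_USERNAME", "MILVUS_PASSWORD"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Failing early here tends to be easier to diagnose than an authentication error raised later by the Milvus client.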

### Install required library dependencies

In [1]:
!pip install -q einops==0.7.0 langchain==0.1.9 pymilvus==2.3.6 sentence-transformers==2.4.0 openai==1.13.3


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
import os
from langchain.callbacks.base import BaseCallbackHandler
from langchain.chains import RetrievalQA
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import VLLMOpenAI
from langchain.prompts import PromptTemplate
from langchain_community.vectorstores import Milvus

### Base parameters initialization

In [3]:
# Replace these values to match your vLLM endpoint and Milvus deployment
INFERENCE_SERVER_URL = "https://test-llm10-test-llm.apps.cluster-bjzm6.bjzm6.sandbox2625.opentlc.com"
MODEL_NAME = "/mnt/models/"
MAX_TOKENS = 1024
TOP_P = 0.95
TEMPERATURE = 0.01
PRESENCE_PENALTY = 1.03
MILVUS_HOST = "vectordb-milvus.milvus.svc.cluster.local"
MILVUS_PORT = 19530
MILVUS_USERNAME = os.getenv('MILVUS_USERNAME')
MILVUS_PASSWORD = os.getenv('MILVUS_PASSWORD')
MILVUS_COLLECTION = "demo_collection"
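
For orientation: vLLM exposes an OpenAI-compatible API, and the sampling parameters above end up as fields of a completion request. The following is a rough sketch of such a request body. `build_completion_payload` is a hypothetical helper for illustration only, and the request that LangChain actually sends may contain additional fields:

```python
import json

def build_completion_payload(prompt, model="/mnt/models/", max_tokens=1024,
                             top_p=0.95, temperature=0.01, presence_penalty=1.03):
    """Assemble a completion request body (hypothetical helper).

    The defaults mirror the constants defined in the cell above.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "temperature": temperature,
        "presence_penalty": presence_penalty,
    }

payload = build_completion_payload("How can I create a Data Science Project?")
print(json.dumps(payload, indent=2))
```

A temperature close to zero keeps the answers nearly deterministic, which is usually what you want when answering from retrieved documentation.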

### Fetch the server certificate if your LLM inference endpoint uses a self-signed certificate

In [4]:
import os
import socket
from urllib.parse import urlparse

import OpenSSL
from cryptography.hazmat.primitives import serialization

def save_srv_cert(host, port=443):
    """Fetch the certificate presented by host:port and save it as a PEM file."""
    sock = socket.create_connection((host, port))
    try:
        context = OpenSSL.SSL.Context(OpenSSL.SSL.SSLv23_METHOD)
        connection = OpenSSL.SSL.Connection(context, sock)
        # Send the hostname via SNI so the server presents the right certificate
        connection.set_tlsext_host_name(host.encode('utf-8'))
        connection.set_connect_state()
        try:
            connection.do_handshake()
        except OpenSSL.SSL.Error:
            # The handshake may fail even though a certificate was received
            pass
        certificate = connection.get_peer_certificate()
    finally:
        sock.close()
    if certificate is None:
        raise RuntimeError(f"No certificate received from {host}:{port}")
    pem_file = certificate.to_cryptography().public_bytes(serialization.Encoding.PEM)
    cert_filename = f"cert-{host}.cer"
    with open(cert_filename, "w") as fout:
        fout.write(pem_file.decode('utf8'))
    return cert_filename

# Extract the hostname (and the port, if the URL specifies one)
parsed_url = urlparse(INFERENCE_SERVER_URL)
os.environ["SSL_CERT_FILE"] = save_srv_cert(parsed_url.hostname, port=parsed_url.port or 443)
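
The cell above relies on pyOpenSSL. As an alternative, the same idea can be sketched with only the Python standard library. This is a sketch under stated assumptions, not a drop-in replacement: `host_and_port` and `fetch_server_cert_pem` are hypothetical helper names, and certificate verification is deliberately disabled because the goal is only to download a self-signed certificate.

```python
import socket
import ssl
from urllib.parse import urlparse

def host_and_port(url, default_port=443):
    """Split a URL into (hostname, port), falling back to `default_port`."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.port or default_port

def fetch_server_cert_pem(host, port=443, timeout=10.0):
    """Return the certificate presented by host:port as a PEM string.

    Verification is switched off on purpose: we only want to download
    the certificate, not to trust the connection.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False       # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname enables SNI so the right certificate is served
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return ssl.DER_cert_to_PEM_cert(der)

# Usage (requires network access to the endpoint):
# host, port = host_and_port(INFERENCE_SERVER_URL)
# pem = fetch_server_cert_pem(host, port)
```

Whichever variant you use, inspect the downloaded certificate before trusting it, since fetching it this way does not authenticate the server.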

### Initialize the embeddings model and the vector DB connection

In [5]:
model_kwargs = {'trust_remote_code': True}
embeddings = HuggingFaceEmbeddings(
    #model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs=model_kwargs,
    show_progress=False
)

store = Milvus(
    embedding_function=embeddings,
    connection_args={"host": MILVUS_HOST, "port": MILVUS_PORT, "user": MILVUS_USERNAME, "password": MILVUS_PASSWORD},
    collection_name=MILVUS_COLLECTION,
    metadata_field="metadata",
    text_field="page_content",
    drop_old=False
)

You try to use a model that was created with version 2.4.0.dev0, however, your version is 2.4.0. This might cause unexpected behavior or errors. In that case, try to update to the latest version.



<All keys matched successfully>


### Set up a prompt for the LLM

In [6]:
template="""<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant named HatBot answering questions.
You will be given a question you need to answer, and a context to provide you with information. You must answer the question based as much as possible on this context.
Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Context: 
{context}

Question: {question} [/INST]
"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
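
`PromptTemplate.from_template` treats `{context}` and `{question}` as f-string-style placeholders by default, so plain `str.format` gives a rough picture of the text the LLM receives. The sketch below uses a shortened stand-in for the system prompt and made-up context text; `fill_prompt` is a hypothetical helper:

```python
# Shortened stand-in for the template above (the system prompt is abbreviated).
short_template = """<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Context: 
{context}

Question: {question} [/INST]
"""

def fill_prompt(template, context, question):
    """Substitute the two placeholders, as an f-string-style template would."""
    return template.format(context=context, question=question)

filled = fill_prompt(
    short_template,
    context="Milvus is a vector database.",  # illustrative context, not real retrieved text
    question="What is Milvus?",
)
print(filled)
```

Printing the filled prompt is a quick way to check that the retrieved context lands where you expect before sending anything to the model.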

### Initialize the LLM and use LangChain to connect the RAG part to the vector database

In [7]:
llm = VLLMOpenAI(
    openai_api_key="EMPTY",
    openai_api_base=f"{INFERENCE_SERVER_URL}/v1",
    model_name=MODEL_NAME,
    max_tokens=MAX_TOKENS,
    top_p=TOP_P,
    temperature=TEMPERATURE,
    presence_penalty=PRESENCE_PENALTY,
    streaming=True,
    verbose=False,
    callbacks=[StreamingStdOutCallbackHandler()]
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=store.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 4}
    ),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    return_source_documents=True
)

os.environ["TOKENIZERS_PARALLELISM"] = "false"
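
Conceptually, the retriever embeds the question, picks the `k` stored chunks whose vectors are most similar, and the chain places their text into `{context}`. The toy sketch below illustrates that idea in plain Python with made-up two-dimensional vectors. It is not how Milvus is implemented (Milvus uses its own index and the metric configured on the collection), and joining the chunks with blank lines is an assumption made for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k=4):
    """Return the texts of the k documents most similar to query_vec.

    `docs` is a list of (text, vector) pairs.
    """
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Made-up vectors; real embeddings have hundreds of dimensions.
toy_docs = [
    ("projects", [1.0, 0.0]),
    ("serving", [0.0, 1.0]),
    ("notebooks", [0.7, 0.7]),
]
best = top_k([1.0, 0.1], toy_docs, k=2)
context = "\n\n".join(best)  # assumption: chunks joined with blank lines
print(context)
```

A larger `k` gives the model more material to draw on but also lengthens the prompt, so it has to fit within the model's context window together with `MAX_TOKENS`.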

### Test the RAG setup by asking a question about the indexed documentation

In [8]:
question = "How can I create a Data Science Project?"
result = qa_chain.invoke({"query": question})

To create a Data Science project in Red Hat OpenShift AI Self-Managed, you can follow these general steps:

1. Log in to your OpenShift cluster using the OpenShift CLI or the web console.
2. Create a new project by running the following command in the OpenShift CLI: `oc new-project <project_name>`. Replace `<project_name>` with the desired name for your project.
3. Once the project is created, you can create a new container image based on one of the provided images such as PyTorch, Minimal Python, TrustyAI, or HabanaAI. You can use the `oc create` command to create a new image stream and build the image. For example, to create an image stream named `my-pytorch-image` based on the PyTorch image, run `oc create is my-pytorch-image --image=registry.redhat.io/rhel7/python-36-pytorch`.
4. After the image is built, you can create a new container based on the image stream and start a new instance of the Jupyter Notebook server. For example, to create a new container named `my-notebook` based 

In [9]:
def remove_duplicates(input_list):
    """Return the unique 'source' values of the documents, in first-seen order."""
    unique_list = []
    for item in input_list:
        if item.metadata['source'] not in unique_list:
            unique_list.append(item.metadata['source'])
    return unique_list

results = remove_duplicates(result['source_documents'])

for s in results:
    print(s)

https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/getting_started_with_red_hat_openshift_ai_self-managed/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/serving_models/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/working_on_data_science_projects/index
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.6/html-single/release_notes/index
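
The de-duplication step above can also be written more compactly. Because dictionaries preserve insertion order, `dict.fromkeys` keeps the first occurrence of each source. In this sketch `unique_sources` is a hypothetical name, and the stand-in objects only mimic the `metadata` attribute of the returned documents:

```python
from types import SimpleNamespace

def unique_sources(documents):
    """Return each document's metadata['source'] once, in first-seen order."""
    return list(dict.fromkeys(doc.metadata["source"] for doc in documents))

# Stand-in objects with a `metadata` dict; the source values are placeholders.
fake_docs = [
    SimpleNamespace(metadata={"source": "doc-a"}),
    SimpleNamespace(metadata={"source": "doc-b"}),
    SimpleNamespace(metadata={"source": "doc-a"}),
]
print(unique_sources(fake_docs))
```

With the real chain output, the call would be `unique_sources(result['source_documents'])`.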
