## Extending the capabilities of our model

An LLM is a very capable tool, but only to the extent of the knowledge or information it has been trained on. After all, you only know what you know, right? But what if you need to ask a question that is not in the training data? Or one that is not in the training data, but is related to it?

There are different ways to solve this problem, depending on the resources you have and the time or money you can spend on it. Here are a few options:

- Fully retrain the model to include the information you need. For an LLM, this is only feasible for the handful of companies in the world that can afford thousands of GPUs running for weeks.
- Fine-tune the model with this new information. This requires far fewer resources and can usually be done in minutes or a few hours (depending on the size of the model). However, as it does not fully retrain the model, the new information may not be completely integrated into the answers. Fine-tuning excels at giving the model a better understanding of a specific context or vocabulary, and is less effective at injecting new knowledge. Plus, you have to fine-tune and redeploy the model again every time you want to add more information.
- Put this new information in a database, retrieve the parts relevant to the query, and add them to the query as context before sending it to the LLM. This technique is called **Retrieval Augmented Generation, or RAG**. It is interesting because you don't have to retrain or fine-tune the model to benefit from this new knowledge, which you can easily update at any time.
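To make the RAG idea concrete, here is a deliberately tiny sketch in plain Python. It is not how this notebook does it (the cells below use Milvus and an embedding model). Here, relevance is approximated by counting the words a document shares with the query, and the helper names `retrieve` and `build_prompt` are hypothetical.

```python
import re

# A toy "knowledge base": in the notebook, these would be chunks stored in Milvus.
DOCUMENTS = [
    "A workbench is created inside a data science project.",
    "The Elyra Pipeline Editor is available in JupyterLab.",
    "Model serving can use vLLM as a runtime.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the query (toy relevance score)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # Keep only documents with at least one shared word, best first.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, context_docs: list[str]) -> str:
    """Augment the query with the retrieved documents before sending it to the LLM."""
    context = "\n\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


question = "How do I create a workbench in a data science project?"
print(build_prompt(question, retrieve(question, DOCUMENTS)))
```

The important point is the order of operations: retrieve first, then augment the prompt, and only then generate. The model itself is never modified.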

We have already prepared a Vector Database using [Milvus](https://milvus.io/), where we have stored (in the form of [Embeddings](https://www.ibm.com/topics/embedding)) the content of the [California Driver's Handbook](https://www.dmv.ca.gov/portal/handbook/california-driver-handbook/).
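Embeddings are vectors of numbers. A vector database answers a query by finding the stored chunks whose vectors are closest to the query's vector. The sketch below shows the idea with hand-made 3-dimensional vectors and cosine similarity. These are assumptions for illustration: a real embedding model produces vectors with hundreds of dimensions, and a Milvus collection may be configured with a different metric (for example L2 distance).

```python
import math

# Hand-made toy vectors; a real embedding model computes these from the text.
STORED = {
    "create a workbench": [0.9, 0.1, 0.0],
    "serve a model": [0.1, 0.9, 0.1],
    "traffic lights": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Rank stored chunks by similarity to the query vector, most similar first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in STORED.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


query_vector = [0.8, 0.2, 0.1]  # pretend embedding of "How do I start a notebook?"
for text, score in top_k(query_vector):
    print(f"{score:.3f}  {text}")
```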

In this Notebook, we are going to use RAG to **make some queries about a Claim** and see how this new knowledge can help without having to modify our LLM.

### Requirements and Imports

If you have selected the right workbench image to launch as per the Lab's instructions, you should already have all the needed libraries. If not, uncomment the first line of the imports cell below to install all the right packages.

In [1]:
!pip install openai


In [2]:
# !pip install --no-cache-dir --no-dependencies --disable-pip-version-check -r requirements.txt # Uncomment only if you have not selected the right workbench image

import json
import os
from os import listdir
from os.path import isfile, join
from langchain.chains import LLMChain, RetrievalQA
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_community.llms import VLLMOpenAI
from langchain_community.vectorstores import Milvus
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from milvus_retriever_with_score_threshold import MilvusRetrieverWithScoreThreshold

# Turn off warnings when downloading the embedding model
import transformers
transformers.logging.set_verbosity_error()

### LangChain elements

Again, we are going to use LangChain to define our task pipeline.

First, the **LLM** where we will send our queries.

In [24]:
# LLM Inference Server URL
inference_server_url = "https://mistral-7b-vllm-mistral-7b-2x-t4.apps.cluster-78z4l.sandbox2699.opentlc.com"

# LLM definition
llm = VLLMOpenAI(           # we are using the vLLM OpenAI-compatible API client. But the Model is running on OpenShift, not OpenAI.
    openai_api_key="EMPTY",   # and that is why we don't need an OpenAI key for this.
    openai_api_base= f"{inference_server_url}/v1",
    model_name="/mnt/models/",
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
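`VLLMOpenAI` is a client for vLLM's OpenAI-compatible API, so under the hood it sends HTTP requests to the inference server. The sketch below builds an equivalent request with the standard library, using the same sampling parameters as the cell above, but does not send it. It assumes the completion-style `/v1/completions` endpoint, and the placeholder URL and prompt are also assumptions. The exact body LangChain sends may contain additional fields.

```python
import json
import urllib.request

# Placeholder (assumption): replace with your own inference server URL.
INFERENCE_SERVER_URL = "https://my-inference-server.example.com"


def build_completion_request(prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": "/mnt/models/",  # model name as served by vLLM in this lab
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.01,
        "top_p": 0.92,
        "presence_penalty": 1.03,
        "stream": stream,
    }
    return urllib.request.Request(
        url=f"{INFERENCE_SERVER_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Mirrors openai_api_key="EMPTY" above: no real key is sent.
            "Authorization": "Bearer EMPTY",
        },
        method="POST",
    )


request = build_completion_request("[INST] What is OpenShift AI? [/INST]")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```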

Then the connection to the **vector database** where we have prepared and stored the California Driver Handbook.

In [4]:
MILVUS_HOST = "vectordb-milvus.milvus.svc.cluster.local"
MILVUS_PORT = 19530
MILVUS_USERNAME = os.getenv('MILVUS_USERNAME')
MILVUS_PASSWORD = os.getenv('MILVUS_PASSWORD')
MILVUS_COLLECTION = os.getenv('MILVUS_COLLECTION_NAME')
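`os.getenv` silently returns `None` when a variable is not set. The problem then only surfaces later, as a confusing connection or "collection not found" error. A small helper like `require_env` below (hypothetical, not part of the lab's code) fails early with a clear message. The variable names in the usage comments are the ones used in the cell above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage (only works if the workbench defines these variables):
# MILVUS_USERNAME = require_env("MILVUS_USERNAME")
# MILVUS_PASSWORD = require_env("MILVUS_PASSWORD")
# MILVUS_COLLECTION = require_env("MILVUS_COLLECTION_NAME")
```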

In [25]:
# First we define the embeddings that we used to process the Handbook
model_kwargs = {"trust_remote_code": True}
embeddings = HuggingFaceEmbeddings(
            model_name="nomic-ai/nomic-embed-text-v1",
            model_kwargs=model_kwargs,
            show_progress=False,
        )


# Then we define the retriever that will fetch the relevant data from the Milvus vector store
retriever = MilvusRetrieverWithScoreThreshold(
            embedding_function=embeddings,
            collection_name=MILVUS_COLLECTION,
            collection_description="",
            collection_properties=None,
            connection_args={
                "host": MILVUS_HOST,
                "port": MILVUS_PORT,
                "user": MILVUS_USERNAME,
                "password": MILVUS_PASSWORD,
            },
            consistency_level="Session",
            search_params=None,
            k=4,
            score_threshold=0.99,
            metadata_field="metadata",
            text_field="page_content",
        )


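`MilvusRetrieverWithScoreThreshold` comes from a helper module shipped with the lab, so its exact behaviour is not shown here. Conceptually, `k=4` caps the number of chunks returned, and `score_threshold` drops chunks that are not relevant enough. The sketch below is an assumption about that filtering, not the helper's actual code. By default it treats scores as distances (lower means more similar, as with Milvus's L2 metric). The hypothetical `lower_is_better` flag flips the comparison for similarity metrics.

```python
def filter_hits(
    hits: list[tuple[str, float]],
    k: int = 4,
    score_threshold: float = 0.99,
    lower_is_better: bool = True,
) -> list[tuple[str, float]]:
    """Keep at most k (chunk, score) hits that pass the threshold, best first.

    Assumption: with lower_is_better=True the score is a distance, so a hit passes
    when score <= score_threshold; otherwise it is a similarity and must be >=.
    """
    if lower_is_better:
        kept = [hit for hit in hits if hit[1] <= score_threshold]
    else:
        kept = [hit for hit in hits if hit[1] >= score_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=not lower_is_better)
    return kept[:k]


# Hypothetical search results: (chunk text, distance to the query).
hits = [
    ("chunk about workbenches", 0.42),
    ("chunk about pipelines", 0.71),
    ("chunk about GPUs", 1.35),
    ("chunk about notebooks", 0.55),
]
print(filter_hits(hits, k=2))
```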

We will now define the **template** to use to make our query. Note that this template now contains a **Context** section. That's where the documents returned from the vector database will be injected.
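The injection itself is plain string formatting. With the default "stuff" chain type, `RetrievalQA` concatenates the retrieved chunks and substitutes them for the `{context}` placeholder. The sketch below reproduces that idea with a shortened, hypothetical template and made-up chunks. The `"\n\n"` separator is an assumption about LangChain's default, not something this notebook configures.

```python
# Shortened stand-in for the notebook's template (hypothetical).
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"

retrieved_chunks = [
    "Workbenches are created inside data science projects.",
    "Notebook images with the Elyra extension provide the Pipeline Editor.",
]

# Join the chunks and inject them where {context} appears.
prompt_text = TEMPLATE.format(
    context="\n\n".join(retrieved_chunks),
    question="How can you create a Jupyter Notebook in Red Hat OpenShift AI?",
)
print(prompt_text)
```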

In [26]:
template="""<s>[INST]
You are a helpful, respectful and honest assistant named "Parasol Assistant".
You will be given a context, references to provide you with information, and a question.
You must answer the question based as much as possible on this context.
Always answer as helpfully as possible, while being safe.
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, please don't share false information.
<</SYS>>

Context:
{{context}}

Question: {{question}} [/INST]
"""

We are now ready to query the model!

### First test, no additional knowledge

Let's start with a first query, without any help from our vector database.

In [84]:
# Create and send our query.

# query = "Does Red Hat OpenShift AI support Habana Gaudi devices?"
# query = "How can you work with GPU and taints in OpenShift AI?"
query = "How can you create a Jupyter Notebook in Red Hat OpenShift AI?"

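# The template escapes its placeholders as {{context}} and {{question}}, so format()
# only turns them into regular {context}/{question} fields; its arguments are ignored.
prompt_template = template.format(context="NO CONTEXT", question=query)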
prompt = PromptTemplate.from_template(prompt_template)
conversation = LLMChain(
            llm=llm,
            prompt=prompt,
            verbose=False
        )
resp = conversation.predict(context="", question=query)

To create a Jupyter Notebook in Red Hat OpenShift AI, you can follow these general steps:

1. Log in to your Red Hat OpenShift cluster using the `oc` command-line tool or the web console.
2. Create a new project if you don't have one already. You can do this by running the following command:
   ```
   oc new-project <project_name>
   ```
3. Create a new Jupyter Notebook application by running the following command:
   ```
   oc new-app quay.io/openshift-katas/jupyter-notebook~https://github.com/openshift/jupyter-notebook-example.git --name=<notebook_name>
   ```
   Replace `<project_name>` with the name of your project and `<notebook_name>` with the name you want to give to your Jupyter Notebook application.

4. Once the application is created, you can access it by running the following command:
   ```
   oc expose svc/<notebook_name>-jupyter --type=NodePort
   ```
   This will expose the Jupyter Notebook application as a NodePort service. The port number will be printed in the output 

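The answer looks plausible, but it is generic. The model relies on its general knowledge of OpenShift and suggests deploying a notebook with `oc new-app`, rather than describing how notebooks are created in OpenShift AI itself.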

### Second test, with added knowledge

We will use the same prompt and query, but this time the model will have access to some references retrieved from our vector database.

In [85]:
# Create and send our query.

prompt = PromptTemplate.from_template(prompt_template)
rag_chain = RetrievalQA.from_chain_type(
            llm,
            retriever=retriever,
            chain_type_kwargs={"prompt": prompt},
            return_source_documents=True,
        )
resp = rag_chain.invoke({"query": query})

To create a Jupyter Notebook in Red Hat OpenShift AI, you need to follow these steps:

1. Log in to Red Hat OpenShift AI and ensure that you are part of the required user or admin group for your OpenShift AI project.
2. Create a data science project and a workbench using the Standard Data Science notebook image.
3. Create and configure a pipeline server within the data science project that contains your workbench.
4. Create and launch a Jupyter server from a notebook image that contains the Elyra extension (Standard data science, TensorFlow, TrustyAI, PyTorch, or HabanaAI). Make sure you have access to S3-compatible storage.
5. After launching the Jupyter server, open JupyterLab and confirm that the JupyterLab launcher is automatically displayed. In the Elyra section of the JupyterLab launcher, click the Pipeline Editor tile to open it.

You can now create and edit Jupyter Notebooks within the Pipeline Editor in JupyterLab.

That is pretty neat! Now the answer follows the OpenShift AI documentation: it mentions data science projects, workbenches, notebook images with the Elyra extension, and the Pipeline Editor.

But where did this information come from? We can look at the source documents that the vector database returned for this query.

In [86]:
for doc in resp['source_documents']:
    if hasattr(doc, 'metadata') and isinstance(doc.metadata, dict) and 'source' in doc.metadata and 'page' in doc.metadata:
        print(f'{doc.metadata["source"]} {doc.metadata["page"]}')
    else:
        print(doc.metadata)

https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.9/html-single/working_on_data_science_projects/index 8
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.9/html-single/working_on_data_science_projects/index 65
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.9/html-single/getting_started_with_red_hat_openshift_ai_self-managed/index 44
https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.9/html-single/managing_resources/index 2
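The same page can be returned more than once when several of its chunks match the query. One possible refinement prints each source/page pair only once, in retrieval order. It is sketched below with a hypothetical `FakeDocument` stand-in for LangChain's `Document`, using only the `metadata` attribute. In the notebook you would call it as `unique_sources(resp['source_documents'])`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved document; only metadata matters here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(docs: list) -> list[str]:
    """Return 'source page' strings without duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for doc in docs:
        key = (doc.metadata.get("source"), doc.metadata.get("page"))
        if key not in seen:
            seen.add(key)
            result.append(f"{key[0]} {key[1]}")
    return result


docs = [
    FakeDocument("chunk A", {"source": "working_on_data_science_projects", "page": 8}),
    FakeDocument("chunk B", {"source": "working_on_data_science_projects", "page": 8}),
    FakeDocument("chunk C", {"source": "managing_resources", "page": 2}),
]
for line in unique_sources(docs):
    print(line)
```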


That's it! We now know how to complement our LLM with some external knowledge!