# Retrieval Augmented Generation (RAG) with OpenAI and Qdrant

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Install the packages the RAG pipeline needs: the Qdrant client, FastEmbed (which the Qdrant client uses to embed documents locally), and the OpenAI SDK.

In [None]:
!pip install qdrant-client fastembed openai

Collecting qdrant-client
  Downloading qdrant_client-1.7.0-py3-none-any.whl (203 kB)
Collecting fastembed
  Downloading fastembed-0.1.1-py3-none-any.whl (14 kB)
Collecting openai
  Downloading openai-1.3.8-py3-none-any.whl (221 kB)
Collecting grpcio-tools>=1.41.0 (from qdrant-client)
  Downloading grpcio_tools-1.60.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.8 MB)

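Qdrant will act as the knowledge base that supplies context for the prompts we send to the LLM. Qdrant can run as a Docker container, as a managed cloud service, or locally inside the Python client; here we use the client's in-memory mode, which needs no server but keeps data only for the lifetime of the process.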

A Qdrant collection is the basic unit for organizing data: a named set of points (vectors with a payload) that you can search. After creating the client, we can check whether any collections exist yet.
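To make the idea of a collection concrete, here is a toy, pure-Python stand-in: a named set of points, each holding a vector and a payload, searched by cosine similarity. It is only an illustration under our own assumptions; `ToyCollection`, `Point` and their methods are hypothetical and not part of `qdrant_client`, and a Qdrant server searches through an HNSW index rather than the linear scan used here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    id: int
    vector: list[float]
    payload: dict


@dataclass
class ToyCollection:
    """Hypothetical stand-in for a Qdrant collection: a named set of points."""
    name: str
    points: list[Point] = field(default_factory=list)

    def upsert(self, point: Point) -> None:
        # Drop any existing point with the same id, then append the new one.
        self.points = [p for p in self.points if p.id != point.id]
        self.points.append(point)

    def search(self, query: list[float], limit: int = 3) -> list[tuple[float, Point]]:
        # Linear scan: score every point by cosine similarity to the query.
        def cosine(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(cosine(query, p.vector), p) for p in self.points]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


kb = ToyCollection("knowledge-base")
kb.upsert(Point(1, [1.0, 0.0], {"document": "Review about Elidel"}))
kb.upsert(Point(2, [0.0, 1.0], {"document": "Review about Janumet"}))
kb.upsert(Point(3, [0.7, 0.7], {"document": "Review about Eucrisa"}))

for score, point in kb.search([0.9, 0.1], limit=2):
    print(round(score, 3), point.payload["document"])
```

The query vector `[0.9, 0.1]` points in almost the same direction as the first point, so that point ranks first; the payload travels with each hit, which is how the RAG function later recovers the review text.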

In [None]:
from qdrant_client import QdrantClient
client = QdrantClient(":memory:")
client.get_collections()

CollectionsResponse(collections=[])

### Building the knowledge base

In [None]:
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/datasets/WebMining/drugdata.csv")
df.head()

| Unnamed: 0 | drug_name | review_source | drug_review | tokens | concreteness | modality | sentiments | disease | side effects | side effects grouped | side effects cleaned | side_effects_cleaned_grouped |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Elidel | Drugs.com | no! no! no! do not use Elidel it's worse than ... | ['use', 'elidel', 'bad', 'hydrocortisone', 'ta... | 0.365854 | Deontic modality | Negative | Eczema | Topical Steroid Withdrawal | Flu-like | Topical Steroid Withdrawal | Skin-Related |
| 1 | Elidel | Drugs.com | I've been so interested to find out what other... | ['interested', 'find', 'think', 'elidel', 'suf... | 0.08 | Epistemic modality | Negative | Eczema | Topical Steroid Withdrawal | Flu-like | Topical Steroid Withdrawal | Skin-Related |
| 2 | Elidel | Drugs.com | After being prescribed Elidel, my skin got muc... | ['prescribe', 'elidel', 'skin', 'get', 'bad', ... | 0.257143 | Deontic modality | Negative | Eczema | Topical Steroid Withdrawal, Weakened Immunity ... | Flu-like | Topical Steroid Withdrawal, Weakened Immune Sy... | Skin-Related,General |
| 3 | Elidel | Drugs.com | After having eczema most of my life, it seemed... | ['have', 'eczema', 'life', 'bad', 'have', 'all... | 0.258427 | Deontic modality | Neutral | Eczema | Weakened Immune System, Fatigue | Flu-like | Weakened Immune System, Fatigue | General,Psychological |
| 4 | Elidel | Drugs.com | I never leave reviews for anything, but knowin... | ['leave', 'review', ' ', 'know', 'like', 'come... | 0.135135 | Uncertain modality | Positive | Eczema | Nil | Nil | Nil | Nil |
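The next cell flattens each row into a single string, so the drug name and disease survive only as text. One alternative, sketched here under our own assumptions: build the same strings together with a parallel list of metadata dictionaries that could be stored as payload. The `build_documents` helper and the sample rows are hypothetical placeholders, not data from the CSV.

```python
from typing import Iterable, Mapping


def build_documents(rows: Iterable[Mapping[str, object]]) -> tuple[list[str], list[dict]]:
    """Hypothetical helper: turn review rows into (documents, metadata) lists."""
    documents, metadata = [], []
    for row in rows:
        # Same text template as the notebook's list comprehension.
        documents.append(
            f"I was diagnosed with {row['disease']}. "
            f"The doctor prescribed me {row['drug_name']}. "
            f"Review: {row['drug_review']}"
        )
        # Keep the structured fields separately so they can be filtered on later.
        metadata.append({"drug_name": row["drug_name"], "disease": row["disease"]})
    return documents, metadata


# Placeholder rows standing in for df.to_dict("records").
sample_rows = [
    {"drug_name": "Elidel", "disease": "Eczema", "drug_review": "Placeholder review one."},
    {"drug_name": "Janumet", "disease": "Diabetes", "drug_review": "Placeholder review two."},
]
docs, meta = build_documents(sample_rows)
print(docs[0])
print(meta[1])
```

With the real data, `build_documents(df.to_dict("records"))` would return both lists. The `add` method of `qdrant-client` accepts an optional `metadata` argument alongside `documents`, but verify this against the documentation for your installed version before relying on it.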


In [None]:
# Turn each row into one self-contained document combining disease, drug and review text.
reviews = [
    f"I was diagnosed with {row.disease}. The doctor prescribed me {row.drug_name}. Review: {row.drug_review}"
    for row in df.itertuples()
]
reviews[0]

"I was diagnosed with Eczema. The doctor prescribed me Elidel. Review: no! no! no! do not use Elidel it's worse than using hydrocortisone. It took me over a year for my face to clear up which is where my doctor told me to apply it on. Big mistake. Please note that prolonged use of any topical treatment for eczema such as topical steroids and even immunosuppressants like Elidel can lead to developing Topical Steroid Withdrawal/topical steroid withdrawal. if you feel your eczema is getting worse and worse please look up topical steroid withdrawal!"

In [None]:
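# Embed every review with FastEmbed's default model (downloaded on first use),
# store the vectors with the review text as payload in a "knowledge-base"
# collection (created if missing), and return the IDs of the new points.
client.add(
    collection_name="knowledge-base",
    documents=reviews
)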

100%|██████████| 77.7M/77.7M [00:03<00:00, 20.8MiB/s]


['d3432b85dda64cb3ac93e1f58d66fa23',
 '18ec4b8d4c3a4773b5153f8d47acfc9f',
 '9957b10fa0f94ba3b37eda2ed9b8d547',
 ...]

## Retrieval Augmented Generation

In [None]:
prompt = """What are the various side effects of Elidel?"""

Using the OpenAI API requires an API key. Here we set it through the `OPENAI_API_KEY` environment variable, which the OpenAI client reads automatically when it is created.
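Hard-coding a key in a notebook risks leaking it when the notebook is shared. A minimal alternative using only the standard library: keep any key already present in the environment and otherwise prompt for it with `getpass`, which does not echo the input. The `ensure_api_key` helper is our own sketch, not part of the OpenAI SDK.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(var: str = "OPENAI_API_KEY",
                   prompt: Callable[[str], str] = getpass) -> str:
    """Hypothetical helper: reuse the key from the environment or ask for it once."""
    key = os.environ.get(var)
    if not key:
        # getpass hides what the user types, unlike input().
        key = prompt(f"Enter {var}: ").strip()
        os.environ[var] = key
    return key
```

In the notebook you would call `ensure_api_key()` instead of assigning the placeholder string; `AsyncOpenAI()` then picks the key up from the environment.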

In [None]:
import os
os.environ["OPENAI_API_KEY"] = 'INSERT OPENAI KEY HERE'

In [None]:
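from openai import AsyncOpenAI

# The client reads OPENAI_API_KEY from the environment.
openai_client = AsyncOpenAI()

# Baseline: send the question as-is, with no retrieved context.
# Top-level await works here because the notebook already runs an event loop.
completion = await openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": prompt},
    ]
)
print(completion.choices[0].message.content)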

Elidel (pimecrolimus) is a medication used to treat atopic dermatitis (eczema). Although it is generally well-tolerated, some people may experience side effects. The most common side effects of Elidel include:

1. Burning or stinging sensation at the site of application
2. Itching or mild redness of the skin
3. Skin infections, cold sores, or minor skin irritations

Less common side effects may include:

1. Headaches
2. Flu-like symptoms
3. Allergic reactions, such as hives, rash, or swelling of the face, tongue, or throat
4. Eye irritation, such as redness, itching, or watery eyes



### Testing out the RAG pipeline

By adding context retrieved from our own reviews to the prompt, the model can ground its answer in the knowledge base instead of relying only on what it learned during training. Let's wrap the RAG pipeline in a function so we can call it easily with different prompts.
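The only step of `rag` that touches neither Qdrant nor OpenAI is assembling the prompt, so it can be pulled out and checked offline. A sketch of that step as a pure function; `build_metaprompt` and `TEMPLATE` are hypothetical names, and dedenting the template before filling it in is our choice (not the original's) to keep the f-string's indentation out of the prompt.

```python
import textwrap

# Dedent the template first: dedenting after interpolation would not work,
# because the unindented context lines leave no common prefix to strip.
TEMPLATE = textwrap.dedent("""\
    You are a Pharmacist.
    Answer the following question using the provided context.
    If you can't find the answer, do not pretend you know it, but answer
    'I don't know'.

    Question: {question}

    Context:
    {context}

    Answer:
    """)


def build_metaprompt(question: str, documents: list[str]) -> str:
    # Strip each retrieved document and join them one per line.
    context = "\n".join(doc.strip() for doc in documents)
    # str.format only parses the template, so braces inside reviews are safe.
    return TEMPLATE.format(question=question.strip(), context=context)
```

Inside `rag`, the `metaprompt` f-string could then be replaced by `build_metaprompt(question, [r.document for r in results])`.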

In [None]:
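async def rag(question: str, n_points: int = 3) -> str:
    # 1. Retrieve the n_points reviews most similar to the question.
    #    query() embeds the question with the same FastEmbed model used by add().
    results = client.query(
        collection_name="knowledge-base",
        query_text=question,
        limit=n_points,
    )

    # 2. Concatenate the retrieved review texts into one context block.
    context = "\n".join(r.document for r in results)

    # 3. Build a prompt that restricts the model to this context
    #    and gives it an explicit fallback answer.
    metaprompt = f"""
    You are a Pharmacist.
    Answer the following question using the provided context.
    If you can't find the answer, do not pretend you know it, but answer
    'I don't know'.

    Question: {question.strip()}

    Context:
    {context.strip()}

    Answer:
    """

    # 4. Ask the LLM to answer from that context.
    completion = await openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": metaprompt},
        ],
    )
    return completion.choices[0].message.content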

Now it's easier to ask a broad range of questions.
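Because `rag` is a coroutine, several questions can be sent concurrently rather than one after another. A sketch using `asyncio.gather`; to keep it runnable offline, `fake_rag` is a hypothetical stand-in that sleeps instead of calling Qdrant and OpenAI. Note that `client.query` inside `rag` is a synchronous call, so with the real function only the OpenAI requests would overlap.

```python
import asyncio
from typing import Awaitable, Callable


async def ask_all(questions: list[str],
                  answer: Callable[[str], Awaitable[str]]) -> dict[str, str]:
    # Schedule every question at once; gather returns answers in input order.
    answers = await asyncio.gather(*(answer(q) for q in questions))
    return dict(zip(questions, answers))


async def fake_rag(question: str) -> str:
    # Hypothetical stand-in for rag(): simulate network latency, then answer.
    await asyncio.sleep(0.1)
    return f"(answer to: {question})"
```

In a script you would run `asyncio.run(ask_all(questions, fake_rag))`; in the notebook, where an event loop is already running, `await ask_all(questions, rag)` is the equivalent call.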

In [None]:
print(await rag("Name 2 drugs used to treat Diabetes"))

Onglyza and Janumet are two drugs used to treat diabetes.


In [None]:
print(await rag("Which drug should not be used on children?"))

Based on the provided context, it is unclear which specific drug should not be used on children. However, both Elidel and Eucrisa are mentioned in the context as medications that were used on children with severe eczema and have caused negative side effects. It is important to consult with a healthcare professional, such as a pediatrician or pharmacist, before using any medication on children.


In [None]:
print(await rag("What are the various side effects of Elidel?"))

The various side effects of Elidel, based on the provided context, include fever, headache, muscle stiffness, nasal congestion, numbness in hands and legs, heartburn, discomfort after application, swollen and burning lips, unusual redness and bumps, severe itch, redness, swelling, and heat rash-type bumps on the face.


In [None]:
print(await rag("What drug can be used to treat a broken jaw?"))

I don't know.


In [None]:
await rag("To build an NLP API, what should I use?")

"I don't know."

Our model can now:

1. Take advantage of the knowledge stored in our vector database.
2. Answer "I don't know" when the retrieved context does not contain the answer.

Grounding answers in retrieved context and giving the model an explicit way to decline is a simple mechanism for reducing the risk of hallucinations in large language models.
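When `rag` runs inside a larger application, the fallback answer has to be detected in code rather than by eye. A small sketch, assuming the model follows the instruction closely enough that its whole reply is a variant of "I don't know" (as in the broken-jaw and NLP API examples above); `is_refusal` is a hypothetical helper and a plain string check, so a paraphrased refusal such as the hedged answer about children would not be caught.

```python
import string


def is_refusal(answer: str, fallback: str = "I don't know") -> bool:
    """Hypothetical check: is the reply just the prompt's fallback answer?"""
    # Normalise curly apostrophes, strip surrounding quotes, punctuation
    # and whitespace, then compare case-insensitively.
    cleaned = answer.replace("\u2019", "'").strip(string.punctuation + string.whitespace)
    return cleaned.lower() == fallback.lower()
```

A caller could, for example, route refusals to a human pharmacist instead of showing them to the user.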