## Building a Retrieval-Augmented Generation (RAG) System
#### Alan García Zermeño
06/13/2024

### Section 3: Integrating the Generation Component
#### This section includes:
- Code snippets for integrating the generative model.
- Examples of generated answers.
- A brief discussion on the challenges faced and how they were addressed.

Let's import our script module for cleaning and loading the database, and our script for calling the retrieval system.

In [1]:
import sys
import os

# Import script modules
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../Scripts')))
from datacleaner import data_cleaner
from RetrievalSystem import evaluator, generGemini

[nltk_data] Downloading package punkt to /home/alan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


This is how we use our data loader and cleaner. The `data_cleaner` function returns the complete corpus and two arrays holding the questions and the answers.

In [2]:
corpus,questions,answers = data_cleaner()

49 Question/Answer pairs extracted!


This is how we call our document retrieval system. The `evaluator` function receives the query to be evaluated, the corpus, and the answers from the corpus. It returns the most relevant answer if our system determines that it clearly answers the query; otherwise, it returns the boolean *False*.

In [5]:
query = "How effective is Keytruda in treating non-small cell lung cancer?"
context = evaluator(query,corpus,answers)
print(context)

Does the following document have exact information to answer the following query?
    Please choose one of the two possible options: Yes, or No.
    

        Question: How effective is Keytruda in treating non-small cell lung cancer?

        Document: Keytruda has shown to improve survival rates significantly in non-small cell lung cancer patients with PD-L1 expression.

        Evaluation: [Select one: Yes, No]:
Yes
Keytruda has shown to improve survival rates significantly in non-small cell lung cancer patients with PD-L1 expression.
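
The `evaluator` implementation lives in the `RetrievalSystem` script and is not reproduced in this section. As a rough illustration of the contract described above (return the best document when a judge answers "Yes", otherwise `False`), here is a minimal sketch. The names `overlap_score`, `toy_evaluator` and `judge` are hypothetical, the signature is simplified, and the word-overlap ranking is only a stand-in for whatever retrieval method the real system uses.

```python
import re


def overlap_score(query, document):
    """Jaccard overlap between the word sets of the query and the document."""
    q = set(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", document.lower()))
    return len(q & d) / len(q | d) if q | d else 0.0


def toy_evaluator(query, answers, judge):
    """Return the best-matching answer if `judge` approves it, else False.

    `judge(query, document)` is a hypothetical callable standing in for the
    LLM call; it is assumed to return a string starting with "Yes" or "No".
    """
    if not answers:
        return False
    # Toy retrieval step: pick the answer sharing the most words with the query.
    best = max(answers, key=lambda doc: overlap_score(query, doc))
    verdict = judge(query, best).strip().lower()
    return best if verdict.startswith("yes") else False
```

Passing the judge as a parameter makes it possible to try the control flow with a fake judge, for example `lambda q, d: "No"`, without calling any model.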


The fundamental idea of integrating the generative language model into the information retrieval system is to give it the context approved by the evaluator model and ask it to write its own response to the query, grounded in that context.
Since the evaluator model found an appropriate document for the previous query and returned it in the `context` variable, a prompt for the generative model could look as follows:

In [27]:
prompt = f"""
Please generate an informative and concise response to the following query.
Use the provided context information to ensure your response is accurate and relevant.

Context: {context}

Query: {query}

Response: 
"""

And we can now call Gemini itself to generate a response based on this prompt.

In [30]:
if context:
    print(generGemini(prompt))
else:
    print("I can't answer this query due to a lack of proper information in the database")

Keytruda (pembrolizumab) has demonstrated significant effectiveness in treating non-small cell lung cancer (NSCLC) patients with PD-L1 expression. It has shown to improve survival rates by blocking the PD-1 protein on immune cells, allowing them to better recognize and attack cancer cells. Studies have found that Keytruda can extend overall survival and progression-free survival in patients with advanced NSCLC who have high PD-L1 expression.
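
In the cells above the prompt is formatted before `context` is checked, so when the evaluator returns `False` the string contains `Context: False` even though it is never sent. One possible way to keep the two steps together is sketched below. The helpers `build_prompt` and `answer` are hypothetical, and `generate` stands for any callable that takes a prompt string, such as `generGemini`.

```python
def build_prompt(query, context):
    """Assemble the single-string prompt used for the generator."""
    return (
        "Please generate an informative and concise response to the following query.\n"
        "Use the provided context information to ensure your response is accurate and relevant.\n\n"
        f"Context: {context}\n\n"
        f"Query: {query}\n\n"
        "Response: "
    )


def answer(query, context, generate):
    """Call `generate` only when the evaluator supplied a context."""
    if not context:
        # The evaluator rejected every document: no prompt is built or sent.
        return None
    return generate(build_prompt(query, context))
```

With the notebook's objects this would be called as `answer(query, context, generGemini)`.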


However, the ideal approach would be to use a different and more advanced model as the generative model. We will use the OpenAI API for this. GPT-4o would be the ideal choice, but to limit costs this section uses GPT-3.5 Turbo.

- We will need an API key, which you can get at: https://platform.openai.com/api-keys.
- Then, save it in the 'gpt.txt' file in the APIS directory.
- **Please paste only the API key**.

In [14]:
from openai import OpenAI

# Read the API key from the first line of the key file
with open("../APIS/gpt.txt", 'r') as file:
    apik = file.readline().strip()
client = OpenAI(api_key = apik)
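
If the key file is missing or empty, the cell above fails later with a less obvious error. A slightly more defensive loader could look like the sketch below. `load_api_key` is a hypothetical helper, and the fallback to the `OPENAI_API_KEY` environment variable is an assumption about how one might prefer to store the key, not part of the original setup.

```python
import os
from pathlib import Path


def load_api_key(path="../APIS/gpt.txt", env_var="OPENAI_API_KEY"):
    """Return the key from `env_var` if set, else from the first line of `path`."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key_file = Path(path)
        if not key_file.is_file():
            raise FileNotFoundError(f"No API key in ${env_var} and no file at {path}")
        lines = key_file.read_text(encoding="utf-8").splitlines()
        # Only the first line is used, matching the "paste only the key" rule.
        key = lines[0].strip() if lines else ""
    if not key:
        raise ValueError("The API key is empty")
    return key
```

The client would then be created with `OpenAI(api_key=load_api_key())`.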

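With GPT, we will use the same prompt but take advantage of the ability to split it into different roles: the instructions and the context go in the system message, and the query goes in the user message. We also set `max_tokens=120` to cap the length of the response, since this API is not free to use.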

In [32]:
model = "gpt-3.5-turbo"
prompt = f"""
Please generate an informative and concise response to the following query.
Use the provided context information to ensure your response is accurate and relevant.

Context: {context}
"""
if context:
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": query},
        ],
        max_tokens=120,
    )
    print(completion.choices[0].message.content)
else:
    print("I can't answer this query due to a lack of proper information in the database")

Keytruda has demonstrated significant efficacy in treating non-small cell lung cancer patients with PD-L1 expression. Clinical studies have shown that Keytruda can improve survival rates and prolong progression-free survival in this patient population. It is considered a breakthrough treatment option for those with advanced non-small cell lung cancer, particularly when PD-L1 expression is present. However, the efficacy of Keytruda may vary based on individual patient factors and disease characteristics, so it is important to consult with a healthcare provider for personalized treatment recommendations.
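
Because `max_tokens=120` caps the reply, an answer can be cut off mid-sentence. The chat completions response reports this through `finish_reason`, which is `"length"` when the token limit was reached. The sketch below shows one way to check for it. `extract_answer` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in with the same attribute layout as a real response, so the check can be tried without spending API credits.

```python
from types import SimpleNamespace


def extract_answer(completion):
    """Return (text, truncated) from a chat completion response.

    `truncated` is True when the model stopped because it hit `max_tokens`,
    which the API reports as finish_reason == "length".
    """
    choice = completion.choices[0]
    text = choice.message.content or ""
    return text, choice.finish_reason == "length"


# Fake response used only for illustration; a real one comes from
# client.chat.completions.create(...).
fake = SimpleNamespace(
    choices=[SimpleNamespace(
        message=SimpleNamespace(content="Keytruda has demonstrated"),
        finish_reason="length",
    )]
)
text, truncated = extract_answer(fake)
if truncated:
    print("Warning: the answer was cut off by max_tokens")
```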


Let's write a function that calls the evaluator (Gemini) and the generator (GPT), and test it on some queries.

In [35]:
def CRAG(query,corpus,answers,client):
    """
    Calls the evaluator model given a query, a corpus and the set of answers.
    Then, calls the generation model if the evaluator model authorizes it.
    Prints the system response.
    Args:
        query:      String query to evaluate
        corpus:     Documents array to search for information
        answers:    Answers array from the corpus without the questions
        client:     GPT client
    """
    #Calls the evaluator
    context = evaluator(query,corpus,answers)

    #Define model and prompt
    model = "gpt-3.5-turbo"
    prompt = f"""
    Please generate an informative and concise response to the following query.
    Use the provided context information to ensure your response is accurate and relevant.

    Context: {context}
    """
    #If context != False, we will call GPT to generate a response given a query and the context
    if context:
        completion = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": query},
            ],
            max_tokens=120,
        )
        print("\n\n GENERATOR RESPONSE:")
        print(completion.choices[0].message.content)
    else:
        print("I can't answer this query due to a lack of proper information in the database")

In [37]:
query = "what is Keytruda?"
CRAG(query,corpus,answers,client)

Does the following document have exact information to answer the following query?
    Please choose one of the two possible options: Yes, or No.
    

        Question: what is Keytruda?

        Document: Keytruda is administered as an intravenous infusion over 30 minutes.

        Evaluation: [Select one: Yes, No]:
No
I can't answer this query due to a lack of proper information in the database


In [36]:
query = "How long does it take to see the effects of Keytruda in treating cancer?"
CRAG(query,corpus,answers,client)

Does the following document have exact information to answer the following query?
    Please choose one of the two possible options: Yes, or No.
    

        Question: How long does it take to see the effects of Keytruda in treating cancer?

        Document: Some patients may see effects as early as 2 to 3 months into the treatment.

        Evaluation: [Select one: Yes, No]:
Yes


 GENERATOR RESPONSE:
Patients undergoing Keytruda treatment for cancer may start to see effects as early as 2 to 3 months into the treatment. This can vary among individuals and depend on several factors such as the type and stage of cancer being treated. It is important to regularly consult with your healthcare provider to monitor progress and discuss any changes or improvements in response to the treatment.


In [41]:
query = "Can Keytruda be used in combination with other therapies?"
CRAG(query,corpus,answers,client)

Does the following document have exact information to answer the following query?
    Please choose one of the two possible options: Yes, or No.
    

        Question: Can Keytruda be used in combination with other therapies?

        Document: Yes, Keytruda can be used in combination with chemotherapy and other immunotherapies depending on the cancer type.

        Evaluation: [Select one: Yes, No]:
Yes


 GENERATOR RESPONSE:
Yes, Keytruda can be used in combination with chemotherapy and other immunotherapies depending on the type of cancer being treated. It is often used in combination with other treatments to enhance its effectiveness in fighting cancer. It is important for healthcare providers to determine the most suitable combination therapy based on the specific cancer diagnosis and individual patient characteristics.
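
`CRAG` prints its result, which is convenient in a notebook but makes the response hard to reuse or test. A possible variant that returns the text instead is sketched below. `crag_answer`, `NO_ANSWER`, `evaluate` and `generate` are hypothetical names: `evaluate` plays the role of `evaluator`, and `generate` is assumed to wrap the chat completion call, so both can be replaced by fakes.

```python
NO_ANSWER = "I can't answer this query due to a lack of proper information in the database"


def crag_answer(query, corpus, answers, evaluate, generate):
    """Return the system response as a string instead of printing it.

    `evaluate(query, corpus, answers)` returns a context string or False.
    `generate(system_prompt, query)` returns the generated text.
    """
    context = evaluate(query, corpus, answers)
    if not context:
        # No document was approved, so the generator is never called.
        return NO_ANSWER
    system_prompt = (
        "Please generate an informative and concise response to the following query.\n"
        "Use the provided context information to ensure your response is accurate and relevant.\n\n"
        f"Context: {context}"
    )
    return generate(system_prompt, query)
```

With the notebook's objects, `evaluate` would be `evaluator`, and `generate` a small function that calls `client.chat.completions.create` with the system and user messages and returns `completion.choices[0].message.content`.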


### Conclusions and Answers

We can see that when we input a query that the evaluator model authorizes, because there is a clear answer in the corpus, the generative model (GPT) produces an appropriate and more detailed response to the query. The only thing missing in this system is a web search for the cases where the evaluator model does not find an appropriate document in the database, so that the system can still return a response in those cases. Again, we will implement this in Section 6.

Some of the complications in implementing the generative model were:

- Some answers in the database do not offer any useful data, for example *40- Question: Can Keytruda cause fatigue in NSCLC patients?: Answer: No detailed information aviable on the given topic.* Fortunately, the evaluator model does not let these pass to the generative model.

- A prompt explaining the task had to be placed before the query so that the generative model understood it should improve on the retrieved answer; without that explanation, the model often simply repeated the context supplied in the prompt.

- To avoid exhausting resources, we used GPT-3.5 turbo as the generative model here. In the next evaluation section, we will use GPT-4o for better results.
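
Regarding the first complication, the evaluator already blocks uninformative answers, but they could also be removed before retrieval so they never compete with real answers. The sketch below is one hypothetical way to do that. `drop_uninformative` is not part of the project scripts, and it assumes that all placeholder answers contain the phrase "No detailed information", which is inferred from the single example quoted above.

```python
PLACEHOLDER = "no detailed information"  # assumed marker of an empty answer


def drop_uninformative(questions, answers, marker=PLACEHOLDER):
    """Keep only the question/answer pairs whose answer lacks `marker`."""
    kept = [
        (q, a) for q, a in zip(questions, answers)
        if marker not in a.lower()
    ]
    kept_questions = [q for q, _ in kept]
    kept_answers = [a for _, a in kept]
    return kept_questions, kept_answers
```

It would be applied right after loading, for example `questions, answers = drop_uninformative(questions, answers)`.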