# Using Retrieval Augmented Generation (RAG) to improve a chatbot's answers to questions about an SBML model

## Using chatbots to ask questions
<img src="./resources/simple_prompt.png" alt="Simple Prompt" width="2500"/>

#### Example: Asking a chatbot a question on a generic topic

Question: "What is the capital of the U.S.?"

In [1]:
# Load the OpenAI API key from a .env file into the environment
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from openai import OpenAI
client = OpenAI()


question = "What is the capital of the U.S.?"
prompt = question

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print(completion.choices[0].message.content)

The capital of the U.S. is Washington, D.C.


#### Example: Asking a chatbot a question on a non-generic topic

Question: "Who are my audience for this presentation?"

In [19]:
from openai import OpenAI
client = OpenAI()

question = "Who are my audience for this presentation?"
prompt = question

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print(completion.choices[0].message.content)

It is important to identify the target audience for your presentation in order to tailor the content and delivery method to best suit their needs and preferences. The audience for your presentation could vary depending on the topic, but some potential audiences could include:

1. Colleagues or peers within your organization
2. Clients or customers
3. Industry professionals
4. Students or academic professionals
5. Investors or stakeholders
6. General public

Consider the background, knowledge level, interests, and preferences of your audience when planning your presentation to ensure it is engaging and informative for them.


## Adding Context (prompt before the prompt)
<img src="./resources/prompt_before_prompt.png" alt="Prompt Before Prompt" width="2500"/>

#### Example: Adding context to a non-generic question before asking the chatbot

Context: "I am giving this presentation to the sys-bio group meeting."

Question: "Who are my audience for this presentation?"

In [20]:
context = "I am giving this presentation to the sys-bio group meeting."
question = "Who are my audience for this presentation?"
prompt = context + " " + question

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print(completion.choices[0].message.content)

Your audience for this presentation would be the members of the sys-bio group meeting, which likely includes researchers, scientists, and experts in the field of systems biology. They may have a background in biology, bioinformatics, computational biology, mathematics, or related fields, and have an interest in understanding complex biological systems at a systems-level. They are likely interested in the latest research, advancements, and technologies in systems biology.
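
The cell above concatenates the context and the question into a single user message. As a hedged variant, the context can instead be sent as a separate `system` message, a role the Chat Completions API also accepts. The helper name `build_messages` is an assumption introduced here for illustration; it is not part of the original notebook.

```python
def build_messages(question, context=None):
    """Build a Chat Completions message list, placing optional context first."""
    messages = []
    if context:
        # Assumption: the context goes in a system message rather than
        # being prepended to the user prompt as in the cell above.
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    "Who are my audience for this presentation?",
    context="I am giving this presentation to the sys-bio group meeting.",
)
print(messages)
```

The returned list can be passed as `messages=` to `client.chat.completions.create(...)` in place of the inline list.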


## Retrieval Augmented Generation (RAG)
RAG retrieves the documents most relevant to a question from a knowledge source and passes them to the language model, which then generates a response based on the retrieved documents.

<img src="./resources/feed_with_documents.png" alt="Feed With Documents" width="2500"/>

### How does RAG work?
<img src="./resources/architecture.png" alt="Architecture" width="2500"/>
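
The retrieve-then-generate flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used later in this notebook: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Put the retrieved documents in front of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


documents = [
    "I am giving this presentation to the sys-bio group meeting.",
    "The capital of the U.S. is Washington, D.C.",
]
prompt = build_prompt("Who are my audience for this presentation?", documents)
print(prompt)
```

The resulting `prompt` would then be sent to the chat model exactly as in the earlier cells; only the way the context is chosen has changed.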

### What makes a difference in RAG?
- Documents
- Language model
- Embeddings
- Vector store

<img src="./resources/tools_selection.png" alt="Tools Selection" width="2500"/>
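
One way to see why these choices matter is to treat each as a swappable part. The sketch below is an assumption-laden illustration: `InMemoryVectorStore` and `letter_counts` are hypothetical names, and the letter-frequency embedding is deliberately crude so that replacing it with a real embedding model would visibly change which documents are returned.

```python
import math


def letter_counts(text):
    """Toy embedding (an assumption): a 26-dimensional letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Minimal vector store: keeps (vector, document) pairs in a list."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # the embedding component is injected
        self.entries = []

    def add(self, document):
        self.entries.append((self.embed_fn(document), document))

    def search(self, query, k=1):
        q = self.embed_fn(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


store = InMemoryVectorStore(embed_fn=letter_counts)
store.add("GLCo, GLCi, ATP, G6P, ADP")
store.add("HK, PGI, PFK, ALD, TPI")
print(store.search("ATP and ADP", k=1))
```

Because the store only depends on `embed_fn`, a different embedding or a persistent vector database could be substituted without touching the calling code.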

## SBML Use Case
Asking questions about an SBML model using:
- an Antimony model
- some very basic instructions on Antimony syntax
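
To make the Antimony syntax concrete: a reaction line has the form `name: reactants -> products; rate law;`. The sketch below splits one such line into its parts, using the PFK reaction of the Pritchard2002_glycolysis model as it appears in the output further down. It only illustrates the syntax; it is not how `paperinquirer` works internally, `parse_reaction` is a hypothetical helper, and stoichiometric coefficients and the `=>` arrow are not handled.

```python
def _species(side):
    """Split one side of a reaction equation into species names."""
    return [s.strip() for s in side.split("+") if s.strip()]


def parse_reaction(line):
    """Split an Antimony reaction line into name, reactants, products, rate law."""
    body = line.strip().rstrip(";")
    header, _, rate_law = body.partition(";")
    name, _, equation = header.partition(":")
    reactants, _, products = equation.partition("->")
    return {
        "name": name.strip(),
        "reactants": _species(reactants),
        "products": _species(products),
        "rate_law": rate_law.strip(),
    }


pfk_line = (
    "PFK: F6P + ATP -> F16bP + ADP; "
    "cell*PFK_Vmax_4*(F6P*ATP/(PFK_Kf6p_4*PFK_Katp_4) "
    "- F16bP*ADP/(PFK_Kf16bp_4*PFK_Kadp_4*PFK_Keq_4))"
    "/((1 + F6P/PFK_Kf6p_4 + F16bP/PFK_Kf16bp_4)"
    "*(1 + ATP/PFK_Katp_4 + ADP/PFK_Kadp_4));"
)
reaction = parse_reaction(pfk_line)
print(reaction["name"], reaction["reactants"], reaction["products"])
```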

In [2]:
import paperinquirer

model_dir = "./data/antimony"
inquirer = paperinquirer.PaperInquirer([model_dir])

In [3]:
print(inquirer.inquire("What is the list of species in Pritchard2002_glycolysis?"))

Thought: I need to use the Pritchard2002_glycolysis_model tool to get the list of species in the Pritchard2002_glycolysis model.
Action: Pritchard2002_glycolysis_model
Action Input: {'input': 'list_species'}
Observation: GLCo, GLCi, ATP, G6P, ADP, F6P, F16bP, AMP, F26bP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR, AcAld, CO2, EtOH, Glycerol, Glycogen, Trehalose, Succinate
Thought: I have successfully obtained the list of species in the Pritchard2002_glycolysis model.
Answer: The list of species in the Pritchard2002_glycolysis model is: GLCo, GLCi, ATP, G6P, ADP, F6P, F16bP, AMP, F26bP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR, AcAld, CO2, EtOH, Glycerol, Glycogen, Trehalose, Succinate.
The list of species in the Pritchard2002_glycolysis model is: GLCo, GLCi, ATP, G6P, ADP, F6P, F16bP, AMP, F26bP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR, AcAld, CO2, EtOH, Glycerol, Glycogen, Trehalose, Succinate.


In [12]:
print(inquirer.inquire("What is the list of reactions in Pritchard2002_glycolysis?"))

Thought: To find the list of reactions in the Pritchard2002_glycolysis model, I can use the Pritchard2002_glycolysis_model tool.
Action: Pritchard2002_glycolysis_model
Action Input: {'input': 'list_reactions'}
Observation: The reactions listed in the provided context are:
1. HK (Hexokinase)
2. PGI (Phosphoglucose isomerase)
3. PFK (Phosphofructokinase)
4. ALD (Aldolase)
5. TPI (Triosephosphate isomerase)
6. GAPDH (Glyceraldehyde-3-phosphate dehydrogenase)
7. PGK (Phosphoglycerate kinase)
8. PGM (Phosphoglycerate mutase)
9. ENO (Enolase)
10. PYK (Pyruvate kinase)
11. PDC (Pyruvate decarboxylase)
12. ADH (Alcohol dehydrogenase)
13. ATPase
14. AK (Adenylate kinase)
15. G3PDH (Glyceraldehyde-3-phosphate dehydrogenase)
Thought: I have obtained the list of reactions in the Pritchard2002_glycolysis model.
Answer: The list of reactions in the Pritchard2002_glycolysis model are:
1. HK (Hexokinase)
2. PGI (Phosphoglucose isomerase)
3. PFK (Phosphofr

In [18]:
print(inquirer.inquire("What is the kinetic law for PFK in Pritchard2002_glycolysis?"))

Thought: I need to use the Pritchard2002_glycolysis tool to find the kinetic law for PFK.
Action: Pritchard2002_glycolysis
Action Input: {'input': 'kinetic_law_PFK'}
Observation: PFK: F6P + ATP -> F16bP + ADP; cell*PFK_Vmax_4*(F6P*ATP/(PFK_Kf6p_4*PFK_Katp_4) - F16bP*ADP/(PFK_Kf16bp_4*PFK_Kadp_4*PFK_Keq_4))/((1 + F6P/PFK_Kf6p_4 + F16bP/PFK_Kf16bp_4)*(1 + ATP/PFK_Katp_4 + ADP/PFK_Kadp_4));
Thought: I have found the kinetic law for PFK in Pritchard2002_glycolysis.
Answer: The kinetic law for PFK in Pritchard2002_glycolysis is:

PFK: F6P + ATP -> F16bP + ADP

cell * PFK_Vmax_4 * (F6P * ATP / (PFK_Kf6p_4 * PFK_Katp_4) - F16bP * ADP / (PFK_Kf16bp_4 * PFK_Kadp_4 * PFK_Keq_4)) / ((1 + F6P / PFK_Kf6p_4 + F16bP / PFK_Kf16bp_4) * (1 + ATP / PFK_Katp_4 + ADP / PFK_Kadp_4))

This equation represents the rate at which the PFK reaction occurs based on the concentrations of F6P, ATP, F16bP, and ADP, as well as the parameters PFK_Vmax_4, PFK_Kf6p_4, PFK_Ka