# Using Retrieval Augmented Generation (RAG) to improve a chatbot's answers to questions about SBML models

## Using chatbots to ask questions
<img src="./resources/simple_prompt.png" alt="Simple Prompt" width="2500"/>

In [1]:
# Load the OpenAI API key (OPENAI_API_KEY) from a local .env file
from dotenv import load_dotenv

load_dotenv()

True

#### Example: Asking a chatbot a question on a generic topic

Question: "What is the capital of the U.S.?"

In [1]:
from openai import OpenAI
client = OpenAI()

question = "What is the capital of the U.S.?"
prompt = question

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print(completion.choices[0].message.content)

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

#### Example: Asking a chatbot a question on a non-generic topic

Question: "Who is my PI?"

In [3]:
from openai import OpenAI
client = OpenAI()

question = "Who is my PI?"
prompt = question

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print(completion.choices[0].message.content)

I'm sorry, I don't have access to that information. Please contact your institution or research program for assistance with identifying your PI.


## Adding Context (prompt before the prompt)
<img src="./resources/prompt_before_prompt.png" alt="Prompt Before Prompt" width="2500"/>

#### Example: Adding context to a question on a non-generic topic before asking a chatbot

Context: "My PI is Prof. Herbert Sauro."

Question: "Who is my PI?"

In [4]:
context = "My PI is Prof. Herbert Sauro."
question = "Who is my PI?"
prompt = context + " " + question

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": prompt}
  ]
)

print(completion.choices[0].message.content)

Your PI is Prof. Herbert Sauro.
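
The cell above concatenates the context and the question into a single user message. As a minimal sketch of one alternative, the context can be kept in a separate `system` message so that the question stays unchanged. The helper name `build_messages` and the wording of the system instruction are assumptions made for illustration; they are not part of the original notebook.

```python
def build_messages(context: str, question: str) -> list[dict]:
    """Return a chat message list that keeps the context apart from the question."""
    return [
        # The context goes into the system message (illustrative wording).
        {"role": "system", "content": f"Use the following context to answer: {context}"},
        # The question is sent as-is in the user message.
        {"role": "user", "content": question},
    ]

messages = build_messages("My PI is Prof. Herbert Sauro.", "Who is my PI?")
for m in messages:
    print(m["role"], "->", m["content"])

# To send it (requires an API key), pass the list to the same call used above:
# completion = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
```

Either layout gives the model the same information. Keeping the context separate mainly makes it easier to swap in different context later without touching the question.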


## Retrieval Augmented Generation (RAG)
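RAG lets a language model retrieve documents relevant to a question from a knowledge source and then generate a response grounded in those retrieved documents, so the context no longer has to be written into the prompt by hand.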

<img src="./resources/feed_with_documents.png" alt="Feed With Documents" width="2500"/>

### How does RAG work?
<img src="./resources/architecture.png" alt="Architecture" width="2500"/>
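
The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation used later in this notebook: the document list is made up for the example, the retriever ranks documents by simple word overlap instead of embeddings, and the function names `tokenize`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by Jaccard word overlap with the question; return the top k."""
    q = tokenize(question)

    def score(doc: str) -> float:
        d = tokenize(doc)
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    return sorted(documents, key=score, reverse=True)[:k]

def build_prompt(question: str, retrieved: list[str]) -> str:
    """Place the retrieved documents in front of the question."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy knowledge source (assumed for illustration only).
documents = [
    "My PI is Prof. Herbert Sauro.",
    "Antimony is a text-based language for describing models.",
    "The capital of the U.S. is Washington, D.C.",
]

question = "Who is my PI?"
retrieved = retrieve(question, documents, k=1)
prompt = build_prompt(question, retrieved)
print(prompt)
```

The resulting `prompt` string can then be sent to the chatbot in the same way as the hand-written context in the previous section.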

### Tools Selection
##### Language Model
##### Embeddings
##### Vector Store

<img src="./resources/tools_selection.png" alt="Tools Selection" width="2500"/>
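
To make the roles of the embeddings and the vector store concrete, here is a minimal in-memory sketch. It assumes a toy bag-of-words "embedding" in place of a real embedding model and a plain Python list in place of a real vector store; the names `embed`, `cosine`, and `VectorStore` are hypothetical, and the two stored sentences are paraphrased from the model outputs shown later in this notebook.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: sparse word-count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """Stores (vector, text) pairs and returns the texts most similar to a query."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        qv = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(qv, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = VectorStore()
store.add("The kinetic law for PGI depends on G6P and F6P.")
store.add("Pritchard2002_glycolysis lists species such as GLCo, GLCi, ATP and G6P.")

hits = store.search("What is the kinetic law for PGI?", k=1)
print(hits)
```

In a real setup the language model would receive `hits` as context. Swapping in a different embedding model or vector store changes only `embed` and `VectorStore`, which is why the three tools can be chosen independently.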

## SBML Use Case
This example uses an Antimony model and the Antimony specifications to answer questions about the model.

In [2]:
import paperinquirer

model_dir = "./data/antimony"
inquirer = paperinquirer.PaperInquirer([model_dir])

In [6]:
print(inquirer.inquire("What is the list of species in Pritchard2002_glycolysis?"))

Thought: To answer this question, I can use the Pritchard2002_glycolysis tool to retrieve the list of species in the model.
Action: Pritchard2002_glycolysis
Action Input: {'input': 'list_species'}
Observation: GLCo, GLCi, ATP, G6P, ADP, F6P, F16bP, AMP, F26bP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR, AcAld, CO2, EtOH, Glycerol, Glycogen, Trehalose, Succinate
Thought: I have retrieved the list of species in the Pritchard2002_glycolysis model.
Answer: The list of species in the Pritchard2002_glycolysis model includes GLCo, GLCi, ATP, G6P, ADP, F6P, F16bP, AMP, F26bP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR, AcAld, CO2, EtOH, Glycerol, Glycogen, Trehalose, and Succinate.
The list of species in the Pritchard2002_glycolysis model includes GLCo, GLCi, ATP, G6P, ADP, F6P, F16bP, AMP, F26bP, DHAP, GAP, NAD, BPG, NADH, P3G, P2G, PEP, PYR, AcAld, CO2, EtOH, Glycerol, Glycogen, Trehalose, and Succinate.


In [5]:
print(inquirer.inquire("What is the list of reactions in this model?"))

Thought: To answer this question, I can use the Pritchard2002_glycolysis tool to obtain the list of reactions in the model.
Action: Pritchard2002_glycolysis
Action Input: {'input': 'list_of_reactions'}
Observation: The list of reactions mentioned in the context includes:
- HK: http://identifiers.org/kegg.reaction/R02848
- PGI: http://identifiers.org/kegg.reaction/R00771
- PFK: http://identifiers.org/kegg.reaction/R00756
- ALD: http://identifiers.org/kegg.reaction/R01068
- TPI: http://identifiers.org/kegg.reaction/R01015
- GAPDH: http://identifiers.org/kegg.reaction/R01061
- PGK: http://identifiers.org/kegg.reaction/R01512
- PGM: http://identifiers.org/kegg.reaction/R01518
- ENO: http://identifiers.org/kegg.reaction/R00658
- PYK: http://identifiers.org/kegg.reaction/R00200
- PDC: http://identifiers.org/kegg.reaction/R00636
- ADH: http://identifiers.org/kegg.reaction/R00754
- AK: http://identifiers.org/kegg.reaction/R00127
- G3PDH: http://identifiers.org/kegg.

In [3]:
print(inquirer.inquire("What is the kinetic law for PGI?"))

Thought: I need to use the Pritchard2002_glycolysis tool to find the kinetic law for PGI.
Action: Pritchard2002_glycolysis
Action Input: {'input': 'kinetic law for PGI'}
Observation: The kinetic law for PGI is cell*PGI_Vmax_4*(G6P/PGI_Kg6p_4 - F6P/(PGI_Kg6p_4*PGI_Keq_4))/(1 + G6P/PGI_Kg6p_4 + F6P/PGI_Kf6p_4 + G6P*F6P/(PGI_Kg6p_4*PGI_Kif6p_4) + F6P*G6P/(PGI_Kf6p_4*PGI_Kig6p_4)).
Thought: I have found the kinetic law for PGI using the Pritchard2002_glycolysis tool.
Answer: The kinetic law for PGI is cell*PGI_Vmax_4*(G6P/PGI_Kg6p_4 - F6P/(PGI_Kg6p_4*PGI_Keq_4))/(1 + G6P/PGI_Kg6p_4 + F6P/PGI_Kf6p_4 + G6P*F6P/(PGI_Kg6p_4*PGI_Kif6p_4) + F6P*G6P/(PGI_Kf6p_4*PGI_Kig6p_4)).
The kinetic law for PGI is cell*PGI_Vmax_4*(G6P/PGI_Kg6p_4 - F6P/(PGI_Kg6p_4*PGI_Keq_4))/(1 + G6P/PGI_Kg6p_4 + F6P/PGI_Kf6p_4 + G6P*F6P/(PGI_Kg6p_4*PGI_Kif6p_4) + F6P*G6P/(PGI_Kf6p_4*PGI_Kig6p_4)).
