Author: Akshay Chougule

Originally Created On: Nov 21, 2023

Credit:
- SSDTM for synthetic data
- GPT-4 from OpenAI
- [MLflow](https://mlflow.org/docs/latest/model-evaluation/index.html) for experiment tracking and model evaluation

Goal of the notebook:
- LangChain setup for a RAG-based system using a web page/web doc
- Learn experiment tracking with LLMs
- Learn to use SoTA LLMs to evaluate the responses of the LLM under evaluation (yes, an LLM evaluating an LLM)

CDISC SDTM data is tabular by definition, and traditional retrieval methods work well for simple analyses.

However, as data queries get more complex, such as
- joining multiple data tables
- performing filtering or aggregation operations
- plotting the data in various ways
- and eventually building models (survival, forecasting, classification, etc.)

the time and skills needed increase accordingly.

With current interfaces and analytical engines, a user has to spend a significant amount of time going from data to insights.

LLM-based inference engines offer certain advantages here:
- Shorter time from data to insights
- Generate code for further reproducibility
- Generate text explanation of each step (making these friendly for non-technical stakeholders as well)

In [1]:
import os
import pandas as pd
import mlflow




In [2]:
import sys
sys.path.insert(1, '/home/ubuntu/codebase/my_github/generative-ai-experiments/')
from Constants import OPENAI_API_KEY

In [6]:
# OPENAI_API_KEY was imported by name above, so use it directly (Constants itself is not bound)
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

In [7]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [8]:
# Let's load synthetic SDTM data for adverse events
loader = WebBaseLoader("https://raw.githubusercontent.com/AksChougule/gen-sdtm/main/testing/output/ae.csv")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    return_source_documents=True,
)

Generating the embeddings cost 0 cents, but your OpenAI account must still have a positive balance for the calls to succeed.
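The results table below shows that the whole CSV was indexed as a single chunk, so every question is answered from the same large blob of text. A minimal sketch of one alternative, assuming you fetch the raw CSV text yourself instead of going through `WebBaseLoader`: split it into groups of whole rows and repeat the header in every chunk, so each chunk stays self-describing. `chunk_csv_rows` and `rows_per_chunk` are hypothetical names, not LangChain API; the resulting strings could then be wrapped in LangChain `Document` objects before calling `Chroma.from_documents`.

```python
def chunk_csv_rows(csv_text: str, rows_per_chunk: int = 5) -> list[str]:
    """Split CSV text into chunks of whole rows, repeating the header line in each chunk.

    Hypothetical helper for illustration, not part of LangChain. It splits on
    line breaks, so it assumes no quoted field contains a newline.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    lines = [line for line in csv_text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        group = rows[start:start + rows_per_chunk]
        # Keep the header with every chunk so column names stay attached to the values.
        chunks.append("\n".join([header, *group]))
    return chunks


sample = """,PATID,AETERM,AESEV
0,P00001,Dizziness,Moderate
1,P00001,Rash,Mild
2,P00001,Fever,Severe
3,P00001,Nausea,Mild
4,P00001,Nausea,Mild
5,P00002,Rash,Severe
"""

for chunk in chunk_csv_rows(sample, rows_per_chunk=4):
    print(chunk)
    print("---")
```

One caveat: with several chunks, the retriever only passes its top-k matches to the LLM, so count-style questions over the whole table may still miss rows.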

In [9]:
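def model(input_df):
    # mlflow.evaluate calls this with the eval DataFrame. Return one RetrievalQA
    # output dict per question (keys "query", "result", "source_documents");
    # predictions="result" below tells MLflow which key holds the answer.
    return [qa(question) for question in input_df["questions"]]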

In [10]:
# Create an eval dataset

eval_df = pd.DataFrame(
    {
        "questions": [
            "How many patients (PATID) are there?",
            "How many servere instances of fever happened?",
            "Which PATIDs had severe instances of fever?",
            "How many patients died??",
        ],
    }
)

In [11]:
results = mlflow.evaluate(
    model,
    eval_df,
    model_type="question-answering",
    evaluators="default",
    predictions="result",
    #extra_metrics=[faithfulness_metric, relevance_metric, mlflow.metrics.latency()],
    evaluator_config={
        "col_mapping": {
            "inputs": "questions",
            "context": "source_documents",
        }
    },
)
print(results.metrics)

2023/11/21 12:34:26 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2023/11/21 12:34:26 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
2023/11/21 12:34:30 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2023/11/21 12:34:30 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2023/11/21 12:34:30 INFO mlflow.models.evaluation.defaul

{}
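The four "Number of requested results 4 is greater than number of elements in index 1" lines are logged once per question: the retriever asks the vector store for its top 4 matches, but the index holds only one chunk, so the request is reduced to 1. A toy sketch of that behaviour, with hand-made vectors standing in for OpenAI embeddings (an assumption for illustration; Chroma's real implementation differs). `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return up to k document ids ranked by cosine similarity to the query.

    Mirrors the log message: k is clamped to the number of stored vectors.
    """
    n_results = min(k, len(index))
    if n_results < k:
        print(f"Requested {k} results but index has {len(index)}; using n_results = {n_results}")
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:n_results]


# One chunk in the index, as in this run (toy vectors, not real embeddings).
single_chunk_index = {"ae.csv (all 24 rows)": [0.2, 0.9, 0.1]}
print(top_k([0.1, 1.0, 0.0], single_chunk_index))
```

With a single chunk, retrieval does no filtering at all: every question receives the full table as context, which is why `source_documents` is identical in every row of the results table.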


## Only 4 questions cost 9 cents with this context (6 patients, 24 rows, 10 columns)
### And two of the four answers were wrong...

In [13]:
pd.set_option('display.max_colwidth', 0)
results.tables["eval_results_table"]


| | questions | outputs | query | source_documents | token_count |
|---|---|---|---|---|---|
| 0 | How many patients (PATID) are there? | There are 6 patients (PATID). | How many patients (PATID) are there? | one `Document`: the full ae.csv chunk | 9 |
| 1 | How many servere instances of fever happened? | Two instances of severe fever happened. | How many servere instances of fever happened? | one `Document`: the full ae.csv chunk | 7 |
| 2 | Which PATIDs had severe instances of fever? | P00001, P00003, and P00005 had severe instances of fever. | Which PATIDs had severe instances of fever? | one `Document`: the full ae.csv chunk | 18 |
| 3 | How many patients died?? | Three patients died. | How many patients died?? | one `Document`: the full ae.csv chunk | 4 |

In every row, `source_documents` is the same single LangChain `Document` (metadata `{'source': 'https://raw.githubusercontent.com/AksChougule/gen-sdtm/main/testing/output/ae.csv'}`), whose `page_content` is the whole CSV:

| | PATID | AETERM | AESEV | AESTDTC | AEENDTC | AETOXGR | AEOUT | AEACT | AESER | AESDTH |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | P00001 | Dizziness | Moderate | 2023-07-23 | 2023-08-04 | Grade 2 | Recovered | Drug withdrawn | N | Y |
| 1 | P00001 | Rash | Mild | 2022-12-13 | 2022-12-14 | Grade 3 | Improving | Dose not changed | N | Y |
| 2 | P00001 | Fever | Severe | 2023-06-22 | 2023-07-09 | Grade 2 | Unchanged | Dose reduced | N | Y |
| 3 | P00001 | Nausea | Mild | 2023-09-10 | 2023-10-06 | Grade 1 | Recovered | Dose not changed | N | Y |
| 4 | P00001 | Nausea | Mild | 2023-02-21 | 2023-02-25 | Grade 3 | Unchanged | Dose reduced | Y | N |
| 5 | P00002 | Rash | Severe | 2023-01-17 | 2023-01-18 | Grade 1 | Recovered | Drug withdrawn | N | Y |
| 6 | P00002 | Headache | Severe | 2023-08-22 | 2023-09-19 | Grade 3 | Worsened | Dose not changed | N | Y |
| 7 | P00002 | Fatigue | Severe | 2023-01-16 | 2023-01-26 | Grade 4 | Unchanged | Dose reduced | N | N |
| 8 | P00002 | Fatigue | Moderate | 2023-06-25 | 2023-07-13 | Grade 2 | Unchanged | Dose reduced | N | Y |
| 9 | P00003 | Dizziness | Mild | 2023-01-07 | 2023-01-17 | Grade 3 | Death | Drug withdrawn | N | Y |
| 10 | P00003 | Fever | Severe | 2022-11-21 | 2022-12-06 | Grade 1 | Recovered | Drug withdrawn | N | Y |
| 11 | P00003 | Headache | Severe | 2022-12-23 | 2023-01-01 | Grade 3 | Worsened | Drug withdrawn | N | N |
| 12 | P00003 | Fatigue | Mild | 2022-12-25 | 2023-01-17 | Grade 4 | Death | Dose reduced | N | Y |
| 13 | P00003 | Fever | Mild | 2023-09-20 | 2023-10-14 | Grade 1 | Worsened | Drug withdrawn | Y | N |
| 14 | P00004 | Fatigue | Moderate | 2023-09-26 | 2023-10-25 | Grade 3 | Unchanged | Drug withdrawn | Y | Y |
| 15 | P00004 | Nausea | Severe | 2023-01-01 | 2023-01-19 | Grade 4 | Worsened | Drug withdrawn | Y | Y |
| 16 | P00005 | Fever | Severe | 2023-02-06 | 2023-02-14 | Grade 3 | Recovered | Dose reduced | N | Y |
| 17 | P00005 | Fever | Mild | 2023-08-27 | 2023-09-22 | Grade 3 | Unchanged | Dose reduced | Y | Y |
| 18 | P00005 | Fever | Severe | 2022-12-21 | 2023-01-01 | Grade 1 | Recovered | Drug withdrawn | Y | N |
| 19 | P00006 | Headache | Moderate | 2023-08-05 | 2023-08-28 | Grade 3 | Worsened | Dose reduced | N | N |
| 20 | P00006 | Nausea | Moderate | 2023-04-09 | 2023-05-02 | Grade 3 | Death | Dose reduced | N | N |
| 21 | P00006 | Rash | Severe | 2023-09-06 | 2023-09-20 | Grade 1 | Worsened | Dose reduced | Y | N |
| 22 | P00006 | Fever | Moderate | 2023-08-04 | 2023-08-19 | Grade 2 | Improving | Dose not changed | N | N |
| 23 | P00006 | Fatigue | Mild | 2023-03-11 | 2023-03-26 | Grade 3 | Death | Dose reduced | N | Y |


Let's look at a subset of this table (questions, outputs and token counts only):

In [15]:
results.tables["eval_results_table"][['questions','outputs','token_count']]


| | questions | outputs | token_count |
|---|---|---|---|
| 0 | How many patients (PATID) are there? | There are 6 patients (PATID). | 9 |
| 1 | How many servere instances of fever happened? | Two instances of severe fever happened. | 7 |
| 2 | Which PATIDs had severe instances of fever? | P00001, P00003, and P00005 had severe instances of fever. | 18 |
| 3 | How many patients died?? | Three patients died. | 4 |


In [19]:
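# The source table (the same CSV the RAG chain indexed).
# pd.read_csv is equivalent to read_table with delimiter=","; passing index_col=0
# would turn the file's leading index column into the DataFrame index
# instead of an "Unnamed: 0" column.
df = pd.read_csv("https://raw.githubusercontent.com/AksChougule/gen-sdtm/main/testing/output/ae.csv")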
df

| | Unnamed: 0 | PATID | AETERM | AESEV | AESTDTC | AEENDTC | AETOXGR | AEOUT | AEACT | AESER | AESDTH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | P00001 | Dizziness | Moderate | 2023-07-23 | 2023-08-04 | Grade 2 | Recovered | Drug withdrawn | N | Y |
| 1 | 1 | P00001 | Rash | Mild | 2022-12-13 | 2022-12-14 | Grade 3 | Improving | Dose not changed | N | Y |
| 2 | 2 | P00001 | Fever | Severe | 2023-06-22 | 2023-07-09 | Grade 2 | Unchanged | Dose reduced | N | Y |
| 3 | 3 | P00001 | Nausea | Mild | 2023-09-10 | 2023-10-06 | Grade 1 | Recovered | Dose not changed | N | Y |
| 4 | 4 | P00001 | Nausea | Mild | 2023-02-21 | 2023-02-25 | Grade 3 | Unchanged | Dose reduced | Y | N |
| 5 | 5 | P00002 | Rash | Severe | 2023-01-17 | 2023-01-18 | Grade 1 | Recovered | Drug withdrawn | N | Y |
| 6 | 6 | P00002 | Headache | Severe | 2023-08-22 | 2023-09-19 | Grade 3 | Worsened | Dose not changed | N | Y |
| 7 | 7 | P00002 | Fatigue | Severe | 2023-01-16 | 2023-01-26 | Grade 4 | Unchanged | Dose reduced | N | N |
| 8 | 8 | P00002 | Fatigue | Moderate | 2023-06-25 | 2023-07-13 | Grade 2 | Unchanged | Dose reduced | N | Y |
| 9 | 9 | P00003 | Dizziness | Mild | 2023-01-07 | 2023-01-17 | Grade 3 | Death | Drug withdrawn | N | Y |
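Because the answers can be checked against the table directly, here is a minimal sketch of deterministic ground truth for the four eval questions. It uses only the standard library `csv` module so it runs offline; the embedded rows are the PATID, AETERM, AESEV and AEOUT columns copied from the 24-row ae.csv chunk retrieved above. Treating `AEOUT == "Death"` as "the patient died" is an assumption: in this synthetic data the `AESDTH` flag is `Y` on at least one event for all six patients, so it does not separate anyone.

```python
import csv
import io

# PATID, AETERM, AESEV, AEOUT copied from the 24-row ae.csv chunk retrieved in this notebook.
AE_CSV = """PATID,AETERM,AESEV,AEOUT
P00001,Dizziness,Moderate,Recovered
P00001,Rash,Mild,Improving
P00001,Fever,Severe,Unchanged
P00001,Nausea,Mild,Recovered
P00001,Nausea,Mild,Unchanged
P00002,Rash,Severe,Recovered
P00002,Headache,Severe,Worsened
P00002,Fatigue,Severe,Unchanged
P00002,Fatigue,Moderate,Unchanged
P00003,Dizziness,Mild,Death
P00003,Fever,Severe,Recovered
P00003,Headache,Severe,Worsened
P00003,Fatigue,Mild,Death
P00003,Fever,Mild,Worsened
P00004,Fatigue,Moderate,Unchanged
P00004,Nausea,Severe,Worsened
P00005,Fever,Severe,Recovered
P00005,Fever,Mild,Unchanged
P00005,Fever,Severe,Recovered
P00006,Headache,Moderate,Worsened
P00006,Nausea,Moderate,Death
P00006,Rash,Severe,Worsened
P00006,Fever,Moderate,Improving
P00006,Fatigue,Mild,Death
"""

rows = list(csv.DictReader(io.StringIO(AE_CSV)))
severe_fever = [r for r in rows if r["AETERM"] == "Fever" and r["AESEV"] == "Severe"]

# Keys match the eval_df questions exactly, typos included.
ground_truth = {
    "How many patients (PATID) are there?": len({r["PATID"] for r in rows}),
    "How many servere instances of fever happened?": len(severe_fever),
    "Which PATIDs had severe instances of fever?": sorted({r["PATID"] for r in severe_fever}),
    # Assumption: a patient "died" if any of their events has AEOUT == "Death".
    "How many patients died??": len({r["PATID"] for r in rows if r["AEOUT"] == "Death"}),
}

for question, answer in ground_truth.items():
    print(f"{question} -> {answer}")
```

Against these counts, the run above got the patient count (6) and the list of PATIDs with severe fever right, and missed the other two: it reported 2 severe fever events instead of 4, and 3 deaths instead of 2.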


I have seen some discussion about using JSON to structure the input more appropriately.

I will resume this once I find a better source.
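A minimal sketch of the JSON idea mentioned above, assuming the goal is to make each row self-describing before embedding: convert every CSV row into a JSON object keyed by column name, one record per row (JSON Lines style). `csv_to_json_records` is a hypothetical helper, and whether this actually improves retrieval or answer quality on this data has not been tested here.

```python
import csv
import io
import json


def csv_to_json_records(csv_text: str) -> list[str]:
    """Turn each CSV row into a compact JSON object string keyed by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader names an empty header cell "" (here, the pandas index column written
    # to the file); extra fields beyond the header land under None. Drop both.
    return [
        json.dumps({k: v for k, v in row.items() if k}, separators=(",", ":"))
        for row in reader
    ]


sample = """,PATID,AETERM,AESEV,AEOUT
0,P00001,Dizziness,Moderate,Recovered
9,P00003,Dizziness,Mild,Death
"""

for record in csv_to_json_records(sample):
    print(record)
```

Each record repeats the column names, which costs more tokens per row than plain CSV but keeps every value labeled even when a chunk is retrieved on its own.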