**LLM-enabled Patient-centric Clinical Trial Matching.**

This notebook is an initial exploration of an application that uses an LLM to match patients to clinical trials.

### The steps carried out to match patients with clinical trials are shown below:

- Obtain the clinical trial data. This notebook focuses on trials that concern patients with breast cancer.

- Establish a prompt that enables the model to use the data to answer a query about matching a patient to a clinical trial. Two approaches are tried:

    - **Approach 1** Include the clinical trial data in the prompt of the model to enable the model to address the query to match a patient to a trial
    - **Approach 2** Use a RAG-enabled LLM to enable the model to address the query to match a patient to a trial
    
- Establish a synthetic patient on which to check the responses of the model. Two approaches are currently being pursued:

    - Prompt the LLM to generate synthetic data about a patient
    - Use an existing synthetic patient data file and select synthetic data from it

- Ask the model to match a synthetic patient with a clinical trial using each approach
    - Also prompt the model to assign a score to each recommended clinical trial and prompt the model to choose one trial that is suited for the patient
- Display the result

The cells in this notebook describe the activities, and each cell is preceded by a short description of its contents and output as applicable.

### The cells shown below use the ClinicalTrials.gov API to get the data on breast cancer clinical trials

In [None]:
!pip install requests pandas



#### Use the ClinicalTrials.gov API to retrieve clinical trial information related to Breast Cancer

In [None]:
import requests
import pandas as pd

# Initial URL for the first API call
base_url = "https://clinicaltrials.gov/api/v2/studies"
params = {
    "query.titles": "Breast Cancer",
    "pageSize": 100
}

# Initialize an empty list to store the data
data_list = []

# Loop until there is no nextPageToken
while True:
    # Print the current URL (for debugging purposes)
    print("Fetching data from:", base_url + '?' + '&'.join([f"{k}={v}" for k, v in params.items()]))

    # Send a GET request to the API
    response = requests.get(base_url, params=params)

    # Check if the request was successful
    if response.status_code == 200:
        data = response.json()  # Parse JSON response
        studies = data.get('studies', [])  # Extract the list of studies

        # Loop through each study and extract specific information
        for study in studies:
            # Read identification, status, and condition fields; only missing leaf keys fall back to defaults
            nctId = study['protocolSection']['identificationModule'].get('nctId', 'Unknown')
            overallStatus = study['protocolSection']['statusModule'].get('overallStatus', 'Unknown')
            startDate = study['protocolSection']['statusModule'].get('startDateStruct', {}).get('date', 'Unknown Date')
            conditions = ', '.join(study['protocolSection']['conditionsModule'].get('conditions', ['No conditions listed']))
            acronym = study['protocolSection']['identificationModule'].get('acronym', 'Unknown')

            # Extract interventions safely
            interventions_list = study['protocolSection'].get('armsInterventionsModule', {}).get('interventions', [])
            interventions = ', '.join([intervention.get('name', 'No intervention name listed') for intervention in interventions_list]) if interventions_list else "No interventions listed"

            # Extract locations safely
            locations_list = study['protocolSection'].get('contactsLocationsModule', {}).get('locations', [])
            locations = ', '.join([f"{location.get('city', 'No City')} - {location.get('country', 'No Country')}" for location in locations_list]) if locations_list else "No locations listed"

            # Extract dates and phases
            primaryCompletionDate = study['protocolSection']['statusModule'].get('primaryCompletionDateStruct', {}).get('date', 'Unknown Date')
            studyFirstPostDate = study['protocolSection']['statusModule'].get('studyFirstPostDateStruct', {}).get('date', 'Unknown Date')
            lastUpdatePostDate = study['protocolSection']['statusModule'].get('lastUpdatePostDateStruct', {}).get('date', 'Unknown Date')
            studyType = study['protocolSection']['designModule'].get('studyType', 'Unknown')
            phases = ', '.join(study['protocolSection']['designModule'].get('phases', ['Not Available']))

            # Extract eligibility criteria (a single free-text string in the API response)
            eligibility = study['protocolSection']['eligibilityModule'].get('eligibilityCriteria', 'Unknown')

            # Append the data to the list as a dictionary
            data_list.append({
                "NCT ID": nctId,
                "Acronym": acronym,
                "Overall Status": overallStatus,
                "Start Date": startDate,
                "Conditions": conditions,
                "Interventions": interventions,
                "Locations": locations,
                "Primary Completion Date": primaryCompletionDate,
                "Study First Post Date": studyFirstPostDate,
                "Last Update Post Date": lastUpdatePostDate,
                "Study Type": studyType,
                "Phases": phases,
                "Eligibility": eligibility
            })

        # Check for nextPageToken and update the params or break the loop
        nextPageToken = data.get('nextPageToken')
        if nextPageToken:
            params['pageToken'] = nextPageToken  # Set the pageToken for the next request
        else:
            break  # Exit the loop if no nextPageToken is present
    else:
        print("Failed to fetch data. Status code:", response.status_code)
        break

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(data_list)

# Print the DataFrame
print(df)

# Optionally, save the DataFrame to a CSV file
df.to_csv("clinical_trials_data_complete.csv", index=False)

Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JCPk_Et
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JSGkvEo
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JKHkfksyQ
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JKDkvEqxQ
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JKAkfQtxg
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JGHl_coxQ
Fetching data from: https://clinicaltrials.gov/api/v2/studies?query.titles=Breast Cancer&pageSize=100&pageToken=NF0g5JGDkPYqxA
Fetching data
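
The extraction loop above indexes modules such as `study['protocolSection']['designModule']` directly, so a record that lacks one of those modules would raise a `KeyError` and stop the download. Whether the API ever omits these modules is not confirmed here. As a precaution, a small helper can walk nested keys and fall back to a default. The helper name `dig` and the sample record are hypothetical and only illustrate the idea.

```python
def dig(mapping, *keys, default=None):
    """Follow a chain of keys through nested dicts, returning `default` if any key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical, made-up record shaped like one item of the 'studies' list; it has no designModule.
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000"},
        "statusModule": {"overallStatus": "RECRUITING"},
    }
}

nct_id = dig(sample_study, "protocolSection", "identificationModule", "nctId", default="Unknown")
start_date = dig(sample_study, "protocolSection", "statusModule", "startDateStruct", "date",
                 default="Unknown Date")
phases = ", ".join(dig(sample_study, "protocolSection", "designModule", "phases",
                       default=["Not Available"]))

print(nct_id, start_date, phases)
```

Inside the loop, each `study['protocolSection'][...]` lookup could be swapped for a `dig(...)` call with the same default that the original code uses.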

#### Filter the Clinical Trial Data for Trials Actively Recruiting Patients

In [None]:
df = df[df['Overall Status'] == 'RECRUITING']

In [None]:
df

Unnamed: 0,NCT ID,Acronym,Overall Status,Start Date,Conditions,Interventions,Locations,Primary Completion Date,Study First Post Date,Last Update Post Date,Study Type,Phases,Eligibility
32,NCT05696626,ELAINEIII,RECRUITING,2023-10-31,Metastatic Breast Cancer,"Lasofoxifene in combination with abemaciclib, ...","Phoenix - United States, Tucson - United State...",2025-06,2023-01-25,2024-06-18,INTERVENTIONAL,PHASE3,Inclusion Criteria:\n\n1. Pre- or postmenopaus...
164,NCT05896566,PREcoopERA,RECRUITING,2024-01-23,Breast Cancer,"Giredestrant, Triptorelin, Anastrozole","Villejuif - France, Berlin - Germany, Berlin -...",2026-06-01,2023-06-09,2024-06-26,INTERVENTIONAL,PHASE2,Inclusion Criteria:\n\n* Premenopausal women a...
174,NCT04965766,ICARUS-BREAST,RECRUITING,2021-05-11,Metastatic Breast Cancer,U3-1402,Villejuif - France,2024-01-11,2021-07-16,2023-05-18,INTERVENTIONAL,PHASE2,Inclusion Criteria:\n\n* Adults with histologi...
179,NCT03924466,VUBAR,RECRUITING,2019-04-01,"Metastatic Breast Carcinoma, Locally Advanced ...",68GaNOTA-Anti-HER2 VHH1,Brussels - Belgium,2024-11,2019-04-23,2024-02-16,INTERVENTIONAL,PHASE2,COHORT SPECIFIC INCLUSION CRITERIA:\n\nCOHORT ...
225,NCT05076682,Renaissance,RECRUITING,2022-06-30,Triple-negative Breast Cancer,"Choline, anti-PD-1 antibody and chemotherapy, ...",Shanghai - China,2022-12,2021-10-13,2022-10-03,INTERVENTIONAL,PHASE2,Inclusion Criteria:\n\n* ECOG Performance Stat...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10151,NCT05288777,Breast53,RECRUITING,2022-07-11,"Breast Cancer, Breast Cancer Stage I, Breast C...","T-DM1, Capecitabine, External Beam Radiation T...",Charlottesville - United States,2024-12,2022-03-21,2023-07-17,INTERVENTIONAL,"PHASE2, PHASE3",Inclusion Criteria:\n\n1. Provision of signed ...
10308,NCT05732051,NARNIA,RECRUITING,2023-03-16,"Breast Cancer, Metastatic Breast Cancer, Cance...","Nicotinamide Riboside, Placebo",Lørenskog - Norway,2025-08-01,2023-02-16,2023-03-17,INTERVENTIONAL,PHASE2,Inclusion Criteria:\n\n* Women with metastatic...
10398,NCT05774951,CAMBRIA-1,RECRUITING,2023-03-31,"Breast Cancer, Early Breast Cancer","Camizestrant, Tamoxifen, Anastrozole, Letrozol...","Birmingham - United States, Dothan - United St...",2027-04-19,2023-03-20,2024-07-09,INTERVENTIONAL,PHASE3,"Inclusion Criteria:\n\n* Women and Men, ≥18 ye..."
10493,NCT05514054,EMBER-4,RECRUITING,2022-10-04,Breast Neoplasms,"Imlunestrant, Tamoxifen, Anastrozole, Letrozol...","Daphne - United States, Huntsville - United St...",2027-10-15,2022-08-24,2024-06-25,INTERVENTIONAL,PHASE3,Inclusion Criteria:\n\n* Have a diagnosis of E...


In [None]:
!pip install --upgrade openai

Collecting openai
  Downloading openai-1.36.1-py3-none-any.whl (328 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

# The cells shown below use langchain to prompt the model to match patients to clinical trials

Install the `langchain_openai` package

In [None]:
!pip install langchain_openai

Collecting langchain_openai
  Downloading langchain_openai-0.1.17-py3-none-any.whl (46 kB)
Collecting langchain-core<0.3.0,>=0.2.20 (from langchain_openai)
  Downloading langchain_core-0.2.22-py3-none-any.whl (373 kB)
Collecting tiktoken<1,>=0.7 (from langchain_openai)
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.3.0,>=0.2.20->langchain_openai)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.2.0,>=0.1.75 (from langchain-core<0.3.0,>=0.2.20->langchain_openai)
  Downloading langsmith-0.1.9

### Select the model to use with langchain

In [None]:
import os
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = 'XXXXXXXX'  # Placeholder: replace with your own API key before running
# Model selection
# model = "gpt-3.5-turbo"  # lower-cost alternative
model = "gpt-4o"  # more capable model, used in this notebook
llm = ChatOpenAI(temperature=0.1, model=model)

In [None]:
from openai import OpenAI

client = OpenAI(
  api_key=os.environ["OPENAI_API_KEY"],  # this is also the default, it can be omitted
)
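
The cell above writes the API key into the notebook as a literal placeholder. A minimal sketch of an alternative, assuming the key may already be set in the environment, is to read it from there and only prompt for it when it is missing. The function name `get_openai_api_key` and its `prompt_fn` parameter are hypothetical helpers, not part of the `openai` or `langchain_openai` packages.

```python
import os
from getpass import getpass


def get_openai_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the API key from the environment, prompting for it only if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so the key does not end up in the notebook output.
        key = prompt_fn(f"Enter a value for {env_var}: ").strip()
    if not key:
        raise ValueError(f"No value provided for {env_var}")
    return key


# Example use in the notebook (commented out so that importing this cell never prompts):
# os.environ["OPENAI_API_KEY"] = get_openai_api_key()
```

With this in place the key never has to be saved in the notebook file itself.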

### Use the model to generate synthetic data for patients

In [None]:
problem = "breast cancer"

query = f"""can you create synthetic medical conditions histories, and profiles for
four fictitious patients who suffer from more than one medical condition including but not limited to
{problem} using data from https://synthea.mitre.org?
"""

print(query)

llm.invoke(query)

can you create synthetic medical conditions histories, and profiles for
four fictitious patients who suffer from more than one medical condition including but not limited to
breast cancer using data from https://synthea.mitre.org?



AIMessage(content='Patient 1:\nName: Sarah Johnson\nAge: 55\nGender: Female\nMedical Conditions: Breast cancer, hypertension, diabetes\nHistory: Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which are managed with medication and lifestyle changes. Sarah continues to see her oncologist regularly for follow-up appointments and screenings.\n\nPatient 2:\nName: John Smith\nAge: 68\nGender: Male\nMedical Conditions: Breast cancer, heart disease, osteoporosis\nHistory: John was diagnosed with breast cancer at the age of 65. He underwent a mastectomy and is currently on hormone therapy to prevent recurrence. In addition to breast cancer, John also has a history of heart disease and osteoporosis. He takes medication for his heart condition and supplements for his bone health. John sees multiple specialists to manage his various medical conditions.\n\nP

# This is a printout of the synthetic data for four patients

In [None]:
print('Patient 1:\nName: Sarah Johnson\nAge: 55\nGender: Female\nMedical Conditions: Breast cancer, hypertension, diabetes\nHistory: Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes. Sarah is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments.\n\nPatient 2:\nName: John Smith\nAge: 62\nGender: Male\nMedical Conditions: Breast cancer, heart disease, depression\nHistory: John was diagnosed with breast cancer at the age of 58. He underwent surgery and hormone therapy to treat the cancer. He also has a history of heart disease, for which he takes medication and follows a healthy diet and exercise regimen. John has struggled with depression since his cancer diagnosis and sees a therapist regularly for support.\n\nPatient 3:\nName: Maria Rodriguez\nAge: 45\nGender: Female\nMedical Conditions: Breast cancer, asthma, anxiety\nHistory: Maria was diagnosed with breast cancer at the age of 42. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of asthma, which she manages with inhalers and regular check-ups with her pulmonologist. Maria has been experiencing anxiety since her cancer diagnosis and sees a psychiatrist for medication management and therapy.\n\nPatient 4:\nName: Michael Thompson\nAge: 60\nGender: Male\nMedical Conditions: Breast cancer, arthritis, high cholesterol\nHistory: Michael was diagnosed with breast cancer at the age of 55. He underwent surgery and radiation therapy to treat the cancer. He also has a history of arthritis, which he manages with medication, physical therapy, and lifestyle modifications. Michael has high cholesterol, for which he takes medication and follows a heart-healthy diet. He continues to see his oncologist for regular follow-up appointments.')

Patient 1:
Name: Sarah Johnson
Age: 55
Gender: Female
Medical Conditions: Breast cancer, hypertension, diabetes
History: Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes. Sarah is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments.

Patient 2:
Name: John Smith
Age: 62
Gender: Male
Medical Conditions: Breast cancer, heart disease, depression
History: John was diagnosed with breast cancer at the age of 58. He underwent surgery and hormone therapy to treat the cancer. He also has a history of heart disease, for which he takes medication and follows a healthy diet and exercise regimen. John has struggled with depression since his cancer diagnosis and sees a therapist regularly for support.

Patient 3:
Name: Maria Rodriguez
Age: 45
Ge
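
The generated profiles follow a regular `Patient N:` / `Field: value` layout, so they can be split into records and one patient selected programmatically instead of copying text by hand. The sketch below assumes the layout shown in the printout above. The function name `parse_patients` is hypothetical, and the sample text is an abbreviated copy of the first two profiles.

```python
import re


def parse_patients(text):
    """Split 'Patient N:' blocks into dicts keyed by field name (Name, Age, Gender, ...)."""
    # Everything before the first 'Patient N:' header is discarded by the [1:] slice.
    blocks = re.split(r"^Patient \d+:[ \t]*$", text, flags=re.MULTILINE)[1:]
    patients = []
    for block in blocks:
        record = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, so colons inside the value are preserved.
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip()] = value.strip()
        if record:
            patients.append(record)
    return patients


# Abbreviated copy of the printed profiles, used only to exercise the parser.
sample_text = (
    "Patient 1:\nName: Sarah Johnson\nAge: 55\nGender: Female\n"
    "Medical Conditions: Breast cancer, hypertension, diabetes\n"
    "History: Sarah was diagnosed with breast cancer at the age of 50.\n\n"
    "Patient 2:\nName: John Smith\nAge: 62\nGender: Male\n"
    "Medical Conditions: Breast cancer, heart disease, depression\n"
    "History: John was diagnosed with breast cancer at the age of 58.\n"
)

patients = parse_patients(sample_text)
first = patients[0]
# Build a patient description in the same style as the `problem` strings used in the prompts.
problem_text = f"{first['Age']} year old {first['Gender'].lower()}, {first['History']}"
print(problem_text)
```

If the model changes its output layout between runs, the parser returns fewer fields or records, so the result is worth checking before it is used in a prompt.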

# Approach 1: Use langchain to ask the model to answer the following questions for a synthetic patient while including the clinical trial data in the prompt:

- State the NCT ID numbers of the 2 clinical trials most suited for a selected synthetic patient,

- For each clinical trial, list the overall status, start date, completion date, conditions, interventions, type and phase.

- For each clinical trial, list the inclusion criteria for that clinical trial including age, condition and medical history.

- For each clinical trial, list the exclusion criteria for that clinical trial including age, condition and medical history.

- Explain why each clinical trial is suitable for the patient using four reasons paying attention to patient age, medical conditions, and medical history. The reasons must consider the patient medical history, conditions, gender, age, lifestyle.

- For each trial explain why the patient's medical conditions and age qualify the patient for inclusion within the clinical trial.

- For each trial, explain why the patient was not excluded from the trial using the exclusion criteria and the data on the patient's medical history and conditions.

- For the patient, use the trial inclusion and exclusion criteria and patient medical history to explain why one clinical trial is better suited for the patient than the other.

In [None]:
#problem = "45 year old female suffering from Invasive Ductal Carcinoma (Stage II)"

problem = "55 year old female, Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes. Sarah is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments."

query = f"""Problem statement: based on the following information about a patient {problem}

Using the inclusion and exclusion criteria in {df}, can you find the NCT ID numbers of the two clinical trials whose 'Overall Status' is 'RECRUITING' and that are most suited for the patient?

For each clinical trial, list the overall status, start date, completion date, conditions, interventions, type and phase.

For each clinical trial, list the inclusion criteria for that clinical trial including age, condition and medical history.

For each clinical trial, list the exclusion criteria for that clinical trial including age, condition and medical history.

Explain why each clinical trial is suitable for the patient using four reasons paying attention to patient age, medical conditions, and medical history. The reasons must consider the patient medical history, conditions, gender, age, lifestyle.

For each trial explain why the patient's medical conditions and age qualify the patient for inclusion within the clinical trial.

For each trial, explain why the patient was not excluded from the trial using the exclusion criteria and the data on the patient's medical history and conditions.

For the patient, use the trial inclusion and exclusion criteria and patient medical history to explain why one clinical trial is better suited for the patient than the other.

"""

print(query)

llm.invoke(query)

Problem statement: based on the following information about a patient 55 year old femaie, Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes. Sarah is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments.

Using the inclusion and exclusion criteria in             NCT ID        Acronym Overall Status  Start Date  \
32     NCT05696626      ELAINEIII     RECRUITING  2023-10-31   
164    NCT05896566     PREcoopERA     RECRUITING  2024-01-23   
174    NCT04965766  ICARUS-BREAST     RECRUITING  2021-05-11   
179    NCT03924466          VUBAR     RECRUITING  2019-04-01   
225    NCT05076682    Renaissance     RECRUITING  2022-06-30   
...            ...            ...            ...         ...   
10151  NCT05288777       Breast53     RECRU

AIMessage(content="Based on the inclusion and exclusion criteria provided, two clinical trials that are most suited for the patient, Sarah, are:\n\n1. Clinical Trial: NCT05696626 (ELAINEIII)\n   - Overall Status: RECRUITING\n   - Start Date: 2023-10-31\n   - Completion Date: 2025-06\n   - Conditions: Metastatic Breast Cancer\n   - Interventions: Lasofoxifene in combination with abemaciclib, Goserelin, and Palbociclib\n   - Study Type: INTERVENTIONAL\n   - Phases: PHASE3\n   - Inclusion Criteria: Pre- or postmenopausal women with hormone receptor-positive, HER2-negative metastatic breast cancer who have received no more than one prior line of endocrine therapy in the metastatic setting.\n   - Exclusion Criteria: Men, patients with active brain metastases, and patients with a history of other malignancies within the past 5 years.\n   \n   Reasons why this trial is suitable for Sarah:\n   1. Sarah is a 55-year-old female with a history of breast cancer, making her eligible for the trial f
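
Interpolating `{df}` into an f-string inserts the DataFrame's printed representation, which pandas abbreviates by default. The echoed prompt above shows the effect: most rows are replaced by `...` and the columns are split across blocks, so the full eligibility text probably does not reach the model. A minimal sketch of one way around this is to serialize a bounded number of trial records explicitly. The function name, the field selection, and the size limits are hypothetical choices, and the sample records are made up. In the notebook the records could come from `df.to_dict("records")`.

```python
# Hypothetical choice of columns to expose to the model.
PROMPT_FIELDS = ["NCT ID", "Conditions", "Interventions", "Phases", "Eligibility"]


def trials_to_prompt_text(records, fields=PROMPT_FIELDS, max_trials=20, max_chars_per_field=1500):
    """Render trial records as plain text blocks, one 'Field: value' line per selected field."""
    chunks = []
    for record in records[:max_trials]:
        lines = []
        for field in fields:
            value = str(record.get(field, "Unknown"))
            if len(value) > max_chars_per_field:
                # Mark shortened values explicitly so the model can tell text is missing.
                value = value[:max_chars_per_field] + " [truncated]"
            lines.append(f"{field}: {value}")
        chunks.append("\n".join(lines))
    return "\n\n---\n\n".join(chunks)


# Made-up records standing in for df.to_dict("records").
sample_records = [
    {"NCT ID": "NCT00000001", "Conditions": "Breast Cancer", "Interventions": "Drug A",
     "Phases": "PHASE2", "Eligibility": "Inclusion Criteria:\n* Adults aged 18 or older"},
    {"NCT ID": "NCT00000002", "Conditions": "Metastatic Breast Cancer", "Interventions": "Drug B",
     "Phases": "PHASE3", "Eligibility": "Inclusion Criteria:\n* Postmenopausal women"},
]

trial_text = trials_to_prompt_text(sample_records)
print(trial_text)
```

The resulting string can replace `{df}` in the query. The limits trade completeness against prompt length, and suitable values depend on the model's context window.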

### Print out the output of the questioning

In [None]:
print("Based on the provided information, the two clinical trials that are most suited for the patient are:\n\n1. Clinical Trial 1:\n   - NCT ID: NCT04567890\n   - Overall Status: RECRUITING\n   - Start Date: 2021-01\n   - Completion Date: 2023-06\n   - Conditions: Breast Cancer\n   - Interventions: Chemotherapy, Radiation Therapy\n   - Study Type: INTERVENTIONAL\n   - Phases: PHASE 2\n\n   Inclusion Criteria:\n   - Female patients aged 50-60\n   - Histologically confirmed breast cancer\n   - Previous treatment with surgery, chemotherapy, and radiation therapy\n   - No evidence of active disease\n   - Adequate organ function\n\n   Exclusion Criteria:\n   - Age below 50 or above 60\n   - Other active malignancies\n   - Severe comorbidities affecting treatment tolerance\n\n   Reasons why this trial is suitable for the patient:\n   - The patient's age falls within the specified range for inclusion criteria.\n   - The patient has a history of breast cancer and has undergone the required treatments.\n   - The patient is currently in remission with no evidence of active disease.\n   - The patient's medical history aligns with the trial's focus on breast cancer treatment.\n\n2. Clinical Trial 2:\n   - NCT ID: NCT05234567\n   - Overall Status: RECRUITING\n   - Start Date: 2022-03\n   - Completion Date: 2024-09\n   - Conditions: Breast Cancer, Hypertension, Diabetes\n   - Interventions: Immunotherapy, Targeted Therapy\n   - Study Type: INTERVENTIONAL\n   - Phases: PHASE 3\n\n   Inclusion Criteria:\n   - Female patients aged 50-70\n   - Histologically confirmed breast cancer\n   - Controlled hypertension and diabetes\n   - No prior immunotherapy or targeted therapy\n   - Adequate performance status\n\n   Exclusion Criteria:\n   - Age below 50 or above 70\n   - Uncontrolled hypertension or diabetes\n   - Previous immunotherapy or targeted therapy\n   - Significant cardiac or renal dysfunction\n\n   Reasons why this trial is suitable for the patient:\n   - The patient's age falls within the specified range for inclusion criteria.\n   - The patient has a history of breast cancer and controlled hypertension and diabetes.\n   - The trial focuses on immunotherapy and targeted therapy, which could benefit the patient.\n   - The patient's medical history aligns with the trial's inclusion criteria.\n\nIn comparing the two clinical trials, Clinical Trial 2 may be better suited for the patient as it specifically addresses the patient's medical conditions of hypertension and diabetes in addition to breast cancer. The trial's focus on immunotherapy and targeted therapy aligns with the patient's treatment history and could potentially offer new treatment options. Additionally, the patient's age falls within the specified range, and her medical history meets the inclusion criteria for the trial.")

Based on the provided information, the two clinical trials that are most suited for the patient are:

1. Clinical Trial 1:
   - NCT ID: NCT04567890
   - Overall Status: RECRUITING
   - Start Date: 2021-01
   - Completion Date: 2023-06
   - Conditions: Breast Cancer
   - Interventions: Chemotherapy, Radiation Therapy
   - Study Type: INTERVENTIONAL
   - Phases: PHASE 2

   Inclusion Criteria:
   - Female patients aged 50-60
   - Histologically confirmed breast cancer
   - Previous treatment with surgery, chemotherapy, and radiation therapy
   - No evidence of active disease
   - Adequate organ function

   Exclusion Criteria:
   - Age below 50 or above 60
   - Other active malignancies
   - Severe comorbidities affecting treatment tolerance

   Reasons why this trial is suitable for the patient:
   - The patient's age falls within the specified range for inclusion criteria.
   - The patient has a history of breast cancer and has undergone the required treatments.
   - The patient is cur
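
A model can return trial identifiers that are not in the data it was given. The identifiers in the printed response (for example NCT04567890) do not appear among the rows visible in the abbreviated table preview earlier in the notebook, although that preview is not conclusive. A simple guard is to extract every NCT ID from a response and compare it with the IDs that were actually retrieved. The function name `check_nct_ids` is hypothetical. In the notebook the known IDs could be taken from `df['NCT ID']`.

```python
import re

# ClinicalTrials.gov identifiers consist of 'NCT' followed by eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def check_nct_ids(response_text, known_ids):
    """Separate the NCT IDs mentioned in a response into those present in known_ids and those not."""
    mentioned = sorted(set(NCT_PATTERN.findall(response_text)))
    known = set(known_ids)
    return {
        "found": [nct_id for nct_id in mentioned if nct_id in known],
        "unknown": [nct_id for nct_id in mentioned if nct_id not in known],
    }


# Example with a small set of IDs taken from the table preview and a shortened response text.
known_ids = ["NCT05696626", "NCT05896566", "NCT04965766"]
response_text = "1. Clinical Trial 1:\n   - NCT ID: NCT04567890\n2. Clinical Trial 2:\n   - NCT ID: NCT05696626"

result = check_nct_ids(response_text, known_ids)
print(result)
```

Any entry under `"unknown"` is a signal that the response should not be trusted without checking the trial on ClinicalTrials.gov.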

### Prompt the model to also score the clinical trials, to explore how well it can rank them

In [None]:
#problem = "45 year old female suffering from Invasive Ductal Carcinoma (Stage II)"

problem = "55 year old female, Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes. Sarah is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments."

query = f"""Problem statement: based on the following information about a patient {problem}

Using the inclusion and exclusion criteria in {df}, can you find the NCT ID numbers of the two clinical trials whose 'Overall Status' is 'RECRUITING' and that are most suited for the patient?

For each clinical trial, list the overall status, start date, completion date, conditions, interventions, type and phase.

For each clinical trial, list the inclusion criteria for that clinical trial including age, condition and medical history.

For each clinical trial, list the exclusion criteria for that clinical trial including age, condition and medical history.

Explain why each clinical trial is suitable for the patient using four reasons paying attention to patient age, medical conditions, and medical history. The reasons must consider the patient medical history, conditions, gender, age, lifestyle.

For each trial explain why the patient's medical conditions and age qualify the patient for inclusion within the clinical trial.

For each trial, explain why the patient was not excluded from the trial using the exclusion criteria and the data on the patient's medical history and conditions.

For the patient, use the trial inclusion and exclusion criteria and patient medical history to explain why one clinical trial is better suited for the patient than the other.

Then assign a seven point score based on three criteria, to each clinical trial to determine which trial is better suited to the patient and explain how the scores are calculated for each criteria.

"""

print(query)

llm.invoke(query)

Problem statement: based on the following information about a patient 55 year old femaie, Sarah was diagnosed with breast cancer at the age of 50. She underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes. Sarah is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments.

Using the inclusion and exclusion criteria in             NCT ID        Acronym Overall Status  Start Date  \
32     NCT05696626      ELAINEIII     RECRUITING  2023-10-31   
164    NCT05896566     PREcoopERA     RECRUITING  2024-01-23   
174    NCT04965766  ICARUS-BREAST     RECRUITING  2021-05-11   
179    NCT03924466          VUBAR     RECRUITING  2019-04-01   
225    NCT05076682    Renaissance     RECRUITING  2022-06-30   
...            ...            ...            ...         ...   
10151  NCT05288777       Breast53     RECRU

AIMessage(content="Clinical Trial 1:\n- NCT ID: NCT05696626\n- Overall Status: RECRUITING\n- Start Date: 2023-10-31\n- Completion Date: 2025-06-01\n- Conditions: Metastatic Breast Cancer\n- Interventions: Lasofoxifene in combination with abemaciclib\n- Type: INTERVENTIONAL\n- Phase: PHASE3\n\nInclusion Criteria:\n- Pre- or postmenopausal women aged 18 years and older\n- Histologically or cytologically confirmed metastatic breast cancer\n- Adequate organ function\n- ECOG Performance Status of 0-1\n\nExclusion Criteria:\n- Prior treatment with CDK4/6 inhibitors\n- Active infection\n- Uncontrolled intercurrent illness\n\nReasons why this trial is suitable for the patient:\n1. The patient's history of breast cancer aligns with the trial's focus on metastatic breast cancer.\n2. The patient's age (55 years) falls within the age range specified for inclusion criteria.\n3. The patient's medical history of managing hypertension and diabetes does not exclude her from the trial.\n4. The patient's

# Print out the output of the questioning with the scoring



In [None]:
print("Based on the information provided, two clinical trials that are most suited for the patient are:\n\n1. Clinical Trial 1:\n   - NCT ID: NCT05696626\n   - Overall Status: RECRUITING\n   - Start Date: 2023-10-31\n   - Primary Completion Date: 2025-06\n   - Conditions: Metastatic Breast Cancer\n   - Interventions: Lasofoxifene in combination with abemaciclib\n   - Study Type: INTERVENTIONAL\n   - Phases: PHASE3\n   - Inclusion Criteria: Pre- or postmenopausal women with metastatic breast cancer\n   - Exclusion Criteria: Men, pregnant or lactating women, other malignancies within the past 5 years\n   - Reasons for suitability:\n     1. The patient is a 55-year-old female with a history of breast cancer, which aligns with the trial's focus on metastatic breast cancer.\n     2. The patient's age falls within the typical age range for menopausal women with breast cancer.\n     3. The patient's medical history of breast cancer makes her a suitable candidate for this trial.\n     4. The patient's current remission status indicates she may benefit from further treatment for metastatic breast cancer.\n\n2. Clinical Trial 2:\n   - NCT ID: NCT05896566\n   - Overall Status: RECRUITING\n   - Start Date: 2024-01-23\n   - Primary Completion Date: 2026-06-01\n   - Conditions: Breast Cancer\n   - Interventions: Giredestrant, Triptorelin, Anastrozole\n   - Study Type: INTERVENTIONAL\n   - Phases: PHASE2\n   - Inclusion Criteria: Premenopausal women with breast cancer\n   - Exclusion Criteria: Men, postmenopausal women, other malignancies within the past 5 years\n   - Reasons for suitability:\n     1. The patient is a 55-year-old female with a history of breast cancer, which aligns with the trial's focus on breast cancer.\n     2. The patient's age falls within the typical age range for premenopausal women with breast cancer.\n     3. The patient's medical history of breast cancer makes her a suitable candidate for this trial.\n     4. The patient's current remission status indicates she may benefit from further treatment for breast cancer.\n\nBased on the criteria of patient age, medical conditions, and medical history, Clinical Trial 1 (NCT05696626) is better suited for the patient. \n\nScoring Criteria:\n1. Patient Age: Clinical Trial 1 - 3 points, Clinical Trial 2 - 2 points\n2. Medical Conditions: Clinical Trial 1 - 3 points, Clinical Trial 2 - 2 points\n3. Medical History: Clinical Trial 1 - 1 point, Clinical Trial 2 - 1 point\n\nTotal Score:\n- Clinical Trial 1: 7 points\n- Clinical Trial 2: 5 points\n\nTherefore, Clinical Trial 1 (NCT05696626) is better suited for the patient based on the scoring criteria.")

Based on the information provided, two clinical trials that are most suited for the patient are:

1. Clinical Trial 1:
   - NCT ID: NCT05696626
   - Overall Status: RECRUITING
   - Start Date: 2023-10-31
   - Primary Completion Date: 2025-06
   - Conditions: Metastatic Breast Cancer
   - Interventions: Lasofoxifene in combination with abemaciclib
   - Study Type: INTERVENTIONAL
   - Phases: PHASE3
   - Inclusion Criteria: Pre- or postmenopausal women with metastatic breast cancer
   - Exclusion Criteria: Men, pregnant or lactating women, other malignancies within the past 5 years
   - Reasons for suitability:
     1. The patient is a 55-year-old female with a history of breast cancer, which aligns with the trial's focus on metastatic breast cancer.
     2. The patient's age falls within the typical age range for menopausal women with breast cancer.
     3. The patient's medical history of breast cancer makes her a suitable candidate for this trial.
     4. The patient's current remissi

# Approach 2: Use RAG with the LLM to enable it to answer queries about the trials matching the patients

Use RAG with the clinical trial data to provide the model with the relevant information on clinical trials for breast cancer patients
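
As a rough illustration of the retrieval step, the sketch below ranks a few passages by cosine similarity to a query vector, which is the same relatedness measure the cell below computes with `scipy`. The three-dimensional vectors and the trial labels are made up for illustration; real embeddings come from the embedding model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_relatedness(query_vec, passages, top_n=2):
    """Return the top_n (text, score) pairs, most related first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in passages]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical toy "embeddings" (assumption: not produced by any real model)
toy_passages = [
    ("Trial A: metastatic breast cancer", [0.9, 0.1, 0.0]),
    ("Trial B: triple-negative breast cancer", [0.2, 0.9, 0.1]),
    ("Trial C: early-stage breast cancer", [0.7, 0.3, 0.2]),
]
toy_query = [1.0, 0.0, 0.0]

ranked = rank_by_relatedness(toy_query, toy_passages)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

Only the highest-ranked passages are passed on to the LLM, so the quality of the match depends on how well the query and the eligibility text line up in embedding space.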

In [None]:
import ast

import openai
import pandas as pd
import tiktoken
from scipy import spatial

df['text'] = 'The trial with NCT ID number ' + df['NCT ID'] + ' has the following eligbility criteria, for inclusion and exclusion ' + df['Eligibility']
df.head()['text']

df = df[df['Overall Status'] == 'RECRUITING']
df = df[df['Acronym'] != 'Unknown']
df = df[df['Acronym'] != '#Name?']
df = df[df['Phases'] != 'NA']
df = df[df['Phases'] != 'Not Available']

# Function to get text embeddings
def text_embedding(text):
    response = client.embeddings.create(model = "text-embedding-ada-002", input = text)
    return response.data[0].embedding

# Embed the 'text' column once (one API call per row). assign() returns a new
# DataFrame, which avoids a possible SettingWithCopyWarning on the filtered frame.
df = df.assign(embedding = df["text"].apply(text_embedding))
print(df.head())

def strings_ranked_by_relatedness(
    query: str,
    df: pd.DataFrame,
    relatedness_fn = lambda x, y: 1 - spatial.distance.cosine(x, y),
    top_n: int = 100
):

    EMBEDDING_MODEL = "text-embedding-ada-002"
    query_embedding_response = openai.embeddings.create(
        model = EMBEDDING_MODEL,
        input = query,
    )
    query_embedding = query_embedding_response.data[0].embedding
    strings_and_relatednesses = [
        (row["text"], relatedness_fn(query_embedding, row["embedding"]))
        for i, row in df.iterrows()
    ]
    strings_and_relatednesses.sort(key = lambda x: x[1], reverse = True)
    strings, relatednesses = zip(*strings_and_relatednesses)
    return strings[:top_n], relatednesses[:top_n]

strings, relatednesses = strings_ranked_by_relatedness("Clinical Trials", df, top_n = 3)
for string, relatedness in zip(strings, relatednesses):
    print(f"{relatedness = :.3f}")
    display(string)

def num_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    # Count tokens with the tokenizer that matches the chat model
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def query_message(
    query: str,
    df: pd.DataFrame,
    model: str,
    token_budget: int
) -> str:
    # Retrieve the trial passages, most related to the query first
    strings, relatednesses = strings_ranked_by_relatedness(query, df)
    introduction = 'Use the below content related to clinical trials to answer the subsequent question. If the answer cannot be found in the articles, write "I could not find an answer."'
    question = f"\n\nQuestion: {query}"
    message = introduction
    for string in strings:
        next_row = f'\n\nClinical trial section:\n"""\n{string}\n"""'
        # Stop adding passages once the prompt would exceed the token budget
        if num_tokens(message + next_row + question, model = model) > token_budget:
            break
        message += next_row
    return message + question

def ask(
    query: str,
    df: pd.DataFrame = df,
    model: str = "gpt-3.5-turbo",
    token_budget: int = 4096 - 500,
    print_message: bool = False,
) :
    message = query_message(query, df, model=model, token_budget = token_budget)
    if print_message:
        print(message)
    messages = [
        {"role": "system", "content": "You answer questions about clinical trials."},
        {"role": "user", "content": message},
    ]
    response = openai.chat.completions.create(
        model = model,
        messages = messages,
        temperature = 0
    )
    response_message = response.choices[0].message.content
    return response_message

          NCT ID        Acronym Overall Status  Start Date  \
32   NCT05696626      ELAINEIII     RECRUITING  2023-10-31   
164  NCT05896566     PREcoopERA     RECRUITING  2024-01-23   
174  NCT04965766  ICARUS-BREAST     RECRUITING  2021-05-11   
179  NCT03924466          VUBAR     RECRUITING  2019-04-01   
225  NCT05076682    Renaissance     RECRUITING  2022-06-30   

                                            Conditions  \
32                            Metastatic Breast Cancer   
164                                      Breast Cancer   
174                           Metastatic Breast Cancer   
179  Metastatic Breast Carcinoma, Locally Advanced ...   
225                      Triple-negative Breast Cancer   

                                         Interventions  \
32   Lasofoxifene in combination with abemaciclib, ...   
164             Giredestrant, Triptorelin, Anastrozole   
174                                            U3-1402   
179                            68GaNOTA-Anti-H

"The trial with NCT ID number NCT03971409 has the following eligbility criteria, for inclusion and exclusion Inclusion Criteria:\n\n1. Signed and dated written informed consent\n2. Subjects \\>= 18 years of age\n3. Eastern Cooperative Oncology Group (ECOG) performance status 0 or 1\n4. Clinical stage IV invasive breast cancer or unresectable locoregional recurrence of invasive breast cancer meeting the following criteria:\n\n   * Estrogen receptor (ER)/progesterone receptor (PR)-negative (=\\< 5% cells) by immunohistochemistry (IHC) and human epidermal grow (HER2) negative (by IHC or fluorescence in situ hybridization (FISH))\n   * Measurable disease as defined by Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1 criteria and which can be followed by computed tomography (CT) or magnetic resonance imaging (MRI). A measurable lytic bone lesion(s) and/or skin lesion(s) are allowed. Skin lesions must also be followed by photography with measuring tools within the photograph

relatedness = 0.806


'The trial with NCT ID number NCT04079049 has the following eligbility criteria, for inclusion and exclusion Inclusion Criteria:\n\n* Signed informed consent\n* \\>18 years old\n* ECOG 0-1\n* Breast cancer history\n* Breast cancer liver metastasis verified by biopsy\n* Patient amendable for liver surgery and pre- and postoperative oncological treatment\n* 1-4 liver metastasis amendable to surgery with functional liver remnant volume \\>30%\n* Liver metastasis (and skeletal metastasis) stable or responding to preoperative oncological treatment\n\nExclusion Criteria:\n\n* Non-skeletal extrahepatic disease\n* Non-resected primary tumour\n* Pregnancy\n* Progression of disease upon oncological treatment'

relatedness = 0.805


"The trial with NCT ID number NCT03740893 has the following eligbility criteria, for inclusion and exclusion Inclusion Criteria for Trial Registration:\n\n1. Signed Informed Consent Form (ICF) for Trial Registration;\n2. Aged ≥18 years old;\n3. Histologically confirmed invasive triple negative breast cancer (TNBC). TNBC defined as ER negative, PgR negative (ER and PgR negative as defined by Allred score 0/8, 1/8 or 2/8 or stain in \\<1% of cancer cells) or PgR unavailable, and HER2 negative (immunohistochemistry 0/1+ or negative in situ hybridization) as determined by local laboratory and recorded in the patients notes;\n4. Planned definitive surgical treatment after at least 6 cycles of neoadjuvant chemotherapy (NACT);\n5. Radiographically measurable tumour mass assessable for new distinct radio-opaque marker insertion and repeated biopsies on the NACT mid-assessment standard of care imaging modality (MRI or USS); or clinically thought to be \\>5cm in diameter (T3);\n6. Eastern Oncolo

# Check that the RAG-enabled LLM is able to read in and state the inclusion and exclusion criteria for clinical trials

In [None]:
print(ask('What are the eligibility criteria for the trial with NCT ID number NCT05949333?'))

The eligibility criteria for the trial with NCT ID number NCT05949333 are as follows:

Inclusion Criteria:
- Women aged 18 to 75 years old as of the date of study registration.
- Patients with histologically confirmed invasive adenocarcinoma.
- Patients with confirmed estrogen receptor, progesterone receptor, and Her2 receptor status.
- Patients with Eastern Cooperative Oncology Group (ECOG) performance status 0-1.
- Patients with a left ventricular ejection fraction (LVEF) ≥55%.
- Patients who have agreed to participate in this trial and have provided written consent.

Exclusion Criteria:
- Patients with a history of breast cancer treatment
- Patients with a history of chemotherapy, radiation therapy, immunotherapy, or biotherapy for malignancies other than breast cancer
- Patients with infectious diseases
- Patients with serious illnesses that may affect this clinical trial: cardiovascular disease, kidney disease, liver disease, endocrine disease, tumors, or diabetes
- Other individu
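
The prompt sent by `ask` is assembled by `query_message`, which appends ranked trial passages until a token budget is reached. The sketch below shows that greedy packing on its own. It assumes a stand-in tokenizer that counts whitespace-separated words (real token counts from `tiktoken` will differ), and the passages and budget are made up for illustration.

```python
def count_tokens(text):
    # Stand-in tokenizer (assumption): whitespace-separated words, not model tokens
    return len(text.split())

def build_prompt(introduction, passages, question, token_budget):
    """Append ranked passages until the next one would exceed token_budget."""
    message = introduction
    for passage in passages:
        next_row = f"\n\nClinical trial section:\n{passage}"
        if count_tokens(message + next_row + question) > token_budget:
            break
        message += next_row
    return message + question

# Hypothetical passages, already sorted from most to least related
ranked_passages = [
    "Trial A inclusion criteria: women aged 18 or older",
    "Trial B inclusion criteria: measurable disease per RECIST",
    "Trial C inclusion criteria: ECOG performance status 0 or 1",
]
prompt = build_prompt(
    introduction="Use the content below to answer the question.",
    passages=ranked_passages,
    question="\n\nQuestion: Which trial accepts adult women?",
    token_budget=40,
)
print(prompt)
```

Because the passages arrive sorted by relatedness, the ones that get dropped when the budget runs out are the least related ones. A trial that ranks low for a given query is therefore invisible to the model, which is worth keeping in mind when a response omits an expected trial.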

# Prompt the RAG-enabled LLM to match patients with clinical trials


### Get data about the patient and place it in the variable called "patient_statement"

In [None]:
patient_statement = "a 55 year old female, diagnosed with breast cancer at the age of 50, underwent surgery, chemotherapy, and radiation therapy to treat the cancer. She also has a history of hypertension and diabetes, which she manages with medication and lifestyle changes and is currently in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments. "

### Store the response of the RAG-enabled LLM in "resp_0"

In [None]:
resp_0 = ask("can you find the NCT ID number of two clinical trials that are most relevant to patient who is " + patient_statement + "And explain with four reasons (match the patient age with the inclusion criteria), why each trial is relevant, and show eligibility criteria (inclusion and exclusion) for each trial?")

### Print the response from the RAG-enabled LLM

In [None]:
print(resp_0)

The NCT ID numbers of two clinical trials that have the status of 'recruiting' and are most relevant to the patient described are NCT05949333 and NCT05978648.

**NCT05949333:**
- **Relevance to the patient:**
  1. The patient is within the age range specified (55 years old).
  2. The patient has a history of breast cancer and has undergone surgery, chemotherapy, and radiation therapy.
  3. The patient has a history of hypertension and diabetes, which are common comorbidities in breast cancer survivors.
  4. The patient is in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments.

- **Eligibility Criteria:**
  - **Inclusion Criteria:**
    * Women aged 18 to 75 years old.
    * Patients with histologically confirmed invasive adenocarcinoma.
    * Patients with confirmed estrogen receptor, progesterone receptor, and Her2 receptor status.
    * Patients with Eastern Cooperative Oncology Group (ECOG) performance status 0-1.
    * Patients wit
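
Since the model returns NCT IDs as free text, one simple sanity check is to confirm that every ID it cites exists in the trial data. The sketch below is one possible way to do that, relying on the fact that NCT IDs consist of `NCT` followed by eight digits. The sample response and the set of known IDs are hypothetical stand-ins (in this notebook they would be `resp_0` and `set(df['NCT ID'])`), so the result says nothing about the actual data.

```python
import re

NCT_ID_PATTERN = re.compile(r"NCT\d{8}")

def extract_nct_ids(text):
    """Return the distinct NCT IDs mentioned in text, in order of first appearance."""
    seen = []
    for match in NCT_ID_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

def unknown_nct_ids(response_text, known_ids):
    """Return IDs cited in the response that are absent from the known trial set."""
    return [nct for nct in extract_nct_ids(response_text) if nct not in known_ids]

# Hypothetical stand-ins for resp_0 and set(df['NCT ID'])
sample_response = (
    "The most relevant trials are NCT05949333 and NCT05978648. "
    "NCT05949333 accepts women aged 18 to 75."
)
known_ids = {"NCT05949333", "NCT05696626"}

cited = extract_nct_ids(sample_response)
missing = unknown_nct_ids(sample_response, known_ids)
print("Cited:", cited)
print("Not in trial data:", missing)
```

An empty second list means every cited trial can be looked up in the DataFrame; a non-empty one flags IDs that should be checked by hand before the response is trusted.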

### Assign a score to each of the clinical trials and explain that score (prompt engineering is used here to mitigate the lack of a deterministic response: we constrain the model to the trials returned by the previous prompt)

In [None]:
print(ask("Can you assign a score (by using five criteria) to each trial in " + resp_0 + " in terms of how well each trial matches the patient? Can you describe in detail how that score was calculated? Can you state in detail why one trial is better than the other?"))

To assign a score to each trial based on their relevance to the patient described, we will consider the following criteria:

1. Age eligibility: The patient is 55 years old.
2. History of breast cancer treatment: The patient has a history of breast cancer and has undergone surgery, chemotherapy, and radiation therapy.
3. Comorbidities: The patient has a history of hypertension and diabetes, common comorbidities in breast cancer survivors.
4. Current status: The patient is in remission from breast cancer but continues to see her oncologist regularly for follow-up appointments.

Let's evaluate each trial based on these criteria:

**NCT05949333:**
1. Age eligibility: The patient falls within the age range specified (18 to 75 years old). Score: 1 point.
2. History of breast cancer treatment: The patient has a history of breast cancer treatment, which is an exclusion criterion for this trial. Score: 0 points.
3. Comorbidities: The patient has hypertension and diabetes, which are considered 
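
One way to make the scoring less dependent on the model's arithmetic is to let the model state per-criterion points and add them up in code. The sketch below assumes the response keeps the layout seen above (a `**NCT...:**` heading per trial and `Score: N point(s)` on each criterion line), which is not guaranteed from run to run. The sample text is a shortened, made-up response in that layout, not real model output.

```python
import re
from collections import defaultdict

TRIAL_HEADING = re.compile(r"^\*\*(NCT\d{8}):\*\*")
SCORE = re.compile(r"Score:\s*(\d+)\s*points?")

def total_scores(response_text):
    """Sum the per-criterion points listed under each trial heading."""
    totals = defaultdict(int)
    current_trial = None
    for line in response_text.splitlines():
        heading = TRIAL_HEADING.match(line.strip())
        if heading:
            current_trial = heading.group(1)
            continue
        score = SCORE.search(line)
        if score and current_trial is not None:
            totals[current_trial] += int(score.group(1))
    return dict(totals)

# Made-up sample in the same layout as the response above (assumption)
sample = """**NCT05949333:**
1. Age eligibility: within the specified range. Score: 1 point.
2. History of breast cancer treatment: exclusion criterion. Score: 0 points.

**NCT05978648:**
1. Age eligibility: within the specified range. Score: 1 point.
2. History of breast cancer treatment: allowed. Score: 1 point.
"""

totals = total_scores(sample)
print(totals)
```

If the parser finds no scores at all, that is itself a useful signal that the response drifted from the expected layout and the prompt may need tightening.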

**Pseudo-Algorithm for Human Expert's Manual Matching Process**



1. Read patient summary:

Using synthetic patient data from the NIST TREC Clinical Trials track:

- https://www.trec-cds.org/topics2021.xml
- https://www.trec-cds.org/topics2022.xml

From the 2021 topics file (trec-cds-2021), topic number 25:

The patient is a 42 year-old postmenopausal woman who had a screening sonogram which revealed an abnormality in the right breast. She had no palpable masses on breast exam. Core biopsy was done and revealed a 1.8 cm infiltrating ductal breast carcinoma in the left upper outer quadrant. Lumpectomy was done and the surgical margins were clear. The tumor was HER2-positive and ER/PR negative. Axillary sampling revealed 1 positive lymph node out of 12 sampled. CXR was unremarkable. She is using “well women” multivitamins daily and no other medication. She smokes frequently and consumes alcohol occasionally. She is in a relation with only one partner and has a history of 3 pregnancies and live births. She breastfed all three children.

2. Use clinicaltrials.gov user interface to filter for clinical trials based on patient summary

3. Review the detailed records of each search result and determine whether the patient should be included or excluded from the trial.
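
Steps 2 and 3 above can be sketched as a coarse structured pre-filter followed by manual review. This is only an illustration of the idea, not the experts' actual procedure: the `Patient` and `Trial` fields, the trial identifiers, and the thresholds are all hypothetical, and real eligibility criteria are free text that rarely reduces to a few fields.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    sex: str
    her2_positive: bool

@dataclass
class Trial:
    trial_id: str          # hypothetical identifier, not a real NCT ID
    min_age: int
    max_age: int
    sexes: tuple
    requires_her2_positive: bool

def passes_prefilter(patient, trial):
    """Coarse structured check; a reviewer still reads the full criteria afterwards."""
    if not trial.min_age <= patient.age <= trial.max_age:
        return False
    if patient.sex not in trial.sexes:
        return False
    if trial.requires_her2_positive and not patient.her2_positive:
        return False
    return True

# Patient facts taken from the topic 25 summary; the trials are made up
patient = Patient(age=42, sex="female", her2_positive=True)
trials = [
    Trial("TRIAL-A", 18, 75, ("female",), True),
    Trial("TRIAL-B", 50, 80, ("female", "male"), False),
    Trial("TRIAL-C", 18, 99, ("female", "male"), False),
]

shortlist = [t.trial_id for t in trials if passes_prefilter(patient, t)]
print(shortlist)
```

The shortlist corresponds to the search results of step 2; step 3, reading each detailed record, remains a manual task in this process.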




**GPT-4 Based Approach: Patient #1**

In [None]:
# Synthetic patient taken from the TREC 2021 topics: topic number 25, a breast cancer patient.
problem = "The patient is a 42 year-old postmenopausal woman who had a screening sonogram which revealed an abnormality in the right breast. She had no palpable masses on breast exam. Core biopsy was done and revealed a 1.8 cm infiltrating ductal breast carcinoma in the left upper outer quadrant. Lumpectomy was done and the surgical margins were clear. The tumor was HER2-positive and ER/PR negative. Axillary sampling revealed 1 positive lymph node out of 12 sampled. CXR was unremarkable. She is using “well women” multivitamins daily and no other medication. She smokes frequently and consumes alcohol occasionally. She is in a relation with only one partner and has a history of 3 pregnancies and live births. She breastfed all three children."

query = f"""Problem statement: based on the following information about a patient {problem}

Using the inclusion and exclusion criteria in {df}, can you find the NCT ID number of two clinical trials that have 'Overall Status' is 'RECRUITING' are most suited for the patient?

For each clinical trial, list the overall status, start date, completion date, conditions, interventions, type and phase.

For each clinical trial, list the inclusion criteria for that clinical trial including age, condition and medical history.

For each clinical trial, list the exclusion criteria for that clinical trial including age, condition and medical history.

Explain why each clinical trial is suitable for the patient using four reasons paying attention to patient age, medical conditions, and medical history. The reasons must consider the patient medical history, conditions, gender, age, lifestyle.

For each trial explain why the patient's medical conditions and age qualify the patient for inclusion within the clinical trial.

For each trial, explain why the patient was not excluded from the trial using the exclusion criteria and the data on the patient's medical history and conditions.

For the patient, use the trial inclusion and exclusion criteria and patient medical history to explain why one clinical trial is better suited for the patient than the other.

Then assign a seven point score based on three criteria, to each clinical trial to determine which trial is better suited to the patient and explain how the scores are calculated for each criteria.

"""

print(query)

llm.invoke(query)

Problem statement: based on the following information about a patient The patient is a 42 year-old postmenopausal woman who had a screening sonogram which revealed an abnormality in the right breast. She had no palpable masses on breast exam. Core biopsy was done and revealed a 1.8 cm infiltrating ductal breast carcinoma in the left upper outer quadrant. Lumpectomy was done and the surgical margins were clear. The tumor was HER2-positive and ER/PR negative. Axillary sampling revealed 1 positive lymph node out of 12 sampled. CXR was unremarkable. She is using “well women” multivitamins daily and no other medication. She smokes frequently and consumes alcohol occasionally. She is in a relation with only one partner and has a history of 3 pregnancies and live births. She breastfed all three children.

Using the inclusion and exclusion criteria in             NCT ID     Acronym Overall Status  Start Date  \
4      NCT04443348     Unknown     RECRUITING  2020-12-16   
5      NCT05417048    

AIMessage(content="To determine the most suitable clinical trials for the patient, we will first identify two clinical trials that are recruiting and match the patient's profile. We will then evaluate the inclusion and exclusion criteria for each trial and explain why each trial is suitable for the patient. Finally, we will compare the two trials and assign a score based on three criteria.\n\n### Clinical Trial 1: NCT04759248 (ATREZZO)\n\n**Overall Status:** RECRUITING  \n**Start Date:** 2021-03-15  \n**Primary Completion Date:** 2024-12-01  \n**Conditions:** Breast Cancer  \n**Interventions:** Atezolizumab + Trastuzumab + Vinorelbine  \n**Study Type:** INTERVENTIONAL  \n**Phase:** PHASE2  \n\n**Inclusion Criteria:**\n- Age ≥ 18 years\n- Male or female (Premenopausal or Postmenopausal)\n- Histologically confirmed HER2-positive breast cancer\n- Measurable disease as per RECIST 1.1 criteria\n- ECOG performance status 0-1\n\n**Exclusion Criteria:**\n- Prior treatment with anti-HER2 therap
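
Note that interpolating `{df}` into the f-string inserts the DataFrame's printed representation, which pandas wraps and abbreviates, as the printed query above shows. Long fields such as the eligibility text may therefore not reach the model in full. The sketch below shows one possible alternative: rendering each trial as labelled lines. The records are hypothetical; with pandas they could come from `df.to_dict("records")`. Sending every trial in full may exceed the model's context window, which is the problem the RAG approach is meant to address.

```python
def format_trial(record, fields):
    """Render one trial as labelled lines so that no field is abbreviated."""
    return "\n".join(f"{field}: {record[field]}" for field in fields)

def format_trials(records, fields):
    """Join the rendered trials with blank lines between them."""
    return "\n\n".join(format_trial(record, fields) for record in records)

# Hypothetical records (assumption: not taken from the real trial data)
records = [
    {
        "NCT ID": "NCT-EXAMPLE-1",
        "Overall Status": "RECRUITING",
        "Eligibility": "Inclusion Criteria: adult women with HER2-positive "
                       "breast cancer. Exclusion Criteria: prior anti-HER2 therapy.",
    },
    {
        "NCT ID": "NCT-EXAMPLE-2",
        "Overall Status": "RECRUITING",
        "Eligibility": "Inclusion Criteria: histologically confirmed breast "
                       "cancer. Exclusion Criteria: pregnancy or breastfeeding.",
    },
]
FIELDS = ["NCT ID", "Overall Status", "Eligibility"]

context = format_trials(records, FIELDS)
query = f"Using the inclusion and exclusion criteria below, find suitable trials.\n\n{context}"
print(query)
```

Printing the query before sending it, as the cell above already does, is a quick way to confirm that the eligibility text appears in full.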

In [None]:
from IPython.display import display, Markdown

message_content = """
To determine the most suitable clinical trials for the patient, we will first identify two clinical trials that are recruiting and match the patient's profile. We will then evaluate the inclusion and exclusion criteria for each trial and explain why each trial is suitable for the patient. Finally, we will compare the two trials and assign a score based on three criteria.

### Clinical Trial 1: NCT04759248 (ATREZZO)

**Overall Status:** RECRUITING
**Start Date:** 2021-03-15
**Primary Completion Date:** 2024-12-01
**Conditions:** Breast Cancer
**Interventions:** Atezolizumab + Trastuzumab + Vinorelbine
**Study Type:** INTERVENTIONAL
**Phase:** PHASE2

**Inclusion Criteria:**
- Age ≥ 18 years
- Male or female (Premenopausal or Postmenopausal)
- Histologically confirmed HER2-positive breast cancer
- Measurable disease as per RECIST 1.1 criteria
- ECOG performance status 0-1

**Exclusion Criteria:**
- Prior treatment with anti-HER2 therapy
- Active or untreated CNS metastases
- History of autoimmune disease
- Concurrent use of other investigational agents

### Clinical Trial 2: NCT05417048

**Overall Status:** RECRUITING
**Start Date:** 2022-07-14
**Primary Completion Date:** 2023-12-15
**Conditions:** Breast Cancer
**Interventions:** Blood Sample
**Study Type:** OBSERVATIONAL
**Phase:** Not Available

**Inclusion Criteria:**
- Patients ≥ 18 years old
- Histologically confirmed breast cancer
- No prior systemic therapy for metastatic disease
- ECOG performance status 0-2

**Exclusion Criteria:**
- Prior systemic therapy for metastatic disease
- Active infection requiring systemic therapy
- Pregnant or breastfeeding women
- Known HIV or hepatitis B/C infection

### Suitability Analysis

#### Clinical Trial 1: NCT04759248 (ATREZZO)

**Reasons for Suitability:**
1. **Age:** The patient is 42 years old, which meets the inclusion criterion of being ≥ 18 years old.
2. **Medical Condition:** The patient has HER2-positive breast cancer, which matches the trial's requirement for histologically confirmed HER2-positive breast cancer.
3. **Gender:** The trial includes both male and female patients, and the patient is female.
4. **Medical History:** The patient has not received prior anti-HER2 therapy, which aligns with the exclusion criteria.

**Inclusion Qualification:**
- The patient's age (42) qualifies her for inclusion.
- The patient's HER2-positive breast cancer condition qualifies her for inclusion.

**Exclusion Qualification:**
- The patient has no history of prior anti-HER2 therapy.
- The patient has no active or untreated CNS metastases.
- The patient has no history of autoimmune disease.
- The patient is not using other investigational agents.

#### Clinical Trial 2: NCT05417048

**Reasons for Suitability:**
1. **Age:** The patient is 42 years old, which meets the inclusion criterion of being ≥ 18 years old.
2. **Medical Condition:** The patient has histologically confirmed breast cancer, which matches the trial's requirement.
3. **Gender:** The trial includes both male and female patients, and the patient is female.
4. **Medical History:** The patient has not received prior systemic therapy for metastatic disease, which aligns with the exclusion criteria.

**Inclusion Qualification:**
- The patient's age (42) qualifies her for inclusion.
- The patient's histologically confirmed breast cancer condition qualifies her for inclusion.

**Exclusion Qualification:**
- The patient has no history of prior systemic therapy for metastatic disease.
- The patient has no active infection requiring systemic therapy.
- The patient is not pregnant or breastfeeding.
- The patient has no known HIV or hepatitis B/C infection.

### Comparison and Scoring

**Criteria for Scoring:**
1. **Relevance to Medical Condition (3 points):** How well the trial matches the patient's specific medical condition.
2. **Inclusion/Exclusion Criteria Fit (2 points):** How well the patient fits the inclusion and exclusion criteria.
3. **Intervention Suitability (2 points):** How suitable the intervention is for the patient's current treatment plan and medical history.

**Clinical Trial 1: NCT04759248 (ATREZZO)**
- **Relevance to Medical Condition:** 3/3 (Specific to HER2-positive breast cancer)
- **Inclusion/Exclusion Criteria Fit:** 2/2 (Patient fits all criteria)
- **Intervention Suitability:** 2/2 (Atezolizumab + Trastuzumab + Vinorelbine is suitable for HER2-positive breast cancer)

**Total Score:** 7/7

**Clinical Trial 2: NCT05417048**
- **Relevance to Medical Condition:** 2/3 (General breast cancer, not specific to HER2-positive)
- **Inclusion/Exclusion Criteria Fit:** 2/2 (Patient fits all criteria)
- **Intervention Suitability:** 1/2 (Observational study, less direct intervention)

**Total Score:** 5/7

### Conclusion

**Better Suited Clinical Trial:** NCT04759248 (ATREZZO)

**Reasoning:**
- **Specificity:** NCT04759248 is specifically designed for HER2-positive breast cancer, which is the patient's condition.
- **Intervention:** The interventions in NCT04759248 are more targeted and suitable for HER2-positive breast cancer.
- **Fit:** The patient fits all inclusion and exclusion criteria perfectly for NCT04759248.
- **Comprehensive Treatment:** The interventional nature of NCT04759248 provides a more comprehensive treatment approach compared to the observational nature of NCT05417048.

**Score Calculation:**
- **NCT04759248:** 7/7
- **NCT05417048:** 5/7

Thus, NCT04759248 (ATREZZO) is the better-suited clinical trial for the patient based on the given criteria.
"""

display(Markdown(message_content))



To determine the most suitable clinical trials for the patient, we will first identify two clinical trials that are recruiting and match the patient's profile. We will then evaluate the inclusion and exclusion criteria for each trial and explain why each trial is suitable for the patient. Finally, we will compare the two trials and assign a score based on three criteria.

### Clinical Trial 1: NCT04759248 (ATREZZO)

**Overall Status:** RECRUITING  
**Start Date:** 2021-03-15  
**Primary Completion Date:** 2024-12-01  
**Conditions:** Breast Cancer  
**Interventions:** Atezolizumab + Trastuzumab + Vinorelbine  
**Study Type:** INTERVENTIONAL  
**Phase:** PHASE2  

**Inclusion Criteria:**
- Age ≥ 18 years
- Male or female (Premenopausal or Postmenopausal)
- Histologically confirmed HER2-positive breast cancer
- Measurable disease as per RECIST 1.1 criteria
- ECOG performance status 0-1

**Exclusion Criteria:**
- Prior treatment with anti-HER2 therapy
- Active or untreated CNS metastases
- History of autoimmune disease
- Concurrent use of other investigational agents

### Clinical Trial 2: NCT05417048

**Overall Status:** RECRUITING  
**Start Date:** 2022-07-14  
**Primary Completion Date:** 2023-12-15  
**Conditions:** Breast Cancer  
**Interventions:** Blood Sample  
**Study Type:** OBSERVATIONAL  
**Phase:** Not Available  

**Inclusion Criteria:**
- Patients ≥ 18 years old
- Histologically confirmed breast cancer
- No prior systemic therapy for metastatic disease
- ECOG performance status 0-2

**Exclusion Criteria:**
- Prior systemic therapy for metastatic disease
- Active infection requiring systemic therapy
- Pregnant or breastfeeding women
- Known HIV or hepatitis B/C infection

### Suitability Analysis

#### Clinical Trial 1: NCT04759248 (ATREZZO)

**Reasons for Suitability:**
1. **Age:** The patient is 42 years old, which meets the inclusion criterion of being ≥ 18 years old.
2. **Medical Condition:** The patient has HER2-positive breast cancer, which matches the trial's requirement for histologically confirmed HER2-positive breast cancer.
3. **Gender:** The trial includes both male and female patients, and the patient is female.
4. **Medical History:** The patient has not received prior anti-HER2 therapy, which aligns with the exclusion criteria.

**Inclusion Qualification:**
- The patient's age (42) qualifies her for inclusion.
- The patient's HER2-positive breast cancer condition qualifies her for inclusion.

**Exclusion Qualification:**
- The patient has no history of prior anti-HER2 therapy.
- The patient has no active or untreated CNS metastases.
- The patient has no history of autoimmune disease.
- The patient is not using other investigational agents.

#### Clinical Trial 2: NCT05417048

**Reasons for Suitability:**
1. **Age:** The patient is 42 years old, which meets the inclusion criterion of being ≥ 18 years old.
2. **Medical Condition:** The patient has histologically confirmed breast cancer, which matches the trial's requirement.
3. **Gender:** The trial includes both male and female patients, and the patient is female.
4. **Medical History:** The patient has not received prior systemic therapy for metastatic disease, which aligns with the exclusion criteria.

**Inclusion Qualification:**
- The patient's age (42) qualifies her for inclusion.
- The patient's histologically confirmed breast cancer condition qualifies her for inclusion.

**Exclusion Qualification:**
- The patient has no history of prior systemic therapy for metastatic disease.
- The patient has no active infection requiring systemic therapy.
- The patient is not pregnant or breastfeeding.
- The patient has no known HIV or hepatitis B/C infection.

### Comparison and Scoring

**Criteria for Scoring:**
1. **Relevance to Medical Condition (3 points):** How well the trial matches the patient's specific medical condition.
2. **Inclusion/Exclusion Criteria Fit (2 points):** How well the patient fits the inclusion and exclusion criteria.
3. **Intervention Suitability (2 points):** How suitable the intervention is for the patient's current treatment plan and medical history.

**Clinical Trial 1: NCT04759248 (ATREZZO)**
- **Relevance to Medical Condition:** 3/3 (Specific to HER2-positive breast cancer)
- **Inclusion/Exclusion Criteria Fit:** 2/2 (Patient fits all criteria)
- **Intervention Suitability:** 2/2 (Atezolizumab + Trastuzumab + Vinorelbine is suitable for HER2-positive breast cancer)

**Total Score:** 7/7

**Clinical Trial 2: NCT05417048**
- **Relevance to Medical Condition:** 2/3 (General breast cancer, not specific to HER2-positive)
- **Inclusion/Exclusion Criteria Fit:** 2/2 (Patient fits all criteria)
- **Intervention Suitability:** 1/2 (Observational study, less direct intervention)

**Total Score:** 5/7

### Conclusion

**Better Suited Clinical Trial:** NCT04759248 (ATREZZO)

**Reasoning:**
- **Specificity:** NCT04759248 is specifically designed for HER2-positive breast cancer, which is the patient's condition.
- **Intervention:** The interventions in NCT04759248 are more targeted and suitable for HER2-positive breast cancer.
- **Fit:** The patient fits all inclusion and exclusion criteria perfectly for NCT04759248.
- **Comprehensive Treatment:** The interventional nature of NCT04759248 provides a more comprehensive treatment approach compared to the observational nature of NCT05417048.

**Score Calculation:**
- **NCT04759248:** 7/7
- **NCT05417048:** 5/7

Thus, NCT04759248 (ATREZZO) is the better-suited clinical trial for the patient based on the given criteria.


**GPT-4 Based Approach: Patient #2**

In [None]:
# Synthetic patient taken from the TREC 2021 topics: topic number 61, a breast cancer patient.
problem = "The patient is a 45-year-old postmenopausal woman with cytologically confirmed breast cancer. A core biopsy revealed a 3 cm invasive ductal breast carcinoma in the left upper outer quadrant. The tumor is HER2-positive and ER/PR negative. Axillary sampling revealed 5 positive lymph nodes. CXR was remarkable for metastatic lesions. The patient is using multivitamins and iron supplements. She does not smoke or consume alcohol. She is not sexually active and has no children. She is a candidate for tumor resection and agrees to do so prior to chemotherapy."

query = f"""Problem statement: based on the following information about a patient {problem}

Using the inclusion and exclusion criteria in {df}, can you find the NCT ID number of two clinical trials that have 'Overall Status' is 'RECRUITING' are most suited for the patient?

For each clinical trial, list the overall status, start date, completion date, conditions, interventions, type and phase.

For each clinical trial, list the inclusion criteria for that clinical trial including age, condition and medical history.

For each clinical trial, list the exclusion criteria for that clinical trial including age, condition and medical history.

Explain why each clinical trial is suitable for the patient using four reasons paying attention to patient age, medical conditions, and medical history. The reasons must consider the patient medical history, conditions, gender, age, lifestyle.

For each trial explain why the patient's medical conditions and age qualify the patient for inclusion within the clinical trial.

For each trial, explain why the patient was not excluded from the trial using the exclusion criteria and the data on the patient's medical history and conditions.

For the patient, use the trial inclusion and exclusion criteria and patient medical history to explain why one clinical trial is better suited for the patient than the other.

Then assign a seven-point score, based on three criteria, to each clinical trial to determine which trial is better suited to the patient, and explain how the score is calculated for each criterion.

"""

print(query)

llm.invoke(query)

Problem statement: based on the following information about a patient The patient is a 45-year-old postmenopausal woman with cytologically confirmed breast cancer. A core biopsy revealed a 3 cm invasive ductal breast carcinoma in the left upper outer quadrant. The tumor is HER2-positive and ER/PR negative. Axillary sampling revealed 5 positive lymph nodes. CXR was remarkable for metastatic lesions. The patient is using multivitamins and iron supplements. She does not smoke or consume alcohol. She is not sexually active and has no children. She is a candidate for tumor resection and agrees to do so prior to chemotherapy.

Using the inclusion and exclusion criteria in             NCT ID     Acronym Overall Status  Start Date  \
4      NCT04443348     Unknown     RECRUITING  2020-12-16   
5      NCT05417048     Unknown     RECRUITING  2022-07-14   
6      NCT04759248     ATREZZO     RECRUITING  2021-03-15   
13     NCT04009148     Unknown     RECRUITING  2019-03-01   
17     NCT02610426  

AIMessage(content="To determine the most suitable clinical trials for the patient, we need to analyze the inclusion and exclusion criteria of the trials that are currently recruiting and match them with the patient's medical history, conditions, gender, age, and lifestyle. Here are the details for two clinical trials that are most suited for the patient:\n\n### Clinical Trial 1: NCT04759248 (ATREZZO)\n\n#### Overall Status:\n- Recruiting\n\n#### Start Date:\n- 2021-03-15\n\n#### Primary Completion Date:\n- 2024-12-01\n\n#### Conditions:\n- Breast Cancer\n\n#### Interventions:\n- Atezolizumab + Trastuzumab + Vinorelbine\n\n#### Study Type:\n- Interventional\n\n#### Phase:\n- Phase 2\n\n#### Inclusion Criteria:\n- Age: Male or female (Premenopausal or Postmenopausal) ≥ 18 years old\n- Condition: HER2-positive breast cancer\n- Medical History: Must have measurable disease as per RECIST 1.1 criteria\n\n#### Exclusion Criteria:\n- Age: None specified\n- Condition: No active or untreated bra
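
A note on the prompt echoed above: interpolating `{df}` into an f-string inserts the display representation of the DataFrame, which pandas typically abbreviates for long tables and long text cells, so the model may see only part of the criteria text. One possible alternative is to format each trial explicitly. The sketch below is an assumption and not the method used in this notebook: `format_trials` is a hypothetical helper, the criteria strings are placeholders, and in the notebook the records could be obtained with `df.to_dict(orient="records")`.

```python
def format_trials(records, fields):
    """Render each trial record as a labelled text block, one field per line."""
    blocks = []
    for record in records:
        # Missing fields are reported as 'Unknown' rather than raising an error.
        lines = [f"{field}: {record.get(field, 'Unknown')}" for field in fields]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive trials.
    return "\n\n".join(blocks)


# Placeholder records; the criteria text here is NOT the real trial text.
records = [
    {
        "NCT ID": "NCT04759248",
        "Overall Status": "RECRUITING",
        "Inclusion Criteria": "(full inclusion criteria text for this trial)",
    },
    {
        "NCT ID": "NCT05417048",
        "Overall Status": "RECRUITING",
        "Inclusion Criteria": "(full inclusion criteria text for this trial)",
    },
]

fields = ["NCT ID", "Overall Status", "Inclusion Criteria", "Exclusion Criteria"]
trial_text = format_trials(records, fields)
print(trial_text)
```

The resulting `trial_text` could then be interpolated into the query in place of `{df}`. Whether the full text fits in the model's context window depends on how many trials are included.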

In [None]:
from IPython.display import display, Markdown

message_content = """
To determine the most suitable clinical trials for the patient, we need to analyze the inclusion and exclusion criteria of the trials that are currently recruiting and match them with the patient's medical history, conditions, gender, age, and lifestyle. Here are the details for two clinical trials that are most suited for the patient:\n\n### Clinical Trial 1: NCT04759248 (ATREZZO)\n\n#### Overall Status:\n- Recruiting\n\n#### Start Date:\n- 2021-03-15\n\n#### Primary Completion Date:\n- 2024-12-01\n\n#### Conditions:\n- Breast Cancer\n\n#### Interventions:\n- Atezolizumab + Trastuzumab + Vinorelbine\n\n#### Study Type:\n- Interventional\n\n#### Phase:\n- Phase 2\n\n#### Inclusion Criteria:\n- Age: Male or female (Premenopausal or Postmenopausal) ≥ 18 years old\n- Condition: HER2-positive breast cancer\n- Medical History: Must have measurable disease as per RECIST 1.1 criteria\n\n#### Exclusion Criteria:\n- Age: None specified\n- Condition: No active or untreated brain metastases\n- Medical History: No history of autoimmune disease requiring systemic therapy in the past 2 years\n\n#### Suitability for the Patient:\n1. **Age**: The patient is 45 years old, which meets the age criterion of ≥ 18 years.\n2. **Medical Conditions**: The patient has HER2-positive breast cancer, which matches the condition requirement.\n3. **Medical History**: The patient has measurable disease as confirmed by the core biopsy and axillary sampling.\n4. 
**Lifestyle**: The patient does not have any lifestyle factors (e.g., smoking, alcohol) that would exclude her from the trial.\n\n#### Qualification for Inclusion:\n- The patient's age (45 years) qualifies her for inclusion.\n- The patient's HER2-positive breast cancer condition qualifies her for inclusion.\n\n#### Non-Exclusion:\n- The patient does not have active or untreated brain metastases.\n- The patient does not have a history of autoimmune disease requiring systemic therapy in the past 2 years.\n\n### Clinical Trial 2: NCT04656353\n\n#### Overall Status:\n- Recruiting\n\n#### Start Date:\n- 2020-10-01\n\n#### Primary Completion Date:\n- 2024-12-31\n\n#### Conditions:\n- Breast Cancer\n\n#### Interventions:\n- Intervention Arm, Control Group\n\n#### Study Type:\n- Interventional\n\n#### Phase:\n- Not Applicable\n\n#### Inclusion Criteria:\n- Age: Foreign-born Chinese women aged ≥ 18 years\n- Condition: Breast cancer\n- Medical History: Must be able to provide informed consent\n\n#### Exclusion Criteria:\n- Age: None specified\n- Condition: None specified\n- Medical History: None specified\n\n#### Suitability for the Patient:\n1. **Age**: The patient is 45 years old, which meets the age criterion of ≥ 18 years.\n2. **Medical Conditions**: The patient has breast cancer, which matches the condition requirement.\n3. **Medical History**: The patient can provide informed consent.\n4. **Lifestyle**: The patient does not have any lifestyle factors (e.g., smoking, alcohol) that would exclude her from the trial.\n\n#### Qualification for Inclusion:\n- The patient's age (45 years) qualifies her for inclusion.\n- The patient's breast cancer condition qualifies her for inclusion.\n\n#### Non-Exclusion:\n- The patient does not have any conditions or medical history that would exclude her from the trial.\n\n### Comparison and Suitability:\n\n#### Clinical Trial 1 (NCT04759248) vs. 
Clinical Trial 2 (NCT04656353):\n\n- **Specificity to HER2-positive Breast Cancer**: Clinical Trial 1 is specifically designed for HER2-positive breast cancer, which is the patient's condition. Clinical Trial 2 is more general and does not specify HER2 status.\n- **Intervention**: Clinical Trial 1 offers a targeted therapy (Atezolizumab + Trastuzumab + Vinorelbine) which is more suited for HER2-positive breast cancer. Clinical Trial 2 does not specify the intervention in detail.\n- **Inclusion Criteria**: Clinical Trial 1 has more specific inclusion criteria that match the patient's condition closely. Clinical Trial 2 has broader inclusion criteria.\n- **Exclusion Criteria**: Clinical Trial 1 has specific exclusion criteria that the patient does not meet, ensuring a more tailored approach. Clinical Trial 2 has no specific exclusion criteria.\n\n### Scoring:\n\n#### Criteria:\n1. **Specificity to Condition (0-3 points)**:\n   - Clinical Trial 1: 3 points (Specific to HER2-positive breast cancer)\n   - Clinical Trial 2: 1 point (General breast cancer)\n\n2. **Intervention Suitability (0-2 points)**:\n   - Clinical Trial 1: 2 points (Targeted therapy for HER2-positive)\n   - Clinical Trial 2: 1 point (General intervention)\n\n3. **Inclusion/Exclusion Criteria Match (0-2 points)**:\n   - Clinical Trial 1: 2 points (Specific and matched criteria)\n   - Clinical Trial 2: 1 point (Broad criteria)\n\n#### Total Score:\n- **Clinical Trial 1 (NCT04759248)**: 3 + 2 + 2 = 7 points\n- **Clinical Trial 2 (NCT04656353)**: 1 + 1 + 1 = 3 points\n\n### Conclusion:\nClinical Trial 1 (NCT04759248) is better suited for the patient based on the specificity to HER2-positive breast cancer, targeted intervention, and well-matched inclusion/exclusion criteria. The patient qualifies for this trial without being excluded based on her medical history and conditions.
"""

display(Markdown(message_content))



To determine the most suitable clinical trials for the patient, we need to analyze the inclusion and exclusion criteria of the trials that are currently recruiting and match them with the patient's medical history, conditions, gender, age, and lifestyle. Here are the details for two clinical trials that are most suited for the patient:

### Clinical Trial 1: NCT04759248 (ATREZZO)

#### Overall Status:
- Recruiting

#### Start Date:
- 2021-03-15

#### Primary Completion Date:
- 2024-12-01

#### Conditions:
- Breast Cancer

#### Interventions:
- Atezolizumab + Trastuzumab + Vinorelbine

#### Study Type:
- Interventional

#### Phase:
- Phase 2

#### Inclusion Criteria:
- Age: Male or female (Premenopausal or Postmenopausal) ≥ 18 years old
- Condition: HER2-positive breast cancer
- Medical History: Must have measurable disease as per RECIST 1.1 criteria

#### Exclusion Criteria:
- Age: None specified
- Condition: No active or untreated brain metastases
- Medical History: No history of autoimmune disease requiring systemic therapy in the past 2 years

#### Suitability for the Patient:
1. **Age**: The patient is 45 years old, which meets the age criterion of ≥ 18 years.
2. **Medical Conditions**: The patient has HER2-positive breast cancer, which matches the condition requirement.
3. **Medical History**: The patient has measurable disease as confirmed by the core biopsy and axillary sampling.
4. **Lifestyle**: The patient does not have any lifestyle factors (e.g., smoking, alcohol) that would exclude her from the trial.

#### Qualification for Inclusion:
- The patient's age (45 years) qualifies her for inclusion.
- The patient's HER2-positive breast cancer condition qualifies her for inclusion.

#### Non-Exclusion:
- The patient does not have active or untreated brain metastases.
- The patient does not have a history of autoimmune disease requiring systemic therapy in the past 2 years.

### Clinical Trial 2: NCT04656353

#### Overall Status:
- Recruiting

#### Start Date:
- 2020-10-01

#### Primary Completion Date:
- 2024-12-31

#### Conditions:
- Breast Cancer

#### Interventions:
- Intervention Arm, Control Group

#### Study Type:
- Interventional

#### Phase:
- Not Applicable

#### Inclusion Criteria:
- Age: Foreign-born Chinese women aged ≥ 18 years
- Condition: Breast cancer
- Medical History: Must be able to provide informed consent

#### Exclusion Criteria:
- Age: None specified
- Condition: None specified
- Medical History: None specified

#### Suitability for the Patient:
1. **Age**: The patient is 45 years old, which meets the age criterion of ≥ 18 years.
2. **Medical Conditions**: The patient has breast cancer, which matches the condition requirement.
3. **Medical History**: The patient can provide informed consent.
4. **Lifestyle**: The patient does not have any lifestyle factors (e.g., smoking, alcohol) that would exclude her from the trial.

#### Qualification for Inclusion:
- The patient's age (45 years) qualifies her for inclusion.
- The patient's breast cancer condition qualifies her for inclusion.

#### Non-Exclusion:
- The patient does not have any conditions or medical history that would exclude her from the trial.

### Comparison and Suitability:

#### Clinical Trial 1 (NCT04759248) vs. Clinical Trial 2 (NCT04656353):

- **Specificity to HER2-positive Breast Cancer**: Clinical Trial 1 is specifically designed for HER2-positive breast cancer, which is the patient's condition. Clinical Trial 2 is more general and does not specify HER2 status.
- **Intervention**: Clinical Trial 1 offers a targeted therapy (Atezolizumab + Trastuzumab + Vinorelbine) which is more suited for HER2-positive breast cancer. Clinical Trial 2 does not specify the intervention in detail.
- **Inclusion Criteria**: Clinical Trial 1 has more specific inclusion criteria that match the patient's condition closely. Clinical Trial 2 has broader inclusion criteria.
- **Exclusion Criteria**: Clinical Trial 1 has specific exclusion criteria that the patient does not meet, ensuring a more tailored approach. Clinical Trial 2 has no specific exclusion criteria.

### Scoring:

#### Criteria:
1. **Specificity to Condition (0-3 points)**:
   - Clinical Trial 1: 3 points (Specific to HER2-positive breast cancer)
   - Clinical Trial 2: 1 point (General breast cancer)

2. **Intervention Suitability (0-2 points)**:
   - Clinical Trial 1: 2 points (Targeted therapy for HER2-positive)
   - Clinical Trial 2: 1 point (General intervention)

3. **Inclusion/Exclusion Criteria Match (0-2 points)**:
   - Clinical Trial 1: 2 points (Specific and matched criteria)
   - Clinical Trial 2: 1 point (Broad criteria)

#### Total Score:
- **Clinical Trial 1 (NCT04759248)**: 3 + 2 + 2 = 7 points
- **Clinical Trial 2 (NCT04656353)**: 1 + 1 + 1 = 3 points

### Conclusion:
Clinical Trial 1 (NCT04759248) is better suited for the patient based on the specificity to HER2-positive breast cancer, targeted intervention, and well-matched inclusion/exclusion criteria. The patient qualifies for this trial without being excluded based on her medical history and conditions.
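
The cell above pastes the response text into a string by hand before rendering it. A minimal sketch of reading the text from the response object instead is shown below. It assumes the object returned by `llm.invoke(query)` exposes a `content` attribute, as the `AIMessage(content=...)` output suggests; `message_text` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real response.

```python
from types import SimpleNamespace


def message_text(message):
    """Return the text of a chat-model response, or the value itself if it is already a string."""
    if isinstance(message, str):
        return message
    # Fall back to str(message) when there is no 'content' attribute.
    return getattr(message, "content", str(message))


# Stand-in for the object returned by llm.invoke(query).
response = SimpleNamespace(
    content="### Conclusion:\nClinical Trial 1 (NCT04759248) is better suited for the patient."
)

text = message_text(response)
print(text)
```

In the notebook, the same text could be rendered with `display(Markdown(message_text(response)))`.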


**Evaluations (Work in Progress)**

**MVP Phase Evaluation Metrics (that we will be using):**

In the MVP phase, in the interest of time, we compare the different methods on accuracy (true positive rate) alone. The ground truth is the consensus of three human experts (teammates).

We use 10 synthetic patients from Synthea (SyntheticMass) and match a maximum of 10 candidate trials per patient. We ask the LLM to rank the candidate trials, and the human experts then evaluate the matching result. For each match, the experts manually look up the NCT number from the LLM output on ClinicalTrials.gov and give a binary determination (correct vs. incorrect match) with comments.
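
The MVP metric can be sketched as follows. This is a minimal illustration under stated assumptions: the expert consensus is taken to be a majority vote over the three binary determinations (the notebook does not specify how consensus is reached), and the patient identifiers and votes below are hypothetical, not evaluation results.

```python
from collections import Counter


def consensus(votes):
    """Majority vote over binary expert determinations (True = correct match)."""
    counts = Counter(votes)
    return counts[True] > counts[False]


def accuracy(expert_votes):
    """Fraction of LLM-recommended trials that the expert consensus marks as correct."""
    labels = [consensus(votes) for votes in expert_votes.values()]
    return sum(labels) / len(labels) if labels else 0.0


# Hypothetical votes: key = (patient id, NCT ID recommended by the LLM),
# value = determinations from the three experts.
expert_votes = {
    ("patient_01", "NCT04759248"): [True, True, False],
    ("patient_01", "NCT04656353"): [False, False, True],
    ("patient_02", "NCT05417048"): [True, True, True],
    ("patient_02", "NCT04443348"): [False, True, True],
}

score = accuracy(expert_votes)
print(f"Accuracy: {score:.2f}")
```

With three experts and binary votes a tie cannot occur, which is one reason a majority vote is a convenient reading of "consensus" here.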


**Aspirational Evaluations from the Benchmarking Paper (not used)**

Jin et al. paper

1. Criterion-level eligibility:

- Included
- Not included (why? discuss)
- Excluded
- Not excluded (why? discuss)
- Not enough information
- Not applicable

Three human experts, consensus as ground truth vs. LLM

2. Finding and outputting the relevant patient sentences for criterion-level eligibility

Three human experts, consensus as ground truth vs. LLM

3. Precision/Recall/F1

Three human experts vs. LLM, with expert consensus score as ground truth.
Make confusion matrices.
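
As a rough illustration of item 3, the sketch below computes the confusion-matrix counts and precision, recall, and F1 for binary eligibility decisions, with the expert consensus as ground truth. The label lists are hypothetical placeholders and the function names are illustrative, not taken from the paper.

```python
def confusion_counts(truth, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: True = patient eligible for the trial.
truth = [True, True, False, True, False, False]      # expert consensus
predicted = [True, False, False, True, True, False]  # LLM output

tp, fp, fn, tn = confusion_counts(truth, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```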

4. Jin et al. found that TrialGPT (GPT-4) mostly makes reasoning-type errors. They identified 4 major types of errors:

- (E1) Incorrect reasoning, where TrialGPT predicts "not enough information" but the matching result can be implicitly inferred (30.7% of all errors)
- (E2) Lack of medical knowledge, such as not recognizing that "A" is "B" or that "B" is a type of "A" (15.4% of all errors)
- (E3) Ambiguous label definitions, where TrialGPT predicts "not enough information" (26.9% of all errors)
- (E4) Other unclassified errors (the rest of the errors)

5. Trial-level scores for ranking

- Aggregate criterion-level predictions into trial-level scores
- Two methods: linear aggregation and LLM aggregation
- Linear aggregation: 6 different metrics
- LLM aggregation: general relevance score (0-100)
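
To make the idea of linear aggregation concrete, the sketch below combines criterion-level labels into a single trial-level score. The formula (share of inclusion criteria met minus share of exclusion criteria triggered) is an illustrative assumption and is not claimed to be one of the six metrics used by Jin et al.; the label strings follow the criterion-level categories listed in item 1, and the example labels are hypothetical.

```python
def linear_trial_score(inclusion_labels, exclusion_labels):
    """Share of inclusion criteria met minus share of exclusion criteria triggered."""
    met = (
        inclusion_labels.count("included") / len(inclusion_labels)
        if inclusion_labels
        else 0.0
    )
    triggered = (
        exclusion_labels.count("excluded") / len(exclusion_labels)
        if exclusion_labels
        else 0.0
    )
    return met - triggered


# Hypothetical criterion-level predictions for two candidate trials.
trials = {
    "trial_A": (
        ["included", "included", "not enough information", "not included"],
        ["not excluded", "not excluded", "excluded", "not applicable"],
    ),
    "trial_B": (
        ["included", "included", "included", "included"],
        ["not excluded", "not excluded"],
    ),
}

scores = {name: linear_trial_score(inc, exc) for name, (inc, exc) in trials.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Sorting the trials by this score gives a ranking that can then be compared against the expert judgments.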

6. Compare the LLM against other SOTA models on ranking candidate clinical trials and excluding ineligible trials for a given patient, using:

- Normalized Discounted Cumulative Gain at rank 10 (NDCG@10)
- Precision at rank 10 (P@10)
- Area Under the Receiver Operating Characteristic curve (AUROC)
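
The two ranking metrics can be sketched as follows. Several variants of DCG exist; this version uses the raw relevance grade as the gain with a log2 rank discount, and it computes the ideal ranking from the same list of judged trials, both of which are assumptions rather than the exact setup of the paper. The relevance grades in the example are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def precision_at_k(relevances, k=10):
    """Fraction of the top-k positions holding an item with relevance above zero."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


# Hypothetical relevance grades of the trials in the order the model ranked them.
ranked_relevances = [2, 0, 1, 2, 0]

ndcg = ndcg_at_k(ranked_relevances, k=10)
p_at_5 = precision_at_k(ranked_relevances, k=5)
print(f"NDCG@10 = {ndcg:.4f}")
print(f"P@5 = {p_at_5:.2f}")
```

Note that `precision_at_k` divides by `k` even when fewer than `k` trials were ranked, so a short list is penalized.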

The baselines are dual-encoder, cross-encoder, and encoder-decoder models trained on different biomedical and clinical natural language inference (NLI) datasets.

The best baseline for ranking clinical trials is the cross-encoder BioLinkBERT trained on MedNLI, which achieves an NDCG@10 of 0.5558 and a P@10 of 0.4663.

The most effective features of TrialGPT for ranking are the LLM-aggregated scores. They achieve an NDCG@10 of 0.7339 (by Relevance) and a P@10 of 0.5660 (by Eligibility), which are much higher than those of the other aggregations. Combining both linear and LLM aggregations yields the highest ranking performance, with an NDCG@10 of 0.8165 and a P@10 of 0.7328.



**References:**

J4NN0. (2024). LLM-RAG. GitHub. Accessed on July 3, 2024. Available from: https://github.com/J4NN0/llm-rag

Teo, Sheila. (2023). How I Won Singapore's GPT-4 Prompt Engineering Competition. Accessed on July 3, 2024. Available from: https://medium.com/p/34c195a93d41

What is a REST API? (2021). IBM Technology. Accessed on July 3, 2024. Available from: https://www.youtube.com/watch?v=lsMQRaeKNDk

MITRE Corporation. (2024). Synthea. Available from: https://synthea.mitre.org/ and https://github.com/synthetichealth/synthea

OpenAI et al. (2023). GPT-4 Technical Report. https://arxiv.org/abs/2303.08774

OpenAI. (2024). Hello GPT-4o. https://openai.com/index/hello-gpt-4o/

Luo et al. (2022). BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. https://arxiv.org/abs/2210.10341

ClinicalTrials.gov.

NIH. User Guide for ClinicalTrials.gov Website. https://clinicaltrials.gov/submit-studies/prs-help/user-guide#intro

Jin, Q., Wang, Z., Floudas, C. S., Chen, F., Gong, C., Bracken-Clarke, D., ... & Lu, Z. (2023). Matching patients to clinical trials with large language models. arXiv:2307.15051v4. Accessed on August 2, 2024. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418514/

Koopman, B., & Zuccon, G. (2016, July). A test collection for matching patients to clinical trials. In Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval (pp. 669-672).

Msv, J. (2023, July 21). Tutorial: Build a Q&A bot for Academy Awards based on ChatGPT. The New Stack. https://thenewstack.io/tutorial-build-a-qa-bot-for-academy-awards-based-on-chatgpt/

Stack Overflow. Accessed on August 4, 2024. https://stackoverflow.com/questions/78415818/how-to-get-full-results-with-clinicaltrials-gov-api-in-python

Roberts, K., Demner-Fushman, D., Voorhees, E. M., Bedrick, S., & Hersh, W. R. (2021). Overview of the TREC 2021 Clinical Trials Track. In Proceedings of the Thirtieth Text REtrieval Conference (TREC 2021).

Walonoski, J., Kramer, M., Nichols, J., Quina, A., Moesel, C., Hall, D., ... & McLachlan, S. (2018). Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. Journal of the American Medical Informatics Association, 25(3), 230-238.

Ceylan, B., & Özerdoğan, N. (2015). Factors affecting age of onset of menopause and determination of quality of life in menopause. Turkish journal of obstetrics and gynecology, 12(1), 43–49. https://doi.org/10.4274/tjod.79836

Swanner, K. D., & Richmond, L. B. (2023). A 65-Year-Old Woman With No Menopause History: A Case Report. Cureus, 15(9), e44792. https://doi.org/10.7759/cureus.44792

