# Prompting for Classification Tutorial by Yichuan v25.4.25

This notebook provides a minimal working example demonstrating how to use prompts for classifying research papers into six different categories.  
The goal is to show how prompt engineering and supporting tools can be combined to build an efficient classification workflow.

This tutorial is designed for developers who want a quick start in prompt-based classification methods.


## Environment Setup

In [1]:
import os
from google.colab import drive

if not os.path.ismount('/content/drive'):
    drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# GitHub login and clone
!cp -r /content/drive/MyDrive/.ssh/ ~/
!ls ~/.ssh/ -a
!ssh -T git@github.com
!git clone git@github.com:Yichuan0712/MedSort.git

.  ..  github_id_rsa  hellbender_id_rsa  id_rsa  id_rsa.pub  known_hosts
Hi Yichuan0712! You've successfully authenticated, but GitHub does not provide shell access.
Cloning into 'MedSort'...
remote: Enumerating objects: 3174, done.
remote: Counting objects: 100% (349/349), done.
remote: Compressing objects: 100% (235/235), done.
remote: Total 3174 (delta 188), reused 267 (delta 109), pack-reused 2825 (from 2)
Receiving objects: 100% (3174/3174), 640.65 MiB | 20.90 MiB/s, done.
Resolving deltas: 100% (1756/1756), done.
Updating files: 100% (1601/1601), done.


In [4]:
!cp /content/drive/MyDrive/OSU/xiaofu_gpt.env /content/.env

In [8]:
!pip install python-dotenv -q
!pip install langchain-openai -q

In [9]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [12]:
# Test Azure GPT4o API
from langchain_openai import AzureChatOpenAI, ChatOpenAI
from langchain_core.messages import HumanMessage

client_4o = AzureChatOpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", None),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT", None),
    api_version=os.environ.get("OPENAI_API_VERSION", None),
    azure_deployment=os.environ.get("OPENAI_DEPLOYMENT_NAME", None),
    model=os.environ.get("OPENAI_MODEL", None),
    max_retries=5,
    temperature=0.0,
    # Environment values are strings, so cast to int before passing to the client.
    max_tokens=int(os.environ.get("OPENAI_MAX_OUTPUT_TOKENS", 4096)),
    top_p=0.95,
    frequency_penalty=0,
    presence_penalty=0,
)

messages = ["Hello", "How are you"]
prompt_list = [HumanMessage(content=msg) for msg in messages]

res = client_4o.generate(messages=[prompt_list])
res.generations[0][0].text

"Hello! I'm just a virtual assistant, so I don't have feelings, but I'm here and ready to help you with whatever you need. How can I assist you today? 😊"
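
A `True` from `load_dotenv()` does not by itself confirm that every variable the client reads is set. The following is a minimal sketch of a pre-flight check; the variable names are copied from the `AzureChatOpenAI` call above, while the helper `missing_env_vars` is a hypothetical name introduced here for illustration.

```python
import os

# Names copied from the AzureChatOpenAI call above; adjust if your .env differs.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
    "OPENAI_DEPLOYMENT_NAME",
    "OPENAI_MODEL",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this before constructing the client makes a misconfigured `.env` file easier to spot than a failed API call would.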

## Dataset Preparation

"Calibration" refers to the manual verification and correction of inconsistent prediction results produced by the **BioBERT** model.


In [15]:
# Dataset before the calibration: '_bef'
import pandas as pd

CATEGORY_list = ['CT', 'PE', 'PK', 'VC', 'FBNSTP', 'Biomarker']
category_list = [item.lower() for item in CATEGORY_list]

for cat_upper, cat_lower in zip(CATEGORY_list, category_list):
    df = pd.read_csv(f'/content/MedSort/manual_prompt_tuning/week4/before_calibration/final/{cat_lower}_no_calibration.tsv', sep='\t')
    df = df[['PMID', 'title', 'abstract', 'MeSH', cat_upper]].rename(columns={cat_upper: 'label'})
    output_path = f'{cat_lower}_bef.tsv'
    df.to_csv(output_path, sep='\t', index=False)
    print(f"Saved {output_path} with {len(df)} rows.")

Saved ct_bef.tsv with 2625 rows.
Saved pe_bef.tsv with 3257 rows.
Saved pk_bef.tsv with 2409 rows.
Saved vc_bef.tsv with 2153 rows.
Saved fbnstp_bef.tsv with 2404 rows.
Saved biomarker_bef.tsv with 2587 rows.


In [16]:
# Dataset after the calibration: '_aft'
import pandas as pd

CATEGORY_list = ['CT', 'PE', 'PK', 'VC', 'FBNSTP', 'Biomarker']
category_list = [item.lower() for item in CATEGORY_list]

for cat_upper, cat_lower in zip(CATEGORY_list, category_list):
    df = pd.read_csv(f'/content/MedSort/manual_prompt_tuning/week4/after_calibration/{cat_lower}_step2.tsv', sep='\t')
    df = df[['PMID', 'title', 'abstract', 'MeSH', cat_upper]].rename(columns={cat_upper: 'label'})
    output_path = f'{cat_lower}_aft.tsv'
    df.to_csv(output_path, sep='\t', index=False)
    print(f"Saved {output_path} with {len(df)} rows.")

Saved ct_aft.tsv with 2588 rows.
Saved pe_aft.tsv with 3244 rows.
Saved pk_aft.tsv with 2365 rows.
Saved vc_aft.tsv with 2117 rows.
Saved fbnstp_aft.tsv with 2379 rows.
Saved biomarker_aft.tsv with 2550 rows.
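
The row counts show that calibration removed some records (for example, CT goes from 2625 to 2588 rows), but they do not show how many labels were corrected. Below is a minimal sketch of one way to summarize the difference. It uses only the standard library, assumes PMIDs are unique within each file, and the helper names `load_labels` and `summarize_calibration` are hypothetical.

```python
import csv


def load_labels(path):
    """Read a TSV written by the cells above into a {PMID: label} dict."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["PMID"]: row["label"] for row in reader}


def summarize_calibration(before, after):
    """Count rows removed, rows added, and labels changed between two label dicts."""
    shared = before.keys() & after.keys()
    return {
        "removed": len(before.keys() - after.keys()),
        "added": len(after.keys() - before.keys()),
        "relabeled": sum(1 for pmid in shared if before[pmid] != after[pmid]),
    }


# Example usage (assumes the *_bef.tsv and *_aft.tsv files exist):
# print(summarize_calibration(load_labels("ct_bef.tsv"), load_labels("ct_aft.tsv")))
```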


## Prompts Preparation
Each category defines a `_questionnaire` prompt template and an optional `_shots` string of few-shot examples.
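
The questionnaire templates contain `{title}`, `{abstract}`, and `{mesh}` placeholders alongside literal braces in the return-format example, so calling `str.format` on them would fail on the literal braces. The sketch below fills only the three known placeholders in a single pass. The function name `fill_prompt` is hypothetical, and appending the shots after the questionnaire is an assumption about how the two parts are combined.

```python
import re


def fill_prompt(questionnaire, title, abstract, mesh, shots=None):
    """Substitute {title}, {abstract}, {mesh} in one pass; leave other braces untouched."""
    values = {"title": title, "abstract": abstract, "mesh": mesh}
    prompt = re.sub(
        r"\{(title|abstract|mesh)\}",
        # A function replacement avoids backslash processing in the inserted text.
        lambda match: str(values[match.group(1)]),
        questionnaire,
    )
    # Assumption: few-shot examples, when present, are appended after the questionnaire.
    if shots:
        prompt = prompt + "\n" + shots
    return prompt


# Example usage with one DataFrame row (column names as saved in the dataset cells):
# prompt = fill_prompt(ct_questionnaire, row["title"], row["abstract"], row["MeSH"], ct_shots)
```

Because the substitution is a single regular-expression pass, placeholder-like text inside a title or abstract is not substituted a second time.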

#### CT
No shots are needed in v25.4.25.

In [17]:
ct_questionnaire = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

Please answer the following two questions with True or False:

**Question 1:** Should this article be included based on my inclusion or exclusion criteria?
The paper should be excluded if it meets **any** exclusion criteria or if it fails to meet **either** inclusion criterion.
**Inclusion criteria:**
A. Research participants should be human maternal, pediatric, or adolescent patients, or the abstract should directly measure tissue samples from maternal/pediatric patients.
We define maternal patients as:
1) Patients in stages between the onset of pregnancy to the end of lactation or pregnancy termination; or
2) Patients undergoing IVF treatment or other infertility treatments.
We define pediatric patients as patients under 18 years old.

**Exclusion criteria:**
A. The paper is a review or meta-analysis.
B. The paper focuses exclusively on applying drugs to tissues/cells isolated from the human body.
C. The paper focuses exclusively on animal studies or samples and does not involve humans.
D. The paper is a health system usability or health policy study on medication usage.

**Question 2:** Does this study fall under the definition of a vaccine or cell therapy study?
Does this study involve vaccines or cell therapy?
This includes not only conventional cell therapy but also any therapeutic procedure that uses human cells for therapeutic purposes, excluding transplantation. Additionally, epidemiological studies examining the use, prescription patterns, or frequency of vaccines or cell therapy in maternal or pediatric populations are also included under this category.
**Note:**
1) Procedures such as stem cell rescue and the infusion of viable cells into the human body (including components like red blood cells or platelet transfusion) may be broadly considered forms of cell therapy, as they involve the therapeutic use of living cells. Therefore, plasma transfusion is considered as a kind of cell therapy.
2) If the study includes manipulation of sperm, oocytes, or fertilized embryos (e.g., in vitro fertilization or intracytoplasmic sperm injection), it may also be considered a form of cell therapy, as it involves the therapeutic use of human cells outside the body to achieve a medical outcome.

**Return Format Example:**
{
  "Question 1": True or False,
  "Question 2": True or False
}
"""

ct_shots = None

# ct_shots = """
# To help you understand how to answer, I will provide a few examples.

# **Example 1:**

# Title: Economic value to parents of reducing the pain and emotional distress of childhood vaccine injections.
# Abstract: One Reason That Recommended Childhood Immunizations Due At Child Health Visits Are Deferred Is To Avoid The Pain And Emotional Distress Associated With The Increasing Number Of Injections Required. This Deferral Leads To Additional Visits And Costs And Reduced Immunoprotection Against Vaccine-Preventable Illnesses. To Assess The Economic Value Of Combination Vaccines That Address This Problem, We Surveyed Parents To Determine The Amount They Would Be Willing To Pay To Avoid The Pain And Emotional Distress Experienced By Their Infants From Injections. A Self-Administered Questionnaire Was Completed Within 24 H Of The Vaccinations By 294 Parents Of Children Ages 11/2 To 7 Months Receiving Vaccine Injections At 26 Outpatient Child Health Centers. The Willingness-To-Pay (Wtp) Method Was Used To Estimate The Intangible Cost Of The Pain And Emotional Distress Of The 1 To 4 Injections Their Child Had Received. Parents Were Asked How Much Of Their Own Money They Would Have Paid To Avoid These Injections, Without Any Compromise In The Safety And Efficacy Of The Vaccinations. Wide Variations In Wtp Amounts Were Observed, Ranging From Median Values Of $10 To $25 And Average Values Of $57.06 To $79.28 To Avoid The Pain And Emotional Distress Associated With Eliminating All Injections At Visits In Which One To Four Injections Were Administered. Parents Placed Greater Value On Reductions That Avoided All Injections Than On Reductions That Avoided Only Some Injections. Overall The Median Cost Per Injection Avoided Was $8.14, And The Mean Was $30.28. Parents Have Strong Preferences For Limiting Vaccine Injections. The Economic Cost Of The Pain And Distress Associated With Such Injections, Reflected In The Amounts They Report They Would Be Willing To Pay To Avoid Them, Represents A Substantial Component Of The Cost Of Disease Control Through Immunization.
# MeSH Terms:  ||| Adult ||| Cost-Benefit Analysis ||| Female ||| Health Care Surveys ||| Humans ||| Immunization Schedule ||| Infant ||| Injections ||| Male ||| Pain ||| Parents ||| Stress, Psychological ||| Surveys And Questionnaires ||| Vaccination ||| Vaccines, Combined/Adverse Effects/Economics/Economics/Etiology/Economics/Etiology/Adverse Effects/Economics/Psychology/Administration & Dosage/Economics

# Analysis:
# Question 1: This study should be included because it involves pediatric patients and does not meet any exclusion criteria.
# Question 2: However, it should not be classified as a vaccine or cell therapy study, as it focuses on cost-benefit analysis related to parental preferences rather than evaluating the use, efficacy, or frequency of vaccines or cell therapies.

# Answer:
# {
#   "Question 1": True,
#   "Question 2": False
# }

# **Example 2:**

# Title: Graft rejection and hyperacute graft-versus-host disease in stem cell transplantation from non-inherited maternal-antigen-complementary HLA-mismatched siblings.
# Abstract: Human Leukocyte Antigen (Hla)-Mismatched Stem Cell Transplantation From Non-Inherited Maternal Antigen (Nima)-Complementary Donors Is Known To Produce Stable Engraftment Without Inducing Severe Graft-Versus-Host Disease (Gvhd). We Treated Two Patients With Acute Myeloid Leukemia (Aml) And One Patient With Severe Aplastic Anemia (Saa) With Hla-Mismatched Stem Cell Transplantation (Sct) From Nima-Complementary Donors (Nima-Mismatched Sct). The Presence Of Donor And Recipient-Derived Blood Cells In The Peripheral Blood Of Recipient (Donor Microchimerism) And Donor Was Documented Respectively By Amplifying Nima-Derived Dna In Two Of The Three Patients. Graft Rejection Occurred In The Saa Patient Who Was Conditioned With A Fludarabine-Based Regimen. Grade Iii And Grade Iv Acute Gvhd Developed In Patients With Aml On Day 8 And Day 11 Respectively, And Became A Direct Cause Of Death In One Patient. The Findings Suggest That Intensive Conditioning And Immunosuppression After Stem Cell Transplantation Are Needed In Nima-Mismatched Sct Even If Donor And Recipient Microchimerisms Is Detectable In The Donor And Recipient Before Sct.
# MeSH Terms:   ||| Acute Disease ||| Adolescent ||| Adult ||| Anemia, Aplastic ||| Blast Crisis ||| Chimera ||| Cord Blood Stem Cell Transplantation ||| Disease Progression ||| Fatal Outcome ||| Female ||| Graft Rejection ||| Graft Vs Host Disease ||| Hla Antigens ||| Histocompatibility ||| Humans ||| Immunity, Maternally-Acquired ||| Isoantigens ||| Leukemia, Myelogenous, Chronic, Bcr-Abl Positive ||| Leukemia, Myeloid ||| Male ||| Peripheral Blood Stem Cell Transplantation ||| Precursor Cell Lymphoblastic Leukemia-Lymphoma ||| Remission Induction ||| Siblings ||| Tissue Donors ||| Transplantation Conditioning ||| Vidarabine ||| Immunology ||| Pathology ||| Surgery ||| Immunology ||| Pathology ||| Surgery ||| Genetics ||| Immunology ||| Genetics ||| Immunology ||| Genetics ||| Immunology ||| Prevention & Control ||| Genetics ||| Immunology ||| Immunology ||| Immunology ||| Pathology ||| Surgery ||| Drug Therapy ||| Pathology ||| Adverse Effects ||| Immunology ||| Pathology ||| Surgery ||| Methods ||| Administration & Dosage ||| Analogs & Derivatives

# Analysis:
# Question 1: This study should be excluded because it does not clearly include pediatric or adolescent patients, which is a requirement for inclusion. Although "adolescent" appears in the MeSH terms, the abstract does not specify whether any of the actual study participants fall within this group.
# Question 2: However, it should be classified as a cell therapy study, as it involves stem cell transplantation, which fits the definition of therapeutic use of human cells.

# Answer:
# {
#   "Question 1": False,
#   "Question 2": True
# }

# **Example 3:**

# Title: Effects of early maternal cancer and fertility treatment on the risk of adverse birth outcomes.
# Abstract: Early Maternal Cancer And Fertility Treatment Each Increase The Risk For Adverse Birth Outcomes, But The Joint Effect Of These Outcomes Has Not Yet Been Reported. Thus, The Aim Was To Assess The Individual And Joint Effect Of Maternal Cancer And Fertility Treatment On The Risk For Adverse Birth Outcomes. This Population-Based Cohort Study Included 5487 Live-Born Singletons Identified In The Danish Medical Birth Register (1994-2016) Of Mothers With Previous Cancer (<40 Years) Recorded In The Danish Cancer Registry (1955-2014). We Randomly Selected 80,262 Live-Born Singletons Of Mothers With No Cancer <40 Years Matched To Mothers With Cancer By Birth Year And Month. We Calculated Odds Ratios (Ors) For Preterm Birth, Low Birth Weight (Lbw) (<2500 G) And Small For Gestational Age (Sga), Mean Differences In Birth Weight In Grams, And Additional Cases Of Preterm Birth (Gestational Age<259 Days) Per 100,000 Person-Years. Multiplicative And Additive Interaction Of Maternal Cancer And Fertility Treatment Was Compared With Outcomes Of Children Conceived Naturally To Mothers With No Maternal Cancer (Reference Group). Among 84,332 Live-Born Singletons, Increased Ors For Preterm Birth Were Observed Among Children Born To Mothers With Previous Cancer (1·48, 95% Confidence Interval [Ci] 1·33-1.65) Or After Fertility Treatment (1·43, 95% 1·28-1-61), With 22 Additional Cases Of Preterm Birth Among Both Group Of Children (95% Ci 15-29; 95% Ci 14-30). In The Joint Analyses, The Or For Sga For Children Born After Fertility Treatment To Mothers With Previous Cancer Was Similar To That Of The Reference Group (Or 1·02, 95% Ci 0·72-1·44, P For Interaction=0·52). Children With Both Exposures Had Increased Ors For Lbw (1·86, 95% Ci 1·17-2·96, P For Interaction=0·06) And Preterm Birth (2·31, 955 Ci 1·66-3·20, P For Interaction = 0·56), With 61 Additional Cases Of Preterm Birth (95% Ci 27-95, P For Interaction=0.26) Over That Of Children In The Reference Group. The Mean Birth Weight Was Also Lower In Children Born To Mothers With Both Exposures (-140 G, 95% Ci -215; -65) (P For Interaction=0.06) But Decreased To -22 G (95% Ci -76; 31) After Adjustment For Ga. Although We Did Not Find Any Statistically Significant Additive Interaction Between Maternal Cancer And Fertility Treatment, Children Born After Fertility Treatment Of Mothers With Previous Cancer Were At Increased Risk For Adverse Birth Outcomes. Thus, Pregnant Women With Both Exposures Need Close Follow-Up During Pregnancy. The Danish Cancer Society And The Danish Childhood Cancer Foundation.
# MeSH Terms: No MeSH Term.

# Analysis:
# Question 1: This study should be included because it involves maternal patients who underwent fertility treatment and assesses outcomes in their children, satisfying the inclusion criteria. It does not meet any exclusion criteria.
# Question 2: However, it should not be classified as a vaccine or cell therapy study, as the specific methods of fertility treatment—such as IVF or embryo manipulation—are not detailed in the abstract.

# Answer:
# {
#   "Question 1": True,
#   "Question 2": False
# }

# **Example 4:**

# Title: Delayed presentation of severe combined immunodeficiency due to prolonged maternal T cell engraftment.
# Abstract: Severe Combined Immunodeficiency (Scid) Is A Primary Immunodeficiency Disorder With Heterogenous Genetic Etiologies. We Describe A Typical Case In A 9-Year-Old Boy That Was Masked By A Clinically Functional Maternal T Cell Engraftment Leading To Late Presentation With Pneumocystis Jiroveci Pneumonia And Cytomegalovirus Infection, Probably Following Exhaustion Of Maternally Engrafted Cells. Based On Immunological Findings, He Had A T- B+Scid Phenotype. This Report Suggests That In Rare Cases, Engrafted Maternal T Cell Might Persist For Long Time Leading To Partial Constitution Of Immune Function And Delayed Clinical Presentation Of Scid.
# MeSH Terms:  ||| Child ||| Cytomegalovirus Infections ||| Female ||| Humans ||| Lymphocyte Activation ||| Phenotype ||| Pneumocystis Carinii ||| Pneumonia, Pneumocystis ||| Pregnancy ||| Prenatal Exposure Delayed Effects ||| Severe Combined Immunodeficiency ||| T-Lymphocytes ||| Time Factors ||| X-Linked Combined Immunodeficiency Diseases ||| Immunology ||| Isolation & Purification ||| Diagnosis ||| Etiology ||| Microbiology ||| Diagnosis ||| Etiology ||| Genetics ||| Immunology ||| Diagnosis ||| Etiology ||| Genetics

# Analysis:
# Question 1: This study should be included because it involves a pediatric patient and does not meet any exclusion criteria.
# Question 2: However, it should not be classified as a cell therapy study, as the maternal T cells entered the fetus naturally through the placenta during pregnancy. This passive transfer is a biological phenomenon, not a therapeutic intervention involving the intentional use of human cells.
# Answer:
# {
#   "Question 1": True,
#   "Question 2": False
# }
# """
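
The return format in the questionnaire uses Python-style `True`/`False`, which is not valid JSON, so `json.loads` may reject a reply that follows the example literally. A minimal sketch of a tolerant parser follows. The names `parse_answers` and `final_label` are hypothetical, and treating a paper as positive only when both questions are `True` is an assumption about how the two answers map to the dataset's `label` column.

```python
import re

# Matches pairs such as "Question 1": True or Question 2: false (case-insensitive).
ANSWER_PATTERN = re.compile(
    r'"?Question\s*(\d+)"?\s*:\s*(true|false)\b', re.IGNORECASE
)


def parse_answers(reply):
    """Return {question_number: bool}; if a question repeats, the last match wins."""
    return {
        int(number): value.lower() == "true"
        for number, value in ANSWER_PATTERN.findall(reply)
    }


def final_label(answers):
    """Hypothetical rule: positive only if both questions are answered True."""
    return answers.get(1, False) and answers.get(2, False)


reply = '{\n  "Question 1": True,\n  "Question 2": False\n}'
print(parse_answers(reply), final_label(parse_answers(reply)))
```

A missing answer is treated as `False` here; a stricter workflow might instead flag such replies for manual review.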

#### PE

In [18]:
pe_questionnaire = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

Please answer the following two questions with True or False:

**Question 1:** Should this article be included based on my inclusion or exclusion criteria?
The paper should be excluded if it meets **any** exclusion criteria or if it fails to meet **either** inclusion criterion.
**Inclusion criteria:**
A. Research participants should be human maternal, pediatric, or adolescent patients, or the abstract should directly measure tissue samples from maternal/pediatric patients.
We define maternal patients as:
1) Patients in stages between the onset of pregnancy to the end of lactation or pregnancy termination;
2) Patients undergoing treatment for infertility or restoration of reproductive function (e.g., ovulation, menstruation), even if not currently pregnant; or
3) Mothers of dependent children, including postpartum mothers, particularly when the study examines maternal behaviors, psychological or physiological conditions, or substance use in the context of caregiving or child outcomes.
We define pediatric patients as patients under 18 years old.
A study may be included if any subgroup of participants meets the maternal or pediatric definitions, even if the study also includes other populations.

**Exclusion criteria:**
A. The paper is a review or meta-analysis.
B. The paper focuses exclusively on applying drugs to tissues/cells isolated from the human body.
C. The paper focuses exclusively on animal studies or samples and does not involve humans.
D. The paper is a health system usability or health policy study on medication usage.

**Question 2:** Does this paper primarily or partly study the drug use, drug effectiveness, or drug safety of maternal or pediatric patients?

**A drug** is defined as a substance that is **administered or used by participants** to **prevent, treat, or cure diseases or medical conditions**. This definition includes:
- **Prescribed medications**
- **Non-prescription drugs**
- **Biologics**, such as **interferons** and **fusion proteins**
- **Substances of abuse**, including **opioids**, **stimulants**, **cannabis**, etc.
- **Drugs used in medical procedures**, for example, **those used in the induction of labor**

**Note:** This definition **excludes vaccines**.

This question includes the following study types:
- Studies evaluating **drug effectiveness, safety, or outcomes** in maternal or pediatric patients;
- **Epidemiological surveys** on medication prescriptions in maternal or pediatric patients;
- Studies reporting **medication use frequency** in maternal or pediatric populations;
- Studies that make **direct reference to drug/substance use** in these populations.

This question **excludes**:
- Studies that only generate evidence from **in vitro (test tube/lab) experiments**;
- Studies that are already categorized as **clinical trials of drugs** (those are captured elsewhere).

The pharmacological evaluation must pertain to **maternal or pediatric populations** as defined earlier.

**Return Format Example:**
{
  "Question 1": True or False,
  "Question 2": True or False
}
"""

pe_shots = """
To help you understand how to answer, I will provide a few examples.

**Example 1:**

Title: Early Onset Sepsis.
Abstract: Early Onset Sepsis (Eos) Is A Worrisome, Life-Threatening Condition In Newborns With Onset During The First Week Of Life. Evaluation Can Be Challenging Due To The Dynamic Nature Of The Condition As The Infant Transitions To Life Ex-Utero. Symptoms/Signs Can Be Nonspecific, Thus, A High Index Of Suspicion Is Warranted For Subtle Changes In Condition Including Poor Feeding, Respiratory Distress, Or Decreased Activity. Common Risk Factors Include Chorioamnionitis, Maternal Fever, Group B Strep (Gbs) Colonization And Preterm Delivery. Despite Universal Screening And Intrapartum Antibiotic Prophylaxis (Iap), Gbs Remains The Most Frequent Cause Of Eos Followed By Escherichia Coli (E. Coli). While The Gold Standard For Diagnosis Remains A Positive Blood Culture, Lab Evaluation Frequently Involves Complete Blood Count (Cbc) With Differential, C-Reactive Protein (Crp), And Evaluation Of Spinal Fluid If The Infant Is Stable. Unfortunately, There Is Not A Lab Test That Is Rapidly Diagnostic For Sepsis, So Treatment Should Be Empirically Started Until It Is Clear That The Infant Is Not Infected. Treatment Often Includes Ampicillin And Gentamicin For Coverage Of The Most Frequent Pathogens. There Is Much Debate About Timing Of Discontinuation Of Antibiotics. Frequently, Antibiotics Can Be Discontinued After 48 Hours In Well Appearing, Asymptomatic Infants With Negative Blood Cultures And Either Normal Cbc Analysis Or Normal Crp Values.
MeSH Terms: ||| Antibiotic Prophylaxis ||| Global Health ||| Humans ||| Incidence ||| Infant, Newborn ||| Risk Factors ||| Sepsis ||| Methods ||| Epidemiology ||| Etiology ||| Prevention & Control

Analysis:
This article should not be included because it is a review article. The abstract summarizes and discusses early onset sepsis (EOS) in newborns, including clinical presentation, risk factors, diagnostic methods, and treatment strategies, but does not present original research, specific study participants, or data collection. This meets exclusion criterion A: The paper is a review or meta-analysis.

Answer:
{
  "Question 1": False,
  "Question 2": False
}

**Example 2:**

Title: Maternal substance abuse: protecting the child, preserving the family.
Abstract: Maternal Substance Abuse Has Sparked An Intense Debate Among Child Welfare Practitioners And Policymakers. At Issue Is How Best To Work Effectively With Substance-Abusing Mothers While Protecting Children From Harm That Might Result From The Parents' Addiction And Preserving A Family In Which The Child Can Be Reared. This Article Evaluates The Scope Of The Problem, Its Impact On The Delivery Of Child Welfare Services, And Implications For Service Provision.
MeSH Terms:   ||| Child ||| Child Welfare ||| Family ||| Female ||| Humans ||| Maternal Behavior ||| Mothers ||| Substance-Related Disorders ||| Psychology ||| Psychology ||| Psychology

Analysis:
Although it discusses maternal substance abuse, it does not specify any particular drug or assess the use, effectiveness, or safety of a drug in maternal or pediatric populations.
The focus is more on mental health, psychological aspects, and child welfare policy, rather than on pharmacological outcomes or drug use patterns.
Therefore, it does not meet the inclusion criteria, and it does not qualify as a study of drug use in maternal or pediatric patients.

Answer:
{
  "Question 1": False,
  "Question 2": False
}

**Example 3:**

Title: [Antibiotic sensitivity of pneumonia pathogens in newborns and problems of antibacterial therapy of the pathologic process].
Abstract: The Results Of The Bacteriological Investigation Of The Secretion From The Trachea, Large Bronchi And Fauces Of 36 Newborns (Including 27 Preterms) With Severe Pneumonia Were Analyzed. 20 Of Them Were Born Of Women With Complicating Somatic, Obstetric And Gynecologic Histories: Candidiasis, Herpes Genitalis, Chronic Endometritis, Adnexitis Or Chronic Pyelonephritis That Could Be The Risk Of The Fetus Intranatal Infection. During The Acute Period Of Pneumonia In The Newborns Within The First 4-8 Days Of Life Mainly Pseudomonas Aeruginosa Was Isolated (51.3 Per Cent), Staphylococcus Epidermidis, S. Haemolyticus And Enterococcus Faecalis Were Less Frequent (18.9, 8.1 And 5.4 Per Cent, Respectively). Klebsiella Pneumoniae, Streptococcus Anhaemolyticus And Other Organisms Were Extremely Rare. On The Whole The Gramnegative Microflora Predominated. The Study Of The Antibiotic Susceptibility Showed That The Majority Of The P. Aeruginosa Isolates Were Susceptible To Amikacin And Polymyxin B, The Isolates Susceptible To Ceftazidime Were Less Frequent, 20-25 Per Cent Of The Isolates Were Susceptible To Ciprofloxacin, Cefoperazone And Imipenem And Practically No Isolates Were Susceptible To Gentamicin. The S.Epidermidis Isolates Were Susceptible To Rifampicin And Vancomycin And In Rare Cases To Fusidin And Amikacin And Resistant To Oxacillin. When The Treatment Course Was More Than 15 Days, The Isolates Proved To Be Susceptible To 1/3 Of The Presently Available Antibiotics. Because Of The Host Low Protective Forces, Peculiarities Of The Infection Pathways And High Frequency Of The Resistant Strains It Is Valid To Include Netilmicin, Imipenem, Cefoperazone And Ceftriaxone To The Complex Therapy Of The Newborns Along With The Substitution Immunotherapy.
MeSH Terms: ||| Female ||| Humans ||| Infant, Newborn ||| Microbial Sensitivity Tests ||| Pneumonia, Bacterial ||| Respiratory System ||| Species Specificity ||| Drug Therapy ||| Microbiology ||| Drug Effects ||| Metabolism ||| Microbiology

Analysis:
Question 1: Should this article be included?
False — because the study is exclusively in vitro (testing antibiotic sensitivity of pathogens isolated from newborns), it fails to meet the inclusion criterion that requires studying human participants or their tissue samples in a way that links to treatment or outcomes.
It also meets Exclusion Criterion B: "The paper focuses exclusively on applying drugs to tissues/cells isolated from the human body."
Question 2: Does it study drug use, effectiveness, or safety in maternal/pediatric patients?
False — even though it deals with antibiotics, the study does not involve actual administration to patients, nor does it assess drug safety, use, or clinical effectiveness in those patients.

Answer:
{
  "Question 1": False,
  "Question 2": False
}
"""

#### PK

In [19]:
pk_questionnaire = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

---

You are evaluating whether a medical article should be included in a dataset based on very specific inclusion and exclusion criteria.

Please read the article carefully and answer the following two questions **strictly following the rules below**.

---

## **Question 1: Should this article be included?**
**Return `True`** if the article meets **all inclusion criteria** and does **not meet any exclusion criteria**.
**Return `False`** otherwise.

### Inclusion Criteria:
The article **must meet at least one** of the following:

1. The study includes **human maternal, pediatric, or adolescent participants** (as defined below); **OR**
2. The study directly measures **tissue samples from maternal or pediatric patients**.

### Definitions:
- **Maternal patients** are defined as:
  - Individuals who are **pregnant**, **postpartum**, **lactating**, or undergoing **pregnancy termination**.
  - Patients undergoing **IVF** or **other infertility treatments**.
  - **Patients receiving treatment involving the uterus, ovaries, or fallopian tubes**, in a context consistent with **reproductive health or infertility**.
  - **Healthy non-postpartum women** who receive **hormonal treatment (e.g., prolactin)** to induce lactation, **or who develop drug-induced galactorrhea**, are considered **maternal** due to their pharmacologically induced lactation state.

- **Pediatric patients** are individuals **under 18 years old**.

---

### Exclusion Criteria:
The article should be excluded **if any** of the following are true:

- It is a **review article** or **meta-analysis**.
- It focuses **only on isolated tissues or cells**, not involving living human participants.
- It involves **only animal** models or samples.
- It is a **health system usability** or **policy** study not involving human biological outcomes.

---

## **Question 2: Does this paper involve pharmacokinetics (PK) in maternal or pediatric patients?**
**Return `True`** if the article involves pharmacokinetic (PK) or pharmacodynamic (PD) study of a drug in **maternal or pediatric patients** (as defined above).
**Return `False`** otherwise.

### A **PK/PD study** includes **any** of the following:

- Measuring drug or metabolite levels in biological fluids (e.g., blood, urine, breastmilk).
- Studying **absorption**, **distribution**, **metabolism**, or **excretion** (ADME).
- Performing PK/PD **modeling**.
- Developing **analytical methods** to quantify drugs/metabolites in humans.

> **Note:** Measuring drug concentration in human biological fluids after administration (even without full PK modeling) is sufficient to qualify.
> **Additional Note:** Studies evaluating changes in physiological or metabolic parameters (e.g., electrolytes, acid-base status, blood glucose) in response to drug treatment—such as insulin in DKA—should be considered pharmacodynamic (PD) studies, even if drug concentrations are not directly measured.

---

## Examples for Inclusion (maternal context):

- Women receiving **misoprostol** for medical abortion or labor induction.
- Women treated for **endometritis**, **adnexitis**, or **pelvic inflammatory disease**, if these involve fertility-related organs.
- Healthy women who **develop galactorrhea** or produce milk after **prolactin** or similar hormonal treatment.
- Non-pregnant women given **prolactin** for milk induction = maternal.

---

### Return Format:

{
  "Question 1": True or False,
  "Question 2": True or False
}


---

### VERY IMPORTANT:
Base your decision **only** on the information provided in the title, abstract, and MeSH terms.
**Do not assume** pregnancy, lactation, or infertility unless it is clearly implied or directly stated.
"""

pk_shots = """
To help you understand how to answer, I will provide a few examples.

**Example 1:**

Title: In vitro validation of a method for neonatal urine collection and analysis.
Abstract: Urine Collection And Analysis Is Important For Diagnosis, Monitoring Of Clinical Progress, And Research In Neonates. This Study Aims To Validate A Novel Methodology For Neonatal Urine Collection, Which Combines The Convenience Of Cotton Ball Collection With Accurate Timing Via A Urine Continence Monitor. Laboratory Model Using A Combined Cotton Ball And Urinary Incontinence Monitor Method With And Without The Presence Of An Impermeable Membrane To Prevent Desiccation. Accuracy, Bias And Precision In Measurement Of Urine Volume, Electrolytes (Sodium, Potassium, Chloride), Creatinine And Gentamicin. Changes In Analyte Concentration Over Time, And Evaporative Loss Of Water, Were Tested Using Analysis Of Variance. The Effects Of Time, Temperature And Humidity Were Explored Using Multivariate Analysis Of Variance. With The Use Of An Impermeable Membrane, Sodium Concentration Increased From A Mean (Sd) Of 3.57% (0.68) At 1 Min To 5.03% (0.74) At 120 Min. There Was No Significant Change In Potassium, Chloride Or Creatinine Concentrations. Gentamicin Concentration Decreased By A Mean (Sd) Of 9.05% (1.37) By 30 Min. Multivariate Analysis Found That Absolute Change In Weight, Sodium And Chloride Were Only Dependent On Duration. Gentamicin Concentration Was Affected By Duration, Humidity And Temperature. Relative Evaporative Loss Was Minimal At -0.58% (0.31), And The Urinary Continence Monitor Was 100% Successful At Detecting Urination For All Time Points. This Novel Methodology Provides A Standardisable And Practical Method To Collect Small Volumes Of Neonatal Urine For Accurate Measurement Of Both Urine Output And Analyte Concentrations.
MeSH Terms: No MeSH Term.

Analysis:
Question 1: Should this article be included? - True
Inclusion Criteria Met: The study is about neonates, who are clearly within the definition of pediatric patients (under 18 years old).
Although it is an in vitro validation, the method is specifically designed for urine collection in neonates—real human patients. The study is directly relevant to the care of living pediatric patients, so it satisfies the inclusion criteria.
Exclusion Criteria Not Met: It is not a review, does not involve only animals, and it is not solely a policy/usability study. While it uses a lab model, it relates to human biological outcomes, so it doesn't fall under the “isolated tissues only” exclusion.

Question 2: Does this paper involve pharmacokinetics (PK)? - False
While gentamicin concentration is measured, the purpose is not to study drug kinetics in human patients, but to assess how well the urine collection method preserves analyte stability over time in a simulated environment.
There is no administration of gentamicin to pediatric patients, and no analysis of absorption, distribution, metabolism, or excretion (ADME) in humans.
Therefore, this does not qualify as a PK/PD study under the given definitions.

Answer:
{
  "Question 1": True,
  "Question 2": False
}

**Example 2:**

Title: Modulation of S. epidermidis-induced innate immune responses in neonatal whole blood.
Abstract: Coagulase-Negative Staphylococci (Cons) Such As Staphylococcus Epidermidis Are Highly Prevalent Pathogens For Sepsis In Neonates. The Interaction Between Host, Environment And Pathogenic Factors Of S. Epidermidis Are Still Poorly Understood. Our Objective Was To Address The Role Of Several Pathogenic Factors Of S. Epidermidis On Neonatal Cytokine Responses And To Characterize The Influence Of Three Immunomodulatory Drugs. We Performed An Ex-Vivo Model Of S. Epidermidis Sepsis By Assessment Of Blood Cytokine Production In Neonatal Whole Blood Stimulation Assays (Elisa). S. Epidermidis Strains With Different Characteristics Were Added As Full Pathogen To Umbilical Cord Blood Cultures And The Influence Of Indomethacin, Ibuprofen And Furosemide On Neonatal Immune Response To S. Epidermidis Was Evaluated (Flow Cytometry). Stimulation With S. Epidermidis Sepsis Strains Induced Higher Il-6 And Il-10 Expression Than Stimulation With Colonization Strains. Biofilm Formation In Clinical Isolates Was Associated With Increased Il-10 But Not Il-6 Levels. In Contrast, Stimulation With Mutant Strains For Biofilm Formation And Extracellular Virulence Factors Had No Major Effect On Cytokine Expression. Notably, Addition Of Ibuprofen Or Indomethacin To S. Epidermidis Inoculated Whole Blood Resulted In Mildly Increased Expression Of Tnf-α But Not Il-6, While Frusemide Decreased The Production Of Pro-Inflammatory Cytokines, I.E. Il-6 And Il-8. The Virulence Of Sepsis Strains Is Coherent With Increased Cytokine Production In Our Whole-Blood In-Vitro Sepsis Model. Biofilm Formation And Expression Of Extracellular Virulence Factors Had No Major Influence On Readouts In Our Setting. It Is Important To Acknowledge That Several Drugs Used In Neonatal Care Have Immunomodulatory Potential.
MeSH Terms:  ||| Amidohydrolases ||| Bacterial Proteins ||| Cytokines ||| Humans ||| Immunity, Innate ||| Immunomodulation ||| Infant, Newborn ||| Interleukins ||| Sepsis ||| Staphylococcal Infections ||| Staphylococcus Epidermidis ||| Virulence Factors ||| Genetics ||| Genetics ||| Metabolism ||| Metabolism ||| Immunology ||| Microbiology ||| Immunology ||| Genetics ||| Immunology ||| Isolation & Purification ||| Immunology

Analysis:
Question 1: Should this article be included? - True
The study uses neonatal whole blood and umbilical cord blood, which qualifies under the inclusion criteria as a pediatric tissue sample.
Although it is conducted in vitro, it involves human-derived biological material from neonates, which is acceptable under the inclusion rule.
The study does not meet any exclusion criteria: it is not a review, does not involve only animals, and is related to human biological outcomes.

Question 2: Does this paper involve pharmacokinetics (PK)? - False
This study is entirely in vitro.
Drugs such as ibuprofen, indomethacin, and furosemide are added to blood samples outside the body, not administered to living pediatric patients.
There is no measurement of drug concentrations in human fluids, no analysis of absorption, distribution, metabolism, or excretion (ADME), and no pharmacokinetic or pharmacodynamic modeling.
According to the prompt’s definitions, in vitro drug exposure without drug level measurement in living patients does not qualify as a pharmacokinetic study.
The key point is that an in vitro model cannot be classified as a PK or PD study, even though drugs are involved.

Answer:
{
  "Question 1": True,
  "Question 2": False
}

**Example 3:**

Title: The Pharmacogenetics of Efavirenz Metabolism in Children: The Potential Genetic and Medical Contributions to Child Development in the Context of Long-Term ARV Treatment.
Abstract: Efavirenz (Efv) Is A Well-Known, Effective Anti-Retroviral Drug Long Used In First-Line Treatment For Children And Adults With Hiv And Hiv/Aids. Due To Its Narrow Window Of Effective Concentrations, Between 1 And 4 μg/mL, And Neurological Side Effects At Supratherapeutic Levels, Several Investigations Into The Pharmacokinetics Of The Drug And Its Genetic Underpinnings Have Been Carried Out, Primarily With Adult Samples. A Number Of Studies, However, Have Examined The Genetic Influences On The Metabolism Of Efv In Children. Their Primary Goal Has Been To Shed Light On Issues Of Appropriate Pediatric Dosing, As Well As The Manifestation Of Neurotoxic Effects Of Efv In Some Children. Although Efv Is Currently Being Phased Out Of Use For The Treatment Of Both Adults And Children, We Share This Line Of Research To Highlight An Important Aspect Of Medical Treatment That Is Relevant To Understanding The Development Of Children Diagnosed With Hiv.
MeSH Terms: ||| Alkynes ||| Anti-Hiv Agents ||| Benzoxazines ||| Child ||| Child Development ||| Child, Preschool ||| Cyclopropanes ||| Cytochrome P-450 Cyp2B6 ||| Hiv Infections ||| Humans ||| Pharmacogenetics ||| Reverse Transcriptase Inhibitors ||| Administration & Dosage ||| Metabolism ||| Toxicity ||| Administration & Dosage ||| Metabolism ||| Toxicity ||| Administration & Dosage ||| Metabolism ||| Toxicity ||| Drug Effects ||| Administration & Dosage ||| Metabolism ||| Toxicity ||| Genetics ||| Drug Therapy ||| Administration & Dosage ||| Metabolism ||| Toxicity

Analysis:
Question 1: Should this article be included? - False
Although the topic focuses on children and discusses efavirenz pharmacokinetics and pharmacogenetics, the abstract does not describe a new study or original data collection.
Instead, it summarizes the findings of a number of studies and notes that “we share this line of research” to highlight its relevance. This language strongly suggests it is a narrative or literature review, not primary research.
Therefore, the article meets an exclusion criterion: it is a review article, which disqualifies it regardless of the population or topic.

Question 2: Does this paper involve pharmacokinetics (PK)? - False
While the content clearly discusses PK in pediatric patients, the prompt specifies that reviews must be excluded from consideration, including for PK/PD classification.
Since this article is a review and not a primary PK study involving drug measurements in children, it does not qualify under the PK rule either.

Answer:
{
  "Question 1": False,
  "Question 2": False
}
"""

#### VC

In [20]:
vc_questionnaire = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

Please answer the following two questions with True or False:

**Question 1:** Should this article be included based on my inclusion or exclusion criteria?
The paper should be excluded if it meets **any** exclusion criterion or if it fails to meet the inclusion criterion.
**Inclusion criteria:**
A. Research participants should be human maternal, pediatric, or adolescent patients, or the abstract should directly measure tissue samples from maternal/pediatric patients.
We define maternal patients as:
1) Patients in stages between the onset of pregnancy to the end of lactation or pregnancy termination; or
2) Patients undergoing IVF treatment or other infertility treatments.
We define pediatric patients as patients under 18 years old.

**Exclusion criteria:**
A. The paper is a review or meta-analysis.
B. The paper focuses exclusively on applying drugs to tissues/cells isolated from the human body.
C. The paper focuses exclusively on animal studies or samples and does not involve humans.
D. The paper is a health system usability or health policy study on medication usage.

**Question 2:** Does this study fall under the definition of a vaccine or cell therapy study, that is, does it involve vaccines or cell therapy?
This includes not only conventional cell therapy but also any therapeutic procedure that uses human cells for therapeutic purposes, excluding transplantation. Additionally, epidemiological studies examining the use, prescription patterns, or frequency of vaccines or cell therapy in maternal or pediatric populations are also included under this category.
**Note:**
1) Procedures such as stem cell rescue and the infusion of viable cells into the human body (including red blood cell or platelet transfusion) may be broadly considered forms of cell therapy, as they involve the therapeutic use of living cells. Plasma transfusion is likewise considered a form of cell therapy.
2) If the study includes manipulation of sperm, oocytes, or fertilized embryos (e.g., in vitro fertilization or intracytoplasmic sperm injection), it may also be considered a form of cell therapy, as it involves the therapeutic use of human cells outside the body to achieve a medical outcome.

**Return Format Example:**
{
  "Question 1": True or False,
  "Question 2": True or False
}
"""

vc_shots = """
To help you understand how to answer, I will provide a few examples.

**Example 1:**

Title: Economic value to parents of reducing the pain and emotional distress of childhood vaccine injections.
Abstract: One Reason That Recommended Childhood Immunizations Due At Child Health Visits Are Deferred Is To Avoid The Pain And Emotional Distress Associated With The Increasing Number Of Injections Required. This Deferral Leads To Additional Visits And Costs And Reduced Immunoprotection Against Vaccine-Preventable Illnesses. To Assess The Economic Value Of Combination Vaccines That Address This Problem, We Surveyed Parents To Determine The Amount They Would Be Willing To Pay To Avoid The Pain And Emotional Distress Experienced By Their Infants From Injections. A Self-Administered Questionnaire Was Completed Within 24 H Of The Vaccinations By 294 Parents Of Children Ages 11/2 To 7 Months Receiving Vaccine Injections At 26 Outpatient Child Health Centers. The Willingness-To-Pay (Wtp) Method Was Used To Estimate The Intangible Cost Of The Pain And Emotional Distress Of The 1 To 4 Injections Their Child Had Received. Parents Were Asked How Much Of Their Own Money They Would Have Paid To Avoid These Injections, Without Any Compromise In The Safety And Efficacy Of The Vaccinations. Wide Variations In Wtp Amounts Were Observed, Ranging From Median Values Of $10 To $25 And Average Values Of $57.06 To $79.28 To Avoid The Pain And Emotional Distress Associated With Eliminating All Injections At Visits In Which One To Four Injections Were Administered. Parents Placed Greater Value On Reductions That Avoided All Injections Than On Reductions That Avoided Only Some Injections. Overall The Median Cost Per Injection Avoided Was $8.14, And The Mean Was $30.28. Parents Have Strong Preferences For Limiting Vaccine Injections. The Economic Cost Of The Pain And Distress Associated With Such Injections, Reflected In The Amounts They Report They Would Be Willing To Pay To Avoid Them, Represents A Substantial Component Of The Cost Of Disease Control Through Immunization.
MeSH Terms:  ||| Adult ||| Cost-Benefit Analysis ||| Female ||| Health Care Surveys ||| Humans ||| Immunization Schedule ||| Infant ||| Injections ||| Male ||| Pain ||| Parents ||| Stress, Psychological ||| Surveys And Questionnaires ||| Vaccination ||| Vaccines, Combined/Adverse Effects/Economics/Economics/Etiology/Economics/Etiology/Adverse Effects/Economics/Psychology/Administration & Dosage/Economics

Analysis:
Question 1: This study should be included because it involves pediatric patients and does not meet any exclusion criteria.
Question 2: However, it should not be classified as a vaccine or cell therapy study, as it focuses on cost-benefit analysis related to parental preferences rather than evaluating the use, efficacy, or frequency of vaccines or cell therapies.

Answer:
{
  "Question 1": True,
  "Question 2": False
}

**Example 2:**

Title: Graft rejection and hyperacute graft-versus-host disease in stem cell transplantation from non-inherited maternal-antigen-complementary HLA-mismatched siblings.
Abstract: Human Leukocyte Antigen (Hla)-Mismatched Stem Cell Transplantation From Non-Inherited Maternal Antigen (Nima)-Complementary Donors Is Known To Produce Stable Engraftment Without Inducing Severe Graft-Versus-Host Disease (Gvhd). We Treated Two Patients With Acute Myeloid Leukemia (Aml) And One Patient With Severe Aplastic Anemia (Saa) With Hla-Mismatched Stem Cell Transplantation (Sct) From Nima-Complementary Donors (Nima-Mismatched Sct). The Presence Of Donor And Recipient-Derived Blood Cells In The Peripheral Blood Of Recipient (Donor Microchimerism) And Donor Was Documented Respectively By Amplifying Nima-Derived Dna In Two Of The Three Patients. Graft Rejection Occurred In The Saa Patient Who Was Conditioned With A Fludarabine-Based Regimen. Grade Iii And Grade Iv Acute Gvhd Developed In Patients With Aml On Day 8 And Day 11 Respectively, And Became A Direct Cause Of Death In One Patient. The Findings Suggest That Intensive Conditioning And Immunosuppression After Stem Cell Transplantation Are Needed In Nima-Mismatched Sct Even If Donor And Recipient Microchimerisms Is Detectable In The Donor And Recipient Before Sct.
MeSH Terms:   ||| Acute Disease ||| Adolescent ||| Adult ||| Anemia, Aplastic ||| Blast Crisis ||| Chimera ||| Cord Blood Stem Cell Transplantation ||| Disease Progression ||| Fatal Outcome ||| Female ||| Graft Rejection ||| Graft Vs Host Disease ||| Hla Antigens ||| Histocompatibility ||| Humans ||| Immunity, Maternally-Acquired ||| Isoantigens ||| Leukemia, Myelogenous, Chronic, Bcr-Abl Positive ||| Leukemia, Myeloid ||| Male ||| Peripheral Blood Stem Cell Transplantation ||| Precursor Cell Lymphoblastic Leukemia-Lymphoma ||| Remission Induction ||| Siblings ||| Tissue Donors ||| Transplantation Conditioning ||| Vidarabine ||| Immunology ||| Pathology ||| Surgery ||| Immunology ||| Pathology ||| Surgery ||| Genetics ||| Immunology ||| Genetics ||| Immunology ||| Genetics ||| Immunology ||| Prevention & Control ||| Genetics ||| Immunology ||| Immunology ||| Immunology ||| Pathology ||| Surgery ||| Drug Therapy ||| Pathology ||| Adverse Effects ||| Immunology ||| Pathology ||| Surgery ||| Methods ||| Administration & Dosage ||| Analogs & Derivatives

Analysis:
Question 1: This study should be excluded because it does not clearly include pediatric or adolescent patients, which is a requirement for inclusion. Although "adolescent" appears in the MeSH terms, the abstract does not specify whether any of the actual study participants fall within this group.
Question 2: However, it should be classified as a cell therapy study, as it involves stem cell transplantation, which fits the definition of therapeutic use of human cells.

Answer:
{
  "Question 1": False,
  "Question 2": True
}

**Example 3:**

Title: Effects of early maternal cancer and fertility treatment on the risk of adverse birth outcomes.
Abstract: Early Maternal Cancer And Fertility Treatment Each Increase The Risk For Adverse Birth Outcomes, But The Joint Effect Of These Outcomes Has Not Yet Been Reported. Thus, The Aim Was To Assess The Individual And Joint Effect Of Maternal Cancer And Fertility Treatment On The Risk For Adverse Birth Outcomes. This Population-Based Cohort Study Included 5487 Live-Born Singletons Identified In The Danish Medical Birth Register (1994-2016) Of Mothers With Previous Cancer (<40 Years) Recorded In The Danish Cancer Registry (1955-2014). We Randomly Selected 80,262 Live-Born Singletons Of Mothers With No Cancer <40 Years Matched To Mothers With Cancer By Birth Year And Month. We Calculated Odds Ratios (Ors) For Preterm Birth, Low Birth Weight (Lbw) (<2500 G) And Small For Gestational Age (Sga), Mean Differences In Birth Weight In Grams, And Additional Cases Of Preterm Birth (Gestational Age<259 Days) Per 100,000 Person-Years. Multiplicative And Additive Interaction Of Maternal Cancer And Fertility Treatment Was Compared With Outcomes Of Children Conceived Naturally To Mothers With No Maternal Cancer (Reference Group). Among 84,332 Live-Born Singletons, Increased Ors For Preterm Birth Were Observed Among Children Born To Mothers With Previous Cancer (1·48, 95% Confidence Interval [Ci] 1·33-1.65) Or After Fertility Treatment (1·43, 95% 1·28-1-61), With 22 Additional Cases Of Preterm Birth Among Both Group Of Children (95% Ci 15-29; 95% Ci 14-30). In The Joint Analyses, The Or For Sga For Children Born After Fertility Treatment To Mothers With Previous Cancer Was Similar To That Of The Reference Group (Or 1·02, 95% Ci 0·72-1·44, P For Interaction=0·52). Children With Both Exposures Had Increased Ors For Lbw (1·86, 95% Ci 1·17-2·96, P For Interaction=0·06) And Preterm Birth (2·31, 955 Ci 1·66-3·20, P For Interaction = 0·56), With 61 Additional Cases Of Preterm Birth (95% Ci 27-95, P For Interaction=0.26) Over That Of Children In The Reference Group. 
The Mean Birth Weight Was Also Lower In Children Born To Mothers With Both Exposures (-140 G, 95% Ci -215; -65) (P For Interaction=0.06) But Decreased To -22 G (95% Ci -76; 31) After Adjustment For Ga. Although We Did Not Find Any Statistically Significant Additive Interaction Between Maternal Cancer And Fertility Treatment, Children Born After Fertility Treatment Of Mothers With Previous Cancer Were At Increased Risk For Adverse Birth Outcomes. Thus, Pregnant Women With Both Exposures Need Close Follow-Up During Pregnancy. The Danish Cancer Society And The Danish Childhood Cancer Foundation.
MeSH Terms: No MeSH Term.

Analysis:
Question 1: This study should be included because it involves maternal patients who underwent fertility treatment and assesses outcomes in their children, satisfying the inclusion criteria. It does not meet any exclusion criteria.
Question 2: However, it should not be classified as a vaccine or cell therapy study, as the specific methods of fertility treatment—such as IVF or embryo manipulation—are not detailed in the abstract.

Answer:
{
  "Question 1": True,
  "Question 2": False
}

**Example 4:**

Title: Delayed presentation of severe combined immunodeficiency due to prolonged maternal T cell engraftment.
Abstract: Severe Combined Immunodeficiency (Scid) Is A Primary Immunodeficiency Disorder With Heterogenous Genetic Etiologies. We Describe A Typical Case In A 9-Year-Old Boy That Was Masked By A Clinically Functional Maternal T Cell Engraftment Leading To Late Presentation With Pneumocystis Jiroveci Pneumonia And Cytomegalovirus Infection, Probably Following Exhaustion Of Maternally Engrafted Cells. Based On Immunological Findings, He Had A T- B+Scid Phenotype. This Report Suggests That In Rare Cases, Engrafted Maternal T Cell Might Persist For Long Time Leading To Partial Constitution Of Immune Function And Delayed Clinical Presentation Of Scid.
MeSH Terms:  ||| Child ||| Cytomegalovirus Infections ||| Female ||| Humans ||| Lymphocyte Activation ||| Phenotype ||| Pneumocystis Carinii ||| Pneumonia, Pneumocystis ||| Pregnancy ||| Prenatal Exposure Delayed Effects ||| Severe Combined Immunodeficiency ||| T-Lymphocytes ||| Time Factors ||| X-Linked Combined Immunodeficiency Diseases ||| Immunology ||| Isolation & Purification ||| Diagnosis ||| Etiology ||| Microbiology ||| Diagnosis ||| Etiology ||| Genetics ||| Immunology ||| Diagnosis ||| Etiology ||| Genetics

Analysis:
Question 1: This study should be included because it involves a pediatric patient and does not meet any exclusion criteria.
Question 2: However, it should not be classified as a cell therapy study, as the maternal T cells entered the fetus naturally through the placenta during pregnancy. This passive transfer is a biological phenomenon, not a therapeutic intervention involving the intentional use of human cells.

Answer:
{
  "Question 1": True,
  "Question 2": False
}
"""

#### FBNSTP

In [21]:
fbnstp_questionnaire = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

Please answer the following two questions with True or False:

**Question 1:** Should this article be included based on my inclusion or exclusion criteria?
The paper should be excluded if it meets **any** exclusion criterion or if it meets **neither** inclusion criterion.
**Inclusion criteria:**
A. Research participants should be human maternal or pediatric patients, or the abstract should directly measure tissue samples from maternal/pediatric patients.
We define maternal patients as:
1) Patients in stages between the onset of pregnancy to the end of lactation or pregnancy termination; or
2) Patients undergoing IVF treatment or other infertility treatments.
We define pediatric patients as patients under 18 years old.
B. Alternatively, the study evaluates exposures that occurred during fetal or pediatric stages, even if the health outcomes are measured in adulthood.
**Note:** A study that investigates exposures that occurred *in utero* or during childhood (pediatric stage) may therefore meet the inclusion criteria **even if the study population is now composed of adults**. The key consideration is whether the exposure of interest happened during the fetal or pediatric period.

**Exclusion criteria:**
A. The paper involves diseases caused by RNA or DNA mutations.
B. The paper is a review or meta-analysis.
Tip: If the abstract uses language such as “this article explores,” “we review,” “current evidence-based recommendations,” or similar phrasing — and it does not describe original data collection or a defined study population — the paper is likely a review and should be excluded.
C. The paper focuses exclusively on applying drugs to tissues/cells isolated from the human body.
D. The paper focuses exclusively on animal studies or samples and does not involve humans.
E. The paper is a health system usability or health policy study on medication usage.

**Question 2:** Does this study fall under the definition of an FBNSTP study?
FBNSTP includes studies that evaluate the effect of the following factors on maternal or pediatric populations (or their tissue samples), either as a primary or secondary objective:
A. Specific **food** ingredients
B. **Breastfeeding** (only if there is a comparator group, such as formula)
C. Specific **nutrition** elements (e.g., calcium, vitamins, TPN, etc.)
D. **Smoking**
E. Environmental **toxins**
F. **Pollutants**
G. **Oxygen exposure**

**Return Format Example:**
{
"Question 1": True or False,
"Question 2": True or False
}
"""

fbnstp_shots = """
To help you understand how to answer, I will provide a few examples.

**Example 1:**

Title: Maternal and offspring xenobiotic metabolism haplotypes and the risk of childhood acute lymphoblastic leukemia.
Abstract: Discovering Genetic Predictors Of Childhood Acute Lymphoblastic Leukemia (All) Necessitates The Evaluation Of Novel Factors Including Maternal Genetic Effects, Which Are A Proxy For The Intrauterine Environment, And Robust Epidemiologic Study Designs. Therefore, We Evaluated Five Maternal And Offspring Xenobiotic Metabolism Haplotypes And The Risk Of Childhood All Among 120 Case-Parent Triads. Two Of The Five Haplotypes Were Significantly Associated With Risk: Gstm3/Gstm4 (P=0.01) And Gstp1 (P=0.02). The Ephx1 Haplotype Was Marginally Associated With Risk (P=0.05), Whereas Haplotypes In Cyp1B1 And Gsta4 Were Not. Our Results Suggest Genetic Variation In Xenobiotic Metabolism Is Important In Childhood All Etiology.
MeSH Terms:  ||| Adolescent ||| Adult ||| Aryl Hydrocarbon Hydroxylases ||| Child ||| Child, Preschool ||| Cytochrome P-450 Cyp1B1 ||| Epoxide Hydrolases ||| Female ||| Genetic Variation ||| Glutathione Transferase ||| Haplotypes ||| Humans ||| Infant ||| Infant, Newborn ||| Male ||| Maternal-Fetal Exchange ||| Precursor Cell Lymphoblastic Leukemia-Lymphoma ||| Pregnancy ||| Retrospective Studies ||| Xenobiotics ||| Genetics ||| Metabolism ||| Genetics ||| Metabolism ||| Genetics ||| Metabolism ||| Genetics ||| Enzymology ||| Genetics ||| Metabolism

Analysis: This study investigates the association between maternal and offspring xenobiotic metabolism haplotypes and the risk of childhood acute lymphoblastic leukemia (ALL) using a case-parent triad design. The population includes children (pediatric patients) and their mothers, which meets Inclusion Criterion A. However, the study’s primary objective is to evaluate how genetic variation (haplotypes) in xenobiotic metabolism genes influences ALL risk. Because the disease in focus (ALL) is being studied through DNA-level genetic mutations, the study meets Exclusion Criterion A: diseases caused by RNA or DNA mutations, and should therefore be excluded.
Regarding FBNSTP classification, the study explores xenobiotic metabolism, which refers to the body’s ability to process and eliminate foreign chemical substances. These are often environmental toxins or pollutants. Therefore, this study falls under FBNSTP Category E (Environmental toxins) or F (Pollutants) because it assesses how detoxification pathways modulate disease risk, even though the exposure is inferred genetically.

Answer:
{
"Question 1": False,
"Question 2": True
}

**Example 2:**

Title: MAGAM II - prospective observational multicentre poisons centres study on eye exposures caused by cleaning products.
Abstract: Objective: Local Effects On The Eye Following Cleaning Product Exposures Are Frequently Reported. According To Eu Chemicals Legislation Many Cleaning Products Are Labelled With Hazard Phrase 318 Indicating Risk Of Irreversible Eye Damage. The Objectives Of This Study Were To Identify Cleaning Products With Potential For Irreversible Eye Damage By Collecting Human Exposure Data From Poisons Centres (Pc), And To Clarify To What Degree Exact Product Identification Is Possible During A Pc Telephone Call. Methods: Magam Ii Was A Multicentre Binational Prospective Observational Pc Study. All Human Eye Exposures To Detergents Or Maintenance Products Reported To Nine Pcs Taking Calls From The Public And Medical Professionals During An 18-Month Period Were Included. The Severity Of Eye Effects Was Rated According To The Who Poisoning Severity Score. Results: Five Hundred And Eighty-Six Cases Were Included. Product Identification By Name Leading To Formula Information Was Successful In 533 Cases (91%). Follow-Up Was Successful In 528 Exposures. Irrigation Was Performed In 94% Of Cases. Duration Of Symptoms Was 鈮?4\u2009Hours In 73 Patients (25%). 33 (6%) Patients Developed Moderate Eye Injury. Healing Was Reported In All Cases. The Percentage Of Moderate Cases Was Highest In The Group Of Drain Cleaners (25%), Toilet Cleaners (18%) And Oven Cleaners (15%). Products Intended For Professional Use Caused Relatively More Moderate Eye Injuries Than Products Also Intended For Consumer Use. Conclusion: Magam Ii Has Shown That Pcs Are Able To Identify Formulas In Sufficiently High Quality As Needed For Product-Directed Toxicovigilance. The Results Underline The Potential Of Pc Exposure Case Data For Product Safety Monitoring. The Results Indicate That Irreversible Eye Damage Is Very Rare After Cleaning Product Exposure.
MeSH Terms:   ||| Adolescent ||| Adult ||| Age Factors ||| Aged ||| Aged, 80 And Over ||| Child ||| Child, Preschool ||| Detergents ||| Eye Injuries ||| Female ||| Germany ||| Humans ||| Infant ||| Injury Severity Score ||| Male ||| Middle Aged ||| Poison Control Centers ||| Prospective Studies ||| Time Factors ||| Young Adult ||| Toxicity ||| Chemically Induced ||| Epidemiology ||| Epidemiology ||| Statistics & Numerical Data

Analysis:
This prospective observational study includes human participants of all ages, including children, and investigates eye injuries from exposure to household cleaning products. Since pediatric patients are part of the study population, and none of the exclusion criteria (e.g., genetic disease, animal-only studies, reviews) are met, it satisfies the inclusion criteria. Therefore, Question 1 is True.
Although the study involves chemical exposures, it does not assess the health effects of any specific environmental toxin or pollutant, nor does it analyze the impact on maternal or pediatric populations in a targeted way. It focuses on injury reporting and product identification. Therefore, Question 2 is False.

Answer:
{
"Question 1": True,
"Question 2": False
}

**Example 3:**

Title: Acute renal failure in neonates.
Abstract: Acute Renal Failure (Arf) Is A Common Condition Seen In Neonatal Intensive Care Units. It Is Broadly Classified Into Prerenal, Intrinsic Renal And Post Renal Failure. There Is No Consensus On The Definition Of Neonatal Arf. Of Utmost Importance Is To Differentiate Prerenal From Intrinsic Renal Failure. The Most Common Causes Of Neonatal Arf Are Hypovolemia, Hypotension And, Hypoxia. Among Several Indices That Are Available For Differentiating Prerenal Failure From Intrinsic Renal Failure, Fractional Excretion Of Sodium Is The Preferred Index. Diagnostic Fluid Challenge With Or Without Frusemide Is A Bed Side Method For Differentiating Prerenal Failure From Intrinsic Renal Failure. Babies With Arf Have To Be Monitored For Several Metabolic Derangements Like Hyponatremia, Hyperkalemia, Hypocalcemia, And Acidosis And Have To Be Managed Accordingly. Fluid Balance Should Be Precise In Order To Avoid Fluid Overload. It Is Difficult To Provide Adequate Calories Due To Fluid Restriction. Dialysis Has To Be Instituted To Preempt Complications. Peritoneal Dialysis Is The Easiest And Safest Modality. These Babies Need Long Term Follow Up As They Are Prone For Long Term Complications.
MeSH Terms:   ||| Acute Kidney Injury ||| Combined Modality Therapy ||| Drug Therapy, Combination ||| Female ||| Fluid Therapy ||| Glomerular Filtration Rate ||| Humans ||| Incidence ||| Infant, Newborn ||| Kidney Function Tests ||| Male ||| Prognosis ||| Renal Dialysis ||| Risk Assessment ||| Severity Of Illness Index ||| Survival Rate ||| Treatment Outcome ||| Water-Electrolyte Imbalance ||| Diagnosis ||| Epidemiology ||| Therapy ||| Methods ||| Methods ||| Diagnosis ||| Therapy

Analysis:
This study examines acute renal failure in neonates, focusing on causes, diagnosis, and management in a clearly defined pediatric population (newborns). It includes original data and does not meet any exclusion criteria like genetic disease focus or being a review. Therefore, Question 1 is True.
While the article references factors like hypoxia, hypotension, and hypovolemia, these are discussed as clinical conditions or diagnostic indicators, not as environmental exposures under investigation. They are not evaluated as FBNSTP-type exposures (e.g., oxygen as an external exposure variable). Therefore, Question 2 is False.

Answer:
{
"Question 1": False,
"Question 2": True
}
"""

#### Biomarker

In [22]:
biomarker_questionnaire = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

Please answer the following three questions with True or False:

**Question 1:** Should this article be included based on my inclusion and exclusion criteria?
The paper should be excluded if it meets **any** exclusion criterion or if it fails to meet **either** inclusion criterion.
**Inclusion criteria:**
A. Research participants should be human maternal or pediatric patients, or the abstract should directly measure tissue samples from maternal/pediatric patients.
We define maternal patients as:
1) Patients in stages between the onset of pregnancy to the end of lactation or pregnancy termination; or
2) Patients undergoing IVF treatment or other infertility treatments.
We define pediatric patients as patients under 18 years old.
B. The paper should study endogenous/exogenous compounds, vaccines, cell therapy, protein/peptide, specific nutritional elements, specific food ingredients, smoking, breast milk with no drug or drug metabolite, environmental toxins, pollutants, or oxygen, or compounds/cell-type biomarkers. Alternatively, the paper should focus on drug/metabolite assay development, epidemiological surveys on medication prescription or usage frequencies, comparisons of breastfeeding and formula feeding, or cells as biomarkers of disease diagnosis, prognosis, or progression.

**Exclusion criteria:**
A. The paper is a review or meta-analysis.
B. The paper focuses exclusively on applying drugs to tissues/cells isolated from the human body.
C. The paper focuses exclusively on animal studies or samples and does not involve humans.
D. The paper is a health system usability or health policy study on medication usage.

**Question 2:** Does this study involve any chemical substances used for **disease diagnosis, prognosis, or progression**?
**Inclusion criteria:**
A. The substance could be any ions, endogenous or exogenous compounds, or cells.
**Exclusion criteria:**
A. The substance must not be viruses, bacteria, mRNA, miRNA, DNA methylation, DNA mutations, or other epigenetic biomarkers.

**Question 3:** If a substance mentioned in Question 2 exists, is it in response to any drugs (substances used to prevent, treat, or cure diseases or medical conditions), vaccines, cell therapy, or FBNSTP?
Specifically, if the substance is the primary hypothesis and the drug is considered a secondary hypothesis as a treatment factor, you must return False for Question 3!

**Return Format Example:**
{
  "Question 1": True or False,
  "Question 2": True or False,
  "Question 3": True or False
}
"""

biomarker_shots = """
To help you understand how to answer, I will provide a few examples.

**Example 1:**

Title: [Peripheral primitive neuroectodermal tumors of the soft tissues and bones].
Abstract: Large Group Of Small-Round-Cell Tumours Of Soft Tissues And Bone Represents A Complex Diagnostic Problem For The Pathologists. Neuronal Nature Of Many Tumours From This Group Is Proven By Means Of New Methods--Immunophenotypic Analysis, Tissue Culture, Cytogenetics. Peripheral Neuroepithelioma, Ewing Tumour, Primitive Neuroectodermal Tumour (Pnet), Askin Tumour Belong To These Neoplasms. These Tumours Anatomically Have No Connection With The Structures Of The Central Nervous System Or Autonomous Sympathetic Nervous System.
MeSH Terms:  ||| Adolescent ||| Bone Neoplasms ||| Child ||| Diagnosis, Differential ||| Humans ||| Neuroectodermal Tumors, Primitive ||| Sarcoma, Ewing ||| Sarcoma, Small Cell ||| Soft Tissue Neoplasms/Pathology/Pathology/Pathology/Pathology/Pathology

Analysis:
Although the abstract mentions the use of methods like immunophenotypic analysis, tissue culture, and cytogenetics, it does not identify any specific chemical substances, such as ions, proteins, or compounds, that are used as biomarkers for disease diagnosis, prognosis, or progression. These methods are broad diagnostic tools, and without naming or focusing on a particular substance, they do not meet the inclusion criteria for Question 2.
Therefore, despite referencing diagnostic techniques, the study does not qualify as involving a specific biomarker under the definitions provided.

Answer:
{
  "Question 1": False,
  "Question 2": False,
  "Question 3": False
}

**Example 2:**

Title: [Latex allergy in 16 children].
Abstract: Latex Allergy Is Now Well-Known In Adults And Children. It Represents The First Cause Of Anaphylactic Operating Shock In Pediatrics. A Diagnosis Of Latex Allergy Was Made In 16 Children (Five Girls And 11 Boys), Aged 2 To 15 Years, Because Of Evoking Signs And Symptoms, From Simple Urticaria To Quincke Edema In Presence Of Latex. The Revealing Factor Was Wheezing In Balloons In 13 Out Of The 16 Patients. An Atopic Past History Was Frequent. Previous Eventually Sensitizing Surgical Operations Were Present In Five Patients; Associated Food Allergy Existed In Four. Skin Tests Were Positive In Nine Out Of 12 Patients, As Well As Latex Specific Ige (13 Out Of 16). The Diagnosis Was Made With A Labial Provocation Test In One Patient. Latex Allergy Can Be Severe And Requires That Patients Avoid Any Contact With Rubber Objects, Especially Gloves. A Detailed Medical Certificate Should Be Given To The Family In View Of Any Medical, Surgical Or Dental Intervention.
MeSH Terms:  ||| Adolescent ||| Child ||| Child, Preschool ||| Dermatitis, Allergic Contact ||| Female ||| Gloves, Protective ||| Humans ||| Hypersensitivity ||| Latex ||| Male/Etiology/Adverse Effects/Etiology/Adverse Effects/Immunology

Analysis:
This study involves pediatric patients and investigates allergic reactions to latex, an exogenous compound, which meets the inclusion criteria for Question 1. However, the substances measured—such as latex-specific IgE and skin test reactivity—are immune response products, not primary diagnostic substances like ions, compounds, or cells directly involved in disease mechanisms. These markers reflect the body's reaction rather than serving as diagnostic substances themselves under the definitions provided. Therefore, Question 2 is false. Since no qualifying substance exists under Question 2, Question 3 is also false.

Answer:
{
  "Question 1": True,
  "Question 2": False,
  "Question 3": False
}


**Example 3:**

Title: Comparing 36.5C with 37C for human embryo culture: a prospective randomized controlled trial.
Abstract: This Prospective, Double-Blind, Randomized Controlled Trial Was Designed To Evaluate The Efficacy Of A Culture Temperature Of 36.5C Versus 37C On Human Embryo Development In Vitro. A Total Of 412 Women Undergoing Ivf Were Randomized To Two Groups: The Oocytes And Embryos Of The Intervention Group Were Cultured At 36.5C; Those Of The Control Group Were Cultured At 37C. Although No Significant Effect Of Culture Temperature Was Observed On Pregnancy Or Implantation Rates, Differences Were Found In Embryo Development. Embryo Culture At 36.5C Was Associated With A Significantly Higher Cleavage Rate (Or 1.6, 95% Ci 1.03 To 2.51), But A Lower Fertilization Rate, Fewer High-Quality Embryos On Day 3, A Lower Blastocyst Formation Rate On Day 5, And Fewer High-Quality And Cryopreserved Blastocysts (Or 0.87, 95% Ci 0.78 To 0.98), (Or 0.60, 95% Ci 0.53 To 0.69), (Or 0.85, 95% Ci 0.75 To 0.97), (Or 0.5, 95% Ci 0.44 To 0.56) And (Or 0.77, 95% Ci 0.68 To 0.88), Respectively, Compared With 37C. On The Basis Of These Results, And In The Absence Of Data On The Optimal Temperature For Each Stage Of Embryo Development In Vitro, We Recommend Continuation Of The Use Of 37C For Human Embryo Culture.
MeSH Terms:  ||| Adult ||| Double-Blind Method ||| Embryo Culture Techniques ||| Embryo Transfer ||| Embryonic Development ||| Female ||| Fertilization In Vitro ||| Humans ||| Pregnancy ||| Pregnancy Rate ||| Prospective Studies ||| Temperature ||| Methods ||| Methods ||| Physiology

Analysis:
Question 1:
The study involves 412 adult women undergoing IVF, which meets inclusion criterion A since patients receiving IVF treatment are defined as maternal patients. The study evaluates the impact of temperature on embryo development in vitro—this falls under the category of research related to infertility treatment, which is explicitly mentioned in the inclusion criteria. There are no exclusion criteria met (e.g., it’s not a review, animal study, or purely a cell/tissue study).
Answer: True
Question 2:
The study examines the effect of temperature, not a chemical, ion, or biological compound. There are no substances used for diagnosing, predicting, or monitoring disease progression. Therefore, it does not involve any qualifying biomarker or substance.
Answer: False
Question 3:
Since no qualifying substance is identified in Question 2, this question is automatically False. Additionally, temperature is not a drug, vaccine, cell therapy, or nutritional/chemical substance.
Answer: False

Answer:
{
  "Question 1": True,
  "Question 2": False,
  "Question 3": False
}
"""

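The questionnaire templates mix `{title}`-style placeholders with a literal brace-delimited return-format block, which is presumably why `process_file` fills them with `str.replace` rather than `str.format`. A minimal sketch of that behaviour, using a hypothetical shortened template (`demo_template` and `fill_template` are illustrative names, not part of the notebook):

```python
# Hypothetical shortened template; the real questionnaires are much longer.
demo_template = """
Title: {title}
Abstract: {abstract}
MeSH Terms: {mesh}

**Return Format Example:**
{
  "Question 1": True or False
}
"""


def fill_template(template, title, abstract, mesh):
    """Substitute the three placeholders by plain string replacement."""
    return (
        template.replace("{title}", str(title))
        .replace("{abstract}", str(abstract))
        .replace("{mesh}", str(mesh))
    )


filled = fill_template(demo_template, "A title", "An abstract.", "Humans ||| Child")
print(filled)

# str.format treats the literal braces of the return-format block as a
# replacement field, so it fails on this kind of template.
try:
    demo_template.format(title="A title", abstract="An abstract.", mesh="Humans")
except (KeyError, ValueError, IndexError) as exc:
    print(f"str.format failed: {type(exc).__name__}")
```

Plain replacement leaves the return-format braces untouched, so the templates can be written without doubling every literal `{` and `}`.
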
## Utilities

In [52]:
import pandas as pd
import re
import time
import os
from tqdm import tqdm
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage


def evaluate_predictions(file_path, pred_col='pred', true_col='label'):
    """Print accuracy, precision, recall, and F1 for a tab-separated results file."""
    df = pd.read_csv(file_path, sep='\t')

    if pred_col not in df.columns or true_col not in df.columns:
        raise ValueError(f"Columns '{pred_col}' or '{true_col}' not found in the file.")

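    # Drop rows without a valid binary prediction (2 marks a failed API call),
    # since the binary precision/recall/F1 scorers reject a third class.
    df = df[df[pred_col].isin([0, 1])]
    y_pred = df[pred_col]
    y_true = df[true_col]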

    acc = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)
    f1 = f1_score(y_true, y_pred, zero_division=0)

    print("\nFinal Evaluation Metrics:")
    print(f"Accuracy:  {acc:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1 Score:  {f1:.4f}")


def get_single_response(
    filled_prompt,
    question="Do not give the final result immediately. First, explain your thought process, then provide the answer.",
    max_retries=5
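    ):
    """Send the prompt to the model and parse its answer block.

    Returns (label, response). label is 1 or 0 when a valid answer block is
    found, or 2 (with an error message as response) once max_retries attempts
    have failed.
    """
    retries = 0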
    while retries < max_retries:
        try:
            client_4o = AzureChatOpenAI(
                api_key=os.environ.get("OPENAI_API_KEY", None),
                azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT", None),
                api_version=os.environ.get("OPENAI_API_VERSION", None),
                azure_deployment=os.environ.get("OPENAI_DEPLOYMENT_NAME", None),
                model=os.environ.get("OPENAI_MODEL", None),
                max_retries=5,
                temperature=0.0,
                max_tokens=int(os.environ.get("OPENAI_MAX_OUTPUT_TOKENS", 4096)),
                top_p=0.95,
                frequency_penalty=0,
                presence_penalty=0,
            )

            prompt_list = [HumanMessage(content=msg) for msg in [filled_prompt, question]]
            res = client_4o.generate(messages=[prompt_list])
            response = res.generations[0][0].text

            # 2 questions
            match_2 = re.search(r'\{\s*"Question 1"\s*:\s*(True|False),\s*"Question 2"\s*:\s*(True|False)\s*\}', response)
            if match_2:
                q1 = match_2.group(1) == "True"
                q2 = match_2.group(2) == "True"
                label = 1 if (q1 and q2) else 0
                return label, response

            # 3 questions (biomarker)
            match_3 = re.search(r'\{\s*"Question 1"\s*:\s*(True|False),\s*"Question 2"\s*:\s*(True|False),\s*"Question 3"\s*:\s*(True|False)\s*\}', response)
            if match_3:
                q1 = match_3.group(1) == "True"
                q2 = match_3.group(2) == "True"
                q3 = match_3.group(3) == "True"
                label = 1 if (q1 and q2 and not q3) else 0
                return label, response

            raise ValueError("No valid answer format found.")

        except Exception as e:
            retries += 1
            print(f"Error: {e}, retrying ({retries}/{max_retries})...")
            time.sleep(1)

    return 2, f"[ERROR after {max_retries} retries]"


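def process_file(input_path, questionnaire, shots=None):
    """Classify every row of a TSV file and append results to '<name>_out<ext>'.

    The input needs PMID, title, abstract, MeSH, and label columns. The output
    file is written to the current working directory. If shots is given, rows
    predicted positive are re-checked with the few-shot examples appended.
    PMIDs already present in the output are skipped, so a run can be resumed.
    """
    input_path = input_path.strip()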
    assert os.path.isfile(input_path), f"Input file {input_path} does not exist!"

    file_name = os.path.basename(input_path)
    file_root, file_ext = os.path.splitext(file_name)
    output_path = f'{file_root}_out{file_ext}'

    max_retries = 5

    # Load the input file
    df = pd.read_csv(input_path, sep='\t')

    written_pmids = set()
    y_true = []
    y_pred = []

    # Resume from an existing output file if available
    try:
        existing_df = pd.read_csv(output_path, sep='\t')
        written_pmids = set(existing_df["PMID"].astype(str))

        for _, row in existing_df.iterrows():
            label = row["pred"]
            true_label = row["label"]
            if label in [0, 1]:
                y_pred.append(label)
                y_true.append(true_label)

    # If output file doesn't exist, create it with headers
    except FileNotFoundError:
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write("PMID\ttitle\tabstract\tMeSH\tlabel\tpred1\treason1\tpred2\treason2\tpred\n")

    # Iterate through input rows
    progress = tqdm(df.iterrows(), total=len(df), desc=f"Processing {file_name}", ncols=150)

    for idx, row in progress:
        pmid = str(row['PMID'])
        if pmid in written_pmids:
            continue  # Skip already processed PMIDs

        title = row['title']
        abstract = row['abstract']
        mesh = row['MeSH']
        true_label = row['label']

        # Fill prompt template with article content
        filled_questionnaire = questionnaire.replace('{title}', str(title))\
                                           .replace('{abstract}', str(abstract))\
                                           .replace('{mesh}', str(mesh))

        # First prediction
        pred1, reason1 = get_single_response(filled_questionnaire, max_retries=max_retries)

        # Optional second prediction using shots if pred1 == 1
        pred2 = ""
        reason2 = ""
        if shots and pred1 == 1:
            second_prompt = filled_questionnaire + "\n\n" + shots
            pred2, reason2 = get_single_response(second_prompt, max_retries=max_retries)

        # Final prediction: use pred2 if available, otherwise use pred1
        final_pred = pred2 if pred2 in [0, 1] else pred1

        # Escape newlines/tabs in explanation responses
        safe_reason1 = reason1.replace("\n", "\\n").replace("\t", " ")
        safe_reason2 = reason2.replace("\n", "\\n").replace("\t", " ") if reason2 else ""

        # Write result to output file
        with open(output_path, 'a', encoding='utf-8') as f:
            line = f"{pmid}\t{title}\t{abstract}\t{mesh}\t{true_label}\t{pred1}\t{safe_reason1}\t{pred2}\t{safe_reason2}\t{final_pred}\n"
            f.write(line)

        # Evaluate model performance with final prediction
        if final_pred in [0, 1]:
            y_true.append(true_label)
            y_pred.append(final_pred)

            acc = accuracy_score(y_true, y_pred)
            prec = precision_score(y_true, y_pred, zero_division=0)
            rec = recall_score(y_true, y_pred, zero_division=0)
            f1 = f1_score(y_true, y_pred, zero_division=0)

            progress.set_postfix({
                'acc': f'{acc:.3f}',
                'prec': f'{prec:.3f}',
                'rec': f'{rec:.3f}',
                'f1': f'{f1:.3f}',
            })

    # Final evaluation results
    evaluate_predictions(output_path)
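
The answer-parsing step inside `get_single_response` can be tried without calling the API. The sketch below reuses the same two regular expressions on hand-written responses. `parse_answer` is a hypothetical helper name, and it returns `None` on failure instead of retrying and returning 2 as the notebook does.

```python
import re

# Same shape as the patterns in get_single_response: a brace-delimited block
# with two or three True/False answers.
TWO_Q = re.compile(
    r'\{\s*"Question 1"\s*:\s*(True|False),\s*"Question 2"\s*:\s*(True|False)\s*\}'
)
THREE_Q = re.compile(
    r'\{\s*"Question 1"\s*:\s*(True|False),\s*"Question 2"\s*:\s*(True|False),'
    r'\s*"Question 3"\s*:\s*(True|False)\s*\}'
)


def parse_answer(response):
    """Return 1 or 0 for a parsable answer block, or None if none is found."""
    m = TWO_Q.search(response)
    if m:
        q1, q2 = (g == "True" for g in m.groups())
        return 1 if (q1 and q2) else 0
    m = THREE_Q.search(response)
    if m:
        q1, q2, q3 = (g == "True" for g in m.groups())
        # Biomarker rule: positive only when Question 3 is False.
        return 1 if (q1 and q2 and not q3) else 0
    return None


two = 'Reasoning...\n{\n"Question 1": True,\n"Question 2": True\n}'
three = 'Reasoning...\n{\n  "Question 1": True,\n  "Question 2": True,\n  "Question 3": True\n}'
lowercase = '{"Question 1": true, "Question 2": true}'

print(parse_answer(two))        # 1
print(parse_answer(three))      # 0, because Question 3 is True
print(parse_answer(lowercase))  # None: the patterns are case-sensitive
```

The last case shows one limitation: a response that uses JSON-style lowercase `true`/`false` is not recognised and would trigger a retry.
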


## Example Usage

In [None]:
process_file('/content/biomarker_aft.tsv', biomarker_questionnaire)  # without step 2
process_file('/content/fbnstp_aft.tsv', fbnstp_questionnaire, fbnstp_shots)  # with step 2
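
`process_file` runs its optional few-shot pass ("step 2" in the comments above) only for articles that the first pass labelled 1. A minimal sketch of that combination rule, with the model calls replaced by stub callables; `two_step_predict` is a hypothetical name introduced here for illustration:

```python
def two_step_predict(first_pass, second_pass=None):
    """Combine two classifier calls the way process_file does.

    first_pass and second_pass are zero-argument callables returning
    0, 1, or 2 (2 marks a failed call). second_pass may be None.
    """
    pred1 = first_pass()
    pred2 = ""
    # The few-shot pass only runs on articles the first pass accepted.
    if second_pass is not None and pred1 == 1:
        pred2 = second_pass()
    # Fall back to the first prediction when the second is absent or failed.
    return pred2 if pred2 in (0, 1) else pred1


print(two_step_predict(lambda: 0, lambda: 1))  # 0: second pass never runs
print(two_step_predict(lambda: 1, lambda: 0))  # 0: second pass overrides
print(two_step_predict(lambda: 1, lambda: 2))  # 1: failed second pass is ignored
print(two_step_predict(lambda: 1))             # 1: no second pass
```

Because the second pass only ever sees first-pass positives, it can turn a 1 into a 0 but never the reverse, so it cannot raise recall; it appears intended to trim false positives.
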

In [None]:
evaluate_predictions('/content/biomarker_aft_out.tsv')