### Connect with NCBI PubMed
* Provide a valid email address

In [21]:
from Bio import Entrez

## Provide your email for NCBI
Entrez.email = "qwei@systemsbiology.org"

## Search PubMed for articles related to "Acute Myeloid Leukemia"
## API documents: https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
## retstart: Sequential index of the first UID in the retrieved set to be shown in the XML output (default = 0)
## retmax: Total number of UIDs from the retrieved set to be shown in the XML output (default = 20)
## Note: For PubMed, ESearch can only retrieve the first 10,000 records matching the query.
## To obtain more than 10,000 PubMed records, consider using EDirect, which contains
## additional logic to batch PubMed search results automatically so that an arbitrary number can be retrieved.

search_term = "Acute Myeloid Leukemia[Title/Abstract] AND clinical trial[Publication Type]"
handle = Entrez.esearch(db="pubmed", term=search_term, retmax=2, retstart=0)
record = Entrez.read(handle)
id_list = record["IdList"]

## Fetch the details (abstracts) for the articles
## retmode: Retrieval mode. This parameter specifies the data format of the records returned, such as plain text,
## HTML or XML. See Table 1 for a full list of allowed values for each database.
## Table 1: https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly

handle = Entrez.efetch(db="pubmed", id=id_list, rettype="abstract", retmode="text")
abstracts = handle.read()
print(abstracts)

1. Cancer. 2025 Apr 15;131(8):e35840. doi: 10.1002/cncr.35840.

Cladribine, idarubicin, and cytarabine (CLIA) for patients with relapsed and/or 
refractory acute myeloid leukemia: A single-center, single-arm, phase 2 trial.

Goulart H(1), Kantarjian H(2), Borthakur G(2), Daver N(2), DiNardo CD(2), 
Jabbour E(2), Pemmaraju N(2), Alvarado Y(2), Atluri H(1), Yilmaz M(2), Haddad 
FG(2), Marx KR(3), Rausch C(3), Loghavi S(4), Jain N(2), Garcia-Manero G(2), 
Ravandi-Kashani F(2), Kadia TM(2).

Author information:
(1)Division of Cancer Medicine, The University of Texas MD Anderson Cancer 
Center, Houston, Texas, USA.
(2)Department of Leukemia, The University of Texas MD Anderson Cancer Center, 
Houston, Texas, USA.
(3)Division of Pharmacy, The University of Texas MD Anderson Cancer Center, 
Houston, Texas, USA.
(4)Department of Hematopathology, The University of Texas MD Anderson Cancer 
Center, Houston, Texas, USA.

BACKGROUND: The treatment of relapsed and/or refractory (R/R) acute myeloid 
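
The `retstart` / `retmax` comments in the cell above describe how ESearch pages through a result set. As a minimal sketch of that paging, the hypothetical helper `esearch_windows` below computes the `(retstart, retmax)` pairs needed to walk a result set in batches. The 10,000-record cap comes from the note above; the batch size of 500 and the total of 1234 are arbitrary assumptions for illustration.

```python
def esearch_windows(total_count, batch_size=500, hard_limit=10_000):
    """Yield (retstart, retmax) pairs that cover the retrievable records.

    total_count: number of hits reported by the search (record["Count"]).
    hard_limit:  ESearch only exposes the first 10,000 PubMed records.
    """
    reachable = min(total_count, hard_limit)
    for retstart in range(0, reachable, batch_size):
        # The last window may be shorter than batch_size.
        yield retstart, min(batch_size, reachable - retstart)


windows = list(esearch_windows(total_count=1234, batch_size=500))
print(windows)

# Each pair would then be passed to one request, e.g.
#   Entrez.esearch(db="pubmed", term=search_term, retstart=start, retmax=size)
```

Keeping the window arithmetic separate from the network call makes it easy to check before any request is sent to NCBI.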

Load the full result set (restricted to clinical trials) extracted from PubMed with EDirect
* esearch -db pubmed -query "Acute Myeloid Leukemia[Title/Abstract] AND clinical trial[Publication Type]" | efetch -format abstract > AML_Pubmed_abstracts_only_clinical_trial.txt

In [7]:
## Note: change the following directory path to your own
extracted_pubmed_file_path = '/Users/Weiqi0/ISB_working/Ilya_lab/Translator/Pharmagenomics_KG/Pubmed_query_results/'


import os

file_name = 'AML_Pubmed_abstracts_only_clinical_trial.txt'
## Build the full path once so that the check, the read, and the error message all agree
full_path = os.path.join(extracted_pubmed_file_path, file_name)
if os.path.exists(full_path):
    with open(full_path, 'r', encoding='utf-8') as file:
        content = file.read()
#         print(content)
else:
    print(f"Error: File '{full_path}' not found.")
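
The file loaded above holds many abstracts in one string. A minimal sketch of splitting it into one string per paper follows, assuming (as in the printed output earlier) that each record starts with a numbered citation line such as `1. Cancer. 2025 ...` preceded by a blank line. The helper name `split_abstract_records` and the sample text are hypothetical, and an abstract that itself contains a numbered paragraph after a blank line would be split incorrectly.

```python
import re


def split_abstract_records(text):
    """Split efetch text output into one string per numbered record."""
    # A new record is assumed to begin after a blank line, with "N. " at line start.
    parts = re.split(r"\n\s*\n(?=\d+\. \S)", text.strip())
    return [part.strip() for part in parts if part.strip()]


# Shortened, made-up sample in the same layout as the efetch output.
sample = (
    "1. Cancer. 2025 Apr 15;131(8):e35840.\n\nTitle one.\n\nBody one.\n\n\n"
    "2. Leuk Res. 2025 May;152:107690.\n\nTitle two.\n\nBody two.\n"
)
records = split_abstract_records(sample)
print(len(records))
```

Working on one record at a time keeps each prompt short and makes it possible to trace an extracted edge back to its paper.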

Load the pre-trained LLM and try extracting information from the fetched abstracts
* Based on the Ollama platform
* Uses a variant of DeepSeek
* Prerequisites are listed here: https://docs.google.com/document/d/1qyxAB0aZqSJWEu04XcsgkfoMKUkZVBRsfbxyMGTU1Z8/edit?tab=t.0
* ollama.chat() parameters:
* model = : the name of the pulled LLM to load
* messages = : the list of messages that represents the conversation history between the user and the model
    * role = : 
    * messages with role = user hold what the user asks
    * messages with role = assistant hold what the model answers
    * the first message usually has role = system and describes how the model should behave when providing an answer
    * the chat endpoint has no dedicated system parameter; that is the job of the first message you send
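
The role conventions listed above can be sketched without calling the model at all. The two helpers below, `start_conversation` and `add_turn`, are hypothetical names introduced only for illustration; they build the plain list of dictionaries that would be passed as `messages`.

```python
def start_conversation(system_prompt, user_text):
    """Return the initial history: one system message, then the first question."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


def add_turn(messages, assistant_reply, next_user_text):
    """Return a new history extended with the model's reply and a follow-up."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_text},
    ]


history = start_conversation(
    "You are a helpful assistant in biomedical research.",
    "Which drugs are mentioned in this abstract?",
)
history = add_turn(history, "Cladribine, idarubicin, and cytarabine.", "Which disease do they treat?")
print([message["role"] for message in history])
```

Because the model keeps no state between calls, the whole list has to be sent again on every request.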

In [2]:
## 2. Run the DeepSeek LLM locally (test case)
## example case

import ollama
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[
        {"role": "user", "content": "Explain Newton's second law of motion"},
    ],
)
print(response["message"]["content"])

<think>
Okay, so I need to explain Newton's Second Law of Motion. Hmm, let me start by recalling what I know about it. I remember that Newton had three laws of motion, and the second one is often the most talked about. It has something to do with force, mass, and acceleration. 

Wait, wasn't there a formula related to this? Oh right, F equals m times a. So F = ma. That makes sense because force depends on both mass and acceleration. But I'm not entirely sure how all these concepts interact.

Let me break it down. Force is what causes an object to accelerate, but the acceleration also depends on the mass of the object. So if you have a heavy object, like a car, you need more force to make it accelerate compared to a lighter object, like a bicycle. That's why cars usually have engines with higher power than bicycles.

But wait, is it about the force causing acceleration or something else? I think it's specifically about how much force is needed to change an object's motion. So if an obje

In [12]:
## 1. Prompt engineering method
import ollama
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting potential edges (e.g. between protein, genes, diseases)\
                                        from given abstracts"},
        {"role": "user", "content": abstracts},
    ],
)
print(response["message"]["content"])

<think>
Okay, so I'm trying to understand these two studies about treatments for acute myeloid leukemia (AML). Let me read through them again and try to make sense of the key points.

The first study is about a treatment called BCL-2 inhibitor, specifically venetoclax combined with decitabine and azacitidine. They studied 85 patients who were either newly diagnosed or had relapsed or refractory AML. The results showed that 67% of these patients achieved remission, which is pretty good. Most of the responders were in the de novo group (newly diagnosed), while only 20% in the relapsed/refractory group responded. They also noted some common side effects like nausea and fatigue but no major treatment-related deaths.

The second study looks at bemcentinib, an AXL inhibitor, used either alone or with low-dose cytarabine in relapsed/refractory AML patients. In the monotherapy arm, 32 patients were treated, and they found that a dose of 400mg loading followed by 200mg maintenance was safe. Whe

In [13]:
## 1. Prompt engineering method
import ollama
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting potential edges (e.g. between protein, genes, diseases)\
                                        from given abstracts"},
        {"role": "system", "content": "Please provide a final answer in following format:\
                                        subject -> predicate -> object"},
        {"role": "user", "content": abstracts},
    ],
)
print(response["message"]["content"])

<think>
Okay, so I'm trying to understand these two clinical trials mentioned in the provided text. Both seem to be related to treating acute myeloid leukemia (AML), especially for patients who are not suitable for intensive chemotherapy. Let me break down each study and then compare them.

Starting with the first trial: It's about using a combination of palliative care and low-intensity therapy, specifically targeting patients with relapsed or refractory AML. The primary endpoint here is overall survival, which makes sense because that's a key measure in cancer treatment studies. They used something called "best supportive care," which usually includes managing symptoms and providing comfort. The intervention adds low-dose cytarabine and other palliative treatments. 

The results they found were interesting—adding these therapies didn't significantly improve survival, but there was a trend towards better outcomes with higher dose levels of cytarabine. Also, patients who had favorable 
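
Both replies above open with a `<think>` block before any answer appears. A minimal sketch of post-processing such a reply follows, assuming the reasoning is wrapped in `<think>...</think>` and that the final answer uses one `subject -> predicate -> object` line per edge, as the system prompt requests. The helper names and the sample reply are hypothetical.

```python
import re


def strip_think(text):
    """Remove the <think>...</think> reasoning block from a reply."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def parse_triples(text):
    """Collect (subject, predicate, object) tuples from 'a -> b -> c' lines."""
    triples = []
    for line in strip_think(text).splitlines():
        parts = [part.strip() for part in line.split("->")]
        if len(parts) == 3 and all(parts):
            triples.append(tuple(parts))
    return triples


# Made-up reply in the requested format.
reply = (
    "<think>\nreasoning -> is not -> counted\n</think>\n"
    "AML -> is treated with -> cytarabine\n"
    "This line is not a triple."
)
print(parse_triples(reply))
```

Free-text parsing like this is fragile, which is one reason to constrain the reply with a schema instead.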

In [2]:
## double check on the inputs
print(abstracts)

1. Cancer. 2025 Apr 15;131(8):e35840. doi: 10.1002/cncr.35840.

Cladribine, idarubicin, and cytarabine (CLIA) for patients with relapsed and/or 
refractory acute myeloid leukemia: A single-center, single-arm, phase 2 trial.

Goulart H(1), Kantarjian H(2), Borthakur G(2), Daver N(2), DiNardo CD(2), 
Jabbour E(2), Pemmaraju N(2), Alvarado Y(2), Atluri H(1), Yilmaz M(2), Haddad 
FG(2), Marx KR(3), Rausch C(3), Loghavi S(4), Jain N(2), Garcia-Manero G(2), 
Ravandi-Kashani F(2), Kadia TM(2).

Author information:
(1)Division of Cancer Medicine, The University of Texas MD Anderson Cancer 
Center, Houston, Texas, USA.
(2)Department of Leukemia, The University of Texas MD Anderson Cancer Center, 
Houston, Texas, USA.
(3)Division of Pharmacy, The University of Texas MD Anderson Cancer Center, 
Houston, Texas, USA.
(4)Department of Hematopathology, The University of Texas MD Anderson Cancer 
Center, Houston, Texas, USA.

BACKGROUND: The treatment of relapsed and/or refractory (R/R) acute myeloid 

In [7]:
## Ollama's notes on how to create structured output
## https://ollama.com/blog/structured-outputs
from pydantic import BaseModel
# from typing import List

class Edges(BaseModel):
    subject: list[str]
    predicate: list[str]
    object: list[str]
    

## 1. Prompt engineering method
import ollama
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting all potential edges (e.g. between protein, genes, diseases)\
                                        from given abstracts for each paper"},
        {"role": "system", "content": "Please provide a final answer in following format:\
                                        subject -> predicate -> object"},
        {"role": "user", "content": abstracts},
    ],
      format=Edges.model_json_schema(),
)
print(response["message"]["content"])

{ "subject": ["cancer", "acute myeloid leukemia"], "predicate": ["is treated with", "has"], "object": ["cladribine", "idarubicin", "cytarabine", "pegylated liposomal doxorubicin", "granulocyte colony-stimulating factor"] }
        												


In [8]:
structured_output = Edges.model_validate_json(response.message.content)
print(structured_output)

subject=['cancer', 'acute myeloid leukemia'] predicate=['is treated with', 'has'] object=['cladribine', 'idarubicin', 'cytarabine', 'pegylated liposomal doxorubicin', 'granulocyte colony-stimulating factor']
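
In the structured output above the three lists have different lengths (2, 2, and 5), so they cannot be paired up into edges. A minimal sketch of checking this before use follows; `to_triples` is a hypothetical helper, and the raw string is the response printed above.

```python
import json


def to_triples(raw_json):
    """Zip parallel subject/predicate/object lists into triples, or fail loudly."""
    data = json.loads(raw_json)
    subjects, predicates, objects = data["subject"], data["predicate"], data["object"]
    if not (len(subjects) == len(predicates) == len(objects)):
        raise ValueError(
            f"list lengths differ: {len(subjects)}, {len(predicates)}, {len(objects)}"
        )
    return list(zip(subjects, predicates, objects))


raw = (
    '{ "subject": ["cancer", "acute myeloid leukemia"], '
    '"predicate": ["is treated with", "has"], '
    '"object": ["cladribine", "idarubicin", "cytarabine", '
    '"pegylated liposomal doxorubicin", "granulocyte colony-stimulating factor"] }'
)
try:
    triples = to_triples(raw)
except ValueError as err:
    triples = []
    print(err)
```

A schema with one subject, predicate, and object per edge avoids the mismatch by construction.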


### Obtain a sample abstract
* take the first abstract of a clinical trial paper
* create a sample_abstract
* manually create an expected output for the LLM to use as an example
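
One possible way to hand such an example to the model is as an earlier user/assistant exchange placed before the real input, so that the expected output appears as something the model has already answered. This is a sketch of that alternative under assumed names (`few_shot_messages` and all the strings are hypothetical), not the approach used in the cells of this notebook.

```python
def few_shot_messages(system_prompt, example_abstract, example_answer, new_abstract):
    """Build a history in which the worked example is a previous exchange."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": example_abstract},      # the sample abstract
        {"role": "assistant", "content": example_answer},   # the hand-written answer
        {"role": "user", "content": new_abstract},          # the abstract to process
    ]


messages = few_shot_messages(
    system_prompt="Extract edges as subject -> predicate -> object.",
    example_abstract="(text of the sample abstract)",
    example_answer="acute myeloid leukemia -> is treated with -> cladribine",
    new_abstract="(text of a new abstract)",
)
print([message["role"] for message in messages])
```

The example answer should have the same shape as the output being requested, and it should describe the same paper as the example abstract.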

In [1]:
from Bio import Entrez

## Provide your email for NCBI
Entrez.email = "qwei@systemsbiology.org"

## Search PubMed for articles related to "Acute Myeloid Leukemia"
## API documents: https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
## retstart: Sequential index of the first UID in the retrieved set to be shown in the XML output (default = 0)
## retmax: Total number of UIDs from the retrieved set to be shown in the XML output (default = 20)
## Note: For PubMed, ESearch can only retrieve the first 10,000 records matching the query.
## To obtain more than 10,000 PubMed records, consider using EDirect, which contains
## additional logic to batch PubMed search results automatically so that an arbitrary number can be retrieved.

search_term = "Acute Myeloid Leukemia[Title/Abstract] AND clinical trial[Publication Type]"
handle = Entrez.esearch(db="pubmed", term=search_term, retmax=10, retstart=0)
record = Entrez.read(handle)
id_list = record["IdList"]

## Fetch the details (abstracts) for the articles
## retmode: Retrieval mode. This parameter specifies the data format of the records returned, such as plain text,
## HTML or XML. See Table 1 for a full list of allowed values for each database.
## Table 1: https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly

handle = Entrez.efetch(db="pubmed", id=id_list, rettype="abstract", retmode="text")
sample_abstract = handle.read()
print(sample_abstract)

1. Leuk Res. 2025 May;152:107690. doi: 10.1016/j.leukres.2025.107690. Epub 2025
Apr  2.

Safety, tolerability, and pharmacokinetics of ASP1235 in relapsed or refractory 
acute myeloid leukemia: A phase 1 study.

Al Malki MM(1), Minden MD(2), Rich ES(3), Hill JE(3), Gill SC(3), Fan A(3), 
Fredericks CE(3), Fathi AT(4), Abdul-Hay M(5).

Author information:
(1)Department of Hematology & Hematopoietic Cell Transplantation, City of Hope 
National Medical Center, Duarte, CA, USA. Electronic address: malmalki@coh.org.
(2)Princess Margaret Cancer Centre, Toronto, Ontario, Canada.
(3)Astellas Pharma Global Development, Inc., Northbrook, IL, USA.
(4)Massachusetts General Hospital, Boston, MA, USA.
(5)Laura and Isaac Perlmutter Cancer Center at NYU Langone, New York, NY, USA.

Acute myeloid leukemia (AML) is an aggressive hematologic malignancy. Although 
new agents including targeted therapies for relapsed or refractory (R/R) AML 
have been introduced, poor outcomes remain, requiring the need fo

In [8]:
## The expected output is:

## Option 1 response
# sample_response = "subject=['acute myeloid leukemia']\
# predicate=['is treated with'] object=['cladribine', 'idarubicin', 'cytarabine']"

## Option 2 response
# sample_response = (
#     "'edge': {'subject': 'acute myeloid leukemia', 'predicate': 'is treated with', 'object': 'cladribine'}\n"
#     "'edge': {'subject': 'acute myeloid leukemia', 'predicate': 'is treated with', 'object': 'idarubicin'}\n"
#     "'edge': {'subject': 'acute myeloid leukemia', 'predicate': 'is treated with', 'object': 'cytarabine'}"
# )

sample_response = (
    """{
    "CLIA_treatment_regimen": {
        "name": "CLIA treatment regimen",
        "attributes": {
            "drugs": ["cladribine", "idarubicin", "cytarabine"],
            "dosage": {
                "cladribine": "5 mg/m² intravenously (days 1-5)",
                "cytarabine": "1000 mg/m² intravenously (days 1-5)",
                "idarubicin": "10 mg/m² intravenously (days 1-3)"
            },
            "additional_treatment": {
                "drug": "sorafenib",
                "dosage": "400 mg twice daily (days 1-14)",
                "condition": "FLT3-mutated AML"
            }
        }
    },
    "AML": {
        "name": "Acute Myeloid Leukemia (AML)",
        "attributes": {
            "type": "relapsed and/or refractory"
        }
    },
    "edges": [
        {
            "subject": "CLIA_treatment_regimen",
            "predicate": "is used to treat",
            "object": "AML"
        },
        {
            "subject": "CLIA_treatment_regimen",
            "predicate": "includes drugs",
            "object": ["cladribine", "idarubicin", "cytarabine"]
        }
      ]
}"""
)
print(sample_response)

{
    "CLIA_treatment_regimen": {
        "name": "CLIA treatment regimen",
        "attributes": {
            "drugs": ["cladribine", "idarubicin", "cytarabine"],
            "dosage": {
                "cladribine": "5 mg/m² intravenously (days 1-5)",
                "cytarabine": "1000 mg/m² intravenously (days 1-5)",
                "idarubicin": "10 mg/m² intravenously (days 1-3)"
            },
            "additional_treatment": {
                "drug": "sorafenib",
                "dosage": "400 mg twice daily (days 1-14)",
                "condition": "FLT3-mutated AML"
            }
        }
    },
    "AML": {
        "name": "Acute Myeloid Leukemia (AML)",
        "attributes": {
            "type": "relapsed and/or refractory"
        }
    },
    "edges": [
        {
            "subject": "CLIA_treatment_regimen",
            "predicate": "is used to treat",
            "object": "AML"
        },
        {
            "subject": "CLIA_treatment_regimen",
            "

In [9]:
from Bio import Entrez

## Provide your email for NCBI
Entrez.email = "qwei@systemsbiology.org"

## Search PubMed for articles related to "Acute Myeloid Leukemia"
## API documents: https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
## retstart: Sequential index of the first UID in the retrieved set to be shown in the XML output (default = 0)
## retmax: Total number of UIDs from the retrieved set to be shown in the XML output (default = 20)
## Note: For PubMed, ESearch can only retrieve the first 10,000 records matching the query.
## To obtain more than 10,000 PubMed records, consider using EDirect, which contains
## additional logic to batch PubMed search results automatically so that an arbitrary number can be retrieved.

search_term = "Acute Myeloid Leukemia[Title/Abstract] AND clinical trial[Publication Type]"
handle = Entrez.esearch(db="pubmed", term=search_term, retmax=71, retstart=1)
record = Entrez.read(handle)
id_list = record["IdList"]

## Fetch the details (abstracts) for the articles
## retmode: Retrieval mode. This parameter specifies the data format of the records returned, such as plain text,
## HTML or XML. See Table 1 for a full list of allowed values for each database.
## Table 1: https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly

handle = Entrez.efetch(db="pubmed", id=id_list, rettype="abstract", retmode="text")
abstracts = handle.read()
# print(abstracts)

In [10]:
## Ollama's notes on how to create structured output
## https://ollama.com/blog/structured-outputs
from pydantic import BaseModel
# from typing import List

## Option 1
# class Edges(BaseModel):
#     subject: list[str]
#     predicate: list[str]
#     object: list[str]

## Option 2
## add a desired Pubmed id
## add a desired clinical trial id
class Edge(BaseModel):
    subject: str
    predicate: str
    object: str

class Edges(BaseModel):
    edge: Edge

## 1. Prompt engineering method
import ollama
## Request a fixed number of responses
for i in range(70):
    response = ollama.chat(
        model="deepseek-r1:14b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting all possible / potential edges (e.g. between protein, genes, diseases)\
                                        from given abstracts for each paper"},
            {"role": "system", "content": "Please provide answers satisfying the following relationship:\
                                            subject -> predicate -> object"},
            {"role": "system", "content": "Q:Based on this abstract, " + sample_abstract + "A:" + sample_response},
            {"role": "user", "content": abstracts},
        ],
          format=Edges.model_json_schema(),
    #     options={'temperature': 0},  # temperature = 0 makes responses more deterministic
    )
    print(response["message"]["content"])

{ "edge": { "subject": "Hematology", "predicate": "is_relevant_for", "object": "AML" } }
       					   				
{ "edge": { "subject": "health", "predicate": "is_related_to", "object": "cancer" } } 
{ "edge": {"subject": "AML", "predicate": "is treated with", "object": "Haplo-cord HCT" }}
   							  								
{ "edge": { "subject": "AML", "predicate": "showed positive result in study about", "object": "Haplo-cord HCT" } }
        										  
{ "edge": { "subject": "Acute Myeloid Leukemia", "predicate": "is treated by", "object": "Haplo-cord hematopoietic cell transplantation" } }
  													  			
{ "edge": { "subject": "AML", "predicate": "is treated by", "object": "Haplo-cord HCT" } }
   		   							    	
{ "edge": {"subject":"Hematology","predicate":"is_about","object":"Acute Myeloid Leukemia"} }
    																
{ "edge": { "subject": "aml", "predicate": "better than", "object": "haplo-HCT" } } 
{ "edge": {"subject": "Blood Diseases and Disorders", "predicate": "isRelatedTo", 

KeyboardInterrupt: 
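
The loop above returns many near-identical edges, plus one response cut off by the interrupt. A minimal sketch of collapsing such output follows, assuming each response is a JSON string with a single `edge` object holding plain-string fields. `unique_edges` is a hypothetical helper; the first and third strings are taken from the output above, and the second is a made-up case variant.

```python
import json


def unique_edges(raw_responses):
    """Keep the first occurrence of each edge, ignoring case and outer spaces."""
    seen = set()
    edges = []
    for raw in raw_responses:
        try:
            edge = json.loads(raw)["edge"]
            key = tuple(
                edge[field].strip().lower()
                for field in ("subject", "predicate", "object")
            )
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            continue  # skip truncated or malformed responses
        if key not in seen:
            seen.add(key)
            edges.append(edge)
    return edges


responses = [
    '{ "edge": { "subject": "AML", "predicate": "is treated by", "object": "Haplo-cord HCT" } }',
    '{ "edge": { "subject": "aml", "predicate": "is treated by", "object": "haplo-cord HCT" } }',
    '{ "edge": {"subject": "Blood Diseases and Disorders", "predicate": "isRelatedTo", ',
]
edges = unique_edges(responses)
print(len(edges))
```

Exact matching after lower-casing does not merge synonyms such as "AML" and "Acute Myeloid Leukemia"; that would need a separate normalization step.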

In [13]:
## Ollama's notes on how to create structured output
## https://ollama.com/blog/structured-outputs
from pydantic import BaseModel
# from typing import List

## Option 1
# class Edges(BaseModel):
#     subject: list[str]
#     predicate: list[str]
#     object: list[str]

## Option 2
## add a desired Pubmed id
## add a desired clinical trial id
class Attributes(BaseModel):
    Note:list[str]
    Type: str
    
class Node(BaseModel):
    name:str
    attributes: Attributes

class Edge(BaseModel):
    subject: Node
    predicate: str
    object: Node

class Edges(BaseModel):
    edge: Edge

## 1. Prompt engineering method
import ollama
## Request a fixed number of responses
for i in range(10):
    response = ollama.chat(
        model="deepseek-r1:14b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting all possible / potential edges (e.g. between protein, genes, diseases)\
                                        from given abstracts for each paper"},
            {"role": "system", "content": "Please provide answers satisfying the following relationship:\
                                            subject -> predicate -> object"},
            {"role": "system", "content": "Q:Based on this abstract, " + sample_abstract + "A:" + sample_response},
            {"role": "user", "content": abstracts},
        ],
          format=Edges.model_json_schema(),
    #     options={'temperature': 0},  # temperature = 0 makes responses more deterministic
    )
    print(response["message"]["content"])

{ "edge": { "subject": { "name": "急性髓性白血病", "attributes": { "Drugs": ["达沙替尼"], "Type": "成人白血病" } }, "predicate": "与...相关", "object": { "name": "细胞免疫治疗", "attributes": { "Drugs": ["伊马替尼"] ,"Type": "儿童白血病" } } } }
  							   				
{ "edge": { "subject": { "name": "AML", "attributes": { "Drugs": ["Lenalidomide"] , "Type": "Acute leukemia" } }, "predicate": "IS_TREATED_WITH", "object": { "name": "Haplo-cord Hematopoietic Cell Transplantation (HCT)", "attributes": { "Drugs": ["Bortezomib"] , "Type": "Hematopoietic stem cell transplantation" } } } }
   							   							
{ "edge": {"subject": {"name": "AML", "attributes": {"Drugs": ["Immunosuppressive drugs"], "Type": "Acute myeloid leukemia"}}, "predicate": "treated_with", "object": {"name": "Immunosuppressive drugs", "attributes": {"Drugs": ["Cyclophosphamide", "Cyclosporine"] , "Type": "Immunosuppressant"}}} }
   										  			  
{ "edge" : { "subject" : { "name" : "急性白血病", "attributes": { "Drugs": ["吉西他滨","伊马替尼"], "Type": "AML" } }, "pre

In [14]:
## Ollama's notes on how to create structured output
## https://ollama.com/blog/structured-outputs
from pydantic import BaseModel
# from typing import List

## Option 1
# class Edges(BaseModel):
#     subject: list[str]
#     predicate: list[str]
#     object: list[str]

## Option 2
## add a desired Pubmed id
## add a desired clinical trial id
class Attributes(BaseModel):
    Note:list[str]
    Type: str
    
class Node(BaseModel):
    name:str
    attributes: Attributes

class Edge(BaseModel):
    subject: Node
    predicate: str
    object: Node

class Edges(BaseModel):
    edge: Edge

## 1. Prompt engineering method
import ollama
## Request a fixed number of responses
for i in range(10):
    response = ollama.chat(
        model="deepseek-r1:14b",
        messages=[
            {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting all possible / potential edges (e.g. between protein, genes, diseases)\
                                        from given abstracts for each paper"},
            {"role": "system", "content": "Please provide answers satisfying the following relationship:\
                                            subject -> predicate -> object"},
            {"role": "system", "content": "Q:Based on this abstract, " + sample_abstract + "A:" + sample_response},
            {"role": "user", "content": abstracts},
        ],
          format=Edges.model_json_schema(),
    #     options={'temperature': 0},  # temperature = 0 makes responses more deterministic
    )
    print(response["message"]["content"])

{ "edge": { "subject": { "name": "Leukemia", "attributes": { "Note": [ "Hematopoietic malignancy" ] , "Type": "acute myeloid leukemia (AML)" } }, "predicate": "is a type of", "object": { "name": "Disease", "attributes": { "Note": [ "Cancer" ] , "Type": "Malignant neoplasm" } } } }
   							   							
{ "edge" : {"subject": {"name": "急性髓性白血病（AML）", "attributes": {"Note": ["AML是成人中常见和致命的血液系统恶性肿瘤。治疗方案包括化疗、靶向治疗和造血干细胞移植。"] ,"Type": "疾病" } }, "predicate": "提出解决方案", "object": { "name": "异基因造血干细胞移植（allo-HSCT）", "attributes": {"Note": ["allo-HSCT是AML患者的主要治疗方法之一，特别是对于年轻且适合的患者。"], "Type": "治疗方案" }} } }
   					  					  		 
{ "edge": { "subject": { "name": "acute myeloid leukemia", "attributes": { "Note": [ "Note: acute myeloid leukemia is a type of blood and bone marrow cancer." ] ,"Type": "leukemia" } }, "predicate": "has therapu_2019", "object": { "name": "Immunotherapy药物治疗", "attributes": { "Note": [ "Note: 免疫疗法是一种利用人体免疫系统来对抗疾病的方法。" ] ,"Type": "therapeutic method" } } } }
               			
{
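
Some of the responses above switch to Chinese, and the last one is cut off after its opening brace. A minimal sketch of screening responses before they are used follows, assuming the nested node layout (`name` plus `attributes`) from the schema in this cell. `check_edge_response` is a hypothetical helper, and the ASCII test is only a rough stand-in for "written in English".

```python
import json

REQUIRED_NODE_KEYS = {"name", "attributes"}
INCOMPLETE = "not a complete JSON object with an 'edge' key"


def check_edge_response(raw):
    """Return a list of problems found in one raw response (empty if none)."""
    try:
        edge = json.loads(raw)["edge"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return [INCOMPLETE]
    if not isinstance(edge, dict):
        return ["'edge' is not an object"]
    problems = []
    for role in ("subject", "object"):
        node = edge.get(role)
        if not isinstance(node, dict) or not REQUIRED_NODE_KEYS <= node.keys():
            problems.append(f"{role} is missing name/attributes")
        elif not str(node["name"]).isascii():
            problems.append(f"{role} name is not ASCII: {node['name']}")
    if not isinstance(edge.get("predicate"), str):
        problems.append("predicate is not a string")
    return problems


# Made-up responses in the same layout as the output above.
good = (
    '{"edge": {"subject": {"name": "AML", "attributes": {"Note": [], "Type": "disease"}}, '
    '"predicate": "is treated with", '
    '"object": {"name": "HCT", "attributes": {"Note": [], "Type": "therapy"}}}}'
)
mixed = good.replace('"name": "AML"', '"name": "急性髓性白血病"')
for raw in (good, mixed, "{"):
    print(check_edge_response(raw))
```

Responses that fail the check could be dropped or re-requested, for example with an explicit instruction to answer in English.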

In [None]:
## Ollama's notes on how to create structured output
## https://ollama.com/blog/structured-outputs
from pydantic import BaseModel
# from typing import List

class Subject(BaseModel):
    subject_name: str
#     subject_type: str

class Object(BaseModel):
    object_name: str
#     object_type: str

class Predicate(BaseModel):
    predicate_name: str

    
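## One edge groups a subject, a predicate, and an object
class Edge(BaseModel):
    subject: Subject
    predicate: Predicate
    object: Object

## list[[Subject, Predicate, Object]] is not a valid type annotation; use a list of Edge models
class Edges(BaseModel):
    edge: list[Edge]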
    

## 1. Prompt engineering method
import ollama
response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant in biomedical research:\
                                        extracting all potential edges (e.g. between protein, genes, diseases, medications, compounds)\
                                        from given abstracts for each paper"},
#         {"role": "system", "content": "Please note that there are multiple abstracts, each starts with number 1., 2., 3., etc."},
#         {"role": "system", "content": "Please provide a final answer in following format:\
#                                         subject -> predicate -> object"},
        {"role": "user", "content": abstracts},
        {"role": "user", "content": 'I have provided you with 2 abstracts. Return a list of edges.'},
    ],
    format=Edges.model_json_schema(),
    options={'temperature': 0},  # Make responses more deterministic
)
print(response["message"]["content"])

In [None]:
## 3. fine tuning with FDA drugs

## drug or drug combination
## or diseases 