### Connect with NCBI PubMed
* Provide a valid email address

In [1]:
from Bio import Entrez

## Provide your email for NCBI
Entrez.email = "qwei@systemsbiology.org"

## Search PubMed for articles related to "Acute Myeloid Leukemia"
## API documentation: https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
## retstart: Sequential index of the first UID in the retrieved set to be shown in the XML output (default = 0)
## retmax: Total number of UIDs from the retrieved set to be shown in the XML output (default = 20)
## Note: For PubMed, ESearch can only retrieve the first 10,000 records matching the query.
## To obtain more than 10,000 PubMed records, consider using EDirect, which contains
## additional logic to batch PubMed search results automatically so that an arbitrary number can be retrieved.

search_term = "Acute Myeloid Leukemia"
handle = Entrez.esearch(db="pubmed", term=search_term, retmax=40, retstart=0)
record = Entrez.read(handle)
id_list = record["IdList"]

## Fetch the details (abstracts) for the articles
## retmode: Retrieval mode. This parameter specifies the data format of the records returned, such as plain text,
## HTML or XML. See Table 1 for a full list of allowed values for each database.
## Table 1: https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly

handle = Entrez.efetch(db="pubmed", id=id_list, rettype="abstract", retmode="text")
abstracts = handle.read()
print(abstracts)

1. Curr Issues Mol Biol. 2025 Mar 4;47(3):173. doi: 10.3390/cimb47030173.

Exploring the Therapeutic Potential of the DOT1L Inhibitor EPZ004777 Using 
Bioinformatics and Molecular Docking Approaches in Acute Myeloid Leukemia.

Kivrak M(1), Nalkiran I(2), Sevim Nalkiran H(2).

Author information:
(1)Department of Biostatistics and Medical Informatics, Faculty of Medicine, 
Recep Tayyip Erdogan University, 53020 Rize, Türkiye.
(2)Department of Medical Biology, Faculty of Medicine, Recep Tayyip Erdogan 
University, 53020 Rize, Türkiye.

BACKGROUND: Acute myeloid leukemia (AML) is a malignancy characterized by the 
clonal expansion of hematopoietic stem and progenitor cells, often associated 
with mutations such as NPM1. DOT1L inhibitors have shown potential as new 
therapeutic opportunities for NPM1-mutant AML. The aim of this study was to 
investigate potential alternative targets of the small-molecule inhibitor 
EPZ004777, in addition to its primary target, DOT1L, using RNA sequencing d
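
The `retstart` and `retmax` parameters described in the comments above can be used to page through a result set. The following is a minimal sketch of that windowing logic, assuming a 10,000-record ceiling as stated in the note; `esearch_windows`, `collect_ids`, and `fake_search_page` are hypothetical helper names, and the fake search function stands in for a real ESearch call so the example runs offline.

```python
def esearch_windows(total_count, batch_size=500, hard_cap=10_000):
    """Yield (retstart, retmax) pairs covering the first min(total_count, hard_cap) hits."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    limit = min(total_count, hard_cap)
    for retstart in range(0, limit, batch_size):
        # The last window may be shorter than batch_size.
        yield retstart, min(batch_size, limit - retstart)


def collect_ids(search_page, total_count, batch_size=500):
    """Call search_page(retstart, retmax) once per window and concatenate the UIDs."""
    ids = []
    for retstart, retmax in esearch_windows(total_count, batch_size):
        ids.extend(search_page(retstart, retmax))
    return ids


# Stand-in for a real ESearch call (hypothetical data, no network access).
fake_hits = [str(n) for n in range(1, 26)]


def fake_search_page(retstart, retmax):
    return fake_hits[retstart:retstart + retmax]


print(list(esearch_windows(25, batch_size=10)))  # [(0, 10), (10, 10), (20, 5)]
print(len(collect_ids(fake_search_page, 25, batch_size=10)))  # 25
```

With Biopython, `search_page` could wrap `Entrez.read(Entrez.esearch(db="pubmed", term=search_term, retstart=retstart, retmax=retmax))["IdList"]`, and `total_count` could be taken from `int(record["Count"])` of a first search.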

Load the full set of results extracted from PubMed with EDirect

In [None]:
## Note: change the directory below to the location of your own extracted results
extracted_pubmed_file_path = '/Users/Weiqi0/ISB_working/Ilya_lab/Translator/Pharmagenomics_KG/Pubmed_query_results/'


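import os

file_path = 'AML_Pubmed_abstracts.txt'
## Build the full path once so the existence check and open() refer to the same file
full_path = os.path.join(extracted_pubmed_file_path, file_path)
if os.path.exists(full_path):
    with open(full_path, 'r') as file:
        content = file.read()
    print(content)
else:
    print(f"Error: File '{full_path}' not found.")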
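
Once the file is loaded, the text usually needs to be split into one record per article before any extraction step. The sketch below is a heuristic that assumes the file uses the same plain-text abstract layout as the `efetch` output shown earlier, where each record opens with a numbered citation line such as `1. Curr Issues Mol Biol. ...` after a blank line. `split_records` and `first_line` are hypothetical helpers, and the sample text is made up for illustration. An abstract that itself contains a paragraph starting with a number and a period would be split incorrectly.

```python
import re

# Assumption: a new record starts with "<number>. " at the beginning of a paragraph.
RECORD_START = re.compile(r"\n\s*\n(?=\d+\. )")


def split_records(text):
    """Split a PubMed plain-text export into one string per article (heuristic)."""
    chunks = RECORD_START.split(text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def first_line(record):
    """Return the citation line that opens a record."""
    return record.splitlines()[0]


# Made-up sample in the same layout, used only to exercise the splitter.
sample = (
    "1. Journal A. 2025;1(1):10.\n\n"
    "First title.\n\n"
    "First abstract text.\n\n"
    "2. Journal B. 2024;2(2):20.\n\n"
    "Second title.\n\n"
    "Second abstract text.\n"
)

records = split_records(sample)
print(len(records))           # 2
print(first_line(records[1]))  # 2. Journal B. 2024;2(2):20.
```

In the notebook, `split_records(content)` would be applied to the `content` string read from the file.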

Load the pre-trained language model (BiomedBERT) and try extracting information from the extracted abstracts

In [3]:
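## Workaround for the error "cannot import name 'PILLOW_VERSION' from 'PIL'".
## Requires Pillow to be installed (e.g. pip install Pillow); otherwise this import raises ModuleNotFoundError.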

from PIL import __version__ as PILLOW_VERSION

## Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext")
model = AutoModelForMaskedLM.from_pretrained("microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext")

ModuleNotFoundError: No module named 'PIL'
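
Because the checkpoint is loaded with `AutoModelForMaskedLM`, one simple way to probe it is to mask a term in a sentence and inspect the model's top predictions for that position. The following is a rough sketch under that assumption, not the notebook's actual extraction method; `mask_term` and `top_mask_predictions` are hypothetical helpers, and the example sentence is taken from the abstract printed earlier. Only the masking step is executed here, since the prediction step needs the model and PyTorch to be available.

```python
def mask_term(sentence, term, mask_token="[MASK]"):
    """Replace the first occurrence of `term` with the mask token."""
    if term not in sentence:
        raise ValueError(f"{term!r} does not occur in the sentence")
    return sentence.replace(term, mask_token, 1)


def top_mask_predictions(text, tokenizer, model, k=5):
    """Return the k most likely (token, probability) pairs for the first mask in `text`."""
    import torch  # imported lazily so mask_term works without PyTorch installed

    inputs = tokenizer(text, return_tensors="pt")
    # Positions in the input sequence that hold the mask token.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probabilities = logits[0, mask_positions[0]].softmax(dim=-1)
    top = probabilities.topk(k)
    return [
        (tokenizer.decode([token_id]), float(prob))
        for token_id, prob in zip(top.indices.tolist(), top.values.tolist())
    ]


sentence = (
    "DOT1L inhibitors have shown potential as new therapeutic "
    "opportunities for NPM1-mutant AML."
)
masked = mask_term(sentence, "AML")
print(masked)
```

With the tokenizer and model loaded, a call such as `top_mask_predictions(mask_term(sentence, "AML", tokenizer.mask_token), tokenizer, model)` would return candidate tokens for the masked position.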

In [None]:
## 2. Use a DeepSeek LLM locally


In [None]:
## 1. Prompt engineering method



In [None]:
## 3. Fine-tuning with FDA drugs

## drug or drug combination
## or diseases 