To retrieve scientific documents related to healthspan and lifespan from PubMed using Python, you can leverage LangChain's integration with the PubMed API. LangChain offers tools like `PubMedRetriever` and `PubMedLoader` to facilitate this process. Here's how you can use the retriever:

1. Using PubMedRetriever:

The `PubMedRetriever` fetches document summaries that match a query. The links below document the underlying `PubMedAPIWrapper` utility:
 - [LangChain Python API](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.pubmed.PubMedAPIWrapper.html)
 - [GitHub](https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/utilities/pubmed.py)

Installation:

LangChain's PubMed utilities parse PubMed's XML responses with `xmltodict`, so make sure that package is installed alongside LangChain:

In [None]:
pip install xmltodict


Implementation:

In [4]:
from langchain.retrievers import PubMedRetriever

# Initialize the retriever
retriever = PubMedRetriever()

# Define your query
query = "healthspan and lifespan"

# Retrieve documents
docs = retriever.get_relevant_documents(query)

# Display the results with metadata inspection
for doc in docs:
    print("Available metadata keys:", doc.metadata.keys())
    title = doc.metadata.get('title', 'No title available')
    authors = doc.metadata.get('authors', 'No authors available')
    source = doc.metadata.get('source', 'No source available')
    print(f"Title: {title}")
    print(f"Authors: {authors}")
    print(f"Source: {source}")
    print("-" * 50)


Available metadata keys: dict_keys(['uid', 'Title', 'Published', 'Copyright Information'])
Title: No title available
Authors: No authors available
Source: No source available
--------------------------------------------------
Available metadata keys: dict_keys(['uid', 'Title', 'Published', 'Copyright Information'])
Title: No title available
Authors: No authors available
Source: No source available
--------------------------------------------------
Available metadata keys: dict_keys(['uid', 'Title', 'Published', 'Copyright Information'])
Title: No title available
Authors: No authors available
Source: No source available
--------------------------------------------------
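
The output above shows why every lookup fell back to its default: metadata keys are case-sensitive, and the keys reported by this run are `uid`, `Title`, `Published`, and `Copyright Information`. There are no `title`, `authors`, or `source` entries. The following is a minimal sketch of the display step using the keys that were actually returned. The helper name `format_doc_metadata` and the sample dictionary are illustrative assumptions, not part of LangChain.

```python
def format_doc_metadata(metadata: dict) -> str:
    """Format the metadata keys observed in the run above into a short summary."""
    uid = metadata.get("uid", "No PMID available")
    title = metadata.get("Title", "No title available")
    published = metadata.get("Published", "No publication date available")
    return f"PMID: {uid}\nTitle: {title}\nPublished: {published}"


# Hypothetical metadata shaped like the keys printed above (values are made up).
sample = {
    "uid": "00000000",
    "Title": "Example title",
    "Published": "2024-01-01",
    "Copyright Information": "",
}
print(format_doc_metadata(sample))
```

With real results, the same helper could be applied in the loop as `print(format_doc_metadata(doc.metadata))`.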


To retrieve scientific documents from PubMed using a list of 69 keywords, you can combine the keywords into a single query. A query of that length is easiest to build and run programmatically in Python. Here's how you can approach this:

1. Constructing the Query:

PubMed's search syntax allows the use of Boolean operators like AND and OR to combine search terms. To search for documents containing any of your specified keywords, you can combine them using the OR operator. For example, if your keywords are stored in a list named keywords, you can create a query string as follows:

In [6]:
# Placeholder entries; replace with your full list of 69 keywords
keywords = ["keyword1", "keyword2", "keyword3", "keyword69"]
query = " OR ".join(keywords)


This concatenates all your keywords into a single string separated by OR, instructing PubMed to retrieve documents that contain any of these terms.
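
One detail a plain join does not handle is multi-word keywords: PubMed treats a double-quoted phrase as a unit, so quoting phrases before joining keeps each keyword intact. The sketch below shows one way to do that. The function name `build_or_query` and the example keywords are assumptions made for illustration.

```python
def build_or_query(keywords: list[str]) -> str:
    """Join keywords with OR, quoting multi-word phrases and wrapping the result in parentheses."""
    terms = []
    for keyword in keywords:
        cleaned = keyword.strip()
        if not cleaned:
            continue  # skip empty entries
        # Quote phrases so PubMed searches them as a unit
        if " " in cleaned and not cleaned.startswith('"'):
            cleaned = f'"{cleaned}"'
        terms.append(cleaned)
    return "(" + " OR ".join(terms) + ")"


# Illustrative keywords, not the real list
example_keywords = ["healthspan", "lifespan", "caloric restriction"]
print(build_or_query(example_keywords))
```

The surrounding parentheses make it safe to combine the result with further clauses, for example by appending an `AND` condition later.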

2. Using the PubMed API with Python:

To interact with PubMed programmatically, you can utilize the Biopython library, which provides an interface to the Entrez API. First, ensure you have Biopython installed:

In [7]:
pip install biopython


Next, you can use the following script to search PubMed with your constructed query and fetch the abstracts of the matching articles:

In [9]:
from Bio import Entrez

# Set your email here; NCBI requires it for identification
Entrez.email = "your_email@example.com"

# Construct the search query
keywords = ["keyword1", "keyword2", "keyword3", "keyword69"]
query = " OR ".join(keywords)

# Search PubMed
handle = Entrez.esearch(db="pubmed", term=query, retmax=100)
record = Entrez.read(handle)
handle.close()

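# Extract the list of PubMed IDs (PMIDs) from the search result
pmids = record["IdList"]

if pmids:
    # Retrieve the corresponding abstracts as plain text
    handle = Entrez.efetch(db="pubmed", id=pmids, rettype="abstract", retmode="text")
    articles = handle.read()
    handle.close()

    # Output the results
    print(articles)
else:
    # Nothing to fetch when the search returns no IDs
    print("No articles matched the query.")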


1. Heliyon. 2024 Aug 31;10(17):e36727. doi: 10.1016/j.heliyon.2024.e36727. 
eCollection 2024 Sep 15.

Modulation of liver cholesterol homeostasis by choline supplementation during 
fibrosis resolution.

Saijou E(1), Kamiya Y(1), Fujiki K(2), Shirahige K(2), Nakato R(1).

Author information:
(1)Laboratory of Computational Genomics, Institute for Quantitative Biosciences, 
The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.
(2)Laboratory of Genome Structure and Function, Institute for Quantitative 
Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-0032, 
Japan.

Liver fibrosis is a critical global health challenge, often leading to severe 
liver diseases without timely intervention. Choline deficiency has been linked 
to metabolic dysfunction associated steatohepatitis (MASH) and liver fibrosis, 
suggesting choline supplementation as a potential therapeutic approach. This 
study aimed to explore the therapeutic potential of choline supplementat

#### Explanation:

 - **Email Identification:** Set `Entrez.email` to your email address. NCBI uses it to identify the requester and to contact you about problems, such as excessive usage, before blocking access.

 - **Search Query:** The `esearch` function searches the PubMed database (`db="pubmed"`) using your constructed query (`term=query`). The `retmax` parameter specifies the maximum number of results to retrieve; adjust this number based on your needs.

 - **Fetching Articles:** The `efetch` function retrieves the records for the list of PMIDs obtained from the search. The `rettype="abstract"` and `retmode="text"` parameters request each citation with its abstract in plain text format, which is what the output above shows.
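
Plain-text abstracts are easy to read but awkward to process further. As a hedged alternative, the sketch below requests XML instead and pulls out a few fields. The helper names `summarize_article` and `fetch_summaries` are hypothetical, and the nesting (`MedlineCitation`, `Article`, `ArticleTitle`, `Abstract`, `AbstractText`) is an assumption about how Biopython parses PubMed XML; check it against a real record before relying on it.

```python
def summarize_article(article: dict) -> dict:
    """Pull PMID, title, and abstract text out of one parsed PubmedArticle record."""
    citation = article["MedlineCitation"]
    details = citation["Article"]
    # The abstract is optional; AbstractText is assumed to be a list of sections
    sections = details.get("Abstract", {}).get("AbstractText", [])
    return {
        "pmid": str(citation["PMID"]),
        "title": str(details.get("ArticleTitle", "")),
        "abstract": " ".join(str(section) for section in sections),
    }


def fetch_summaries(pmids: list[str]) -> list[dict]:
    """Fetch records as XML and summarize them (needs network access and Biopython)."""
    from Bio import Entrez  # imported here so summarize_article works without Biopython

    # Entrez.email should already be set, as in the script above
    handle = Entrez.efetch(db="pubmed", id=pmids, retmode="xml")
    records = Entrez.read(handle)
    handle.close()
    return [summarize_article(article) for article in records["PubmedArticle"]]


# Hypothetical record with the nesting assumed above (values are made up)
sample_article = {
    "MedlineCitation": {
        "PMID": "00000000",
        "Article": {
            "ArticleTitle": "Example title",
            "Abstract": {"AbstractText": ["First section.", "Second section."]},
        },
    }
}
print(summarize_article(sample_article))
```

In the notebook, `fetch_summaries(pmids)` would take the place of the plain-text `efetch` call when structured fields are needed.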

#### Considerations:

 - *API Usage Limits:* NCBI imposes usage limits on the Entrez API: 3 requests per second without an API key and 10 per second with one. Ensure you adhere to their guidelines to avoid being blocked. Detailed information can be found in the [NCBI Entrez Programming Utilities Help](https://www.ncbi.nlm.nih.gov/books/NBK25497/).

 - *Query Length:* PubMed has a maximum query length. If your combined query exceeds this limit, you may need to split your keywords into smaller groups and perform multiple searches.

 - *Advanced Query Techniques:* For complex queries, consider using PubMed's advanced search features, such as field tags and proximity operators, to refine your search results. More information is available in the [PubMed User Guide](https://pubmed.ncbi.nlm.nih.gov/help/).
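
The splitting and field-tag suggestions can be sketched together as follows. The helper names `chunk_keywords` and `tagged_or_query` are hypothetical, the batch size of 25 is an arbitrary assumption rather than a PubMed limit, and `[Title/Abstract]` is used as one example of a field tag.

```python
def chunk_keywords(keywords: list[str], size: int) -> list[list[str]]:
    """Split the keyword list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]


def tagged_or_query(keywords: list[str], field: str = "Title/Abstract") -> str:
    """OR-join keywords, restricting each one to the given PubMed field tag."""
    return " OR ".join(f'"{keyword}"[{field}]' for keyword in keywords)


# 69 placeholder keywords standing in for the real list
all_keywords = [f"keyword{n}" for n in range(1, 70)]

# One query per batch of 25 keywords (batches of 25, 25, and 19)
queries = [tagged_or_query(group) for group in chunk_keywords(all_keywords, 25)]
print(len(queries))
print(queries[0][:70])
```

Each entry in `queries` could then be passed as `term` to `Entrez.esearch`, with the returned PMIDs collected in a `set` to drop duplicates and a short pause between requests to stay within NCBI's limits.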