**PubMed Searcher Demo**

This notebook demonstrates how to use the `PubMedSearcher` class to search for, download, and explore references from PubMed queries.

To create an instance of the `PubMedSearcher` class, provide a search string, an email address, and an optional dataframe that will be used to store the results.

The main methods of the class are:
- `search`: Searches PubMed for references. Populates the `df` attribute with the results, including the title, abstract, and other metadata. Does not need to be called if a dataframe is provided at instantiation.
- `check_open_access`: Checks whether the articles in the search results are open access. Adds a column to `df` indicating whether each article is open access, along with the URL of the open-access version. This works even if you didn't use the `search` method, as long as `df` has a `doi` column.
- `fetch_references`: Fetches the references of each article in the search results using APIs. Returns a JSON object with the references.
- `standardize_references`: Standardizes the references in the search results into a new column. Values are lists of dicts with these keys: `['doi', 'pmid', 'pmcid', 'title', 'authors']`.
- `fetch_cited_by`: Fetches the articles that cite each article in the search results using APIs. Returns a JSON object with the citing articles.
- `save`: Saves the dataframe with the search results to a CSV file (default name is `master_list.csv`).
- `save_abstracts_as_csv`: Saves the abstracts of the search results to a CSV file (default name is `abstracts.csv`).
- `download_articles`: Optional. Downloads PDFs for the articles found in the search. Uses open-access links when available, but can fall back to Sci-Hub.
- `download_xml_fulltext`: Optional. Downloads the full text of the articles in XML format. Not commonly available, but works for articles in the PubMed Central open-access subset.
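
As a rough illustration of the structure that `standardize_references` is described as producing, the sketch below builds two hand-written reference lists with the five documented keys and tallies which DOIs are cited by more than one article. The sample values, the value types (for example, `authors` as a plain string), and the helper name `count_shared_dois` are assumptions made for illustration; none of them come from `PubMedSearcher` itself.

```python
from collections import Counter

# Hand-written stand-ins for two cells of the standardized references column.
# Each cell is a list of dicts with the keys: doi, pmid, pmcid, title, authors.
# All values below are placeholders, not real references.
refs_article_a = [
    {"doi": "10.1000/example.1", "pmid": "11111111", "pmcid": None,
     "title": "Placeholder reference one", "authors": "Doe J"},
    {"doi": None, "pmid": "22222222", "pmcid": None,
     "title": "Placeholder reference two", "authors": "Roe R"},
]
refs_article_b = [
    {"doi": "10.1000/EXAMPLE.1", "pmid": None, "pmcid": None,
     "title": "Placeholder reference one", "authors": "Doe J"},
    {"doi": "10.1000/example.3", "pmid": None, "pmcid": None,
     "title": "Placeholder reference three", "authors": "Poe P"},
]


def count_shared_dois(reference_lists):
    """Count how many articles cite each DOI (hypothetical helper)."""
    counts = Counter()
    for refs in reference_lists:
        # DOIs are case-insensitive, so compare them in lowercase.
        # A set ensures a DOI repeated within one article is counted once.
        dois = {ref["doi"].lower() for ref in refs if ref.get("doi")}
        counts.update(dois)
    return counts


shared = count_shared_dois([refs_article_a, refs_article_b])
print([doi for doi, n in shared.items() if n > 1])  # ['10.1000/example.1']
```

References without a DOI are skipped here; a fuller version might fall back to `pmid` or `pmcid` when matching.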


In [3]:
from calvin_utils.gpt_sys_review.pubmed_utils import PubMedSearcher
import pandas as pd

# Example usage
search_string = """
("brain lesion" 
    OR "brain tumor" 
    OR "brain mass" 
    OR "intracranial lesion" 
    OR "intracranial tumor"
    OR "intracranial mass") 
AND 
    "humans"[MeSH Terms]
"""
email = "lesion_bank@gmail.com"

# Uncomment the following lines if you have a master list of references to search for citing articles
# df = pd.read_csv('master_list.csv')
# searcher = PubMedSearcher(search_string=search_string, email=email, df=df)

searcher = PubMedSearcher(search_string=search_string, email=email)
searcher.search(count=10, min_date=2012, order_by='relevance') # Searches PubMed for the search string; returns the first 10 results from 2012 onwards
searcher.check_open_access() # Optional (will determine if the articles are open access and find pdf links if they are, but takes longer)
searcher.fetch_references() # Queries PubMed, PubMed Central, Europe PMC, and Crossref; keeps whatever it finds first
searcher.standardize_references() # Standardizes the references to a list of dictionaries (keys: doi, pmid, pmcid, title, authors)
searcher.fetch_cited_by() # Looks for articles that cite the given paper (not very dependable; works best with Europe PMC)

# Optional: download the articles as PDFs
# searcher.download_articles(download_directory='PDFs', allow_pypaperbot=True, save_progress=True, max_downloads=50) # Optional (prefers open access, but falls back to SciHub if allowed)

# Optional: Save the abstracts to a separate CSV file. Not really necessary since the abstracts are already in the master list
# searcher.save_abstracts_as_csv() # If wanted, saves the PMIDs and abstracts to a CSV file 

searcher.save('master_list.csv') # Saves the master list of references to a CSV file
searcher.df

Checking Open Access: 100%|██████████| 10/10 [00:02<00:00,  4.19it/s]
Fetching References: 100%|██████████| 10/10 [00:07<00:00,  1.29it/s]
Standardizing references: 100%|██████████| 10/10 [00:00<00:00, 6348.27it/s]
Fetching Cited By: 100%|██████████| 10/10 [00:01<00:00,  9.68it/s]


Unnamed: 0,pmid,title,authors,first_author,abstract,publication_date,publication_year,journal_info,doi,pmcid,...,is_oa,best_oa_location_url,pdf_url_1,pdf_url_2,pdf_url_3,pdf_url_4,europe_pmc_url,references,references_standardized,cited_by
0,30084265,An overview of meningiomas.,"Buerki, Robin A; Horbinski, Craig M; Kruser, T...",Buerki,Meningiomas are the most common primary intrac...,"{'Year': '2018', 'Month': '08', 'Day': '07'}",2018.0,"Future oncology (London, England)",10.2217/fon-2018-0006,PMC6123887,...,True,https://europepmc.org/articles/pmc6123887?pdf=...,https://europepmc.org/articles/pmc6123887?pdf=...,,,,https://europepmc.org/articles/pmc6123887,[{'citation': 'Chamberlain MC. Meningiomas. In...,"[{'doi': None, 'pmid': None, 'pmcid': None, 't...","[{'source': 'MED', 'citationType': 'review; jo..."
1,32094452,Meningeal lymphatic vessels regulate brain tum...,"Hu, Xueting; Deng, Qiuping; Ma, Lu; Li, Qingqi...",Hu,Recent studies have shown that meningeal lymph...,"{'Year': '2020', 'Month': '02', 'Day': '24'}",2020.0,Cell research,10.1038/s41422-020-0287-8,PMC7054407,...,True,https://www.nature.com/articles/s41422-020-028...,https://www.nature.com/articles/s41422-020-028...,https://europepmc.org/articles/pmc7054407?pdf=...,,,https://europepmc.org/articles/pmc7054407,"[{'citation': 'Engelhardt B, Vajkoczy P, Welle...","[{'doi': None, 'pmid': '28092374', 'pmcid': No...","[{'source': 'MED', 'citationType': 'review-art..."
2,36219688,Cellular immunotherapy for medulloblastoma.,"Schakelaar, Michael Y; Monnikhof, Matthijs; Cr...",Schakelaar,Medulloblastoma (MB) is the most common malign...,{},,Neuro-oncology,10.1093/neuonc/noac236,PMC10076947,...,True,https://academic.oup.com/neuro-oncology/advanc...,https://academic.oup.com/neuro-oncology/advanc...,,,,,"[{'citation': 'Northcott PA, Robinson GW, Krat...","[{'doi': None, 'pmid': '30765705', 'pmcid': No...",
3,35009911,Brain Tumor/Mass Classification Framework Usin...,"Alanazi, Muhannad Faleh; Ali, Muhammad Umair; ...",Alanazi,"With the advancement in technology, machine le...","{'Year': '2022', 'Month': '01', 'Day': '04'}",2022.0,"Sensors (Basel, Switzerland)",10.3390/s22010372,PMC8749789,...,True,https://www.mdpi.com/1424-8220/22/1/372/pdf?ve...,https://www.mdpi.com/1424-8220/22/1/372/pdf?ve...,,,,,"[{'citation': 'Louis D.N., Perry A., Reifenber...","[{'doi': '10.1007/s00401-016-1545-1', 'pmid': ...",
4,32734466,Approach to an Intracranial Mass in Patients W...,"Elicer, Isabel",Elicer,Space-occupying lesions represent a diagnostic...,"{'Year': '2020', 'Month': '07', 'Day': '30'}",2020.0,Current neurology and neuroscience reports,10.1007/s11910-020-01058-y,,...,False,,,,,,,"[{'key': '1058_CR1', 'doi-asserted-by': 'cross...","[{'doi': '10.1371/journal.pone.0098666', 'pmid...",
5,1547580,Intracranial inflammatory pseudotumor.,"Sitton, J E; Harkin, J C; Gerber, M A",Sitton,An intracranial mass thought clinically and by...,{},,Clinical neuropathology,,,...,False,,,,,,,Not found,[],
6,24331626,Primary leptomeningeal melanoma.,"Xie, Zhao-Yu; Hsieh, Kevin Li-Chun; Tsang, Yuk...",Xie,Primary melanoma of the central nervous system...,"{'Year': '2013', 'Month': '10', 'Day': '23'}",2013.0,Journal of clinical neuroscience : official jo...,10.1016/j.jocn.2013.08.018,,...,False,,,,,,,"[{'key': '10.1016/j.jocn.2013.08.018_b0005', '...","[{'doi': '10.1016/j.jocn.2009.12.020', 'pmid':...",
7,34718234,Decreases in Brain Size and Encephalization in...,"Stibel, Jeff Morgan",Stibel,Growth in human brain size and encephalization...,"{'Year': '2021', 'Month': '10', 'Day': '29'}",2021.0,"Brain, behavior and evolution",10.1159/000519504,,...,True,https://www.karger.com/Article/Pdf/519504,https://www.karger.com/Article/Pdf/519504,,,,,"[{'key': 'ref1', 'doi-asserted-by': 'publisher...","[{'doi': '10.1007/s12110-008-9054-0', 'pmid': ...",
8,28571951,Update in mild traumatic brain injury.,"Freire-Aragón, María Dolores; Rodríguez-Rodríg...",Freire-Aragón,There has been concern for many years regardin...,"{'Year': '2017', 'Month': '05', 'Day': '29'}",2017.0,Medicina clinica,10.1016/j.medcli.2017.05.002,,...,False,,,,,,,[{'key': '10.1016/j.medcli.2017.05.002_bib0305...,"[{'doi': '10.1089/neu.2015.4126', 'pmid': None...",
9,34153803,The multifactorial roles of microglia and macr...,"Chaudhary, Rishabh; Morris, Rhianna J; Steinso...",Chaudhary,"The functional characteristics of glial cells,...","{'Year': '2021', 'Month': '06', 'Day': '15'}",2021.0,Journal of neuroimmunology,10.1016/j.jneuroim.2021.577633,,...,False,,,,,,,"[{'issue': '3–4', 'key': '10.1016/j.jneuroim.2...","[{'doi': '10.1007/s12031-017-0980-3', 'pmid': ...",
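
Because `save` writes a CSV file, list- and dict-valued columns such as `references_standardized` and `publication_date` will generally come back as plain strings when the file is read again. The sketch below shows one way to recover them. It assumes the cells were written as Python literals (as the output above suggests) and that missing values appear as empty cells, `{}`, or the string `Not found`. The helpers `parse_cell` and `to_date` and the three sample rows are hypothetical and are not part of `PubMedSearcher`.

```python
import ast
import csv
import io
from datetime import date

# Build a tiny in-memory stand-in for master_list.csv. The csv module calls
# str() on non-string values, which mimics how objects end up as text on disk.
rows = [
    {"pmid": "1",
     "publication_date": {"Year": "2018", "Month": "08", "Day": "07"},
     "references_standardized": [{"doi": None, "pmid": "123"}]},
    {"pmid": "2", "publication_date": {}, "references_standardized": "Not found"},
    {"pmid": "3", "publication_date": "", "references_standardized": ""},
]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)


def parse_cell(text, default):
    """Turn a stringified list/dict back into a Python object (hypothetical helper)."""
    if not text or text == "Not found":
        return default
    try:
        # literal_eval only accepts Python literals, so it does not run code.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return default


def to_date(parts):
    """Convert a {'Year', 'Month', 'Day'} dict of strings to a date, or None."""
    try:
        return date(int(parts["Year"]), int(parts["Month"]), int(parts["Day"]))
    except (KeyError, TypeError, ValueError):
        return None


records = []
for row in csv.DictReader(buffer):
    refs = parse_cell(row["references_standardized"], default=[])
    published = to_date(parse_cell(row["publication_date"], default={}))
    records.append((row["pmid"], published, len(refs)))

for record in records:
    print(record)
```

With pandas, the same helper could be applied cell by cell to a column loaded with `pd.read_csv`; empty cells then arrive as `NaN` rather than empty strings, which `parse_cell` also maps to the default because `ast.literal_eval` rejects non-string input.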
