## PubMed Search Demo

This notebook demonstrates how to use the `PubMedSearcher` class to search PubMed, then download and explore the references a query returns.

To create an instance of the `PubMedSearcher` class, provide a search string, an email address, and, optionally, a dataframe that will be used to store the results.

The two main methods of the class are:
- `search`: Searches for references in PubMed. This populates the `df` attribute with the results, including the title, abstract, and other metadata. It does not need to be called if a dataframe is provided at instantiation.
- `download_articles`: Downloads the PDFs of the articles found in the search, using open-access links when available and optionally falling back to Sci-Hub. Uses PyPaperRetriever to download the articles.

Other methods are available for more specific use cases:

- `save_abstracts_as_csv`: Saves the abstracts of the search results to a CSV file (default name: `abstracts.csv`).
- `extract_images`: Extracts images from the articles after they have been downloaded.
- `fetch_references`: Fetches the references of each article in the search results using APIs. Returns a JSON object with the references.
- `standardize_references`: Standardizes the references in the search results into a new column. Each value is a list of dicts with these keys: `['doi', 'pmid', 'pmcid', 'title', 'authors']`.
- `fetch_cited_by`: Fetches the articles that cite each article in the search results using APIs. Returns a JSON object with the citing articles.
- `save`: Saves the dataframe with the search results to a CSV file (default name: `master_list.csv`).
- `download_xml_fulltext`: Optional. Downloads the full text of the articles in XML format. Full-text XML is not commonly available, but this works for articles in the PubMed Central (PMC) open-access subset.
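
The list-of-dicts shape described for `standardize_references` is easy to post-process with plain Python. The sketch below is not part of `PubMedSearcher`: the helper name `collect_reference_dois` and the sample records are hypothetical, and only the five keys come from the description above. It assumes that missing identifiers are stored as `None` or an empty string.

```python
from typing import Dict, Iterable, List, Optional

# Keys documented for each standardized reference.
REFERENCE_KEYS = ["doi", "pmid", "pmcid", "title", "authors"]


def collect_reference_dois(references: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Return unique, lower-cased DOIs from a list of reference dicts, in first-seen order."""
    seen = set()
    dois = []
    for ref in references:
        doi = ref.get("doi")
        if not doi:
            # Skip references without a DOI (assumed to be None or "").
            continue
        doi = doi.strip().lower()
        if doi not in seen:
            seen.add(doi)
            dois.append(doi)
    return dois


# Hypothetical sample records, made up for illustration only.
sample_references = [
    {"doi": "10.1000/example.1", "pmid": "1", "pmcid": None, "title": "First example", "authors": "Doe, J"},
    {"doi": None, "pmid": "2", "pmcid": None, "title": "No DOI available", "authors": "Roe, R"},
    {"doi": "10.1000/EXAMPLE.1", "pmid": "1", "pmcid": None, "title": "First example", "authors": "Doe, J"},
    {"doi": "10.1000/example.2", "pmid": "3", "pmcid": None, "title": "Second example", "authors": "Poe, P"},
]

unique_dois = collect_reference_dois(sample_references)
print(unique_dois)
```

Lower-casing before comparison treats DOIs that differ only in case as the same reference, which is a common way to find duplicates across articles.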


### Option 1A: Populate search object using CSV from a previous search
If you are starting a search from scratch, skip this cell and use Option 1B instead.

In [9]:
from calvin_utils.gpt_sys_review.pubmed_utils import PubMedSearcher
import pandas as pd

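search_result_path = "master_list.csv"  # Replace with the path to your saved search results

search_string = "brain lesions AND covid-19"

email = "youremail@gmail.com"  # Replace with your own email address

# Load the previous results and hand them to the searcher; search() is then not needed
searcher = PubMedSearcher(search_string=search_string, email=email, df=pd.read_csv(search_result_path))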
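
When a CSV is written with the dataframe index included, pandas reads that index back as an extra column named `Unnamed: 0`. The sketch below shows one way to avoid this when reloading results. It uses an in-memory stand-in for the file, with placeholder titles, and it assumes the saved file contains the index as its first column, which may not be true of your `master_list.csv`.

```python
import io

import pandas as pd

# Stand-in for a previously saved file; the first, unnamed column is the saved index.
# The titles are placeholders, not real records.
csv_text = """,pmid,title
0,34105198,Example title A
1,35701598,Example title B
"""

# Default read: the saved index comes back as a column named "Unnamed: 0".
naive = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index and keep PMIDs as strings rather than integers.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype={"pmid": str})

print(list(naive.columns))
print(list(clean.columns))
```

Reading `pmid` as a string keeps identifiers intact when they are later compared with IDs returned by other APIs.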

### Option 1B: Perform a PubMed search directly from the notebook

In [10]:
from calvin_utils.gpt_sys_review.pubmed_utils import PubMedSearcher
import pandas as pd

search_string = "brain lesions AND covid-19"

email = "youremail@gmail.com"

searcher = PubMedSearcher(search_string=search_string, email=email)

searcher.search(
    count=10,
    min_date=1975,
    order_by="relevance",
    only_open_access=False,
    only_case_reports=False
)

display(searcher.df)

Unnamed: 0,pmid,title,authors,first_author,abstract,publication_date,publication_year,journal_info,doi,pmcid,keywords,article_type,country,language
0,34105198,COVID-19 associated brain/spinal cord lesions ...,"Lewis, Ariane; Jain, Rajan; Frontera, Jennifer...",Lewis,We reviewed the literature to evaluate cerebro...,"{'Year': '2021', 'Month': '06', 'Day': '08'}",2021,Journal of neuroimaging : official journal of ...,10.1111/jon.12880,PMC8242764,COVID-19; MRI; SARS-CoV-2; cerebrospinal fluid...,Journal Article; Meta-Analysis; Review; System...,United States,eng
1,35701598,Selective visuoconstructional impairment follo...,"de Paula, Jonas Jardim; Paiva, Rachel E R P; S...",de Paula,People recovered from COVID-19 may still prese...,"{'Year': '2022', 'Month': '06', 'Day': '14'}",2022,Molecular psychiatry,10.1038/s41380-022-01632-5,PMC9196149,,"Journal Article; Research Support, Non-U.S. Gov't",England,eng
2,36445631,Neurological Complications Following COVID-19 ...,"Chatterjee, Aparajita; Chakravarty, Ambar",Chatterjee,A variety of neurological complications have b...,"{'Year': '2022', 'Month': '11', 'Day': '29'}",2022,Current neurology and neuroscience reports,10.1007/s11910-022-01247-x,PMC9707152,COVID-19 vaccination; Cortical sinus venous th...,Journal Article; Review,United States,eng
3,33416999,Rituximab for the treatment of multiple sclero...,"Chisari, Clara Grazia; Sgarlata, Eleonora; Are...",Chisari,"In the last decades, evidence suggesting the d...","{'Year': '2021', 'Month': '01', 'Day': '08'}",2021,Journal of neurology,10.1007/s00415-020-10362-z,PMC7790722,Efficacy; Multiple sclerosis; Rituximab; Safety,Journal Article; Review,Germany,eng
4,36119649,COVID-19-Associated Stroke.,"Shchukin, I A; Fidler, M S; Koltsov, I A; Suvo...",Shchukin,The COVID-19 pandemic has had significant infl...,"{'Year': '2022', 'Month': '09', 'Day': '13'}",2022,Neuroscience and behavioral physiology,10.1007/s11055-022-01291-7,PMC9468522,Mexidol; angiotensin receptors; coronavirus in...,Journal Article,United States,eng
5,34718113,Long-COVID: Cognitive deficits (brain fog) and...,"Hugon, Jacques",Hugon,,"{'Year': '2021', 'Month': '10', 'Day': '28'}",2021,"Presse medicale (Paris, France : 1983)",10.1016/j.lpm.2021.104090,PMC8552626,,Editorial,France,eng
6,37309119,Pathogenesis and Management of Acute Necrotizi...,"Qin, Ningxiang; Wang, Jing; Peng, Xi; Wang, Liang",Qin,"During the COVID-19 pandemic, many cases of ac...","{'Year': '2023', 'Month': '06', 'Day': '13'}",2023,Expert review of neurotherapeutics,10.1080/14737175.2023.2224503,,COVID-19; accessory infection; acute necrotizi...,"Journal Article; Review; Research Support, Non...",England,eng
7,37535100,Brain MRI findings in neurologically symptomat...,"Afsahi, Amir Masoud; Norbash, Alexander M; Sye...",Afsahi,Coronavirus disease 2019 (COVID-19) has been a...,"{'Year': '2023', 'Month': '08', 'Day': '03'}",2023,Journal of neurology,10.1007/s00415-023-11914-9,,COVID-19; MRI; Magnetic resonance imaging; Neu...,Journal Article; Review,Germany,eng
8,35731277,NMOSD typical brain lesions after COVID-19 mRN...,"Lévi-Strauss, Julie; Provost, Corentin; Wane, ...",Lévi-Strauss,,"{'Year': '2022', 'Month': '06', 'Day': '22'}",2022,Journal of neurology,10.1007/s00415-022-11229-1,PMC9214460,,Letter,Germany,eng
9,35001988,Central Nervous System Lesions in COVID-19.,"Kurushina, O V; Barulin, A E",Kurushina,This review discusses current data on CNS lesi...,"{'Year': '2022', 'Month': '01', 'Day': '03'}",2022,Neuroscience and behavioral physiology,10.1007/s11055-021-01183-2,PMC8720549,COVID-19; acute hemorrhagic necrotic encephalo...,Journal Article,United States,eng
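
The `publication_date` column above holds a `Year`/`Month`/`Day` record rather than a date, and after a round trip through CSV it will be the string form of that record. The sketch below converts either form into a `datetime.date`. The helper name `parse_publication_date` is hypothetical, and the code assumes numeric months like those shown in the output; a missing `Month` or `Day` defaults to 1.

```python
import ast
from datetime import date
from typing import Union


def parse_publication_date(value: Union[str, dict]) -> date:
    """Convert a {'Year', 'Month', 'Day'} record (dict or its string form) to a date."""
    # literal_eval safely parses the string form without executing arbitrary code.
    record = ast.literal_eval(value) if isinstance(value, str) else value
    # Assumption: a missing Month or Day defaults to 1.
    return date(
        int(record["Year"]),
        int(record.get("Month", 1)),
        int(record.get("Day", 1)),
    )


raw = "{'Year': '2021', 'Month': '06', 'Day': '08'}"
parsed = parse_publication_date(raw)
print(parsed.isoformat())  # 2021-06-08
```

Applied to the whole column, this would look like `searcher.df["publication_date"].apply(parse_publication_date)`, provided every row follows the format shown above.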


### 2. Download the articles to your local machine

In [11]:
searcher.download_articles(
    download_directory="pdf_downloads",
    allow_scihub=True
)

Downloading articles: 100%|██████████| 10/10 [00:00<00:00, 771.62it/s]


<calvin_utils.gpt_sys_review.pubmed_utils.PubMedSearcher at 0x1474fa990>
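
The progress bar reports how many articles were processed, not how many PDFs were saved, so it can be useful to inspect the download directory afterwards. The sketch below only counts files with a `.pdf` extension. The helper name `list_downloaded_pdfs` is hypothetical, and nothing is assumed about how `download_articles` names its files.

```python
from pathlib import Path
from typing import List


def list_downloaded_pdfs(download_directory: str) -> List[Path]:
    """Return the PDF files under download_directory, sorted by path.

    Returns an empty list if the directory does not exist.
    """
    root = Path(download_directory)
    if not root.is_dir():
        return []
    # Search subdirectories too, in case files are grouped per article.
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")


pdfs = list_downloaded_pdfs("pdf_downloads")
print(f"{len(pdfs)} PDF file(s) found")
```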

### 3. Save the search results to a CSV file

In [12]:
searcher.save(csv_path="master_list.csv")