This repository contains a Python script to search and extract PubMed records based on specific authors and topics. The extracted data is stored in a Pandas DataFrame and saved to an Excel file.
YouTube video overview of the code: https://youtu.be/sGC66q45BX4
- Clone the repository:
git clone https://github.com/TLDWTutorials/PubMed-Data-Extraction.git
cd PubMed-Data-Extraction
- Install the required dependencies:
pip install pandas biopython
If pandas reports a missing Excel engine when it writes the output file, also install openpyxl (pip install openpyxl).
- Replace the placeholder email address with your own, so that NCBI can contact you if your Entrez requests cause problems:
Entrez.email = 'your.email@example.com'
- Customize the lists of authors and topics as needed:
authors = ['Bryan Holland', 'Mehmet Oz', 'Anthony Fauci']
topics = ['RNA', 'cardiovascular']
- Run the script:
python pubmed_extraction.py
- Authors: Modify the authors list with the names of authors you want to include in the search.
- Topics: Modify the topics list with the topics you want to include in the search.
- Date Range: Adjust the date_range variable to the desired date range for your search.
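To make the effect of these three settings more concrete, here is a minimal sketch of how they could be combined into a single PubMed query string. It is not necessarily how pubmed_extraction.py builds its query: the helper name build_query, the (start, end) tuple format for date_range, and the choice of the [Author], [Title/Abstract], and [Date - Publication] field tags are all assumptions made for illustration.

```python
def build_query(authors, topics, date_range=None):
    """Combine authors, topics, and an optional date range into one query string.

    Hypothetical helper: names within a list are joined with OR,
    and the separate groups are joined with AND.
    """
    clauses = []
    if authors:
        # Match any of the listed authors
        clauses.append("(" + " OR ".join(f"{a}[Author]" for a in authors) + ")")
    if topics:
        # Match any of the listed topics in the title or abstract
        clauses.append("(" + " OR ".join(f"{t}[Title/Abstract]" for t in topics) + ")")
    if date_range:
        # Assumed format: a (start, end) tuple of YYYY/MM/DD strings
        start, end = date_range
        clauses.append(
            f'("{start}"[Date - Publication] : "{end}"[Date - Publication])'
        )
    return " AND ".join(clauses)


authors = ['Bryan Holland', 'Mehmet Oz', 'Anthony Fauci']
topics = ['RNA', 'cardiovascular']
date_range = ('2012/03/01', '2022/12/31')  # example dates, not the script's defaults

query = build_query(authors, topics, date_range)
print(query)
```

With this structure, adding an author or topic widens the search, while the date range narrows it. Leaving either list empty simply drops that group from the query.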
The script will create an Excel file named PubMed_results.xlsx containing the following columns:
- PMID
- Title
- Abstract
- Authors
- Journal
- Keywords
- URL
- Affiliations
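As a rough illustration of where these columns might come from, the sketch below maps one MEDLINE-style record (a dictionary keyed by MEDLINE field tags such as TI, AB, and AU) onto a row with the columns listed above. The tag-to-column mapping, the helper name medline_to_row, and the sample record are assumptions for illustration only. The actual script may fill the columns differently.

```python
import pandas as pd

# Column order as listed in this README
COLUMNS = ["PMID", "Title", "Abstract", "Authors",
           "Journal", "Keywords", "URL", "Affiliations"]


def medline_to_row(record):
    """Convert one MEDLINE-style dict into a row (hypothetical helper)."""
    pmid = record.get("PMID", "")
    return {
        "PMID": pmid,
        "Title": record.get("TI", ""),
        "Abstract": record.get("AB", ""),
        "Authors": ", ".join(record.get("AU", [])),
        "Journal": record.get("TA", ""),
        "Keywords": ", ".join(record.get("OT", [])),
        # Each PubMed record has a page at this address, keyed by its PMID
        "URL": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/" if pmid else "",
        "Affiliations": "; ".join(record.get("AD", [])),
    }


# Made-up placeholder record, not real PubMed data
sample = {
    "PMID": "00000000",
    "TI": "Example title",
    "AB": "Example abstract.",
    "AU": ["Doe J", "Roe R"],
    "TA": "Example J",
    "OT": ["RNA"],
    "AD": ["Example University"],
}

df = pd.DataFrame([medline_to_row(sample)], columns=COLUMNS)
print(df.loc[0, "URL"])

# Writing the file requires an Excel engine such as openpyxl:
# df.to_excel("PubMed_results.xlsx", index=False)
```

Missing fields fall back to empty strings here, so a record without an abstract or keywords still produces a complete row.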
This project is licensed under the MIT License.