<a href="https://colab.research.google.com/github/cheung0/PubMed-Abstract-Downloader/blob/main/Download_PubMed_abstracts_with_Python_Tool.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Download PubMed abstracts with Python Tool

If you're a medical student or doctor, or you want to read PubMed articles about a medication or medical condition, this Python tool is for you. It downloads PubMed abstracts into a single text file, so you can read through them faster and save time.

By: [Michael Cheung](https://www.linkedin.com/in/michael-cheung0/)

Credits:
[GitHub Repo](https://github.com/erilu/pubmed-abstract-compiler)

**Import packages**

In [8]:
import csv
import re
import urllib.request  # urlopen lives in the urllib.request submodule
from time import sleep
import requests
from bs4 import BeautifulSoup

**Specify search query**

In [9]:
# Specify your search query here. Works on single words or multiple words.
# query = 'P2RY8'
query = 'Creatine Monohydrate'

# Formats query in correct format
def format_query(search_query):
    if ' ' not in search_query:
        query = search_query
    else:
        query = '"' + '+'.join(search_query.split()) + '"'
    return query

query = format_query(query)
print("Query: " + query)

Query: "Creatine+Monohydrate"
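
The `format_query` helper above joins words with `+` and wraps multi-word queries in double quotes. As a minimal sketch of an alternative, the standard library's `urllib.parse.quote_plus` can do the same joining while also escaping characters such as `"`, `&`, or `#` that have special meaning in a URL. The name `encode_query` is hypothetical and not part of the original notebook.

```python
from urllib.parse import quote_plus

def encode_query(search_query):
    """Hypothetical helper: URL-encode a PubMed search term.

    Multi-word queries are wrapped in double quotes so they are searched
    as a phrase, mirroring format_query above.
    """
    term = search_query.strip()
    if ' ' in term:
        term = '"' + term + '"'
    # quote_plus turns spaces into '+' and percent-encodes reserved characters
    return quote_plus(term)

print(encode_query('P2RY8'))
print(encode_query('Creatine Monohydrate'))
```

The encoded form differs from `format_query` only in that the surrounding quotes become `%22`.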


**URL with abstract IDs**

In [10]:
# common settings between esearch and efetch
base_url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/'
db = 'db=pubmed'

# esearch specific settings
search_eutil = 'esearch.fcgi?'
search_term = '&term=' + query
search_usehistory = '&usehistory=y'
# note: despite this setting, the response is parsed as XML by the regexes below
search_rettype = '&rettype=json'

search_url = base_url+search_eutil+db+search_term+search_usehistory+search_rettype
print(search_url)

f = urllib.request.urlopen(search_url)
search_data = f.read().decode('utf-8')

# obtain total abstract count
total_abstract_count = int(re.findall(r"<Count>(\d+?)</Count>", search_data)[0])

# obtain webenv and querykey settings for efetch command
# (raw strings avoid invalid-escape warnings for \d and \S)
fetch_webenv = "&WebEnv=" + re.findall(r"<WebEnv>(\S+)</WebEnv>", search_data)[0]
fetch_querykey = "&query_key=" + re.findall(r"<QueryKey>(\d+?)</QueryKey>", search_data)[0]

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="Creatine+Monohydrate"&usehistory=y&rettype=json
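
The cell above pulls `Count`, `WebEnv`, and `QueryKey` out of the XML response with regular expressions. A minimal sketch of the same step using the standard library's XML parser is shown below. The `sample_search_data` string is a made-up, trimmed response used only for illustration, and `parse_esearch` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up, trimmed esearch response used only for illustration.
sample_search_data = """<eSearchResult>
  <Count>1234</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>MCID_example</WebEnv>
</eSearchResult>"""

def parse_esearch(xml_text):
    """Hypothetical helper: extract the fields the efetch step needs."""
    root = ET.fromstring(xml_text)
    return {
        # findtext looks only at direct children of the root element
        'count': int(root.findtext('Count')),
        'query_key': root.findtext('QueryKey'),
        'webenv': root.findtext('WebEnv'),
    }

result = parse_esearch(sample_search_data)
print(result)
```

In the notebook, `parse_esearch(search_data)` would stand in for the three `re.findall` calls, assuming the response keeps these element names.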


**URL with abstract summaries**

You can further filter the results by changing the optional values. For example, change `retmax` (return max) to limit the number of abstracts returned, or `retstart` to choose the index of the first abstract.

In [11]:
# other efetch settings
fetch_eutil = 'efetch.fcgi?'
retmax = 100
retstart = 0
fetch_retstart = "&retstart=" + str(retstart)
fetch_retmax = "&retmax=" + str(retmax)
fetch_retmode = "&retmode=text"
fetch_rettype = "&rettype=abstract"

fetch_url = base_url+fetch_eutil+db+fetch_querykey+fetch_webenv+fetch_retstart+fetch_retmax+fetch_retmode+fetch_rettype
print(fetch_url)

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&query_key=1&WebEnv=MCID_65af63a157600d0bec5c1d31&retstart=0&retmax=100&retmode=text&rettype=abstract
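
With `retmax = 100`, the URL above covers only the first 100 abstracts. When `total_abstract_count` is larger, one option is to step `retstart` forward in batches. The sketch below only builds the URLs and makes no network requests. `build_fetch_urls` is a hypothetical helper, and the `'MCID_example'` value and the count of 250 are placeholders.

```python
def build_fetch_urls(base_url, query_key, webenv, total_count, batch_size=100):
    """Hypothetical helper: one efetch URL per batch of abstracts."""
    urls = []
    # retstart takes the values 0, batch_size, 2 * batch_size, ...
    for retstart in range(0, total_count, batch_size):
        urls.append(
            base_url + 'efetch.fcgi?db=pubmed'
            + '&query_key=' + str(query_key)
            + '&WebEnv=' + webenv
            + '&retstart=' + str(retstart)
            + '&retmax=' + str(batch_size)
            + '&retmode=text&rettype=abstract'
        )
    return urls

example_urls = build_fetch_urls(
    'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/', '1', 'MCID_example', 250)
for u in example_urls:
    print(u)
```

When downloading each batch in turn, a short pause such as `sleep(0.5)` between requests helps stay within NCBI's guideline of at most three requests per second without an API key.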


**Download the abstracts into a text file**

In [12]:
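def download_webpage(url):
    """Fetch the URL and return its content as plain text, or None on failure."""
    response = requests.get(url)
    if response.status_code == 200:
        # get_text() strips any markup and leaves only the text content
        soup = BeautifulSoup(response.content, 'html.parser')
        text = soup.get_text()
        return text
    else:
        print("Failed to download.")
        return None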

def save_text_to_file(text, filename):
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(text)
    print("Text saved to", filename)

# Example usage:
url = fetch_url
filename = query + " PubMed Abstracts.txt"

webpage_text = download_webpage(url)
if webpage_text:
    save_text_to_file(webpage_text, filename)

Text saved to "Creatine+Monohydrate" PubMed Abstracts.txt
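
As the output shows, the saved filename keeps the double quotes and the `+` from the formatted query. Double quotes are not allowed in Windows filenames, so a small cleanup step may be useful. The sketch below is one possible approach. `safe_filename` is a hypothetical helper, not part of the original notebook.

```python
import re

def safe_filename(query, suffix=' PubMed Abstracts.txt'):
    """Hypothetical helper: build a filename from the formatted query."""
    # Turn the '+' separators back into spaces
    name = query.replace('+', ' ')
    # Drop characters that Windows does not allow in filenames
    name = re.sub(r'[<>:"/\\|?*]', '', name).strip()
    return name + suffix

print(safe_filename('"Creatine+Monohydrate"'))
```

In the notebook, `filename = safe_filename(query)` would replace the string concatenation used above.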
