# Query Pubmed

This is a simple notebook to query pubmed for a list of authors and format the results.

In [1]:
from Bio import Entrez

Define some functions to help

In [2]:
def search(query,maxresults='20'):
    Entrez.email = 'awalsh17@its.jnj.com'
    handle = Entrez.esearch(db='pubmed', 
                            sort='relevance', 
                            retmax=maxresults,
                            retmode='xml', 
                            term=query)
    results = Entrez.read(handle)
    return results

In [3]:
def fetch_details(id_list):
    ids = ','.join(id_list)
    Entrez.email = 'awalsh17@its.jnj.com'
    handle = Entrez.efetch(db='pubmed',
                           retmode='xml',
                           id=ids)
    results = Entrez.read(handle)
    return results

Test it out on one query

In [4]:
results = search('(Walsh, Alice[Author])AND("2016/01/01"[Date - Publication] : "3000"[Date - Publication])')
id_list = results['IdList']
papers = fetch_details(id_list)

In [5]:
results.keys()

dict_keys(['Count', 'RetMax', 'RetStart', 'IdList', 'TranslationSet', 'TranslationStack', 'QueryTranslation'])

In [6]:
results['Count']

'6'

In [7]:
papers.keys()

dict_keys(['PubmedArticle', 'PubmedBookArticle'])

In [8]:
papers['PubmedArticle'][0].keys()

dict_keys(['MedlineCitation', 'PubmedData'])

In [9]:
for i, paper in enumerate(papers['PubmedArticle']):
    print("%d) %s" % (i+1, paper['MedlineCitation']['Article']['ArticleTitle']))


1) Immune checkpoint inhibitor PD-1 pathway is down-regulated in synovium at various stages of rheumatoid arthritis disease progression.
2) Triple DMARD treatment in early rheumatoid arthritis modulates synovial T cell activation and plasmablast/plasma cell differentiation pathways.
3) Integrative analysis reveals CD38 as a therapeutic target for plasma cell-rich pre-disease and established rheumatoid arthritis and systemic lupus erythematosus.
4) CD40L-Dependent Pathway Is Active at Various Stages of Rheumatoid Arthritis Disease Progression.
5) Joint-specific DNA methylation and transcriptome signatures in rheumatoid arthritis identify distinct pathogenic processes.
6) Integrative genomic deconvolution of rheumatoid arthritis GWAS loci into gene and cell type associations.


Trying to figure out what all the fields returned are...

In [11]:
import json
print(json.dumps(papers['PubmedArticle'][0], indent=2, separators=(',', ':')))
#print(json.dumps(allpapers[0], indent=2, separators=(',', ':')))



{
  "MedlineCitation":{
    "CitationSubset":[
      "IM"
    ],
    "OtherID":[],
    "OtherAbstract":[],
    "KeywordList":[],
    "SpaceFlightMission":[],
    "GeneralNote":[],
    "PMID":"29489833",
    "DateCompleted":{
      "Year":"2018",
      "Month":"04",
      "Day":"06"
    },
    "DateRevised":{
      "Year":"2018",
      "Month":"04",
      "Day":"06"
    },
    "Article":{
      "ELocationID":[
        "10.1371/journal.pone.0192704"
      ],
      "Language":[
        "eng"
      ],
      "ArticleDate":[
        {
          "Year":"2018",
          "Month":"02",
          "Day":"28"
        }
      ],
      "Journal":{
        "ISSN":"1932-6203",
        "JournalIssue":{
          "Volume":"13",
          "Issue":"2",
          "PubDate":{
            "Year":"2018"
          }
        },
        "Title":"PloS one",
        "ISOAbbreviation":"PLoS ONE"
      },
      "ArticleTitle":"Immune checkpoint inhibitor PD-1 pathway is down-regulated in synovium at various stages o

Now run for a list of authors/search terms

Will want to modify to accept txt file input with all authors on new lines

In [12]:
authors = ['Ian Anderson', 'Biao Zheng', 'Jacqueline Benson', 'Louise Jopling', 'Michael Elliot', 'Murray McKinnon', 'Alison Budelsky', 'Carl Manthey', 'Paul Dudas', 'Glenda Castro', 'Ryan Eberwine', 'Wai Ping Leung', 'Julianty Angsana', 'Aimee Rose De Leon-Tabaldo', 'Rosa Luna-Roman', 'Xiaohua Xue', 'Anish Suri', 'Arlette Kouwenhoven', 'Bertrand Van Schoubroeck', 'Cheryl Sweeney', 'Frederik Stevenaert', 'Nathan Felix', 'Mairin O’Brien', 'Gavin Hirst', 'Alec Lebsack', 'Jennifer Venable', 'Craig Woods', 'John Keith', 'Ronald Wolin', 'Steven Meduna', 'Wendy Eccles', 'Virginia Tanis', 'William Jones', 'Marcos A Sainz', 'Charlotte Deckhut', 'Hariharan Venkatesan', 'David Kummer', 'Kelly McClure', 'Rachel Nishimura', 'Jianmei Wei', 'Wenying Chai', 'Steven Goldberg', 'Connor Martin', 'Mark Tichenor', 'Elizabeth Fennema', 'Genesis Bacani', 'Russell Smith', 'Aihua Wang', 'Joseph Barbay', 'Kevin Kreutter', 'Paul Krawczuk', 'Jennifer Towne', 'Jacqueline Perrigoue', 'Aisling O’Hara', 'Polina Mamontov', 'Eric Neiman', 'Lani San Mateo', 'Fang Teng', 'Gerardus Bongers', 'Joshua Wertheimer', 'Sreedevi Adhikarakunnathu', 'Karl Kavalkovich', 'Andrew Baltus', 'Ping Ling', 'Jian Zhu', 'Jingxue Yu', 'Siquan Sun', 'Janise Deming', 'Mandana Tootoonchi', 'Mihee Kim', 'Ronald Marchelletta', 'Maxim Poustovoitov', 'Tatiana Ort', 'Ashok Mathur', 'Holger Babbe', 'Lisa Madge', 'Michael Scully', 'Sweta Patel', 'Zachary Hutchins', 'Lauren Eorio', 'Loui Madakamutil', 'Kenneth Kilgore', 'Christine McCauley', 'M Merle Elloso', 'Brian Jones', 'Fang Shen', 'Ann Cai', 'Isabelle Baribaud', 'Yevgeniya Orlovsky', 'Navin Rao', 'Karen Duffy', 'Robert Kuhn', 'Michael Huber', 'Stephane Becart', 'Steven Nguyen', 'Samuel Sihapong', 'Pratima Bansal-Pakala', 'Changbao Liu', 'Melissa Swiecki', 'Yawei Li', 'Sunil Nagpal', 'Suzanne Cole', 'Xuefeng Yin', 'Yanxia Guo', 'Tadimeti Rao', 'Holly Raymond', 'Kyle Bednar', 'Matthias Hesse', 'Ravi Malaviya', 'Heather Deutsch', 'Michael Harris', 'Zhao Zhou', 'Karen Ngo', 'Leon Chang', 'Natasha Rozenkrants', 'Roy Jia-Chian Li', 'Chunxu Gao', 'Debra Gardner', 'Jeffrey Hall', 'Shannon Hitchcock', 'Tanzilya Khayrullina', 'Pejman Soroosh', 'Gavin Lewis', 'German Aleman Muench', 'Homayon Banie''Ian Anderson', 'Biao Zheng', 'Jacqueline Benson', 'Louise Jopling', 'Michael Elliot', 'Murray McKinnon', 'Alison Budelsky', 'Carl Manthey', 'Paul Dudas', 'Glenda Castro', 'Ryan Eberwine', 'Wai Ping Leung', 'Julianty Angsana', 'Aimee Rose De Leon-Tabaldo', 'Rosa Luna-Roman', 'Xiaohua Xue', 'Anish Suri', 'Arlette Kouwenhoven', 'Bertrand Van Schoubroeck', 'Cheryl Sweeney', 'Frederik Stevenaert', 'Nathan Felix', 'Mairin O’Brien', 'Gavin Hirst', 'Alec Lebsack', 'Jennifer Venable', 'Craig Woods', 'John Keith', 'Ronald Wolin', 'Steven Meduna', 'Wendy Eccles', 'Virginia Tanis', 'William Jones', 'Marcos A Sainz', 'Charlotte Deckhut', 'Hariharan Venkatesan', 'David Kummer', 'Kelly McClure', 'Rachel Nishimura', 'Jianmei Wei', 'Wenying Chai', 'Steven Goldberg', 'Connor Martin', 'Mark Tichenor', 'Elizabeth Fennema', 'Genesis Bacani', 'Russell Smith', 'Aihua Wang', 'Joseph Barbay', 'Kevin Kreutter', 'Paul Krawczuk', 'Jennifer Towne', 'Jacqueline Perrigoue', 'Aisling O’Hara', 'Polina Mamontov', 'Eric Neiman', 'Lani San Mateo', 'Fang Teng', 'Gerardus Bongers', 'Joshua Wertheimer', 'Sreedevi Adhikarakunnathu', 'Karl Kavalkovich', 'Andrew Baltus', 'Ping Ling', 'Jian Zhu', 'Jingxue Yu', 'Siquan Sun', 'Janise Deming', 'Mandana Tootoonchi', 'Mihee Kim', 'Ronald Marchelletta', 'Maxim Poustovoitov', 'Tatiana Ort', 'Ashok Mathur', 'Holger Babbe', 'Lisa Madge', 'Michael Scully', 'Sweta Patel', 'Zachary Hutchins', 'Lauren Eorio', 'Loui Madakamutil', 'Kenneth Kilgore', 'Christine McCauley', 'M Merle Elloso', 'Brian Jones', 'Fang Shen', 'Ann Cai', 'Isabelle Baribaud', 'Yevgeniya Orlovsky', 'Navin Rao', 'Karen Duffy', 'Robert Kuhn', 'Michael Huber', 'Stephane Becart', 'Steven Nguyen', 'Samuel Sihapong', 'Pratima Bansal-Pakala', 'Changbao Liu', 'Melissa Swiecki', 'Yawei Li', 'Sunil Nagpal', 'Suzanne Cole', 'Xuefeng Yin', 'Yanxia Guo', 'Tadimeti Rao', 'Holly Raymond', 'Kyle Bednar', 'Matthias Hesse', 'Ravi Malaviya', 'Heather Deutsch', 'Michael Harris', 'Zhao Zhou', 'Karen Ngo', 'Leon Chang', 'Natasha Rozenkrants', 'Roy Jia-Chian Li', 'Chunxu Gao', 'Debra Gardner', 'Jeffrey Hall', 'Shannon Hitchcock', 'Tanzilya Khayrullina', 'Pejman Soroosh', 'Gavin Lewis', 'German Aleman Muench', 'Homayon Banie']
#authors = ['George Vratsanos', 'Carrie Brodmerkel', 'Ernesto Munoz', 'Patrick Branigan', 'Samuel DePrimo', 'Monika Banaszewska', 'Xuejun Liu', 'Yanqing Chen', 'Joshua Friedman', 'Frederic Baribaud', 'Ling-Yang Hao', 'Shannon Telesco', 'Katherine Li', 'Bryan Linggi', 'Calixte Monast', 'Eric Wadman', 'Karen Hayden', 'Daniel Horowitz', 'Lynn Tomsho', 'Sunita Bhagat', 'Takahiro Sato', 'Kim Campbell', 'Bidisha Dasgupta', 'Chris Huang', 'Keying Ma', 'Kristen Sweet', 'Matthew Loza', 'Carol Franks', 'Nancy Peffer', 'Donald Raible ', 'Bart Frederick', 'Alexa Piantone', 'Bart van Hartingsveldt', 'Wendy Cordier', 'Cathye Shu', 'Stanley Marciniak', 'Dick De Vries', 'Lisa Pierre', 'William Barchuk', 'Qingmin Wang', 'Donna Moore', 'Ian Gourley', 'Cesar Calderon', 'Mark Rigby', 'Paul Dunford', 'Richard Strauss', 'Radu Dobrin', 'Amy Hart', 'Alice Walsh', 'Aleksandar Stojmirovic']
#authors = ['Bensley, Karen', 'Bili, Androniki', 'Caroselli, Madeline', 'Chan, Daphne', 'Clark, Michael', 'Fei, Kaiyin', 'Flavin, Susan', 'Goldstein, Neil', 'Greenbaum, Linda', 'Hackman, Sandy', 'Harrison, Diane', 'Hewitt, Bill', 'Hsia, Elizabeth', 'Hsu, Benjamin', 'Jacobstein, Doug', 'Kollmeier, Alexa', 'Marano, Colleen', 'Marty-Ethgen, Pascale']
dates = '("2017/01/01"[Date - Publication] : "3000"[Date - Publication])'
queries = ['(('+x+'[Author])'+'AND'+dates+')AND Janssen[Affiliation]' for x in authors]

In [13]:
allpapers = []
matched_author = []
i = 0
for name in queries:
    author_search = authors[i]
    results = search(name)
    id_list = results['IdList']
    i += 1
    if len(id_list)>0:
        papers = fetch_details(id_list)
        allpapers = allpapers + papers['PubmedArticle']
        matched_author = matched_author + [author_search] * len(papers['PubmedArticle'])

In [14]:
outfile = open("Disc.txt",'w')
all_titles = []
count = 1
for i, paper in enumerate(allpapers):
    title = paper['MedlineCitation']['Article']['ArticleTitle']
    journal = ''
    if 'ISOAbbreviation' in paper['MedlineCitation']['Article']['Journal'].keys():
        journal = paper['MedlineCitation']['Article']['Journal']['ISOAbbreviation']
    elif 'MedlineJournalInfo' in paper['MedlineCitation'].keys():
        journal = paper['MedlineCitation']['MedlineJournalInfo']['MedlineTA']
    year = paper['MedlineCitation']['Article']['Journal']['JournalIssue']['PubDate']
    authors = paper['MedlineCitation']['Article']['AuthorList']
    #author_print = ', '.join([' '.join([x['LastName'],x['Initials']]) for x in authors])
    author_print = ''
    for x in authors:
        if 'CollectiveName' in x.keys():
            author_print = author_print + x['CollectiveName'] + ', '
        else:
            author_print = author_print + ' '.join([x['LastName'],x['Initials']]) + ', '    
    if 'Year' in year.keys():
        year = year['Year']
    elif 'MedlineDate' in year.keys():
        year = year['MedlineDate']
    if title not in all_titles:    
        newline  = "%d) %s\"%s\" %s (%s)\n" % (count, author_print, title, journal, year)
        print(newline)
        outfile.write("%s\n" % (newline))
        count += 1
    all_titles.append(title)
    
outfile.close()

1) Cole S, Walsh A, Yin X, Wechalekar MD, Smith MD, Proudman SM, Veale DJ, Fearon U, Pitzalis C, Humby F, Bombardieri M, Axel A, Adams H, Chiu C, Sharp M, Alvarez J, Anderson I, Madakamutil L, Nagpal S, Guo Y, "Integrative analysis reveals CD38 as a therapeutic target for plasma cell-rich pre-disease and established rheumatoid arthritis and systemic lupus erythematosus." Arthritis Res. Ther. (2018)

2) Guo Y, Walsh AM, Canavan M, Wechalekar MD, Cole S, Yin X, Scott B, Loza M, Orr C, McGarry T, Bombardieri M, Humby F, Proudman SM, Pitzalis C, Smith MD, Friedman JR, Anderson I, Madakamutil L, Veale DJ, Fearon U, Nagpal S, "Immune checkpoint inhibitor PD-1 pathway is down-regulated in synovium at various stages of rheumatoid arthritis disease progression." PLoS ONE (2018)

3) Barbay JK, Cummings MD, Abad M, Castro G, Kreutter KD, Kummer DA, Maharoof U, Milligan C, Nishimura R, Pierce J, Schalk-Hihi C, Spurlino J, Tanis VM, Urbanski M, Venkatesan H, Wang A, Woods C, Wolin R, Xue X, Edwards

In [15]:
len(allpapers)

191

To read into Word document, open and set to Unicode UTF-8 for symbol characters to be displayed correctly.