# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 8 - Literature Analyses**
Welcome to the eighth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code to retrieve data available on the WormBase sites and perform simple analyses with it.

This tutorial covers retrieving literature-related information, similar to what can be obtained from the Textpresso Central website.
Let's get started!

We start by importing the libraries required for the analysis. The literature information itself comes from the Europe PMC RESTful API.

In [1]:
import requests, sys, json, urllib3, xml.dom.minidom
from lxml import etree
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

Let us first explore the search fields that are available in the Europe PMC API. Each field name can be used to restrict a query, in the form FIELD:value.

In [2]:
#Generate the required URL for fetching the fields and send the request
request = requests.get('https://www.ebi.ac.uk/europepmc/webservices/rest/fields', headers={ "Content-Type" : "application/json", "Accept" : ""})

#Raise an error if the request failed, otherwise print the output from the query
request.raise_for_status()
result = xml.dom.minidom.parseString(request.text)
result = result.toprettyxml()
print(result)

<?xml version="1.0" ?>
<searchTermsList>
	<searchTermList>
		<searchTerms>
			<term>ABBR</term>
		</searchTerms>
		<searchTerms>
			<term>ABSTRACT</term>
		</searchTerms>
		<searchTerms>
			<term>ACCESSION_ID</term>
		</searchTerms>
		<searchTerms>
			<term>ACCESSION_TYPE</term>
		</searchTerms>
		<searchTerms>
			<term>ACK_FUND</term>
		</searchTerms>
		<searchTerms>
			<term>AFF</term>
		</searchTerms>
		<searchTerms>
			<term>ANNOTATION_PROVIDER</term>
		</searchTerms>
		<searchTerms>
			<term>ANNOTATION_TYPE</term>
		</searchTerms>
		<searchTerms>
			<term>APPENDIX</term>
		</searchTerms>
		<searchTerms>
			<term>ARXPR_PUBS</term>
		</searchTerms>
		<searchTerms>
			<term>AUTH</term>
		</searchTerms>
		<searchTerms>
			<term>AUTHORID</term>
		</searchTerms>
		<searchTerms>
			<term>AUTHORID_TYPE</term>
		</searchTerms>
		<searchTerms>
			<term>AUTHOR_ROLES</term>
		</searchTerms>
		<searchTerms>
			<term>AUTH_CON</term>
		</searchTerms>
		<searchTerms>
			<term>AUTH_FIRST</term>
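
The pretty-printed XML is long, so it can be handy to pull just the field names into a Python list. The following is a minimal sketch, assuming the response keeps the searchTermsList/searchTerms/term structure shown above. The helper name `extract_field_names` is our own and not part of the Europe PMC API, and the sample is a shortened stand-in for the real response.

```python
import xml.etree.ElementTree as ET

def extract_field_names(xml_text):
    """Return the text of every <term> element in a /fields response."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the nesting depth of <term> does not matter
    return [term.text.strip() for term in root.iter('term') if term.text]

# A small sample shaped like the output above (not the full response)
sample = """<searchTermsList>
  <searchTermList>
    <searchTerms><term>ABBR</term></searchTerms>
    <searchTerms><term>ABSTRACT</term></searchTerms>
    <searchTerms><term>AUTH</term></searchTerms>
  </searchTermList>
</searchTermsList>"""

fields = extract_field_names(sample)
print(fields)
```

With the live response, the same helper could be called as `extract_field_names(request.text)` right after the request in the cell above.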
		

If you know the accession ID of a paper, you can download the supplementary material associated with it through the supplementaryFiles endpoint of the API.

In [3]:
#Generate the URL required for the query by entering the accession ID of the paper in the variable below.
pmcid = 'PMC3027648'
request = requests.get('https://www.ebi.ac.uk/europepmc/webservices/rest/'+pmcid+'/supplementaryFiles?includeInlineImage=true', headers={ "Content-Type" : "application/zip", "Accept" : ""}, stream=True)
#Stop here if the request failed, so that an error page is not saved as a .zip file
request.raise_for_status()

#Download the queried results to your system into a .zip file
target_path = 'supplementaryFiles.zip'
with open(target_path, 'wb') as handle:
    for chunk in request.iter_content(chunk_size=512):
        if chunk:
            handle.write(chunk)
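
Once the archive is on disk, a quick look at its contents confirms that the download worked. The following is a minimal sketch using Python's standard zipfile module. The helper name `list_zip_members` is our own, and the demo archive (with an invented file name and contents) is created locally so that the example runs without network access.

```python
import os
import tempfile
import zipfile

def list_zip_members(zip_path):
    """Return (file name, uncompressed size in bytes) for each archive member."""
    if not zipfile.is_zipfile(zip_path):
        # An error page saved under a .zip name would end up here
        raise ValueError(zip_path + ' is not a valid zip archive')
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]

# Demo on a throwaway archive (a stand-in for supplementaryFiles.zip)
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as archive:
    archive.writestr('table_S1.csv', 'gene,count\nunc-22,5\n')

members = list_zip_members(demo_zip)
print(members)
```

For the file downloaded above, the call would be `list_zip_members('supplementaryFiles.zip')`.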

Searching for papers that contain a certain keyword is a common task. For this we define a function, which you do not need to modify, that runs the keyword query against the entire Europe PMC database.

In [4]:
def searchEuropePMCclient(query, format='XML'):
    base_url = 'https://www.ebi.ac.uk/europepmc/webservices/rest/search?'
    payload = {'query' : query, 'format' : format}
    request = requests.get(base_url, params=payload)
    if request.ok:
        result = xml.dom.minidom.parseString(request.text)
        result = result.toprettyxml()
        print(result)
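    else:
        #Report the HTTP status code so that the cause of the failure can be looked up
        print('Request failed with HTTP status', request.status_code)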

In [5]:
#Enter the keyword that you want to search for
keyword = 'Caenorhabditis elegans'
searchEuropePMCclient(keyword)

<?xml version="1.0" ?>
<responseWrapper xmlns:slx="http://www.scholix.org" xmlns:epmc="https://www.europepmc.org/data">
	<version>6.5</version>
	<hitCount>79766</hitCount>
	<nextCursorMark>AoIIQj3FSig0NDAzNTAwMw==</nextCursorMark>
	<request>
		<queryString>Caenorhabditis elegans</queryString>
		<resultType>lite</resultType>
		<cursorMark>*</cursorMark>
		<pageSize>25</pageSize>
		<sort/>
		<synonym>false</synonym>
	</request>
	<resultList>
		<result>
			<id>34335159</id>
			<source>MED</source>
			<pmid>34335159</pmid>
			<pmcid>PMC8319666</pmcid>
			<fullTextIdList>
				<fullTextId>PMC8319666</fullTextId>
			</fullTextIdList>
			<doi>10.3389/fnins.2021.678590</doi>
			<title>Regulation of Satiety Quiescence by Neuropeptide Signaling in &lt;i&gt;Caenorhabditis elegans&lt;/i&gt;.</title>
			<authorString>Makino M, Ulzii E, Shirasaki R, Kim J, You YJ.</authorString>
			<journalTitle>Front Neurosci</journalTitle>
			<journalVolume>15</journalVolume>
			<pubYear>2021</pubYear>
			<journalI
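
Printing the whole response is useful for exploration, but for analysis you usually want the individual fields. The following is a minimal sketch, assuming the response contains the hitCount, nextCursorMark and resultList/result elements shown above. `parse_search_response` is a hypothetical helper name, and the sample XML is abridged from the output above.

```python
import xml.etree.ElementTree as ET

def parse_search_response(xml_text):
    """Extract the hit count, next cursor mark and a few fields per result."""
    root = ET.fromstring(xml_text)
    records = []
    for result in root.iter('result'):
        records.append({
            'id': result.findtext('id'),
            'title': result.findtext('title'),
            'year': result.findtext('pubYear'),
        })
    return {
        'hit_count': int(root.findtext('hitCount', default='0')),
        'next_cursor': root.findtext('nextCursorMark'),
        'records': records,
    }

# Abridged copy of the response printed above
sample = """<responseWrapper>
  <hitCount>79766</hitCount>
  <nextCursorMark>AoIIQj3FSig0NDAzNTAwMw==</nextCursorMark>
  <resultList>
    <result>
      <id>34335159</id>
      <title>Regulation of Satiety Quiescence by Neuropeptide Signaling in &lt;i&gt;Caenorhabditis elegans&lt;/i&gt;.</title>
      <pubYear>2021</pubYear>
    </result>
  </resultList>
</responseWrapper>"""

summary = parse_search_response(sample)
print(summary['hit_count'], summary['records'][0]['id'])
```

The request echo in the output shows a cursorMark of `*` and a pageSize of 25, so only the first page is returned. Passing the `next_cursor` value back as the cursorMark parameter should retrieve the next page, but treat that as an assumption to check against the Europe PMC documentation.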

The Europe PMC API can also list the works of a given author. The query below uses the AUTHORID field, which takes the author's ORCID iD; to search by name instead, use the AUTH field from the fields list above.

In [7]:
#Enter the author's ORCID iD (to search by name, use the AUTH field in the query instead of AUTHORID)
author_id = '0000-0001-8314-8497'

In [9]:
#Generate the required URL for fetching the papers written by the author and send the request
request = requests.get('https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=AUTHORID:'+author_id, headers={ "Content-Type" : "application/json", "Accept" : ""})

#Raise an error if the request failed, otherwise print the output from the query
request.raise_for_status()
result = xml.dom.minidom.parseString(request.text)
result = result.toprettyxml()
print(result)

<?xml version="1.0" ?>
<responseWrapper xmlns:slx="http://www.scholix.org" xmlns:epmc="https://www.europepmc.org/data">
	<version>6.5</version>
	<hitCount>299</hitCount>
	<nextCursorMark>AoIIP5wE2CgzNzM3NzkzMQ==</nextCursorMark>
	<request>
		<queryString>AUTHORID:0000-0001-8314-8497</queryString>
		<resultType>lite</resultType>
		<cursorMark>*</cursorMark>
		<pageSize>25</pageSize>
		<sort/>
		<synonym>false</synonym>
	</request>
	<resultList>
		<result>
			<id>34264324</id>
			<source>MED</source>
			<pmid>34264324</pmid>
			<doi>10.1093/hmg/ddab198</doi>
			<title>The International Human Genome Project.</title>
			<authorString>Birney E.</authorString>
			<journalTitle>Hum Mol Genet</journalTitle>
			<pubYear>2021</pubYear>
			<journalIssn>0964-6906; 1460-2083; </journalIssn>
			<pubType>journal article</pubType>
			<isOpenAccess>N</isOpenAccess>
			<inEPMC>N</inEPMC>
			<inPMC>N</inPMC>
			<hasPDF>N</hasPDF>
			<hasBook>N</hasBook>
			<hasSuppl>N</hasSuppl>
			<citedByCount>0</cited
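
The AUTHORID field used above expects an identifier such as an ORCID iD, while the AUTH field from the fields list is the one for author names. The following is a minimal sketch of a helper that picks the field from the shape of the input. `build_author_query` and the regular expression are our own assumptions, not part of the API.

```python
import re

# ORCID iDs are four groups of four characters; the last one may be the check character X
ORCID_PATTERN = re.compile(r'^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$')

def build_author_query(author):
    """Return a Europe PMC query string for an ORCID iD or an author name."""
    author = author.strip()
    if ORCID_PATTERN.match(author):
        return 'AUTHORID:"' + author + '"'
    # Quote names so that 'Birney E' is searched as one phrase
    return 'AUTH:"' + author + '"'

orcid_query = build_author_query('0000-0001-8314-8497')
name_query = build_author_query('Birney E')
print(orcid_query)
print(name_query)
```

Either string can be passed to the searchEuropePMCclient function defined earlier, which sends it through the params argument of requests.get so that the quotes and spaces are URL-encoded.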

You can also list the papers that cite a given publication by providing the source of the paper and its external ID, which in most cases is its accession ID.

In [10]:
#Enter source and external id of the paper
source = 'MED' #Can be AGR, CBA, CTX, ETH, HIR, MED, PAT, PMC, PPR
external_id = '30206121'

In [11]:
#Generate the required URL for fetching the papers that cite the queried paper and send the request
request = requests.get('https://www.ebi.ac.uk/europepmc/webservices/rest/'+source+'/'+external_id+'/citations', headers={ "Content-Type" : "application/json", "Accept" : ""})

#Raise an error if the request failed, otherwise print the output from the query
request.raise_for_status()
result = xml.dom.minidom.parseString(request.text)
result = result.toprettyxml()
print(result)

<?xml version="1.0" ?>
<responseWrapper xmlns:slx="http://www.scholix.org" xmlns:epmc="https://www.europepmc.org/data">
	<version>6.5</version>
	<hitCount>13</hitCount>
	<request>
		<id>30206121</id>
		<source>MED</source>
		<offSet>0</offSet>
		<pageSize>25</pageSize>
	</request>
	<citationList>
		<citation>
			<id>34335159</id>
			<source>MED</source>
			<citationType>research-article; journal article</citationType>
			<title>Regulation of Satiety Quiescence by Neuropeptide Signaling in &lt;i&gt;Caenorhabditis elegans&lt;/i&gt;.</title>
			<authorString>Makino M, Ulzii E, Shirasaki R, Kim J, You YJ.</authorString>
			<journalAbbreviation>Front Neurosci</journalAbbreviation>
			<pubYear>2021</pubYear>
			<volume>15</volume>
			<pageInfo>678590</pageInfo>
			<citedByCount>0</citedByCount>
		</citation>
		<citation>
			<id>34179018</id>
			<source>MED</source>
			<citationType>research-article; journal article</citationType>
			<title>Oleic Acid Protects &lt;i&gt;Caenorhabditis&lt;/i&gt
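
To work with the citing papers rather than just read them, the citationList can be flattened into rows. The following is a minimal sketch, assuming each citation element carries the id, title, pubYear and citedByCount children shown above. `parse_citations` is a hypothetical helper; the first sample record is abridged from the output above, and the second one is an invented placeholder used only to show the sorting.

```python
import xml.etree.ElementTree as ET

def parse_citations(xml_text):
    """Return citing papers as dicts, most cited first."""
    root = ET.fromstring(xml_text)
    rows = []
    for citation in root.iter('citation'):
        rows.append({
            'id': citation.findtext('id'),
            'title': citation.findtext('title'),
            'year': int(citation.findtext('pubYear', default='0')),
            'cited_by': int(citation.findtext('citedByCount', default='0')),
        })
    # Sort so that the most-cited citing papers come first
    return sorted(rows, key=lambda row: row['cited_by'], reverse=True)

sample = """<responseWrapper>
  <hitCount>13</hitCount>
  <citationList>
    <citation>
      <id>34335159</id>
      <title>Regulation of Satiety Quiescence by Neuropeptide Signaling.</title>
      <pubYear>2021</pubYear>
      <citedByCount>0</citedByCount>
    </citation>
    <citation>
      <id>EXAMPLE-2</id>
      <title>Placeholder record, not real data</title>
      <pubYear>2020</pubYear>
      <citedByCount>3</citedByCount>
    </citation>
  </citationList>
</responseWrapper>"""

rows = parse_citations(sample)
for row in rows:
    print(row['id'], row['year'], row['cited_by'])
```

With the live response, `parse_citations(request.text)` would return the citations on the current page; the request echo above shows a pageSize of 25, so longer citation lists would need more than one request.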

This is the end of the tutorial on replicating Textpresso results with the Europe PMC RESTful API to obtain literature information. The data is up to date, quick to extract, and easy to handle.

This tutorial is also the end of the analysis series. In the next tutorial, we will implement and test some simple utilities for the data.