1. What is bibliographic and citation data?
2. What can we do with it?
  - MeSH & Abstract
  - Small Dashboard
  - Comparison
    - MeSH
    - RN/SB Enzymes

# Common Enzymes Among Two Diseases

The main goal of this project is to find the enzymes that the bibliographic data of any two query terms have in common. The terms can be two distinct or similar oncogenic diseases, viral or bacterial pathogens, or any other biomedical terms. As long as scientific literature exists for both query terms, this code can fetch their bibliographic and citation data.

Below I use two diseases with different origins, **Cancer** and **SARS-CoV-2**, as example query terms. We first fetch the bibliographic data from the Entrez database of NCBI, then process the data to extract only the enzymes. Using these enzymes as nodes, we then construct a network graph. The layout of the graph is shown below.

**FAQs**
1. Why bibliographic data?
- Frankly, I don't know how to fetch this information by another method or from another database. If I find a better way of doing this in the future, I'll update the code.

2. Why bother to find common enzymes among two diseases?
- I am curious about what two distinct things or topics have in common. I wanted to create a custom script that returns a list of the enzymes shared by two scientific terms or phenomena.
  
3. Does every article's citation & bibliographic data have an enzyme list?
- Not necessarily. Enzymes are listed only if they are mentioned in the article and their identifiers are included in its MeSH record.
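
To make the last answer concrete, here is a minimal sketch of the filter this notebook relies on: keep only the `RN` entries that start with `EC`. The helper name `extract_enzymes`, the PMIDs, and the two records are hypothetical and only mimic the dictionary shape of a parsed MEDLINE record.

```python
def extract_enzymes(record):
    """Return the RN entries of one record that start with 'EC'."""
    # A record without an RN field simply yields an empty list
    return [entry for entry in record.get("RN", []) if entry.startswith("EC")]


# Hypothetical records shaped like parsed MEDLINE dictionaries
records = [
    {"PMID": "0000001",
     "RN": ["EC 3.1.1.7 (Acetylcholinesterase)", "0 (Biomarkers)"]},
    {"PMID": "0000002", "TI": "An article without a substance list"},
]

for record in records:
    print(record["PMID"], extract_enzymes(record))
```

The second record has no `RN` field, so it contributes no enzymes, which is why not every article ends up in the graph.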

In [None]:
! pip install -q biopython

! pip install -q pyvis



In [None]:
# Fetching PubMed article metadata
from Bio import Entrez, Medline

# Graph creation and visualisation
from pyvis.network import Network
import networkx as nx 

import time
import os 
from operator import itemgetter

## Manual Text File

In [None]:
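def process_pmid_txt(text_file_path):
  """Returns the PMIDs listed one per line in a text file"""

  pmids = list()

  # The context manager closes the file even if reading fails
  with open(text_file_path, "r") as f:
    for pmid in f.read().split('\n'):
      # Skip blank lines (e.g. a trailing newline) so no empty PMID is requested
      if pmid.strip():
        pmids.append(pmid.strip())

  return pmids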

In [None]:
def fetch_data(pmids):
    """Returns MEDLINE/pubmed record associated with the PMID(s)"""
    
    Entrez.email = 'akishirsath@gmail.com'

    handle = Entrez.efetch(db="pubmed", 
                           id=pmids, 
                           rettype="medline", 
                           retmode="text")

    # Medline.parse is lazy, so read every record before closing the handle
    records = list(Medline.parse(handle))
    handle.close()

    return records

In [None]:
first_file = "/content/drive/MyDrive/05-Data/PubMed-Common-Enzymes/pmid-Dementia-set.txt"

second_file = "/content/drive/MyDrive/05-Data/PubMed-Common-Enzymes/pmid-Schizophre-set.txt"

In [None]:
first_pmids = process_pmid_txt(first_file)

first_data = fetch_data(",".join(first_pmids))

time.sleep(10)

second_pmids = process_pmid_txt(second_file)

second_data = fetch_data(",".join(second_pmids))
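
The cell above sends every PMID of a set in a single request. For long PMID lists it may be gentler to split the request into smaller batches. The sketch below shows one way to do that; the helper name `chunk_pmids` and the default `batch_size` of 200 are assumptions made for illustration, not values taken from this notebook.

```python
def chunk_pmids(pmids, batch_size=200):
    """Yield comma-separated PMID strings with at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pmids), batch_size):
        yield ",".join(pmids[start:start + batch_size])


# Hypothetical PMIDs, used only to show the batching
example_pmids = [str(n) for n in range(1, 6)]
batches = list(chunk_pmids(example_pmids, batch_size=2))
print(batches)
```

Each batch string could then be passed to `fetch_data`, with a short `time.sleep` between calls, and the returned lists concatenated.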

In [None]:
first_data[10]

{'AB': "Dementia raises many ethical issues. The present review, taking note of the fact that the stages of dementia raise distinct ethical issues, focuses on three issues associated with stages of dementia's progression: (1) how the emergence of preclinical and asymptomatic but at-risk categories for dementia creates complex questions about preventive measures, risk disclosure, and protection from stigma and discrimination; (2) how despite efforts at dementia prevention, important research continues to investigate ways to alleviate clinical dementia's symptoms, and requires additional human subjects protections to ethically enroll persons with dementia; and (3) how in spite of research and prevention efforts, persons continue to need to live with dementia. This review highlights two major themes. First is how expanding the boundaries of dementias such as Alzheimer's to include asymptomatic but at-risk persons generate new ethical questions. One promising way to address these questions

In [None]:
D = nx.Graph()

for record in first_data:
  substances = record.get('RN', "NONE")
  if substances != "NONE":
    for molecule in substances:
      if molecule.startswith('EC'):

        # Primary PMID node
        main_node = str(record.get('PMID', "NONE")).strip()
        D.add_node(main_node, ntype='Dementia_Primary')

        # Secondary Enzyme node
        D.add_node(molecule, ntype='Dementia_Secondary')
        D.add_edge(main_node, molecule)

In [None]:
D.number_of_nodes()

128

In [None]:
S = nx.Graph()

for record in second_data:
  substances = record.get('RN', "NONE")
  if substances != "NONE":
    for molecule in substances:
      if molecule.startswith('EC'):

        # Primary PMID node
        main_node = str(record.get('PMID', "NONE")).strip()
        S.add_node(main_node, ntype='Schizophrenia_Primary')

        # Secondary Enzyme node
        S.add_node(molecule, ntype='Schizophrenia_Secondary')
        S.add_edge(main_node, molecule)

In [None]:
S.number_of_nodes()

263

In [None]:
# Combining the two graphs
compose_graph = nx.compose(D, S)

In [None]:
compose_graph.number_of_nodes()

375
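
The node counts already hint at the overlap: 128 + 263 = 391, while the composed graph has 375 nodes, so 16 node names occur in both graphs. Those shared names may include PMIDs that appear in both sets as well as enzymes. A minimal sketch for listing only the shared enzymes follows; the helper name `common_enzymes` and the sample node lists are hypothetical.

```python
def common_enzymes(first_nodes, second_nodes):
    """Return the sorted enzyme names present in both node collections."""
    shared = set(first_nodes) & set(second_nodes)
    # Enzyme nodes are the ones whose name starts with 'EC'; PMIDs are dropped
    return sorted(name for name in shared if name.startswith("EC"))


# Hypothetical node names in the same style as the graphs above
dementia_nodes = ["111", "EC 3.1.1.7 (Acetylcholinesterase)",
                  "EC 2.7.11.13 (Protein Kinase C)"]
schizophrenia_nodes = ["222", "EC 2.1.1.6 (Catechol O-Methyltransferase)",
                       "EC 2.7.11.13 (Protein Kinase C)"]

print(common_enzymes(dementia_nodes, schizophrenia_nodes))
```

With the graphs built above, the call would be `common_enzymes(D.nodes, S.nodes)`. Note that when `nx.compose(D, S)` merges a node present in both graphs, the attributes from the second graph take precedence, so a shared enzyme carries the `Schizophrenia_Secondary` value of `ntype` in the composed graph.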

In [None]:
compose_graph.nodes()

NodeView(('22840750', 'EC 2.7.11.13 (Protein Kinase C)', '15265275', 'EC 1.14.13.39 (Nitric Oxide Synthase)', '6130593', 'EC 2.3.1.6 (Choline O-Acetyltransferase)', 'EC 3.1.1.7 (Acetylcholinesterase)', 'EC 4.1.1.15 (Glutamate Decarboxylase)', '16924032', 'EC 3.4.- (Amyloid Precursor Protein Secretases)', 'EC 3.4.- (Endopeptidases)', 'EC 3.4.23.- (Aspartic Acid Endopeptidases)', 'EC 3.4.23.46 (BACE1 protein, human)', '14739545', 'EC 1.1.1.27 (L-Lactate Dehydrogenase)', '26450764', 'EC 2.3.2.- (Aminoacyltransferases)', 'EC 2.3.2.5 (glutaminyl-peptide cyclotransferase)', '12938732', 'EC 1.6.3.- (NADPH Oxidases)', '21648315', 'EC 2.1.1.6 (Catechol O-Methyltransferase)', '8951800', 'EC 3.1.1.8 (Butyrylcholinesterase)', 'EC 3.1.1.8 (Cholinesterases)', '18525128', 'EC 3.4.21.- (KLK6 protein, human)', 'EC 3.4.21.- (Kallikreins)', '20405665', 'EC 3.6.1.- (Adenosine Triphosphatases)', 'EC 3.6.4.6 (Valosin Containing Protein)', '16190916', 'EC 3.6.1.- (TTF2 protein, human)', '10582609', 'EC 1.14.
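
In the output above, some labels share an EC number but differ in name, for example `EC 3.1.1.8 (Butyrylcholinesterase)` and `EC 3.1.1.8 (Cholinesterases)`. Because the graph uses the full label as the node name, such entries become separate nodes. The sketch below splits a label into its EC number and name so that entries could be grouped by number instead. The helper names and the regular expression are assumptions based on the label format visible in this output.

```python
import re
from collections import defaultdict

# Matches labels such as 'EC 3.4.23.- (Aspartic Acid Endopeptidases)'
EC_LABEL = re.compile(r"^EC\s+([0-9.\-]+)\s+\((.+)\)\s*$")


def split_ec_label(label):
    """Return (ec_number, name) for an enzyme label, or None for other nodes."""
    match = EC_LABEL.match(label)
    if match is None:
        return None
    return match.group(1), match.group(2)


def group_by_ec_number(labels):
    """Map each EC number to the list of names that were recorded under it."""
    groups = defaultdict(list)
    for label in labels:
        parsed = split_ec_label(label)
        if parsed is not None:
            groups[parsed[0]].append(parsed[1])
    return dict(groups)


sample = ["8951800", "EC 3.1.1.8 (Butyrylcholinesterase)",
          "EC 3.1.1.8 (Cholinesterases)", "EC 3.4.23.46 (BACE1 protein, human)"]
print(group_by_ec_number(sample))
```

PMID nodes such as `8951800` do not match the pattern and are skipped.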

In [None]:
nx.write_graphml_lxml(compose_graph, "common_dim_sch_enzyme.graphml")

In [None]:
from pyvis.network import Network

nt = Network(height='700px', width='81%', bgcolor='#222222', font_color='#ecf0f1')

nt.from_nx(compose_graph)

In [None]:
nt.set_options("""
var options = {
"edges": {
    "arrows": {
    "to": {
        "enabled": true,
        "scaleFactor": 0.5
    }
    },
    "color": {
    "inherit": true
    },
    "smooth": {
    "forceDirection": "none"
    }
},
"physics": {
    "barnesHut": {
    "gravitationalConstant": -17350,
    "springLength": 210,
    "springConstant": 0.055,
    "avoidOverlap": 0.53
    },
    "minVelocity": 0.75
}
}
""")

In [None]:
nt.write_html("bi_enzyme_graph.html")