# Common Enzymes Among Two Diseases

The goal of this project is to find the enzymes that the bibliographic data of any two query terms have in common. The two terms can be distinct or similar oncogenic diseases, viral or bacterial pathogens, or any other biomedical terms. As long as scientific literature exists for both terms, this code can fetch their bibliographic and citation data.

Below, I use two topics with different origins, **Cancer** and **SARS-CoV-2**, as example query terms. We first fetch the bibliographic data from NCBI's Entrez database, then process it to keep only the enzymes. Using these enzymes, together with the articles that mention them, as nodes, we then construct a network graph. The layout of the graph is shown below.

**FAQs**
1. Why bibliographic data?
- Frankly, I don't know of another method or database that provides this information. If I find a better way of doing this in the future, I'll update the code.

2. Why bother to find common enzymes among two diseases?
- I'm curious about what two distinct things or topics have in common. I wanted a custom script that returns the list of enzymes shared by two scientific terms or phenomena.
  
3. Does every article's citation and bibliographic data include a list of enzymes?
- Not necessarily. Enzymes are listed only if they are mentioned in the article and their identifiers are included in its MeSH record.
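
As a rough illustration of what such an identifier looks like: in the MEDLINE records fetched later, substances sit in the `RN` field, and enzyme entries begin with `EC` followed by the EC number and, usually, a name in parentheses. The sketch below splits such an entry into its number and name. `parse_ec_entry`, `EC_PATTERN`, and the sample strings are my own illustration, not part of the notebook or of Biopython.

```python
import re

# Matches entries such as "EC 1.1.1.1 (Alcohol Dehydrogenase)".
# The EC number has up to four dot-separated parts; later parts may be "-".
EC_PATTERN = re.compile(r"^EC\s+(\d+(?:\.(?:\d+|-)){0,3})\s*(?:\((.*)\))?\s*$")


def parse_ec_entry(entry):
    """Return (ec_number, name) for an enzyme entry, or None for anything else."""
    match = EC_PATTERN.match(entry.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Illustrative RN values, not taken from the fetched records
sample_rn = [
    "EC 1.1.1.1 (Alcohol Dehydrogenase)",
    "EC 3.4.21.- (Serine Endopeptidases)",
    "0 (Antiviral Agents)",
]

parsed = [parse_ec_entry(entry) for entry in sample_rn]
```

Entries that do not describe an enzyme come back as `None`, so they are easy to filter out.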

In [1]:
! pip install -q biopython

! pip install -q pyvis


In [2]:
# Fetching PubMed article metadata
from Bio import Entrez, Medline

# Graph creation and visualisation
from pyvis.network import Network

import time
import os

In [3]:
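def process_pmid_txt(text_file_path):
  """Reads a text file with one PMID per line and returns the PMIDs as a list"""

  pmids = list()

  with open(text_file_path, "r") as f:
    for line in f:
      pmid = line.strip()
      # Skip blank lines, such as the one left by a trailing newline
      if pmid:
        pmids.append(pmid)

  return pmids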

In [4]:
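def efetch(pmids):
    """Returns the MEDLINE/PubMed records associated with the PMID(s)"""
    
    # NCBI asks for a contact email with E-utilities requests
    Entrez.email = 'akishirsath@gmail.com'

    handle = Entrez.efetch(db="pubmed", 
                           id=pmids, 
                           rettype="medline", 
                           retmode="text")

    # Medline.parse is lazy, so read every record before closing the handle
    records = list(Medline.parse(handle))
    handle.close()
    
    return records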
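
`efetch` sends every PMID in a single request. For long PMID lists, one option is to split them into smaller batches and fetch each batch in turn. The sketch below only shows the splitting step, so it runs without network access. `chunked`, the batch size of 3, and the example PMIDs are hypothetical and chosen for illustration.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical PMIDs, for illustration only
example_pmids = [str(n) for n in range(1, 8)]

# Each comma-separated string could be passed to efetch() in turn,
# with a pause between calls
batches = [",".join(batch) for batch in chunked(example_pmids, 3)]
```

The records returned for each batch can then be concatenated into one list before building the graph.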

In [5]:
first_file = "/content/drive/MyDrive/05-Data/PubMed-Common-Enzymes/pmid-Cancer-set.txt"

second_file = "/content/drive/MyDrive/05-Data/PubMed-Common-Enzymes/pmid-sarscov-2-set.txt"

first_pmids = process_pmid_txt(first_file)

second_pmids = process_pmid_txt(second_file)

In [6]:
first_topic_records = efetch(",".join(first_pmids))

# Pause between the two requests so we don't hit NCBI too quickly
time.sleep(10)

second_topic_records = efetch(",".join(second_pmids))

In [7]:
colors = {
    'backgrd' : '#f1f2f6',    # Background color
    'font' : '#2f3542',       # Text font color
    'first_prim' : '#6F1E51', # Article nodes color (first)
    'second_prim' : '#1B1464',# Article nodes color (second)
    'first_sec' : '#ED4C67',  # Enzyme nodes color (first)
    'second_sec' : '#0652DD'  # Enzyme nodes color (second)
}

In [8]:
N = Network(height='750px', 
            width='100%', 
            bgcolor=colors['backgrd'], 
            font_color=colors['font'], 
            notebook=True)

In [9]:
N.set_options("""
var options = {
  "edges": {
    "arrows": {
      "to": {
        "enabled": true,
        "scaleFactor": 0.5
      }
    },
    "color": {
      "inherit": true
    },
    "smooth": {
      "forceDirection": "none"
    }
  },
  "physics": {
    "barnesHut": {
      "gravitationalConstant": -17350,
      "springLength": 210,
      "springConstant": 0.055,
      "avoidOverlap": 0.53
    },
    "minVelocity": 0.75
  }
}
""")
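
The options above are passed as a hand-written string. A possible alternative, sketched here, is to keep them in a Python dict and serialize them with `json.dumps`, which avoids typos in brackets and quotes. This assumes that `set_options` parses the text from the first `{` onward as JSON, which is what the cell above relies on; I have not checked this against every pyvis version.

```python
import json

# Same settings as the cell above, expressed as a Python dict
options = {
    "edges": {
        "arrows": {"to": {"enabled": True, "scaleFactor": 0.5}},
        "color": {"inherit": True},
        "smooth": {"forceDirection": "none"},
    },
    "physics": {
        "barnesHut": {
            "gravitationalConstant": -17350,
            "springLength": 210,
            "springConstant": 0.055,
            "avoidOverlap": 0.53,
        },
        "minVelocity": 0.75,
    },
}

# json.dumps turns Python's True into JSON's true
options_text = json.dumps(options, indent=2)
```

Under that assumption, `N.set_options(options_text)` would apply the same settings.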

In [10]:
for record in first_topic_records:
  # Primary PMID node id
  article = str(record.get('PMID', "NONE")).strip()

  # 'RN' holds the substances indexed for the article; some records lack it
  for molecule in record.get('RN', []):
    # Enzyme entries begin with "EC " followed by the EC number; the trailing
    # space skips other identifiers that merely start with the letters "EC"
    if molecule.startswith('EC '):

      # Primary PMID node
      N.add_node(article, size=25, color=colors['first_prim'])

      # Secondary Enzyme node
      N.add_node(molecule, size=15, color=colors['first_sec'])
      N.add_edge(article, molecule)

In [11]:
for record in second_topic_records:
  # Primary PMID node id
  article = str(record.get('PMID', "NONE")).strip()

  # 'RN' holds the substances indexed for the article; some records lack it
  for molecule in record.get('RN', []):
    # Enzyme entries begin with "EC " followed by the EC number; the trailing
    # space skips other identifiers that merely start with the letters "EC"
    if molecule.startswith('EC '):

      # Primary PMID node
      N.add_node(article, size=25, color=colors['second_prim'])

      # Secondary Enzyme node
      N.add_node(molecule, size=15, color=colors['second_sec'])
      N.add_edge(article, molecule)
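
The graph shows shared enzymes visually, but the two loops above can also be turned into a plain list of common enzymes with a set intersection. The sketch below is one way to do that. `enzymes_by_article` and the demo records are hypothetical; the demo dicts only mimic the `PMID` and `RN` keys of the parsed records and are not real fetched data. The sketch matches on `EC ` with a trailing space, which is its own choice.

```python
def enzymes_by_article(records):
    """Map each enzyme entry to the set of PMIDs whose RN field lists it."""
    mapping = {}
    for record in records:
        pmid = str(record.get("PMID", "NONE")).strip()
        for molecule in record.get("RN", []):
            if molecule.startswith("EC "):
                mapping.setdefault(molecule, set()).add(pmid)
    return mapping


# Made-up records standing in for first_topic_records / second_topic_records
first_demo = [
    {"PMID": "111", "RN": ["EC 1.1.1.1 (Alcohol Dehydrogenase)", "0 (Biomarkers)"]},
    {"PMID": "112", "RN": ["EC 3.4.21.- (Serine Endopeptidases)"]},
    {"PMID": "113"},  # a record without an RN field
]
second_demo = [
    {"PMID": "221", "RN": ["EC 3.4.21.- (Serine Endopeptidases)"]},
    {"PMID": "222", "RN": ["EC 2.7.7.48 (RNA-Dependent RNA Polymerase)"]},
]

first_map = enzymes_by_article(first_demo)
second_map = enzymes_by_article(second_demo)

# Enzymes that appear in at least one article of each topic
common = sorted(set(first_map) & set(second_map))
```

With the real data, the same call on `first_topic_records` and `second_topic_records` would give the common enzymes, and the two maps tell you which articles mention each one.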

In [12]:
N.show('common_enzymes_net_graph_viz.html')