<a href="https://colab.research.google.com/github/akshayonly/BioNER-MeSH-Net-Graph/blob/main/mesh_vs_ner_pubmed_abstract.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Medical Subject Headings vs. Named Entity Recognition

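In this article, we explore two ways of extracting key terms from PubMed articles: Medical Subject Headings (MeSH) and biomedical named entity recognition (BioNER). We then use those terms to build and visualize network graphs.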

## All Libraries

In [1]:
# Consolidated setup for the whole notebook. It is kept inside a string literal
# so that this cell does not execute.
'''
!pip install -q biopython
!pip install -q pyvis
!pip install -q nxviz
!pip install -q scispacy
!pip install -q https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bc5cdr_md-0.4.0.tar.gz

import pandas as pd
import seaborn as sns
import networkx as nx
import matplotlib.pyplot as plt

import pyvis as pv

from Bio import Entrez
from Bio import Medline

import os
from tqdm import tqdm

import scispacy
import spacy
nlp = spacy.load("en_ner_bc5cdr_md")
'''

## Introduction

In [2]:
# Pass

## Building & Visualising Graphs

In [3]:
!pip install -q networkx
!pip install -q pyvis

In [4]:
import networkx as nx
import pyvis as pv
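networkx stores an undirected `Graph` as nested dictionaries (node → neighbour → edge attributes). The plain-Python sketch below imitates that structure so you can see what "building a graph" means before any library is involved. The helpers `add_weighted_edge` and `degree` are hypothetical and belong to neither networkx nor pyvis. In networkx itself you would call `G.add_edge(u, v, weight=w)`, which overwrites an existing `weight` rather than accumulating it as this sketch does. A finished networkx graph can then be passed to pyvis with `Network.from_nx` for interactive display.

```python
def add_weighted_edge(adj, u, v, weight=1):
    """Add an undirected edge u-v to a dict-of-dicts adjacency map.

    If the edge already exists, its weight is increased rather than replaced.
    """
    adj.setdefault(u, {})
    adj.setdefault(v, {})
    adj[u][v] = adj[u].get(v, 0) + weight
    # Mirror the entry so the edge can be looked up from either endpoint
    adj[v][u] = adj[u][v]


def degree(adj, node):
    """Number of distinct neighbours of `node` (0 if the node is absent)."""
    return len(adj.get(node, {}))


# Usage: two mentions of the same pair produce one edge of weight 2
graph = {}
add_weighted_edge(graph, "Humans", "Algorithms")
add_weighted_edge(graph, "Humans", "Algorithms")
add_weighted_edge(graph, "Humans", "Principal Component Analysis")
print(graph["Humans"])
print(degree(graph, "Humans"))
```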

## PubMed & Entrez

In [5]:
!pip install -q biopython

[?25l[K     |▏                               | 10 kB 29.5 MB/s eta 0:00:01[K     |▎                               | 20 kB 36.1 MB/s eta 0:00:01[K     |▍                               | 30 kB 33.3 MB/s eta 0:00:01[K     |▋                               | 40 kB 22.0 MB/s eta 0:00:01[K     |▊                               | 51 kB 8.2 MB/s eta 0:00:01[K     |▉                               | 61 kB 9.6 MB/s eta 0:00:01[K     |█                               | 71 kB 10.0 MB/s eta 0:00:01[K     |█▏                              | 81 kB 10.1 MB/s eta 0:00:01[K     |█▎                              | 92 kB 11.3 MB/s eta 0:00:01[K     |█▍                              | 102 kB 9.2 MB/s eta 0:00:01[K     |█▋                              | 112 kB 9.2 MB/s eta 0:00:01[K     |█▊                              | 122 kB 9.2 MB/s eta 0:00:01[K     |█▉                              | 133 kB 9.2 MB/s eta 0:00:01[K     |██                              | 143 kB 9.2 MB/s eta 0:00:01

In [6]:
from Bio import Entrez
from Bio import Medline
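`Entrez.efetch` is a thin wrapper around NCBI's E-utilities `efetch` endpoint. To show what the `fetch_data` function in the MeSH section asks for, this stdlib-only sketch builds an equivalent request URL without sending it. `efetch_url` is a hypothetical helper. Biopython also adds parameters of its own (for example `tool`), so its actual request is not byte-for-byte identical.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids, email):
    """Return an efetch URL for PubMed records in MEDLINE text format."""
    # Accept a single PMID string or an iterable of PMIDs
    if isinstance(pmids, str):
        pmids = [pmids]
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),  # multiple IDs are sent comma-separated
        "rettype": "medline",
        "retmode": "text",
        "email": email,         # NCBI asks clients to identify themselves
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


print(efetch_url("25006672", "you@example.org"))
```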

## Medical Subject Headings (MeSH)

In [7]:
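def fetch_data(pmids):
    """Return a list of MEDLINE records (dicts) for one or more PMIDs."""

    # NCBI asks for an email address with every E-utilities request;
    # replace this with your own
    Entrez.email = 'akishirsath@gmail.com'

    # Request the records as MEDLINE-formatted text
    handle = Entrez.efetch(db="pubmed",
                           id=pmids,
                           rettype="medline",
                           retmode="text")

    # Medline.parse is lazy, so materialise the records before closing the handle
    records = list(Medline.parse(handle))
    handle.close()

    return records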
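When called with `rettype="medline"` and `retmode="text"`, NCBI returns records in the tagged MEDLINE format. Each field begins with a tag left-justified in the first four columns, followed by `- ` and the value. Long values continue on lines indented by six spaces, and a blank line separates one record from the next. `Medline.parse` turns that text into dictionaries. The sketch below is a simplified, hypothetical re-implementation meant only to show the format. Unlike Biopython, it stores every tag as a list, whereas `Medline.Record` keeps single-valued fields such as `TI` as plain strings.

```python
def parse_medline_text(text):
    """Parse MEDLINE tagged text into a list of {tag: [values]} dicts.

    Simplified sketch: every tag maps to a list of values, and lines that
    start with six spaces continue the previous value.
    """
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record (empty ones are skipped)
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag is not None:
            # Continuation line: join it to the previous value with a space
            current[last_tag][-1] += " " + line.strip()
        else:
            # Tag in columns 1-4, "- " in columns 5-6, value from column 7
            last_tag = line[:4].strip()
            current.setdefault(last_tag, []).append(line[6:].strip())
    if current:
        records.append(current)
    return records


sample = (
    "PMID- 25006672\n"
    "TI  - Large-scale structure of a network of co-occurring MeSH terms:\n"
    "      statistical analysis of macroscopic properties.\n"
    "MH  - Algorithms\n"
    "MH  - Humans\n"
)
print(parse_medline_text(sample))
```

Leading or repeated blank lines do no harm, because an empty record is never appended.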

In [8]:
pmid = '25006672'

In [9]:
data = fetch_data(pmid)
data


In [14]:
data_dict = data[0]

In [15]:
data_dict

{'AB': "Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using the Pearson's chi-square test for independence. To characterize topological properties of the network we adopted some specific measures, including diameter, average path length, clustering coefficient, and degree distribution. For the full MeSH network the average path length was 1.95 with a diameter of three edges and clustering coefficient of 0.26. The Kolmogorov-Smi

In [16]:
# Print each MEDLINE tag with its value; distinct loop names avoid overwriting `data`
for field, value in data_dict.items():
    print(f"{field}\t{value}")

PMID	25006672
OWN	NLM
STAT	MEDLINE
DCOM	20160315
LR	20211021
IS	1932-6203 (Electronic) 1932-6203 (Linking)
VI	9
IP	7
DP	2014
TI	Large-scale structure of a network of co-occurring MeSH terms: statistical analysis of macroscopic properties.
PG	e102188
LID	10.1371/journal.pone.0102188 [doi]
AB	Concept associations can be represented by a network that consists of a set of nodes representing concepts and a set of edges representing their relationships. Complex networks exhibit some common topological features including small diameter, high degree of clustering, power-law degree distribution, and modularity. We investigated the topological properties of a network constructed from co-occurrences between MeSH descriptors in the MEDLINE database. We conducted the analysis on two networks, one constructed from all MeSH descriptors and another using only major descriptors. Network reduction was performed using the Pearson's chi-square test for independence. To characterize topological properties 
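The tags printed above are MEDLINE field abbreviations. A small lookup table makes the output easier to read. The descriptions below follow NLM's MEDLINE/PubMed field list, but they cover only the tags shown in this section. `describe_record` is a hypothetical helper.

```python
# Human-readable names for the MEDLINE tags shown in this section
MEDLINE_TAGS = {
    "PMID": "PubMed Unique Identifier",
    "OWN": "Owner",
    "STAT": "Status Tag",
    "DCOM": "Date Completed",
    "LR": "Date Last Revised",
    "IS": "ISSN",
    "VI": "Volume",
    "IP": "Issue",
    "DP": "Date of Publication",
    "TI": "Title",
    "PG": "Pagination",
    "LID": "Location Identifier",
    "AB": "Abstract",
    "MH": "MeSH Terms",
}


def describe_record(record, tags=MEDLINE_TAGS):
    """Yield (tag, description, value) for every field of a MEDLINE record."""
    for tag, value in record.items():
        # Unknown tags are kept, just without a description
        yield tag, tags.get(tag, "(no description)"), value


for tag, desc, value in describe_record({"PMID": "25006672", "DP": "2014"}):
    print(f"{tag}\t{desc}\t{value}")
```

Looping over `describe_record(data_dict)` instead of `data_dict.items()` prints each description next to its tag.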

In [17]:
data_dict.get('MH', [])  # MeSH headings; an empty list if the record has none

['Algorithms',
 'Computational Biology/*methods',
 'Humans',
 '*Medical Subject Headings',
 'Models, Statistical',
 'Principal Component Analysis']
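In the `MH` field, a `/` separates a descriptor from its qualifiers (subheadings), and an asterisk marks a major topic. For example, `'Computational Biology/*methods'` is the descriptor "Computational Biology" with the major qualifier "methods". The sketch below strips that markup to turn the headings into graph edges. It then links every pair of descriptors that co-occur in the record, which repeats on a single record the full vs. major-descriptor networks described in the abstract above. Treating a heading as major when either the descriptor or one of its qualifiers is starred is a simplifying assumption. `parse_mesh_heading` and `mesh_edges` are hypothetical helpers.

```python
from itertools import combinations


def parse_mesh_heading(heading):
    """Split an MH value into (descriptor, qualifiers, is_major)."""
    parts = heading.split("/")
    descriptor = parts[0].lstrip("*")
    qualifiers = [q.lstrip("*") for q in parts[1:]]
    # Assumption: a star on the descriptor or any qualifier marks a major topic
    is_major = any(p.startswith("*") for p in parts)
    return descriptor, qualifiers, is_major


def mesh_edges(headings, major_only=False):
    """Return every pair of distinct descriptors that co-occur in one record."""
    descriptors = sorted({
        descriptor
        for descriptor, _, is_major in map(parse_mesh_heading, headings)
        if is_major or not major_only
    })
    return list(combinations(descriptors, 2))


# The MH list shown above for PMID 25006672
mesh_terms = ['Algorithms', 'Computational Biology/*methods', 'Humans',
              '*Medical Subject Headings', 'Models, Statistical',
              'Principal Component Analysis']
print(parse_mesh_heading('Computational Biology/*methods'))
print(mesh_edges(mesh_terms, major_only=True))
print(len(mesh_edges(mesh_terms)))  # 6 descriptors give 15 pairs
```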

## Biomedical Named Entity Recognition (BioNER)

Biomedical named entity recognition (BioNER) is an important and challenging task for understanding biomedical text. It aims to recognize named entities (NEs) such as diseases, genes, and species in biomedical text, and it supports many downstream natural language processing (NLP) tasks, such as drug-drug interaction extraction and knowledge base completion. The scispaCy model loaded below, `en_ner_bc5cdr_md`, is trained on the BC5CDR corpus and recognizes two entity types: `CHEMICAL` and `DISEASE`.

In [10]:
!pip install -q nxviz
!pip install -q scispacy
!pip install -q https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bc5cdr_md-0.4.0.tar.gz


In [11]:
import scispacy
import spacy

# scispaCy NER model trained on the BC5CDR corpus; it tags CHEMICAL and DISEASE entities
nlp = spacy.load("en_ner_bc5cdr_md")
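Running `nlp` on an abstract returns a `Doc`. Each entity in its `ents` carries its text (`ent.text`) and its label (`ent.label_`). To compare these entities with the MeSH terms, you can turn them into a weighted co-occurrence graph in the same way. The sketch below assumes the entities have already been collected as `(text, label)` pairs, one list per abstract, for example with `[(ent.text, ent.label_) for ent in nlp(data_dict['AB']).ents]`. Lower-casing and de-duplicating mentions within an abstract is a design choice. `cooccurrence_counts` is a hypothetical helper, and the toy input stands in for real model output.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs_entities):
    """Count the number of documents in which each pair of entities co-occurs.

    docs_entities: one list of (text, label) pairs per document.
    Returns a Counter mapping ((text, label), (text, label)) -> document count.
    """
    weights = Counter()
    for entities in docs_entities:
        # Lower-case and de-duplicate within a document so repeated mentions
        # of the same entity do not inflate the edge weight
        unique = sorted({(text.lower(), label) for text, label in entities})
        for pair in combinations(unique, 2):
            weights[pair] += 1
    return weights


# Toy input standing in for entities extracted from two abstracts
docs = [
    [("Aspirin", "CHEMICAL"), ("headache", "DISEASE"), ("aspirin", "CHEMICAL")],
    [("aspirin", "CHEMICAL"), ("Headache", "DISEASE"), ("ibuprofen", "CHEMICAL")],
]
for (u, v), w in cooccurrence_counts(docs).most_common():
    print(u, v, w)
```

You can load the result into networkx with `G.add_weighted_edges_from((u, v, w) for (u, v), w in weights.items())` and then view it in pyvis alongside the MeSH graph.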