# Analyse Nature Subjects
Compute the number of concepts in the Nature Subjects ontology and the maximum depth of its hierarchy.

This notebook requires rdflib, SPARQLWrapper, pandas, and numpy:
* pip install rdflib
* pip install SPARQLWrapper
* pip install pandas
* pip install numpy

Download the latest version of the Nature Subjects ontology from http://data.nature.com/downloads/latest/ttl/npg-subjects-ontology.ttl
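
The download step can also be scripted. The following is a minimal sketch using only the standard library, assuming the URL above is still reachable; `fetch_if_missing` is a hypothetical helper name, not part of the original notebook.

```python
import os
import urllib.request

# URL taken from the text above; the local filename matches the one parsed later.
ONTOLOGY_URL = "http://data.nature.com/downloads/latest/ttl/npg-subjects-ontology.ttl"


def fetch_if_missing(url, path):
    """Download `url` to `path` unless a file already exists there (hypothetical helper)."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Usage (needs network access, so it is left commented out):
# fetch_if_missing(ONTOLOGY_URL, "npg-subjects-ontology.ttl")
```

Skipping the download when the file already exists keeps repeated notebook runs from fetching the ontology again.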

In [3]:
from rdflib import Graph
from rdflib.namespace import RDFS
from rdflib import URIRef
import rdflib
import json
from collections import deque
import numpy as np
import pandas as pd

In [5]:
input_file = "npg-subjects-ontology.ttl"
g = Graph()
g.parse(input_file, format="ttl")

<Graph identifier=Nc1642de6475b458db88f1b9d61732cef (<class 'rdflib.graph.Graph'>)>

In [16]:
qres = g.query(
    """PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
       SELECT DISTINCT ?a
       WHERE {
          ?a a <http://ns.nature.com/terms/Subject> .
       }""")


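# Store every concept with an initial depth of 1; the depth cell below
# starts from this value and increases it while walking up the hierarchy.
topics = dict()
for row in qres:
    topics[row[0]] = 1

print("Number of concepts: {}".format(len(topics)))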

Number of concepts: 2636


In [7]:
qres = g.query(
    """PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
       SELECT DISTINCT ?a ?b
       WHERE {
          ?a skos:broader ?b .
       }""")

# broaders:  concept -> list of its direct parents
# narrowers: concept -> list of its direct children
broaders = dict()
narrowers = dict()
for row in qres:
    narrower, broader = row[0], row[1]
    broaders.setdefault(narrower, []).append(broader)
    narrowers.setdefault(broader, []).append(narrower)
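
To see what these two dictionaries end up holding, here is a small sketch on made-up data. The `pairs` list is hypothetical and stands in for the SPARQL result rows; `collections.defaultdict` is used as an alternative to the explicit membership checks.

```python
from collections import defaultdict

# Hypothetical (narrower, broader) pairs standing in for the SPARQL result rows.
pairs = [
    ("t-cells", "lymphocytes"),
    ("b-cells", "lymphocytes"),
    ("lymphocytes", "immunology"),
    ("t-cells", "cell-biology"),
]

broaders = defaultdict(list)   # concept -> its direct parents
narrowers = defaultdict(list)  # concept -> its direct children
for narrower, broader in pairs:
    broaders[narrower].append(broader)
    narrowers[broader].append(narrower)

print(dict(broaders))
print(dict(narrowers))
```

Note that indexing a `defaultdict` with a missing key creates that key, so membership tests (`key in broaders`) are the safer way to check whether a concept has parents.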

In [8]:
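# Breadth-first walk up the skos:broader links from every concept.
# The depth of a concept is the length of its longest chain of broaders,
# starting from the value stored in topics, which acts as depth 1.
unhier = broaders
concepts = topics  # alias: the values of topics are overwritten with depths
for concept, value in concepts.items():
    queue = deque()
    max_depth = value
    queue.append({"t": concept, "d": value})
    while len(queue) > 0:
        dequeued = queue.popleft()
        if dequeued["t"] in unhier:
            new_depth = dequeued["d"] + 1
            if new_depth > max_depth:
                max_depth = new_depth
            for broader in unhier[dequeued["t"]]:
                queue.append({"t": broader, "d": new_depth})

    # Only the value of an existing key changes, so updating while iterating is safe.
    concepts[concept] = max_depth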
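
The same maximum depth can also be computed recursively with memoization, which avoids re-walking shared ancestors when a concept has several parents. This is a sketch on a hypothetical toy hierarchy (`toy_broaders`), not the notebook's own method, and it assumes the hierarchy contains no cycles.

```python
from functools import lru_cache

# Hypothetical toy hierarchy: concept -> list of direct broader concepts.
toy_broaders = {
    "th17": ["t-cells"],
    "t-cells": ["lymphocytes", "cell-biology"],
    "lymphocytes": ["immunology"],
}


@lru_cache(maxsize=None)
def max_depth(concept):
    """Number of concepts on the longest broader-chain from `concept` up to a root."""
    parents = toy_broaders.get(concept, [])
    if not parents:
        return 1  # a root counts as depth 1, as in the breadth-first loop above
    return 1 + max(max_depth(p) for p in parents)


names = ["th17", "t-cells", "lymphocytes", "immunology", "cell-biology"]
depths = {name: max_depth(name) for name in names}
print(depths)
```

Each concept's depth is computed once and cached, so the cost grows with the number of broader links rather than with the number of distinct paths to the roots.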

In [11]:
list_of_depths = pd.DataFrame.from_dict(concepts, orient='index', columns=['depth'])
list_of_depths.sort_values('depth', inplace=True, ascending=False)

In [12]:
print("Concepts are ranked by maximum depth")
list_of_depths.head(20)

Concepts are ranked by maximum depth


| concept | depth |
|---|---|
| http://ns.nature.com/subjects/t-helper-17-cells | 8 |
| http://ns.nature.com/subjects/t-helper-2-cells | 8 |
| http://ns.nature.com/subjects/regulatory-t-cells | 8 |
| http://ns.nature.com/subjects/follicular-t-helper-cells | 8 |
| http://ns.nature.com/subjects/t-helper-1-cells | 8 |
| http://ns.nature.com/subjects/cytotoxic-t-cells | 8 |
| http://ns.nature.com/subjects/stroke | 7 |
| http://ns.nature.com/subjects/follicular-b-cells | 7 |
| http://ns.nature.com/subjects/t-cell-receptor | 7 |
| http://ns.nature.com/subjects/hepatitis-b | 7 |


In [14]:
print("If nothing is printed after this line, Nature Subjects is monohierarchical (every narrower concept has exactly one broader concept)")
for k, v in broaders.items():
    if len(v) > 1:
        print("{} has {} parents".format(k, len(v)))

If nothing is printed after this line, Nature Subjects is monohierarchical (every narrower concept has exactly one broader concept)
http://ns.nature.com/subjects/supercontinuum-generation has 2 parents
http://ns.nature.com/subjects/stress-and-resilience has 2 parents
http://ns.nature.com/subjects/parkinsons-disease has 4 parents
http://ns.nature.com/subjects/nucleic-acid-therapeutics has 2 parents
http://ns.nature.com/subjects/peripheral-nervous-system has 2 parents
http://ns.nature.com/subjects/post-translational-modifications has 3 parents
http://ns.nature.com/subjects/experimental-organisms has 2 parents
http://ns.nature.com/subjects/electroencephalography-eeg has 2 parents
http://ns.nature.com/subjects/nmr-spectroscopy has 4 parents
http://ns.nature.com/subjects/finance has 2 parents
http://ns.nature.com/subjects/neural-stem-cells has 3 parents
http://ns.nature.com/subjects/sirnas has 2 parents
http://ns.nature.com/subjects/genetic-interaction has 2 parents
http://ns.nature.com/subjects/