# Analyse Modern Science Ontology

### Importing Required Libraries

This cell imports the following libraries:

- `json`: Used for reading and pretty-printing JSON data.
- `pandas`: Used for loading the ontology into a tabular `DataFrame`.
- `deque` from `collections`: A double-ended queue, used later for the breadth-first traversal of the class hierarchy.
- `Graph` and `plugin` from `rdflib`: Used for working with RDF data.
- `SPARQLWrapper` from `SPARQLWrapper`: Used for querying remote SPARQL endpoints.
- `Serializer` from `rdflib.serializer`: Used for serializing RDF data.

The subsequent cells in this Jupyter Notebook rely on these imports.

In [21]:
import json
import pandas as pd
from collections import deque
from rdflib import Graph, plugin
from SPARQLWrapper import SPARQLWrapper
from rdflib.serializer import Serializer

In [2]:
input_file = "ontology.ttl"

### Parsing the Ontology

This cell parses the ontology file with the `Graph().parse()` method from the `rdflib` library. The `input_file` variable holds the path to the ontology file, and `format='ttl'` tells the parser to read it as Turtle.

After parsing, we print the graph object `g`. Note that this prints only a short description of the graph (its identifier and storage backend), not the triples themselves; `len(g)` gives the number of triples that were loaded.

All later analysis and querying operates on this in-memory graph.

In [None]:
g = Graph().parse(input_file, format='ttl')
print(g)

### Querying the Ontology

In the following cells, we query the ontology with the `g.query()` method from the `rdflib` library. The query is written in SPARQL and counts the labelled classes in the ontology.

The `WHERE` clause matches every resource `?a` that is typed as `owl:Class` and also has an `rdfs:label` `?b`. The labels themselves are not returned: `count(?a)` counts the matching solutions and binds the total to the variable `?at`. Because each (class, label) pair is one solution, a class with several labels is counted once per label, and a class without any label is not counted at all.

The resulting number gives a first impression of the size of the ontology.

In [37]:
qres = g.query(
    """
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT (COUNT(?a) AS ?at)
    WHERE {
        ?a a owl:Class .
        ?a rdfs:label ?b .
    }
    """)

In [None]:
for row in qres:
    print(row.at)
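
The query above produces one solution per (class, label) pair, so `COUNT(?a)` and the number of distinct classes can differ. The following minimal sketch emulates the two triple patterns in plain Python over a handful of made-up triples. The `ex:` names and the `triples` list are hypothetical and not taken from the ontology, and the code only illustrates the counting semantics, not how `rdflib` evaluates the query.

```python
# Hypothetical triples (subject, predicate, object); made up for illustration.
triples = [
    ("ex:Physics", "rdf:type", "owl:Class"),
    ("ex:Physics", "rdfs:label", "Physics"),
    ("ex:Physics", "rdfs:label", "Physik"),
    ("ex:Biology", "rdf:type", "owl:Class"),
    ("ex:Biology", "rdfs:label", "Biology"),
    ("ex:Unlabelled", "rdf:type", "owl:Class"),
]

# Pattern 1: ?a a owl:Class
classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}

# Pattern 2 joined with pattern 1: one solution per (class, label) pair.
solutions = [(s, o) for s, p, o in triples if p == "rdfs:label" and s in classes]

count_a = len(solutions)                           # mirrors COUNT(?a)
count_distinct_a = len({s for s, _ in solutions})  # mirrors COUNT(DISTINCT ?a)

print(count_a, count_distinct_a, len(classes))
```

In this toy data the class with two labels is counted twice and the unlabelled class is not counted. If the number of distinct labelled classes is what is wanted, SPARQL offers `COUNT(DISTINCT ?a)`.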

In [4]:
modsci_json = g.serialize(format='json-ld', indent=4)

In [None]:
print(modsci_json)

In [None]:
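from io import StringIO
# pandas >= 2.1 deprecates passing a literal JSON string, so wrap it in StringIO.
modsci = pd.read_json(StringIO(modsci_json))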
modsci.head(50)

In [7]:
is_w3id = modsci['@id'].str.contains("w3id")
has_parent = modsci['http://www.w3.org/2000/01/rdf-schema#subClassOf'].notnull()
modsci1 = modsci.loc[is_w3id & has_parent, ["@id", "http://www.w3.org/2000/01/rdf-schema#subClassOf"]]

In [None]:
modsci1.head(20)

In [None]:
print(len(modsci1))

In [10]:
# Every concept starts with depth 1; find_max_depth() later overwrites these values.
concepts = {concept_id: 1 for concept_id in modsci1['@id']}

In [None]:
len(concepts)

In [12]:
# Drop rows whose @id contains the ModernScience IRI; regex=False matches it literally ("." is not a wildcard).
modsci1 = modsci1[~modsci1['@id'].str.contains("https://w3id.org/skgo/modsci#ModernScience", regex=False)]

In [None]:
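# Map each concept to the list of its direct parents (child -> parents).
unhier = dict()
max_length = 0  # largest number of direct parents seen in a single row so far
for index, row in modsci1.iterrows():
    parents = row["http://www.w3.org/2000/01/rdf-schema#subClassOf"]
    # Each parent is a JSON-LD node reference of the form {"@id": "<IRI>"}.
    unhier.setdefault(row['@id'], []).extend(parent['@id'] for parent in parents)
    # Report every new record for the number of direct parents.
    if len(parents) > max_length:
        max_length = len(parents)
        print("{}: {} parents".format(row['@id'], max_length))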
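
The loop above relies on the shape of the JSON-LD output: each node is an object with an `@id`, and each `rdfs:subClassOf` value is a list of `{"@id": ...}` references. As a minimal sketch of that structure, the following snippet builds the same kind of child-to-parents map directly from a small JSON-LD fragment using only the standard library. The `https://example.org/onto#...` IRIs and the names `doc` and `parents_of` are made up for illustration and do not come from the ontology.

```python
import json

SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Hypothetical JSON-LD fragment; the IRIs are invented for this example.
doc = json.loads("""
[
  {"@id": "https://example.org/onto#Physics",
   "http://www.w3.org/2000/01/rdf-schema#subClassOf": [
       {"@id": "https://example.org/onto#Science"}]},
  {"@id": "https://example.org/onto#Biophysics",
   "http://www.w3.org/2000/01/rdf-schema#subClassOf": [
       {"@id": "https://example.org/onto#Physics"},
       {"@id": "https://example.org/onto#Biology"}]},
  {"@id": "https://example.org/onto#Science"}
]
""")

# Keep only nodes that declare at least one parent (child -> list of parent IRIs).
parents_of = {
    node["@id"]: [parent["@id"] for parent in node[SUBCLASS_OF]]
    for node in doc
    if SUBCLASS_OF in node
}

# The concept with the most direct parents.
most_parents = max(parents_of, key=lambda concept: len(parents_of[concept]))
print(most_parents, len(parents_of[most_parents]))
```

Nodes without an `rdfs:subClassOf` entry, such as the top-level class in this fragment, simply do not appear as keys in the map.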

In [None]:
print(max_length)

In [None]:
print(json.dumps(unhier, indent=4))

In [16]:
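def find_max_depth(concepts, unhier):
    """Update `concepts` in place with a depth for every concept.

    Starting from each concept, walk breadth-first up the child -> parents
    map `unhier`. Every node is visited only once, so the stored depth is the
    concept's initial value plus the largest shortest-path distance to any of
    its ancestors. Calling the function twice on the same dict inflates the
    depths, because the stored values are reused as starting points.
    """
    for concept, value in concepts.items():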
        visited = set()
        queue = deque()
        max_depth = value
        queue.append({"t": concept, "d": value})

        while queue:
            dequeued = queue.popleft()
            concept_name = dequeued["t"]
            depth = dequeued["d"]

            if concept_name in visited:
                continue

            visited.add(concept_name)

            if concept_name in unhier:
                broaders = unhier[concept_name]
                new_depth = depth + 1
                if new_depth > max_depth:
                    max_depth = new_depth
                for broader in broaders:
                    queue.append({"t": broader, "d": depth + 1})

        concepts[concept] = max_depth
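
When a concept reaches the same ancestor along paths of different lengths, the breadth-first walk in `find_max_depth` records that ancestor at its shortest distance, because the `visited` set skips later, longer arrivals. The following minimal sketch shows the effect on a made-up hierarchy and contrasts it with a longest-path depth. `bfs_depth`, `longest_path_depth`, and the toy `parents_of` map are illustrative assumptions and not part of the notebook; the longest-path variant additionally assumes the hierarchy has no cycles.

```python
from collections import deque
from functools import lru_cache

# Hypothetical toy hierarchy (child -> list of parents); not taken from the ontology.
parents_of = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["Root"],
}

def bfs_depth(concept, parents_of):
    """Return 1 + the largest shortest-path distance from concept to any ancestor."""
    visited = {concept}
    queue = deque([(concept, 1)])
    max_depth = 1
    while queue:
        name, depth = queue.popleft()
        max_depth = max(max_depth, depth)
        for parent in parents_of.get(name, []):
            if parent not in visited:
                visited.add(parent)
                queue.append((parent, depth + 1))
    return max_depth

def longest_path_depth(concept, parents_of):
    """Return the number of nodes on the longest chain from concept up to a root."""
    @lru_cache(maxsize=None)
    def depth(name):
        return 1 + max((depth(p) for p in parents_of.get(name, [])), default=0)
    return depth(concept)

print(bfs_depth("A", parents_of))
print(longest_path_depth("A", parents_of))
```

Here `A` reaches `C` both directly and through `B`. The breadth-first variant sees `C` at distance one and reports a depth of 3, while the longest chain `A`, `B`, `C`, `Root` contains 4 nodes. Which notion of depth is appropriate depends on the question being asked of the ontology.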

In [None]:
find_max_depth(concepts, unhier)
print(concepts)

In [18]:
list_of_depths = pd.DataFrame.from_dict(concepts, orient='index', columns=['depth'])

In [None]:
list_of_depths.sort_values('depth', inplace=True, ascending=False)
list_of_depths.head()
