# Analyse All Ontologies from the Open Biological and Biomedical Ontology (OBO) Foundry

### Importing Required Libraries

This cell imports the libraries used throughout the notebook:

- `rdflib`: parses and queries RDF (Resource Description Framework) data, including OWL files.
- `deque` from `collections`: a double-ended queue, used here as the FIFO queue for the breadth-first traversal.
- `pprint` from `pprint`: pretty-prints nested data structures.
- `pandas` as `pd`: tabular data manipulation and analysis.
- `os`: portable access to operating-system functionality such as path handling and directory walking.

In [None]:
import rdflib
from collections import deque
from pprint import pprint
import pandas as pd
import os

### Set the Path for Ontologies

This cell sets `PATH`, the directory that holds the ontology files. The value `./ontologies` is a relative path, so the files are expected in a directory named "ontologies" inside the current working directory.

Subsequent cells use `PATH` to locate, load, and process the ontologies.

In [None]:
PATH = './ontologies'

### Get Maximum Depth of Concepts in an Ontology

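The following code defines a function `get_max_depth` that calculates the maximum depth of concepts in an ontology. It works in three steps:

- It parses the ontology file and treats every subject or object whose IRI contains `purl.obolibrary.org/obo` as a concept, each starting at depth 1.
- It collects the `rdfs:subClassOf` triples that link two such concepts, building a map from each concept to its direct superclasses and counting the hierarchical relations.
- For each concept, it walks upward through the superclasses breadth-first and records the length of the longest chain found. The traversal stops once the depth exceeds `LIMIT = 60`, which keeps a cyclic hierarchy from looping forever.

The function returns a dictionary with the ontology name, the number of concepts, the number of hierarchical relations, the deepest concept, and its depth.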


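The upward traversal can be tried out without any ontology file. The sketch below is a minimal, self-contained illustration of the idea, assuming a hand-made parent map. The names `toy_parents` and `max_depth_of` are hypothetical and do not appear in the notebook.

```python
from collections import deque

# Hypothetical toy hierarchy: each concept maps to its direct superclasses.
toy_parents = {
    "D": ["C"],
    "C": ["B"],
    "B": ["A"],
    "E": ["A"],
}

def max_depth_of(concept, parents, limit=60):
    """Return the length of the longest superclass chain starting at `concept`.

    A concept without superclasses has depth 1. Once a depth greater than
    `limit` is reached the search stops, so a cyclic hierarchy terminates.
    """
    max_depth = 1
    queue = deque([(concept, 1)])
    while queue:
        node, depth = queue.popleft()
        for parent in parents.get(node, []):
            new_depth = depth + 1
            max_depth = max(max_depth, new_depth)
            if new_depth > limit:
                return max_depth
            queue.append((parent, new_depth))
    return max_depth

depths = {c: max_depth_of(c, toy_parents) for c in ["A", "B", "C", "D", "E"]}
print(depths)  # {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 2}
```

The root `A` has depth 1 and `D` has depth 4, because the chain `D -> C -> B -> A` contains four concepts. In this sketch, as in the notebook function, "depth" counts concepts along the chain rather than edges.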

In [None]:
def get_max_depth(source_ontology):
    g=rdflib.Graph()

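    try:
        # Parse the ontology file
        g.parse(os.path.join(PATH, source_ontology))
    except Exception as err:
        # Report the failure; the graph stays empty, so all counts will be zero
        print('FILE {} could not be parsed: {}'.format(source_ontology, err))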
    
    concepts = dict()
    for s,p,o in g:    
        if s.find('purl.obolibrary.org/obo') > 0:
            if str(s) not in concepts:
                concepts[str(s)] = 1
        if o.find('purl.obolibrary.org/obo') > 0:
            if str(o) not in concepts:
                concepts[str(o)] = 1
    
    num_of_hier_rel = 0            
    unhier = dict()
    for s,p,o in g:
        if p.find('rdf-schema#subClassOf') > 0:
            if s.find('purl.obolibrary.org/obo') > 0 and o.find('purl.obolibrary.org/obo') > 0:
                if str(s) not in unhier:
                    unhier[str(s)] = list()
                unhier[str(s)].append(str(o))
                num_of_hier_rel += 1
    # Remove duplicate superclasses for each concept
    for key, value in unhier.items():
        unhier[key] = list(set(value))
                
    LIMIT = 60
    for concept, value in concepts.items():
        queue = deque() 
        max_depth = value
        queue.append({"t":concept,"d":value})
        while len(queue) > 0:
            dequeued = queue.popleft()
            if dequeued["t"] in unhier:
                broaders = unhier[dequeued["t"]]
                new_depth = dequeued["d"]+1
                if new_depth > max_depth:
                    max_depth = new_depth
                if new_depth > LIMIT:
                    break
                for broader in broaders:
                    queue.append({"t":broader,"d":dequeued["d"]+1})

        concepts[concept] = max_depth

    list_of_depths = pd.DataFrame(list(concepts.items()), columns=['concept','depth'])
    list_of_depths.sort_values('depth', inplace=True, ascending=False)

    return {'ontology':source_ontology, 
            'num_concepts': len(concepts),
            'num_of_hier_rel': num_of_hier_rel,
            'most_deep_concept': list_of_depths.iloc[0]['concept'] if len(list_of_depths) > 0 else 'na', 
            'max_depth': list_of_depths.iloc[0]['depth'] if len(list_of_depths) > 0 else 0 }

### Calculate Maximum Depth for Specific Ontology

This cell runs `get_max_depth()` on a single ontology, "vto.owl", and prints the resulting summary dictionary.

In [None]:
result = get_max_depth("vto.owl")
print(result)

In [None]:
files = []

### Add New Files to the List

This cell walks the directory tree under `PATH` with `os.walk()`. Each file that has the extension ".owl" and is not already in the `files` list is appended to it.

Because of the duplicate check, the cell can be re-run to pick up newly added files without listing any file twice.
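The discovery step can be exercised on a throwaway directory. The sketch below is an illustration under assumptions: the helper `find_owl_files` and the file names are hypothetical, and it uses `str.endswith` so that only names that really end in ".owl" are accepted.

```python
import os
import tempfile

def find_owl_files(root):
    """Collect unique file names ending in '.owl' anywhere under `root`."""
    found = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".owl") and name not in found:
                found.append(name)
    return sorted(found)

# Build a temporary directory tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ["a.owl", "notes.txt", "b.owl.bak", os.path.join("sub", "c.owl")]:
        open(os.path.join(tmp, rel), "w").close()
    owl_files = find_owl_files(tmp)

print(owl_files)  # ['a.owl', 'c.owl']
```

A substring test such as `'.owl' in name` would also accept `b.owl.bak`. Note as well that only bare file names are collected, so a file found in a subdirectory (like `c.owl` here) would not be located by joining its name directly onto `PATH`.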

In [None]:
# r=root, d=directories, f = files
for r, d, f in os.walk(PATH):
    for file in f:
        if file.endswith('.owl') and file not in files:
            files.append(file)

In [None]:
print(f'Number of Files: {len(files)} \n\n Name of Files: {files}')

In [None]:
ind = 0
df = pd.DataFrame()
for file in files:
    ind += 1
    print('Processing {} -> {}'.format(ind,file))
    result = get_max_depth(file)   
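    # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
    df = pd.concat([df, pd.DataFrame([result])], ignore_index=True)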
    print(result)

In [None]:
df[['ontology','num_concepts','num_of_hier_rel','most_deep_concept','max_depth']].to_csv('report_obo_ontologies_FINAL.csv')

In [None]:
ind = 0
for file in files:
    ind += 1
    print('Processing {} -> {}'.format(ind,file))
    if len(df[df['ontology'] == file]):
        continue
    result = get_max_depth(file)   
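    # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
    df = pd.concat([df, pd.DataFrame([result])], ignore_index=True)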
    print(result)

In [None]:
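# Manually add a placeholder row for micro.owl
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
manual_row = {'ontology':'micro.owl',
            'num_concepts': 0,
            'num_of_hier_rel': 0,
            'most_deep_concept': 'na',
            'max_depth': 0}
df = pd.concat([df, pd.DataFrame([manual_row])], ignore_index=True)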


In [None]:
df.sort_values(by=["max_depth"], ascending=False).head(50)

In [None]:
df = pd.read_csv('report_obo_ontologies_V1.0.csv')

In [None]:
PATH = './obo-ontologies'
files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(PATH):
    for file in f:
        if file.endswith('.owl'):
            files.append(file)

In [None]:
len(files)

### Process Files with Missing Data

This cell iterates through the list of files and looks up the row for each ontology file in the dataframe `df`. If the "most_deep_concept" value of that row is equal to 'na', the file is processed again with `get_max_depth()` and the result is printed. A file that has no row in `df` raises an `IndexError`, which is caught and reported as "ROW ... not found".

Note that the new result is only printed. The line that would add it to `df` is commented out, so `df` itself is left unchanged by this cell.
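If the recomputed results should also be stored, the placeholder row has to be replaced rather than only printed. The sketch below shows that logic on plain dictionaries so that it runs without pandas or ontology files. Everything in it is an assumption for illustration: `fake_get_max_depth` is a stand-in that returns placeholder values, and `refresh_missing` and `rows` are hypothetical names.

```python
def fake_get_max_depth(name):
    # Hypothetical stand-in for get_max_depth(); the values are placeholders.
    return {"ontology": name, "num_concepts": 3, "num_of_hier_rel": 2,
            "most_deep_concept": "http://example.org/C", "max_depth": 3}

def refresh_missing(rows, files, compute):
    """Recompute rows whose 'most_deep_concept' is 'na'.

    Returns the updated rows and the list of files that have no row at all.
    """
    by_name = {row["ontology"]: row for row in rows}
    not_found = []
    for name in files:
        row = by_name.get(name)
        if row is None:
            not_found.append(name)
        elif row["most_deep_concept"] == "na":
            by_name[name] = compute(name)  # replace the placeholder row
    return list(by_name.values()), not_found

rows = [
    {"ontology": "x.owl", "num_concepts": 0, "num_of_hier_rel": 0,
     "most_deep_concept": "na", "max_depth": 0},
    {"ontology": "y.owl", "num_concepts": 9, "num_of_hier_rel": 8,
     "most_deep_concept": "http://example.org/Y", "max_depth": 5},
]
updated, missing = refresh_missing(rows, ["x.owl", "y.owl", "z.owl"],
                                   fake_get_max_depth)
print([r["max_depth"] for r in updated])  # [3, 5]
print(missing)                            # ['z.owl']
```

With a DataFrame, one possible equivalent is to assign the new values through `df.loc[df['ontology'] == name, column] = value` for each column, which overwrites the existing row in place instead of appending a second one.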

In [None]:
ind = 0
for file in files:
    ind += 1
    try:
        if df[df['ontology'] == file]['most_deep_concept'].iloc[0] == 'na':
            print('Processing {} -> {}'.format(ind,file))
            result = get_max_depth(file)   
            #df = df.append(result,ignore_index=True)
            print(result)
    except IndexError:
        print('ROW {} not found'.format(file))

In [None]:
df[df['ontology'] == file]['most_deep_concept'].iloc[0]

In [None]:
print(file)