# Analyse Physical Subject Headings

### Importing Required Libraries

In this cell, we import the following libraries:

- ``json``: to parse the JSON data file
- ``pprint``: to pretty-print data structures
- ``deque``: a double-ended queue, used later for the breadth-first traversal of the hierarchy

These libraries are used in the cells that follow.

In [None]:
import json
import pprint
from collections import deque

### Loading Data

In this cell, we load the file 'Physical-Subject-Headings.json' into a variable called `data`. The file is in JSON format; `json.load` parses it into a list of dictionaries, one per concept.

In [None]:
with open('Physical-Subject-Headings.json','r') as f:
    data = json.load(f)

In [None]:
pprint.pprint(data)

In [None]:
print(json.dumps(data[1], indent=4))

### Counting Concept Types

In the following cells, we count how often each concept type occurs in `data`. The types are read from the `@type` field of each concept. This field holds a list, so a single concept can contribute to more than one count.

The code snippet below shows how we count the concept types:

In [None]:
types = dict()
for concept in data:
    for type_t in concept["@type"]:
        if type_t not in types:
            types[type_t] = 0
        types[type_t] += 1

print(types)
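
The same tally can be written more compactly with `collections.Counter`. The sketch below runs on a small hand-made list (`sample`, with made-up `example.org` type URIs) instead of the real file, so its records and counts are illustrative assumptions only.

```python
from collections import Counter

# Hypothetical stand-in for `data`: each record carries a list of type URIs.
sample = [
    {"@id": "ex:a", "@type": ["https://example.org/core#Concept"]},
    {"@id": "ex:b", "@type": ["https://example.org/core#Concept",
                              "https://example.org/core#Facet"]},
    {"@id": "ex:c", "@type": ["https://example.org/core#Facet"]},
]

# Flatten every @type list and count each URI once per occurrence.
type_counts = Counter(t for record in sample for t in record["@type"])

for type_uri, count in type_counts.most_common():
    print(type_uri, count)
```

`Counter` behaves like a dictionary, so `type_counts[some_uri]` works as before and returns 0 for a type that never occurs.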

In [None]:
# Print the preferred label of every Facet
for concept in data:
    for type_t in concept["@type"]:
        # Compare with ==; `in` on a string would also match substrings of the URI
        if type_t == 'https://physh.org/rdf/2018/01/01/core#Facet':
            print(json.dumps(concept["http://www.w3.org/2004/02/skos/core#prefLabel"][0]["@value"], indent=4))

In [None]:
# Print the title of every Discipline
for concept in data:
    for type_t in concept["@type"]:
        # Compare with ==; `in` on a string would also match substrings of the URI
        if type_t == 'https://physh.org/rdf/2018/01/01/core#Discipline':
            print(json.dumps(concept["http://purl.org/dc/terms/title"][0]["@value"], indent=4))

In [None]:
# Print the full record of every Discipline
for concept in data:
    for type_t in concept["@type"]:
        # Compare with ==; `in` on a string would also match substrings of the URI
        if type_t == 'https://physh.org/rdf/2018/01/01/core#Discipline':
            print(json.dumps(concept, indent=4))
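
The filtering cells above all follow one pattern: keep the records whose `@type` list contains a given URI. A minimal sketch of that pattern as a reusable helper follows. The function name `concepts_of_type` and the `example.org` records are hypothetical and not part of the dataset.

```python
def concepts_of_type(records, type_uri):
    """Return the records whose @type list contains exactly `type_uri`."""
    # `in` on a list tests element equality, so partial URIs do not match.
    return [r for r in records if type_uri in r.get("@type", [])]


# Hypothetical stand-in for `data`.
sample = [
    {"@id": "ex:a", "@type": ["https://example.org/core#Facet"]},
    {"@id": "ex:b", "@type": ["https://example.org/core#Discipline"]},
    {"@id": "ex:c", "@type": ["https://example.org/core#Facet",
                              "https://example.org/core#Concept"]},
]

facets = concepts_of_type(sample, "https://example.org/core#Facet")
print([r["@id"] for r in facets])
```

Because the test is made against the list and not against a string, a truncated URI such as `"https://example.org/core#"` selects nothing.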

In [None]:
print("Number of concepts: {}".format(len(data)))

In [None]:
# Record each unique concept ID with an initial depth of 1
concepts = dict()
for concept in data:
    _id = concept['@id']
    if _id not in concepts:
        concepts[_id] = 1

In [None]:
print(len(concepts))

In [None]:
# Map each concept ID to the IDs of its narrower (child) concepts
hier = dict()

for concept in data:
    if 'http://www.w3.org/2004/02/skos/core#narrower' in concept:
        _id = concept['@id']
        narrowers = concept['http://www.w3.org/2004/02/skos/core#narrower']
        for narrower in narrowers:
            if _id not in hier:
                hier[_id] = list()
            hier[_id].append(narrower['@id'])

In [None]:
pprint.pprint(hier)

In [None]:
# Inverse of `hier`: map each concept ID to the IDs of its broader (parent) concepts
unhier = dict()

In [None]:
for concept in data:
    if 'http://www.w3.org/2004/02/skos/core#narrower' in concept:
        _id = concept['@id']
        narrowers = concept['http://www.w3.org/2004/02/skos/core#narrower']
        for narrower in narrowers:
            if narrower['@id'] not in unhier:
                unhier[narrower['@id']] = list()
            unhier[narrower['@id']].append(_id)

In [None]:
pprint.pprint(unhier)
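
Both lookup tables can also be filled in a single pass with `collections.defaultdict`, which removes the "create the list if missing" checks. This is a sketch on hypothetical records (`sample`, with `ex:` IDs); the names `narrower_of` and `broader_of` are illustrative stand-ins for `hier` and `unhier`.

```python
from collections import defaultdict

NARROWER = "http://www.w3.org/2004/02/skos/core#narrower"

# Hypothetical records: a root with two children, one of which has a child.
sample = [
    {"@id": "ex:root", NARROWER: [{"@id": "ex:mid"}, {"@id": "ex:leaf2"}]},
    {"@id": "ex:mid", NARROWER: [{"@id": "ex:leaf1"}]},
    {"@id": "ex:leaf1"},
    {"@id": "ex:leaf2"},
]

narrower_of = defaultdict(list)  # parent ID -> list of child IDs
broader_of = defaultdict(list)   # child ID -> list of parent IDs

for record in sample:
    # Records without a narrower field contribute nothing.
    for child in record.get(NARROWER, []):
        narrower_of[record["@id"]].append(child["@id"])
        broader_of[child["@id"]].append(record["@id"])

print(dict(narrower_of))
print(dict(broader_of))
```

One caveat: indexing a `defaultdict` with a missing key creates that key, so later membership checks are safer with `key in broader_of` or `broader_of.get(key)` than with `broader_of[key]`.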

In [None]:
# For each concept, follow the broader links upwards breadth-first and keep
# the length of the longest chain found (a concept with no broader has depth 1).
for concept, value in concepts.items():
    queue = deque()
    max_depth = value
    queue.append({"t": concept, "d": value})
    while len(queue) > 0:
        dequeued = queue.popleft()
        if dequeued["t"] in unhier:
            broaders = unhier[dequeued["t"]]
            new_depth = dequeued["d"] + 1
            if new_depth > max_depth:
                max_depth = new_depth
            for broader in broaders:
                queue.append({"t": broader, "d": new_depth})

    concepts[concept] = max_depth

In [None]:
pprint.pprint(concepts)
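
The traversal above re-walks shared ancestors for every concept. One alternative, sketched here under the assumption that the hierarchy contains no cycles, is a memoized recursion: a concept's depth is 1 if it has no broader concept, otherwise 1 plus the largest depth among its parents. The map `broader_of` and the `ex:` IDs are hypothetical stand-ins for `unhier` and the real concept IDs.

```python
from functools import lru_cache

# Hypothetical child -> parents map, shaped like `unhier`.
broader_of = {
    "ex:mid": ["ex:root"],
    "ex:leaf1": ["ex:mid"],
    "ex:leaf2": ["ex:root", "ex:mid"],
}


@lru_cache(maxsize=None)
def depth(concept_id):
    """Return 1 for a top-level concept, else 1 + the deepest parent's depth."""
    parents = broader_of.get(concept_id, [])
    if not parents:
        return 1
    return 1 + max(depth(parent) for parent in parents)


depths = {c: depth(c) for c in ["ex:root", "ex:mid", "ex:leaf1", "ex:leaf2"]}
print(depths)
```

Each concept's depth is computed once and then reused from the cache. A cycle in the broader links would cause unbounded recursion here, just as it would keep the queue-based loop from terminating.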

In [None]:
import pandas as pd

# One row per concept ID, with its computed depth as the only column
list_of_depths = pd.DataFrame.from_dict(concepts, orient='index', columns=['depth'])

In [None]:
# Show the deepest concepts first
list_of_depths.sort_values('depth', inplace=True, ascending=False)
list_of_depths.head()