# Analyse the ACM Computing Classification Scheme
This notebook counts the concepts in the ontology and measures its depth.

The notebook requires rdflib, SPARQLWrapper, pandas, numpy, and tqdm:
* pip install rdflib
* pip install SPARQLWrapper
* pip install pandas
* pip install numpy
* pip install tqdm


The 2012 version of ACM CCS is already available in this repository (RDF/XML format).

However, you can download the latest version from https://dl.acm.org/ccs

The 2012 ACM Computing Classification System has been developed as a **poly-hierarchical** ontology that can be utilized in semantic web applications. It replaces the traditional 1998 version of the ACM Computing Classification System (CCS), which has served as the de facto standard classification system for the computing field. It is being integrated into the search capabilities and visual topic displays of the Digital Library. It relies on a semantic vocabulary as the single source of categories and concepts that reflect the state of the art of the computing discipline and is receptive to structural change as it evolves in the future. ACM provides a tool within the visual display format to facilitate the application of CCS categories to forthcoming papers and a process to ensure that the CCS stays current and relevant. The CCS visual display has both Interactive and Flat views of the classification tree. You may also opt to download the CCS SKOS file. The new classification system will play a key role in the development of a people search interface in the ACM Digital Library to supplement its current traditional bibliographic search.

 

Authors, an important aspect of preparing your paper for publication by ACM Press is to provide the proper indexing and retrieval information from the ACM Computing Classification System (CCS). This is beneficial to you because accurate categorization provides the reader with quick content reference, facilitating the search for related literature, as well as searches for your work in ACM's Digital Library and on other online resources.

In [1]:
from rdflib import Graph
from rdflib.namespace import RDFS
from rdflib import URIRef
import rdflib
import json
from collections import deque
import numpy as np
from tqdm import tqdm

In [2]:
input_file = "ACM-CCS-2012.xml"
g = Graph()
# The repository copy is RDF/XML, so the format is stated explicitly
g.parse(input_file, format="xml")

<Graph identifier=Na6b9f18f1fdd4c87a04e0a11b16586b1 (<class 'rdflib.graph.Graph'>)>

In [3]:
def clean_concept(concept):
    """Return the concept identifier, i.e. the part of the URI after the last slash."""
    return concept.split("/")[-1]
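
To illustrate what `clean_concept` does, the sketch below applies it to a made-up URI. The `http://example.org/ccs/` base is purely hypothetical; the real URIs depend on how rdflib resolves the identifiers found in the RDF/XML file.

```python
def clean_concept(concept):
    """Return the part of a concept URI after the last slash."""
    return concept.split("/")[-1]

# Hypothetical URI, used for illustration only
example_uri = "http://example.org/ccs/10011007.10010940"
example_id = clean_concept(example_uri)
print(example_id)

# A string without any slash is returned unchanged
bare_id = clean_concept("10002950")
print(bare_id)
```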

In [4]:
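# Collect every skos:Concept in the graph
qres = g.query(
    """SELECT DISTINCT ?a
       WHERE {
          ?a rdf:type skos:Concept .
       }""")
# Map each concept id to an initial depth of 1 (refined in the depth cell)
topics = dict()
for row in qres:
    topics[clean_concept(row[0])] = 1
print(len(topics))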

2113


In [5]:
qres = g.query(
    """SELECT DISTINCT ?a ?b
       WHERE {
          ?a skos:narrower ?b .
       }""")

broaders = dict()   # child id  -> list of parent ids
narrowers = dict()  # parent id -> list of child ids
for row in qres:
    parent, child = clean_concept(row[0]), clean_concept(row[1])
    narrowers.setdefault(parent, []).append(child)
    broaders.setdefault(child, []).append(parent)
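
As a small illustration of the two mappings built above, the following sketch fills them from a handful of hand-written (parent, child) pairs instead of SPARQL result rows. The identifiers and the `toy_` names are hypothetical, and `defaultdict` is used here only as a convenient alternative to the explicit membership checks.

```python
from collections import defaultdict

# Hypothetical (parent, child) pairs standing in for skos:narrower rows
pairs = [("A", "A.1"), ("A", "A.2"), ("A.1", "A.1.x")]

toy_narrowers = defaultdict(list)  # parent -> children
toy_broaders = defaultdict(list)   # child  -> parents
for parent, child in pairs:
    toy_narrowers[parent].append(child)
    toy_broaders[child].append(parent)

print(dict(toy_narrowers))
print(dict(toy_broaders))
```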

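# Checking whether it is poly-hierarchical

ACM describes the Computing Classification System as **poly-hierarchical**, meaning that a concept may have more than one parent. The next cell counts the concepts that have more than one broader concept.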

In [6]:
# Count the concepts that have more than one broader (parent) concept
count = 0
for key, broad in broaders.items():
    if len(broad) > 1:
        count += 1

print("Found {} topics that have more than one parent".format(count))

Found 0 topics that have more than one parent
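
As a sanity check of the counting logic, the sketch below runs the same test on a hand-made mapping in which one concept deliberately has two parents. All identifiers are hypothetical and are not taken from the CCS file.

```python
# Hypothetical child -> parents mapping; "data-mining" has two parents
toy_broaders = {
    "databases": ["information-systems"],
    "data-mining": ["information-systems", "computing-methodologies"],
}

multi_parent = [key for key, parents in toy_broaders.items() if len(parents) > 1]
print("Found {} topics that have more than one parent".format(len(multi_parent)))
```

The result of such a check depends on how concepts are keyed; in this notebook each key is the string returned by `clean_concept`.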


# Assessing the depth

In [7]:
unhier = broaders   # child id -> list of parent ids
concepts = topics   # concept id -> depth, overwritten below
with tqdm(total=len(concepts)) as pbar:
    for concept, value in concepts.items():
        # Breadth-first walk up the broader links; the depth of a concept
        # is the longest path found from it to a top concept
        start_depth = int(value)
        max_depth = start_depth
        queue = deque([{"t": concept, "d": start_depth}])
        while len(queue) > 0:
            dequeued = queue.popleft()
            new_depth = dequeued["d"] + 1
            for broader in unhier.get(dequeued["t"], []):
                max_depth = max(max_depth, new_depth)
                queue.append({"t": broader, "d": new_depth})

        concepts[concept] = max_depth
        pbar.update(1)

100%|██████████| 2113/2113 [00:00<00:00, 228063.93it/s]
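
The traversal above can be sketched on a toy hierarchy to make the depth definition concrete. This is a minimal illustration, assuming an acyclic hierarchy and counting a top concept as depth 1; the function name `max_depth` and the identifiers are hypothetical.

```python
from collections import deque

def max_depth(concept, broaders):
    """Longest chain of broader links from `concept` up to a top concept.

    The concept itself counts as level 1. Assumes the hierarchy has no cycles.
    """
    best = 1
    queue = deque([(concept, 1)])
    while queue:
        node, depth = queue.popleft()
        best = max(best, depth)
        for parent in broaders.get(node, []):
            queue.append((parent, depth + 1))
    return best

# Hypothetical child -> parents mapping; "B.1" has two parents
toy_broaders = {"A.1": ["A"], "A.1.x": ["A.1"], "B.1": ["B", "A.1"]}
depths = {c: max_depth(c, toy_broaders) for c in ["A", "A.1", "A.1.x", "B.1"]}
print(depths)
```

For a concept with several parents, the longest path wins: `B.1` reaches `B` in one step but `A` in two, so its depth is 3.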


In [8]:
import pandas as pd
list_of_depths = pd.DataFrame.from_dict(concepts, orient='index', columns=['depth'])

In [9]:
list_of_depths.sort_values('depth', inplace=True, ascending=False)
list_of_depths.head()

                                                       depth
10011007.10010940.10010941.10010949.10010965.10010968      6
10003752.10003809.10003716.10011136.10011797.10011798      6
10011007.10010940.10010941.10010942.10010944.10010947      6
10011007.10010940.10010941.10010942.10010944.10010946      6
10011007.10010940.10010941.10010942.10010944.10010945      6
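
Beyond listing the deepest concepts, it may help to see how many concepts sit at each level. The sketch below does this with `collections.Counter` on a hypothetical concept-to-depth mapping; the values are made up and are not results from the CCS file.

```python
from collections import Counter

# Hypothetical concept -> depth mapping, shaped like the `concepts` dict above
toy_depths = {"A": 1, "A.1": 2, "A.2": 2, "A.1.x": 3}

per_level = Counter(toy_depths.values())  # depth -> number of concepts
deepest = max(toy_depths.values())        # overall depth of the toy hierarchy

for level in sorted(per_level):
    print(level, per_level[level])
print("max depth:", deepest)
```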


# Top Concepts

In [10]:
top_concepts = set(narrowers.keys())-set(broaders.keys())
for concept in top_concepts:
    print(concept)

10003033
10003752
10010147
10003456
10010520
10010583
10003120
10002944
10002950
10002951
10010405
10011007
10002978
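
The set difference used above (concepts that have children but no parent) can be checked on a toy hierarchy. The mappings below are hypothetical and only mirror the shape of `narrowers` and `broaders`.

```python
# Hypothetical parent -> children and child -> parents mappings
toy_narrowers = {"A": ["A.1", "A.2"], "A.1": ["A.1.x"], "B": ["B.1"]}
toy_broaders = {"A.1": ["A"], "A.2": ["A"], "A.1.x": ["A.1"], "B.1": ["B"]}

# Top concepts appear as a parent somewhere but never as a child
toy_top = set(toy_narrowers) - set(toy_broaders)
print(sorted(toy_top))
```

One caveat of this approach: an isolated concept with neither parents nor children appears in neither mapping, so it would not be listed.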
