## Dataset Exploration
We start by gathering information about the structure of the dataset, exploring its T-Box (Terminological Box), i.e. the schema-level statements that define classes and properties.

In [None]:
# Count all the triples in the dataset
import sparql_dataframe
endpoint = 'https://dati.cultura.gov.it/sparql'

query_triple_count = '''
    SELECT (COUNT (*) AS ?tripleCount) 
    WHERE {
        ?s ?p ?o .
    }
'''

df = sparql_dataframe.get(endpoint, query_triple_count)
# Print only the count value rather than the whole one-cell DataFrame
print(f"The total number of triples is: {df['tripleCount'].iloc[0]}")

### Predicates
We then retrieve every predicate together with the number of times it occurs.

In [None]:
# Number of occurrences of each predicate, most frequent first
query_predicate_repetition = '''
    SELECT ?p (COUNT(?p) AS ?occurrences)
    WHERE {
        ?s ?p ?o .
    }
    GROUP BY ?p
    ORDER BY DESC(?occurrences)
'''

df = sparql_dataframe.get(endpoint, query_predicate_repetition)
df.to_csv("numberOfPredicates.csv")
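
The predicate list can be long, so it may help to summarise it by vocabulary. The following is a minimal sketch, not part of the original notebook: the helper names `namespace_of` and `occurrences_by_namespace` are hypothetical, the namespace is assumed to be everything up to the last `#` or `/` of the IRI, and the sample counts are made up for illustration.

```python
from collections import Counter


def namespace_of(iri):
    """Return the IRI up to and including its last '#' or '/'."""
    cut = max(iri.rfind('#'), iri.rfind('/'))
    return iri[:cut + 1] if cut >= 0 else iri


def occurrences_by_namespace(rows):
    """Sum predicate occurrences per namespace.

    rows: iterable of (predicate_iri, occurrences) pairs.
    Returns (namespace, total) pairs, largest total first.
    """
    totals = Counter()
    for predicate, occurrences in rows:
        totals[namespace_of(predicate)] += int(occurrences)
    return totals.most_common()


# Illustrative rows only (made-up counts, not results from the endpoint)
sample_rows = [
    ('http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 10),
    ('http://www.w3.org/2000/01/rdf-schema#label', 7),
    ('http://www.w3.org/2000/01/rdf-schema#comment', 2),
]
print(occurrences_by_namespace(sample_rows))
```

With the DataFrame obtained above, the rows could be passed as `df.itertuples(index=False, name=None)`, which yields plain `(predicate, count)` tuples.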

### Classes
Next we retrieve all the classes, whether declared as rdfs:Class or as owl:Class. Since the dataset is composed of several sub-datasets, we expect to find classes declared with both types.

In [None]:
#all the classes expressed with the type rdfs:Class.
query_classes = '''
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?c
    WHERE {
        ?c a rdfs:Class .
    }
    ORDER BY ?c
'''
df = sparql_dataframe.get(endpoint, query_classes)
df.to_csv("Classes(rdfs).csv")

In [None]:
#all the classes expressed with the type owl:Class.
query_classes = '''
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT DISTINCT ?c
    WHERE {
        ?c a owl:Class .
    }
    ORDER BY ?c
'''
df = sparql_dataframe.get(endpoint, query_classes)
df.to_csv("Classes(owl).csv")
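
To check the expectation that both declaration styles occur, the two class lists can be compared as sets. This is a minimal sketch under stated assumptions: `compare_class_sets` is a hypothetical helper, and the `example.org` IRIs are placeholders rather than classes from the dataset.

```python
def compare_class_sets(rdfs_classes, owl_classes):
    """Split class IRIs into those declared with both types or only one."""
    rdfs_set, owl_set = set(rdfs_classes), set(owl_classes)
    return {
        'both': sorted(rdfs_set & owl_set),
        'rdfs_only': sorted(rdfs_set - owl_set),
        'owl_only': sorted(owl_set - rdfs_set),
    }


# Placeholder IRIs for illustration only
rdfs_sample = ['http://example.org/A', 'http://example.org/B']
owl_sample = ['http://example.org/B', 'http://example.org/C']

for group, iris in compare_class_sets(rdfs_sample, owl_sample).items():
    print(group, len(iris))
```

In the notebook, the inputs could be the `c` columns read back from the two CSV files written above.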

### Concepts
Since this dataset may incorporate other datasets that define their own classes, we also list the concepts: every resource used to describe a subject through the property rdf:type (abbreviated as `a` in SPARQL).

In [None]:
query_concept = '''
    SELECT DISTINCT ?concept 
    WHERE {
    ?s a ?concept .
    }
'''
df = sparql_dataframe.get(endpoint, query_concept)
df.to_csv("Concepts.csv")
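
A possible extension, not part of the original notebook, is to count how many subjects are typed with each concept, so that the most used concepts stand out. The sketch below only builds the query string; `build_concept_count_query` and its `limit` parameter are hypothetical names introduced for illustration.

```python
def build_concept_count_query(limit=None):
    """Build a SPARQL query counting the subjects typed with each concept.

    limit: optional positive integer capping the number of rows returned.
    """
    query = '''
    SELECT ?concept (COUNT(?s) AS ?instances)
    WHERE {
        ?s a ?concept .
    }
    GROUP BY ?concept
    ORDER BY DESC(?instances)
'''
    if limit is not None:
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError('limit must be a positive integer')
        query += f'    LIMIT {limit}\n'
    return query


print(build_concept_count_query(limit=20))
```

The resulting string could then be passed to `sparql_dataframe.get(endpoint, ...)` in the same way as the queries above.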