## Cultural Properties
The ArCo ontology aims to give the widest possible representation of Italian cultural heritage, incorporating all the main classifications provided by international organizations. However, since our analysis explores Italian cultural sites, not all cultural heritage will be included in the data we extract. We have identified two predicates that let us filter out cultural heritage that carries no information about cultural sites (e.g. natural heritage):
- a-loc:hasCulturalInstituteOrSite: links a cultural property to the cultural institute or site that is its legal (juridical) location.
- a-cd:isLocatedIn: links a cultural property to another cultural property in which it is located.
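
The queries in this notebook use the prefixes a-loc:, a-cd:, cis: and rdfs: without declaring them, presumably relying on the endpoint to predefine them. As a hedged sketch, the prefixes can instead be declared explicitly in the query text. The namespace IRIs below are assumptions (they are not stated in this notebook) and should be checked against the ArCo and Cultural-ON documentation; `PREFIXES` and `with_prefixes` are hypothetical helper names introduced only for illustration.

```python
# Assumed namespace IRIs (not given in the notebook); verify before use.
PREFIXES = {
    "a-loc": "https://w3id.org/arco/ontology/location/",
    "a-cd": "https://w3id.org/arco/ontology/context-description/",
    "cis": "http://dati.beniculturali.it/cis/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def with_prefixes(body, prefixes=PREFIXES):
    """Prepend one PREFIX declaration per entry to a SPARQL query body."""
    header = "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()
    )
    return header + "\n" + body.strip() + "\n"


# Same filter as the notebook's first query, now with explicit prefixes.
query = with_prefixes('''
SELECT DISTINCT ?s WHERE {
?s ?p ?o.
FILTER(?p = a-loc:hasCulturalInstituteOrSite || ?p = a-cd:isLocatedIn)
}
''')
print(query)
```

The resulting string can be passed to `get(endpoint, query)` exactly like the hand-written queries.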

In [2]:
from sparql_dataframe import get
endpoint = 'https://dati.cultura.gov.it/sparql'

In [1]:
query = '''
SELECT DISTINCT ?s WHERE {
?s ?p ?o.
FILTER(?p =  a-loc:hasCulturalInstituteOrSite || ?p = a-cd:isLocatedIn)
}
'''

CulturalProperties = get(endpoint, query)
CulturalProperties

Unnamed: 0,s
0,https://w3id.org/arco/resource/HistoricOrArtis...
1,https://w3id.org/arco/resource/HistoricOrArtis...
2,https://w3id.org/arco/resource/HistoricOrArtis...
3,https://w3id.org/arco/resource/HistoricOrArtis...
4,https://w3id.org/arco/resource/HistoricOrArtis...
...,...
29995,https://w3id.org/arco/resource/NumismaticPrope...
29996,https://w3id.org/arco/resource/NumismaticPrope...
29997,https://w3id.org/arco/resource/NumismaticPrope...
29998,https://w3id.org/arco/resource/NumismaticPrope...


## a-loc:hasCulturalInstituteOrSite

We proceed by retrieving the names of the cultural institutes and their related sites, since the same site can host more than one cultural institute. For each institute we also keep the number of cultural properties it holds.<br>
Since there is a large amount of data to parse, we retrieve it in two separate tables. We also keep the IRIs, so that institutes and sites each have a unique identifier that will help us merge the tables.

In [4]:
# table with institute IRI, count of cultural properties (CP), institute name
query = '''
SELECT ?institute (COUNT(DISTINCT ?s) AS ?count) ?instituteLabel WHERE {
?s ?p ?o.
FILTER(?p =  a-loc:hasCulturalInstituteOrSite || ?p = a-cd:isLocatedIn)
?s a-loc:hasCulturalInstituteOrSite ?institute.
?institute rdfs:label ?instituteLabel.
}
GROUP BY ?institute ?instituteLabel
'''


CulturalInstitutes = get(endpoint, query)
CulturalInstitutes.to_csv("cultural_institutes.csv")

query = '''SELECT DISTINCT ?institute ?site ?siteLabel WHERE {
?s ?p ?o.
FILTER(?p =  a-loc:hasCulturalInstituteOrSite || ?p = a-cd:isLocatedIn)
?s a-loc:hasCulturalInstituteOrSite ?institute.
?institute cis:hasSite ?site.
?site rdfs:label ?siteLabel
}
'''
CulturalSites = get(endpoint, query)
CulturalSites.to_csv("cultural_sites.csv")
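
Since the institute IRI appears in both tables, it can serve as the join key. The following is a minimal sketch of such a merge with pandas, using toy rows in place of the real query results. The `example.org` IRIs, labels and counts are invented for illustration, and the column names are assumed to match the SELECT variables of the two queries.

```python
import pandas as pd

# Toy stand-ins for the two result tables (hypothetical values).
institutes = pd.DataFrame({
    "institute": [
        "https://example.org/institute/1",
        "https://example.org/institute/2",
    ],
    "count": [120, 45],
    "instituteLabel": ["Institute A", "Institute B"],
})
sites = pd.DataFrame({
    "institute": [
        "https://example.org/institute/1",
        "https://example.org/institute/1",
        "https://example.org/institute/2",
    ],
    "site": [
        "https://example.org/site/10",
        "https://example.org/site/11",
        "https://example.org/site/10",
    ],
    "siteLabel": ["Site X", "Site Y", "Site X"],
})

# A left join on the IRI keeps every institute, even one with no site row;
# an institute with several sites yields one row per site.
merged = institutes.merge(sites, on="institute", how="left")
print(merged)
```

Joining on the IRI rather than on the label avoids collisions between institutes that happen to share a name.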



While the predicate a-loc:hasCulturalInstituteOrSite links a cultural property to a cultural institute or site (museums, libraries, archives, archaeological sites, monumental buildings), the predicate cis:hasSite has the class cis:Site as its range, which defines a georeferenced physical space. An institute can have more than one site associated with it. Since our analysis is spatial, the sites are the more relevant entities. As we did for the cultural properties, we can build another table with the number of institutes per site.

In [None]:
query = '''
SELECT (COUNT(DISTINCT ?institute) AS ?count) ?site ?siteLabel WHERE {
?s ?p ?o.
FILTER(?p =  a-loc:hasCulturalInstituteOrSite || ?p = a-cd:isLocatedIn)
?s a-loc:hasCulturalInstituteOrSite ?institute.
?institute cis:hasSite ?site.
?site rdfs:label ?siteLabel
}
GROUP BY ?site ?siteLabel
'''

InstituteCount = get(endpoint, query)
InstituteCount.to_csv("institute_count.csv")
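
The same per-site count can also be derived locally from the institute/site table, which gives a cheap cross-check of the aggregate query. This is only a sketch on toy rows: the `example.org` IRIs and labels are invented, and the column names are assumed to follow the SELECT variables.

```python
import pandas as pd

# Toy stand-in for the institute/site table (hypothetical values).
sites = pd.DataFrame({
    "institute": [
        "https://example.org/institute/1",
        "https://example.org/institute/1",
        "https://example.org/institute/2",
    ],
    "site": [
        "https://example.org/site/10",
        "https://example.org/site/11",
        "https://example.org/site/10",
    ],
    "siteLabel": ["Site X", "Site Y", "Site X"],
})

# Count distinct institutes per site, mirroring COUNT(DISTINCT ?institute).
institute_count = (
    sites.groupby(["site", "siteLabel"])["institute"]
    .nunique()
    .reset_index(name="count")
)
print(institute_count)
```

If the locally computed counts differ from the endpoint's, one possible cause is that one of the two result sets was truncated.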

## a-cd:isLocatedIn