<h1 style="text-align:center">Cultural Sites in Italy</h1>
Welcome to the documentation of the information visualization project Cultural Sites in Italy. The project explores the data published on data.beniculturali.it about cultural properties, collected by Arco, and cultural events, collected by the MIBACT (today MIC).<br>
The notebook is divided into two sections: 
<ul>
<li>Data exploration</li>
<li>Data management</li>
</ul>

Each section is in turn divided by dataset: the first part is dedicated to events, the second to properties.

Table of contents: 
* [1. Data Exploration](#dataExpl):
    * [1.1 Cultural Events](#DECE)
    * [1.2 Cultural Properties](#DECP)
* [2. Data Management](#datamngmt):
    * [2.1 Geospatial information](#geoinf)
        * [2.1.1 Cultural Events' map](#GICE)
        * [2.1.2 Cultural Properties' map](#GICP)
    * [2.2 Additional information](#addinf)
        * [2.2.1 Cultural Events' sites per region](#AICE)
        * [2.2.2 Cultural Properties' institutes per region](#AICP)

Before starting, let's import all the necessary libraries and assign the endpoint variables that we will use later:

In [1]:
from pandas import *

from bokeh.plotting import figure
from bokeh.tile_providers import get_provider, CARTODBPOSITRON
from bokeh.io import output_notebook, show, export_png
from bokeh.models import LogColorMapper, ColumnDataSource
from bokeh.palettes import Oranges256 as oranges

from pyproj import Proj, transform

import rdflib

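# the cells below call both sparql_dataframe.get(...) and the bare get(...)
import sparql_dataframe
from sparql_dataframe import get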
endpoint = 'https://dati.cultura.gov.it/sparql'

endpointDB = 'https://dbpedia.org/sparql'



<h2 style="text-align:center" id="dataExpl">1. Data Exploration</h2>

<h3 style="text-align:center" id="DECE">1.1 Cultural Events</h3>
The <b>CIS <a href="http://dati.beniculturali.it/lode/extract?lang=it&url=https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/Cultural-ON/v3.2/Cultural-ON-AP_IT.rdf">(Cultural Institute/Site and Cultural Event)</a> ontology</b> models the data on cultural institutes and sites: the agents that play a specific role in them, the sites themselves, the contact points, the multimedia files that describe them, and any other information the public needs in order to access the institute or site.

We specifically employed it to extract all the cultural events in the dataset through the class **cis:CulturalEvent**.

We also retrieve the cultural events' names through the property **rdfs:label**. We keep the IRI as a unique identifier for both institutes and sites, which will help us merge our tables later.

Since the events part of our research relies on the CIS ontology, we restricted ourselves to the part of the dataset that uses this ontology, i.e. the data published by the MiBACT. To do so, a filter keeps only the IRIs that contain the string "mibact".

In [None]:
query_events = """
SELECT DISTINCT ?s ?event WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
FILTER (contains(str(?s), "mibact"))


}
"""
df = sparql_dataframe.get(endpoint, query_events)
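
The queries in this notebook use prefixes such as `cis:` and `rdfs:` without declaring them, so they depend on the endpoint resolving them. As a minimal sketch, a query can be made self-contained by prepending explicit `PREFIX` lines. The helper name `with_prefixes` is hypothetical and the namespace IRIs below are assumptions that should be checked against the ontologies' documentation.

```python
# Namespace IRIs are assumptions, not taken from the notebook: verify them
# against the ontology documentation before relying on them.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "cis": "http://dati.beniculturali.it/cis/",
    "clvapit": "https://w3id.org/italia/onto/CLV/",
    "tiapit": "https://w3id.org/italia/onto/TI/",
}


def with_prefixes(query, prefixes=PREFIXES):
    """Return `query` preceded by one PREFIX line per namespace (hypothetical helper)."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return header + "\n" + query


query_events_explicit = with_prefixes("""
SELECT DISTINCT ?s ?event WHERE {
  ?s a cis:CulturalEvent ;
     rdfs:label ?event .
  FILTER (contains(str(?s), "mibact"))
}
""")
print(query_events_explicit)
```

The resulting string can be passed to `get(endpoint, ...)` exactly like the original query.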

We then employed the property **cis:isHostedBySite** to extract all the sites in which the events were held, as this property links an **Event** to the **Site** of the cultural institute or site hosting it. 

In this case too we extracted both the IRIs and the labels, the latter through **rdfs:label** as above.

In many cases the same site had different URLs, and likewise the same event had different URLs. In both cases the different URLs referred to sites or events with the same name but different information, hence we counted them separately.

In [None]:
query_site = """
SELECT DISTINCT ?s ?event ?o ?site  WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s cis:isHostedBySite ?o.
?o rdfs:label ?site.
}
"""
df = sparql_dataframe.get(endpoint, query_site)

Then we examined the predicates associated with each event to find a way to retrieve its city.

The most helpful approach was to first extract the address through **cis:siteAddress** and then get the city from it through another ontology, <a href="https://ontopia-lode.agid.gov.it/lode/extract?url=https://w3id.org/italia/onto/CLV">(Address (Location) Ontology)</a>, and its property **clvapit:hasCity**.

In [None]:
query_culture_events = """
SELECT DISTINCT ?s ?event ?o ?site ?urlcity ?city WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s cis:isHostedBySite ?o.
?o rdfs:label ?site.
?o cis:siteAddress ?address.
?address clvapit:hasCity ?urlcity.
?urlcity rdfs:label ?city
}
"""
df = sparql_dataframe.get(endpoint, query_culture_events)

To remove any duplicates from this table we used the pandas method **drop_duplicates**, keeping one row for each pair of event IRI and site IRI.

In [None]:
df=df.drop_duplicates(["s", "o"])
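
As a small illustration of what this call does, the sketch below applies the same `drop_duplicates(["s", "o"])` to a toy table. The rows and IRIs are made up for the example. Two rows count as duplicates only when both the event IRI and the site IRI coincide, so events that share a name but have different IRIs are all kept.

```python
import pandas as pd

# Toy rows standing in for the query result; all values are invented.
toy = pd.DataFrame({
    "s": ["ev/1", "ev/1", "ev/2"],
    "event": ["Mostra", "Mostra", "Mostra"],
    "o": ["site/A", "site/A", "site/A"],
    "city": ["Roma", "ROMA", "Roma"],
})

# Only the columns "s" and "o" are compared; the first matching row is kept.
deduplicated = toy.drop_duplicates(["s", "o"])
print(deduplicated)
```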

Then we counted first the **events per site**

In [None]:
query_site = """
SELECT DISTINCT (count (?s) as ?count) ?site WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s cis:isHostedBySite ?o.
?o rdfs:label ?site.
?o cis:siteAddress ?address.
?address clvapit:hasCity ?urlcity.
?urlcity rdfs:label ?city
}
"""
df_count_sites = sparql_dataframe.get(endpoint, query_site)

Then the **events per city**

In [None]:
query_city = """
SELECT DISTINCT (count (?s) as ?count) ?urlcity ?city WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s cis:isHostedBySite ?o.
?o rdfs:label ?site.
?o cis:siteAddress ?address.
?address clvapit:hasCity ?urlcity.
?urlcity rdfs:label ?city
}
"""

df_city = sparql_dataframe.get(endpoint, query_city)
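
The counting queries above have no `GROUP BY` clause. In standard SPARQL 1.1, a variable projected next to an aggregate has to be listed in `GROUP BY`, so a stricter endpoint may reject them. The sketch below builds a grouped variant of the same pattern. The function name `count_events_query` is hypothetical, and `COUNT(DISTINCT ?s)` is a deliberate change from the original `count(?s)`: it counts each event IRI once per group.

```python
def count_events_query(group_vars):
    """Build an event-count query grouped by the given variable names (hypothetical helper)."""
    projected = " ".join(f"?{name}" for name in group_vars)
    return f"""
SELECT {projected} (COUNT(DISTINCT ?s) AS ?count) WHERE {{
  ?s a cis:CulturalEvent ;
     cis:isHostedBySite ?o .
  ?o rdfs:label ?site ;
     cis:siteAddress ?address .
  ?address clvapit:hasCity ?urlcity .
  ?urlcity rdfs:label ?city .
}}
GROUP BY {projected}
ORDER BY DESC(?count)
"""


query_site_grouped = count_events_query(["site"])
query_city_grouped = count_events_query(["urlcity", "city"])
print(query_city_grouped)
```

With this form the count column is named `count` in the resulting dataframe.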

After having extracted all the geographical information, we moved on to the temporal information.

We did this through the **Time ontology** <a href="https://ontopia-lode.agid.gov.it/lode/extract?url=https://w3id.org/italia/onto/TI">(Italian application profile)</a> and its property **tiapit:atTime**.

This gave us a time interval composed of a **start time** and an **end time**, expressed as XSD values through the properties **tiapit:startTime** and **tiapit:endTime**.

In [None]:
query_time="""
SELECT DISTINCT ?s ?time ?starttime ?endtime WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s tiapit:atTime ?time.
?time tiapit:startTime ?starttime;
tiapit:endTime ?endtime.
}"""
df_time = sparql_dataframe.get(endpoint, query_time)

As we did with the geographical information, we counted how many events took place over the years.

In [None]:
query_count_time ="""
SELECT DISTINCT (count (?s) as ?count)  (year(xsd:dateTime(?starttime)) as ?StartYear) (year(xsd:dateTime(?endtime)) as ?EndYear) WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s tiapit:atTime ?time.
?time tiapit:startTime ?starttime;
tiapit:endTime ?endtime.
?s cis:isHostedBySite ?o.
?o rdfs:label ?site.
?o cis:siteAddress ?address.
?address clvapit:hasCity ?urlcity.
}"""

df_time_count = sparql_dataframe.get(endpoint, query_count_time)
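
The same yearly count can also be derived locally once the start times have been fetched. The sketch below is one possible way to do it with the standard library, assuming the values arrive as strings in `xsd:dateTime` form without a time zone. The sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime

# Made-up sample values in xsd:dateTime lexical form (no time zone).
start_times = [
    "2018-05-19T10:00:00",
    "2018-12-01T09:30:00",
    "2019-03-08T18:00:00",
]


def events_per_year(timestamps):
    """Count the timestamps by calendar year."""
    return Counter(datetime.fromisoformat(ts).year for ts in timestamps)


per_year = events_per_year(start_times)
print(per_year)
```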

<h3 style="text-align:center" id="DECP">1.2 Cultural Properties</h3>
The Arco ontology tries to give the widest representation of Italian cultural heritage, incorporating all the main classifications provided by international organizations. The class <b>arco:CulturalProperty</b> is the top class; however, it is not assigned to every cultural property in the dataset. Furthermore, each cultural property can belong to several classes. For this reason, in order to retrieve all the cultural properties, we applied a filter that lists all the classes and subclasses of cultural property that an instance can have as its type.

In [None]:
query = '''
SELECT COUNT(DISTINCT(?CP) as ?count) ?o WHERE {
?CP a ?o.
FILTER (?o = arco:CulturalProperty || ?o = arco:IntangibleCulturalProperty || ?o = arco:TangibleCulturalProperty || ?o = arco:ArchaeologicalProperty || ?o = arco:ImmovableCulturalProperty || ?o = arco:ArchitecturalOrLandscapeHeritage || ?o = arco:HistoricOrArtisticProperty || ?o = arco:MusicHeritage || ?o = arco:NaturalHeritage || ?o = arco:BotanicalHeritage || ?o = arco:MineralHeritage || ?o = arco:PalaeontologicalHeritage || ?o = arco:PertologicHeritage || ?o = arco:PlanetaryScienceHeritage || ?o = arco:ZoologicalHeritage || ?o = arco:NumismaticProperty || ?o = arco:PhotographicHeritage || ?o = arco:ScientificOrTechnologicalHeritage)
}
'''

CulturalProperties = get(endpoint, query)
CulturalProperties

We proceed by getting the cultural institutes' names. The predicate a-loc:hasCulturalInstituteOrSite links an object to a cultural institute or site (museums, libraries, archives, archaeological sites, monumental buildings). We also keep the number of cultural properties per institute.<br>
From now on the subclasses which don't have instances will be removed from the filter. 

In [None]:
query = '''
SELECT DISTINCT(?institute) COUNT(DISTINCT(?s) as ?CP_count) ?instituteLabel WHERE {
?s a ?o.
FILTER (?o = arco:CulturalProperty || ?o = arco:IntangibleCulturalProperty || ?o = arco:ArchaeologicalProperty || ?o = arco:ImmovableCulturalProperty || ?o = arco:ArchitecturalOrLandscapeHeritage || ?o = arco:HistoricOrArtisticProperty || ?o = arco:MusicHeritage || ?o = arco:NaturalHeritage || ?o = arco:BotanicalHeritage || ?o = arco:MineralHeritage || ?o = arco:PlanetaryScienceHeritage || ?o = arco:ZoologicalHeritage || ?o = arco:NumismaticProperty || ?o = arco:PhotographicHeritage || ?o = arco:ScientificOrTechnologicalHeritage)
?s a-loc:hasCulturalInstituteOrSite ?institute.
?institute rdfs:label ?instituteLabel.
}
'''

CulturalInstitutes = get(endpoint, query)
CulturalInstitutes

In order to retrieve the coordinates of our cultural properties we use the predicate clvapit:hasGeometry. Since many cultural properties belong to the same institute, and therefore share the same coordinates, we group them per institute through the predicate a-loc:hasCulturalInstituteOrSite, which links an object to a cultural institute or site (museums, libraries, archives, archaeological sites, monumental buildings). The institute is the juridical entity; it can have several sites, i.e. georeferenced physical spaces, associated with it.

In [None]:
query = '''
SELECT DISTINCT(?institute) COUNT(DISTINCT(?s) as ?CP_count)  ?lat ?long WHERE {
?s a ?o.
FILTER (?o = arco:CulturalProperty || ?o = arco:IntangibleCulturalProperty || ?o = arco:TangibleCulturalProperty || ?o = arco:ArchaeologicalProperty || ?o = arco:ImmovableCulturalProperty || ?o = arco:ArchitecturalOrLandscapeHeritage || ?o = arco:HistoricOrArtisticProperty || ?o = arco:MusicHeritage || ?o = arco:NaturalHeritage || ?o = arco:BotanicalHeritage || ?o = arco:MineralHeritage || ?o = arco:PalaeontologicalHeritage || ?o = arco:PertologicHeritage || ?o = arco:PlanetaryScienceHeritage || ?o = arco:ZoologicalHeritage || ?o = arco:NumismaticProperty || ?o = arco:PhotographicHeritage || ?o = arco:ScientificOrTechnologicalHeritage)
?s a-loc:hasCulturalInstituteOrSite ?institute.
  ?s clvapit:hasGeometry ?geometry .
  ?geometry a-loc:hasCoordinates ?coordinates .
?coordinates a-loc:lat ?lat.
?coordinates a-loc:long ?long.
}
'''
Coordinates = get(endpoint, query)

In order to retrieve the cities we need two other properties: 
- cis:hasSite: its range is the class cis:Site, which defines a georeferenced physical space. An institute can have several sites associated with it.
- cis:siteAddress: it is a subproperty of clvapit:hasAddress. 

We want to know the city for each institute.

In [None]:
query = '''
SELECT DISTINCT(?institute) ?city ?cityLabel WHERE {
?CP a ?o.
FILTER (?o = arco:CulturalProperty || ?o = arco:IntangibleCulturalProperty || ?o = arco:TangibleCulturalProperty || ?o = arco:ArchaeologicalProperty || ?o = arco:ImmovableCulturalProperty || ?o = arco:ArchitecturalOrLandscapeHeritage || ?o = arco:HistoricOrArtisticProperty || ?o = arco:MusicHeritage || ?o = arco:NaturalHeritage || ?o = arco:BotanicalHeritage || ?o = arco:MineralHeritage || ?o = arco:PalaeontologicalHeritage || ?o = arco:PertologicHeritage || ?o = arco:PlanetaryScienceHeritage || ?o = arco:ZoologicalHeritage || ?o = arco:NumismaticProperty || ?o = arco:PhotographicHeritage || ?o = arco:ScientificOrTechnologicalHeritage)
?CP a-loc:hasCulturalInstituteOrSite ?institute.
?institute cis:hasSite ?site.
?site cis:siteAddress ?address.
?address clvapit:hasCity ?city.
?city rdfs:label ?cityLabel
}
'''

cities = get(endpoint, query)
cities

<h2 style="text-align:center" id="datamngmt">2. Data Management</h2>
In this section we will: 
<ul>
    <li>retrieve geospatial information in order to plot it on an actual map;</li>
    <li>visualize some of the emerging patterns;</li>
    <li>address more specific research queries.</li>
</ul>

<h3 style="text-align:center" id="geoinf">2.1 Geospatial information</h3>

<h3 style="text-align:center" id="GICE">2.1.1 Cultural Events' map</h3>
In order to retrieve the longitude and latitude of the cities which have hosted cultural events, we need to resort to DBpedia. First we collect the DBpedia link for each city, then we query the DBpedia SPARQL endpoint to retrieve the geospatial information.

In [None]:
query = '''
SELECT DISTINCT(?s) ?event ?dbpedia ?urlcity ?label WHERE {
 ?s a cis:CulturalEvent;
rdfs:label ?event.
?s cis:isHostedBySite ?o.
?o rdfs:label ?site.
?o cis:siteAddress ?address.
?address clvapit:hasCity ?urlcity.
?urlcity owl:sameAs ?urlcity2.
?urlcity2 rdfs:label ?label.
?urlcity2 owl:sameAs ?dbpedia.
FILTER (contains(str(?dbpedia), "dbpedia"))
}
'''

dbPediaLinks = get(endpoint, query)
dbPediaLinks

In [2]:
dbpedia = dbPediaLinks["dbpedia"].drop_duplicates()

CE_coordinates = DataFrame({"dbPedia": [], "latitude": [], "longitude": []})

for idx, item in dbpedia.items():
    # ask DBpedia for the coordinates of one city at a time
    query = f'''
    SELECT ?latitude ?longitude WHERE {{
        <{item}> geo:lat ?latitude;
        geo:long ?longitude.
    }} '''
    coordinates = get(endpointDB, query)
    # tag every returned row with its DBpedia IRI, so that cities with
    # zero or several coordinate pairs cannot misalign the column
    coordinates.insert(0, "dbPedia", item)
    CE_coordinates = concat([CE_coordinates, coordinates])

CE_coordinates
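
The loop above sends one request per city. As a hedged alternative, several IRIs can be asked for in a single query through a SPARQL 1.1 `VALUES` clause, which reduces the number of round trips. The helper names `coordinates_query` and `chunks` and the batch size are hypothetical, and the sketch assumes, like the original query, that the endpoint resolves the `geo:` prefix.

```python
def coordinates_query(iris):
    """Build one query asking for the coordinates of several resources (hypothetical helper)."""
    values = " ".join(f"<{iri}>" for iri in iris)
    return f"""
SELECT ?dbpedia ?latitude ?longitude WHERE {{
  VALUES ?dbpedia {{ {values} }}
  ?dbpedia geo:lat ?latitude ;
           geo:long ?longitude .
}}
"""


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Example IRIs, used here only to show the shape of the generated queries.
sample = [
    "http://dbpedia.org/resource/Rome",
    "http://dbpedia.org/resource/Bologna",
    "http://dbpedia.org/resource/Turin",
]
batched_queries = [coordinates_query(batch) for batch in chunks(sample, 2)]
print(batched_queries[0])
```

Each generated query already returns the `?dbpedia` column, so the results of the batches can be concatenated directly.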

Now we plot the cities on a map, using the Python library Bokeh.

In [None]:
CE_coordinates_cleaned = CE_coordinates.merge(dbPediaLinks, left_on="dbPedia", right_on="dbpedia")
CE_coordinates_cleaned = CE_coordinates_cleaned[["event", "label", "latitude", "longitude"]].drop_duplicates(subset= ["event"])
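# reset_index returns a new dataframe, so the result has to be assigned
CE_coordinates_cleaned = CE_coordinates_cleaned.reset_index(drop=True)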

# projection WGS 84 - used by GPS
inProj = Proj(init='epsg:4326')

# WGS84 Pseudo Web Mercator - projection used by most web services, e.g. Google Maps, OpenStreet Maps
outProj = Proj(init='epsg:3857')

CE_coordinates_cleaned['longitude'],CE_coordinates_cleaned['latitude'] = transform(inProj,outProj,CE_coordinates_cleaned['longitude'].values,CE_coordinates_cleaned['latitude'].values)
CE_coordinates_cleaned.head()
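
To make the projection step less opaque, the sketch below computes the spherical Web Mercator formulas (the ones behind EPSG:3857) by hand. It is only an illustration of what the pyproj call produces, not a replacement for it. The function name `to_web_mercator` is hypothetical and the coordinates of Rome are approximate.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS 84 semi-major axis, in metres


def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS * math.radians(lon_deg)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates of Rome, used only as an example.
rome_x, rome_y = to_web_mercator(12.4964, 41.9028)
print(rome_x, rome_y)
```

Values of this magnitude (millions of metres) are what the `x_range` and `y_range` arguments of the Bokeh figures refer to.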

In [None]:
# create a dictionary with lists
source = ColumnDataSource(data=dict(
    lat=CE_coordinates_cleaned.latitude.values.tolist(),
    lon=CE_coordinates_cleaned.longitude.values.tolist(),
    name=CE_coordinates_cleaned.event.values.tolist(),
    keeper=CE_coordinates_cleaned.label.values.tolist()
))

In [None]:
# import tile
cartodb = get_provider('CARTODBPOSITRON')

# draw the frame
p = figure(outer_width=900, outer_height=700, # range EU/Africa
           x_axis_type="mercator", y_axis_type="mercator", # labels on axes
           tooltips=[ ("Name", "@name, @keeper")],
           title="Cultural Events sites")

# add tile
p.add_tile(cartodb)

# draw points
p.circle(x='lon', y='lat',
         size=5,
         fill_color="gold", line_color="gold",
         fill_alpha=0.3,
         source=source)

# add for colab output
output_notebook()

show(p)
export_png(p, filename="CitiesCE.png")

<h3 style="text-align:center" id="GICP">2.1.2 Cultural Properties' map</h3>
The same process will be followed for the cultural properties. Since the data have the property clvapit:hasGeometry, which, through the Arco predicate a-loc:hasCoordinates, directly provides the latitude and longitude of the institutes, we will plot one map to explore those locations. 

Furthermore, a second map will be produced from the latitude and longitude of the cities, retrieved from DBpedia with the same procedure as for cultural events, in order to compare the two maps. 

**Cultural Properties' Institutes Map**

In [None]:
query = '''
SELECT DISTINCT(?CP) ?stripped_CPLabel ?institute ?instituteLabel ?lat ?long WHERE {
?CP a ?o.
FILTER (?o = arco:CulturalProperty || ?o = arco:IntangibleCulturalProperty || ?o = arco:TangibleCulturalProperty || ?o = arco:ArchaeologicalProperty || ?o = arco:ImmovableCulturalProperty || ?o = arco:ArchitecturalOrLandscapeHeritage || ?o = arco:HistoricOrArtisticProperty || ?o = arco:MusicHeritage || ?o = arco:NaturalHeritage || ?o = arco:BotanicalHeritage || ?o = arco:MineralHeritage || ?o = arco:PalaeontologicalHeritage || ?o = arco:PertologicHeritage || ?o = arco:PlanetaryScienceHeritage || ?o = arco:ZoologicalHeritage || ?o = arco:NumismaticProperty || ?o = arco:PhotographicHeritage || ?o = arco:ScientificOrTechnologicalHeritage)
?CP rdfs:label ?CPLabel;
a-loc:hasCulturalInstituteOrSite ?institute.
BIND (STR(?CPLabel)  AS ?stripped_CPLabel)
?institute rdfs:label ?instituteLabel.
  ?CP clvapit:hasGeometry ?geometry .
  ?geometry a-loc:hasCoordinates ?coordinates .
?coordinates a-loc:lat ?lat.
?coordinates a-loc:long ?long.
}
'''
CP_coordinates = get(endpoint, query)

In [None]:
# projection WGS 84 - used by GPS
inProj = Proj(init='epsg:4326')

# WGS84 Pseudo Web Mercator - projection used by most web services, e.g. Google Maps, OpenStreet Maps
outProj = Proj(init='epsg:3857')

CP_coordinates['long'],CP_coordinates['lat'] = transform(inProj,outProj,CP_coordinates['long'].values,CP_coordinates['lat'].values)
CP_coordinates.head()

In [None]:
# create a dictionary with lists
source = ColumnDataSource(data=dict(
    lat=CP_coordinates.lat.values.tolist(),
    lon=CP_coordinates.long.values.tolist(),
    name=CP_coordinates.stripped_CPLabel.values.tolist(),
    keeper=CP_coordinates.instituteLabel.values.tolist()
))

In [None]:
# import tile
cartodb = get_provider('CARTODBPOSITRON')

# draw the frame
p = figure(outer_width=900, outer_height=700,
            x_range=(800000, 2000000), y_range=(4300000, 6000000),
           x_axis_type="mercator", y_axis_type="mercator", # labels on axes
           tooltips=[ ("Name", "@name, @keeper")],
           title="Cultural Properties sites")

# add tile
p.add_tile(cartodb)

# draw points
p.circle(x='lon', y='lat',
         size=5,
         fill_color="mediumpurple", line_color="mediumpurple",
         fill_alpha=0.3,
         source=source)

# add for colab output
output_notebook()

show(p)
export_png(p, filename="InstitutesCP.png")

**Cultural Properties' cities map**
We have already extracted the cities from the dataset; now we also retrieve the DBpedia link for each city. After this step we can query the DBpedia endpoint to retrieve the coordinates.

In [None]:
cityLinksToSearch = cities["city"].drop_duplicates()
citiesLinks = DataFrame({})

for idx, city in cityLinksToSearch.items(): 
    query1 = '''
    SELECT ('''f'<{city}>'''') as ?city ?dbpedia WHERE {
        '''f'<{city}>'''' owl:sameAs ?dbpedia 
        FILTER (contains(str(?dbpedia), "dbpedia"))}'''
    CP_dbPediaLink = get(endpoint, query1)
    citiesLinks = concat([citiesLinks, CP_dbPediaLink])

citiesLinks

In [None]:
CPCityCoordinates = DataFrame({})
for idx, item in citiesLinks["dbpedia"].drop_duplicates().items() :
    query2 = '''
        SELECT ('''f'<{item}>'''') as ?dbpedia ?latitude ?longitude WHERE{
           '''f'<{item}>'''' geo:lat ?latitude;
        geo:long ?longitude.
    }'''
    coordinates = get(endpointDB, query2)
    CPCityCoordinates = concat([CPCityCoordinates, coordinates])
CPCityCoordinates

In [None]:
CPCityCoordinatesMerged = CPCityCoordinates.merge(citiesLinks, on="dbpedia")[["city", "latitude", "longitude"]].drop_duplicates()
# `cities` holds institute, city and cityLabel but no property label,
# so the institute IRI is used as the name of each point
CPCityCoordinatesMerged = CPCityCoordinatesMerged.merge(cities, on="city")[["institute", "cityLabel", "latitude", "longitude"]].drop_duplicates()
CPCityCoordinatesMerged = CPCityCoordinatesMerged.rename(columns={"institute": "stripped_CPLabel"})
CPCityCoordinatesMerged

In [None]:
# projection WGS 84 - used by GPS
inProj = Proj(init='epsg:4326')

# WGS84 Pseudo Web Mercator - projection used by most web services, e.g. Google Maps, OpenStreet Maps
outProj = Proj(init='epsg:3857')

CPCityCoordinatesMerged['longitude'],CPCityCoordinatesMerged['latitude'] = transform(inProj,outProj,CPCityCoordinatesMerged['longitude'].values,CPCityCoordinatesMerged['latitude'].values)
CPCityCoordinatesMerged.head()

In [None]:
# create a dictionary with lists
source = ColumnDataSource(data=dict(
    lat=CPCityCoordinatesMerged.latitude.values.tolist(),
    lon=CPCityCoordinatesMerged.longitude.values.tolist(),
    name=CPCityCoordinatesMerged.stripped_CPLabel.values.tolist(),
    keeper=CPCityCoordinatesMerged.cityLabel.values.tolist()
))

In [None]:
# import tile
cartodb = get_provider('CARTODBPOSITRON')

# draw the frame
p = figure(outer_width=900, outer_height=700,
           x_range=(800000, 2000000), y_range=(4300000, 6000000),
           x_axis_type="mercator", y_axis_type="mercator", # labels on axes
           tooltips=[ ("Name", "@name, @keeper")],
           title="Cultural Properties sites")

# add tile
p.add_tile(cartodb)

# draw points
p.circle(x='lon', y='lat',
         size=5,
         fill_color="mediumpurple", line_color="mediumpurple",
         fill_alpha=0.3,
         source=source)

# add for colab output
output_notebook()

show(p)
export_png(p, filename="CitiesCP.png")

<h3 style="text-align:center" id="addinf">2.2 Additional information: Regions</h3>

<h3 style="text-align:center" id="AICE">2.2.1 Cultural Events' sites per region</h3>
We retrieve the information about the regions from DBpedia and, after cleaning the labels, we count how many sites of cultural events each region hosts.

In [None]:
dbpedia = dbPediaLinks["dbpedia"].drop_duplicates()
link = []


Regions = DataFrame({})

for idx, item in dbpedia.items():
    query = '''
        SELECT ('''f'<{item}>'''') as ?dbpedia ?region ?regionLabel WHERE{
        '''f'<{item}>'''' dbo:region ?region.
        ?region rdfs:label ?regionLabel.
        FILTER (langMatches(lang(?regionLabel), "it"))
    }'''
    coordinates = get(endpointDB, query)
    link.append(item)
    Regions = concat([Regions, coordinates])
    
Regions

In [None]:
# map the label variants returned by DBpedia to a single name per region
region_names = {
    "Marken": "Marche",
    "Consiglio regionale della Lombardia": "Lombardia",
    "Consiglio regionale del Friuli-Venezia Giulia": "Friuli-Venezia Giulia",
    "Friuli": "Friuli-Venezia Giulia",
    "Emilia": "Emilia-Romagna",
    "Emilia Romagna": "Emilia-Romagna",
    "Sardegna (isola)": "Sardegna",
    "Regno di Sardegna": "Sardegna",
    "Isola di Sicilia": "Sicilia",
    "Sicilia (provincia romana)": "Sicilia",
    "Consiglio regionale della Calabria": "Calabria",
    "Aosta": "Valle d'Aosta",
    "Provincia di Udine": "Friuli-Venezia Giulia",
    "Latium": "Lazio",
}
Regions["regionLabel"] = Regions["regionLabel"].replace(region_names)
Regions

In [None]:
AllData = dbPediaLinks.merge(Regions, left_on="dbpedia", right_on="dbpedia")[["urlcity", "regionLabel"]]
AllData = AllData.merge(df_city, left_on="urlcity", right_on="urlcity")[["callret-1", "city", "regionLabel"]]

In [None]:
import json
EventCount = {}
for idx, row in AllData.iterrows():
    if row["regionLabel"] not in EventCount:
        # first city seen for this region
        EventCount[row["regionLabel"]] = {}
        EventCount[row["regionLabel"]][row["city"]] = row["callret-1"]
    else: 
        # look the city up inside its region, not among the region keys
        if row["city"] not in EventCount[row["regionLabel"]]:
            EventCount[row["regionLabel"]][row["city"]] = row["callret-1"]
        else: 
            EventCount[row["regionLabel"]][row["city"]] += row["callret-1"]

with open("RegionCityEventCount.json", "w", encoding="utf-8") as f:
    json.dump(EventCount, f, indent=4, ensure_ascii=False)
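
The region, city and count nesting written to the JSON file can also be built without explicit membership checks. The sketch below shows the idea with `collections.defaultdict` on made-up rows. The helper name `nest_counts` and the sample figures are hypothetical.

```python
from collections import defaultdict

# Made-up rows in the shape (region, city, count).
rows = [
    ("Lazio", "Roma", 10),
    ("Lazio", "Roma", 5),
    ("Lazio", "Viterbo", 2),
    ("Toscana", "Firenze", 7),
]


def nest_counts(rows):
    """Sum the counts into a {region: {city: total}} dictionary."""
    nested = defaultdict(lambda: defaultdict(int))
    for region, city, count in rows:
        nested[region][city] += count
    # convert back to plain dictionaries, e.g. before serialising to JSON
    return {region: dict(cities) for region, cities in nested.items()}


event_count = nest_counts(rows)
print(event_count)
```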

<h3 style="text-align:center" id="AICP">2.2.2 Cultural Properties' institutes per region</h3>
We will follow the same procedure for the cultural properties' institutes.

In [None]:
Regions = DataFrame({})
for idx, item in citiesLinks["dbpedia"].drop_duplicates().items() :
    query2 = '''
        SELECT ('''f'<{item}>'''') as ?dbpedia ?region ?regionLabel WHERE{
        '''f'<{item}>'''' dbo:region ?region.
        ?region rdfs:label ?regionLabel.
        FILTER (langMatches(lang(?regionLabel), "it"))
    }'''
    coordinates = get(endpointDB, query2)
    Regions = concat([Regions, coordinates])
Regions

In [None]:
query = '''
SELECT DISTINCT(?city) COUNT(DISTINCT(?institute) as ?count) WHERE {
?CP a ?o.
FILTER (?o = arco:CulturalProperty || ?o = arco:IntangibleCulturalProperty || ?o = arco:TangibleCulturalProperty || ?o = arco:ArchaeologicalProperty || ?o = arco:ImmovableCulturalProperty || ?o = arco:ArchitecturalOrLandscapeHeritage || ?o = arco:HistoricOrArtisticProperty || ?o = arco:MusicHeritage || ?o = arco:NaturalHeritage || ?o = arco:BotanicalHeritage || ?o = arco:MineralHeritage || ?o = arco:PalaeontologicalHeritage || ?o = arco:PertologicHeritage || ?o = arco:PlanetaryScienceHeritage || ?o = arco:ZoologicalHeritage || ?o = arco:NumismaticProperty || ?o = arco:PhotographicHeritage || ?o = arco:ScientificOrTechnologicalHeritage)
 ?CP a-loc:hasCulturalInstituteOrSite ?institute.
?institute cis:hasSite ?site.
?site cis:siteAddress ?address.
?address clvapit:hasCity ?city.
}
'''

InstituteCount = get(endpoint, query)
InstituteCount

In [None]:
# map the label variants returned by DBpedia to a single name per region
region_names = {
    "Marken": "Marche",
    "Consiglio regionale della Lombardia": "Lombardia",
    "Consiglio regionale del Friuli-Venezia Giulia": "Friuli-Venezia Giulia",
    "Friuli": "Friuli-Venezia Giulia",
    "Emilia": "Emilia-Romagna",
    "Emilia Romagna": "Emilia-Romagna",
}
Regions["regionLabel"] = Regions["regionLabel"].replace(region_names)
Regions

In [None]:
citiesLinks = citiesLinks.drop_duplicates()
AllData = InstituteCount.merge(citiesLinks, left_on="city", right_on="city")[["dbpedia", "callret-1"]]
Regions = Regions.drop_duplicates()
InstituteRegion = AllData.merge(Regions, left_on="dbpedia", right_on="dbpedia")[["callret-1", "regionLabel"]]

RegionNumber = {}
for idx, row in InstituteRegion.iterrows():
    if row["regionLabel"] not in RegionNumber:
        RegionNumber[row["regionLabel"]] = row["callret-1"]
    else: 
        RegionNumber[row["regionLabel"]] += row["callret-1"]

with open("RegionCityPropertyCount.json", "w", encoding="utf-8") as f:
    json.dump(RegionNumber, f, indent=4, ensure_ascii=False)