<h1> Querying the SURE-KG RDF dataset </h1>

This notebook shows how to query the SURE-KG dataset with SPARQL and how to inspect the results in Python.

You can query the dataset from our Virtuoso endpoint: http://erebe-vm2.i3s.unice.fr:5000/sparql .

<h3> Install required packages </h3>

<strong> SPARQLWrapper </strong>

This package sends SPARQL queries to a remote endpoint and converts the responses into Python objects, which this notebook then loads into a Pandas DataFrame. https://rdflib.dev/sparqlwrapper/

<strong> Pandas</strong>

Pandas DataFrames hold the query results.

<strong> Geopandas and Shapely</strong>

Geopandas and Shapely handle spatial data (results that carry a geometry).

<strong> Folium </strong>

Folium displays spatial data on an interactive map.



In [None]:
#!pip install SPARQLWrapper
#!pip install pandas
#!pip install geopandas
#!pip install shapely
#!pip install folium


In [41]:
import pandas as pd
from SPARQLWrapper import SPARQLWrapper, JSON

import geopandas as gpd
import folium
import shapely.wkt  # explicit submodule import; shapely.wkt.loads is used in Scenario 3
from shapely.geometry import Point, Polygon, MultiPoint, MultiPolygon


In [4]:
def sparql_to_dataframe(endpoint, query):
    """
    Convert SPARQL results into a Pandas DataFrame.
    Credit: https://lawlesst.github.io/notebook/sparql-dataframe.html
    """
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.queryAndConvert()

    cols = results['head']['vars']
    out = []
    for row in results['results']['bindings']:
        item = []
        for c in cols:
            item.append(row.get(c, {}).get('value'))
        out.append(item)

    return pd.DataFrame(out, columns=cols)
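The function above keeps only the `value` field of each binding, so every column arrives as a string. The following is a minimal sketch of one way to cast literals using the `datatype` field that SPARQL JSON results attach to typed literals. The helper names `typed_value` and `bindings_to_rows` are hypothetical, the datatype map covers only a few XSD types, and the sample result is made up for illustration.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Map a few XSD datatypes to Python constructors (deliberately not exhaustive).
CASTS = {
    XSD + "integer": int,
    XSD + "int": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "float": float,
}


def typed_value(binding):
    """Cast one SPARQL JSON binding to a Python value; None if the variable is unbound."""
    if binding is None:
        return None
    cast = CASTS.get(binding.get("datatype"), str)
    return cast(binding["value"])


def bindings_to_rows(results):
    """Turn a SPARQL JSON result into a list of dicts, one per solution."""
    cols = results["head"]["vars"]
    return [
        {c: typed_value(row.get(c)) for c in cols}
        for row in results["results"]["bindings"]
    ]


# Made-up sample shaped like a SPARQL JSON response.
sample = {
    "head": {"vars": ["realEstate", "price", "rooms"]},
    "results": {"bindings": [
        {
            "realEstate": {"type": "uri", "value": "https://example.org/re1"},
            "price": {"type": "literal", "datatype": XSD + "double", "value": "310000.0"},
            "rooms": {"type": "literal", "datatype": XSD + "integer", "value": "3"},
        },
        {
            "realEstate": {"type": "uri", "value": "https://example.org/re2"},
            "rooms": {"type": "literal", "datatype": XSD + "integer", "value": "2"},
        },
    ]},
}

rows = bindings_to_rows(sample)
```

The resulting list of dicts can be passed directly to `pd.DataFrame(rows)`; unbound variables become missing values.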


# Run queries

In [6]:
endpoint = 'http://erebe-vm2.i3s.unice.fr:5000/sparql/'

## Scenario 1: Helping potential buyers find a real estate property.

In [12]:
## Retrieve the price, floor size and coordinates of all the apartments with 3 rooms in Nice, France
query = '''
    PREFIX sure: <https://ns.inria.fr/sure#>
    PREFIX suredt: <https://ns.inria.fr/sure/data/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX schema: <http://schema.org/>
    PREFIX oa: <http://www.w3.org/ns/oa#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dctypes: <http://purl.org/dc/dcmitype/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
   
   SELECT ?realEstate ?floorSize ?price ?lat ?long WHERE {
       ?realEstate rdf:type sure:RealEstate, schema:Apartment;
       sure:locatedIn ?city;
       schema:numberOfRooms "3"^^xsd:integer;
       schema:floorSize ?floorSize;
       sure:hasPrice ?price;
       geo:lat ?lat;
       geo:long ?long.
       
       ?city rdf:type sure:City;
       rdfs:label "nice".
    } LIMIT 10
'''
df = sparql_to_dataframe(endpoint, query)
df

|   | realEstate | floorSize | price | lat | long |
|---|---|---|---|---|---|
| 0 | https://ns.inria.fr/sure/data/RealEstate61c0b9... | 54.0 | 310000.0 | 43.7009 | 7.29581 |
| 1 | https://ns.inria.fr/sure/data/RealEstate61c0b9... | 59.0 | 310500.0 | 43.7006 | 7.26462 |
| 2 | https://ns.inria.fr/sure/data/RealEstate61c0b9... | 66.0 | 252000.0 | 43.7277 | 7.26152 |
| 3 | https://ns.inria.fr/sure/data/RealEstate61c0b9... | 104.0 | 1990000.0 | 43.6912 | 7.2932 |
| 4 | https://ns.inria.fr/sure/data/RealEstate61c0b9... | 55.0 | 280000.0 | 43.7034 | 7.2662 |
| 5 | https://ns.inria.fr/sure/data/RealEstate61c0b9... | 63.0 | 365000.0 | 43.7145 | 7.25572 |
| 6 | https://ns.inria.fr/sure/data/RealEstate61c1fe... | 55.0 | 235000.0 | 43.7089 | 7.29258 |
| 7 | https://ns.inria.fr/sure/data/RealEstate61c1fe... | 69.0 | 535500.0 | 43.6968 | 7.25533 |
| 8 | https://ns.inria.fr/sure/data/RealEstate61c1fe... | 63.0 | 195000.0 | 43.7219 | 7.28563 |
| 9 | https://ns.inria.fr/sure/data/RealEstate61c1fe... | 78.0 | 309000.0 | 43.7096 | 7.26467 |
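Every query in this notebook repeats the same block of `PREFIX` declarations. As a sketch of one way to avoid the repetition, the hypothetical helper `with_prefixes` below prepends a shared prefix header to a query body. The dictionary lists only a subset of the prefixes used here, and the short query is an illustrative example rather than one of the notebook's scenarios.

```python
# Shared namespace prefixes (a subset of those declared in the queries).
PREFIXES = {
    "sure": "https://ns.inria.fr/sure#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def with_prefixes(body, prefixes=PREFIXES):
    """Return the query body preceded by one PREFIX line per namespace."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return header + "\n" + body.strip() + "\n"


query = with_prefixes('''
SELECT ?realEstate ?price WHERE {
    ?realEstate rdf:type sure:RealEstate;
    sure:hasPrice ?price.
} LIMIT 10
''')
```

The returned string can be passed to `sparql_to_dataframe(endpoint, query)` like any hand-written query.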


In [8]:
my_map = folium.Map(location=[43.58, 7.123055555],
                    zoom_start = 10)

for i in range(0,len(df)):
    long = df.long.iloc[i]
    lat = df.lat.iloc[i] 
    folium.Marker([lat, long],
                  icon=folium.Icon(color='orange'),popup=str(df.price.iloc[i])).add_to(my_map)
    
my_map
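The map above is centered on a fixed location, so the markers can end up off to one side. A minimal sketch of an alternative is to center the map on the mean of the returned coordinates. `mean_center` is a hypothetical helper; it accepts strings because `sparql_to_dataframe` returns every value as text, and the two sample points are taken from the result table above.

```python
def mean_center(lats, longs):
    """Return [mean latitude, mean longitude] for two equally long sequences."""
    lat_values = [float(v) for v in lats]
    long_values = [float(v) for v in longs]
    if not lat_values or len(lat_values) != len(long_values):
        raise ValueError("lats and longs must be non-empty and of equal length")
    return [
        sum(lat_values) / len(lat_values),
        sum(long_values) / len(long_values),
    ]


# Two coordinates from the result table, as strings.
center = mean_center(["43.7009", "43.7006"], ["7.29581", "7.26462"])
```

With the DataFrame in hand, this would be used as `folium.Map(location=mean_center(df.lat, df.long), zoom_start=13)`; the zoom level is a guess to adjust by eye.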

In [14]:
## Retrieve the first 10 apartments in Nice located in places labelled "cimiez" and priced at 400000 euros or less

query = '''
    PREFIX sure: <https://ns.inria.fr/sure#>
    PREFIX suredt: <https://ns.inria.fr/sure/data/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX schema: <http://schema.org/>
    PREFIX oa: <http://www.w3.org/ns/oa#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dctypes: <http://purl.org/dc/dcmitype/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

   SELECT ?realEstate ?price ?lat ?long ?place WHERE {
       ?realEstate rdf:type sure:RealEstate, schema:Apartment;
       sure:locatedIn ?place, ?city;
       sure:hasPrice ?price;
       geo:lat ?lat;
       geo:long ?long.
       
       ?city rdf:type sure:City;
       rdfs:label "nice".
       
       ?place rdfs:label "cimiez".
       
       FILTER(?price <= 400000)
       
    } LIMIT 10
'''
df = sparql_to_dataframe(endpoint, query)
df

|   | realEstate | price | lat | long | place |
|---|---|---|---|---|---|
| 0 | https://ns.inria.fr/sure/data/RealEstate61c1fe... | 220000.0 | 43.7184 | 7.27603 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 1 | https://ns.inria.fr/sure/data/RealEstate61c1ff... | 242000.0 | 43.7119 | 7.27123 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 2 | https://ns.inria.fr/sure/data/RealEstate61c1ff... | 180000.0 | 43.7229 | 7.28396 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 3 | https://ns.inria.fr/sure/data/RealEstate61c1ff... | 395960.0 | 43.7177 | 7.27917 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 4 | https://ns.inria.fr/sure/data/RealEstate61c200... | 390000.0 | 43.718 | 7.27558 | https://ns.inria.fr/sure/data/Arenes_Cimiez_06088 |
| 5 | https://ns.inria.fr/sure/data/RealEstate61c201... | 239000.0 | 43.7175 | 7.27194 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 6 | https://ns.inria.fr/sure/data/RealEstate61c203... | 350000.0 | 43.7153 | 7.27056 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 7 | https://ns.inria.fr/sure/data/RealEstate61c203... | 69800.0 | 43.7057 | 7.27088 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 8 | https://ns.inria.fr/sure/data/RealEstate61c203... | 379000.0 | 43.7231 | 7.27674 | https://ns.inria.fr/sure/data/Cimiez_06088 |
| 9 | https://ns.inria.fr/sure/data/RealEstate61c204... | 130900.0 | 43.7077 | 7.27599 | https://ns.inria.fr/sure/data/Cimiez_06088 |
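The place label and the price ceiling are hard-coded in the query above. A sketch of how those two lines could be generated from Python values follows. The helpers `escape_literal` and `place_filter` are hypothetical, and lowercasing the label is an assumption based on the labels seen in this notebook ("nice", "cimiez"). The escaping covers only backslashes and double quotes, so it is not a complete defence against malformed input.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def place_filter(place_label, max_price):
    """Build the two pattern lines that restrict the place label and the price."""
    label = escape_literal(place_label.lower())  # assumption: labels are stored in lowercase
    price = float(max_price)  # raises ValueError on non-numeric input
    return (
        f'?place rdfs:label "{label}".\n'
        f'FILTER(?price <= {price})'
    )


pattern = place_filter("Cimiez", 400000)
```

The returned text would replace the `?place rdfs:label ...` and `FILTER(...)` lines inside the `WHERE` block.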


In [15]:
my_map = folium.Map(location=[43.58, 7.123055555],
                    zoom_start = 10)

for i in range(0,len(df)):
    long = df.long.iloc[i]
    lat = df.lat.iloc[i] 
    folium.Marker([lat, long],
                  icon=folium.Icon(color='orange'),popup=str(df.price.iloc[i])).add_to(my_map)
    
my_map

## Scenario 2: Helping real estate professionals analyse the real estate market

In [16]:
## Retrieve the mean price per square meter (in euros) for each neighborhood ("quartier") in Nice, France

query = '''
    PREFIX sure: <https://ns.inria.fr/sure#>
    PREFIX suredt: <https://ns.inria.fr/sure/data/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX schema: <http://schema.org/>
    PREFIX oa: <http://www.w3.org/ns/oa#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dctypes: <http://purl.org/dc/dcmitype/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
   
   SELECT ?place (ROUND(AVG(?priceSize)) AS ?priceSqm) WHERE {
       ?realEstate rdf:type sure:RealEstate;
       sure:hasPrice ?price;
       schema:floorSize ?floorSize;
       sure:locatedIn ?city, ?place.
    
       ?city rdf:type sure:City;
       rdfs:label "nice".
       
       ?place rdf:type sure:AbsolutePlace, sure:Quartier;
       rdfs:label ?label.
       
       BIND(if(?floorSize >"0"^^xsd:integer, ?price/?floorSize, "0"^^xsd:integer) AS ?priceSize )
       FILTER(?priceSize > "0"^^xsd:integer)
    }
    GROUP BY ?place
    ORDER BY DESC(?priceSqm)
    
'''
sparql_to_dataframe(endpoint, query)

|   | place | priceSqm |
|---|---|---|
| 0 | https://ns.inria.fr/sure/data/Quartier_Carre_O... | 8327 |
| 1 | https://ns.inria.fr/sure/data/Quartier_Mont_Bo... | 7983 |
| 2 | https://ns.inria.fr/sure/data/Quartier_Antiqua... | 7747 |
| 3 | https://ns.inria.fr/sure/data/Quartier_Port_06088 | 7444 |
| 4 | https://ns.inria.fr/sure/data/Quartier_Cimiez_... | 7278 |
| 5 | https://ns.inria.fr/sure/data/Quartier_Vieux_N... | 6907 |
| 6 | https://ns.inria.fr/sure/data/Quartier_Garibal... | 6854 |
| 7 | https://ns.inria.fr/sure/data/Quartier_Nice_06088 | 6571 |
| 8 | https://ns.inria.fr/sure/data/Quartier_Gairaut... | 6453 |
| 9 | https://ns.inria.fr/sure/data/Quartier_Republi... | 6320 |
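The query computes a per-listing price per square meter, drops listings whose floor size or ratio is not positive, then averages per neighborhood. To make that logic easier to follow, here is a rough client-side equivalent in plain Python. `mean_price_per_sqm` is a hypothetical helper and the listings are toy values, not data from SURE-KG.

```python
from collections import defaultdict


def mean_price_per_sqm(listings):
    """Average price per square meter by place, sorted from most to least expensive.

    listings: iterable of (place, price, floor_size) tuples.
    """
    per_place = defaultdict(list)
    for place, price, floor_size in listings:
        # Same guard as the BIND(IF(...)) / FILTER pair: skip zero floor sizes
        # and non-positive ratios instead of dividing by zero.
        if floor_size > 0 and price / floor_size > 0:
            per_place[place].append(price / floor_size)
    averages = {place: round(sum(v) / len(v)) for place, v in per_place.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy listings for two made-up places.
toy_listings = [
    ("A", 300000, 60),
    ("A", 200000, 50),
    ("B", 100000, 0),   # ignored: floor size is zero
    ("B", 90000, 30),
]
ranking = mean_price_per_sqm(toy_listings)
```

Doing the aggregation in SPARQL, as the notebook does, avoids transferring every listing to the client; the Python version is only meant to spell out the steps.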


## Scenario 3: Helping geographers to analyse the territory.

In [36]:
## Retrieve the neighborhoods ("quartier") in Nice with the attribute "recherché" (i.e., "sought-after"), together with their alpha-cut geometry for alpha=0.8

query = '''
    PREFIX sure: <https://ns.inria.fr/sure#>
    PREFIX suredt: <https://ns.inria.fr/sure/data/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX schema: <http://schema.org/>
    PREFIX oa: <http://www.w3.org/ns/oa#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dctypes: <http://purl.org/dc/dcmitype/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
   
   SELECT DISTINCT ?place ?label ?attribute ?geometry WHERE {
       ?realEstate rdf:type sure:RealEstate;
       sure:locatedIn ?city, ?place.
    
       ?city rdf:type sure:City;
       rdfs:label "nice".
       
       ?place rdf:type sure:AbsolutePlace, sure:Quartier;
       geosparql:hasGeometry ?geom;
       rdfs:label ?label;
       sure:hasAttribute ?attribute.
       
       ?geom geosparql:asWKT ?geometry;
       sure:hasAlpha "0.8"^^xsd:double.
       
       FILTER(contains(?attribute,"recherche"))
       
    }
    LIMIT 100
'''
df = sparql_to_dataframe(endpoint, query)
df

|   | place | label | attribute | geometry |
|---|---|---|---|---|
| 0 | https://ns.inria.fr/sure/data/Quartier_Nice_06088 | nice | recherche | MULTIPOLYGON(((7.2514845414088 43.694545924328... |
| 1 | https://ns.inria.fr/sure/data/Quartier_Riquier... | riquier | recherche | MULTIPOLYGON(((7.2873643375538 43.701166082153... |
| 2 | https://ns.inria.fr/sure/data/Quartier_Sainte_... | sainte marguerite | recherche | MULTIPOLYGON(((7.2054448997375 43.679834547937... |
| 3 | https://ns.inria.fr/sure/data/Quartier_Mont_Bo... | mont boron | recherche | MULTIPOLYGON(((7.2916621245105 43.693973210952... |
| 4 | https://ns.inria.fr/sure/data/Quartier_Ventabr... | ventabrun | recherche | MULTIPOLYGON(((7.2243462643289 43.704184421067... |
| 5 | https://ns.inria.fr/sure/data/Quartier_Lantern... | lanterne | recherche | MULTIPOLYGON(((7.215009677736 43.682611778068,... |
| 6 | https://ns.inria.fr/sure/data/Quartier_Musicie... | musiciens | recherche | MULTIPOLYGON(((7.2582987260174 43.699939891296... |
| 7 | https://ns.inria.fr/sure/data/Quartier_Gairaut... | gairaut | recherche | MULTIPOLYGON(((7.2583794295924 43.725852859752... |
| 8 | https://ns.inria.fr/sure/data/Quartier_Fleurs_... | fleurs | recherche | MULTIPOLYGON(((7.2525676553678 43.69638494005,... |
| 9 | https://ns.inria.fr/sure/data/Quartier_Eveche_... | eveche | recherche | MULTIPOLYGON(((7.2546152913907 43.712889049113... |


In [48]:
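# Parse the WKT strings into shapely geometries before building the GeoDataFrame
geometry = gpd.GeoSeries.from_wkt(df['geometry'])
gdf = gpd.GeoDataFrame(df.drop(columns='geometry'), geometry=geometry, crs='EPSG:4326')
my_map = folium.Map(location=[43.58, 7.123055555],
                    zoom_start=10)

def red_style(feature):
    # Semi-transparent red fill and outline for every polygon
    return {'fillColor': 'red', 'color': 'red', 'fillOpacity': 0.5, 'opacity': 0.3}

# Add each neighborhood polygon to the map, with its label as a popup
for i in range(len(gdf)):
    geojson = folium.GeoJson(gdf.geometry.iloc[i], style_function=red_style)
    popup = folium.Popup(str(gdf.label.iloc[i]))
    popup.add_to(geojson)
    geojson.add_to(my_map)
    
my_map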
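WKT lists coordinates as `longitude latitude`, whereas Folium expects `[latitude, longitude]`, which is easy to mix up when fitting the map to a polygon. The sketch below extracts a bounding box from a raw WKT string with a regular expression and returns it in Folium's order. `wkt_bounds` is a hypothetical helper that assumes plain 2D decimal coordinates (no Z values, no scientific notation), and the polygon is a toy example.

```python
import re

# One "x y" coordinate pair: two decimal numbers separated by whitespace.
COORD = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def wkt_bounds(wkt):
    """Return [[south, west], [north, east]] for a 2D WKT geometry string."""
    pairs = [(float(x), float(y)) for x, y in COORD.findall(wkt)]
    if not pairs:
        raise ValueError("no coordinates found in WKT string")
    longs = [x for x, _ in pairs]  # WKT puts longitude first
    lats = [y for _, y in pairs]
    return [[min(lats), min(longs)], [max(lats), max(longs)]]


# Toy polygon near Nice, not taken from the dataset.
toy_wkt = "MULTIPOLYGON(((7.25 43.69, 7.26 43.69, 7.26 43.70, 7.25 43.69)))"
bounds = wkt_bounds(toy_wkt)
```

The result can be handed to `my_map.fit_bounds(bounds)`. It must be computed from the raw WKT string returned by the query, not from an already parsed geometry object.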



## Scenario 4: Helping NLP researchers to test their model.

In [20]:
## Retrieve the annotations with a confidence score greater than or equal to 0.8, together with their source text

query = '''
    PREFIX sure: <https://ns.inria.fr/sure#>
    PREFIX suredt: <https://ns.inria.fr/sure/data/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX schema: <http://schema.org/>
    PREFIX oa: <http://www.w3.org/ns/oa#>
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX dctypes: <http://purl.org/dc/dcmitype/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
   
   SELECT ?annot ?typeAnnotation ?confidence ?start ?end ?text WHERE {
    
       ?annot rdf:type oa:Annotation;
       sure:confidence ?confidence;
       oa:hasBody ?typeAnnotation;
       oa:hasTarget ?source;
       oa:hasTarget ?selector.
       
       ?source oa:hasSource ?textURI.
       
       ?selector oa:start ?start;
       oa:end ?end.
        
       ?textURI dc:description ?text.
       
       FILTER(?confidence >= 0.8)
       
    }
'''
sparql_to_dataframe(endpoint, query)

|   | annot | typeAnnotation | confidence | start | end | text |
|---|---|---|---|---|---|---|
| 0 | https://ns.inria.fr/sure/data/Entity475470 | https://ns.inria.fr/sure#Toponym | 0.99 | 59.0 | 61.0 | Rare , en proximité immédiate de Monaco , vill... |
| 1 | https://ns.inria.fr/sure/data/Entity502583 | https://ns.inria.fr/sure#SpatialRelation | 0.99 | 95.0 | 95.0 | Proche Arnaud TZANCK , particulier vend appart... |
| 2 | https://ns.inria.fr/sure/data/Entity493171 | https://ns.inria.fr/sure#Toponym | 0.92 | 186.0 | 186.0 | Appartement d exception à St Jean Cap Ferrat ,... |
| 3 | https://ns.inria.fr/sure/data/Entity501174 | https://ns.inria.fr/sure#SpatialRelation | 1.0 | 7.0 | 7.0 | Appartement idéalement placé , quartier tzanck... |
| 4 | https://ns.inria.fr/sure/data/Entity485933 | https://ns.inria.fr/sure#SpatialRelation | 0.95 | 90.0 | 90.0 | à le coeur de La Roquette sur Siagne , et à pr... |
| ... | ... | ... | ... | ... | ... | ... |
| 9995 | https://ns.inria.fr/sure/data/Entity77806 | https://ns.inria.fr/sure#Transport | 0.99 | 15.0 | 15.0 | Dans résidence neuve , à proximité directe des... |
| 9996 | https://ns.inria.fr/sure/data/Entity77910 | https://ns.inria.fr/sure#Toponym | 0.99 | 5.0 | 8.0 | Appartement 3 pièces 84 m² SAINT LAURENT DU VA... |
| 9997 | https://ns.inria.fr/sure/data/Entity77940 | https://ns.inria.fr/sure#Feature | 0.99 | 105.0 | 105.0 | Dans une résidence sécurisée avec gardien , 2 ... |
| 9998 | https://ns.inria.fr/sure/data/Entity77966 | https://ns.inria.fr/sure#Toponym | 0.99 | 0.0 | 1.0 | Val Fleuri Dans une résidence sécurisée , très... |
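To check a model against these annotations, it helps to recover the annotated span from each text. The sketch below assumes that `start` and `end` are inclusive indices over whitespace-separated tokens. This is an assumption inferred from the rows shown above (for example, a Toponym at 0 to 1 in a text starting with "Val Fleuri"), not something the dataset documentation is quoted as stating here. `annotated_span` is a hypothetical helper.

```python
def annotated_span(text, start, end):
    """Return the tokens from start to end (inclusive), joined by spaces.

    Assumes start/end are token indices over whitespace-separated text.
    Accepts strings such as "0.0" because query results arrive as text.
    """
    tokens = text.split()
    first, last = int(float(start)), int(float(end))
    if not 0 <= first <= last < len(tokens):
        raise IndexError("annotation offsets fall outside the text")
    return " ".join(tokens[first:last + 1])


# Beginning of the last text shown in the result table.
span = annotated_span("Val Fleuri Dans une résidence sécurisée", "0.0", "1.0")
```

If the offsets turn out to be character positions instead, the slice would be `text[first:last]` on the raw string rather than on the token list.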
