Putting the 'linked' into Linked Open Data
==========

While web endpoints to RDF data stores (e.g. [DBpedia](http://dbpedia.org/sparql), [data.admin.ch](http://data.admin.ch/sparql/), or [UK transport data](http://openuplabs.tso.co.uk/sparql/gov-transport)) are very useful for seeing what sort of data is available from a given service, you generally cannot use one service's endpoint to query another service's data. This is a job for a client-side program!

We can do this in Python with the `SPARQLWrapper` library. Install it with

    pip install SPARQLWrapper
    
either at the command line or in a cell prefixed with the `!` character. We load it up like so:

In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON

Figuring out how to query RDF data always takes some legwork - you have to understand what is in the dataset before you can usefully query it. We can begin with a query we did in class, to select municipalities from the Swiss government data. We will have to tell Python the URL of the endpoint that can give us the answer. 

We will ask for everything that is a `Municipality` according to the vocabulary of `data.admin.ch`, and find the corresponding link to its DBpedia entry. The result looks something like this:

In [2]:
sparql = SPARQLWrapper("http://data.admin.ch/query/")

sparql.setQuery("""
PREFIX ch: <http://data.admin.ch/vocab/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?name ?dbpentry WHERE {
  ?sub a ch:Municipality .
  ?sub ch:municipalityShortName ?name .
  ?sub owl:sameAs ?dbpentry .
  FILTER regex(str(?dbpentry), 'dbpedia.org')
} LIMIT 10
""")

sparql.setReturnFormat(JSON)
answer = sparql.query().convert()

answer

{'head': {'link': [], 'vars': ['name', 'dbpentry']},
 'results': {'bindings': [{'dbpentry': {'type': 'uri',
     'value': 'http://dbpedia.org/resource/Riehen'},
    'name': {'type': 'literal', 'value': 'Riehen'}},
   {'dbpentry': {'type': 'uri',
     'value': 'http://dbpedia.org/resource/Stallikon'},
    'name': {'type': 'literal', 'value': 'Stallikon'}},
   {'dbpentry': {'type': 'uri',
     'value': 'http://dbpedia.org/resource/Rickenbach,_Thurgau'},
    'name': {'type': 'literal', 'value': 'Rickenbach (TG)'}},
   {'dbpentry': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Jaberg'},
    'name': {'type': 'literal', 'value': 'Jaberg'}},
   {'dbpentry': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Pigniu'},
    'name': {'type': 'literal', 'value': 'Pigniu'}},
   {'dbpentry': {'type': 'uri',
     'value': 'http://dbpedia.org/resource/Richterswil'},
    'name': {'type': 'literal', 'value': 'Richterswil'}},
   {'dbpentry': {'type': 'uri',
     'value': 'http://dbpedia.org/re

We can see, in the returned JSON, that our results are in the `bindings` key of the `results` key of the JSON answer, and each element in the bindings list corresponds, essentially, to a row of the results. Our variables `?name` and `?dbpentry` are the keys within that element, and their values each have a `type` and a `value`. So we can map municipality name to DBpedia URL like so.

In [3]:
dbp_uris = {}
for row in answer['results']['bindings']:
    mname = row['name']['value']
    uri = row['dbpentry']['value']
    dbp_uris[mname] = uri
    
dbp_uris

{'Cully': 'http://dbpedia.org/resource/Cully,_Switzerland',
 'Grosshöchstetten': 'http://dbpedia.org/resource/Grossh%C3%B6chstetten',
 'Jaberg': 'http://dbpedia.org/resource/Jaberg',
 'Oberrohrdorf': 'http://dbpedia.org/resource/Oberrohrdorf',
 'Pigniu': 'http://dbpedia.org/resource/Pigniu',
 'Richterswil': 'http://dbpedia.org/resource/Samstagern',
 'Rickenbach (TG)': 'http://dbpedia.org/resource/Rickenbach,_Thurgau',
 'Riehen': 'http://dbpedia.org/resource/Riehen',
 'Stallikon': 'http://dbpedia.org/resource/Stallikon'}
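
One thing to watch in the mapping above: the JSON answer pairs `Richterswil` with `http://dbpedia.org/resource/Richterswil`, yet the dictionary ends up holding `http://dbpedia.org/resource/Samstagern`. That suggests the endpoint returned more than one `owl:sameAs` row for the same name, and a plain dictionary keeps only the last one. A minimal sketch of one way to keep them all follows; `group_links` is a hypothetical helper name, and the `sample` answer is hand-built for illustration rather than copied from the endpoint.

```python
from collections import defaultdict


def group_links(answer):
    """Collect every DBpedia URI seen for each municipality name."""
    grouped = defaultdict(list)
    for row in answer["results"]["bindings"]:
        grouped[row["name"]["value"]].append(row["dbpentry"]["value"])
    return dict(grouped)


# Hand-built sample in the same shape as the JSON answer (illustrative only).
sample = {"results": {"bindings": [
    {"name": {"type": "literal", "value": "Richterswil"},
     "dbpentry": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Richterswil"}},
    {"name": {"type": "literal", "value": "Richterswil"},
     "dbpentry": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Samstagern"}},
    {"name": {"type": "literal", "value": "Jaberg"},
     "dbpentry": {"type": "uri",
                  "value": "http://dbpedia.org/resource/Jaberg"}},
]}}

links = group_links(sample)
for name, uris in links.items():
    print(name, uris)
```

With a list per name, you can decide later which URI to query, or simply try each one in turn.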

Now, to get the information we're after (the area of each municipality), we need to query DBpedia! Some Internet legwork (i.e. Googling "dbpedia sparql endpoint") leads us to the new endpoint that we need to use. We also need to hunt around on the DBpedia endpoint page to find out what URI corresponds to the namespace prefix `dbp`. I've done both of these things for you, so we can write our new query accordingly. Let's try it first with just one of the municipalities.

In [4]:
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?area WHERE {
  <%s> dbp:area ?area .
} 
""" % dbp_uris['Richterswil'] )
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
results

{'head': {'link': [], 'vars': ['area']},
 'results': {'bindings': [], 'distinct': False, 'ordered': True}}
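
An empty `bindings` list, as in the output above, is how the endpoint reports that nothing matched the query. A small helper can make that check explicit. This is only a sketch: `first_value` is a hypothetical name, and the `found` answer is hand-built to show the non-empty case.

```python
def first_value(answer, var):
    """Return the first bound value for `var`, or None if nothing matched."""
    bindings = answer["results"]["bindings"]
    if not bindings:
        return None
    cell = bindings[0].get(var)  # a variable can be unbound in a given row
    return cell["value"] if cell else None


# The empty answer shown above, and a hand-built non-empty one (illustrative).
empty = {"head": {"link": [], "vars": ["area"]},
         "results": {"bindings": [], "distinct": False, "ordered": True}}
found = {"head": {"link": [], "vars": ["area"]},
         "results": {"bindings": [{"area": {"type": "literal",
                                            "value": "1.3"}}]}}

print(first_value(empty, "area"))
print(first_value(found, "area"))
```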

So we see what we need to do, even though this particular URI returned no area: for each municipality, we feed its DBpedia URI into the query that we've written in order to get an answer. That calls for a `for` loop.

In [5]:
# Here is our query, with the %s where we will need to fill in the URI.
query_template = """
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?area WHERE {
  <%s> dbp:area ?area .
} 
"""

# Loop through each municipality, applying the query.
sparql.setReturnFormat(JSON)  # set once; it does not change between queries
for city, uri in dbp_uris.items():
    sparql.setQuery(query_template % uri)
    answer = sparql.query().convert()
    bindings = answer['results']['bindings']
    # See if we got an answer; if not, the bindings will have a length of zero.
    if bindings:
        print("Result for %s is %s" % (city, bindings[0]['area']['value']))
    else:
        print("Got no result for %s" % city)
        

Result for Stallikon is 12.01
Got no result for Grosshöchstetten
Got no result for Richterswil
Result for Jaberg is 1.3
Result for Cully is 2.38
Result for Oberrohrdorf is 4.3
Result for Rickenbach (TG) is 1.56
Result for Pigniu is 17.98
Result for Riehen is 10.86
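
Printing is fine for a first look, but you will usually want to keep the numbers. The sketch below collects the areas into a dictionary as floats, with `None` where DBpedia gave no answer. It is written under a few assumptions: `collect_areas` is a hypothetical helper, the `fetch` argument stands in for the `setQuery` / `query().convert()` calls so the example runs without a network connection, and the canned answers are hand-built from the values printed above.

```python
def collect_areas(uris, fetch):
    """Map each municipality name to its area as a float, or None if missing."""
    areas = {}
    for name, uri in uris.items():
        bindings = fetch(uri)["results"]["bindings"]
        try:
            areas[name] = float(bindings[0]["area"]["value"])
        except (IndexError, KeyError, ValueError):
            # No rows, no ?area binding, or a value that is not a number.
            areas[name] = None
    return areas


# Canned answers standing in for the live endpoint (illustrative only).
canned = {
    "http://dbpedia.org/resource/Jaberg": {
        "results": {"bindings": [{"area": {"type": "literal",
                                           "value": "1.3"}}]}},
    "http://dbpedia.org/resource/Samstagern": {
        "results": {"bindings": []}},
}
uris = {
    "Jaberg": "http://dbpedia.org/resource/Jaberg",
    "Richterswil": "http://dbpedia.org/resource/Samstagern",
}

areas = collect_areas(uris, canned.__getitem__)
print(areas)
```

Against the real endpoint, `fetch` would be a small function that sets the query for the given URI and returns `sparql.query().convert()`, just as the loop above does.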
