<a href="https://colab.research.google.com/github/InTaVia/backend-presentation-ljubljana-2023/blob/main/intavia_hands_on_9_23.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We begin by installing some libraries for interacting with SPARQL and REST endpoints.

In [12]:
!pip install httpx SPARQLWrapper rdflib matplotlib
import httpx
import SPARQLWrapper
import matplotlib



Let's start with a simple query against the InTaVia REST API.

In [5]:
res = httpx.get('https://intavia-backend.acdh-dev.oeaw.ac.at/v2/api/entities/search', params={'q': 'Klimt'})

Once the query has run, whatever the endpoint returned is stored in the variable `res`. We can, for example, inspect the HTTP status code of the response.

In [6]:
res.status_code

200

`200` means everything went well and the endpoint returned results. The endpoint returns `JSON` by default, so we need to convert it to native Python data structures before we can inspect it.

In [7]:
res.json()

{'count': 7,
 'page': 1,
 'pages': 1,
 'results': [{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8yODU5MA==',
   'label': {'default': 'no label provided'},
   'kind': 'person',
   'linkedIds': [{'label': 'Österreichische Biographische Lexikon, APIS',
     'url': 'https://apis.acdh.oeaw.ac.at/entity/70679'},
    {'label': 'Gemeinsame Normdatei (GND)',
     'url': 'https://d-nb.info/gnd/136070213'}],
   'gender': {'id': 'http://ldf.fi/schema/bioc/Male',
    'label': {'default': 'male'}},
   'occupations': [{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvb2NjdXBhdGlvbi8xMzk=',
     'label': {'default': 'Bildende und angewandte Kunst'}},
    {'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvb2NjdXBhdGlvbi80MzIz',
     'label': {'default': 'Bildende und angewandte Kunst >> Kunstgewerbler und Medailleur'}}],
   'alternativeLabels': [{'default': 'no label provided'},
    {'default': 'Klimt, Georg'}],
   'biographies': ['aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvdGV4dC83MDY3OS9iaW8='],
   'relatio

The response contains some metadata, such as the number of hits (`count`), the current page and the number of pages, plus a list of results. The results contain many opaque-looking identifiers. These are not hashes but Base64-encoded URIs. Let's decode one of them:

In [9]:
import base64
coded_string = 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3'
base64.b64decode(coded_string)

b'http://www.intavia.eu/provided_person/3757'
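
Since these IDs come up repeatedly, a pair of small helper functions may be handy. This is only a minimal sketch: `decode_id` and `encode_id` are hypothetical names, and it assumes that all IDs are standard Base64 encodings of UTF-8 URIs, as in the example above.

```python
import base64


def decode_id(encoded_id: str) -> str:
    """Decode a Base64-encoded InTaVia ID into the URI it wraps."""
    return base64.b64decode(encoded_id).decode("utf-8")


def encode_id(uri: str) -> str:
    """Encode a URI as Base64 (assumed to be how the API builds its IDs)."""
    return base64.b64encode(uri.encode("utf-8")).decode("ascii")


print(decode_id("aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi8zNzU3"))
```

Returning a `str` instead of `bytes` makes the decoded URI easier to print and compare.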

Now let's put what we just did into a more structured, reusable form.
Let's define a list of queries we are interested in. In our case we use the person we already saw in the presentation: "Giuseppe Acerbi".

In [10]:
queries = ["Acerbi"]

Next we define a function that takes a query parameter and runs that query against the InTaVia REST endpoint.

In [11]:
def query_intavia_rest(query_param: str, **kwargs) -> list:
  # Extra keyword arguments (e.g. kind=['group']) are passed on as query parameters.
  params = {'q': query_param, **kwargs}
  print(params)
  res = httpx.get('https://intavia-backend.acdh-dev.oeaw.ac.at/v2/api/entities/search', params=params)
  if res.status_code == 200:
    return res.json()['results']
  # Return an empty list on errors so callers can always iterate over or extend the result.
  return []
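
The function above only returns the results of a single response, while the metadata shown earlier includes `page` and `pages`. If a query spans several pages, the pages could be collected in a loop. The following is a sketch under assumptions: `collect_all_pages` is a hypothetical helper, and the page fetcher is passed in as a callable so the example runs on made-up data without touching the network.

```python
from typing import Callable


def collect_all_pages(fetch_page: Callable[[int], dict]) -> list:
    """Call fetch_page(1), fetch_page(2), ... and merge all 'results' lists."""
    results = []
    page = 1
    while True:
        data = fetch_page(page)
        results.extend(data.get("results", []))
        # Stop once the page count reported by the response is reached.
        if page >= data.get("pages", 1):
            break
        page += 1
    return results


# Stand-in for the real endpoint: two pages with made-up results.
fake_pages = {
    1: {"count": 3, "page": 1, "pages": 2, "results": ["a", "b"]},
    2: {"count": 3, "page": 2, "pages": 2, "results": ["c"]},
}
print(collect_all_pages(lambda page: fake_pages[page]))
```

Against the real API, `fetch_page` would wrap an `httpx.get` call. Which query parameter selects the page is not shown in this notebook, so that part would need to be checked against the API documentation.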

In [13]:
res = []
for query in queries:
  res.extend(query_intavia_rest(query))

{'q': 'Acerbi'}


In [14]:
res

[{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi82Njg3',
  'label': {'default': 'Acerbi, Enrico'},
  'kind': 'person',
  'linkedIds': [{'label': 'Österreichische Biographische Lexikon, APIS',
    'url': 'https://apis.acdh.oeaw.ac.at/entity/90793'},
   {'label': 'Gemeinsame Normdatei (GND)',
    'url': 'https://d-nb.info/gnd/116241470'}],
  'gender': {'id': 'http://ldf.fi/schema/bioc/Male',
   'label': {'default': 'male'}},
  'occupations': [{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvb2NjdXBhdGlvbi8xNTQ=',
    'label': {'default': 'Medizin'}},
   {'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvb2NjdXBhdGlvbi8yMDI=',
    'label': {'default': 'Medizin >> Mediziner'}}],
  'alternativeLabels': [{'default': 'Acerbi, Enrico'},
   {'default': 'no label provided'}],
  'biographies': ['aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvdGV4dC85MDc5My9iaW8='],
  'relations': [{'event': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvZGVhdGhldmVudC85MDc5Mw==',
    'role': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2lkbS1yb2xlL2

The response contains many unresolved references (events and roles given only by their IDs), which we need to resolve against other endpoints.

In [15]:
def resolve_events(entity: dict) -> dict:
  # Collect the distinct event and role IDs referenced by the entity.
  event_ids = list({rel['event'] for rel in entity['relations']})
  role_ids = list({rel['role'] for rel in entity['relations']})
  print(event_ids)
  res_events = httpx.post('https://intavia-backend.acdh-dev.oeaw.ac.at/v2/api/events/retrieve', json={"id": event_ids})
  res_roles = httpx.post('https://intavia-backend.acdh-dev.oeaw.ac.at/v2/api/vocabularies/roles/retrieve', json={"id": role_ids})
  if res_events.status_code == 200 and res_roles.status_code == 200:
    events_data = res_events.json()['results']
    roles_data = res_roles.json()['results']
    # Replace each ID in the relations with the full record (modifies entity in place).
    for idx, relation in enumerate(entity['relations']):
      for event in events_data:
        if event['id'] == relation['event']:
          # Drop the back-references; the default avoids a KeyError if
          # the same event is referenced by more than one relation.
          event.pop('relations', None)
          entity['relations'][idx]['event'] = event
      for role in roles_data:
        if role['id'] == relation['role']:
          entity['relations'][idx]['role'] = role
  return entity
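
The nested loops in `resolve_events` scan the full event and role lists once per relation. A possible alternative is to index the retrieved records by ID first. The following is a minimal sketch on made-up data: the IDs and labels are invented, and `resolve_relations` is a hypothetical helper, not part of the InTaVia API.

```python
def resolve_relations(relations: list, events: list, roles: list) -> list:
    """Return copies of relations with event and role IDs replaced by records."""
    events_by_id = {event["id"]: event for event in events}
    roles_by_id = {role["id"]: role for role in roles}
    resolved = []
    for relation in relations:
        resolved.append({
            # Fall back to the bare ID when no matching record was retrieved.
            "event": events_by_id.get(relation["event"], relation["event"]),
            "role": roles_by_id.get(relation["role"], relation["role"]),
        })
    return resolved


# Invented sample data, shaped like the API responses used above.
relations = [{"event": "ev1", "role": "r1"}, {"event": "ev2", "role": "r1"}]
events = [{"id": "ev1", "startDate": "1773-05-03"}]
roles = [{"id": "r1", "label": {"default": "Born Person"}}]

resolved = resolve_relations(relations, events, roles)
print(resolved[0]["event"]["startDate"])
print(resolved[1]["event"])
```

Unlike `resolve_events`, this variant leaves its input untouched and returns new relation dictionaries.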



In [16]:
resolved_entities = resolve_events(res[1])

['aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMg==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtNA==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtOQ==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMTI=', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMTE=', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMTA=', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtOA==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMTU=', 'aHR0cHM6Ly93d3cuaW50YXZpYS5ldS9wcm9kdWN0aW9uX2V2ZW50L1E1MTQxNzEwOQ==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMw==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMTQ=', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMTY=', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtMQ==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtNw==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtNQ==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2RlYXRoZXZlbnQvMzA3OA==', 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL2V2ZW50LzMwNzgtNg==', 'aHR0cDovL3d3dy5pbnRhdmlhLm

In [17]:
resolved_entities

{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX3BlcnNvbi85NjQ4',
 'label': {'default': 'Acerbi, Giuseppe'},
 'kind': 'person',
 'linkedIds': [{'label': 'BiographySampo', 'url': 'http://ldf.fi/nbf/p/3078'},
  {'label': 'Wikidata', 'url': 'http://www.wikidata.org/entity/Q55007624'},
  {'label': 'Österreichische Biographische Lexikon, APIS',
   'url': 'https://apis.acdh.oeaw.ac.at/entity/90796'},
  {'label': 'Gemeinsame Normdatei (GND)',
   'url': 'https://d-nb.info/gnd/119372843'}],
 'gender': {'id': 'http://ldf.fi/schema/bioc/Male',
  'label': {'default': 'male'}},
 'occupations': [{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL29jY3VwYXRpb24vMzY4MzQ=',
   'label': {'default': 'composer'}},
  {'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvb2NjdXBhdGlvbi8xMzU=',
   'label': {'default': 'Naturwissenschaft'}},
  {'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2JzL29jY3VwYXRpb24vMzU3OTAzNQ==',
   'label': {'default': 'travel writer'}},
  {'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvb2NjdXBhdGlvbi8xMjg=',
   '

# Compare data with Wikidata using SPARQL

Let's create a SPARQL query to compare the date of birth in Wikidata with the ones from InTaVia.
We start by getting the birth dates from our InTaVia data.

In [18]:
birth_dates_acerbi = []
for event in resolved_entities['relations']:
  if event['role']['label']['default'] == 'Born Person':
    print(event['event']['startDate'])
    birth_dates_acerbi.append(event['event']['startDate'])

1773-01-01
1773-05-03


The birth dates are now stored in `birth_dates_acerbi`.

In [19]:
birth_dates_acerbi

['1773-01-01', '1773-05-03']

Next we need the Wikidata ID.

In [22]:
for linked_id in resolved_entities['linkedIds']:
  if linked_id['label'] == 'Wikidata':
    print(linked_id)

{'label': 'Wikidata', 'url': 'http://www.wikidata.org/entity/Q55007624'}
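
The loop above only prints the matching entry. To reuse the URL later, the lookup could be wrapped in a small function. This is a sketch: `find_linked_url` is a hypothetical helper, and the sample entity is an excerpt of the output shown earlier.

```python
from typing import Optional


def find_linked_url(entity: dict, source_label: str) -> Optional[str]:
    """Return the URL of the first linked ID with the given label, if any."""
    for linked_id in entity.get("linkedIds", []):
        if linked_id.get("label") == source_label:
            return linked_id.get("url")
    return None


# Excerpt of the resolved entity shown above.
entity = {
    "linkedIds": [
        {"label": "BiographySampo", "url": "http://ldf.fi/nbf/p/3078"},
        {"label": "Wikidata", "url": "http://www.wikidata.org/entity/Q55007624"},
    ]
}
wikidata_url = find_linked_url(entity, "Wikidata")
print(wikidata_url)
print(wikidata_url.rsplit("/", 1)[-1])  # the bare Wikidata ID
```

Returning `None` for a missing label lets the caller decide how to handle entities that have no Wikidata link.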


Let's write the SPARQL query to get the same data from Wikidata. The Wikidata Query Service is a handy application for writing Wikidata queries. Let's go [there](https://query.wikidata.org/).

In [20]:
sparql = """
select * where {
BIND(<http://www.wikidata.org/entity/Q55007624> AS ?acerbi)
?acerbi wdt:P31 wd:Q5 ;
        wdt:P569 ?date_of_birth
}
"""
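
The entity URI is hard-coded in the query above. To reuse the query for other persons, the URI could be filled in programmatically. A minimal sketch follows: `build_birth_date_query` is a hypothetical helper, and the explicit `PREFIX` lines are only added so that the query does not rely on the prefixes the Wikidata Query Service predefines.

```python
import re

# Only plain Wikidata entity URIs (as found in 'linkedIds') are accepted.
WIKIDATA_ENTITY = re.compile(r"^http://www\.wikidata\.org/entity/Q\d+$")


def build_birth_date_query(entity_uri: str) -> str:
    """Build the birth-date query for one Wikidata entity URI."""
    # Validate the URI before interpolating it into the query string.
    if not WIKIDATA_ENTITY.match(entity_uri):
        raise ValueError(f"not a Wikidata entity URI: {entity_uri!r}")
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
select * where {{
BIND(<{entity_uri}> AS ?person)
?person wdt:P31 wd:Q5 ;
        wdt:P569 ?date_of_birth
}}
"""


print(build_birth_date_query("http://www.wikidata.org/entity/Q55007624"))
```

Note that this sketch binds the entity to `?person` instead of `?acerbi`, so the key in the result bindings changes accordingly.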

In [21]:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql_wikidata = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql_wikidata.setQuery(sparql)
sparql_wikidata.setReturnFormat(JSON)
results = sparql_wikidata.query().convert()

In [None]:
results

{'head': {'vars': ['acerbi', 'date_of_birth']},
 'results': {'bindings': [{'acerbi': {'type': 'uri',
     'value': 'http://www.wikidata.org/entity/Q55007624'},
    'date_of_birth': {'datatype': 'http://www.w3.org/2001/XMLSchema#dateTime',
     'type': 'literal',
     'value': '1773-05-03T00:00:00Z'}}]}}
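
With both sources at hand, the actual comparison can be done in a few lines. The sketch below copies the values from the outputs above. `wikidata_dates` is a hypothetical helper, and it assumes the Wikidata values are `xsd:dateTime` strings with four-digit years, so that the first ten characters form the `YYYY-MM-DD` date.

```python
def wikidata_dates(results: dict, var: str = "date_of_birth") -> list:
    """Extract the YYYY-MM-DD part of a dateTime variable from SPARQL JSON results."""
    bindings = results["results"]["bindings"]
    return [b[var]["value"][:10] for b in bindings if var in b]


# Values copied from the outputs above.
birth_dates_acerbi = ["1773-01-01", "1773-05-03"]
results = {
    "head": {"vars": ["acerbi", "date_of_birth"]},
    "results": {"bindings": [{
        "acerbi": {"type": "uri",
                   "value": "http://www.wikidata.org/entity/Q55007624"},
        "date_of_birth": {"datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
                          "type": "literal",
                          "value": "1773-05-03T00:00:00Z"},
    }]},
}

dates_in_wikidata = wikidata_dates(results)
matching = [d for d in birth_dates_acerbi if d in dates_in_wikidata]
only_in_intavia = [d for d in birth_dates_acerbi if d not in dates_in_wikidata]
print(matching)
print(only_in_intavia)
```

One of the two InTaVia birth dates agrees with Wikidata, while `1773-01-01` appears only in InTaVia.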

# Create a simple visualization in Python using the API

In [None]:
institutions = query_intavia_rest('Künstlerhaus', kind=['group'])

{'q': 'Künstlerhaus', 'kind': ['group']}


In [None]:
kuenstlerhaus = institutions[9]

In [None]:
kuenstlerhaus

{'id': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L3Byb3ZpZGVkX2dyb3VwLzI0NDc=',
 'label': {'default': 'Genossenschaft der Bildenden Künstler Wiens (Künstlerhaus)'},
 'kind': 'group',
 'linkedIds': [{'label': 'Österreichische Biographische Lexikon, APIS',
   'url': 'https://apis.acdh.oeaw.ac.at/entity/98141'},
  {'label': 'Gemeinsame Normdatei (GND)',
   'url': 'https://d-nb.info/gnd/3009578-5'}],
 'alternativeLabels': [{'default': 'Genossenschaft der Bildenden Künstler Wiens (Künstlerhaus)'}],
 'relations': [{'event': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvY2FyZWVyLzExODgyNA==',
   'role': 'aHR0cDovL2xkZi5maS9zY2hlbWEvYmlvYy9Hcm91cF9SZWxhdGlvbnNoaXBfUm9sZQ=='},
  {'event': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvY2FyZWVyLzEwOTY4MA==',
   'role': 'aHR0cDovL2xkZi5maS9zY2hlbWEvYmlvYy9Hcm91cF9SZWxhdGlvbnNoaXBfUm9sZQ=='},
  {'event': 'aHR0cDovL3d3dy5pbnRhdmlhLmV1L2FwaXMvY2FyZWVyLzExNjI4MQ==',
   'role': 'aHR0cDovL2xkZi5maS9zY2hlbWEvYmlvYy9Hcm91cF9SZWxhdGlvbnNoaXBfUm9sZQ=='},
  {'event': 'aHR0cDovL3d3dy5pbnRh