Exercise 10
# Question Answering on Knowledge Graphs

### Task 3: Information Extraction-based Approaches


Task 3.2: Write a simple QA system that uses the following strategy:
1. Load the knowledge graph and the ontology from the provided RDF files "exercise10.ttl" and "exercise10_ontology.ttl".
2. Use the `rdfs:label` relations in the ontology to detect properties mentioned in the question.
3. Use the spaCy library to detect named entities in the question.
4. Create subgraphs of the knowledge graph with an increasing number of nodes.
5. Take the subgraphs with the fewest nodes that contain the named entities and properties detected in the question.
6. From these subgraphs, print the nodes that are not named entities detected in the question.
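
Steps 4 to 6 can be tried out on a toy graph before touching the real data. The following is a minimal sketch in plain Python (no RDFLib or networkx), assuming a hypothetical list of triples and hypothetical helper names `is_connected` and `smallest_matches`; it is not the reference solution for the exercise.

```python
from itertools import combinations

# Hypothetical toy graph as (subject, property, object) triples; not the exercise data.
TRIPLES = [
    ("Alice", "birthPlace", "Springfield"),
    ("Bob", "deathPlace", "Springfield"),
    ("Carol", "deathPlace", "Shelbyville"),
    ("Bob", "partner", "Carol"),
]

def is_connected(nodes, edges):
    """Return True if `nodes` form a single component when `edges` are read as undirected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for s, _, o in edges:
            for a, b in ((s, o), (o, s)):
                if a == current and b in nodes and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen == nodes

def smallest_matches(triples, entities, properties):
    """Steps 4-6: answer-node sets taken from the smallest matching connected subgraphs."""
    all_nodes = sorted({n for s, _, o in triples for n in (s, o)})
    for size in range(2, len(all_nodes) + 1):
        answers = []
        for selected in combinations(all_nodes, size):
            chosen = set(selected)
            # Induced subgraph: keep only triples whose two ends were both selected.
            induced = [t for t in triples if t[0] in chosen and t[2] in chosen]
            if not is_connected(chosen, induced):
                continue
            if entities <= chosen and properties <= {p for _, p, _ in induced}:
                answers.append(chosen - entities)
        if answers:
            return answers  # stop at the smallest size that yields a result
    return []

result = smallest_matches(TRIPLES, {"Springfield"}, {"deathPlace"})
print(result)
```

Note that for a question with two properties the answer set also contains intermediate nodes of the matching subgraph, because step 6 only removes the detected entities.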

Use the following code for the first step.

In [3]:
# Required Libraries
import itertools
import networkx as nx
import spacy
from rdflib import Graph, RDF, Namespace, RDFS, URIRef
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph

# Only run the following line once:
spacy.cli.download("en_core_web_sm")
nlp = spacy.load("en_core_web_sm")

DBO = Namespace("https://dbpedia.org/ontology/")
DBR = Namespace("https://dbpedia.org/resource/")

# Load the graph from exercise10.ttl into an RDFLib graph
g = Graph()
g.parse("data/exercise10.ttl")

# Load the ontology from exercise10_ontology.ttl into an RDFLib graph
ontology = Graph()
ontology.parse("data/exercise10_ontology.ttl")

Collecting en-core-web-sm==3.8.0
  Using cached https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


<Graph identifier=N191941ede2b34c75827c204648c3aaa0 (<class 'rdflib.graph.Graph'>)>

Create the `get_answer` function by implementing the remaining five steps described above.

In [12]:
for property,_,_ in ontology.triples((None, RDF.type, RDF.Property)):
    for _,_,label in ontology.triples((property, RDFS.label, None)):
        print(property, label)

https://dbpedia.org/ontology/populationTotal population
https://dbpedia.org/ontology/capital capital
https://dbpedia.org/ontology/birthPlace birth place
https://dbpedia.org/ontology/birthPlace born in
https://dbpedia.org/ontology/deathPlace death place
https://dbpedia.org/ontology/deathPlace died in
https://dbpedia.org/ontology/partner partner
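
The label listing above is all that step 2 needs: a property counts as mentioned when one of its labels occurs in the question. The sketch below illustrates this with a plain dictionary instead of the RDFLib ontology graph. The name `detect_properties`, the shortened property names, and the case-insensitive comparison are assumptions made for illustration only.

```python
# Labels copied from the ontology listing; property names are shortened for readability.
LABELS = {
    "population": "populationTotal",
    "capital": "capital",
    "birth place": "birthPlace",
    "born in": "birthPlace",
    "death place": "deathPlace",
    "died in": "deathPlace",
    "partner": "partner",
}

def detect_properties(question, labels=LABELS):
    """Return the set of properties whose label occurs in the question (case-insensitive)."""
    lowered = question.lower()
    return {prop for label, prop in labels.items() if label.lower() in lowered}

print(detect_properties("Who was born in the capital of Germany?"))
```

Substring matching is simple but coarse: a label also matches inside longer words, so a more careful system might tokenize the question first.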


In [22]:
def get_answer(question):
    print("\nQuestion:", question)

    # 2. Find properties mentioned in the question via their labels in the ontology
    question_properties = set()
    for prop, _, _ in ontology.triples((None, RDF.type, RDF.Property)):
        for _, _, label in ontology.triples((prop, RDFS.label, None)):
            # Plain substring match: a label such as "died in" must appear verbatim in the question
            if str(label) in question:
                question_properties.add(prop)
    print(" Question properties:", [str(x) for x in question_properties])

    # 3. Detect entities mentioned in the question using spacy
    question_entities = set()
    doc = nlp(question)
    print("############### documnet entities ################")
    print(doc.ents)
    for ent in doc.ents:
        question_entities.add(URIRef(DBR[ent.text.replace(" ", "_")]))
    print(" Question entities:", [str(x) for x in question_entities])

    # For simplicity: Remove rdfs:label relations from the knowledge graph
    g.remove((None, RDFS.label, None))
    # Convert the graph to an undirected networkx graph so you can use networkx and itertools graph operations
    G = rdflib_to_networkx_multidigraph(g).to_undirected()

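    # 4. Check connected sub graphs of increasing size. If results were found, don't search for bigger sub graphs.
    #    Stop once the size exceeds the number of nodes, so an unanswerable question cannot loop forever.
    results_found = False
    sub_graph_size = 2
    while not results_found and sub_graph_size <= G.number_of_nodes():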
        for sub_graph in (G.subgraph(selected_nodes) for selected_nodes in
                          itertools.combinations(G, sub_graph_size)):
            if nx.is_connected(sub_graph):
                # 5A. Check if all question entities are contained in the sub graph
                if all(elem in sub_graph.nodes for elem in question_entities):
                    # 5B. Check if all question properties are contained in the sub graph
                    if all(elem in [x[2] for x in sub_graph.edges] for elem in question_properties):
                        print(" Sub graph: ",
                              ([str(x[0]) + " " + str(x[2]) + " " + str(x[1]) for x in sub_graph.edges]))
                        # 6. Print those nodes which are not the named entities detected in the question
                        print("-> Answer:", [str(x) for x in set(sub_graph.nodes()).difference(question_entities)])
                        results_found = True
        sub_graph_size += 1
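
Because `get_answer` enumerates every node combination with `itertools.combinations`, the number of candidate subgraphs grows quickly with the size of the knowledge graph. The following sketch counts the candidates for a few graph sizes; the helper name `candidate_count` and the chosen sizes are hypothetical and only meant to illustrate the growth.

```python
from math import comb

def candidate_count(num_nodes, max_size):
    """Number of node sets of size 2..max_size that a brute-force search would test."""
    return sum(comb(num_nodes, size) for size in range(2, max_size + 1))

# Candidate sets to test when the answer needs subgraphs of up to three nodes.
for n in (10, 50, 100):
    print(n, candidate_count(n, 3))
# 10 165
# 50 20825
# 100 166650
```

This is acceptable for the small exercise graph, but a larger graph would likely need a search that starts from the detected entities instead of testing all combinations.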

Test your system with the following questions.

In [23]:
get_answer('Who died in Berlin?')
get_answer('Who died in Paris?')
get_answer('Who was born in the capital of Germany?')
get_answer('Who was a partner of Marlene Dietrich?')
get_answer('Who was a partner of someone who died in the capital of France?')


Question: Who died in Berlin?
 Question properties: ['https://dbpedia.org/ontology/deathPlace']
############### documnet entities ################
(Berlin,)
 Question entities: ['https://dbpedia.org/resource/Berlin']
 Sub graph:  ['https://dbpedia.org/resource/Berlin https://dbpedia.org/ontology/deathPlace https://dbpedia.org/resource/Rosa_Luxemburg']
-> Answer: ['https://dbpedia.org/resource/Rosa_Luxemburg']

Question: Who died in Paris?
 Question properties: ['https://dbpedia.org/ontology/deathPlace']
############### documnet entities ################
(Paris,)
 Question entities: ['https://dbpedia.org/resource/Paris']
 Sub graph:  ['https://dbpedia.org/resource/Paris https://dbpedia.org/ontology/deathPlace https://dbpedia.org/resource/Marlene_Dietrich']
-> Answer: ['https://dbpedia.org/resource/Marlene_Dietrich']
 Sub graph:  ['https://dbpedia.org/resource/Paris https://dbpedia.org/ontology/deathPlace https://dbpedia.org/resource/Oscar_Wilde']
-> Answer: ['https://dbpedia.org/resour