<div><h1 style="display:inline !important;">Question Answering</h1><br/><h3 style="color: #6e6e6e; display:inline  !important;">Relation Identification and Linking</h3></div>

## Dependencies

<ul>
<li>Currently uses the Stanford dependency parser provided by nltk.</li>
<li>loadW2V is a module that loads a word2vec model stored on disk. (The models are provided by gensim.)</li>
<li>OxfordDictionary calls the Oxford Dictionary API to retrieve synonyms for certain words.</li>
</ul>

In [14]:
from nltk.parse.stanford import StanfordDependencyParser
from SPARQLWrapper import SPARQLWrapper, JSON
from urlparse import urlparse
import loadW2V
import numpy as np
import OxfordDictionary as od
import requests
import urllib
import ast
import json
import sys

To see the details of answer extraction, set the following to True:

In [15]:
log_details = False

This list stores the details of each answer extraction so that they can be dumped as JSON.

In [16]:
bigJson = []
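
As a rough illustration of what one entry in this list looks like, the sketch below builds a single record by hand. The keys mirror the ones written by `answer` and `get_closest_keyword` further down; the values are illustrative placeholders, and the names `sample_entry` and `big_json_example` are hypothetical.

```python
import json

# Hypothetical entry: the keys follow those set in answer() and
# get_closest_keyword(); the values are placeholders, not real output.
sample_entry = {
    "question": "What is the capital of Germany?",
    "model": [],  # dependency triples, omitted in this sketch
    "stack": ["capital", "Germany"],
    "primaryURL": {
        "src": "agdistis",
        "value": "http://dbpedia.org/resource/Germany",
    },
    "stackDetails": [
        {"word": "capital", "properties": [], "withSynonyms": False},
    ],
    "answer": ["http://dbpedia.org/resource/Berlin"],
}

big_json_example = [sample_entry]

# Round-trip through JSON to confirm the structure is serializable.
serialized = json.dumps(big_json_example)
restored = json.loads(serialized)
```

Keeping every value JSON-serializable is what allows the whole list to be dumped in one call at the end of the notebook.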

# Functions

### get_Label(URI)

The function <b>"get_Label"</b> returns the label of the provided URI.
<br/><br/>
For example: <br/>
If the provided URI is http://dbpedia.org/ontology/birthDate, it returns the string <b>birth date</b>.

In [18]:
def get_Label(URI):
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    query = """
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT ?label
            WHERE { <""" + URI + """>
            rdfs:label ?label 
            FILTER (lang(?label) = 'en') }            
        """
    
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    
    if(len(results['results']['bindings']) > 0):
        return results['results']['bindings'][0]['label']['value']
    else:
        return False

In [19]:
get_Label('http://dbpedia.org/ontology/birthDate')

u'birth date'
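
The label is read from the first binding of the SPARQL JSON result. The sketch below isolates that step so it can be tried without network access. The helper name `first_label` and the sample dictionaries are hypothetical and written by hand in the standard SPARQL JSON results layout. Unlike `get_Label`, this variant returns `None` rather than `False` when nothing is found.

```python
def first_label(results):
    """Return the first label in a SPARQL JSON result, or None if there is none."""
    bindings = results.get("results", {}).get("bindings", [])
    if bindings:
        return bindings[0]["label"]["value"]
    return None


# Hand-written samples in the SPARQL JSON results layout; not fetched from DBpedia.
sample = {
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "xml:lang": "en", "value": "birth date"}}
    ]},
}
empty = {"head": {"vars": ["label"]}, "results": {"bindings": []}}

label = first_label(sample)
missing = first_label(empty)
```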

### get_properties(URI)

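The function <b>"get_properties"</b> returns the properties of the provided URI that belong to the DBpedia ontology or property namespaces, excluding the abstract. Each property is returned with its name, namespace type, full URI, and label.
<br/><br/>
For example: <br/>
If the provided URI is http://dbpedia.org/resource/John_F._Kennedy, it returns the list of such properties for this resource.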

In [20]:
def get_properties(URI):
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    query = """
        select distinct ?prop where {
             <"""+ URI +""">
             ?prop ?ent }
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    allowed_types = ["property", "ontology"]
    properties = []
    
    for result in results["results"]["bindings"]:
        arr = result["prop"]["value"].split('/')
        # properties.append(arr[len(arr) - 1])
        if arr[len(arr) - 1] != 'abstract' and (arr[len(arr) - 2] in allowed_types):
            label = get_Label(result["prop"]["value"])
            properties.append([arr[len(arr) - 1], arr[len(arr) - 2], result["prop"]["value"], label])
    
    if log_details:
        print "\nPROPERTIES:"
        print properties
    return properties
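
The filtering step relies on the last two path segments of each property URI. A minimal offline sketch of that step follows; the helper name `split_property_uri` is hypothetical and is not part of the notebook.

```python
ALLOWED_TYPES = ("property", "ontology")


def split_property_uri(uri):
    """Return (name, namespace) for a DBpedia property URI, or None if filtered out."""
    parts = uri.split("/")
    name, namespace = parts[-1], parts[-2]
    # Same rule as get_properties: skip the abstract and anything outside
    # the ontology/property namespaces.
    if name == "abstract" or namespace not in ALLOWED_TYPES:
        return None
    return name, namespace


kept = split_property_uri("http://dbpedia.org/ontology/birthDate")
dropped_abstract = split_property_uri("http://dbpedia.org/ontology/abstract")
dropped_other = split_property_uri("http://www.w3.org/2000/01/rdf-schema#label")
```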

### get_closest_keyword(word, URI, json_unit, with_synonyms=False)

The function <b>"get_closest_keyword"</b> returns the property of the URI which is closest to the provided word.
<br/><br/>
For example: <br/>
If the provided word is <b>wife</b> for http://dbpedia.org/resource/John_F._Kennedy, it returns the property <b>spouse</b>.

In [76]:
# Change this if you host the server locally
#w2v_server_url = 'http://localhost:5000/'
w2v_server_url = 'http://131.220.155.102:5000/'
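
The function defined next sends the phrase and the candidate labels to this server as URL path segments. The sketch below reproduces only the URL construction for the `phraselabel` route so it can be inspected offline. The helper name `build_phraselabel_url` is hypothetical, and the import fallback is there only so the snippet also runs on Python 3.

```python
try:
    from urllib import quote          # Python 2, as used in this notebook
except ImportError:
    from urllib.parse import quote    # Python 3


def build_phraselabel_url(base_url, phrase, labels):
    """Join labels with a leading '+' each, then percent-encode both path segments."""
    text_labels = "".join("+" + label for label in labels)
    return base_url + "phraselabel/" + quote(phrase) + "/" + quote(text_labels)


url = build_phraselabel_url(
    "http://localhost:5000/",
    "vice president",
    ["death place", "birth place"],
)
```

Because every label is prefixed with `+`, the encoded list starts with a separator. That may be why the logged server response begins with an empty label row, although the notebook does not confirm how the server splits the list.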

In [73]:
def get_closest_keyword(word, URI, json_unit, with_synonyms=False):
    # Synonym lookup is only used for single words, so disable it for phrases
    if with_synonyms and len(word.split(" ")) > 1:
        with_synonyms = False
            
    properties = get_properties(URI)

    stack_details = {}
    stack_details["word"] = word
    stack_details["properties"] = properties
    stack_details["withSynonyms"] = with_synonyms
    
    properties = np.array(properties)
    
    if log_details:
        print "To be matched with: ", word
    
    # If word being looked for is in the list of properties
    for p in properties:
        if p[0] == word:
            json_unit["stackDetails"].append(stack_details)
            return p

    if with_synonyms:
        synonyms = od.getSynonyms(word)
        stack_details["synonyms"] = synonyms

        log = []
        if synonyms == False:
            synonyms = []

        synonyms.append((word, word))
        if log_details:
            print "\nSYNONYMS:"
            print synonyms

        for synonym in synonyms:
            for p in properties:
                if p[0] == synonym[0]:
                    json_unit["stackDetails"].append(stack_details)
                    return p


        text_properties = ''
        for p in properties:
            text_properties = text_properties + '+' + p[0]

        text_synonyms = ''
        for s in synonyms:
            text_synonyms = text_synonyms + '+' + s[0]

        url_w2v_server = w2v_server_url + urllib.quote(text_synonyms) + '/' + urllib.quote(text_properties)
        if log_details:
            print "\nURL OF WORD2VEC SERVER:"
            print url_w2v_server

        response = requests.get(url_w2v_server)
        prop_syn_avg = ast.literal_eval(response.text)

        stack_details["propertySynonymAvg"] = prop_syn_avg

        prop_syn_avg = np.array(prop_syn_avg)

        if log_details:
            print "\nRESPONSE FROM W2V SERVER:"
            print prop_syn_avg


        keyword = prop_syn_avg[np.argmax(prop_syn_avg[:,1])][0]

        stack_details["keyword"] = keyword
        json_unit["stackDetails"].append(stack_details)
        
        print "\nKEYWORD"
        print keyword
        if(keyword == None):
            return keyword
        
        for p in properties:
            if p[0] == keyword:
                if log_details:
                    print "\nKEYWORD DETAILS:"
                    print p
                return p
    else:
        phrase = word
        labels = []
        
        text_labels = ''
        for p in properties:
            text_labels = text_labels + '+' + p[3].encode('utf8')
        
        url_w2v_server = w2v_server_url + 'phraselabel/' + urllib.quote(phrase) + '/'+urllib.quote(text_labels)
        
        if log_details:
            print "\nURL OF WORD2VEC SERVER:"
            print url_w2v_server

        response = requests.get(url_w2v_server)
        phrase_label_avg = ast.literal_eval(response.text)

        stack_details["phraseLabelAvg"] = phrase_label_avg
        
        phrase_label_avg = np.array(phrase_label_avg)

        if log_details:
            print "\nRESPONSE FROM W2V SERVER:"
            print phrase_label_avg
        
        keyword = phrase_label_avg[np.argmax(phrase_label_avg[:,2])][1]
        
        stack_details["keyword"] = keyword
        json_unit["stackDetails"].append(stack_details)
        print "\nKEYWORD"
        print keyword
        if(keyword == None):
            return keyword
        
        for p in properties:
            if p[3] == keyword:
                if log_details:
                    print "\nKEYWORD DETAILS:"
                    print p
                return p
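
The last step of the phrase branch picks the label with the highest similarity score. The sketch below shows that selection on a few rows copied from the logged server response shown later in this notebook. It is only a subset, so the result here is not necessarily the keyword the full run selects. The helper name `best_label` is hypothetical, and the scores are converted to floats before comparison, which is an assumption about the intended behaviour, since the logged array holds them as strings.

```python
def best_label(rows):
    """Return the label with the highest score from [phrase, label, score] rows."""
    if not rows:
        return None
    # Compare scores numerically instead of relying on string ordering.
    best = max(rows, key=lambda row: float(row[2]))
    return best[1]


# Subset of the logged response for the phrase 'vice president'.
rows = [
    ["vice president", "", "0.0"],
    ["vice president", "death place", "0.346756523218"],
    ["vice president", "birth date", "0.691169487909"],
    ["vice president", "battle", "0.748832643969"],
    ["vice president", "office", "0.605230739546"],
]

chosen = best_label(rows)
```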

### fetch_compound(word, model)

The function <b>"fetch_compound"</b> returns the given word together with its compound modifiers, taken from the dependency-parse model.
<br/><br/>
For example: <br/>
If the provided word is <b>date</b> but the question actually asks for <b>birth date</b>, it returns both words by looking up the compound relations in the model.

In [22]:
def fetch_compound(word, model):
    compound_word = []
    for m in model:
        if m[1] == u'compound' and m[0][0] == word:
            compound_word.append(m[2][0])

    compound_word.append(word)
    return compound_word
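
As a quick offline check, the function can be run against the dependency triples logged later in this notebook for the question "Who is the vice president of John Kennedy?". The snippet repeats the function so that it is self-contained.

```python
def fetch_compound(word, model):
    compound_word = []
    for m in model:
        # m is (head, relation, dependent); keep modifiers attached to `word`
        if m[1] == u'compound' and m[0][0] == word:
            compound_word.append(m[2][0])

    compound_word.append(word)
    return compound_word


# Dependency triples copied from the logged MODEL output.
model = [
    ((u'Who', u'WP'), u'cop', (u'is', u'VBZ')),
    ((u'Who', u'WP'), u'nsubj', (u'president', u'NN')),
    ((u'president', u'NN'), u'det', (u'the', u'DT')),
    ((u'president', u'NN'), u'compound', (u'vice', u'NN')),
    ((u'president', u'NN'), u'nmod', (u'Kennedy', u'NNP')),
    ((u'Kennedy', u'NNP'), u'case', (u'of', u'IN')),
    ((u'Kennedy', u'NNP'), u'compound', (u'John', u'NNP')),
]

relation_words = fetch_compound(u'president', model)
entity_words = fetch_compound(u'Kennedy', model)
```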

### call_sparql(keyword, URI, pType)

The function <b>"call_sparql"</b> returns the values of the property named by the keyword for the given URI.
<br/><br/>
For example: <br/>
If we are looking for <b>spouse</b> of http://dbpedia.org/resource/John_F._Kennedy it will return http://dbpedia.org/resource/Jacqueline_Kennedy_Onassis

In [23]:
def call_sparql(keyword, URI, pType):
    # print "\nCALL SPARQL:"
    pTypes = np.array([[u'ontology', 'dbo'],[u'property', 'dbp']])
    
    for p in pTypes:
        if p[0] == pType:
            pType = p[1]

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    query = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?"""+ keyword +"""
        WHERE { <""" + URI + """>
        """+pType+""":"""+keyword+""" ?"""+keyword+""" }
    """
    if log_details:
        print query
        
    sparql.setQuery(query)

    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    
    return [r[keyword]["value"] for r in results["results"]["bindings"]]
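
The query above uses the `dbo:` and `dbp:` prefixes without declaring them, so it presumably relies on the endpoint predefining them, and it reuses the keyword as a variable name. A possible variant, sketched below, builds the query from the full property URI that `get_properties` already returns. The helper name `build_value_query` is hypothetical and this is not the notebook's own method.

```python
def build_value_query(resource_uri, property_uri):
    """Build a SPARQL query for the values of one property of one resource."""
    # Full URIs in angle brackets need no PREFIX declarations.
    return (
        "SELECT ?value\n"
        "WHERE { <" + resource_uri + "> <" + property_uri + "> ?value }"
    )


query = build_value_query(
    "http://dbpedia.org/resource/John_F._Kennedy",
    "http://dbpedia.org/ontology/spouse",
)
```

With a fixed variable name such as `?value`, the result rows would be read with `r["value"]["value"]` instead of `r[keyword]["value"]`.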

### getResources(text)

The function <b>"getResources"</b> returns the AGDISTIS response for a sentence with a marked entity; the URI of the entity is read from that response.
<br/><br/>
For example: <br/>
If the provided text is <b>Who are the family members of [Peter Griffin]?</b>, the response contains http://dbpedia.org/resource/Peter_Griffin

In [62]:
def getResources(text):
    cookies = {
        'JSESSIONID': 'BC9CD43D9E1AE3E7CF51E00D3A3A7702',
    }

    headers = {
        'Origin': 'http://agdistis.aksw.org',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'en-GB,en-US;q=0.8,en;q=0.6',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.68 Safari/537.36',
        'Content-Type': 'application/json;charset=UTF-8',
        'Accept': 'application/json, text/plain, */*',
        'Referer': 'http://agdistis.aksw.org/demo/',
        'Connection': 'keep-alive',
    }
    
    data = json.dumps({"text": text})  # let json handle quoting and escaping
    response = requests.post('http://agdistis.aksw.org/demo/agdistis', headers=headers, cookies=cookies, data=data)
    
    return response.json()
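
The caller reads the URI from `namedEntities[0]['disambiguatedURL']` in this response. A defensive way to do that is sketched below. The helper name `first_disambiguated_url` is hypothetical, and the sample dictionaries contain only the two keys the notebook itself uses, so the real response may carry more fields.

```python
def first_disambiguated_url(resjson):
    """Return the URI of the first named entity, or None if there is none."""
    entities = resjson.get("namedEntities") or []
    if not entities:
        return None
    return entities[0].get("disambiguatedURL")


# Hand-written samples; only the keys used in this notebook are included.
found = {"namedEntities": [
    {"disambiguatedURL": "http://dbpedia.org/resource/Peter_Griffin"}
]}
nothing = {"namedEntities": []}

uri = first_disambiguated_url(found)
no_uri = first_disambiguated_url(nothing)
```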

### answer(question)

The function <b>"answer"</b> returns an answer for a question using the functions above.
<br/><br/>
For example: <br/>
If we are looking for an answer for <b>"What was the religion of the wife of JFK?"</b>, it should return the URI  http://dbpedia.org/resource/Catholic_Church.

In [61]:
def answer(question):
    print "..."
    path_to_jar = 'stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0.jar'
    path_to_models_jar = 'stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0-models.jar'

    dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar,
                                                 path_to_models_jar=path_to_models_jar)


    json_unit = {}
    json_unit["question"] = question
    if log_details:
        print "\nQUESTION:"
        print question

    result = dependency_parser.raw_parse(question)
    dep = result.next()

    model = list(dep.triples())

    json_unit["model"] = model
    if log_details:
        print '\nMODEL:'
        print model


    URI = "http://dbpedia.org/resource/"
    keyword = None
    handle = None
    trail = None
    stack = []
    
    for i in range(len(model)):
        if model[i][1] == u'nsubj':
            stack.append(model[i][2][0])

    for i in range(len(model)):
        if model[i][0][0] == stack[len(stack) - 1] and (model[i][1] == u'nmod' or model[i][1] == u'nmod:poss') :
            stack.append(model[i][2][0])

    json_unit["stack"] = list(stack)
    if log_details:
        print '\nSTACK:'
        print stack
    handle = stack.pop()
    handle = fetch_compound(handle, model)
    
    compound = ' '.join(handle)
    markers = ["[","]"]
    # markers = ["<entity>", "<entity>"] 
    text = question[:question.index(compound)] \
                + markers[0] + question[question.index(compound):question.index(compound)+len(compound)] + markers[1] \
                + question[(question.index(compound)+len(compound)):]
    
    resjson = getResources(text)
    URI = resjson['namedEntities'][0]['disambiguatedURL']
    json_unit["primaryURL"] = {}
    json_unit["primaryURL"]["src"] = "agdistis"
    json_unit["primaryURL"]["value"] = URI

    if log_details:
        print "\nAGISDISTIS URI:"
        print URI
   
    if URI == None:
        json_unit["answer"] = "Failed to fetch answer"
        return []

    results = []
    if len(stack) == 0:
        results.append(URI)
        
    i = len(stack)
    json_unit["stackDetails"] = []
    
    while i > 0:
        old_results = results
        trail = stack.pop()
        keyword = fetch_compound(trail, model)

        keyword = ' '.join(keyword)
        pair = get_closest_keyword(keyword, URI, json_unit)
        
        
        if pair is None:
            results = old_results
            break
            
        if log_details:
            print pair
        keyword = pair[0]
        pType = pair[1]

        if log_details:
            print keyword, ", ", URI
        results = call_sparql(keyword, URI, pType)
        if log_details:
            print results

        for res in results:
            parsed_url = urlparse(res)

            if parsed_url.scheme == u'http' or parsed_url.scheme == u'https':
                URI = res
            else:
                break

        i -= 1
    
    json_unit["answer"] = results
    bigJson.append(json_unit)
    return results
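
The first half of `answer` can be followed offline: build the stack from the `nsubj` and `nmod` relations, pop the entity, expand it with its compound words, and mark it in the question. The sketch below replays those steps on the triples logged later in this notebook. The helper names `build_stack` and `mark_entity` are hypothetical, and the parsing and AGDISTIS calls are left out.

```python
def fetch_compound(word, model):
    """Return the compound modifiers of `word` followed by `word` itself."""
    compound = [dep[0] for head, rel, dep in model
                if rel == u'compound' and head[0] == word]
    compound.append(word)
    return compound


def build_stack(model):
    """Collect the subject, then follow nmod / nmod:poss links from the stack top."""
    stack = [dep[0] for head, rel, dep in model if rel == u'nsubj']
    if not stack:
        return stack
    for head, rel, dep in model:
        if head[0] == stack[-1] and rel in (u'nmod', u'nmod:poss'):
            stack.append(dep[0])
    return stack


def mark_entity(question, phrase, markers=("[", "]")):
    """Wrap the first occurrence of `phrase` in the question with the markers."""
    start = question.index(phrase)
    end = start + len(phrase)
    return question[:start] + markers[0] + phrase + markers[1] + question[end:]


question = "Who is the vice president of John Kennedy?"
model = [
    ((u'Who', u'WP'), u'cop', (u'is', u'VBZ')),
    ((u'Who', u'WP'), u'nsubj', (u'president', u'NN')),
    ((u'president', u'NN'), u'det', (u'the', u'DT')),
    ((u'president', u'NN'), u'compound', (u'vice', u'NN')),
    ((u'president', u'NN'), u'nmod', (u'Kennedy', u'NNP')),
    ((u'Kennedy', u'NNP'), u'case', (u'of', u'IN')),
    ((u'Kennedy', u'NNP'), u'compound', (u'John', u'NNP')),
]

stack = build_stack(model)
stack_snapshot = list(stack)          # matches the logged STACK output
entity = ' '.join(fetch_compound(stack.pop(), model))
marked = mark_entity(question, entity)
```

After the entity is popped, the words left on the stack are the relations that the loop in `answer` resolves one at a time, starting from the entity's URI.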


### Set of Questions

In [60]:
questions = [
    'Where is the birth place of Bal Gangadhar Tilak?',
    'What is the capital of Germany?',
    'What are the population of capital of Germany?',
    'Who is the president of United States?',
    'When is the birth date of Tom Cruise?',
    'What is the color of the flag of Germany?',
    'Who is Donald Trump?',
    'Where is birth place of wife of Mahatma Gandhi?',
    'Who is the vice president of John Kennedy?',
    'What is the birth place of wife of John Kennedy?',
    'What is the municipality of Roberto Clemente Bridge',
    'What is the nationality of the prime minister of Thanong Bidaya?',
    'What is the party of Mumbai North?', # Failing due to AGDISTIS
    'Who is the founder of Facebook?',
    'Who is the supreme leader of North Korea?',
    'Who is the writer of family guy?',
    "What is Peter Griffin's nationality?",
    "Who is Stewie Griffin's old man?",
    "Who are the family members of Peter Griffin?",
    "What are the oceans of Earth?", # Failing due to AGDISTIS
    "What are satellites of Earth?" # Failing due to AGDISTIS
]

### Write your question here

Set log_details to True if you want to see the details of extraction.

In [82]:
log_details = False

In [74]:
question = "Who is the vice president of John Kennedy?"

In [75]:
results = answer(question)
if len(results) == 0:
    print "Failed to fetch answer."
else:
    for res in results:
        print res

...

QUESTION:
Who is the vice president of John Kennedy?

MODEL:
[((u'Who', u'WP'), u'cop', (u'is', u'VBZ')), ((u'Who', u'WP'), u'nsubj', (u'president', u'NN')), ((u'president', u'NN'), u'det', (u'the', u'DT')), ((u'president', u'NN'), u'compound', (u'vice', u'NN')), ((u'president', u'NN'), u'nmod', (u'Kennedy', u'NNP')), ((u'Kennedy', u'NNP'), u'case', (u'of', u'IN')), ((u'Kennedy', u'NNP'), u'compound', (u'John', u'NNP'))]

STACK:
[u'president', u'Kennedy']

AGISDISTIS URI:
http://dbpedia.org/resource/John_F._Kennedy

PROPERTIES:
[[u'deathPlace', u'ontology', u'http://dbpedia.org/ontology/deathPlace', u'death place'], [u'deathDate', u'ontology', u'http://dbpedia.org/ontology/deathDate', u'death date'], [u'birthPlace', u'ontology', u'http://dbpedia.org/ontology/birthPlace', u'birth place'], [u'birthDate', u'ontology', u'http://dbpedia.org/ontology/birthDate', u'birth date'], [u'wikiPageID', u'ontology', u'http://dbpedia.org/ontology/wikiPageID', u'Wikipage page ID'], [u'wikiPageRevis


RESPONSE FROM W2V SERVER:
[['vice president' '' '0.0']
 ['vice president' 'death place' '0.346756523218']
 ['vice president' 'death date' '0.600482700681']
 ['vice president' 'birth place' '0.437443310447']
 ['vice president' 'birth date' '0.691169487909']
 ['vice president' 'Wikipage page ID' '0.263200086939']
 ['vice president' 'Wikipage revision ID' '0.2404390917']
 ['vice president' 'Link from a Wikipage to another Wikipage'
  '0.07450532734']
 ['vice president' 'Link from a Wikipage to an external page'
  '0.26311840247']
 ['vice president' 'name' '0.634497213844']
 ['vice president' 'thumbnail' '0.0']
 ['vice president' 'alma mater' '0.0']
 ['vice president' 'battle' '0.748832643969']
 ['vice president' 'child' '0.47229922576']
 ['vice president' 'military branch' '0.654710988808']
 ['vice president' 'military rank' '0.678124581344']
 ['vice president' 'military unit' '0.642768615754']
 ['vice president' 'office' '0.605230739546']
 ['vice president' 'party' '0.52298895322']
 ['v

### Run the following for the series of questions declared above

<b style="color: #1204bc">Keeping log_details set to False is preferable here, since the detailed output for the whole series of questions is tedious to read through.</b>

In [85]:
log_details = False

Empty bigJson to start a fresh list of details. It stores all the details of the answer extraction performed by the following code snippet.

In [86]:
bigJson = []

Run the following code to check the answers for the given list of questions.

In [87]:
for q in questions:
    print "\nQUESTION:", q
    print "\nANSWER:"
    results = answer(q)

    if len(results) == 0:
        print "Failed to fetch answer."
    else:
        for res in results:
            print res
    
    print "_____________________________"


QUESTION: Where is the birth place of Bal Gangadhar Tilak?

ANSWER:
...

KEYWORD
birth place
http://dbpedia.org/resource/India
http://dbpedia.org/resource/Ratnagiri
http://dbpedia.org/resource/Bombay_State
http://dbpedia.org/resource/British_India
_____________________________

QUESTION: What is the capital of Germany?

ANSWER:
...
http://dbpedia.org/resource/Berlin
_____________________________

QUESTION: What are the population of capital of Germany?

ANSWER:
...

KEYWORD
population total
3610156
_____________________________

QUESTION: Who is the president of United States?

ANSWER:
...

KEYWORD
leader
http://dbpedia.org/resource/Barack_Obama
http://dbpedia.org/resource/John_Roberts
http://dbpedia.org/resource/Joe_Biden
http://dbpedia.org/resource/Paul_Ryan
_____________________________

QUESTION: When is the birth date of Tom Cruise?

ANSWER:
...

KEYWORD
birth date
1962-07-03
1962-7-3
_____________________________

QUESTION: What is the color of the flag of Germany?

ANSWER:
...


In [88]:
print json.dumps(bigJson)

[{"question": "Where is the birth place of Bal Gangadhar Tilak?", "stackDetails": [{"keyword": "birth place", "word": "birth place", "properties": [["deathPlace", "ontology", "http://dbpedia.org/ontology/deathPlace", "death place"], ["deathDate", "ontology", "http://dbpedia.org/ontology/deathDate", "death date"], ["birthPlace", "ontology", "http://dbpedia.org/ontology/birthPlace", "birth place"], ["birthDate", "ontology", "http://dbpedia.org/ontology/birthDate", "birth date"], ["wikiPageID", "ontology", "http://dbpedia.org/ontology/wikiPageID", "Wikipage page ID"], ["wikiPageRevisionID", "ontology", "http://dbpedia.org/ontology/wikiPageRevisionID", "Wikipage revision ID"], ["wikiPageWikiLink", "ontology", "http://dbpedia.org/ontology/wikiPageWikiLink", "Link from a Wikipage to another Wikipage"], ["wikiPageExternalLink", "ontology", "http://dbpedia.org/ontology/wikiPageExternalLink", "Link from a Wikipage to an external page"], ["thumbnail", "ontology", "http://dbpedia.org/ontology/thu