# Dependencies

<ul>
<li>The notebook currently uses the Stanford dependency parser through nltk.</li>
<li>loadW2V is a module that loads a word2vec model stored on disk. (The models are provided by gensim.)</li>
<li>OxfordDictionary calls the Oxford Dictionary API to retrieve synonyms for certain words.</li>
</ul>

In [34]:
from nltk.parse.stanford import StanfordDependencyParser
from SPARQLWrapper import SPARQLWrapper, JSON
from urlparse import urlparse
import loadW2V
import numpy as np
import OxfordDictionary as od

To see details of how answers are extracted, set the following to True:

In [59]:
log_details = False

# Functions

### redirect(URI)

The function <b>"redirect"</b> returns the proper DBpedia URI by fetching the property <b>wikiPageRedirects</b>. If the URI has no redirect, it is returned unchanged.
<br/><br/>
For example: <br/>
If the provided URI is http://dbpedia.org/page/JFK, it returns http://dbpedia.org/page/John_F._Kennedy

In [54]:
def redirect(URI):
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    query = """
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?wikiPageRedirects
        WHERE { <""" + URI + """>
        dbo:wikiPageRedirects ?wikiPageRedirects }
    """
    if log_details:
        print query
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # print results
    new_URI = [r["wikiPageRedirects"]["value"] for r in results["results"]["bindings"]]
    if log_details:
        print "\nREDIRECT:"
    if len(new_URI) > 0:
        if log_details:
            print new_URI[0]
        return new_URI[0]
    else:
        if log_details:
            print URI
        return URI
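
The result handling in `redirect` can be tried without a network call. The sketch below is a minimal illustration, assuming the SPARQL JSON result layout that the function above already indexes into (`results["results"]["bindings"]`). The helpers `build_redirect_query` and `pick_redirect` and the sample responses are hypothetical, written only for this example, and the code uses syntax that also runs under Python 3.

```python
def build_redirect_query(uri):
    """Build the redirect lookup query for one resource URI (hypothetical helper)."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?wikiPageRedirects\n"
        "WHERE { <" + uri + "> dbo:wikiPageRedirects ?wikiPageRedirects }"
    )


def pick_redirect(results, uri):
    """Return the first redirect target, or the original URI if there is none."""
    bindings = results["results"]["bindings"]
    targets = [b["wikiPageRedirects"]["value"] for b in bindings]
    return targets[0] if targets else uri


# Hand-written samples in the SPARQL JSON results layout (not real responses).
sample = {"results": {"bindings": [
    {"wikiPageRedirects": {"type": "uri",
                           "value": "http://dbpedia.org/resource/John_F._Kennedy"}}
]}}
empty = {"results": {"bindings": []}}

jfk = "http://dbpedia.org/resource/JFK"
redirected = pick_redirect(sample, jfk)   # redirect target from the sample
unchanged = pick_redirect(empty, jfk)     # no redirect, so the input URI
```

Separating query construction from result handling in this way makes the fallback branch (no redirect found) easy to check on its own.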

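### get_closest_keyword(word, URI)

The function <b>"get_closest_keyword"</b> returns the property of the URI that is closest to the provided word. It first looks for an exact match, then for a match with a synonym, and finally falls back to the property with the highest average word2vec similarity.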
<br/><br/>
For example: <br/>
If the provided word is <b>wife</b> for http://dbpedia.org/page/John_F._Kennedy, it returns the property <b>spouse</b>.

In [38]:
def get_closest_keyword(word, URI):
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    query = """
        select distinct ?prop where {
             <"""+ URI +""">
             ?prop ?ent }
    """
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    properties = []
    if log_details:
        print "\nPROPERTIES:"
    for result in results["results"]["bindings"]:
        if log_details:
            print result
        arr = result["prop"]["value"].split('/')
        # properties.append(arr[len(arr) - 1])
        if arr[len(arr) - 1] != 'abstract':
            properties.append([arr[len(arr) - 1], arr[len(arr) - 2]])

    properties = np.array(properties)
    if log_details:
        print "To be matched with: ", word
    # print properties
    for p in properties:
        if log_details:
            print p[0]
        if p[0] == word:
            return p

    synonyms = od.getSynonyms(word)

    log = []
    if synonyms == False:
        synonyms = []

    synonyms.append((word, word))
    if log_details:
        print "\nSYNONYMS:"
        print synonyms
        
    for synonym in synonyms:
        for p in properties:
            if p[0] == synonym[0]:
                return p

    
    prop_syn_avg = []
    for p in properties:
        avg = 0
        for synonym in synonyms:
            try:
                s = loadW2V.b.similarity(p[0], synonym[0])
                # print '(', p[0], ',' , synonym[0], ') ', s
                avg += s
            except KeyError as e:
                # The word is not in the word2vec vocabulary; record it and move on.
                log.append(str(e))

        prop_syn_avg.append([p[0], avg/len(synonyms)])

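    # Keep the scores in a numeric array so ranking compares numbers, not strings.
    scores = np.array([avg for _, avg in prop_syn_avg], dtype=float)
    order = np.argsort(scores)

    if log_details:
        print "\nSYNONYM SIMILARITIES:"
        for s in order:
            print prop_syn_avg[s]

    keyword = prop_syn_avg[np.argmax(scores)][0]

    if log_details:
        print keyword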
    for p in properties:
        if p[0] == keyword:
            if log_details:
                print "\nKEYWORD:"
                print p
            return p
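
The last fallback in `get_closest_keyword` picks the property with the highest average similarity to the word and its synonyms. The sketch below illustrates that ranking with a toy similarity table standing in for the word2vec model. The names `TOY_SCORES`, `toy_similarity`, and `rank_properties` and all the scores are invented for this example; pairs missing from the table are skipped, in the same way the `KeyError` branch above skips out-of-vocabulary words.

```python
# Toy similarity scores standing in for loadW2V.b.similarity (values are invented).
TOY_SCORES = {
    ("spouse", "wife"): 0.8,
    ("spouse", "partner"): 0.6,
    ("birthPlace", "wife"): 0.1,
    ("birthPlace", "partner"): 0.2,
}


def toy_similarity(a, b):
    """Look up a similarity score; raises KeyError for unknown pairs."""
    return TOY_SCORES[(a, b)]


def rank_properties(properties, words):
    """Return (best_property, averages) using the mean similarity per property."""
    averages = []
    for prop in properties:
        total = 0.0
        for w in words:
            try:
                total += toy_similarity(prop, w)
            except KeyError:
                pass  # unknown pair adds nothing to the total
        averages.append(total / len(words))
    # Compare the numeric averages directly, never their string forms.
    best_index = max(range(len(averages)), key=averages.__getitem__)
    return properties[best_index], averages


best, averages = rank_properties(
    ["birthPlace", "spouse", "religion"], ["wife", "partner"]
)
```

With these invented scores, `spouse` has the highest average and is selected. Because skipped pairs still count in the denominator, a property with many out-of-vocabulary pairs is pulled toward zero.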

### fetch_compound(word, model)

The function <b>"fetch_compound"</b> returns the given word together with its compound modifiers, as found in the dependency-parse triples (the model).
<br/><br/>
For example: <br/>
If the provided word is <b>date</b> but the question actually asks about the <b>birth date</b>, it looks up the <b>compound</b> relations of <b>date</b> in the model and returns both words, in the order <b>birth</b>, <b>date</b>.

In [39]:
def fetch_compound(word, model):
    compound_word = []
    for m in model:
        if m[1] == u'compound' and m[0][0] == word:
            compound_word.append(m[2][0])

    compound_word.append(word)
    return compound_word
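
To see the lookup in isolation, the sketch below runs the same logic on a few hand-written triples. The triples follow the `((head, tag), relation, (dependent, tag))` shape that the function above indexes into, but they are written by hand for illustration and are not actual parser output. The function is repeated so that the block runs on its own.

```python
def fetch_compound(word, model):
    """Collect the 'compound' dependents of `word`, then the word itself."""
    compound_word = []
    for head, relation, dependent in model:
        if relation == 'compound' and head[0] == word:
            compound_word.append(dependent[0])
    compound_word.append(word)
    return compound_word


# Hand-written triples (illustrative, not produced by the Stanford parser).
triples = [
    (('date', 'NN'), 'compound', ('birth', 'NN')),
    (('date', 'NN'), 'det', ('the', 'DT')),
    (('date', 'NN'), 'nmod', ('Cruise', 'NNP')),
    (('Cruise', 'NNP'), 'compound', ('Tom', 'NNP')),
]

date_parts = fetch_compound('date', triples)    # ['birth', 'date']
name_parts = fetch_compound('Cruise', triples)  # ['Tom', 'Cruise']
plain = fetch_compound('the', triples)          # ['the']
```

A word with no compound modifiers comes back as a one-element list, so callers can always join the result.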

### call_sparql(keyword, URI, pType)

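The function <b>"call_sparql"</b> returns the values of the property named by the keyword for the given URI. The pType argument selects the namespace: <b>ontology</b> maps to the prefix dbo and <b>property</b> maps to dbp.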
<br/><br/>
For example: <br/>
If we are looking for <b>spouse</b> of http://dbpedia.org/page/John_F._Kennedy it will return http://dbpedia.org/page/Jacqueline_Kennedy_Onassis

In [40]:
def call_sparql(keyword, URI, pType):
    # print "\nCALL SPARQL:"
    pTypes = np.array([[u'ontology', 'dbo'],[u'property', 'dbp']])
    
    for p in pTypes:
        if p[0] == pType:
            pType = p[1]

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    query = """
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbp: <http://dbpedia.org/property/>
        SELECT ?"""+ keyword +"""
        WHERE { <""" + URI + """>
        """+pType+""":"""+keyword+""" ?"""+keyword+""" }
    """
    if log_details:
        print query
        
    sparql.setQuery(query)

    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # answers = []
    # for result in results:
    # print results["results"]["bindings"]
    return [r[keyword]["value"] for r in results["results"]["bindings"]]
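
The query construction and value extraction in `call_sparql` can be checked offline. The sketch below is a simplified illustration: `PREFIXES`, `build_property_query`, `extract_values`, and the sample response are hypothetical names and data introduced here, and the query text assumes the `dbo:` and `dbp:` prefixes are either declared or predefined by the endpoint.

```python
# Maps the namespace segment of a property URI to its SPARQL prefix.
PREFIXES = {'ontology': 'dbo', 'property': 'dbp'}


def build_property_query(keyword, uri, p_type):
    """Build a query for one property of one resource (hypothetical helper)."""
    prefix = PREFIXES.get(p_type, p_type)
    return "SELECT ?{kw}\nWHERE {{ <{uri}> {prefix}:{kw} ?{kw} }}".format(
        kw=keyword, uri=uri, prefix=prefix
    )


def extract_values(results, keyword):
    """Pull the bound values for `keyword` out of a SPARQL JSON result."""
    return [b[keyword]["value"] for b in results["results"]["bindings"]]


query = build_property_query(
    'spouse', 'http://dbpedia.org/resource/John_F._Kennedy', 'ontology'
)

# Hand-written sample in the SPARQL JSON results layout (not a real response).
sample = {"results": {"bindings": [
    {"spouse": {"type": "uri",
                "value": "http://dbpedia.org/resource/Jacqueline_Kennedy_Onassis"}}
]}}
values = extract_values(sample, 'spouse')
```

A dictionary lookup replaces the loop over `pTypes`, and an unknown `p_type` is passed through unchanged, which matches the behaviour of the loop above.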

### answer(question)

The function <b>"answer"</b> returns an answer for a question using the functions above.
<br/><br/>
For example: <br/>
If we are looking for an answer for <b>"What was the religion of the wife of JFK?"</b>, it should return the <b>abstract for the Catholic Church</b>.

In [57]:
def answer(question):
    print "..."
    path_to_jar = 'stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0.jar'
    path_to_models_jar = 'stanford-corenlp-full-2017-06-09/stanford-corenlp-3.8.0-models.jar'

    dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar,
                                                 path_to_models_jar=path_to_models_jar)


    if log_details:
        print "\nQUESTION:"
        print question

    result = dependency_parser.raw_parse(question)
    dep = next(result)

    model = list(dep.triples())

    if log_details:
        print '\nMODEL:'
        print model


    URI = "http://dbpedia.org/resource/"
    keyword = None
    handle = None
    trail = None
    stack = []

    # print models
    for i in range(len(model)):
        if model[i][1] == u'nsubj':
            stack.append(model[i][2][0])

    for i in range(len(model)):
        # print model[i][0][0], stack[len(stack) - 1]
        if model[i][0][0] == stack[len(stack) - 1] and model[i][1] == u'nmod':
            stack.append(model[i][2][0])

    if log_details:
        print '\nSTACK:'
        print stack
    handle = stack.pop()
    handle = fetch_compound(handle, model)
    URI = URI + '_'.join(handle)
    URI = redirect(URI)
    # print URI
    results = []
    if len(stack) == 0:
        results.append(URI)
        
    i = len(stack)
    while i > 0:
        #if len(stack) == 0:
        #    keyword = "label"
        #else:
        
        trail = stack.pop()
        keyword = fetch_compound(trail, model)

        # print URI
        keyword = ' '.join(keyword)
        keyword = ''.join(x for x in keyword.title() if not x.isspace())
        keyword = list(keyword)
        keyword[0] = keyword[0].lower()
        keyword = ''.join(keyword)
        pair = get_closest_keyword(keyword, URI)
        if log_details:
            print pair
        keyword = pair[0]
        pType = pair[1]

        if log_details:
            print keyword, ", ", URI
        results = call_sparql(keyword, URI, pType)
        if log_details:
            print results

        for res in results:
            parsed_url = urlparse(res)

            if parsed_url.scheme == u'http' or parsed_url.scheme == u'https':
                # res = call_sparql(keyword, res)
                URI = res
            else:
                break

        i -= 1

    
    return results
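
Two steps inside `answer` are easy to miss: how the stack of words is built from the triples, and how a compound such as "birth place" becomes the property name `birthPlace`. The sketch below reproduces both on hand-written triples. The helpers `build_stack` and `to_property_name` are hypothetical names for logic that is inline in the function above, and the triples are illustrative, not actual parser output for the question.

```python
def build_stack(model):
    """Push every nsubj dependent, then follow nmod links from the top of the stack."""
    stack = []
    for head, relation, dependent in model:
        if relation == 'nsubj':
            stack.append(dependent[0])
    for head, relation, dependent in model:
        # The guard avoids an IndexError when no nsubj relation was found.
        if stack and head[0] == stack[-1] and relation == 'nmod':
            stack.append(dependent[0])
    return stack


def to_property_name(words):
    """Join words into a lowerCamelCase property name."""
    joined = ''.join(' '.join(words).title().split())
    return joined[:1].lower() + joined[1:]


# Hand-written triples (illustrative only).
triples = [
    (('Where', 'WRB'), 'nsubj', ('place', 'NN')),
    (('place', 'NN'), 'compound', ('birth', 'NN')),
    (('place', 'NN'), 'nmod', ('wife', 'NN')),
    (('wife', 'NN'), 'nmod', ('Gandhi', 'NNP')),
    (('Gandhi', 'NNP'), 'compound', ('Mahatma', 'NNP')),
]

stack = build_stack(triples)                        # ['place', 'wife', 'Gandhi']
prop = to_property_name(['birth', 'place'])         # 'birthPlace'
single = to_property_name(['wife'])                 # 'wife'
```

The top of the stack (`Gandhi`) becomes the starting resource, and the remaining words are popped one at a time, each resolving a property on the URI found in the previous step. Note that the second loop makes a single pass, so it only follows `nmod` links that appear in the triples after their head was pushed.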


### Set of Questions

In [42]:
questions = [
    # 'Where is the birth place of Bal Gangadhar Tilak?'
    # 'What is the capital of Germany?'
    # 'Who is the president of United States?'
    # 'When is the birth date of Tom Cruise?'
    # 'What is the color of the flag of Germany?'
    # 'Who is Donald Trump?'
    # 'Where is birth place of wife of Mahatma Gandhi?'
    # 'Who is the vice president of John Kennedy?'
    # 'What is the birth place of wife of John Kennedy?'
    # 'What is the municipality of Roberto Clemente Bridge'
    # 'What is the nationality of the prime minister of Thanong Bidaya?'
    # 'which are the films of Richard Gere and Julia Roberts?'
    # 'What is the party of Mumbai North?'
    #'Who is the founder of Facebook?'
]

### Write your question here

In [70]:
question = 'Where is birth place of wife of Mahatma Gandhi?'

In [71]:
results = answer(question)

for res in results:
    print res

...
http://dbpedia.org/resource/Gujarat
http://dbpedia.org/resource/Porbandar
http://dbpedia.org/resource/Kathiawar_Agency
http://dbpedia.org/resource/British_Raj
