# Querying Microsoft Academic Knowledge Graph

We want to link our articles to the Microsoft Academic Knowledge Graph (MAKG) at the paper level.
To do so, we query MAKG for papers and then automatically link the results to the articles we extracted.
The query looks specifically for the articles used in our reasoning set, identified by their DOIs.

The information is integrated into our graph only after entity linking, when we add all additional knowledge.

In [None]:
query ="""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX magp: <http://ma-graph.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
 

select distinct * where {
 ?p datacite:doi ?doi.
 FILTER (?doi = \"10.1007/978-3-322-81546-0_5\"^^xsd:string || ?doi = \"10.1123/ijsnem.11.s1.s128\"^^xsd:string)
}LIMIT 100"""
query

In [None]:
query_template ="""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX magp: <http://ma-graph.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX org: <http://www.w3.org/ns/org#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX datacite: <http://purl.org/spar/datacite/>
 

select distinct * where {
 ?paper datacite:doi ?doi.
 FILTER ("""
 
query_template_end = """)
}LIMIT 100"""
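
The two template halves are meant to wrap an `||`-joined list of DOI comparisons. The following is a minimal sketch of that assembly as a pure function, so the generated filter text can be inspected before anything is sent to the endpoint. The names `escape_literal` and `build_doi_filter` are hypothetical and not part of this notebook, and escaping backslashes and quotes is an added precaution rather than something the original code does.

```python
def escape_literal(value):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_doi_filter(dois):
    """Return the body of a FILTER(...) that matches any of the given DOIs."""
    if not dois:
        raise ValueError("need at least one DOI to build a filter")
    clauses = ['?doi = "{}"^^xsd:string'.format(escape_literal(d)) for d in dois]
    return " || ".join(clauses)


# Example with the two DOIs from the hand-written query above.
example_filter = build_doi_filter([
    "10.1007/978-3-322-81546-0_5",
    "10.1123/ijsnem.11.s1.s128",
])
print(example_filter)
```

The full query would then be `query_template + build_doi_filter(batch) + query_template_end`.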

In [None]:
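with open("paper_dois.txt", 'r') as paper_f:
    content = paper_f.readlines()
# Each line is expected to be a DOI-based file name: strip the trailing newline and the ".txt"
# extension, then turn only the first "_" back into "/", since a DOI suffix may itself contain "_"
# (as in 10.1007/978-3-322-81546-0_5).
doi_list = [x.strip().replace(".txt","").replace("_","/",1) for x in content]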
doi_list[1:10]
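
Because the DOIs are reconstructed from file names, a quick sanity check before querying can catch lines that were not decoded into something DOI-shaped. The sketch below uses a deliberately loose pattern (an assumption for illustration, not an official validation rule): `10.`, a registrant code of 4 to 9 digits, a `/`, and a non-empty suffix. `DOI_PATTERN` and `split_valid_dois` are hypothetical names.

```python
import re

# Loose shape check: "10.", 4-9 digits, "/", then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def split_valid_dois(candidates):
    """Partition candidates into DOI-shaped strings and everything else."""
    valid, suspicious = [], []
    for doi in candidates:
        if DOI_PATTERN.match(doi):
            valid.append(doi)
        else:
            suspicious.append(doi)
    return valid, suspicious


sample = ["10.1007/978-3-322-81546-0_5", "10.1123/ijsnem.11.s1.s128", "10.1007", ""]
valid, suspicious = split_valid_dois(sample)
print(len(valid), "DOI-shaped,", len(suspicious), "suspicious")
```

This only checks the overall shape; it cannot tell whether a suffix was decoded correctly, so the suspicious list is a starting point for manual inspection rather than a guarantee.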

In [None]:
import math
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd

In [None]:
sparql = SPARQLWrapper("http://ma-graph.org/sparql")

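def run_query(doi_list, s, e):
    """Query MAKG for the DOIs in doi_list[s:e] and return a DataFrame with columns paper and doi."""
    # One equality test per DOI, OR-ed together inside the FILTER of the template
    dois = ["?doi = \"" + doi + "\"^^xsd:string" for doi in doi_list[s:e]]
    query = query_template + ' || '.join(dois) + query_template_end
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    # Keep only the plain values of the two bound variables
    result_list = [{'paper':r['paper']['value'], 'doi':r['doi']['value']} for r in results['results']['bindings']]
    df = pd.DataFrame(result_list)
    return df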
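
The list comprehension in `run_query` relies on the standard SPARQL JSON results layout, where each binding maps a variable name to an object with a `value` field. The sketch below isolates that step so it can be tried on a hand-written response without contacting the endpoint. `bindings_to_rows` is a hypothetical helper, and the paper URI in the sample is a placeholder, not a real MAKG identifier.

```python
def bindings_to_rows(results, variables=("paper", "doi")):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Variables that are unbound in a given row are reported as None.
    """
    rows = []
    for binding in results.get("results", {}).get("bindings", []):
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


# Hand-written response in the SPARQL JSON results layout (placeholder URI).
sample_results = {
    "head": {"vars": ["paper", "doi"]},
    "results": {"bindings": [
        {"paper": {"type": "uri", "value": "http://example.org/paper/1"},
         "doi": {"type": "literal", "value": "10.1123/ijsnem.11.s1.s128"}},
    ]},
}
rows = bindings_to_rows(sample_results)
print(rows)
```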

In [None]:
dfs = []
for i in range(0, len(doi_list), 100):
    s = i
    e = min(len(doi_list), i+100)
    print("Running " + str(s) + " to " + str(e))
    dfs.append(run_query(doi_list,s,e))
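
The loop above walks through `doi_list` in slices of 100. As a small sketch, the slice boundaries can be computed by a generator, which makes the batching easy to check on its own; `batch_bounds` is a hypothetical name and the default batch size of 100 simply mirrors the loop.

```python
def batch_bounds(total, batch_size=100):
    """Yield (start, end) index pairs that cover range(total) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield start, min(total, start + batch_size)


# 250 items are covered by two full batches and one partial batch.
bounds = list(batch_bounds(250))
print(bounds)
```

Note that the query template ends with `LIMIT 100`, so a batch of 100 DOIs returns every match only if each DOI resolves to at most one paper; a smaller batch size would leave headroom if that assumption does not hold.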


In [None]:
sum([len(x) for x in dfs])

In [None]:
df = pd.concat(dfs, ignore_index=True)

In [None]:
df.to_csv('ma_papers.csv.gz', compression='gzip')

In [None]:
df.drop_duplicates().shape
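
Comparing row counts shows how many links were found, but not which DOIs are missing from the knowledge graph. A minimal sketch for listing them, written in plain Python so it does not depend on the shape of the DataFrame; `unmatched_dois` is a hypothetical helper.

```python
def unmatched_dois(requested, matched):
    """Return requested DOIs that do not appear in matched, in order, without repeats."""
    matched_set = set(matched)
    seen = set()
    missing = []
    for doi in requested:
        if doi not in matched_set and doi not in seen:
            seen.add(doi)
            missing.append(doi)
    return missing


requested = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/b"]
matched = ["10.1000/a"]
missing = unmatched_dois(requested, matched)
print(missing)
```

In this notebook the call would presumably be `unmatched_dois(doi_list, df['doi'])`, assuming at least one batch returned results so that the `doi` column exists.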