# Why choose Wikidata rather than DBpedia?

In this notebook, we show why we chose Wikidata SPARQL queries over DBpedia SPARQL queries.

We trained a model with the same settings as our baseline, changing only the queries to DBpedia queries. There are more DBpedia queries than Wikidata queries because the two knowledge graphs contain different knowledge. However, this difference has no major impact on the results.

In [6]:
import sys
sys.path.append('../../code/')
from utils.query import init_summarizer, predict_query
from utils.query import postprocess_sparql

In [20]:
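# Load the mT5-base checkpoint fine-tuned on QALD-9 with DBpedia queries
summarizer = init_summarizer("../../fine-tuned_models/mt5-base-qald9-dbpedia/checkpoint-20000")

def predict_and_print_language_query(question):
    """Predict and print a SPARQL query for every language version of a question."""
    for lang, question_string in question.items():
        query = predict_query(summarizer, question_string)
        query = postprocess_sparql(query)
        print(f"{lang}: {query}")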

We first show the predicted queries for a simple example question.

In [21]:
question = {
    "en": "Who is the mayor of Berlin?",
    "de": "Wer ist Bürgermeister von Berlin?",
    "ru": "Кто мэр Берлина?",
    "uk": "Хто є мером Берліну?",
    "lt": "Kas yra Berlyno meras?",
    "be": "Хто мэр Берліна?",
    "ba": "Берлин ҡалаһының мэры кем?"
}

In [22]:
predict_and_print_language_query(question)

en: SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }  
de: SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }  
ru: SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }  
uk: SELECT DISTINCT  ?uri WHERE  {  dbr:Brooklyn_Bridge dbo:leaderName  ?uri  }  
lt: SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }  
be: SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }  
ba: SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:mayor  ?uri  }  


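The model predicts `dbr:Berlin` and `dbo:leaderName` for most languages. The exceptions are Ukrainian, where it predicts the wrong entity `dbr:Brooklyn_Bridge`, and Bashkir, where it uses `dbo:mayor` instead of `dbo:leaderName`.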
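To make this per-language comparison explicit, the following sketch extracts the first `dbr:`/`dbo:` pair from each predicted query (copied verbatim from the output above) and checks it against `dbr:Berlin` and `dbo:leaderName`. The helpers `first_triple` and `check` are hypothetical and not part of the project's `utils` package; they assume the simple single-triple query shape seen here and do not send anything to a DBpedia endpoint.

```python
import re

# Predicted DBpedia queries for "Who is the mayor of Berlin?", copied from the output above
predictions = {
    "en": "SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }",
    "de": "SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }",
    "ru": "SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }",
    "uk": "SELECT DISTINCT  ?uri WHERE  {  dbr:Brooklyn_Bridge dbo:leaderName  ?uri  }",
    "lt": "SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }",
    "be": "SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:leaderName  ?uri  }",
    "ba": "SELECT DISTINCT  ?uri WHERE  {  dbr:Berlin dbo:mayor  ?uri  }",
}

# Hypothetical pattern: first "dbr:<entity> dbo:<property>" pair in a query
TRIPLE = re.compile(r"(dbr:\S+)\s+(dbo:\S+)")

def first_triple(query):
    """Return (entity, property) of the first dbr:/dbo: pair, or None if absent."""
    match = TRIPLE.search(query)
    return match.groups() if match else None

def check(predictions, entity, prop):
    """Map each language to (entity is correct, property is correct)."""
    results = {}
    for lang, query in predictions.items():
        triple = first_triple(query)
        if triple is None:
            results[lang] = (False, False)
        else:
            results[lang] = (triple[0] == entity, triple[1] == prop)
    return results

results = check(predictions, "dbr:Berlin", "dbo:leaderName")
for lang, (entity_ok, prop_ok) in results.items():
    print(f"{lang}: entity {'ok' if entity_ok else 'wrong'}, property {'ok' if prop_ok else 'wrong'}")

# Number of languages where both the entity and the property match
both = sum(entity_ok and prop_ok for entity_ok, prop_ok in results.values())
print(f"{both} of {len(results)} languages match both")
```

With these outputs, five of the seven languages match both parts; Ukrainian fails on the entity and Bashkir on the property.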

Let's try a more complicated question.

In [23]:
question = {
    "en": "Which country was Bill Gates born in?",
    "de": "In welchen Land wurde Bill Gates geboren?",
    "ru": "В какой стране родился Билл Гейтс?",
    "uk": "У якій країні народився Білл Гейтс?",
    "be": "У якой краіне нарадзіўся Біл Гейтс?",
    "lt": "Kokioje šalyje gimė Billas Gatesas?",
    "ba": "Билл гейтс тыуған илде ниндәй?"
}
predict_and_print_language_query(question)

en: SELECT DISTINCT  ?uri WHERE  {  dbr:Bill_Gates dbo:birthPlace  ?uri  }  
de: SELECT DISTINCT  ?uri WHERE  {  dbr:Bill_Gates dbo:birthPlace  ?uri  }  
ru: SELECT DISTINCT  ?uri WHERE  {  dbr:Bill_Eisenberg dbo:birthPlace  ?uri .  ?uri rdf:type dbo:Country  }  
uk: SELECT DISTINCT  ?uri WHERE  {  dbr:William_Eisenberg dbo:birthPlace  ?uri  }  
be: SELECT DISTINCT  ?uri WHERE  {  dbr:Bile_Getchie dbo:birthPlace  ?uri  }  
lt: SELECT DISTINCT  ?uri WHERE  {  dbr:William_Gates dbo:birthPlace  ?uri  }  
ba: SELECT DISTINCT  ?uri WHERE  {  dbr:William_Gereots dbo:birthPlace  ?uri .  ?uri rdf:type dbo:Country  }  
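The predictions for this question differ mainly in the subject entity. The sketch below tabulates, for each language, the predicted `dbr:` entity, whether it equals `dbr:Bill_Gates` (the entity used in the English and German predictions, taken here as the assumed reference), and whether the query adds the `?uri rdf:type dbo:Country` restriction. The queries are copied from the output above; `summarize` and both regular expressions are hypothetical helpers written only for this illustration.

```python
import re

# Predicted DBpedia queries for "Which country was Bill Gates born in?", copied from the output above
predictions = {
    "en": "SELECT DISTINCT  ?uri WHERE  {  dbr:Bill_Gates dbo:birthPlace  ?uri  }",
    "de": "SELECT DISTINCT  ?uri WHERE  {  dbr:Bill_Gates dbo:birthPlace  ?uri  }",
    "ru": "SELECT DISTINCT  ?uri WHERE  {  dbr:Bill_Eisenberg dbo:birthPlace  ?uri .  ?uri rdf:type dbo:Country  }",
    "uk": "SELECT DISTINCT  ?uri WHERE  {  dbr:William_Eisenberg dbo:birthPlace  ?uri  }",
    "be": "SELECT DISTINCT  ?uri WHERE  {  dbr:Bile_Getchie dbo:birthPlace  ?uri  }",
    "lt": "SELECT DISTINCT  ?uri WHERE  {  dbr:William_Gates dbo:birthPlace  ?uri  }",
    "ba": "SELECT DISTINCT  ?uri WHERE  {  dbr:William_Gereots dbo:birthPlace  ?uri .  ?uri rdf:type dbo:Country  }",
}

# Hypothetical pattern: subject entity of the first triple pattern after "{"
SUBJECT = re.compile(r"\{\s*(dbr:\S+)")
# Hypothetical pattern: type restriction of the answer variable to dbo:Country
COUNTRY_TYPE = re.compile(r"\?uri\s+rdf:type\s+dbo:Country")

def summarize(predictions, expected_entity):
    """Map each language to (predicted entity, entity correct?, has country restriction?)."""
    summary = {}
    for lang, query in predictions.items():
        match = SUBJECT.search(query)
        entity = match.group(1) if match else None
        has_type = bool(COUNTRY_TYPE.search(query))
        summary[lang] = (entity, entity == expected_entity, has_type)
    return summary

summary = summarize(predictions, "dbr:Bill_Gates")
for lang, (entity, entity_ok, has_type) in summary.items():
    print(f"{lang}: {entity} (correct entity: {entity_ok}, country restriction: {has_type})")
```

On these outputs, only the English and German queries name `dbr:Bill_Gates`, and only the Russian and Bashkir queries include the country restriction, each paired with a wrong entity.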
