# NLP - Semantic Enrichment

- **Created by Andrés Segura Tinoco**
- **Created on June 13, 2019**

**Semantic Enrichment** refers in general terms to the technologies and practices used to add semantic metadata to content.

**<a href='https://en.wikipedia.org/wiki/SPARQL' target='_blank' >SPARQL</a>** is an RDF query language (a semantic query language for databases) able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web.

**<a href='https://wiki.dbpedia.org/about' target='_blank' >DBpedia</a>** is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information forms an open knowledge graph (OKG), which is available to everyone on the Web.

This document shows how to enrich entities of the following types by querying DBpedia with SPARQL:
- Person
- Country
- Place
- Company
- Movie

In [1]:
# Load SPARQL libraries
from SPARQLWrapper import SPARQLWrapper, JSON

In [2]:
# Strip single and double quotes so the entity cannot break out of the query's string literal
def get_clean_entity(text):
    return text.replace("'", "").replace('"', '')
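
`get_clean_entity` removes quotes outright, which also alters names that legitimately contain them (for example O'Neil). A possible alternative, shown here only as a sketch, is to escape the characters that are special inside a double-quoted SPARQL string literal. The helper name `escape_sparql_literal` is hypothetical and not part of the original notebook, and the wildcard characters interpreted by the `like` filter are not handled.

```python
def escape_sparql_literal(text):
    """Escape characters that would end or corrupt a double-quoted SPARQL string literal."""
    replacements = [
        ("\\", "\\\\"),  # backslash first, so the escapes added below are not doubled
        ('"', '\\"'),    # double quote would otherwise close the literal
        ("\n", "\\n"),   # raw line breaks are not allowed in a short string literal
        ("\r", "\\r"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text


# Single quotes are left untouched because the queries use double-quoted literals
print(escape_sparql_literal("Shaquille O'Neil"))
print(escape_sparql_literal('Dwayne "The Rock" Johnson'))
```

Whether escaping or stripping is preferable depends on how closely the search text must match the labels stored in DBpedia.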

In [3]:
# Run a SPARQL query against the DBpedia endpoint and return the result bindings (None on failure)
def exec_sparql_query(query):
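    entry_point = "https://dbpedia.org/sparql"
    # 'geo' is bound to the GeoRSS namespace here, so geo:point in the queries is georss:point.
    # The foaf, rdf, rdfs and xsd prefixes are not declared; the endpoint is assumed to predefine them.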
    header = """
             PREFIX dbo:<http://dbpedia.org/ontology/>
             PREFIX geo:<http://www.georss.org/georss/>
            """
    try:
        query = header + query
        
        sparql = SPARQLWrapper(entry_point)
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        result = sparql.query().convert()
        
        if result and 'results' in result:
            result = result['results']['bindings']
        
        return result
    except Exception:
        # Any network, HTTP, or parsing failure yields None; callers must check for it
        return None
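
Because `exec_sparql_query` returns `None` when the request fails and an empty list when nothing matches, indexing the result with `[0]` can raise an exception. The following is a minimal sketch of a guard; `first_binding` is a hypothetical helper name introduced only for illustration.

```python
def first_binding(result):
    """Return the first result row, or None when the query failed or matched nothing."""
    # A failed query gives None; a non-SELECT response may give a dict instead of a list
    if not isinstance(result, list) or not result:
        return None
    return result[0]


print(first_binding(None))                             # failed query
print(first_binding([]))                               # no matches
print(first_binding([{'name': {'value': 'Houston'}}])) # first row of a normal result
```

With such a guard, a call like `first_binding(query_person(name))` can be checked against `None` before any field is read.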

### Person

In [4]:
# Query DBpedia for the information associated with a person
def query_person(person_name):
    person_name = get_clean_entity(person_name)
    
    query = """
            SELECT (SAMPLE (?name) AS ?name) (SAMPLE (?birthPlace) AS ?birthPlace) (SAMPLE (?birthDate) AS ?birthDate)
                   (SAMPLE (?person) AS ?person) (SAMPLE(?description) AS ?description)
            WHERE {
                ?person dbo:birthPlace ?birthPlace.
                ?person dbo:birthDate ?birthDate.
                ?person foaf:name ?name.
                ?person rdfs:comment ?description. 
                FILTER (?name like "%""" + person_name + """%"^^xsd:char).
                FILTER (langMatches(lang(?description), "en")).
            }
            GROUP BY ?person
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [5]:
# Ask DBpedia for 'Elon Musk' person
person_name = 'Elon Musk'
result = query_person(person_name)
result[0]

{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Elon Musk'},
 'birthPlace': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/Pretoria'},
 'birthDate': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#date',
  'value': '1971-06-28'},
 'person': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Elon_Musk'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'Elon Reeve Musk (/ˈiːlɒn ˈmʌsk/; born June 28, 1971) is a South African-born Canadian-American business magnate, investor, engineer and inventor. He is the founder, CEO, and CTO of SpaceX; co-founder, CEO, and product architect of Tesla Motors; co-founder and chairman of SolarCity; co-chairman of OpenAI; co-founder of Zip2; and founder of X.com which merged with PayPal of Confinity. As of June 2016, he has an estimated net worth of US$12.7 billion, making him the 83rd wealthiest person in the world.'}}
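
Each field in a result row is a small dictionary with `type`, `value`, and optionally `xml:lang` or `datatype`. For enrichment it is often enough to keep only the values. The sketch below assumes that shape; `flatten_binding` is a hypothetical helper, and the sample row is an abbreviated copy of the output above.

```python
def flatten_binding(binding):
    """Map each variable name to its plain value, dropping type and language metadata."""
    return {var: cell['value'] for var, cell in binding.items()}


# Abbreviated copy of the 'Elon Musk' row shown above
row = {
    'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Elon Musk'},
    'birthPlace': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Pretoria'},
    'birthDate': {'type': 'typed-literal',
                  'datatype': 'http://www.w3.org/2001/XMLSchema#date',
                  'value': '1971-06-28'},
}

print(flatten_binding(row))
```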

### Country

In [6]:
# Query DBpedia for the information associated with a country
def query_country(country):
    country = get_clean_entity(country)
    
    query = """
            SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?poblation) AS ?poblation) (SAMPLE(?geoloc) AS ?geoloc)
                   (SAMPLE(?place) AS ?place) (SAMPLE(?description) AS ?description)
            WHERE {
                ?place rdf:type dbo:Country.
                ?place dbo:abstract ?description.
                ?place rdfs:label ?name.
                ?place dbo:populationTotal ?poblation.
                ?place geo:point  ?geoloc.
                FILTER (langMatches(lang(?name),"en")).
                FILTER (langMatches(lang(?description),"en")).
                FILTER (?name like "%""" + country + """%"^^xsd:char).
            }
            GROUP BY ?place
            ORDER BY DESC(?poblation)
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [7]:
# Ask DBpedia for 'Malaysia' country
country_name = 'Malaysia'
result = query_country(country_name)
result[0]

{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Malaysia'},
 'poblation': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger',
  'value': '28334'},
 'geoloc': {'type': 'literal', 'value': '2.5 112.5'},
 'place': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Malaysia'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'Malaysia (/məˈleɪʒə/ mə-LAY-zhə or /məˈleɪsiə/ mə-LAY-see-ə; Malaysian pronunciation: [məlejsiə]) is a federal constitutional monarchy located in Southeast Asia. It consists of thirteen states and three federal territories and has a total landmass of 330,803 square kilometres (127,720 sq mi) separated by the South China Sea into two similarly sized regions, Peninsular Malaysia and East Malaysia (Malaysian Borneo). Peninsular Malaysia shares a land and maritime border with Thailand and maritime borders with Singapore, Vietnam, and Indonesia. East Malaysia shares land and maritime borders with Brun
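
The `geoloc` field arrives as a single string such as `'2.5 112.5'`. Assuming it follows the GeoRSS point convention of latitude followed by longitude, separated by whitespace, it can be split into two floats. `parse_geoloc` is a hypothetical helper, sketched here for illustration.

```python
def parse_geoloc(point):
    """Split a 'lat lon' string into a (latitude, longitude) tuple of floats, or None if malformed."""
    parts = point.split()
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        # Non-numeric content: report it as unparseable rather than raising
        return None


print(parse_geoloc('2.5 112.5'))
```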

### Place or Location

In [8]:
# Query DBpedia for the information associated with a place
def query_place(place):
    place = get_clean_entity(place)
    
    query = """
            SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?poblation) AS ?poblation) (SAMPLE(?geoloc) AS ?geoloc)
                   (SAMPLE(?place) AS ?place) (SAMPLE(?description) AS ?description)
            WHERE {
                ?place rdf:type dbo:Place.
                ?place dbo:abstract ?description.
                ?place rdfs:label ?name.
                ?place dbo:populationTotal ?poblation.
                ?place geo:point ?geoloc.
                FILTER (langMatches(lang(?name),"en")).
                FILTER (langMatches(lang(?description),"en")).
                FILTER (?name like "%""" + place + """%"^^xsd:char).
            }
            GROUP BY ?place
            ORDER BY DESC(?poblation)
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [9]:
# Ask DBpedia for 'Houston' place
city_name = 'Houston'
result = query_place(city_name)
result[0]

{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Houston'},
 'poblation': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger',
  'value': '2099451'},
 'geoloc': {'type': 'literal',
  'value': '29.762777777777778 -95.38305555555556'},
 'place': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Houston'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'Houston (/ˈhjuːstən/ HYOO-stən) is the most populous city in Texas and the fourth-most populous city in the United States, located in Southeast Texas near the Gulf of Mexico. With a census-estimated 2014 population of 2.239 million within a land area of 599.6 square miles (1,553 km2), it also is the largest city in the Southern United States, as well as the seat of Harris County. It is the principal city of Houston–The Woodlands–Sugar Land, which is the fifth-most populated metropolitan area in the United States. Houston was founded on August 28, 1836 near the banks 

### Company

In [10]:
# Query DBpedia for the information associated with a company
# Note: only ?name is restricted to English, so ?description may come back in another language
def query_company(company):
    company = get_clean_entity(company)
    
    query = """
            SELECT (SAMPLE (?company) AS ?company) (SAMPLE(?name) AS ?name) (SAMPLE(?industry) AS ?industry)
                   (SAMPLE (?foundingYear) AS ?foundingYear) (SAMPLE(?description) AS ?description)
            WHERE {
               ?company rdf:type dbo:Company.
               ?company foaf:name ?name.
               ?company dbo:industry ?industry.
               ?company dbo:foundingYear ?foundingYear.
               ?company dbo:abstract ?description.
               FILTER (?name like "%"""+ company + """%"^^xsd:char).
               FILTER (langMatches(lang(?name),"en")). 
            }
            GROUP BY ?company
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [11]:
# Ask DBpedia for 'Linux' company
company_name = 'Linux'
result = query_company(company_name)
result[0]

{'company': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/Linux_Game_Publishing'},
 'name': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'Linux Game Publishing, Ltd.'},
 'industry': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/Video_game_industry'},
 'foundingYear': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#gYear',
  'value': '2001'},
 'description': {'type': 'literal',
  'xml:lang': 'de',
  'value': 'Linux Game Publishing (kurz LGP) ist ein Software-Unternehmen mit Sitz in Nottingham, England, das darauf spezialisiert ist, Computerspiele nach Linux zu portieren.'}}
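
All values in the results are strings, even when a `datatype` is attached (`xsd:date` for `birthDate`, `xsd:nonNegativeInteger` for `poblation`, `xsd:gYear` for `foundingYear`). The sketch below converts those three datatypes to Python values and leaves everything else unchanged. `to_python` is a hypothetical helper, and it assumes plain values without time-zone suffixes; anything it cannot parse is returned as the original string.

```python
from datetime import date

XSD = 'http://www.w3.org/2001/XMLSchema#'
INTEGER_TYPES = {XSD + 'nonNegativeInteger', XSD + 'integer', XSD + 'gYear'}


def to_python(cell):
    """Convert one result cell to a Python value based on its declared datatype."""
    value = cell['value']
    datatype = cell.get('datatype', '')
    try:
        if datatype in INTEGER_TYPES:
            return int(value)
        if datatype == XSD + 'date':
            return date.fromisoformat(value)
    except ValueError:
        # Malformed literal: fall through and keep the raw string
        pass
    return value


print(to_python({'type': 'typed-literal', 'datatype': XSD + 'gYear', 'value': '2001'}))
print(to_python({'type': 'typed-literal', 'datatype': XSD + 'date', 'value': '1971-06-28'}))
```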

### Movie or Film

In [12]:
# Query DBpedia for the information associated with a movie
def query_movie(movie):
    movie = get_clean_entity(movie)
    
    query = """
            SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?movie) AS ?movie) (SAMPLE(?actor) AS ?actor)
                   (SAMPLE(?director) AS ?director) (SAMPLE(?description) AS ?description)
            WHERE {
               ?movie rdf:type dbo:Film.
               ?movie foaf:name ?name.
               ?movie dbo:starring ?actor.
               ?movie dbo:director ?director.
               ?movie dbo:abstract ?description.
               FILTER (?name like "%"""+ movie +"""%"^^xsd:char ).
               FILTER (langMatches(lang(?description),"en")).
            }
            GROUP BY ?movie ?actor
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [13]:
# Ask DBpedia for 'John Wick' movie
movie_name = 'John Wick'
result = query_movie(movie_name)
result[0]

{'name': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'John Wick: Chapter 2'},
 'movie': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/John_Wick:_Chapter_2'},
 'actor': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Keanu_Reeves'},
 'director': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/Chad_Stahelski'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'John Wick: Chapter 2 is an upcoming American action thriller film directed by Chad Stahelski and written by Derek Kolstad. It is a sequel to 2014 film John Wick. The film stars Keanu Reeves, Common, Bridget Moynahan, Ian McShane, and John Leguizamo. Principal photography began on October 26, 2015, in New York City and the film is set to be released on February 10, 2017.'}}
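
Since the movie query groups by both `?movie` and `?actor`, the same film can appear in several rows, one per actor. A possible way to collapse them is sketched below. `actors_by_movie` is a hypothetical helper, and the sample rows are hand-written for illustration rather than copied from a real response.

```python
from collections import defaultdict


def actors_by_movie(rows):
    """Collect the distinct actor URIs for each movie URI, preserving first-seen order."""
    grouped = defaultdict(list)
    for row in rows:
        movie = row['movie']['value']
        actor = row['actor']['value']
        if actor not in grouped[movie]:
            grouped[movie].append(actor)
    return dict(grouped)


# Illustrative rows only (not actual query output)
rows = [
    {'movie': {'type': 'uri', 'value': 'http://dbpedia.org/resource/John_Wick:_Chapter_2'},
     'actor': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Keanu_Reeves'}},
    {'movie': {'type': 'uri', 'value': 'http://dbpedia.org/resource/John_Wick:_Chapter_2'},
     'actor': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Ian_McShane'}},
]

print(actors_by_movie(rows))
```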
