# NLP - Semantic Enrichment

- **Created by Andrés Segura Tinoco**
- **Created on June 13, 2019**
- **Updated on April 06, 2021**

**Semantic Enrichment** refers in general terms to the technologies and practices used to add semantic metadata to content.

**SPARQL** is an RDF query language (a semantic query language for databases) able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web <a href="#link_one">[1]</a>.

**DBpedia** is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) which is available for everyone on the Web <a href="#link_two">[2]</a>.

This document shows how to enrich entities of the following types:
- Person
- Country
- Place
- Company
- Movie

## Semantic Enrichment using SPARQL

In [1]:
# Load SPARQL libraries
from SPARQLWrapper import SPARQLWrapper, JSON

In [2]:
# Returns a clean entity
def get_clean_entity(text):
    return text.replace("'", "").replace('"', '')
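
`get_clean_entity` removes quote characters so the entity can be concatenated into a query string. An alternative, sketched below, is to escape the characters instead of dropping them, so names that contain quotes survive intact. The helper name `escape_sparql_literal` is hypothetical and not part of the original notebook; it only covers double-quoted SPARQL string literals.

```python
def escape_sparql_literal(text):
    """Escape characters that would break a double-quoted SPARQL string literal."""
    replacements = [
        ("\\", "\\\\"),  # backslash first, so later escapes are not doubled
        ('"', '\\"'),    # double quote would otherwise close the literal
        ("\n", "\\n"),
        ("\r", "\\r"),
        ("\t", "\\t"),
    ]
    for raw, escaped in replacements:
        text = text.replace(raw, escaped)
    return text


# Quotes are kept (escaped) rather than removed
print(escape_sparql_literal('Dwayne "The Rock" Johnson'))
```

Single quotes need no escaping inside a double-quoted literal, so a name such as `O'Neil` passes through unchanged.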

In [3]:
# Run a query against the DBpedia SPARQL endpoint
def exec_sparql_query(query, verbose=False):
    entry_point = "https://dbpedia.org/sparql"
    header = """
             PREFIX dbo:<http://dbpedia.org/ontology/>
             PREFIX geo:<http://www.georss.org/georss/>
            """
    try:
        query = header + query
        if verbose:
            print(query)
        
        sparql = SPARQLWrapper(entry_point)
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        result = sparql.query().convert()
        
        if result and 'results' in result:
            result = result['results']['bindings']
        
        return result
    except Exception:
        return None
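
Because `exec_sparql_query` returns `None` on failure and an empty list when nothing matches, indexing the result with `result[0]` can raise an error. A minimal sketch of a guard follows; the helper name `first_binding` is an assumption introduced here for illustration.

```python
def first_binding(result):
    """Return the first row of a SPARQL result, or None when there is none."""
    # A usable result is a non-empty list of bindings; anything else
    # (None after an error, an empty list, an unexpected dict) yields None.
    if isinstance(result, list) and result:
        return result[0]
    return None


print(first_binding(None))
print(first_binding([]))
print(first_binding([{"name": {"type": "literal", "value": "Steve Jobs"}}]))
```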

### 1. Person

In [4]:
# Query DBpedia for the information associated with a person
def query_person(person_name):
    person_name = get_clean_entity(person_name)
    
    query = """
            SELECT (SAMPLE (?name) AS ?name) (SAMPLE (?birthPlace) AS ?birthPlace) (SAMPLE (?birthDate) AS ?birthDate)
                   (SAMPLE (?person) AS ?person) (SAMPLE(?description) AS ?description)
            WHERE {
                ?person dbo:birthPlace ?birthPlace.
                ?person dbo:birthDate ?birthDate.
                ?person foaf:name ?name.
                ?person rdfs:comment ?description. 
                FILTER (?name like "%""" + person_name + """%"^^xsd:char).
                FILTER (langMatches(lang(?description), "en")).
            }
            GROUP BY ?person
            ORDER BY ?person
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [5]:
# Ask DBpedia for 'Steve Jobs' person
person_name = 'Steve Jobs'
result = query_person(person_name)
result[0]

{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Steve Jobs'},
 'birthPlace': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/California'},
 'birthDate': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#date',
  'value': '1955-02-24'},
 'person': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Steve_Jobs'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': "Steven Paul Jobs (; February 24, 1955 – October 5, 2011) was an American business magnate, industrial designer, investor, and media proprietor. He was the chairman, chief executive officer (CEO), and co-founder of Apple Inc., the chairman and majority shareholder of Pixar, a member of The Walt Disney Company's board of directors following its acquisition of Pixar, and the founder, chairman, and CEO of NeXT. Jobs is widely recognized as a pioneer of the personal computer revolution of the 1970s and 1980s, along with Apple co-founder Steve Wozniak."}}
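
Each field in the output above is wrapped in a dictionary with `type` and `value` keys. When only the values are needed, the row can be flattened as sketched below. The function name `flatten_binding` is hypothetical, and the sample row is a shortened copy of the output above.

```python
def flatten_binding(binding):
    """Map each variable name to its plain string value."""
    return {name: cell["value"] for name, cell in binding.items()}


# Shortened copy of the 'Steve Jobs' row shown above
sample = {
    "name": {"type": "literal", "xml:lang": "en", "value": "Steve Jobs"},
    "birthPlace": {"type": "uri",
                   "value": "http://dbpedia.org/resource/California"},
    "birthDate": {"type": "typed-literal",
                  "datatype": "http://www.w3.org/2001/XMLSchema#date",
                  "value": "1955-02-24"},
}

flat = flatten_binding(sample)
print(flat)
```

Flattening discards the language tag and datatype, so it suits display purposes more than further processing.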

### 2. Country

In [6]:
# Query DBpedia for the information associated with a country
def query_country(country):
    country = get_clean_entity(country)
    
    query = """
            SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?poblation) AS ?poblation) (SAMPLE(?geoloc) AS ?geoloc)
                   (SAMPLE(?place) AS ?place) (SAMPLE(?description) AS ?description)
            WHERE {
                ?place rdf:type dbo:Country.
                ?place dbo:abstract ?description.
                ?place rdfs:label ?name.
                ?place dbo:populationTotal ?poblation.
                ?place geo:point  ?geoloc.
                FILTER (langMatches(lang(?name),"en")).
                FILTER (langMatches(lang(?description),"en")).
                FILTER (?name like "%""" + country + """%"^^xsd:char).
            }
            GROUP BY ?place
            ORDER BY DESC(?poblation)
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [7]:
# Ask DBpedia for 'Malaysia' country
country_name = 'Malaysia'
result = query_country(country_name)
result[0]

{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Malaysia'},
 'poblation': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger',
  'value': '32730000'},
 'geoloc': {'type': 'literal', 'value': '2.5 112.5'},
 'place': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Malaysia'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': "Malaysia ( () mə-LAY-zee-ə, -\u2060zhə; Malay: [məlejsiə]) is a country in Southeast Asia. The federal constitutional monarchy consists of thirteen states and three federal territories, separated by the South China Sea into two regions, Peninsular Malaysia and Borneo's East Malaysia. Peninsular Malaysia shares a land and maritime border with Thailand and maritime borders with Singapore, Vietnam, and Indonesia. East Malaysia shares land and maritime borders with Brunei and Indonesia and a maritime border with the Philippines and Vietnam. Kuala Lumpur is the national capital and largest city whi
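
The `geoloc` value arrives as a single string holding latitude and longitude separated by whitespace, as in `'2.5 112.5'` above. A small sketch of splitting it into numbers follows; `parse_geo_point` is a hypothetical helper, and it assumes the value always contains exactly two numeric parts.

```python
def parse_geo_point(point):
    """Split a 'lat lon' string into a (latitude, longitude) tuple of floats."""
    parts = point.split()
    if len(parts) != 2:
        raise ValueError("expected 'lat lon', got: " + repr(point))
    latitude, longitude = (float(part) for part in parts)
    return latitude, longitude


# Value taken from the 'Malaysia' output above
print(parse_geo_point("2.5 112.5"))
```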

### 3. Place or Location

In [8]:
# Query DBpedia for the information associated with a place
def query_place(place):
    place = get_clean_entity(place)
    
    query = """
            SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?poblation) AS ?poblation) (SAMPLE(?geoloc) AS ?geoloc)
                   (SAMPLE(?place) AS ?place) (SAMPLE(?description) AS ?description)
            WHERE {
                ?place rdf:type dbo:Place.
                ?place dbo:abstract ?description.
                ?place rdfs:label ?name.
                ?place dbo:populationTotal ?poblation.
                ?place geo:point ?geoloc.
                FILTER (langMatches(lang(?name),"en")).
                FILTER (langMatches(lang(?description),"en")).
                FILTER (?name like "%""" + place + """%"^^xsd:char).
            }
            GROUP BY ?place
            ORDER BY DESC(?poblation)
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [9]:
# Ask DBpedia for 'Houston' place
city_name = 'Houston'
result = query_place(city_name)
result[0]

{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Houston'},
 'poblation': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger',
  'value': '2100263'},
 'geoloc': {'type': 'literal',
  'value': '29.762777777777778 -95.38305555555556'},
 'place': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Houston'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'Houston ( () HEW-stən) is the most populous city in the U.S. state of Texas, fourth most populous city in the United States, most populous city in the Southern United States, as well as the sixth most populous in North America, with an estimated 2019 population of 2,320,268. Located in Southeast Texas near Galveston Bay and the Gulf of Mexico, it is the seat of Harris County and the principal city of the Greater Houston metropolitan area, which is the fifth most populous metropolitan statistical area in the United States and the second most populous in Texas after th

### 4. Company

In [10]:
# Query DBpedia for the information associated with a company
def query_company(company):
    company = get_clean_entity(company)
    
    query = """
            SELECT (SAMPLE (?company) AS ?company) (SAMPLE(?name) AS ?name) (SAMPLE(?industry) AS ?industry)
                   (SAMPLE (?foundingYear) AS ?foundingYear) (SAMPLE(?description) AS ?description)
            WHERE {
               ?company rdf:type dbo:Company.
               ?company foaf:name ?name.
               ?company dbo:industry ?industry.
               ?company dbo:foundingYear ?foundingYear.
               ?company dbo:abstract ?description.
               FILTER (?name like "%"""+ company + """%"^^xsd:char).
               FILTER (langMatches(lang(?name),"en")). 
            }
            GROUP BY ?company
            ORDER BY ?company
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [11]:
# Ask DBpedia for 'Amazon Books' company
company_name = 'Amazon Books'
result = query_company(company_name)
result[0]

{'company': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/Amazon_Books'},
 'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Amazon Books'},
 'industry': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Retail'},
 'foundingYear': {'type': 'typed-literal',
  'datatype': 'http://www.w3.org/2001/XMLSchema#gYear',
  'value': '2015'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'Amazon Books is a chain of retail bookstores owned by online retailer Amazon. The first store opened on November 2, 2015, in Seattle, Washington. As of 2018, Amazon Books has a total of seventeen stores, with plans to expand to more locations.'}}
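
Typed literals such as `foundingYear`, `birthDate`, and the population counts are returned as strings together with a `datatype` URI. The sketch below converts the datatypes that appear in this document's outputs to Python values. The name `to_python` is hypothetical, and the function assumes simple lexical forms (for example a `gYear` without a timezone suffix). Unknown datatypes are returned unchanged.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def to_python(cell):
    """Convert one SPARQL JSON cell to a Python value based on its datatype."""
    value = cell["value"]
    datatype = cell.get("datatype", "")
    if datatype in (XSD + "integer", XSD + "nonNegativeInteger"):
        return int(value)
    if datatype == XSD + "gYear":
        return int(value)  # assumes a plain year such as '2015'
    if datatype == XSD + "date":
        return date.fromisoformat(value)  # assumes 'YYYY-MM-DD'
    return value  # plain literals and URIs stay as strings


founding_year = to_python({"type": "typed-literal",
                           "datatype": XSD + "gYear",
                           "value": "2015"})
print(founding_year)
```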

### 5. Movie or Film

In [12]:
# Query DBpedia for the information associated with a movie
def query_movie(movie):
    movie = get_clean_entity(movie)
    
    query = """
            SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?movie) AS ?movie) (SAMPLE(?actor) AS ?actor)
                   (SAMPLE(?director) AS ?director) (SAMPLE(?description) AS ?description)
            WHERE {
               ?movie rdf:type dbo:Film.
               ?movie foaf:name ?name.
               ?movie dbo:starring ?actor.
               ?movie dbo:director ?director.
               ?movie dbo:abstract ?description.
               FILTER (?name like "%"""+ movie +"""%"^^xsd:char ).
               FILTER (langMatches(lang(?description),"en")).
            }
            GROUP BY ?movie ?actor
            ORDER BY ?movie 
            """
    
    # Run query against DBpedia
    return exec_sparql_query(query)

In [13]:
# Ask DBpedia for 'John Wick' movie
movie_name = 'John Wick'
result = query_movie(movie_name)
result[0]

{'name': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'John Wick: Chapter 2'},
 'movie': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/John_Wick:_Chapter_2'},
 'actor': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/John_Leguizamo'},
 'director': {'type': 'uri',
  'value': 'http://dbpedia.org/resource/Chad_Stahelski'},
 'description': {'type': 'literal',
  'xml:lang': 'en',
  'value': 'John Wick: Chapter 2 (also known as simply John Wick 2) is a 2017 American neo-noir action-thriller film directed by Chad Stahelski and written by Derek Kolstad. It is the second installment in the John Wick film series, and the sequel to the 2014 film John Wick. It stars Keanu Reeves, Common, Laurence Fishburne, Riccardo Scamarcio, Ruby Rose, John Leguizamo, and Ian McShane. The plot follows hitman John Wick (Reeves), who goes on the run after a bounty is placed on him. Principal photography began on October 26, 2015, in New York City. The film premiered in Los Angeles on Januar
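
Because the query groups by both `?movie` and `?actor`, the result holds one row per movie and actor pair, and `result[0]` shows only one of them. One way to collect the cast of each movie is sketched below. The function name `group_actors_by_movie` is hypothetical, and the second sample row is illustrative rather than copied from a real response.

```python
from collections import defaultdict


def group_actors_by_movie(rows):
    """Collect actor URIs under the URI of the movie they appear in."""
    actors = defaultdict(list)
    for row in rows:
        actors[row["movie"]["value"]].append(row["actor"]["value"])
    return dict(actors)


movie_uri = "http://dbpedia.org/resource/John_Wick:_Chapter_2"
rows = [
    # First row mirrors the output above
    {"movie": {"type": "uri", "value": movie_uri},
     "actor": {"type": "uri",
               "value": "http://dbpedia.org/resource/John_Leguizamo"}},
    # Illustrative second row for the same movie
    {"movie": {"type": "uri", "value": movie_uri},
     "actor": {"type": "uri",
               "value": "http://dbpedia.org/resource/Keanu_Reeves"}},
]

cast = group_actors_by_movie(rows)
print(cast[movie_uri])
```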

## References

<a name='link_one' href='https://en.wikipedia.org/wiki/SPARQL' target='_blank' >[1]</a> Wikipedia - SPARQL.  
<a name='link_two' href='https://wiki.dbpedia.org/about' target='_blank' >[2]</a> DBpedia home page.  
