# Reproducible Workflow
This notebook is intended to be a walkthrough of the paper results with examples that convey the main idea. We show the workflow for the entire paper. 
* To load the LUBM Graph we use a public endpoint on [Dydra](http://dydra.com). 
However, for larger test cases we run our tests on Apache Jena. If you have LUBM on your own local or public endpoint, you can load that instead. Please note that the LUBM graph we use is _materialized_; the inferencing is **RDFS**. 

* The DBpedia endpoint is a local endpoint; if you want to test it, please replace it with either the public endpoint or your own local endpoint.

* For any other questions, suggestions, or concerns about the procedures, please contact me at kannaa@rpi.edu; I'll be glad to include your suggestions.

# Relaxation
Relaxation is the standard baseline for reformulating SPARQL queries. A lot of related work exists on reformulating SPARQL queries. The ideas are inherently based on *flexible* querying. In this notion, the different conditions in the input query are *loosened* or *relaxed* to give more results. This can also be looked at as *exploratory* querying. 

## SPARQL Query Relaxation
Let's have a look at this hierarchy from LUBM covering all teaching faculty in a university.
* Employee 
* * Faculty 
* * * Professor 
* * * * VisitingProfessor 
* * * * FullProfessor 
* * * * Dean 
* * * * Chair 
* * * * AssociateProfessor 
* * * * AssistantProfessor 
* * * * PostDoc 
* * * * Lecturer 

Let's look at how relaxation helps in getting more answers. Consider the following query on LUBM:

```
Select ?teacher where {
    ?teacher a ub:Lecturer .
}
```

Let's see the results that we get from LUBM.

In [1]:
from rdflib import Graph
g = Graph()
g.parse('lubm_saturated.nt',format="nt")
print("Total triple statements in LUBM is " + str(len(g)))

Total triple statements in LUBM is 144299


In [2]:
from SPARQLWrapper import SPARQLWrapper, JSON
from IPython.display import display
import pandas as pd
import json
import numpy as np
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0

pd.options.display.max_colwidth = 100
pd.options.display.max_rows = 999

#Procedure to execute a SPARQL Query and get a pandas object out of it
def execute_query(sparqlQuery,endpoint):
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(sparqlQuery)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    results_df = json_normalize(results["results"]["bindings"])
    return results_df



In [3]:
%%time
query  = """

PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?teacher where {
    ?teacher a ub:Lecturer .
}

"""
endpoint  = "https://dydra.com/amarviswanathan/lubm/sparql"
results_df = execute_query(query,endpoint)

#Show the top-5 results
results_df.head()

CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 540 ms


In [4]:
#Total number of lecturers
shape = results_df.shape
print("There are " + str(shape[0]) + " lecturers")

There are 93 lecturers


But if the user is not satisfied with these 93 lecturers and wants to find more of them, a simple way is to relax by moving up in the hierarchy. So we move from Lecturer to Professor. This gives us the following query : 

``` 
Select ?teacher where {
    ?teacher a ub:Professor .
}
```

In [5]:
%%time 
relaxed_query  = """

PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?teacher where {
    ?teacher a ub:Professor .
}
"""
results_df = execute_query(relaxed_query,endpoint)

#Show the top-5 results
results_df.head()

CPU times: user 12 ms, sys: 4 ms, total: 16 ms
Wall time: 703 ms


In [6]:
#Total number of professors
shape = results_df.shape
print("There are " + str(shape[0]) + " Professors")

There are 447 Professors


The above result gives us 447 Professors, each of whom may be any type under the hierarchy of **Professor**.
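
The step from `ub:Lecturer` to `ub:Professor` can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used for the paper: the `PARENT` table is a hand-copied subset of the hierarchy listed above, and `relax_class` is a hypothetical helper that rewrites the query text by plain string replacement.

```python
# Parent links copied by hand from the hierarchy listed earlier (illustrative subset).
PARENT = {
    "ub:Lecturer": "ub:Professor",
    "ub:FullProfessor": "ub:Professor",
    "ub:Professor": "ub:Faculty",
    "ub:Faculty": "ub:Employee",
}


def relax_class(query, cls):
    """Return the query with `cls` replaced by its parent class, if it has one."""
    parent = PARENT.get(cls)
    if parent is None:
        # No known parent: leave the query unchanged.
        return query
    return query.replace(cls, parent)


original = "SELECT ?teacher WHERE { ?teacher a ub:Lecturer . }"
relaxed = relax_class(original, "ub:Lecturer")
print(relaxed)
```

A real system would read the `rdfs:subClassOf` links from the graph rather than from a hard-coded table, and would rewrite the parsed query instead of its text.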

## Instance Query 
Let us look at an instance of a **Professor** i.e. **<http://www.Department14.University0.edu/FullProfessor4>** and see what courses this Professor teaches.

In [7]:
%%time 
entity_query = """
PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?course where {
   <http://www.Department14.University0.edu/FullProfessor4> ub:teacherOf  ?course .
}
"""

results_df = execute_query(entity_query,endpoint)
results_df = results_df[['course.value']]
# Show all the results
results_df

CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 482 ms


So **FullProfessor4** teaches `2 Graduate Courses` and `1 Course`. **If** the user wants more answers, an automatic way would be to relax the query, so the system would relax this entity value. However, the entity has **no hierarchy**, which means it ends up being relaxed to a variable. This is known as `simple relaxation`. The relaxed query then becomes 

```
Select ?course where {
    ?teacher ub:teacherOf ?course .
}
```


In [8]:
%%time 
entity_relaxed_query = """
PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?teacher ?course where {
   ?teacher ub:teacherOf  ?course .
}
"""

relax_df = execute_query(entity_relaxed_query,endpoint)

# Report the number of result rows, then show the first five
print("The total courses are " + str(relax_df.shape[0]))

relax_df.head()

The total courses are 1627
CPU times: user 64 ms, sys: 4 ms, total: 68 ms
Wall time: 1.31 s
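
Simple relaxation, as used in the cell above, amounts to swapping the entity IRI for a variable. The sketch below shows that single step on the query text; `relax_entity` is a hypothetical helper introduced here for illustration, and the variable name `?teacher` is an arbitrary choice.

```python
def relax_entity(query, entity_iri, variable="?teacher"):
    """Replace every occurrence of <entity_iri> in the query with a variable."""
    return query.replace("<" + entity_iri + ">", variable)


entity_query = (
    "SELECT ?course WHERE { "
    "<http://www.Department14.University0.edu/FullProfessor4> ub:teacherOf ?course . }"
)
relaxed = relax_entity(
    entity_query, "http://www.Department14.University0.edu/FullProfessor4"
)
print(relaxed)
```

Because the variable carries no constraint at all, the relaxed query matches every `ub:teacherOf` triple in the graph, which is why the result count jumps so sharply.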


# Motivation
In this case, we end up relaxing the query to find _anybody who teaches any course_. This gives `1627 results` and is very _generalized_. While this is logically right, wouldn't it be more beneficial if the system returned courses that are more similar to what **FullProfessor4** teaches? 


# Goal

To address this issue, we present a technique that uses the _entity_ statements present in the graph to suggest reformulations. Let us see how this makes sense. Entities have properties (_predicate_) and values (_object_) in the graph. For example, the entity **FullProfessor4** has this triple associated with it

| Subject        | Predicate           | Object  |
| ------------- |:-------------:| -----:|
|**FullProfessor4**|teacherOf|GraduateCourse5|


One can use these values to create _triple patterns_ that are appended back to the original query, and then use the result to suggest reformulations. We call a **predicate** and **object** value pair a _feature_ or a _fact_, because it provides contextual information about the entity in the RDF or RDFS graph. Since we use these features (facts) to create reformulations, we call our method 
**Feature based reformulation of entities in triple pattern queries**. 

The features provide more _information_ and _context_ about an entity. Let us see how features can be used. To do that we print out the features of **FullProfessor4** from LUBM.

In [9]:
entity_statement_query = """
PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?p ?o where {
   <http://www.Department14.University0.edu/FullProfessor4> ?p ?o .
}
"""

results_df = execute_query(entity_statement_query,endpoint)

# Show all the results
results_df


Unnamed: 0,o.datatype,o.type,o.value,p.type,p.value
0,,uri,http://www.w3.org/2000/01/rdf-schema#Resource,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
1,,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#FullProfessor,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
2,,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Faculty,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
3,,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Person,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
4,,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Professor,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
5,,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Employee,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6,http://www.w3.org/2001/XMLSchema#string,literal,FullProfessor4,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#name
7,,uri,http://www.Department14.University0.edu/GraduateCourse5,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf
8,,uri,http://www.Department14.University0.edu/Course4,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf
9,,uri,http://www.Department14.University0.edu/GraduateCourse4,uri,http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf


From the above result we see that the `literal` values don't add more information to the triple except that they are string values for an entity. Moreover, they don't have any statements associated with them. So we filter the literal values out first.

In [10]:
results_df = results_df[results_df['o.type'] == 'uri']
results_df = results_df[['p.value','o.value']]
results_df

Unnamed: 0,p.value,o.value
0,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://www.w3.org/2000/01/rdf-schema#Resource
1,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://swat.cse.lehigh.edu/onto/univ-bench.owl#FullProfessor
2,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Faculty
3,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Person
4,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Professor
5,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,http://swat.cse.lehigh.edu/onto/univ-bench.owl#Employee
7,http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf,http://www.Department14.University0.edu/GraduateCourse5
8,http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf,http://www.Department14.University0.edu/Course4
9,http://swat.cse.lehigh.edu/onto/univ-bench.owl#teacherOf,http://www.Department14.University0.edu/GraduateCourse4
10,http://swat.cse.lehigh.edu/onto/univ-bench.owl#undergraduateDegreeFrom,http://www.University214.edu


The element in `row 0` is too generic because it tags anything as a resource. So instead, let's eyeball some interesting properties and change our initial query. For the sake of this example I pick 

| Predicate        | Object           |
| ------------- |:-------------:|
|**mastersDegreeFrom**|University912.edu|
|**memberOf**|Department14.University0.edu|

The above two properties say something about the entity **FullProfessor4**: the professor has a `mastersDegreeFrom University912.edu` and is `memberOf Department14.University0.edu`. To use these features in a query, one just has to convert each of them to a valid _triple pattern_. This is shown in the tables below : 


### Entity Statements


|Entity| Predicate | Value|
|--|--|--|
|**FullProfessor4**| **mastersDegreeFrom**|University912.edu|
|**FullProfessor4**| **memberOf**|Department14.University0.edu|

Replacing the entity **FullProfessor4** with a variable `?x` we get the following :

### Entity Feature Patterns

|Variable|Predicate|Value| 
|--|--|--|
|?x|**mastersDegreeFrom**|University912.edu|
|?x|**memberOf**|Department14.University0.edu|


Now let us pick the first pattern and add it back to the original query. Then let's add the second pattern to the original query independently. This results in two reformulated queries that look like :

```
Select ?course where {
   ?x ub:teacherOf  ?course .
   ?x ub:mastersDegreeFrom <http://www.University912.edu> .
}
```

```
Select ?course where {
   ?x ub:teacherOf  ?course .
   ?x ub:memberOf <http://www.Department14.University0.edu> .
}
```


The initial query was :

* Select courses taught by **FullProfessor4**

Adding the two new features the query becomes :
* Select courses taught by ?x who has a mastersDegreeFrom `University912.edu` 
* Select courses taught by ?x who is a member of `Department14.University0.edu` .

Both the above queries are more contextual and give more precise answers than the initial relaxation, which read as 
* Select all courses taught by any teacher
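
The conversion from features to reformulated queries described above can be sketched as follows. This is a minimal sketch under assumptions: `feature_to_pattern` and `reformulate` are hypothetical helper names, the `ub:` prefix and the two feature values are taken from the tables above, and the query is assembled as plain text rather than through a SPARQL parser.

```python
PREFIX = "PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>"


def feature_to_pattern(predicate, obj, variable="?x"):
    """Turn a (predicate, object) feature into a triple pattern on `variable`."""
    return f"{variable} {predicate} <{obj}> ."


def reformulate(feature, variable="?x"):
    """Build the relaxed teacherOf query with one feature pattern appended."""
    predicate, obj = feature
    lines = [
        PREFIX,
        "SELECT ?course WHERE {",
        f"   {variable} ub:teacherOf ?course .",
        "   " + feature_to_pattern(predicate, obj, variable),
        "}",
    ]
    return "\n".join(lines)


# The two features picked by hand in the text above.
features = [
    ("ub:mastersDegreeFrom", "http://www.University912.edu"),
    ("ub:memberOf", "http://www.Department14.University0.edu"),
]

# One independent reformulation per feature.
queries = [reformulate(feature) for feature in features]
print(queries[0])
```

Each feature yields its own query, which matches the way the two patterns are added to the original query independently rather than together.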

So let's run the reformulations to see the results.

In [11]:
%%time 
entity_reformulated_query = """
PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?course where {
   ?x ub:teacherOf ?course .
   ?x ub:mastersDegreeFrom <http://www.University912.edu> .
   
}
"""

ref_1 = execute_query(entity_reformulated_query,endpoint)

# Show all the results
print("The number of courses now is " + str(ref_1.shape[0]))
ref_1.head()

The number of courses now is 5
CPU times: user 4 ms, sys: 4 ms, total: 8 ms
Wall time: 457 ms


In [12]:
%%time 
entity_reformulated_query = """
PREFIX  ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
Select ?course where {
   ?x ub:teacherOf ?course .
   ?x ub:memberOf <http://www.Department14.University0.edu> .
   
}
"""

ref_2 = execute_query(entity_reformulated_query,endpoint)

# Show all the results
print("The number of courses now is " + str(ref_2.shape[0]))
ref_2.head()

The number of courses now is 97
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 591 ms


Now let's find the common results between the two reformulations.

In [13]:
overall_ref = pd.merge(ref_1,ref_2,how='inner',on=['course.type','course.value'])
overall_ref

Unnamed: 0,course.type,course.value
0,uri,http://www.Department14.University0.edu/GraduateCourse5
1,uri,http://www.Department14.University0.edu/Course4
2,uri,http://www.Department14.University0.edu/GraduateCourse4


Let's find the results that are not common between the two reformulations.



In [14]:
merged_df = pd.concat([ref_1,ref_2])
merged_df  = merged_df.drop_duplicates(keep=False)
merged_df.shape[0]

96

So we now have the following comparisons between relaxation and reformulation. There are a total of `96 + 3 = 99 ` unique results out of the reformulation procedure, whereas the relaxation has a total of `1627` results. The `99` results are more related to the `?course` variable of the original query because of using the features.
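
The `96 + 3 = 99` count is set bookkeeping: results found by exactly one reformulation, plus results found by both, give the size of the union. The sketch below reproduces that identity with plain Python sets. The course names in it are made up for illustration and are not the actual query results.

```python
# Made-up result sets standing in for ref_1 and ref_2.
ref_1_courses = {"GraduateCourse5", "Course4", "CourseA"}
ref_2_courses = {"GraduateCourse5", "Course4", "CourseB", "CourseC"}

common = ref_1_courses & ref_2_courses      # found by both reformulations
only_one = ref_1_courses ^ ref_2_courses    # found by exactly one of them
combined = ref_1_courses | ref_2_courses    # all unique results

# The identity used in the text: |only_one| + |common| == |combined|.
assert len(only_one) + len(common) == len(combined)
print(len(common), len(only_one), len(combined))
```

With the numbers reported in this notebook the same identity reads `5 + 97 - 3 = 99`, since the 3 shared courses would otherwise be counted twice.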

### Results Comparison
| Original Query        | **Relaxation**           | Ref-1 | Ref-2 | Combined |
| ------------- |:-------------:|:--------:|:-------:|:---------|
|3|1627|5|97|99|

### Time Comparison
| **Relaxation**           | Ref-1 | Ref-2 | 
|:-------------:|:--------:|:-------:|
|1310ms|457ms|591ms|

* Clearly **relaxation returns far more results (1627) than either reformulation (5 and 97)**. This makes this kind of reformulation more precise.
* In addition, the timings show that **both reformulations also ran in less time than the relaxed version of the query**.

This can be visualized as 

![results](files/images/Chart.png)


You can access this visualization at the [jsfiddle](http://jsfiddle.net/N00bsie/hk6vjz4o/)

## Selecting Interesting Features

In the previous sections, we focused on showing that an entity reformulation is more **contextual** and **precise** than an _entity relaxation_. However, the following questions remain :
* **Why do you need to pick subsets of features from the graph?**
* **How do you select relevant _features_ for an entity?** 
* **Entities in large graphs have a lot of features. How many do you select?**

## DBpedia Example

Let us take an entity query involving `dbr:Martin_Scorsese`. Assume that the user is interested in movies made by people similar to `dbr:Martin_Scorsese` and starts out by querying for the _movies made by_ `dbr:Martin_Scorsese`. Let's run it on the DBpedia endpoint to see what answers we get.



In [15]:
dbpedia_query ="""
PREFIX  schema: <http://schema.org/>
PREFIX  dbr:  <http://dbpedia.org/resource/>
PREFIX  umbel-rc: <http://umbel.org/umbel/rc/>
PREFIX  dbc:  <http://dbpedia.org/resource/Category:>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
PREFIX  yago: <http://dbpedia.org/class/yago/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  dbp:  <http://dbpedia.org/property/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dcterms: <http://purl.org/dc/terms/>
PREFIX  dbpedia-wikidata: <http://wikidata.dbpedia.org/resource/>
PREFIX  dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT  Distinct ?movie 
WHERE
  { 
    
     ?movie dbo:director dbr:Martin_Scorsese .
  }

"""
dbpedia_endpoint = "http://zen.cs.rpi.edu:8890/sparql"
movies_df = execute_query(dbpedia_query,dbpedia_endpoint)
print("The movies directed by dbr:Martin_Scorsese are :")
movies_df

The movies directed by dbr:Martin_Scorsese are :


Unnamed: 0,movie.type,movie.value
0,uri,http://dbpedia.org/resource/Made_in_Milan
1,uri,http://dbpedia.org/resource/The_Key_to_Reserva
2,uri,http://dbpedia.org/resource/Mean_Streets
3,uri,http://dbpedia.org/resource/Raging_Bull
4,uri,http://dbpedia.org/resource/Taxi_Driver
5,uri,http://dbpedia.org/resource/Alice_Doesn't_Live_Here_Anymore
6,uri,http://dbpedia.org/resource/The_King_of_Comedy_(1983_film)
7,uri,http://dbpedia.org/resource/Boxcar_Bertha
8,uri,http://dbpedia.org/resource/Who's_That_Knocking_at_My_Door
9,uri,http://dbpedia.org/resource/My_Voyage_to_Italy


In [16]:
print("dbr:Martin_Scorsese has directed a total of " + str(movies_df.shape[0]) + " movies")

dbr:Martin_Scorsese has directed a total of 45 movies


As before, relaxing `dbr:Martin_Scorsese` leads to the following query 

```
SELECT  Distinct ?movie 
WHERE
  { 
    
     ?movie dbo:director ?y .
  }
```
Let us execute this query on DBpedia to see how many movies we get



In [17]:
%%time 
dbpedia_query ="""
PREFIX  schema: <http://schema.org/>
PREFIX  dbr:  <http://dbpedia.org/resource/>
PREFIX  umbel-rc: <http://umbel.org/umbel/rc/>
PREFIX  dbc:  <http://dbpedia.org/resource/Category:>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
PREFIX  yago: <http://dbpedia.org/class/yago/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  dbp:  <http://dbpedia.org/property/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dcterms: <http://purl.org/dc/terms/>
PREFIX  dbpedia-wikidata: <http://wikidata.dbpedia.org/resource/>
PREFIX  dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT  Distinct ?movie 
WHERE
  { 
    
     ?movie dbo:director ?y .
  }

"""
dbpedia_endpoint = "http://zen.cs.rpi.edu:8890/sparql"
movies_df = execute_query(dbpedia_query,dbpedia_endpoint)
print("The total movies directed by anybody are : " + str(movies_df.shape[0]))


The total movies directed by anybody are : 103317
CPU times: user 2.44 s, sys: 60 ms, total: 2.5 s
Wall time: 5.01 s


Looking at the results, you will notice that not all of them are movies. For example, the entity `http://dbpedia.org/resource/Indianapolis_Art_Center` at row index `102808` is not a movie, but rather an organization.

In [18]:
print(movies_df.loc[102808])

movie.type                                                     uri
movie.value    http://dbpedia.org/resource/Indianapolis_Art_Center
Name: 102808, dtype: object


This happens because _relaxation_ makes the query very general. To address this we need to execute queries that are contextually similar to the original query. Following what was discussed earlier, we start by looking at the properties of the entity `dbr:Martin_Scorsese`.

In [19]:
%%time 
dbpedia_entity_feature_query ="""
PREFIX  schema: <http://schema.org/>
PREFIX  dbr:  <http://dbpedia.org/resource/>
PREFIX  umbel-rc: <http://umbel.org/umbel/rc/>
PREFIX  dbc:  <http://dbpedia.org/resource/Category:>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
PREFIX  yago: <http://dbpedia.org/class/yago/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  dbp:  <http://dbpedia.org/property/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dcterms: <http://purl.org/dc/terms/>
PREFIX  dbpedia-wikidata: <http://wikidata.dbpedia.org/resource/>
PREFIX  dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?p ?o 
WHERE
  { 
    
     dbr:Martin_Scorsese ?p ?o .
  }
"""
dbpedia_endpoint = "http://zen.cs.rpi.edu:8890/sparql"
entity_features = execute_query(dbpedia_entity_feature_query,dbpedia_endpoint)
display(entity_features)

Unnamed: 0,o.datatype,o.type,o.value,o.xml:lang,p.type,p.value
0,,uri,http://www.w3.org/2002/07/owl#Thing,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
1,,uri,http://xmlns.com/foaf/0.1/Person,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
2,,uri,http://schema.org/Person,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
3,,uri,http://dbpedia.org/ontology/Person,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
4,,uri,http://dbpedia.org/ontology/Agent,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
5,,uri,http://www.wikidata.org/entity/Q215627,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6,,uri,http://www.wikidata.org/entity/Q5,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
7,,uri,http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
8,,uri,http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#NaturalPerson,,uri,http://www.w3.org/1999/02/22-rdf-syntax-ns#type
9,,literal,Martin Scorsese,en,uri,http://www.w3.org/2000/01/rdf-schema#label


CPU times: user 108 ms, sys: 4 ms, total: 112 ms
Wall time: 124 ms


There are a total of `172` features/facts for the entity `dbr:Martin_Scorsese`. Now how do we pick features/facts to suggest entity reformulation?

Let us formalize the problem with the following definitions :

## Entity Features Summarization

### Entity Facts
Let $\mathbb{E}$ be the set of entities in the RDF Graph. Then a **fact** or a **feature** $f_i$ of an entity $e \in \mathbb{E}$ is the $\langle p, o \rangle$ pair from the triple statement $\langle e, p, o \rangle$. 

|Entity|Predicate|Object| 
|--|--|--|
|**dbr:Martin_Scorsese**|**http://dbpedia.org/property/parents** | http://dbpedia.org/resource/Charles_Scorsese |

In the above table for the entity `dbr:Martin_Scorsese` the feature $\langle p, o \rangle$ is 

|Predicate|Object| 
|--|--|
|**http://dbpedia.org/property/parents** | http://dbpedia.org/resource/Charles_Scorsese	|

### Entity Fact Set
Entities in large graphs like DBpedia have many facts associated with them. We denote the set of all facts of an entity $e$ as its fact set $FS(e)$. For `dbr:Martin_Scorsese` in this setting, $\mid FS(e) \mid$ is `172`, as we have seen.


### Optimal Feature Set
While SPARQL queries can handle many feature patterns, a query with more than 10 of them would not be easy for the user to interpret. In fact, most users prefer around `1-3` patterns, as shown by the study done on the **L**inked **S**PARQL **Q**ueries dataset. Here is how they look :

![LSQ](files/images/LSQ.png)

So to pick a subset of the large set of feature patterns, we define an **Entity Summary**. 

### Entity Summary
Given an entity $e$ from a knowledge graph and a positive integer $k \leq \mid FS(e) \mid$, an entity summary of $e$ is a subset $\textit{Summ(e,k)} \subseteq \textit{FS(e)}$ such that $\mid\textit{Summ(e,k)}\mid = k$. For the example `dbr:Martin_Scorsese` we have 
* $\mid FS(e) \mid $ = `172`

For $k$ = 2, here the entity summary would look like

| Predicate |Object| 
|--|--|
| **http://dbpedia.org/property/parents**    | http://dbpedia.org/resource/Charles_Scorsese	|
| **http://dbpedia.org/ontology/birthPlace** | http://dbpedia.org/resource/Queens		|

Now this entity summary can be converted to an entity summary pattern. The following table shows the entity summary pattern from the summary

|Variable|Predicate|Object| 
|--|--|--|
|?e|**http://dbpedia.org/property/parents** | http://dbpedia.org/resource/Charles_Scorsese	|
|?e|**http://dbpedia.org/ontology/birthPlace** | http://dbpedia.org/resource/Queens		|

In plain English, the $k=2$ summary can be stated as _Entity Martin Scorsese who has parents Charles Scorsese and birth place Queens_. 

The summary pattern can be stated as _Any entity ?e that has parents Charles Scorsese and birth place Queens_.

### Picking Summary Sets

In the $k=2$ summary pattern for `dbr:Martin_Scorsese` we see that `parents, Charles Scorsese` is too specific, because very few entities can have `Charles Scorsese` as a parent. In contrast, `birthPlace, Queens` is more generic because many entities can have `Queens` as a birth place. Ideally one should be able to pick features for reformulation that 
* Are not too specific, because that would not give any more results. This defeats the purpose of reformulation.
* Are not too generic, because that would give a lot of results and be no better than relaxation.

Ideally we want a _ranking function_ over the facts $\left\lbrace f_1,f_2,\ldots, f_k \right\rbrace $ in the fact set $\textit{FS(e)}$ that orders them so that $Rank(f_1) > Rank(f_2) > \ldots > Rank(f_k)$. This _ranking function_ should favor features that are neither too _specific_ nor too _generic_. We discuss how this can be achieved now.


### Ranking Entity Facts 

We introduce two measures that can be combined to produce a _ranking_ to give a summary. While there can be many ranking functions that can be developed, our goal is to first start with a ranking function to solve the entity reformulation problem. We will then explore further how we can design better ranking functions. So the two measures that we define are 
####  Specificity : 
Specificity is meant to pick features that convey _interesting_ information about the entity $e$ from the feature set $FS(e)$. For example `birthPlace, Queens` is not as interesting as `almaMater, New York University`. Borrowing from the world of Information Retrieval, we define the _specificity_ of a feature $f_i = \langle p_i, o_i \rangle$ based on **IDF** as follows :


$$ \begin{align} 
\mbox{Specificity($f_i$)} = \log {\frac{\mid E \mid }{\mid \left\lbrace e : \langle e,p_i,o_i \rangle \in R \right\rbrace \mid }}
\end{align} $$

Github sometimes doesn't render this so the image is here : ![specificity](files/images/specificity.png)
Here $\mid E \mid$ refers to the total number of entity resources in the knowledge graph $R$, and the denominator counts the entities that share the feature. 

**Note:** For this paper we utilize only entity resources, so we filter out any feature $f_i$ that has literal values.
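
To get a feel for the scale of this measure, the sketch below evaluates the formula for a few counts. It assumes `4641890` as the value of $\mid E \mid$, which is the constant hard-coded in this notebook; the counts `1`, `100` and `10000` are arbitrary illustrative inputs, not measurements.

```python
import math

TOTAL_ENTITIES = 4641890  # |E|, the constant hard-coded in this notebook


def specificity(entity_count, total_entities=TOTAL_ENTITIES):
    """IDF-style score: log(|E| / number of entities sharing the feature)."""
    if entity_count < 1:
        raise ValueError("entity_count must be at least 1")
    return math.log(total_entities / entity_count)


# Arbitrary example counts: the rarer the feature, the higher the score.
for count in (1, 100, 10000):
    print(count, round(specificity(count), 3))
```

A feature held by a single entity gets the maximum score of about `15.35`, and the score falls to `0` for a feature shared by every entity.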

Let us now apply this to `dbr:Martin_Scorsese` to see the kind of features we get.

In [20]:
endpoint = "http://zen.cs.rpi.edu:8890/sparql"

def specificity(column):
    # 4641890 is the hard-coded value used for |E|, the number of entity resources
    return np.log(4641890/(float(column)))

def execute_augmented_query(sparqlQuery,endpoint):
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(sparqlQuery)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    results_df = json_normalize(results["results"]["bindings"])
#     print(results_df)
    results_df = results_df[results_df['o.type'] == "uri"]
    results_df = results_df.drop(['o.type','p.type'],axis=1)
    results_df = results_df[['p.value','o.value','countS.value']]
    results_df['countS.value'] = results_df['countS.value'].astype(int)
#     results_df = results_df[results_df['countS.value']]
    return results_df

## This query is used to find the count values for the 'Specificity'.
augmented_query = """
PREFIX  schema: <http://schema.org/>
PREFIX  dbr:  <http://dbpedia.org/resource/>
PREFIX  umbel-rc: <http://umbel.org/umbel/rc/>
PREFIX  dbc:  <http://dbpedia.org/resource/Category:>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
PREFIX  yago: <http://dbpedia.org/class/yago/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  dbp:  <http://dbpedia.org/property/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dcterms: <http://purl.org/dc/terms/>
PREFIX  dbpedia-wikidata: <http://wikidata.dbpedia.org/resource/>
PREFIX  dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX  dc: <http://purl.org/dc/elements/1.1/>

SELECT  ?p ?o (COUNT(DISTINCT ?s) AS ?countS)
WHERE
  { dbr:Martin_Scorsese
              ?p  ?o .
    ?s        ?p  ?o
    FILTER ( ! strstarts(str(?p), str(dbo:wikiPageWikiLink)) )
    FILTER ( ! strstarts(str(?p), str(dbo:wikiPageID)) )
    FILTER ( ! strstarts(str(?p), str(dbo:wikiPageExternalLink)) )
    FILTER ( ! strstarts(str(?p), str(dbo:wikiPageRevisionID)) )
    FILTER ( ! strstarts(str(?p), str(dbo:wikiPageLength)) )
    FILTER ( ! strstarts(str(?p), str(owl:sameAs)) )
    FILTER ( ! strstarts(str(?p), str(dbo:viafId)) )
    FILTER ( ! strstarts(str(?p), str(dbo:wikiPageOutDegree)) )
    FILTER ( ! strstarts(str(?p), str(schema:comment)) )
    FILTER ( ! strstarts(str(?p), str(dbo:abstract)) )
    FILTER ( ! strstarts(str(?p), str(rdfs:comment)) )
    FILTER ( ! strstarts(str(?p), str(dbo:alias)) )
    FILTER ( ! strstarts(str(?p), str(rdfs:label)) )
    FILTER ( ! strstarts(str(?p), str(dbo:thumbnail)) )
    FILTER ( ! strstarts(str(?p), str(foaf:name)) )
    FILTER ( ! strstarts(str(?p), str(foaf:surname)) )
    FILTER ( ! strstarts(str(?p), str(foaf:depiction)) )
    FILTER ( ! strstarts(str(?p), str(foaf:isPrimaryTopicOf)) )
    FILTER ( ! strstarts(str(?p), str(dbp:hasPhotoCollection)) )
    FILTER ( ! strstarts(str(?p), str(dbp:wordnet_type)) )
    FILTER ( ! strstarts(str(?p), str(prov:wasDerivedFrom)) )
    FILTER ( ! strstarts(str(?p), str(dc:description)) )
   
    
    
  }
GROUP BY ?p ?o
ORDER BY ASC(?countS)
"""
results_df = execute_augmented_query(augmented_query,endpoint)
results_df.reset_index(inplace=True)
results_df['Specificity'] = results_df['countS.value'].apply(specificity)
results_df.sort_values(by=['Specificity'],ascending=False,inplace=True)
display(results_df)

Unnamed: 0,index,p.value,o.value,countS.value,Specificity
0,0,http://dbpedia.org/ontology/spouse,http://dbpedia.org/resource/Julia_Cameron,1,15.350632
2,6,http://dbpedia.org/ontology/parent,http://dbpedia.org/resource/Catherine_Scorsese,1,15.350632
3,7,http://dbpedia.org/property/spouse,http://dbpedia.org/resource/Isabella_Rossellini,1,15.350632
4,9,http://dbpedia.org/ontology/spouse,http://dbpedia.org/resource/Isabella_Rossellini,1,15.350632
5,10,http://dbpedia.org/property/spouse,http://dbpedia.org/resource/Julia_Cameron,1,15.350632
6,11,http://dbpedia.org/property/spouse,http://dbpedia.org/resource/Barbara_De_Fina,1,15.350632
7,12,http://dbpedia.org/ontology/occupation,http://dbpedia.org/resource/Martin_Scorsese__1,1,15.350632
8,15,http://dbpedia.org/property/parents,http://dbpedia.org/resource/Charles_Scorsese,1,15.350632
9,18,http://dbpedia.org/ontology/spouse,http://dbpedia.org/resource/Barbara_De_Fina,1,15.350632
10,21,http://dbpedia.org/property/parents,http://dbpedia.org/resource/Catherine_Scorsese,1,15.350632


In the above table we see from the column **countS.value** that properties like `http://dbpedia.org/ontology/spouse` and `http://dbpedia.org/property/parents` are very specific, and using them to reformulate will not lead us to any new results. For example, let us use `http://dbpedia.org/ontology/spouse, http://dbpedia.org/resource/Barbara_De_Fina` to reformulate. The triple pattern for this becomes 
`?e <http://dbpedia.org/ontology/spouse> <http://dbpedia.org/resource/Barbara_De_Fina>`. Appending this to the original query we get the following query 
```
SELECT DISTINCT ?movie
WHERE {
    ?movie dbo:director ?e .
    ?e <http://dbpedia.org/ontology/spouse> <http://dbpedia.org/resource/Barbara_De_Fina> .
}
```

The query translates into plain English as _Select all movies that are directed by an entity that has spouse Barbara De Fina_. This feature is too specific: according to **countS.value** it matches only `dbr:Martin_Scorsese`, so the reformulated query returns nothing new. So we need some other measure to pick features that are related to the entity and give more results, but aren't too generic. In addition we need to remove features that have **countS.value** = 1.

### Popularity

Since it is the value, i.e. the $o$ in $\langle p, o \rangle$, that makes a feature selective, we consider the notion of _popularity_, a weighting term inspired by **TF** (term frequency). It uses the frequency of the value $o_i$ of a feature $f_i = \langle p_i, o_i \rangle$, and can be defined as:
$$
\begin{align}
\mbox{Popularity($o_i$)}  = \log \left| \{\, t \mid t = \langle e, p, o_i \rangle \in R \,\} \right|
\end{align}
$$

If the formula hasn't been rendered, here is the image: ![popularity](files/images/popularity.png)

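As a minimal offline sketch of this definition, the snippet below counts the triples that have a given object and takes the logarithm. The triples in `R` are hypothetical toy data, not taken from DBpedia, and the function name `popularity` is introduced here only for illustration.

```python
import math

# Toy triples (subject, predicate, object); hypothetical data, not from DBpedia.
R = [
    ("dbr:A", "dbo:birthPlace", "dbr:Queens"),
    ("dbr:B", "dbo:birthPlace", "dbr:Queens"),
    ("dbr:C", "dbo:residence", "dbr:Queens"),
    ("dbr:A", "dbo:spouse", "dbr:D"),
]


def popularity(o, triples):
    """Log of the number of triples whose object is `o` (any subject, any predicate)."""
    count = sum(1 for (_, _, obj) in triples if obj == o)
    # math.log raises ValueError for a count of 0, i.e. an object absent from `triples`.
    return math.log(count)


print(popularity("dbr:Queens", R))  # log of 3 matching triples
print(popularity("dbr:D", R))       # log of 1 matching triple, which is 0.0
```

An object that occurs in only one triple gets a popularity of 0, so any score obtained by multiplying with it is also 0.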
### Combined Ranking for Features

Now that we have _popularity_ and _specificity_, we combine them into a single score. The ranking function for a feature $f_i = \langle p_i, o_i \rangle$ is defined as:

$$
\begin{align}
\mbox{Rank($f_i$)}  = \mbox{Specificity($f_i$)} \times \mbox{Popularity($o_i$)}
\end{align}
$$
If the formula hasn't been rendered, here is the image: ![rank](files/images/rank.png)

Let us apply this and see how it changes the ordering of the object values.

In [21]:
import math

def nbycount(column):
    # Smoothed ratio: the constant 4641890 divided by (1 + count).
    return 4641890/(1+float(column))

def execute_count_query(obj):
    # Count all triples whose object is `obj`, for any subject and predicate.
    if obj.startswith('http'):
        # IRIs are wrapped in angle brackets.
        sparqlQuery = "Select (count(*) as ?countRow)  WHERE { ?s " + "?p  <" + obj + "> }"
    else:
        # Anything else is treated as a plain string literal.
        sparqlQuery = 'Select (count(*) as ?countRow)  WHERE { ?s ' + '?p "' + obj + '"}'

    sparql = SPARQLWrapper("http://zen.cs.rpi.edu:8890/sparql")
    sparql.setQuery(sparqlQuery)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    count_df = json_normalize(results["results"]["bindings"])
    # The count query returns a single row; read its value explicitly.
    return float(count_df['countRow.value'].iloc[0])

# Keep only features with countS.value > 1, then score the remaining object values.
results_df = results_df[results_df['countS.value'] > 1].copy()
results_df['Popularity'] = results_df['o.value'].apply(execute_count_query)
results_df['Popularity'] = results_df['Popularity'].apply(math.log)
results_df.sort_values(by=['Popularity'],ascending=True,inplace=True)
display(results_df)

Unnamed: 0,index,p.value,o.value,countS.value,Specificity,Popularity
11,26,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Venice_Best_Director_Silver_Lion_winners,19,12.406193,2.944439
12,29,http://purl.org/dc/terms/subject,"http://dbpedia.org/resource/Category:People_from_Corona,_Queens",24,12.172578,3.178054
13,31,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Best_Director_BAFTA_Award_winners,38,11.713046,3.637586
14,32,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:David_di_Donatello_Career_Award_winners,45,11.54397,3.806662
15,33,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:American_film_directors_of_Italian_descent,52,11.399388,3.951244
16,34,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Best_Director_Golden_Globe_winners,52,11.399388,3.970292
17,35,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Cecil_B._DeMille_Award_Golden_Globe_winners,62,11.223498,4.127134
18,37,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Best_Director_Academy_Award_winners,68,11.131124,4.234107
20,46,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:César_Award_winners,137,10.430651,4.983607
21,48,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Film_theorists,155,10.307207,5.043425


The table above is sorted by **Popularity** in ascending order, so the least popular object values appear first; object values that occur in more triples receive a higher score. Let us now use this to determine the combined ranking and then pick features:

In [22]:
results_df['Rank'] = results_df['Specificity'] * results_df['Popularity']
results_df.sort_values(by=['Rank'],ascending=False,inplace=True)
display(results_df)

Unnamed: 0,index,p.value,o.value,countS.value,Specificity,Popularity,Rank
33,64,http://dbpedia.org/ontology/birthPlace,http://dbpedia.org/resource/Queens,595,8.962071,7.755767,69.507734
29,58,http://dbpedia.org/property/almaMater,http://dbpedia.org/resource/New_York_University,498,9.140032,7.423568,67.851654
48,91,http://dbpedia.org/ontology/birthPlace,http://dbpedia.org/resource/New_York,8078,6.353733,10.642826,67.621672
32,63,http://dbpedia.org/ontology/almaMater,http://dbpedia.org/resource/New_York_University,581,8.985881,7.423568,66.707306
19,38,http://dbpedia.org/ontology/birthPlace,"http://dbpedia.org/resource/Flushing,_Queens",88,10.873295,6.011267,65.362283
42,82,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:American_Roman_Catholics,2393,7.570329,7.787382,58.953045
40,78,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:New_York_Democrats,1966,7.766876,7.584773,58.909991
41,81,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Grammy_Award_winners,2389,7.572002,7.779467,58.90614
39,77,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Fellows_of_the_American_Academy_of_Arts_and_Sciences,1864,7.820152,7.531016,58.893694
38,74,http://purl.org/dc/terms/subject,http://dbpedia.org/resource/Category:Male_actors_from_New_York_City,1454,8.068559,7.282761,58.761385
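
As a quick check of the ranking function, the snippet below recomputes the `Rank` column for three rows of the table above from their `Specificity` and `Popularity` values. The values are copied from the table (already rounded to six decimals), so the products agree with the displayed ranks only up to rounding; the function name `rank` is introduced here for illustration.

```python
# (feature label, Specificity, Popularity) copied from the ranked table above.
rows = [
    ("birthPlace New_York", 6.353733, 10.642826),
    ("birthPlace Queens", 8.962071, 7.755767),
    ("almaMater New_York_University (dbp)", 9.140032, 7.423568),
]


def rank(specificity, popularity):
    """Combined score of a feature: specificity times popularity."""
    return specificity * popularity


# Sort by the combined score, highest first, as in the notebook cell.
ranked = sorted(rows, key=lambda r: rank(r[1], r[2]), reverse=True)
for label, s, p in ranked:
    print(f"{label}: {rank(s, p):.4f}")
```

Note that `New_York` has the highest popularity of the three but the lowest specificity, which is why it ends up below `Queens` in the combined ranking.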


Although some properties are repeated because of the nature of DBpedia, we see that features like `birthPlace Queens` and `almaMater New_York_University` are more representative of entities similar to `dbr:Martin_Scorsese`. Let us now pick the top 2 features to reformulate the original query as:
```
Select Distinct ?movie where {
?movie dbo:director ?e .
?e <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Queens> .
?e <http://dbpedia.org/property/almaMater> <http://dbpedia.org/resource/New_York_University> .
}
}
```

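Running this reformulation gives the results below. Note that the executed cell uses `http://dbpedia.org/resource/New_York` rather than `Queens` as the `birthPlace` value: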

In [23]:
%%time 
dbpedia_query ="""
PREFIX  schema: <http://schema.org/>
PREFIX  dbr:  <http://dbpedia.org/resource/>
PREFIX  umbel-rc: <http://umbel.org/umbel/rc/>
PREFIX  dbc:  <http://dbpedia.org/resource/Category:>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
PREFIX  yago: <http://dbpedia.org/class/yago/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  dbp:  <http://dbpedia.org/property/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dcterms: <http://purl.org/dc/terms/>
PREFIX  dbpedia-wikidata: <http://wikidata.dbpedia.org/resource/>
PREFIX  dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prov: <http://www.w3.org/ns/prov#>

Select Distinct ?movie ?e where {
?movie dbo:director ?e .
?e <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/New_York>     .    
?e <http://dbpedia.org/property/almaMater>  <http://dbpedia.org/resource/New_York_University> .
}
"""
dbpedia_endpoint = "http://zen.cs.rpi.edu:8890/sparql"
movies_df = execute_query(dbpedia_query,dbpedia_endpoint)
print("The total movies directed by anybody are : " + str(movies_df.shape[0]))
display(movies_df)

The total movies directed by anybody are : 53


Unnamed: 0,e.type,e.value,movie.type,movie.value
0,uri,http://dbpedia.org/resource/Burt_Lancaster,uri,http://dbpedia.org/resource/The_Kentuckian
1,uri,http://dbpedia.org/resource/Burt_Lancaster,uri,http://dbpedia.org/resource/The_Midnight_Man_(1974_film)
2,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Made_in_Milan
3,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/The_Key_to_Reserva
4,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Mean_Streets
5,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Raging_Bull
6,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Taxi_Driver
7,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Alice_Doesn't_Live_Here_Anymore
8,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/The_King_of_Comedy_(1983_film)
9,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Boxcar_Bertha


CPU times: user 48 ms, sys: 4 ms, total: 52 ms
Wall time: 53.4 ms


In the above result there are 53 movies, some of which were directed by other people. We now have other entities like `dbr:Herbert_B.Leonard`, `dbr:Jesse_Dylan` and `dbr:Burt_Lancaster`, who are all directors from `New_York` and are also _film producers_. 

Let us also try another query using the other ranked properties, i.e.:

```
Select Distinct ?movie ?e where {
?movie dbo:director ?e .
?e <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:American_Roman_Catholics> .
?e <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:New_York_Democrats> .
}
```

**Note**: We have already used `birthPlace` and `almaMater`, so we pick other properties here to explicitly diversify the results.


In [24]:
%%time 
dbpedia_query ="""
PREFIX  schema: <http://schema.org/>
PREFIX  dbr:  <http://dbpedia.org/resource/>
PREFIX  umbel-rc: <http://umbel.org/umbel/rc/>
PREFIX  dbc:  <http://dbpedia.org/resource/Category:>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
PREFIX  yago: <http://dbpedia.org/class/yago/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  dbp:  <http://dbpedia.org/property/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dcterms: <http://purl.org/dc/terms/>
PREFIX  dbpedia-wikidata: <http://wikidata.dbpedia.org/resource/>
PREFIX  dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
PREFIX prov: <http://www.w3.org/ns/prov#>

Select Distinct ?movie ?e where {
?movie dbo:director ?e .
?e <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:American_Roman_Catholics> .
?e <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:New_York_Democrats> .
}
"""
dbpedia_endpoint = "http://zen.cs.rpi.edu:8890/sparql"
movies_df = execute_query(dbpedia_query,dbpedia_endpoint)
print("The total movies directed by anybody are : " + str(movies_df.shape[0]))
display(movies_df)

The total movies directed by anybody are : 49


Unnamed: 0,e.type,e.value,movie.type,movie.value
0,uri,http://dbpedia.org/resource/James_Cagney,uri,http://dbpedia.org/resource/Short_Cut_to_Hell
1,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Made_in_Milan
2,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/The_Key_to_Reserva
3,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Mean_Streets
4,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Raging_Bull
5,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Taxi_Driver
6,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Alice_Doesn't_Live_Here_Anymore
7,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/The_King_of_Comedy_(1983_film)
8,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Boxcar_Bertha
9,uri,http://dbpedia.org/resource/Martin_Scorsese,uri,http://dbpedia.org/resource/Who's_That_Knocking_at_My_Door


CPU times: user 28 ms, sys: 8 ms, total: 36 ms
Wall time: 37 ms


In the above result there are 49 movies, some of which were directed by other people. We now have other entities like `dbr:Robert_De_Niro` and `dbr:Alec_Baldwin`, who are related to `dbr:Martin_Scorsese`. 

So we now have the following comparison between relaxation and reformulation. The reformulation procedure yields a total of `45 + 4 + 8 = 57` unique results (49 - 45 = 4 and 53 - 45 = 8 results beyond the 45 of the original query), whereas relaxation returns a total of `102808` results. The `57` results are more related to the `?movie` variable of the original query because they come from _contextualized_ features.
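The count of unique results is a set union. The sketch below uses small hypothetical sets of movie identifiers to show why the sizes add up, under the assumption (implied by the sum above) that each reformulation returns all original movies and that the newly found movies do not overlap.

```python
# Hypothetical result sets, represented as sets of movie identifiers.
original = {"m1", "m2", "m3"}
ref_a = original | {"m4"}         # one reformulation: original movies plus one new
ref_b = original | {"m5", "m6"}   # another reformulation: original movies plus two new

combined = original | ref_a | ref_b
new_from_a = ref_a - original
new_from_b = ref_b - original

# When the newly found movies do not overlap, the sizes simply add up.
assert len(combined) == len(original) + len(new_from_a) + len(new_from_b)
print(len(combined))

# The notebook's own counts follow the same arithmetic.
print(45 + (49 - 45) + (53 - 45))
```

If the two reformulations shared some of their new movies, the union would be smaller than the plain sum, so the union is the safer way to count.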

### Results Comparison
| Original Query | **Relaxation** | Ref-1 | Ref-2 | Combined |
| ------------- |:-------------:|:--------:|:-------:|:---------|
|45|102808|53|49|57|

### Time Comparison
| **Relaxation** | Ref-1 | Ref-2 |
|:-------------:|:--------:|:-------:|
|5010ms|53.4ms|37ms|

* **Relaxation returns far more results (102808) than the reformulations (57 combined)**. Because the reformulated queries are built from _contextualized_ features, this kind of reformulation is more precise.
* In addition, the timings show that the **reformulated queries also run in less time than the relaxed version of the query**.
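
To put the two comparisons into ratios, the sketch below divides the relaxation figures by the reformulation figures. The result counts and the relaxation time come from the tables above, and the two wall times are the ones printed by cells 23 and 24. These are single runs rather than a benchmark, so the ratios are only indicative.

```python
relaxation_results = 102808
combined_reformulation_results = 57
relaxation_ms = 5010
# Wall times printed by the two reformulation cells.
wall_ms = {"cell 23": 53.4, "cell 24": 37.0}

result_ratio = relaxation_results / combined_reformulation_results
speedups = {name: relaxation_ms / ms for name, ms in wall_ms.items()}

print(f"relaxation returns about {result_ratio:.0f}x as many results")
for name, factor in speedups.items():
    print(f"{name}: about {factor:.0f}x faster than relaxation")
```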

![results](files/images/Scorsese_chart.png)

You can access the [jsfiddle](http://jsfiddle.net/N00bsie/dynpyrLq/3/)

# Conclusions

From the above results, we believe that this technique can be used to _contextualize_ entities and to reformulate entity queries, as an alternative to relaxation.