### Libraries

In [1]:
from rdflib import Graph
from rdflib import URIRef, BNode, Literal
from rdflib import Namespace
from rdflib.namespace import OWL, RDF, RDFS, FOAF, XSD
from rdflib.util import guess_format
import pandas as pd

### Initializations

In [5]:
g = Graph()
        
#Example namespace for this lab
lab3_ns_str= "http://www.semanticweb.org/ernesto/in3067-inm713/lab3/"
        
#Special Namespace class to create URIRefs directly in Python
lab3 = Namespace(lab3_ns_str)
        
#Prefixes for the serialization
g.bind("lab3", lab3)
       
#Load data in dataframe  
file="../lab3_companies_file.csv"
data_frame = pd.read_csv(file, sep=',', quotechar='"',escapechar="\\")

This solution assumes that the CSV file has been mapped, manually or automatically, to a KG such as DBpedia, such that:
- Column 0 elements are of type http://dbpedia.org/ontology/Company
- Column 2 elements are of type http://dbpedia.org/ontology/City
- Columns 0 and 1 are related via the predicate http://dbpedia.org/ontology/foundingYear
- Columns 0 and 2 are related via the predicate http://dbpedia.org/ontology/headquarter

The KG also contains the following entities, which we can reuse:
- http://dbpedia.org/resource/Oxford
- http://dbpedia.org/resource/London
- http://dbpedia.org/resource/DeepMind
- http://dbpedia.org/resource/Oxbotica               

**Manual mapping**. Tip: search for the entity name plus "DBpedia" (e.g. "Oxford DBpedia") and take the URI from the suggested DBpedia page.

**Automatic mapping**. Typically done with a fuzzy search (a.k.a. look-up) over the KG.

In this lab I am just creating a very small dictionary of entities, to be used as a very basic look-up. In Week 5 we will use the DBpedia look-up service, which provides fuzzy search functionality.

In [7]:
stringToURI = dict()
stringToURI["oxford"]="http://dbpedia.org/resource/Oxford"
stringToURI["london"]="http://dbpedia.org/resource/London"
stringToURI["deepmind"]="http://dbpedia.org/resource/DeepMind"
stringToURI["oxbotica"]="http://dbpedia.org/resource/Oxbotica"
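
The dictionary above only supports exact (case-insensitive) matches. As a rough illustration of what a fuzzy look-up adds, here is a minimal sketch that uses `difflib` from the standard library over the same four entries. The helper name `fuzzy_lookup` and the `cutoff` threshold are assumptions made for this example; this is not the DBpedia look-up service.

```python
from difflib import get_close_matches

# Same toy dictionary as in the cell above (lower-cased label -> DBpedia URI)
string_to_uri = {
    "oxford": "http://dbpedia.org/resource/Oxford",
    "london": "http://dbpedia.org/resource/London",
    "deepmind": "http://dbpedia.org/resource/DeepMind",
    "oxbotica": "http://dbpedia.org/resource/Oxbotica",
}

def fuzzy_lookup(label, cutoff=0.8):
    """Hypothetical helper: URI of the closest dictionary key, or None.

    cutoff is an assumed similarity threshold between 0 and 1.
    """
    key = label.strip().lower()
    matches = get_close_matches(key, list(string_to_uri), n=1, cutoff=cutoff)
    return string_to_uri[matches[0]] if matches else None

print(fuzzy_lookup("Deep Mind"))  # small spelling variation of a known entity
print(fuzzy_lookup("Londn"))      # typo
print(fuzzy_lookup("Paris"))      # not in the dictionary
```

A lower `cutoff` tolerates more spelling variation but also increases the chance of linking a cell to the wrong entity.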

### Creates DBpedia namespaces

In [9]:
#DBpedia namespaces
dbo = Namespace("http://dbpedia.org/ontology/")        
dbr = Namespace("http://dbpedia.org/resource/")
       
#Prefixes
g.bind("dbo", dbo)        
#Alternative: g.bind("dbo", "http://dbpedia.org/ontology/")        
g.bind("dbr", dbr)    
#We can then use as entities: dbo.Company or URIRef("http://dbpedia.org/ontology/Company")


### Iterates over the data frame and creates triples

In [12]:
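def cellToURI(cell_name):
    #Normalise the cell value before looking it up in the dictionary
    key = cell_name.strip().lower()
    if key in stringToURI:  #Reuse the DBpedia URI if we know the entity
        return stringToURI[key]
    else:  #Otherwise create a URI in the lab namespace (URIs cannot contain spaces)
        return lab3_ns_str + cell_name.strip().replace(" ", "_")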

#Format csv file        
#0         1               2
#"Company","Founding year","Headquarters"                        
for row in data_frame.itertuples(index=False):
    
    #We check whether the entity is in our small local dictionary
    col0_entity = URIRef(cellToURI(row[0]))
    col2_entity = URIRef(cellToURI(row[2]))
            
    #Year column
    col1_literal = Literal(row[1], datatype=XSD.gYear)
            
    # We create types
    g.add((col0_entity, RDF.type, dbo.Company))
    g.add((col2_entity, RDF.type, dbo.City))
            
    #Relationship between col0 and col2
    g.add((col0_entity, dbo.headquarter, col2_entity))
            
    #Relationship between col0 and col1
    g.add((col0_entity, dbo.foundingYear, col1_literal))
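
Cell values that are not in the dictionary become the local name of a new URI in the lab namespace, so characters such as `&` or `/` in a company name would produce a string that is not a well-formed URI. A minimal sketch of a more defensive way to create these URIs, assuming percent-encoding with `urllib.parse.quote` is acceptable for the lab; the helper name `mint_uri` and the company name in the example are made up for illustration.

```python
from urllib.parse import quote

LAB3_NS = "http://www.semanticweb.org/ernesto/in3067-inm713/lab3/"

def mint_uri(cell_value, namespace=LAB3_NS):
    """Hypothetical helper: build a URI string from a raw cell value."""
    # Trim the value and use underscores instead of spaces, as DBpedia does
    local_name = str(cell_value).strip().replace(" ", "_")
    # Percent-encode anything that is still unsafe in a URI path segment
    return namespace + quote(local_name, safe="")

# Made-up company name, only to show the effect of the encoding
print(mint_uri("Acme Robotics & Co"))
```

The returned string can be wrapped in `URIRef(...)` in the same way as the result of `cellToURI`.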

### Saves graph

In [14]:
print("Saving graph to 'Solution_Task3.5_rdflib.ttl':")
    
#print(g.serialize(format="turtle").decode("utf-8"))    
g.serialize(destination='Solution_Task3.5_rdflib.ttl', format='ttl')

Saving graph to 'Solution_Task3.5_rdflib.ttl':
