<a href="https://colab.research.google.com/github/dlupu0/HLTB-unipd/blob/main/HLTB_RDF_Creator_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HowLongToBeat RDF Creator

We load the generated CSV files and serialize all the data into ***Turtle format (TTL)*** using the ***RDFLib*** Python library.

## Setup

We import the necessary libraries and set the paths to the input and output files. In particular, we create one TTL file for each type of data.

In [None]:
# Imports
import os
from pathlib import Path
import pandas as pd

# Load the required libraries
from rdflib import Graph, Literal, RDF, URIRef, Namespace

# RDFLib knows about some namespaces, like XSD
from rdflib.namespace import XSD

In [None]:
# Absolute path of the current working directory
absPath = str(Path.cwd())
datasetsPath = os.path.join(absPath, "cleaned_datasets")
rdfPath = os.path.join(absPath, "rdf")

# Create the dataset directory if it does not exist
os.makedirs(datasetsPath, exist_ok=True)

# Create the RDF directory if it does not exist
os.makedirs(rdfPath, exist_ok=True)

# Setup datasets paths
gamesPath = os.path.join(datasetsPath, "games_cleaned.csv")
vgchartzPath = os.path.join(datasetsPath, "vgchartz_cleaned.csv")

# Countries-Regions path
countriesRegionsPath = os.path.join(datasetsPath, "countries-regions.csv")

# Setup Turtle paths
genresTTLPath = os.path.join(rdfPath, "genres.ttl")
gamesTTLPath = os.path.join(rdfPath, "games.ttl")


In [None]:
# Country Ontology
CNS = Namespace("http://eulersharp.sourceforge.net/2003/03swap/countries#")

# HLTB Ontology
HLTB = Namespace("http://www.semanticweb.org/enrico/ontologies/2022/10/HLTB-db2unipd#")

In [None]:
def createGraph():
    # Create the graph
    g = Graph()

    # Bind the namespaces to a prefix for more readable output
    g.bind("xsd", XSD)
    g.bind("countries", CNS)
    g.bind("hltb", HLTB)

    return g

## Serialization

We serialize the data according to the following workflow:

1. Load the CSV file and iterate through it.
2. Build a unique ID for each instance ourselves, based on its name (for a game, its title).
3. Add the node to the graph using the unique ID.
4. Add all the data properties.
5. Add all the object properties.
6. Serialize the data and save them into a TTL file.

### Games

We now serialize the Game class.

In [None]:
# Create Graph
g = createGraph()

In [None]:
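# Load the CSV files in memory
games = pd.read_csv(gamesPath, sep=",", index_col="title")
# Read the VGChartz data from its own file without overwriting the path variable
# (assumes the cleaned VGChartz CSV also has a "title" column)
vgchartz = pd.read_csv(vgchartzPath, sep=",", index_col="title")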

In [None]:
def createGameID(title):
    # Keep alphanumeric characters and collapse each run of other characters into a single "-"
    gameID = ""
    for char in title:
        if char.isalnum():
            gameID += char
        elif len(gameID) > 0 and gameID[-1] != '-':
            gameID += '-'
    # Drop a trailing "-", if any
    if len(gameID) > 0 and gameID[-1] == '-':
        gameID = gameID[:-1]
    return gameID.lower()
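
The following sketch shows what the function produces for two titles that appear in the dataset and why the resulting IDs are not guaranteed to be unique. The helper `findCollisions` and the list of colliding titles are hypothetical, added only for illustration.

```python
from collections import defaultdict


def createGameID(title):
    # Same logic as the notebook's function, repeated so the example runs on its own
    gameID = ""
    for char in title:
        if char.isalnum():
            gameID += char
        elif len(gameID) > 0 and gameID[-1] != '-':
            gameID += '-'
    if len(gameID) > 0 and gameID[-1] == '-':
        gameID = gameID[:-1]
    return gameID.lower()


def findCollisions(titles):
    # Hypothetical helper: group titles by generated ID and keep IDs shared by several titles
    byID = defaultdict(list)
    for title in titles:
        byID[createGameID(title)].append(title)
    return {gameID: names for gameID, names in byID.items() if len(names) > 1}


print(createGameID("688(I) Hunter/Killer"))                  # 688-i-hunter-killer
print(createGameID("Yooka-Laylee and the Impossible Lair"))  # yooka-laylee-and-the-impossible-lair

# Made-up titles that differ only in case or punctuation map to the same ID
collisions = findCollisions(["Doom", "DOOM", "Half-Life", "Half Life", "Portal"])
print(collisions)

# A title without any alphanumeric character yields an empty ID
print(repr(createGameID("???")))  # ''
```

If two titles share an ID, their triples end up on the same node, so a check like this one may be worth running on the real index before building the graph.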

In [None]:
# Show a summary of the games DataFrame
games.info()

# Iterate over the games
for title, row in games.iterrows():
    # Create gameID from its title
    gameID = createGameID(title)

    # Create the node to add to the Graph
    Game = URIRef(HLTB[gameID])

    # Add triples using store's add() method.
    g.add((Game, RDF.type, HLTB.Game))

    # Add the title of the game
    g.add((Game, HLTB["title"], Literal(title, datatype=XSD.string)))

<class 'pandas.core.frame.DataFrame'>
Index: 35922 entries, 688(I) Hunter/Killer to Yooka-Laylee and the Impossible Lair
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        35922 non-null  int64  
 1   id                35922 non-null  int64  
 2   main_story        17324 non-null  float64
 3   main_plus_extras  11631 non-null  float64
 4   completionist     13107 non-null  float64
 5   all_styles        21112 non-null  float64
 6   coop              183 non-null    float64
 7   versus            274 non-null    float64
 8   type              1314 non-null   object 
 9   developers        34080 non-null  object 
 10  publishers        32754 non-null  object 
 11  platforms         24285 non-null  object 
 12  genres            32843 non-null  object 
dtypes: float64(6), int64(2), object(5)
memory usage: 3.8+ MB


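**Note:** all the other data about games is still missing from the graph; so far only the class and the title of each game are serialized.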

In [None]:
# Save the data in the Turtle format
with open(gamesTTLPath, "w", encoding="utf-8") as fp:
    fp.write(g.serialize(format="turtle"))

print("Saved games TTL file.")

Saved games TTL file.


### Genre

We now serialize the Genre class.