# Populating and Querying a Graph Database

These exercises guide you through populating and querying a graph database with a dataset of shows and movies. We use Pandas to load the data, RDFlib to build the triples, RDFlib's `SPARQLUpdateStore` to upload those triples to a Blazegraph triplestore, and SPARQLWrapper to query the result.

## Class Diagram Overview

The class diagram consists of two primary entities:

- **Show**: Represents movies or TV shows with detailed attributes.
- **Cast**: Represents a cast member associated with a show.

A "cast" relationship connects the Show class to the Cast class, indicating that a show features multiple cast members.

![class diagram](classes_diagram.png)

### Namespaces and URIs for Classes

For the purpose of RDF mapping, we introduce the following URIs and namespaces for our classes:

- **Show Class URI**: `http://example.org/show/Show`
- **Cast Class URI**: `http://example.org/show/Cast`

And the namespaces:

- **dcterms**: `http://purl.org/dc/terms/`
- **rdf**: `http://www.w3.org/1999/02/22-rdf-syntax-ns#`
- **schema**: `http://schema.org/`
- **foaf**: `http://xmlns.com/foaf/0.1/`

### RDF Properties for Each Attribute

For the `Show` class:

- **show_id**: `dcterms:identifier` - Uniquely identifies each show.
- **type**: `schema:additionalType` - Specifies if the entity is a movie or a TV show.
- **title**: `dcterms:title` - The title of the show.
- **director**: `schema:director` - The director of the show.
- **cast**: `schema:actor` - Indicates the connection between the show and individuals in its cast.
- **country**: `dcterms:spatial` - The country where the show was produced.
- **date_added**: `dcterms:date` - The date the show was added to the collection.
- **release_year**: `schema:datePublished` - The release year of the show.
- **rating**: `schema:contentRating` - The rating of the show.
- **duration**: `schema:duration` - The duration of the show.
- **listed_in**: `dcterms:subject` - The categories the show is listed under.
- **description**: `dcterms:description` - A brief description of the show.

For the `Cast` class:

- **name**: `foaf:name` - The name of the cast member.

## Exercise 1: Defining Classes, Attributes, and Relationships Using RDFlib

**Objective**: Define all the classes, attributes, and relationships using RDFlib based on the provided data model.

**Tasks**:

1. Import RDFlib and define the namespaces based on the URIs provided above.
2. Create the `Show` and `Cast` classes using RDFlib.
3. Define RDF predicates for all attributes listed in the RDF Properties section.
4. Create relationships between `Show` and `Cast` to represent the casting information.

## Exercise 2: Importing Data from CSV and Mapping to RDF

**Objective**: Import data from the `netflix_titles` CSV file using Pandas and map each row to the RDF data model, inserting the data into an RDFlib Graph.

**Tasks**:

1. Use Pandas to read data from the CSV file.
2. Initialize an RDFlib Graph.
3. For each row in the DataFrame, create RDF triples based on the data model provided, adding them to the graph.
4. Ensure that each show's type is mapped to its RDF `type` and that the cast relationships are established.

## Exercise 3: Uploading Data to a Blazegraph Database

**Objective**: Use the `SPARQLUpdateStore` from RDFlib to load the RDF data into a Blazegraph database.

**Tasks**:

1. Initialize an instance of `SPARQLUpdateStore` from RDFlib.
2. Set the SPARQL endpoint to your Blazegraph instance by specifying the same URL for both reading and writing, appending '/sparql' to your Blazegraph base URL (e.g., `http://127.0.0.1:9999/blazegraph/sparql`).
3. Open a connection to the Blazegraph SPARQL endpoint.
4. Iterate over all triples in your RDFlib Graph and add them to the store using the `add(triple)` method.
5. Remember to close the connection to the store once the upload process is complete.

## Exercise 4: SPARQL Query to List All Movies

**Objective**: Write a SPARQL query to list all movies, including their titles and release years.

**Tasks**:

1. Connect to your Blazegraph endpoint using SPARQLWrapper.
2. Write a SPARQL query that selects movies (`rdf:type` of movie) and retrieves their titles and release years.
3. Execute the query and print the results.
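
A minimal sketch of such a query is shown below. It assumes that every show was typed as `show:Show` and that the CSV `type` value was stored as the literal `"Movie"` under `schema:additionalType`, as in the RDF Properties section; if your graph models movies with a dedicated `rdf:type` instead, the first two triple patterns need to change accordingly. The `show:` prefix, the endpoint URL, and the helper names are illustrative.

```python
ENDPOINT = "http://127.0.0.1:9999/blazegraph/sparql"  # adjust to your instance

MOVIES_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX show: <http://example.org/show/>

SELECT ?title ?year WHERE {
  ?show rdf:type show:Show ;
        schema:additionalType "Movie" ;
        dcterms:title ?title ;
        schema:datePublished ?year .
}
ORDER BY ?title
"""


def rows_from_json(results):
    """Turn a SPARQL JSON result into a list of (title, year) tuples."""
    return [
        (binding["title"]["value"], binding["year"]["value"])
        for binding in results["results"]["bindings"]
    ]


def list_movies(endpoint=ENDPOINT):
    """Run the query against the endpoint and return (title, year) tuples."""
    # Imported here so the query text can be inspected without the client installed.
    from SPARQLWrapper import JSON, SPARQLWrapper

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(MOVIES_QUERY)
    sparql.setReturnFormat(JSON)
    return rows_from_json(sparql.query().convert())
```

With Blazegraph running, `for title, year in list_movies(): print(title, year)` prints one line per movie.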

## Exercise 5: SPARQL Query to Find Shows with a Specific Actor

**Objective**: Write a SPARQL query to find all shows featuring a specific actor by name.

**Tasks**:

1. Connect to your Blazegraph endpoint using SPARQLWrapper.
2. Write a SPARQL query that searches for shows (`rdf:type` of Show) that have a specific actor (using the `schema:actor` predicate) in their cast.
3. Replace "actor name" with the name of the actor you are searching for in your query.
4. Execute the query and display the shows that feature the specified actor.
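
One way to approach this is sketched below, assuming cast members are separate resources linked from the show with `schema:actor` and named with `foaf:name`, as in the data model above. The actor's name is escaped before it is placed into the query text so that quotes or backslashes in a name cannot break the query. The helper names, the `show:` prefix, and the endpoint URL are illustrative.

```python
ENDPOINT = "http://127.0.0.1:9999/blazegraph/sparql"  # adjust to your instance


def actor_query(actor_name):
    """Build a SPARQL query listing the titles of shows featuring actor_name."""
    # Escape backslashes first, then double quotes, for a SPARQL string literal.
    escaped = actor_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX show: <http://example.org/show/>

SELECT ?title WHERE {{
  ?show rdf:type show:Show ;
        dcterms:title ?title ;
        schema:actor ?member .
  ?member foaf:name "{escaped}" .
}}
ORDER BY ?title
"""


def shows_with_actor(actor_name, endpoint=ENDPOINT):
    """Run the query and return the matching show titles."""
    # Imported here so the query builder works without the client installed.
    from SPARQLWrapper import JSON, SPARQLWrapper

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(actor_query(actor_name))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["title"]["value"] for b in results["results"]["bindings"]]
```

The match on `foaf:name` is exact and case-sensitive; a `FILTER` with `CONTAINS` or `LCASE` would be a looser alternative if partial matches are wanted.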