# Introduction to Graph Databases

The first part of this assignment gives you hands-on experience with graph databases. You will start by setting up an in-memory graph database, for which the support code is already written. Once the database is running, you will execute queries of increasing complexity, exploring how nodes and the edges that connect them are stored and retrieved. Along the way, you will gain practical insight into graph database concepts such as connectivity, traversal, and querying with graph-specific languages.

In [3]:
import os
import sys
import pandas as pd
pd.set_option('display.max_colwidth', 200)

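# Make the parent directory importable so that `utils` can be found.
# Note: '__file__' is a string literal here (notebooks do not define __file__),
# so the path resolves relative to the current working directory.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname('__file__'), '..')))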
from utils import setup_database, download_sample_data

In [4]:
# Download sample data for the Kuzu example
data_dir = '../data'
download_sample_data(data_dir, urls=[
    "https://kuzudb.com/data/movie-lens/movies.csv",
    "https://kuzudb.com/data/movie-lens/users.csv",
    "https://kuzudb.com/data/movie-lens/ratings.csv",
    "https://kuzudb.com/data/movie-lens/tags.csv"
])

# Set up the Kuzu database connection
connection = setup_database('../tmp', delete_existing=True)

# Create schema
connection.execute('CREATE NODE TABLE Movie (movieId INT64, year INT64, title STRING, genres STRING, PRIMARY KEY (movieId))')
connection.execute('CREATE NODE TABLE User (userId INT64, PRIMARY KEY (userId))')
connection.execute('CREATE REL TABLE Rating (FROM User TO Movie, rating DOUBLE, timestamp INT64)')
connection.execute('CREATE REL TABLE Tags (FROM User TO Movie, tag STRING, timestamp INT64)')

# Insert data
connection.execute(f'COPY Movie FROM "{data_dir}/movies.csv" (HEADER=TRUE)')
connection.execute(f'COPY User FROM "{data_dir}/users.csv" (HEADER=TRUE)')
connection.execute(f'COPY Rating FROM "{data_dir}/ratings.csv" (HEADER=TRUE)')
connection.execute(f'COPY Tags FROM "{data_dir}/tags.csv" (HEADER=TRUE)')


Downloading sample data
Downloading https://kuzudb.com/data/movie-lens/movies.csv...
Saved https://kuzudb.com/data/movie-lens/movies.csv to ../data/movies.csv
Downloading https://kuzudb.com/data/movie-lens/users.csv...
Saved https://kuzudb.com/data/movie-lens/users.csv to ../data/users.csv
Downloading https://kuzudb.com/data/movie-lens/ratings.csv...
Saved https://kuzudb.com/data/movie-lens/ratings.csv to ../data/ratings.csv
Downloading https://kuzudb.com/data/movie-lens/tags.csv...
Saved https://kuzudb.com/data/movie-lens/tags.csv to ../data/tags.csv
Sample data downloaded successfully
Loading graph database
Removing existing database at ../tmp


<kuzu.query_result.QueryResult at 0x7ff5ad81d8b0>

## Running Queries

Now that your graph database is set up, you can begin querying it. This section includes seven queries, each increasing in complexity.

In [5]:
# Query 1: Query all nodes with the label 'Movie'. Return those movie nodes. Limit your results to 25
result = connection.execute("MATCH (m:Movie) RETURN m LIMIT 25")

df = result.get_as_df()
df

Unnamed: 0,m
0,"{'_id': {'offset': 0, 'table': 0}, '_label': 'Movie', 'movieId': 833, 'year': 1996, 'title': 'High School High (1996)', 'genres': 'Comedy'}"
1,"{'_id': {'offset': 1, 'table': 0}, '_label': 'Movie', 'movieId': 835, 'year': 1996, 'title': 'Foxfire (1996)', 'genres': 'Drama'}"
2,"{'_id': {'offset': 2, 'table': 0}, '_label': 'Movie', 'movieId': 836, 'year': 1996, 'title': 'Chain Reaction (1996)', 'genres': 'Action|Adventure|Thriller'}"
3,"{'_id': {'offset': 3, 'table': 0}, '_label': 'Movie', 'movieId': 837, 'year': 1996, 'title': 'Matilda (1996)', 'genres': 'Children|Comedy|Fantasy'}"
4,"{'_id': {'offset': 4, 'table': 0}, '_label': 'Movie', 'movieId': 838, 'year': 1996, 'title': 'Emma (1996)', 'genres': 'Comedy|Drama|Romance'}"
5,"{'_id': {'offset': 5, 'table': 0}, '_label': 'Movie', 'movieId': 839, 'year': 1996, 'title': 'Crow: City of Angels, The (1996)', 'genres': 'Action|Thriller'}"
6,"{'_id': {'offset': 6, 'table': 0}, '_label': 'Movie', 'movieId': 840, 'year': 1996, 'title': 'House Arrest (1996)', 'genres': 'Children|Comedy'}"
7,"{'_id': {'offset': 7, 'table': 0}, '_label': 'Movie', 'movieId': 841, 'year': 1959, 'title': 'Eyes Without a Face (Yeux sans visage, Les) (1959)', 'genres': 'Horror'}"
8,"{'_id': {'offset': 8, 'table': 0}, '_label': 'Movie', 'movieId': 842, 'year': 1996, 'title': 'Tales from the Crypt Presents: Bordello of Blood (1996)', 'genres': 'Comedy|Horror'}"
9,"{'_id': {'offset': 9, 'table': 0}, '_label': 'Movie', 'movieId': 848, 'year': 1996, 'title': 'Spitfire Grill, The (1996)', 'genres': 'Drama'}"
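Each value in the `m` column is a dictionary holding the node's internal `_id`, its `_label`, and its properties. A minimal sketch of flattening such values into plain property records, using two rows copied from the output above; the helper name `node_properties` is hypothetical and is not part of Kuzu or the support code:

```python
# Two node dictionaries copied from the Query 1 output above.
rows = [
    {'_id': {'offset': 0, 'table': 0}, '_label': 'Movie', 'movieId': 833,
     'year': 1996, 'title': 'High School High (1996)', 'genres': 'Comedy'},
    {'_id': {'offset': 1, 'table': 0}, '_label': 'Movie', 'movieId': 835,
     'year': 1996, 'title': 'Foxfire (1996)', 'genres': 'Drama'},
]


def node_properties(node):
    """Hypothetical helper: drop internal keys (those starting with '_')."""
    return {key: value for key, value in node.items() if not key.startswith('_')}


flat = [node_properties(node) for node in rows]
for record in flat:
    print(record)
```

Alternatively, returning individual properties in the query itself (for example `RETURN m.movieId, m.title`) avoids the nested dictionaries altogether.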


In [6]:
# Query 2: Query all nodes with the label 'Movie'. Get all connected nodes to the movie nodes. Limit your results to 50
result = connection.execute("MATCH (:Movie)--(p) RETURN p LIMIT 50")

df = result.get_as_df()
df

Unnamed: 0,p
0,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
1,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
2,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
3,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
4,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
5,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
6,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
7,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
8,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
9,"{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'movieId': None, 'year': None, 'title': None, 'genres': None, 'userId': 1}"
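Every row shown above is the same `User` node because the pattern produces one row per matching edge, not one row per distinct neighbour. The sketch below mimics that behaviour in plain Python on a small hypothetical edge list (the IDs are invented for illustration), then removes the duplicates, which is what `RETURN DISTINCT p` would do in the query:

```python
# Hypothetical (user_id, movie_id) edges; not taken from the dataset.
edges = [(1, 10), (1, 20), (1, 30), (2, 10)]

# One result row per matching edge, like MATCH (:Movie)--(p) RETURN p.
rows_per_edge = [user for user, _movie in edges]

# Keep each neighbour once, preserving first-seen order, like RETURN DISTINCT p.
distinct_neighbours = list(dict.fromkeys(rows_per_edge))

print(rows_per_edge)
print(distinct_neighbours)
```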


In [7]:
# Query 3: Count the total number of nodes in the database
# Hint: Use the `COUNT` function to count the number of nodes
result = connection.execute("MATCH (m) RETURN COUNT(m) AS total_nodes")

df = result.get_as_df()
df.head()

Unnamed: 0,total_nodes
0,10352


In [8]:
# Query 4: Query all nodes with the label 'User'. Count the degree for these nodes. Filter the nodes where the user rated more than 3 movies. Return the users and the degree
# Hint: First find all users and their ratings, then count the degree, and finally filter the results to only include users with more than 3 ratings
result = connection.execute("MATCH (u:User)-[r:Rating]-(m:Movie) WITH u, COUNT(u) AS degree WHERE degree > 3 RETURN u, degree")

df = result.get_as_df()
df

Unnamed: 0,u,degree
0,"{'_id': {'offset': 598, 'table': 1}, '_label': 'User', 'userId': 599}",2478
1,"{'_id': {'offset': 607, 'table': 1}, '_label': 'User', 'userId': 608}",831
2,"{'_id': {'offset': 608, 'table': 1}, '_label': 'User', 'userId': 609}",37
3,"{'_id': {'offset': 520, 'table': 1}, '_label': 'User', 'userId': 521}",40
4,"{'_id': {'offset': 110, 'table': 1}, '_label': 'User', 'userId': 111}",646
...,...,...
605,"{'_id': {'offset': 175, 'table': 1}, '_label': 'User', 'userId': 176}",36
606,"{'_id': {'offset': 37, 'table': 1}, '_label': 'User', 'userId': 38}",78
607,"{'_id': {'offset': 428, 'table': 1}, '_label': 'User', 'userId': 429}",58
608,"{'_id': {'offset': 373, 'table': 1}, '_label': 'User', 'userId': 374}",33
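The aggregation in Query 4 groups the matched edges by user and counts them. A minimal plain-Python sketch of the same group, count, and filter steps, run on a hypothetical edge list (the IDs are invented for illustration):

```python
from collections import Counter

# Hypothetical (user_id, movie_id) rating edges; not taken from the dataset.
ratings = [
    (1, 10), (1, 20), (1, 30), (1, 40),
    (2, 10),
    (3, 10), (3, 20), (3, 30), (3, 40), (3, 50),
]

# Group by user and count edges: the user's degree over Rating edges.
degree = Counter(user for user, _movie in ratings)

# Keep only users with more than 3 ratings, like WHERE degree > 3.
active_users = {user: count for user, count in degree.items() if count > 3}

print(active_users)
```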


In [9]:
# Query 5: Query all nodes with the label 'Movie'. Each node has a 'genres' attribute. Count the number of nodes per value of 'genres'
# Hint: Use the `WITH` clause to group by genres and count the number of movies
result = connection.execute("MATCH (m:Movie) WITH m.genres AS genre, COUNT(m) AS count RETURN genre, count")

df = result.get_as_df()
df.head()

Unnamed: 0,genre,count
0,Comedy|Horror,69
1,Horror|Thriller,135
2,Musical|Romance,5
3,Drama|Mystery|Romance|Thriller,9
4,Drama|Romance|War,24
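Note that `genres` stores a pipe-separated list, so the grouping above counts each distinct combination (for example `Comedy|Horror`) rather than each individual genre. If a per-genre count is wanted, one option is to split the strings after fetching them. A minimal sketch, using the five rows shown above as sample input:

```python
from collections import Counter

# (genres, count) pairs copied from the first five rows of the output above.
combo_counts = [
    ('Comedy|Horror', 69),
    ('Horror|Thriller', 135),
    ('Musical|Romance', 5),
    ('Drama|Mystery|Romance|Thriller', 9),
    ('Drama|Romance|War', 24),
]

# Credit every individual genre in a combination with that combination's count.
per_genre = Counter()
for combo, count in combo_counts:
    for genre in combo.split('|'):
        per_genre[genre] += count

for genre, count in per_genre.most_common():
    print(genre, count)
```

These totals cover only the five sample rows, not the whole dataset.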


In [10]:
# Query 6: Query all nodes with the label 'Movie' and 'User', and the edge 'Rating' between movie and user. Each edge 'Rating' has a rating. Find the top 10 rated movies by average rating score
# Hint: Use the `AVG` function to calculate an average. Use the `ORDER BY` clause to sort the movies by rating in descending order
result = connection.execute("MATCH (u:User)-[r:Rating]-(m:Movie) WITH m, AVG(r.rating) AS AvgRating RETURN m, AvgRating ORDER BY AvgRating DESC LIMIT 10")

df = result.get_as_df()
df

Unnamed: 0,m,AvgRating
0,"{'_id': {'offset': 880, 'table': 0}, '_label': 'Movie', 'movieId': 6021, 'year': 1977, 'title': 'American Friend, The (Amerikanische Freund, Der) (1977)', 'genres': 'Crime|Drama|Mystery|Thriller'}",5.0
1,"{'_id': {'offset': 1125, 'table': 0}, '_label': 'Movie', 'movieId': 26169, 'year': 1967, 'title': 'Branded to Kill (Koroshi no rakuin) (1967)', 'genres': 'Action|Crime|Drama'}",5.0
2,"{'_id': {'offset': 1624, 'table': 0}, '_label': 'Movie', 'movieId': 78836, 'year': 2009, 'title': 'Enter the Void (2009)', 'genres': 'Drama'}",5.0
3,"{'_id': {'offset': 1751, 'table': 0}, '_label': 'Movie', 'movieId': 107771, 'year': 2013, 'title': 'Only Lovers Left Alive (2013)', 'genres': 'Drama|Horror|Romance'}",5.0
4,"{'_id': {'offset': 1772, 'table': 0}, '_label': 'Movie', 'movieId': 108795, 'year': 2009, 'title': 'Wonder Woman (2009)', 'genres': 'Action|Adventure|Animation|Fantasy'}",5.0
5,"{'_id': {'offset': 1805, 'table': 0}, '_label': 'Movie', 'movieId': 147250, 'year': 1980, 'title': 'The Adventures of Sherlock Holmes and Doctor Watson', 'genres': 'Adventure|Crime|Mystery'}",5.0
6,"{'_id': {'offset': 1809, 'table': 0}, '_label': 'Movie', 'movieId': 147326, 'year': 1980, 'title': 'The Adventures of Sherlock Holmes and Doctor Watson: King of Blackmailers (1980)', 'genres': 'Cr...",5.0
7,"{'_id': {'offset': 1810, 'table': 0}, '_label': 'Movie', 'movieId': 147328, 'year': 1979, 'title': 'The Adventures of Sherlock Holmes and Dr. Watson: Bloody Signature (1979)', 'genres': 'Crime'}",5.0
8,"{'_id': {'offset': 1811, 'table': 0}, '_label': 'Movie', 'movieId': 147330, 'year': 1979, 'title': 'Sherlock Holmes and Dr. Watson: Acquaintance (1979)', 'genres': 'Crime'}",5.0
9,"{'_id': {'offset': 1852, 'table': 0}, '_label': 'Movie', 'movieId': 149508, 'year': 2011, 'title': 'Spellbound (2011)', 'genres': 'Comedy|Romance'}",5.0
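All ten results have a perfect 5.0 average. One plausible cause (an assumption, not verified against this dataset) is that a movie with a single 5.0 rating ranks level with or above widely rated movies. A common remedy is to require a minimum number of ratings before ranking. A plain-Python sketch on invented ratings; both the data and the threshold `min_ratings` are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (movie_title, rating) pairs; not taken from the dataset.
ratings = [
    ('Obscure Film', 5.0),
    ('Popular Film', 4.5), ('Popular Film', 4.0), ('Popular Film', 5.0),
    ('Average Film', 3.0), ('Average Film', 3.5), ('Average Film', 2.5),
]

min_ratings = 3  # hypothetical threshold

# Collect all ratings per movie.
by_movie = defaultdict(list)
for title, rating in ratings:
    by_movie[title].append(rating)

# Average only movies with enough ratings, then sort by average, descending.
ranked = sorted(
    ((title, mean(values), len(values))
     for title, values in by_movie.items()
     if len(values) >= min_ratings),
    key=lambda item: item[1],
    reverse=True,
)

for title, avg, n in ranked:
    print(f'{title}: {avg:.2f} over {n} ratings')
```

In the query, the same idea would amount to adding `COUNT(r) AS n` to the `WITH` clause and filtering with a `WHERE` on `n`, following the `WITH ... WHERE` pattern used in Query 4.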


In [None]:
# Query 7: Query all nodes with the label 'Movie' and 'User', and the edge 'Rating' between movie and user. Find pairs of movies often rated by the same users


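# Draft of the aggregation step (not yet applied in the query below):
#   WITH u1, u2, m1, m2 WHERE m1 <> m2 AND u1 = u2 RETURN m1, m2, COUNT(u1) AS common_users ORDER BY common_users DESC LIMIT 10
# The query below only lists the first 10 (m1, m2, u1) matches; it does not yet count shared users per pair.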
result = connection.execute("MATCH (m1:Movie)-[r1:Rating]-(u1:User), (m2:Movie)-[r2:Rating]-(u2:User) WHERE m1 <> m2 AND u1 = u2 WITH m1, m2, u1 LIMIT 10 RETURN m1, m2, u1")

df = result.get_as_df()
df

Unnamed: 0,m1,m2,u1
0,"{'_id': {'offset': 3877, 'table': 0}, '_label': 'Movie', 'movieId': 3, 'year': 1995, 'title': 'Grumpier Old Men (1995)', 'genres': 'Comedy|Romance'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
1,"{'_id': {'offset': 3880, 'table': 0}, '_label': 'Movie', 'movieId': 6, 'year': 1995, 'title': 'Heat (1995)', 'genres': 'Action|Crime|Thriller'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
2,"{'_id': {'offset': 3918, 'table': 0}, '_label': 'Movie', 'movieId': 47, 'year': 1995, 'title': 'Seven (a.k.a. Se7en) (1995)', 'genres': 'Mystery|Thriller'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
3,"{'_id': {'offset': 3921, 'table': 0}, '_label': 'Movie', 'movieId': 50, 'year': 1995, 'title': 'Usual Suspects, The (1995)', 'genres': 'Crime|Mystery|Thriller'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
4,"{'_id': {'offset': 3937, 'table': 0}, '_label': 'Movie', 'movieId': 70, 'year': 1996, 'title': 'From Dusk Till Dawn (1996)', 'genres': 'Action|Comedy|Horror|Thriller'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
5,"{'_id': {'offset': 3964, 'table': 0}, '_label': 'Movie', 'movieId': 101, 'year': 1996, 'title': 'Bottle Rocket (1996)', 'genres': 'Adventure|Comedy|Crime|Romance'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
6,"{'_id': {'offset': 3972, 'table': 0}, '_label': 'Movie', 'movieId': 110, 'year': 1995, 'title': 'Braveheart (1995)', 'genres': 'Action|Drama|War'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
7,"{'_id': {'offset': 3999, 'table': 0}, '_label': 'Movie', 'movieId': 151, 'year': 1995, 'title': 'Rob Roy (1995)', 'genres': 'Action|Drama|Romance|War'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
8,"{'_id': {'offset': 4005, 'table': 0}, '_label': 'Movie', 'movieId': 157, 'year': 1995, 'title': 'Canadian Bacon (1995)', 'genres': 'Comedy|War'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
9,"{'_id': {'offset': 4011, 'table': 0}, '_label': 'Movie', 'movieId': 163, 'year': 1995, 'title': 'Desperado (1995)', 'genres': 'Action|Romance|Western'}","{'_id': {'offset': 3875, 'table': 0}, '_label': 'Movie', 'movieId': 1, 'year': 1995, 'title': 'Toy Story (1995)', 'genres': 'Adventure|Animation|Children|Comedy|Fantasy'}","{'_id': {'offset': 0, 'table': 1}, '_label': 'User', 'userId': 1}"
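The output above lists individual (movie, movie, user) matches. To rank pairs by how often they are rated by the same users, the matches still need to be grouped by pair and counted. A minimal plain-Python sketch of that grouping on invented data; putting each pair in a fixed order also avoids counting (A, B) and (B, A) separately:

```python
from collections import Counter
from itertools import combinations

# Hypothetical user -> rated movies mapping; not taken from the dataset.
rated = {
    'u1': {'Toy Story', 'Heat', 'Seven'},
    'u2': {'Toy Story', 'Heat'},
    'u3': {'Toy Story', 'Seven', 'Heat'},
}

pair_counts = Counter()
for movies in rated.values():
    # sorted() gives each unordered pair one canonical form.
    for pair in combinations(sorted(movies), 2):
        pair_counts[pair] += 1

for (m1, m2), common_users in pair_counts.most_common(10):
    print(m1, '|', m2, '|', common_users)
```

In Cypher, an equivalent query might look like `MATCH (m1:Movie)<-[:Rating]-(u:User)-[:Rating]->(m2:Movie) WHERE m1.movieId < m2.movieId RETURN m1.title, m2.title, COUNT(u) AS common_users ORDER BY common_users DESC LIMIT 10`. This is an untested sketch, not the assignment's reference solution.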
