# Neo4j Movie Database - Cypher Query Exercises

This notebook contains a set of exercises for practicing Cypher queries on the Neo4j **"movie"** database.

## **Prerequisites**
Before running the notebook, ensure you have:
- A running instance of **Neo4j** (local or cloud-based).
- Downloaded the latest dump from https://github.com/neo4j-graph-examples/movies/tree/main/data.
- Uploaded the dump to your Neo4j project.
- Imported the dump into the **"movie"** example database. The connection code below expects the database to be named `movies-50`; adjust the name if yours differs.

## **Connecting to Neo4j**
Make sure you update the **URI, USERNAME, and PASSWORD** in the code below.


In [11]:
from neo4j import GraphDatabase
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from collections import Counter
import seaborn as sns

# Connection details (update credentials accordingly)
URI = "bolt://localhost:7687"
USERNAME = "neo4j"
PASSWORD = "password"

# Name of the database that holds the movie graph (adjust to match your import)
DATABASE = "movies-50"
driver = GraphDatabase.driver(URI, auth=(USERNAME, PASSWORD))

def run_query(query, params=None):
    """Execute a Cypher query and return the results as a DataFrame."""
    # The target database is selected per session
    with driver.session(database=DATABASE) as session:
        result = session.run(query, params)
        return pd.DataFrame([dict(record) for record in result])

# Test connection
print("Connection successful. Database contains:")
print(run_query("MATCH (n) RETURN labels(n)[0] AS label, COUNT(*) AS count"))

Connection successful. Database contains:
    label  count
0   Movie     38
1  Person    133


# **Movie Statistics with Visualization**

### Retrieve every movie's title and release year, newest first, and visualize the distribution by year.


In [35]:
movies_query = """
MATCH (m:Movie)
RETURN m.title, m.released
ORDER BY m.released DESC
"""
movies_df = run_query(movies_query)
movies_df.head()

                m.title  m.released
0           Cloud Atlas        2012
1        Ninja Assassin        2009
2           Speed Racer        2008
3           Frost/Nixon        2008
4  Charlie Wilson's War        2007
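
The query above returns the raw rows, but not the per-year distribution the heading asks for. The following is a minimal sketch of one way to count and plot it, assuming a DataFrame with the `m.released` column shown above. The helper names `movies_per_year` and `plot_movies_per_year` are illustrative, and the five sample rows are copied from the output above so the block runs without a database.

```python
import pandas as pd


def movies_per_year(df, year_col="m.released"):
    """Count movies per release year, sorted chronologically."""
    return df[year_col].value_counts().sort_index()


def plot_movies_per_year(df, year_col="m.released"):
    """Draw a bar chart of movies per year and return the Axes."""
    # Imported here so the counting helper works without a plotting backend.
    import matplotlib.pyplot as plt

    counts = movies_per_year(df, year_col)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel("Release year")
    ax.set_ylabel("Number of movies")
    ax.set_title("Movies per release year")
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return ax


# Sample rows copied from the head() output above; in the notebook,
# pass the full movies_df returned by run_query instead.
sample_df = pd.DataFrame(
    {
        "m.title": ["Cloud Atlas", "Ninja Assassin", "Speed Racer",
                    "Frost/Nixon", "Charlie Wilson's War"],
        "m.released": [2012, 2009, 2008, 2008, 2007],
    }
)
year_counts = movies_per_year(sample_df)
print(year_counts)
```

In the notebook, calling `plot_movies_per_year(movies_df)` in its own cell should render the chart for all movies.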


### Top 5 years with most movies

In [32]:
query = '''
MATCH (m:Movie)
RETURN m.released AS year, COUNT(*) AS movie_count
ORDER BY movie_count DESC
LIMIT 5
'''

# Use a separate name so the full movie list in movies_df is not overwritten
top_years_df = run_query(query)
top_years_df.head()

   year  movie_count
0  1999            4
1  1992            4
2  2000            3
3  2003            3
4  1998            3


In [38]:
# Inspect a single Person node to see which properties it carries
query_generica = '''
MATCH (m:Person)
RETURN m
'''
df_generico = run_query(query_generica)
df_generico.head(1)

              m
0  (born, name)


# **Actor Network Analysis for "The Matrix"**
### Find actors from "The Matrix" and analyze their collaborations in other movies.


### Create a network visualization


In [45]:
# m is "The Matrix"; n is any other movie featuring one of its actors
query_matrix = '''
MATCH (n:Movie)<-[:ACTED_IN]-(p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = 'The Matrix'
RETURN p.name, n.title
'''

df_matrix = run_query(query_matrix)
df_matrix


               p.name                 n.title
0        Hugo Weaving             Cloud Atlas
1        Hugo Weaving          V for Vendetta
2        Hugo Weaving  The Matrix Revolutions
3        Hugo Weaving     The Matrix Reloaded
4  Laurence Fishburne  The Matrix Revolutions
5  Laurence Fishburne     The Matrix Reloaded
6    Carrie-Anne Moss  The Matrix Revolutions
7    Carrie-Anne Moss     The Matrix Reloaded
8        Keanu Reeves  Something's Gotta Give
9        Keanu Reeves        The Replacements
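
The actor/movie rows above can be turned into a network. The sketch below is one possible approach, not necessarily the intended solution: it builds an undirected actor-movie graph with `networkx` and draws actors and movies in different colors. The function names are illustrative, and the sample rows are the ten rows shown in the output above.

```python
import networkx as nx
import pandas as pd


def build_actor_movie_graph(df, actor_col="p.name", movie_col="n.title"):
    """Build an undirected graph linking each actor to their movies."""
    graph = nx.Graph()
    for actor, movie in zip(df[actor_col], df[movie_col]):
        graph.add_node(actor, kind="actor")
        graph.add_node(movie, kind="movie")
        graph.add_edge(actor, movie)
    return graph


def draw_actor_movie_graph(graph):
    """Draw actors in blue and movies in orange; return the Axes."""
    # Imported here because it is only needed for drawing.
    import matplotlib.pyplot as plt

    colors = ["tab:blue" if data["kind"] == "actor" else "tab:orange"
              for _, data in graph.nodes(data=True)]
    fig, ax = plt.subplots(figsize=(10, 7))
    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a stable layout
    nx.draw_networkx(graph, pos=pos, ax=ax, node_color=colors, font_size=8)
    ax.set_axis_off()
    return ax


# The ten rows from the output above; in the notebook, pass df_matrix instead.
sample_df = pd.DataFrame(
    {
        "p.name": ["Hugo Weaving"] * 4 + ["Laurence Fishburne"] * 2
                  + ["Carrie-Anne Moss"] * 2 + ["Keanu Reeves"] * 2,
        "n.title": ["Cloud Atlas", "V for Vendetta",
                    "The Matrix Revolutions", "The Matrix Reloaded",
                    "The Matrix Revolutions", "The Matrix Reloaded",
                    "The Matrix Revolutions", "The Matrix Reloaded",
                    "Something's Gotta Give", "The Replacements"],
    }
)
graph = build_actor_movie_graph(sample_df)
print(graph.number_of_nodes(), graph.number_of_edges())
```

Movies shared by several actors, such as the two Matrix sequels, end up as hubs, which makes the collaborations easy to spot once `draw_actor_movie_graph(graph)` is called in a notebook cell.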


# **Actor Connectivity Analysis**

### Calculate collaboration density, defined as (number of unique co-actors) / (number of movies)
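
A possible starting point for this exercise, offered as a sketch rather than a reference solution. `DENSITY_QUERY` is a Cypher version that assumes the `:Person`/`:Movie`/`ACTED_IN` schema used earlier in the notebook. `collaboration_density` is an illustrative pandas equivalent that works on plain actor/movie pairs. The sample pairs are the ten rows from "The Matrix" output earlier in this notebook, so the resulting numbers only illustrate the calculation.

```python
import pandas as pd

# Cypher version; it can be passed to the notebook's run_query helper.
DENSITY_QUERY = """
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
OPTIONAL MATCH (m)<-[:ACTED_IN]-(co:Person)
WHERE co <> a
WITH a, count(DISTINCT m) AS movies, count(DISTINCT co) AS co_actors
RETURN a.name AS actor, movies, co_actors,
       toFloat(co_actors) / movies AS density
ORDER BY density DESC
"""


def collaboration_density(pairs, actor_col="actor", movie_col="movie"):
    """Compute unique co-actors / movies per actor from actor-movie pairs."""
    pairs = pairs[[actor_col, movie_col]].drop_duplicates()
    # Self-join on the movie to pair every actor with the rest of the cast
    joined = pairs.merge(pairs, on=movie_col, suffixes=("", "_co"))
    joined = joined[joined[actor_col] != joined[actor_col + "_co"]]
    co_actors = joined.groupby(actor_col)[actor_col + "_co"].nunique()
    movies = pairs.groupby(actor_col)[movie_col].nunique()
    stats = pd.DataFrame({"movies": movies, "co_actors": co_actors})
    # Actors who never share a cast have no rows in the join
    stats["co_actors"] = stats["co_actors"].fillna(0).astype(int)
    stats["density"] = stats["co_actors"] / stats["movies"]
    return stats.sort_values("density", ascending=False)


# Sample pairs taken from the ten rows of "The Matrix" output (illustrative only)
sample_pairs = pd.DataFrame(
    {
        "actor": ["Hugo Weaving"] * 4 + ["Laurence Fishburne"] * 2
                 + ["Carrie-Anne Moss"] * 2 + ["Keanu Reeves"] * 2,
        "movie": ["Cloud Atlas", "V for Vendetta",
                  "The Matrix Revolutions", "The Matrix Reloaded",
                  "The Matrix Revolutions", "The Matrix Reloaded",
                  "The Matrix Revolutions", "The Matrix Reloaded",
                  "Something's Gotta Give", "The Replacements"],
    }
)
density_df = collaboration_density(sample_pairs)
print(density_df)
```

The `toFloat` call in the Cypher version avoids integer division, which would otherwise truncate the ratio.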

### Visualization


### Correlation analysis
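
One reading of this step is to check whether actors with more movies also have more distinct co-actors. The sketch below assumes that interpretation and computes a Pearson correlation with pandas. The per-actor counts were worked out by hand from the ten actor/movie rows in "The Matrix" output earlier in this notebook, so the result is illustrative only and far too small a sample to draw conclusions from.

```python
import pandas as pd

# Hand-derived counts from the ten sample rows (illustrative, not full-database values)
actor_stats = pd.DataFrame(
    {"movies": [4, 2, 2, 2], "co_actors": [2, 2, 2, 0]},
    index=["Hugo Weaving", "Laurence Fishburne",
           "Carrie-Anne Moss", "Keanu Reeves"],
)

# Series.corr defaults to the Pearson correlation coefficient
pearson_r = actor_stats["movies"].corr(actor_stats["co_actors"])
print(f"Pearson r between movies and co-actors: {pearson_r:.3f}")
```

On the full database the same call can be applied to the per-actor table produced for the collaboration density exercise.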



# **Path Analysis Between Actors (Tom Hanks to Keanu Reeves)**

### Calculate path statistics
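
A hedged sketch of how path statistics might be computed. `SHORTEST_PATH_QUERY` is a Cypher query using `shortestPath` that assumes the schema used earlier in the notebook; it takes `$source` and `$target` parameters. `path_statistics` is an illustrative helper that computes similar figures on a `networkx` graph. The toy graph is a tiny hand-built excerpt: the Hugo Weaving and Keanu Reeves edges follow from the outputs above, while the Tom Hanks edge is an assumption based on the real cast of Cloud Atlas.

```python
import networkx as nx

# Cypher version: returns one shortest path as a list of names/titles
# plus its length in relationships (hops).
SHORTEST_PATH_QUERY = """
MATCH (a:Person {name: $source}), (b:Person {name: $target})
MATCH p = shortestPath((a)-[:ACTED_IN*]-(b))
RETURN [n IN nodes(p) | coalesce(n.name, n.title)] AS path,
       length(p) AS hops
"""


def path_statistics(graph, source, target):
    """Summarize the shortest paths between two actors in an actor-movie graph."""
    paths = list(nx.all_shortest_paths(graph, source, target))
    hops = len(paths[0]) - 1
    return {
        "hops": hops,
        # Actors and movies alternate, so half of the hops lead into a movie
        "movies_on_path": hops // 2,
        "intermediate_actors": hops // 2 - 1,
        "shortest_path_count": len(paths),
        "example_path": paths[0],
    }


# Toy excerpt of the graph (the Tom Hanks edge is an assumption, see above)
toy_graph = nx.Graph()
toy_graph.add_edges_from([
    ("Tom Hanks", "Cloud Atlas"),
    ("Hugo Weaving", "Cloud Atlas"),
    ("Hugo Weaving", "The Matrix"),
    ("Keanu Reeves", "The Matrix"),
])
path_stats = path_statistics(toy_graph, "Tom Hanks", "Keanu Reeves")
print(path_stats)
```

With the notebook's helper, the Cypher version could be run as `run_query(SHORTEST_PATH_QUERY, {"source": "Tom Hanks", "target": "Keanu Reeves"})`.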

### Visualize a sample path


# **Actor Career Timeline Analysis**

### Create timeline visualization
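
A minimal sketch of a career timeline, assuming rows with `actor`, `title`, and `year` columns. `TIMELINE_QUERY` is a Cypher query that would return such rows for one actor, and the helper names are illustrative. The sample years are the films' real-world release years typed in by hand; the values stored in the database may differ.

```python
import pandas as pd

# Cypher version: one row per movie the actor appeared in
TIMELINE_QUERY = """
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
RETURN p.name AS actor, m.title AS title, m.released AS year
ORDER BY year
"""


def career_timeline(df):
    """Sort an actor's movies by year and add a running movie count."""
    timeline = df.sort_values(["year", "title"]).reset_index(drop=True)
    timeline["movies_so_far"] = list(range(1, len(timeline) + 1))
    return timeline


def plot_career_timeline(timeline):
    """Plot the running movie count over time and return the Axes."""
    # Imported here because it is only needed for drawing.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 3))
    ax.step(timeline["year"], timeline["movies_so_far"], where="post")
    ax.scatter(timeline["year"], timeline["movies_so_far"])
    for _, row in timeline.iterrows():
        ax.annotate(row["title"], (row["year"], row["movies_so_far"]),
                    textcoords="offset points", xytext=(5, 5), fontsize=8)
    ax.set_xlabel("Release year")
    ax.set_ylabel("Movies so far")
    fig.tight_layout()
    return ax


# Hand-typed sample rows (deliberately unsorted to show the sorting step)
sample_rows = pd.DataFrame(
    {
        "actor": ["Keanu Reeves"] * 3,
        "title": ["Something's Gotta Give", "The Matrix", "The Replacements"],
        "year": [2003, 1999, 2000],
    }
)
timeline_df = career_timeline(sample_rows)
print(timeline_df)
```

In the notebook, the rows could come from `run_query(TIMELINE_QUERY, {"name": "Keanu Reeves"})`, followed by `plot_career_timeline(career_timeline(...))`.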


### Career statistics
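
One way to summarize careers, sketched under the assumption that the input has `actor`, `title`, and `year` columns. The function name `career_statistics` and the chosen metrics (movie count, first and last year, span, movies per year) are illustrative choices, not taken from the notebook. The sample years are real-world release years typed in by hand; the database values may differ.

```python
import pandas as pd


def career_statistics(df):
    """Aggregate per-actor career statistics from actor/title/year rows."""
    stats = df.groupby("actor").agg(
        movies=("title", "nunique"),
        first_year=("year", "min"),
        last_year=("year", "max"),
    )
    # Count both the first and the last year as part of the career span
    stats["career_span_years"] = stats["last_year"] - stats["first_year"] + 1
    stats["movies_per_year"] = stats["movies"] / stats["career_span_years"]
    return stats


# Hand-typed sample rows (illustrative only)
sample_rows = pd.DataFrame(
    {
        "actor": ["Keanu Reeves", "Keanu Reeves", "Keanu Reeves",
                  "Hugo Weaving", "Hugo Weaving"],
        "title": ["The Matrix", "The Replacements", "Something's Gotta Give",
                  "The Matrix", "Cloud Atlas"],
        "year": [1999, 2000, 2003, 1999, 2012],
    }
)
career_df = career_statistics(sample_rows)
print(career_df)
```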
