# Assignment 09: Join and Merge in SQL (SQLite Version)

## Due 20 June 2025

### Introduction

For this assignment, you will continue working with SQL databases using SQLite. You should use Python to write the SQL queries. If possible, please submit your answers in PDF format. The data and questions are listed below.

In [11]:
import sqlite3
import pandas as pd

# Create in-memory database
conn = sqlite3.connect(':memory:')

# Create tables
conn.execute('''
CREATE TABLE directors (
    director_id INTEGER PRIMARY KEY AUTOINCREMENT,
    director_name TEXT,
    country TEXT,
    birth_year INTEGER,
    awards INTEGER
)''')

conn.execute('''
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    director_id INTEGER,
    release_year INTEGER,
    box_office REAL,
    rating REAL,
    FOREIGN KEY (director_id) REFERENCES directors(director_id)
)''')

# Insert data
directors_data = [
    ('Christopher Nolan', 'UK', 1970, 5),
    ('Greta Gerwig', 'USA', 1983, 3),
    ('Bong Joon-ho', 'South Korea', 1969, 4),
    ('Sofia Coppola', 'USA', 1971, 2),
    ('Pedro Almodóvar', 'Spain', 1949, 6),
    ('Agnès Varda', 'France', 1928, 4)
]
conn.executemany('INSERT INTO directors (director_name, country, birth_year, awards) VALUES (?,?,?,?)', directors_data)

movies_data = [
    ('Oppenheimer', 1, 2023, 950000000.00, 8.5),
    ('Barbie', 2, 2023, 1440000000.00, 7.0),
    ('Parasite', 3, 2019, 258773645.00, 8.9),
    ('Lost in Translation', 4, 2003, 119723856.00, 7.7),
    ('Pain and Glory', 5, 2019, 38219573.00, 7.5),
    ('Faces Places', 6, 2017, 903996.00, 7.9),
    ('Inception', 1, 2010, 836836967.00, 8.8),
    ('Lady Bird', 2, 2017, 78965367.00, 7.4)
]
conn.executemany('''
    INSERT INTO movies (title, director_id, release_year, box_office, rating)
    VALUES (?,?,?,?,?)''', movies_data)
conn.commit()
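Note that SQLite parses the `FOREIGN KEY` clause in `movies` but does not enforce it unless foreign key support is switched on for the connection. The sketch below illustrates this on a separate scratch connection. The name `demo` and the row `'Orphan Movie'` are hypothetical and are not part of the assignment data.

```python
import sqlite3

# Hypothetical scratch database, separate from the assignment's `conn`
demo = sqlite3.connect(':memory:')

# Foreign key enforcement is off by default in SQLite; enable it per connection
demo.execute('PRAGMA foreign_keys = ON')

demo.execute('CREATE TABLE directors (director_id INTEGER PRIMARY KEY, director_name TEXT)')
demo.execute('''
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title TEXT,
    director_id INTEGER,
    FOREIGN KEY (director_id) REFERENCES directors(director_id)
)''')
demo.execute("INSERT INTO directors (director_name) VALUES ('Christopher Nolan')")

# director_id 99 does not exist, so this insert should be rejected
try:
    demo.execute("INSERT INTO movies (title, director_id) VALUES ('Orphan Movie', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

Without the `PRAGMA` line, the same insert would be accepted, and later joins would silently drop or NULL-pad the orphaned movie.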

1. Write a query using `INNER JOIN` to display the movie title, director name, and box office earnings for all movies, ordered by box office earnings in descending order.

In [12]:
# Write your answer here
query1 = """
SELECT movies.title, directors.director_name, movies.box_office
FROM movies
INNER JOIN directors ON movies.director_id = directors.director_id
ORDER BY movies.box_office DESC;
"""

display(pd.read_sql(query1, conn))

Unnamed: 0,title,director_name,box_office
0,Barbie,Greta Gerwig,1440000000.0
1,Oppenheimer,Christopher Nolan,950000000.0
2,Inception,Christopher Nolan,836836967.0
3,Parasite,Bong Joon-ho,258773645.0
4,Lost in Translation,Sofia Coppola,119723856.0
5,Lady Bird,Greta Gerwig,78965367.0
6,Pain and Glory,Pedro Almodóvar,38219573.0
7,Faces Places,Agnès Varda,903996.0


2. Using a `LEFT JOIN`, find all directors and count the number of movies they have directed.

In [15]:
# Write your answer here
query2 = """
SELECT directors.director_name, COUNT(movies.title) AS movie_count
FROM directors
LEFT JOIN movies ON movies.director_id = directors.director_id
GROUP BY directors.director_name;
"""
display(pd.read_sql(query2, conn))

Unnamed: 0,director_name,movie_count
0,Agnès Varda,1
1,Bong Joon-ho,1
2,Christopher Nolan,2
3,Greta Gerwig,2
4,Pedro Almodóvar,1
5,Sofia Coppola,1


3. Write a `SELF JOIN` query to compare the ratings of movies by the same director. Show only pairs where the second movie has a higher rating than the first.

In [16]:
# Write your answer here
query3 = """
SELECT movies1.director_id,
       movies1.title AS movie1,
       movies2.title AS movie2,
       movies1.rating AS first_movie_rating,
       movies2.rating AS second_movie_rating
FROM movies AS movies1
JOIN movies AS movies2 ON movies1.director_id = movies2.director_id
WHERE movies2.rating > movies1.rating;
"""

display(pd.read_sql(query3, conn))

Unnamed: 0,director_id,movie1,movie2,first_movie_rating,second_movie_rating
0,1,Oppenheimer,Inception,8.5,8.8
1,2,Barbie,Lady Bird,7.0,7.4


4. Using appropriate joins, find directors who have made movies with above-average box office earnings (compared to all movies in the database).

In [20]:
# Write your answer here
query4 = """
SELECT directors.director_name, movies.title, movies.box_office
FROM movies
LEFT JOIN directors ON movies.director_id = directors.director_id
WHERE movies.box_office > (SELECT AVG(box_office) AS average_box_office FROM movies);
"""
display(pd.read_sql(query4, conn))

Unnamed: 0,director_name,title,box_office
0,Christopher Nolan,Oppenheimer,950000000.0
1,Greta Gerwig,Barbie,1440000000.0
2,Christopher Nolan,Inception,836836967.0


5. Create a query using `CROSS JOIN` to show all possible combinations of directors and movies, even if they did not direct them. Limit the output to 10 rows.

In [21]:
# Write your answer here
query5 = """
SELECT directors.director_name, movies.title
FROM directors
CROSS JOIN movies
LIMIT 10;
"""
display(pd.read_sql(query5, conn))

Unnamed: 0,director_name,title
0,Christopher Nolan,Oppenheimer
1,Christopher Nolan,Barbie
2,Christopher Nolan,Parasite
3,Christopher Nolan,Lost in Translation
4,Christopher Nolan,Pain and Glory
5,Christopher Nolan,Faces Places
6,Christopher Nolan,Inception
7,Christopher Nolan,Lady Bird
8,Greta Gerwig,Oppenheimer
9,Greta Gerwig,Barbie
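A `CROSS JOIN` pairs every row of one table with every row of the other, so 6 directors and 8 movies give 48 combinations before `LIMIT` is applied. Because the query above has no `ORDER BY`, which 10 rows come back is up to the database engine. The sketch below checks the row count and adds an explicit ordering. It uses a scratch connection named `demo` (an assumption) holding only the name and title columns.

```python
import sqlite3

# Scratch tables holding only the names and titles from the assignment data
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE directors (director_name TEXT)')
demo.execute('CREATE TABLE movies (title TEXT)')
demo.executemany('INSERT INTO directors VALUES (?)', [
    ('Christopher Nolan',), ('Greta Gerwig',), ('Bong Joon-ho',),
    ('Sofia Coppola',), ('Pedro Almodóvar',), ('Agnès Varda',),
])
demo.executemany('INSERT INTO movies VALUES (?)', [
    ('Oppenheimer',), ('Barbie',), ('Parasite',), ('Lost in Translation',),
    ('Pain and Glory',), ('Faces Places',), ('Inception',), ('Lady Bird',),
])

# Size of the full Cartesian product
total_pairs = demo.execute(
    'SELECT COUNT(*) FROM directors CROSS JOIN movies').fetchone()[0]

# ORDER BY makes the 10 returned rows reproducible
first_ten = demo.execute('''
SELECT d.director_name, m.title
FROM directors AS d
CROSS JOIN movies AS m
ORDER BY d.director_name, m.title
LIMIT 10
''').fetchall()

print(total_pairs, len(first_ten))
```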


6. Write a query that uses `UNION` to create a list of all director names and movie titles in a single column. Label the column `name` and include a column (called `type`) indicating if it is a director or movie. Order the results by type and name.

In [22]:
# Write your answer here
query6 = """
SELECT 'director' AS type, director_name AS name
FROM directors 
UNION
SELECT 'movie' AS type, title AS name
FROM movies
ORDER BY type, name
"""

display(pd.read_sql(query6, conn))

Unnamed: 0,type,name
0,director,Agnès Varda
1,director,Bong Joon-ho
2,director,Christopher Nolan
3,director,Greta Gerwig
4,director,Pedro Almodóvar
5,director,Sofia Coppola
6,movie,Barbie
7,movie,Faces Places
8,movie,Inception
9,movie,Lady Bird
10,movie,Lost in Translation
11,movie,Oppenheimer
12,movie,Pain and Glory
13,movie,Parasite

7. Using appropriate joins, find the director with the highest average movie rating. Show only the row with the director's name, average rating, and number of movies.

In [51]:
# Write your answer here
query7 = """
SELECT directors.director_name, AVG(rating) AS average_movie_rating, COUNT(movies.movie_id) AS movie_count
FROM movies
LEFT JOIN directors ON movies.director_id = directors.director_id
GROUP BY directors.director_id
ORDER BY average_movie_rating DESC
LIMIT 1;
"""
display(pd.read_sql(query7, conn))

Unnamed: 0,director_name,average_movie_rating,movie_count
0,Bong Joon-ho,8.9,1
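`ORDER BY ... LIMIT 1` returns a single row even when several directors share the top average, and which one is returned is then arbitrary. The sketch below shows a tie-aware alternative that compares each average with the maximum. It runs on a scratch connection with hypothetical directors and ratings chosen to produce a tie. The names `demo`, `director_stats`, `Director A`, and `Director B` are all assumptions.

```python
import sqlite3

# Hypothetical scratch data: Director A averages 8.5 over two movies,
# Director B scores 8.5 with one movie, Director C trails at 7.0
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE directors (director_id INTEGER PRIMARY KEY, director_name TEXT)')
demo.execute('CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, director_id INTEGER, rating REAL)')
demo.executemany('INSERT INTO directors (director_name) VALUES (?)',
                 [('Director A',), ('Director B',), ('Director C',)])
demo.executemany('INSERT INTO movies (director_id, rating) VALUES (?, ?)',
                 [(1, 8.0), (1, 9.0), (2, 8.5), (3, 7.0)])

# Compute per-director statistics once, then keep every row matching the maximum
top_directors = demo.execute('''
WITH director_stats AS (
    SELECT d.director_name,
           AVG(m.rating) AS average_movie_rating,
           COUNT(m.movie_id) AS movie_count
    FROM directors AS d
    INNER JOIN movies AS m ON m.director_id = d.director_id
    GROUP BY d.director_id, d.director_name
)
SELECT director_name, average_movie_rating, movie_count
FROM director_stats
WHERE average_movie_rating = (SELECT MAX(average_movie_rating) FROM director_stats)
ORDER BY director_name
''').fetchall()

print(top_directors)
```

On the assignment data this approach would still return the single row for Bong Joon-ho, since no other director matches his average of 8.9.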


8. Create a query using `LEFT JOIN` and `IS NULL` to find whether there are directors who have not directed any movies.

In [23]:
# Write your answer here
query8 = """
SELECT directors.director_name
FROM directors 
LEFT JOIN movies ON directors.director_id = movies.director_id
WHERE movies.movie_id IS NULL;
"""
display(pd.read_sql(query8, conn))

Unnamed: 0,director_name
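The empty result above means every director in the dataset has at least one movie. To see the `LEFT JOIN ... IS NULL` pattern return a row, the sketch below adds a hypothetical director without movies on a scratch connection and compares the pattern with an equivalent `NOT EXISTS` subquery. The names `demo` and `'Hypothetical Newcomer'` are assumptions.

```python
import sqlite3

# Hypothetical scratch data: the second director has no movies
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE directors (director_id INTEGER PRIMARY KEY, director_name TEXT)')
demo.execute('CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT, director_id INTEGER)')
demo.executemany('INSERT INTO directors (director_name) VALUES (?)',
                 [('Bong Joon-ho',), ('Hypothetical Newcomer',)])
demo.execute("INSERT INTO movies (title, director_id) VALUES ('Parasite', 1)")

# Anti-join: keep only directors whose LEFT JOIN found no matching movie
left_join_rows = demo.execute('''
SELECT d.director_name
FROM directors AS d
LEFT JOIN movies AS m ON m.director_id = d.director_id
WHERE m.movie_id IS NULL
''').fetchall()

# Equivalent formulation with a correlated subquery
not_exists_rows = demo.execute('''
SELECT d.director_name
FROM directors AS d
WHERE NOT EXISTS (SELECT 1 FROM movies AS m WHERE m.director_id = d.director_id)
''').fetchall()

print(left_join_rows, not_exists_rows)
```

Testing the primary key `movies.movie_id` for NULL is a deliberate choice: a primary key is never NULL in a real row, so a NULL there can only come from the join padding.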


9. Using appropriate joins, find pairs of movies released in the same year, along with their directors' names. Please do not match a movie with itself.

In [25]:
# Write your answer here
query9 = """
SELECT movies.title, directors.director_name, movies.release_year
FROM movies
LEFT JOIN directors ON movies.director_id = directors.director_id
WHERE release_year IN (SELECT release_year FROM movies GROUP BY release_year HAVING COUNT(*) > 1)
"""
display(pd.read_sql(query9, conn))

Unnamed: 0,title,director_name,release_year
0,Oppenheimer,Christopher Nolan,2023
1,Barbie,Greta Gerwig,2023
2,Parasite,Bong Joon-ho,2019
3,Pain and Glory,Pedro Almodóvar,2019
4,Faces Places,Agnès Varda,2017
5,Lady Bird,Greta Gerwig,2017
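The query above lists each movie that shares a release year with another movie, one movie per row. If the question is read as asking for one row per pair, a self join on `release_year` is one way to get it. The sketch below reloads the relevant columns into a scratch connection named `demo` (an assumption) and uses `m1.movie_id < m2.movie_id` so that a movie is never paired with itself and each pair appears only once.

```python
import sqlite3

# Scratch copy of the assignment data, limited to the columns this query needs
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE directors (director_id INTEGER PRIMARY KEY, director_name TEXT)')
demo.execute('''CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY, title TEXT, director_id INTEGER, release_year INTEGER)''')
demo.executemany('INSERT INTO directors (director_name) VALUES (?)', [
    ('Christopher Nolan',), ('Greta Gerwig',), ('Bong Joon-ho',),
    ('Sofia Coppola',), ('Pedro Almodóvar',), ('Agnès Varda',),
])
demo.executemany('INSERT INTO movies (title, director_id, release_year) VALUES (?, ?, ?)', [
    ('Oppenheimer', 1, 2023), ('Barbie', 2, 2023), ('Parasite', 3, 2019),
    ('Lost in Translation', 4, 2003), ('Pain and Glory', 5, 2019),
    ('Faces Places', 6, 2017), ('Inception', 1, 2010), ('Lady Bird', 2, 2017),
])

# Self join on release_year; the inequality removes self-matches and mirrored pairs
pairs = demo.execute('''
SELECT m1.release_year,
       m1.title AS movie1, d1.director_name AS director1,
       m2.title AS movie2, d2.director_name AS director2
FROM movies AS m1
INNER JOIN movies AS m2
        ON m1.release_year = m2.release_year
       AND m1.movie_id < m2.movie_id
INNER JOIN directors AS d1 ON d1.director_id = m1.director_id
INNER JOIN directors AS d2 ON d2.director_id = m2.director_id
ORDER BY m1.release_year, m1.title
''').fetchall()

for pair in pairs:
    print(pair)
```

Replacing `<` with `<>` would still exclude self-matches but would list every pair twice, once in each order.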


10. Show the age of each director when they released their movies. Create a column entitled `age_at_release` in your output. Order the results by the director's name and the movie's release year.

In [27]:
# Write your answer here
query10 = """
SELECT directors.director_name, movies.release_year, movies.title, (release_year-birth_year) AS age_at_release
FROM movies
LEFT JOIN directors ON movies.director_id = directors.director_id
ORDER BY director_name, release_year ASC
"""
display(pd.read_sql(query10, conn))

Unnamed: 0,director_name,release_year,title,age_at_release
0,Agnès Varda,2017,Faces Places,89
1,Bong Joon-ho,2019,Parasite,50
2,Christopher Nolan,2010,Inception,40
3,Christopher Nolan,2023,Oppenheimer,53
4,Greta Gerwig,2017,Lady Bird,34
5,Greta Gerwig,2023,Barbie,40
6,Pedro Almodóvar,2019,Pain and Glory,70
7,Sofia Coppola,2003,Lost in Translation,32


Good luck! 😃