# Has Many Movie Lab

### Introduction
In this lab we will continue to look at "has many" relationships in our data. The database for this lab contains information about a selection of movies and related entities: actors, directors, and writers. Each movie is related to its actors, directors, and writers through a join table (for example, `movie_actors`). Actors, directors, and writers are also related to one another through the movies they share (for example, a director will have worked with many actors). In the problems below, we will use our knowledge of these relationships to build SQL queries.

Let's begin by connecting to the database and reviewing the schema of the tables.

In [1]:
import sqlite3
conn = sqlite3.connect('movie_films_actors.db')
cursor = conn.cursor()

In [2]:
import pandas as pd
root_url = "https://raw.githubusercontent.com/jigsawlabs-student/curriculum-images/main/has-many-movies-lab/"
names = ['actors', 'directors', 'movies', 'writers', 'movie_actors', 'movie_directors', 'movie_writers']
loaded_dfs = [pd.read_csv(f'{root_url}{name}.csv') for name in names]

In [3]:
for index, name in enumerate(names):
    # if_exists='replace' lets this cell be rerun without a "table already exists" error
    loaded_dfs[index].to_sql(name, conn, index=False, if_exists='replace')

In [4]:
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()

[('actors',),
 ('directors',),
 ('movies',),
 ('writers',),
 ('movie_actors',),
 ('movie_directors',),
 ('movie_writers',)]

In [5]:
cursor.execute('PRAGMA table_info(movies)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'title', 'TEXT', 0, None, 0),
 (2, 'studio', 'TEXT', 0, None, 0),
 (3, 'runtime', 'REAL', 0, None, 0),
 (4, 'description', 'TEXT', 0, None, 0),
 (5, 'release_date', 'TEXT', 0, None, 0),
 (6, 'year', 'INTEGER', 0, None, 0)]

In [6]:
cursor.execute('PRAGMA table_info(actors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'name', 'TEXT', 0, None, 0)]

In [8]:
cursor.execute('PRAGMA table_info(directors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'name', 'TEXT', 0, None, 0)]

In [7]:
cursor.execute('PRAGMA table_info(writers)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0), (1, 'name', 'TEXT', 0, None, 0)]

In [9]:
cursor.execute('PRAGMA table_info(movie_actors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'movie_id', 'INTEGER', 0, None, 0),
 (2, 'actor_id', 'INTEGER', 0, None, 0)]

In [10]:
cursor.execute('PRAGMA table_info(movie_directors)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'movie_id', 'INTEGER', 0, None, 0),
 (2, 'director_id', 'INTEGER', 0, None, 0)]

In [11]:
cursor.execute('PRAGMA table_info(movie_writers)')
cursor.fetchall()

[(0, 'id', 'INTEGER', 0, None, 0),
 (1, 'movie_id', 'INTEGER', 0, None, 0),
 (2, 'writer_id', 'INTEGER', 0, None, 0)]
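The three `movie_*` tables are join tables: each row pairs one movie id with one actor, director, or writer id. As a minimal sketch of how such a table links two entities, the example below builds a tiny in-memory database with the same column names. The rows ('Movie A', 'Actor X', and so on) and the `demo` connection are made up for illustration and are not taken from the lab database.

```python
import sqlite3

# Hypothetical toy data; a separate in-memory connection so the lab's
# `conn` and `cursor` are left untouched.
demo = sqlite3.connect(":memory:")
demo.executescript("""
create table movies (id integer, title text);
create table actors (id integer, name text);
create table movie_actors (id integer, movie_id integer, actor_id integer);
insert into movies values (1, 'Movie A'), (2, 'Movie B');
insert into actors values (10, 'Actor X'), (11, 'Actor Y');
insert into movie_actors values (100, 1, 10), (101, 1, 11), (102, 2, 10);
""")

# Walk from each movie to its actors through the join table.
rows = demo.execute("""
select m.title, a.name
from movies m
join movie_actors ma on ma.movie_id = m.id
join actors a on a.id = ma.actor_id
order by m.title, a.name
""").fetchall()
print(rows)
```

Each movie appears once per credited actor, which is why later questions combine these joins with `count(*)` and `group by`.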

Let's start off with some basic single-table queries:

* What is the title, length, and id of the movie with the longest runtime?

In [13]:
query = """
select id,
      title,
      runtime
from movies
order by runtime desc
limit 1
"""


cursor.execute(query)
cursor.fetchall()

# [(11415, 'Never Sleep Again: The Elm Street Legacy', 480.0)]

[(11415, 'Never Sleep Again: The Elm Street Legacy', 480.0)]

* Using your answer from the previous question, how many actors were credited for the movie with the longest runtime? Hint: Use the COUNT function with the movie ID

In [20]:
query = """
select count(*)
from movie_actors
where movie_id = 11415
"""


cursor.execute(query)
cursor.fetchall()

# [(6,)]

[(6,)]
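The query above hard-codes the id found in the previous step. One possible alternative, sketched here on made-up rows rather than the lab data, is to fetch the id in Python and pass it to the second query through a `?` placeholder. The `demo` connection and the `longest_id` and `actor_count` names are illustrative only.

```python
import sqlite3

# Hypothetical toy data, not taken from the lab database.
demo = sqlite3.connect(":memory:")
demo.executescript("""
create table movies (id integer, title text, runtime real);
create table movie_actors (id integer, movie_id integer, actor_id integer);
insert into movies values (1, 'Short', 80.0), (2, 'Long', 200.0);
insert into movie_actors values (1, 1, 10), (2, 2, 10), (3, 2, 11), (4, 2, 12);
""")

# Step 1: find the id of the movie with the longest runtime.
longest_id = demo.execute(
    "select id from movies order by runtime desc limit 1"
).fetchone()[0]

# Step 2: count its credited actors, passing the id as a bound parameter.
actor_count = demo.execute(
    "select count(*) from movie_actors where movie_id = ?", (longest_id,)
).fetchone()[0]
print(longest_id, actor_count)
```

Binding the value this way keeps the second query reusable for any movie id.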

* What was the shortest movie released in 2006?

In [18]:
query = """
select
      title
from movies
where year = 2006
order by runtime asc
limit 1
"""


cursor.execute(query)
cursor.fetchall()

# [('The Guardian',)]

[('The Guardian',)]

### Has Many

* What are the names of the actors in Toy Story?

In [23]:
query = """
select
      a.name
from movies m
join movie_actors ma on m.id = ma.movie_id
join actors a on a.id = ma.actor_id
where m.title = 'Toy Story'
"""


cursor.execute(query)
cursor.fetchall()

# [('Tom Hanks',),
#  ('Jim Varney',),
#  ('Wallace Shawn',),
#  ('Don Rickles',),
#  ('John Ratzenberger',),
#  ('Tim Allen',)]

[('Tom Hanks',),
 ('Tim Allen',),
 ('Jim Varney',),
 ('Wallace Shawn',),
 ('Don Rickles',),
 ('John Ratzenberger',)]

* What is the name of the director of Toy Story?

In [24]:

query = """
select
      d.name
from movies m
join movie_directors md on m.id = md.movie_id
join directors d on d.id = md.director_id
where m.title = 'Toy Story'
"""


cursor.execute(query)
cursor.fetchall()

# [('John Lasseter',)]

[('John Lasseter',)]

* What are the names of the writers of Toy Story?

In [26]:

query = """
select
      w.name
from movies m
join movie_writers mw on m.id = mw.movie_id
join writers w on w.id = mw.writer_id
where m.title = 'Toy Story'
"""


cursor.execute(query)
cursor.fetchall()


# [('Joss Whedon',), ('Joel Cohen',), ('Andrew Stanton',), ('Alec Sokolow',)]

[('Joss Whedon',), ('Joel Cohen',), ('Andrew Stanton',), ('Alec Sokolow',)]

* What is the name and actor id of the actor with the most credits in the database?

In [32]:
actor = """
select
      a.name, ma.actor_id as id,
      count(*)
from movie_actors ma
join actors a on a.id = ma.actor_id
group by ma.actor_id
order by count(*) desc
limit 1
"""

cursor.execute(actor)
cursor.fetchall()


# [('Robert De Niro', 429, 78)]

[('Robert De Niro', 429, 78)]

* What are the titles of the movies the actor from the previous question has been in, after the year 2005?

In [38]:
query = """
select
      m.title
from movies m
join movie_actors ma on m.id = ma.movie_id
join actors a on a.id = ma.actor_id
where a.id = 429 and m.year > 2005

"""

cursor.execute(query)
cursor.fetchall()

# [("New Year's Eve",),
#  ('Mr. Warmth: The Don Rickles Project',),
#  ('Hands of Stone',),
#  ('Last Vegas',),
#  ('I Knew It Was You: Rediscovering John Cazale',),
#  ('Stardust',),
#  ('Killer Elite',),
#  ("Everybody's Fine",),
#  ('Stone',),
#  ('Machete',),
#  ('Red Lights',),
#  ('Righteous Kill',),
#  ('The Good Shepherd',),
#  ('The Bag Man',),
#  ('Being Flynn',),
#  ('Joy',),
#  ('The Wizard of Lies',),
#  ('Limitless',),
#  ('Killing Season',),
#  ('The Family',),
#  ('Heist',),
#  ('Great Expectations',),
#  ('Little Fockers',),
#  ('What Just Happened?',),
#  ('The Comedian',),
#  ('The Big Wedding',),
#  ('Dirty Grandpa',),
#  ('Grudge Match',)]

[("New Year's Eve",),
 ('Mr. Warmth: The Don Rickles Project',),
 ('Hands of Stone',),
 ('Last Vegas',),
 ('I Knew It Was You: Rediscovering John Cazale',),
 ('Stardust',),
 ('Killer Elite',),
 ("Everybody's Fine",),
 ('Stone',),
 ('Machete',),
 ('Red Lights',),
 ('Righteous Kill',),
 ('The Good Shepherd',),
 ('The Bag Man',),
 ('Being Flynn',),
 ('Joy',),
 ('The Wizard of Lies',),
 ('Limitless',),
 ('Killing Season',),
 ('The Family',),
 ('Heist',),
 ('Great Expectations',),
 ('Little Fockers',),
 ('What Just Happened?',),
 ('The Comedian',),
 ('The Big Wedding',),
 ('Dirty Grandpa',),
 ('Grudge Match',)]

* What are the titles of movies with more than two directors -- order by title ascending and limit to the first five results

In [55]:
query = """
select m.title, count(*)
from movies m
join movie_directors md on m.id = md.movie_id
join directors d on d.id = md.director_id
group by md.movie_id
having count(*) > 2
order by m.title asc
limit 5
"""

cursor.execute(query)
cursor.fetchall()

# [('101 Dalmatians',),
#  ('11/8/2016',),
#  ('A Crude Awakening: The Oil Crash',),
#  ('A Farewell To Arms',),
#  ("A Liar's Autobiography - The Untrue Story of Monty Python's Graham Chapman",)]

[('101 Dalmatians', 4),
 ('11/8/2016', 3),
 ('A Crude Awakening: The Oil Crash', 3),
 ('A Farewell To Arms', 3),
 ("A Liar's Autobiography - The Untrue Story of Monty Python's Graham Chapman",
  3)]
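The `having` clause filters groups after `count(*)` has been computed, whereas `where` filters individual rows before grouping. A small sketch on made-up data (the `demo` connection and its rows are assumptions, not part of the lab database) shows the pattern in isolation:

```python
import sqlite3

# Hypothetical toy data, not taken from the lab database.
demo = sqlite3.connect(":memory:")
demo.executescript("""
create table movies (id integer, title text);
create table movie_directors (id integer, movie_id integer, director_id integer);
insert into movies values (1, 'Solo'), (2, 'Trio');
insert into movie_directors values (1, 1, 50), (2, 2, 50), (3, 2, 51), (4, 2, 52);
""")

# Group the credit rows per movie, then keep only groups with more than two rows.
rows = demo.execute("""
select m.title, count(*) as director_count
from movies m
join movie_directors md on md.movie_id = m.id
group by m.id
having count(*) > 2
order by m.title
""").fetchall()
print(rows)
```

Only 'Trio' survives the filter because it is the only movie with more than two director rows.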

### Has Many Through

* What is the name of the writer in the database that has been credited the most times during the year 2018?

In [61]:
query = """
select
  w.name, count(*)
from movies m
join movie_writers mw
  on m.id = mw.movie_id
join writers w
  on w.id = mw.writer_id
where m.year = 2018
 group by w.id
 order by count(*) desc
limit 1
"""

cursor.execute(query)
cursor.fetchall()

# [('Ryan Engle', 3)]

[('Ryan Engle', 3)]

* What is the name of the actor or actress in the database that has been credited the most between 2010 and 2015 (inclusive)?

In [63]:
query = """
select
  a.name, count(*)
from movies m
join movie_actors ma
  on m.id = ma.movie_id
join actors a
  on a.id = ma.actor_id
where m.year >= 2010 and m.year <= 2015
 group by a.id
 order by count(*) desc
limit 1
"""

cursor.execute(query)
cursor.fetchall()

# [('James Franco', 22)]

[('James Franco', 22)]
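The inclusive range `m.year >= 2010 and m.year <= 2015` can also be written with SQL's `between`, which includes both endpoints. The sketch below checks that the two forms agree on a few made-up rows; the `demo` connection and its data are illustrative only.

```python
import sqlite3

# Hypothetical toy data, not taken from the lab database.
demo = sqlite3.connect(":memory:")
demo.executescript("""
create table movies (id integer, title text, year integer);
insert into movies values (1, 'A', 2009), (2, 'B', 2010), (3, 'C', 2015), (4, 'D', 2016);
""")

# Explicit comparison operators, as used in the lab query.
explicit = demo.execute(
    "select title from movies where year >= 2010 and year <= 2015 order by title"
).fetchall()

# Equivalent filter using BETWEEN, which is inclusive on both ends.
between = demo.execute(
    "select title from movies where year between 2010 and 2015 order by title"
).fetchall()
print(explicit, between)
```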

* What are the names of all actors who performed in more than 3 movies in 2010?

In [None]:
query = """
select
  a.name
from movies m
join movie_actors ma
  on m.id = ma.movie_id
join actors a
  on a.id = ma.actor_id
where m.year = 2010
group by
  a.id
having count(*) > 3
order by a.name
"""

cursor.execute(query)
cursor.fetchall()

# [('Aaron Taylor-Johnson',),
#  ('Adam Scott',),
#  ('Barry Pepper',),
#  ('Ben Stiller',),
#  ('Danny Huston',),
#  ('Gemma Arterton',),
#  ('Helen Mirren',),
#  ('Jay Baruchel',),
#  ('Jessica Alba',),
#  ('Jonah Hill',),
#  ('Josh Brolin',),
#  ('Josh Duhamel',),
#  ('Keith David',),
#  ('Liam Neeson',),
#  ('Matt Damon',),
#  ('Melissa Leo',),
#  ('Patricia Clarkson',),
#  ('Pierce Brosnan',),
#  ('Ralph Fiennes',),
#  ('Susan Sarandon',),
#  ('Zach Galifianakis',)]

* What studio has Steven Spielberg worked with the most?

In [78]:
query = """
select
  m.studio , count(*)
from movies m
join movie_directors md
  on m.id = md.movie_id
join directors d
  on d.id = md.director_id
where d.name = 'Steven Spielberg'
group by
  m.studio
order by count(*) desc

"""

cursor.execute(query)
cursor.fetchall()

# [('Universal Pictures', 7)]

[('Universal Pictures', 7),
 ('Paramount Pictures', 6),
 ('Warner Bros. Pictures', 2),
 ('Walt Disney Pictures', 2),
 ('MCA Universal Home Video', 2),
 ('Dreamworks Pictures', 2),
 ('Dreamworks', 2),
 ('DreamWorks SKG', 2),
 ('Warner Home Video', 1),
 ('Universal City Studios', 1),
 ('TriStar Pictures', 1),
 ('Sony Pictures Releasing', 1),
 ('Paramount', 1),
 ('Dreamworks Distribution LLC', 1),
 ('20th Century Fox', 1)]

* What years did Steven Spielberg direct 2 movies?

In [82]:
query = """
select
  m.year , count(*)
from movies m
join movie_directors md
  on m.id = md.movie_id
join directors d
  on d.id = md.director_id
where d.name = 'Steven Spielberg'
group by
  m.year
having count(*) = 2
order by m.year

"""

cursor.execute(query)
cursor.fetchall()

# [(1989, 2), (1993, 2), (1997, 2), (2002, 2),
# (2005, 2), (2011, 2), (2018, 2)]

[(1989, 2), (1993, 2), (1997, 2), (2002, 2), (2005, 2), (2011, 2), (2018, 2)]

* How many movies has each of the actors from Toy Story been in? (movie ID is 3648)

In [88]:
query = """
select
  a.name, count(m.id)
from movies m
join movie_actors ma
  on m.id = ma.movie_id
join actors a
  on a.id = ma.actor_id
where a.id in (select actor_id
               from movie_actors
               where movie_id = 3648)
group by
  a.id
order by count(m.id) desc

"""

cursor.execute(query)
cursor.fetchall()

# [('Tom Hanks', 46),
#  ('Wallace Shawn', 27),
#  ('Tim Allen', 20),
#  ('Don Rickles', 10),
#  ('Jim Varney', 8),
#  ('John Ratzenberger', 7)]

[('Tom Hanks', 46),
 ('Wallace Shawn', 27),
 ('Tim Allen', 20),
 ('Don Rickles', 10),
 ('Jim Varney', 8),
 ('John Ratzenberger', 7)]

* What are the names of other movies the director of Toy Story directed?

In [91]:
query = """
select
  m.title
from movies m
join movie_directors md
  on m.id = md.movie_id
join directors d
  on d.id = md.director_id
where d.id in (select d.id from movies m
                    join movie_directors md on m.id = md.movie_id
                    join directors d on d.id = md.director_id
                  where m.id = 3648)

"""

cursor.execute(query)
cursor.fetchall()

# [('Cars 2',), ('Cars',), ("A Bug's Life",), ('Toy Story 2',), ('Toy Story',)]

[('Cars 2',), ('Cars',), ("A Bug's Life",), ('Toy Story 2',), ('Toy Story',)]
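The result above also contains Toy Story itself. If only the other movies are wanted, one option is to exclude the reference movie's id in the outer query. This is sketched below on made-up rows; the `demo` connection, the titles, and `reference_movie_id` are hypothetical.

```python
import sqlite3

# Hypothetical toy data, not taken from the lab database.
demo = sqlite3.connect(":memory:")
demo.executescript("""
create table movies (id integer, title text);
create table movie_directors (id integer, movie_id integer, director_id integer);
insert into movies values (1, 'First Film'), (2, 'Second Film'), (3, 'Unrelated Film');
insert into movie_directors values (1, 1, 50), (2, 2, 50), (3, 3, 51);
""")

reference_movie_id = 1

# The subquery finds the reference movie's director(s); the outer query
# lists their movies and drops the reference movie itself.
other_titles = demo.execute("""
select m.title
from movies m
join movie_directors md on md.movie_id = m.id
where md.director_id in (select director_id
                         from movie_directors
                         where movie_id = ?)
  and m.id != ?
order by m.title
""", (reference_movie_id, reference_movie_id)).fetchall()
print(other_titles)
```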

* What are the names of all the directors Tom Hanks has worked with? (Actor ID 189) -- order by the director's name ascending and limit to the first five results.

In [120]:
query = """
select distinct d.name from movies m
                    join movie_directors md on m.id = md.movie_id
                    join directors d on d.id = md.director_id
                    join movie_actors ma on m.id = ma.movie_id
                    join actors a on ma.actor_id = a.id
                    where a.id = 189
order by d.name asc
limit 5

"""

cursor.execute(query)
cursor.fetchall()

# [('Alexander Mackendrick',),
#  ('Angus MacLane',),
#  ('Brian DePalma',),
#  ('Chris Paine',),
#  ('Clint Eastwood',)]

[('Alexander Mackendrick',),
 ('Angus MacLane',),
 ('Brian DePalma',),
 ('Chris Paine',),
 ('Clint Eastwood',)]

* What is the name of the director Tom Hanks has worked with the most?

In [111]:
query = """
select
  d.name, count(*)
from movies m
                    join movie_directors md on m.id = md.movie_id
                    join directors d on d.id = md.director_id
                    join movie_actors ma on m.id = ma.movie_id
                    join actors a on ma.actor_id = a.id
                    where a.id = 189
group by d.id
order by count(*) desc
limit 1

"""

cursor.execute(query)
cursor.fetchall()

# [('Steven Spielberg', 5)]

[('Steven Spielberg', 5)]

* What are the names of the writers Tom Hanks has worked with? (Actor ID 189) Limit to the first ten results.

In [118]:
query = """
select distinct w.name from movies m
                    join movie_writers mw on m.id = mw.movie_id
                    join writers w on w.id = mw.writer_id
                    join movie_actors ma on m.id = ma.movie_id
                    join actors a on ma.actor_id = a.id
                    where a.id = 189
limit 10
"""

cursor.execute(query)
cursor.fetchall()

# [('Eric Roth',),
#  ('Nia Vardalos',),
#  ('Tom Hanks',),
#  ('Gary Ross',),
#  ('Anne Spielberg',),
#  ('Chris Paine',),
#  ('Scott Frank',),
#  ('Robert Rodat',),
#  ('Frank Darabont',),
#  ('Tom Tykwer',)]

[('Eric Roth',),
 ('Nia Vardalos',),
 ('Tom Hanks',),
 ('Gary Ross',),
 ('Anne Spielberg',),
 ('Chris Paine',),
 ('Scott Frank',),
 ('Robert Rodat',),
 ('Frank Darabont',),
 ('Tom Tykwer',)]

### Conclusion
The movie database we queried in this lab has many relationships between its tables. We saw how to use JOIN to connect those tables in order to query information about entities in different tables.