# Movie Review (MR)
Set up the notebook to run SQL commands.

In [17]:
import pandas as pd
import sqlite3


In [18]:
# Load the movie datasets (gzipped, tab-separated files)
df1 = pd.read_csv('title.basics.tsv.gz', sep='\t')
df2 = pd.read_csv('title.crew.tsv.gz', sep='\t')
df3 = pd.read_csv('title.ratings.tsv.gz', sep='\t')
# df4 = pd.read_csv('title.principals.tsv.gz', sep='\t')



In [19]:
# Connect to the SQLite database file (created if it does not exist)
conn = sqlite3.connect('movie.db')

# Write each DataFrame to a SQLite table, replacing any existing table
df1.to_sql('basic', conn, index=False, if_exists='replace')
df2.to_sql('crew', conn, index=False, if_exists='replace')
df3.to_sql('rating', conn, index=False, if_exists='replace')


1503482

In [20]:
# Load the SQL magic extension
## Run in Google Colab
%load_ext sql
## Run in your local IDE
#%load_ext sqlite3

# Connect to the SQLite database
%sql sqlite:///movie.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [21]:
%%sql
SELECT * FROM basic LIMIT 2;

 * sqlite:///movie.db
Done.


tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
tt0000001,short,Carmencita,Carmencita,0,1894,\N,1,"Documentary,Short"
tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,\N,5,"Animation,Short"


In [22]:
%%sql
SELECT * FROM crew LIMIT 2;

 * sqlite:///movie.db
Done.


tconst,directors,writers
tt0000001,nm0005690,\N
tt0000002,nm0721526,\N


In [23]:
%%sql
SELECT * FROM rating LIMIT 2;

 * sqlite:///movie.db
Done.


tconst,averageRating,numVotes
tt0000001,5.7,2104
tt0000002,5.6,282
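
The previews above show `\N` wherever the dataset has no value (for example `endYear` and `writers`). Because the files were loaded without declaring a missing-value marker, those cells are presumably stored as literal text. The following is a minimal sketch, assuming a toy in-memory copy of `basic` (the third row is hypothetical), of turning the marker into `NULL` and the year into an integer at query time:

```python
import sqlite3

# Toy in-memory stand-in for the `basic` table; the real one lives in movie.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE basic (tconst TEXT, startYear TEXT, endYear TEXT, runtimeMinutes TEXT)"
)
conn.executemany(
    "INSERT INTO basic VALUES (?, ?, ?, ?)",
    [
        ("tt0000001", "1894", r"\N", "1"),
        ("tt0000002", "1892", r"\N", "5"),
        ("tt9999999", r"\N", r"\N", r"\N"),  # hypothetical row with no year at all
    ],
)

# NULLIF maps the literal '\N' marker to NULL; CAST makes the remaining values numeric.
rows = conn.execute(
    r"""
    SELECT tconst,
           CAST(NULLIF(startYear, '\N') AS INTEGER) AS start_year,
           CAST(NULLIF(endYear, '\N') AS INTEGER)   AS end_year
    FROM basic
    ORDER BY tconst
    """
).fetchall()
print(rows)
```

The Python strings are raw (`r"..."`) because `\N` is otherwise read as the start of a Unicode name escape.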


## Business Questions

### 1. Genre Popularity Analysis
- What are the top 5 most popular genres based on the number of films released?
- How does the average movie rating vary by genre?
- Which genres saw the biggest increase in the number of releases over the last 5 years?

SQL Skills:
- Joins (linking title.basics with title.ratings).
- Aggregation functions (`COUNT()`, `AVG()`).
- Subqueries (for filtering by recent years).
- Sorting (`ORDER BY`).

In [34]:
%%sql
-- What are the top 5 most popular genres based on the number of films released?
SELECT genres, COUNT(genres) as count_genres
FROM basic
GROUP BY genres
ORDER BY count_genres DESC
LIMIT 5;

 * sqlite:///movie.db
Done.


genres,count_genres
Drama,1276745
Comedy,741025
Talk-Show,703731
News,586493
Documentary,540870
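
The query above groups by the raw `genres` string, so a title tagged `Comedy,Drama` counts toward neither `Comedy` nor `Drama`, and every title type is included, not only films. The sketch below shows one way to count individual genres for films only. It uses toy data and assumes that `titleType = 'movie'` identifies films and that `\N` marks a missing genre:

```python
import sqlite3
from collections import Counter

# Toy in-memory table; all rows are illustrative, not taken from movie.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basic (tconst TEXT, titleType TEXT, genres TEXT)")
conn.executemany(
    "INSERT INTO basic VALUES (?, ?, ?)",
    [
        ("tt1", "movie", "Drama"),
        ("tt2", "movie", "Comedy,Drama"),
        ("tt3", "movie", "Comedy,Drama,Romance"),
        ("tt4", "short", "Animation,Short"),  # excluded: not a movie
        ("tt5", "movie", r"\N"),              # excluded: no genre recorded
    ],
)

# Split each comma-separated genre string and tally the individual genres.
genre_counts = Counter()
query = r"SELECT genres FROM basic WHERE titleType = 'movie' AND genres <> '\N'"
for (genres,) in conn.execute(query):
    genre_counts.update(genres.split(","))

top_genres = genre_counts.most_common(5)
print(top_genres)
```

Splitting in Python keeps the SQL simple. The same split could be done inside SQLite with a recursive query, at the cost of readability.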


In [37]:
%%sql
-- How does the average movie rating vary by genre?
SELECT genres, AVG(averageRating) AS avg_rating
FROM basic
JOIN rating
ON basic.tconst = rating.tconst
GROUP BY genres
ORDER BY avg_rating DESC
LIMIT 10; -- limited to 10 rows for readability; remove the LIMIT to see the full list

 * sqlite:///movie.db
Done.


genres,avg_rating
"Documentary,Musical,Reality-TV",9.5
"Animation,Musical,Reality-TV",9.4
"Reality-TV,Short,Talk-Show",9.2547619047619
"Music,War",9.25
"Comedy,Game-Show,Musical",9.21818181818182
"Biography,Crime,Reality-TV",9.216666666666669
"Family,Game-Show,Romance",9.2
"Mystery,Sci-Fi,Talk-Show",9.2
"History,Music,News",9.2
"Family,History,Talk-Show",9.2


In [38]:
%%sql
-- Same query sorted ascending, to check the lowest-rated genre combinations
SELECT genres, AVG(averageRating) AS avg_rating
FROM basic
JOIN rating
ON basic.tconst = rating.tconst
GROUP BY genres
ORDER BY avg_rating ASC
LIMIT 10;

 * sqlite:///movie.db
Done.


genres,avg_rating
"Drama,Family,Game-Show",1.7
"Biography,Reality-TV,Sport",2.3
"Comedy,Sport,War",2.5
"News,Short,War",2.7
"Sci-Fi,Thriller,War",3.2
"Fantasy,Sci-Fi,Western",3.4
"Horror,Sci-Fi,Western",3.5
"Adventure,Sci-Fi,Western",3.6
"Action,Music,Thriller",3.7
"Fantasy,Musical,Western",3.7

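The highest and lowest averages above belong to rare genre combinations, which may rest on only a handful of rated titles. A hedged sketch of one way to guard against that is to require a minimum group size with `HAVING`. The threshold `MIN_TITLES` and the toy rows are assumptions for illustration:

```python
import sqlite3

# Toy in-memory tables mirroring the `basic` and `rating` schemas.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE basic (tconst TEXT, genres TEXT);
    CREATE TABLE rating (tconst TEXT, averageRating REAL, numVotes INTEGER);
    INSERT INTO basic VALUES
        ('tt1', 'Drama'), ('tt2', 'Drama'), ('tt3', 'Drama'),
        ('tt4', 'Music,War');
    INSERT INTO rating VALUES
        ('tt1', 7.0, 100), ('tt2', 8.0, 200), ('tt3', 6.0, 300),
        ('tt4', 9.5, 5);
    """
)

MIN_TITLES = 3  # hypothetical threshold; tune it to the size of the real data

# HAVING filters whole groups after aggregation, dropping tiny genre groups.
rows = conn.execute(
    """
    SELECT genres,
           COUNT(*) AS n_titles,
           ROUND(AVG(averageRating), 2) AS avg_rating
    FROM basic
    JOIN rating ON basic.tconst = rating.tconst
    GROUP BY genres
    HAVING COUNT(*) >= ?
    ORDER BY avg_rating DESC
    """,
    (MIN_TITLES,),
).fetchall()
print(rows)
```

In this toy data the single `Music,War` title with a 9.5 rating is filtered out, so only `Drama` remains.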

In [43]:
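%%sql
-- Which genres saw the biggest increase in the number of releases over the last 5 years?
-- As a first step, count titles per genre from the last 5 years.
-- Note: DATE('now') - 5 works only because SQLite coerces the date string to its leading year.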
SELECT genres, COUNT(genres) as count_genres
FROM basic
WHERE startYear >= DATE('now')-5
GROUP BY genres
ORDER BY count_genres DESC
LIMIT 5;

 * sqlite:///movie.db
Done.


genres,count_genres
Drama,647329
Talk-Show,293744
Comedy,253035
News,219342
Documentary,177488
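
The count above shows which genres are most common in the recent window, but not how much each one grew. One possible way to measure the increase is to compare the last five years with the five years before them. In this sketch, `CURRENT_YEAR` is fixed for reproducibility and the rows are toy data. In the notebook the year could instead come from `strftime('%Y', 'now')`:

```python
import sqlite3

# Toy in-memory table; startYear is stored as text, with '\N' for missing years.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basic (tconst TEXT, startYear TEXT, genres TEXT)")
conn.executemany(
    "INSERT INTO basic VALUES (?, ?, ?)",
    [
        ("tt1", "2021", "Drama"), ("tt2", "2022", "Drama"),
        ("tt3", "2023", "Drama"), ("tt4", "2016", "Drama"),
        ("tt5", "2020", "Comedy"), ("tt6", "2015", "Comedy"),
        ("tt7", "2017", "Comedy"), ("tt8", "2018", "Comedy"),
        ("tt9", r"\N", "News"),  # no year, so it is ignored
    ],
)

CURRENT_YEAR = 2024  # assumption: fixed so the example is reproducible

# Comparisons evaluate to 0 or 1 in SQLite, so SUM() counts the rows in each window.
rows = conn.execute(
    r"""
    WITH typed AS (
        SELECT genres, CAST(NULLIF(startYear, '\N') AS INTEGER) AS yr
        FROM basic
    )
    SELECT genres,
           SUM(yr > :y - 5 AND yr <= :y)      AS recent,
           SUM(yr > :y - 10 AND yr <= :y - 5) AS previous,
           SUM(yr > :y - 5 AND yr <= :y)
             - SUM(yr > :y - 10 AND yr <= :y - 5) AS increase
    FROM typed
    WHERE yr IS NOT NULL
    GROUP BY genres
    ORDER BY increase DESC
    """,
    {"y": CURRENT_YEAR},
).fetchall()
print(rows)
```

Casting `startYear` explicitly avoids comparing text with numbers, which in SQLite can silently keep rows whose year is `\N`.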


### 2. Director Productivity Analysis
- Who are the directors with the most films in the catalog?
- What is the average rating for movies directed by each of the top 10 directors?
- Which director worked across the most genres?

SQL Skills:
- Joins (linking title.crew with title.basics).
- Aggregation functions (`COUNT()`, `AVG()`).
- Subqueries (to rank directors by number of movies).
- Grouping and filtering (`GROUP BY`, `HAVING`).
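
A minimal sketch of the first two director questions follows, assuming toy `crew` and `rating` tables with hypothetical director IDs. For simplicity it treats the `directors` value as a single ID. If the real column holds comma-separated IDs for co-directed titles, it would need to be split first:

```python
import sqlite3

# Toy in-memory tables; 'nmA' and 'nmB' are hypothetical director IDs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    r"""
    CREATE TABLE crew (tconst TEXT, directors TEXT);
    CREATE TABLE rating (tconst TEXT, averageRating REAL, numVotes INTEGER);
    INSERT INTO crew VALUES
        ('tt1', 'nmA'), ('tt2', 'nmA'), ('tt3', 'nmB'), ('tt4', '\N');
    INSERT INTO rating VALUES
        ('tt1', 6.0, 10), ('tt2', 8.0, 10), ('tt3', 9.0, 10), ('tt4', 5.0, 10);
    """
)

# Step 1 (CTE): rank directors by number of titles, skipping missing values.
# Step 2: join back to the ratings to average each top director's titles.
rows = conn.execute(
    r"""
    WITH top_directors AS (
        SELECT directors, COUNT(*) AS n_titles
        FROM crew
        WHERE directors <> '\N'
        GROUP BY directors
        ORDER BY n_titles DESC
        LIMIT 10
    )
    SELECT t.directors, t.n_titles, AVG(r.averageRating) AS avg_rating
    FROM top_directors AS t
    JOIN crew AS c ON c.directors = t.directors
    JOIN rating AS r ON r.tconst = c.tconst
    GROUP BY t.directors, t.n_titles
    ORDER BY t.n_titles DESC
    """
).fetchall()
print(rows)
```

Note that `n_titles` counts every title in `crew`, while the average only covers titles that have a row in `rating`.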

### 3. Yearly Production Trends
- Which year had the most film releases in total?
- What is the trend of movie releases over the last 10 years?
- How many movies and TV series were released in the same year?

SQL Skills:
- Date manipulation (using startYear and endYear).
- Aggregation functions (`COUNT()`).
- Subqueries (for year-over-year comparisons).
- Sorting and filtering (`GROUP BY`, `HAVING`).
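
The year-over-year comparison mentioned above can be sketched by counting titles per year and joining each year to the one before it. The rows are toy data, and `\N` is assumed to mark a missing `startYear`:

```python
import sqlite3

# Toy in-memory table with one title in 2021, three in 2022 and two in 2023.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE basic (tconst TEXT, startYear TEXT)")
conn.executemany(
    "INSERT INTO basic VALUES (?, ?)",
    [
        ("tt1", "2021"),
        ("tt2", "2022"), ("tt3", "2022"), ("tt4", "2022"),
        ("tt5", "2023"), ("tt6", "2023"),
        ("tt7", r"\N"),  # no year, so it is ignored
    ],
)

# per_year counts releases per year; the self-join pairs each year with the previous one.
rows = conn.execute(
    r"""
    WITH per_year AS (
        SELECT CAST(startYear AS INTEGER) AS yr, COUNT(*) AS n
        FROM basic
        WHERE startYear <> '\N'
        GROUP BY yr
    )
    SELECT cur.yr, cur.n, cur.n - prev.n AS change
    FROM per_year AS cur
    LEFT JOIN per_year AS prev ON prev.yr = cur.yr - 1
    ORDER BY cur.yr
    """
).fetchall()
print(rows)
```

The `LEFT JOIN` keeps the first year in the result, with a `NULL` change because there is no earlier year to compare against.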

### 4. Movie Rating Trends
- What are the top 5 highest-rated movies?
- Which genre has the highest average movie rating?
- Is there a correlation between movie runtime and average rating?

SQL Skills:
- Joins (linking title.basics and title.ratings).
- Aggregation functions (`AVG()`, `MAX()`).
- Subqueries (for filtering by ratings).
- Sorting (`ORDER BY`).
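
SQLite has no built-in correlation function, so one option for the runtime question is to fetch the (runtime, rating) pairs with SQL and compute the Pearson coefficient in Python. This is a sketch on toy data, assuming `titleType = 'movie'` selects films and `\N` marks a missing runtime:

```python
import math
import sqlite3

# Toy in-memory tables; the ratings rise linearly with runtime on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript(
    r"""
    CREATE TABLE basic (tconst TEXT, titleType TEXT, runtimeMinutes TEXT);
    CREATE TABLE rating (tconst TEXT, averageRating REAL, numVotes INTEGER);
    INSERT INTO basic VALUES
        ('tt1', 'movie', '90'), ('tt2', 'movie', '100'),
        ('tt3', 'movie', '110'), ('tt4', 'movie', '120'),
        ('tt5', 'movie', '\N');
    INSERT INTO rating VALUES
        ('tt1', 6.0, 50), ('tt2', 6.5, 50), ('tt3', 7.0, 50),
        ('tt4', 7.5, 50), ('tt5', 9.0, 50);
    """
)

# Keep only movies with a known runtime and cast the runtime to a number.
pairs = conn.execute(
    r"""
    SELECT CAST(b.runtimeMinutes AS REAL), r.averageRating
    FROM basic AS b
    JOIN rating AS r ON r.tconst = b.tconst
    WHERE b.titleType = 'movie' AND b.runtimeMinutes <> '\N'
    """
).fetchall()


def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)


r_value = pearson(pairs)
print(round(r_value, 3))
```

On the full dataset, a coefficient near 0 would suggest little linear relationship. The toy data here is perfectly linear only to make the calculation easy to check.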

### 5. Actor and Role Analysis
- Who are the most frequent actors in the top-rated movies?
- What is the average rating of movies featuring actors with more than 10 appearances?
- Which actors appear across multiple genres?

SQL Skills:
- Joins (linking title.principals and title.ratings).
- Aggregation functions (`COUNT()`, `AVG()`).
- Subqueries (to rank actors by frequency).
- Grouping (`GROUP BY`) and filtering (`HAVING`).

### 6. TV Series and Movie Comparison
- What is the average rating of TV series compared to movies?
- Which TV series has the longest runtime and how does it compare to movies of similar genres?
- What percentage of releases in the last 5 years were TV series?

SQL Skills:
- Joins (linking title.basics and title.episode).
- Aggregation functions (`AVG()`, `COUNT()`).
- Date manipulation (using startYear and endYear).
- Sorting (`ORDER BY`) and filtering (`HAVING`).
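
Two of these comparisons can be sketched with the tables already loaded, without `title.episode`. The example uses toy data and assumes the `titleType` values `movie` and `tvSeries`. The cutoff year is a fixed, hypothetical stand-in for "the last 5 years":

```python
import sqlite3

# Toy in-memory tables with two movies and two TV series.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE basic (tconst TEXT, titleType TEXT, startYear TEXT);
    CREATE TABLE rating (tconst TEXT, averageRating REAL, numVotes INTEGER);
    INSERT INTO basic VALUES
        ('tt1', 'movie', '2021'), ('tt2', 'movie', '2022'),
        ('tt3', 'tvSeries', '2022'), ('tt4', 'tvSeries', '2023');
    INSERT INTO rating VALUES
        ('tt1', 6.0, 10), ('tt2', 7.0, 10), ('tt3', 8.0, 10), ('tt4', 9.0, 10);
    """
)

# Average rating per title type.
avg_by_type = conn.execute(
    """
    SELECT b.titleType, AVG(r.averageRating) AS avg_rating
    FROM basic AS b
    JOIN rating AS r ON r.tconst = b.tconst
    WHERE b.titleType IN ('movie', 'tvSeries')
    GROUP BY b.titleType
    ORDER BY b.titleType
    """
).fetchall()

CUTOFF_YEAR = 2020  # hypothetical fixed cutoff for the recent window

# Share of TV series among all recent releases; the comparison yields 0 or 1 per row.
tv_share = conn.execute(
    """
    SELECT 100.0 * SUM(titleType = 'tvSeries') / COUNT(*)
    FROM basic
    WHERE CAST(startYear AS INTEGER) >= ?
    """,
    (CUTOFF_YEAR,),
).fetchone()[0]

print(avg_by_type, tv_share)
```

Multiplying by `100.0` rather than `100` forces floating-point division, so the percentage is not truncated to an integer.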