# Movie Review (MR)
Set up the notebook to run SQL commands.

In [1]:
import pandas as pd
import sqlite3


In [None]:
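# Load the IMDb datasets (gzipped TSV files; missing values are encoded as \N)
df1 = pd.read_csv('name.basics.tsv.gz', sep='\t', na_values='\\N')
df2 = pd.read_csv('title.crew.tsv.gz', sep='\t', na_values='\\N')
df3 = pd.read_csv('title.principals.tsv.gz', sep='\t', na_values='\\N')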

MemoryError: Unable to allocate 682. MiB for an array with shape (89365028,) and data type int64
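
The `MemoryError` above suggests that the files are too large to hold in memory as complete DataFrames. One possible workaround, sketched here as an assumption rather than as this notebook's own method, is to stream each file into SQLite in chunks using the `chunksize` parameter of `pd.read_csv`. The helper name `load_tsv_in_chunks`, the default chunk size, and the choice to store every column as text are all illustrative.

```python
import csv
import sqlite3

import pandas as pd


def load_tsv_in_chunks(path, table, conn, chunksize=100_000):
    """Stream a tab-separated file into a SQLite table, one chunk at a time."""
    rows_written = 0
    with pd.read_csv(
        path,
        sep="\t",
        dtype=str,               # assumption: keep raw text, cast later in SQL
        na_values=["\\N"],       # IMDb marks missing values with \N
        keep_default_na=False,   # do not treat names such as "NA" as missing
        quoting=csv.QUOTE_NONE,  # assumption: fields are not quoted
        chunksize=chunksize,
    ) as reader:
        for i, chunk in enumerate(reader):
            # Replace the table on the first chunk, append afterwards.
            mode = "replace" if i == 0 else "append"
            chunk.to_sql(table, conn, index=False, if_exists=mode)
            rows_written += len(chunk)
    return rows_written
```

A call such as `load_tsv_in_chunks('title.principals.tsv.gz', 'principals', sqlite3.connect('movie.db'))` would then replace the `read_csv` and `to_sql` pair for that file, so only one chunk is held in memory at a time.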

In [None]:
# Open (or create) the SQLite database file
conn = sqlite3.connect('movie.db')

# Write each DataFrame to a SQLite table
df1.to_sql('basic', conn, index = False, if_exists = 'replace')
df2.to_sql('crew', conn, index = False, if_exists = 'replace')
df3.to_sql('principals', conn, index = False, if_exists = 'replace')

# Load the ipython-sql extension (it is registered as `sql`, not `sqlite3`)
%load_ext sql

# Point the %sql magic at the SQLite database file
%sql sqlite:///movie.db

NameError: name 'df2' is not defined

## Business Questions

### 1. Genre Popularity Analysis
- What are the top 5 most popular genres based on the number of films released?
- How does the average movie rating vary by genre?
- Which genres saw the biggest increase in the number of releases over the last 5 years?

SQL Skills:
- Joins (linking title.basics with title.ratings; genres are a comma-separated column in title.basics, not a separate table).
- Aggregation functions (`COUNT()`, `AVG()`).
- Subqueries (for filtering by recent years).
- Sorting (`ORDER BY`).
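
As a rough sketch of the first two genre questions, the snippet below counts films per genre and averages their ratings on a tiny in-memory database. The table names `title_basics`, `title_ratings`, and `title_genres` are assumptions (the notebook has not loaded these files), the rows are made up, and `title_genres` is a hypothetical helper table built by splitting the comma-separated `genres` column.

```python
import sqlite3

# Hypothetical tables; the rows are invented for illustration (not IMDb data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title_basics (tconst TEXT, titleType TEXT, genres TEXT);
    CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
    CREATE TABLE title_genres (tconst TEXT, genre TEXT);
""")
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?)", [
    ("tt1", "movie", "Drama,Comedy"),
    ("tt2", "movie", "Drama"),
    ("tt3", "movie", "Action"),
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?)",
                 [("tt1", 8.0), ("tt2", 6.0), ("tt3", 7.0)])

# Split the comma-separated genres column into one row per (title, genre).
rows = conn.execute(
    "SELECT tconst, genres FROM title_basics WHERE genres IS NOT NULL"
).fetchall()
conn.executemany(
    "INSERT INTO title_genres VALUES (?, ?)",
    [(tconst, genre) for tconst, genres in rows for genre in genres.split(",")],
)

TOP_GENRES_SQL = """
SELECT g.genre,
       COUNT(*) AS n_films,
       ROUND(AVG(r.averageRating), 2) AS avg_rating
FROM title_genres AS g
JOIN title_basics AS b ON b.tconst = g.tconst
JOIN title_ratings AS r ON r.tconst = g.tconst  -- drops unrated titles
WHERE b.titleType = 'movie'
GROUP BY g.genre
ORDER BY n_films DESC, g.genre
LIMIT :limit
"""


def top_genres(conn, limit=5):
    """Return (genre, film count, average rating) for the most common genres."""
    return conn.execute(TOP_GENRES_SQL, {"limit": limit}).fetchall()


print(top_genres(conn))
```

Because the inner join with `title_ratings` drops unrated titles, a `LEFT JOIN` may be preferable when the count should include them.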

### 2. Director Productivity Analysis
- Who are the directors with the most films in the catalog?
- What is the average rating for movies directed by each of the top 10 directors?
- Which director worked across the most genres?

SQL Skills:
- Joins (linking title.crew with title.basics).
- Aggregation functions (`COUNT()`, `AVG()`).
- Subqueries (to rank directors by number of movies).
- Grouping and filtering (`GROUP BY`, `HAVING`).
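
The director questions could be approached along the following lines. This is a minimal sketch on invented rows: `title_directors` is a hypothetical table with one row per (title, director), which would have to be derived from the comma-separated `directors` column of title.crew, and `title_ratings` is assumed to have been loaded from title.ratings. `basic` is the table name this notebook uses for name.basics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title_directors (tconst TEXT, nconst TEXT);  -- hypothetical
    CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
    CREATE TABLE basic (nconst TEXT, primaryName TEXT);
""")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO basic VALUES (?, ?)", [
    ("nm1", "Director A"), ("nm2", "Director B"), ("nm3", "Director C"),
])
conn.executemany("INSERT INTO title_directors VALUES (?, ?)", [
    ("tt1", "nm1"), ("tt2", "nm1"), ("tt3", "nm1"),
    ("tt4", "nm2"), ("tt5", "nm2"),
    ("tt6", "nm3"),
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?)", [
    ("tt1", 8.0), ("tt2", 6.0),  # tt3 has no rating
    ("tt4", 9.0), ("tt5", 7.0), ("tt6", 5.0),
])

TOP_DIRECTORS_SQL = """
SELECT n.primaryName,
       COUNT(*) AS n_films,
       ROUND(AVG(r.averageRating), 2) AS avg_rating  -- AVG ignores NULLs
FROM title_directors AS d
JOIN basic AS n ON n.nconst = d.nconst
LEFT JOIN title_ratings AS r ON r.tconst = d.tconst
GROUP BY d.nconst, n.primaryName
HAVING COUNT(*) >= :min_films
ORDER BY n_films DESC, n.primaryName
LIMIT 10
"""


def top_directors(conn, min_films=2):
    """Return (name, film count, average rating) for prolific directors."""
    return conn.execute(TOP_DIRECTORS_SQL, {"min_films": min_films}).fetchall()


print(top_directors(conn))
```

The `LEFT JOIN` keeps unrated films in the count, while `AVG()` skips their missing ratings.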

### 3. Yearly Production Trends
- Which year had the most film releases in total?
- What is the trend of movie releases over the last 10 years?
- How many movies and TV series were released in each year?

SQL Skills:
- Date manipulation (using startYear and endYear).
- Aggregation functions (`COUNT()`).
- Subqueries (for year-over-year comparisons).
- Grouping and filtering (`GROUP BY`, `HAVING`).
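
One way to sketch the yearly questions is with conditional aggregation, which counts movies and TV series side by side for each `startYear`. The table name `title_basics` is an assumption (title.basics is not loaded in this notebook), and the rows are invented. The `titleType` values `movie` and `tvSeries` follow the IMDb dataset conventions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE title_basics (tconst TEXT, titleType TEXT, startYear INTEGER)"
)
# Made-up rows for illustration only.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?)", [
    ("tt1", "movie", 2020),
    ("tt2", "movie", 2020),
    ("tt3", "tvSeries", 2020),
    ("tt4", "movie", 2021),
    ("tt5", "movie", None),  # unknown year, excluded below
])

# In SQLite a comparison evaluates to 1 or 0, so SUM() counts matching rows.
RELEASES_PER_YEAR_SQL = """
SELECT startYear,
       SUM(titleType = 'movie')    AS movies,
       SUM(titleType = 'tvSeries') AS series
FROM title_basics
WHERE startYear IS NOT NULL
GROUP BY startYear
ORDER BY startYear
"""

BUSIEST_YEAR_SQL = """
SELECT startYear, COUNT(*) AS releases
FROM title_basics
WHERE startYear IS NOT NULL
GROUP BY startYear
ORDER BY releases DESC
LIMIT 1
"""

releases_per_year = conn.execute(RELEASES_PER_YEAR_SQL).fetchall()
busiest_year = conn.execute(BUSIEST_YEAR_SQL).fetchone()
print(releases_per_year, busiest_year)
```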

### 4. Movie Rating Trends
- What are the top 5 highest-rated movies?
- Which genre has the highest average movie rating?
- Is there a correlation between movie runtime and average rating?

SQL Skills:
- Joins (linking title.basics and title.ratings).
- Aggregation functions (`AVG()`, `MAX()`).
- Subqueries (for filtering by ratings).
- Sorting (`ORDER BY`).
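
The rating questions might be sketched as below. The table names `title_basics` and `title_ratings`, the `min_votes` threshold, and all rows are assumptions for illustration. SQLite has no built-in correlation function, so the Pearson coefficient is computed in Python from the joined rows.

```python
import sqlite3
from math import sqrt

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title_basics (tconst TEXT, titleType TEXT,
                               primaryTitle TEXT, runtimeMinutes INTEGER);
    CREATE TABLE title_ratings (tconst TEXT, averageRating REAL, numVotes INTEGER);
""")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?, ?)", [
    ("tt1", "movie", "Film A", 90),
    ("tt2", "movie", "Film B", 100),
    ("tt3", "movie", "Film C", 110),
    ("tt4", "movie", "Film D", 80),
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?, ?)", [
    ("tt1", 6.0, 500), ("tt2", 7.0, 800), ("tt3", 8.0, 1200), ("tt4", 9.9, 3),
])

# A vote threshold keeps titles with only a handful of votes out of the ranking.
TOP_RATED_SQL = """
SELECT b.primaryTitle, r.averageRating
FROM title_basics AS b
JOIN title_ratings AS r ON r.tconst = b.tconst
WHERE b.titleType = 'movie' AND r.numVotes >= :min_votes
ORDER BY r.averageRating DESC
LIMIT 5
"""

RUNTIME_RATING_SQL = """
SELECT b.runtimeMinutes, r.averageRating
FROM title_basics AS b
JOIN title_ratings AS r ON r.tconst = b.tconst
WHERE b.titleType = 'movie'
  AND b.runtimeMinutes IS NOT NULL
  AND r.numVotes >= :min_votes
"""


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


top_rated = conn.execute(TOP_RATED_SQL, {"min_votes": 100}).fetchall()
pairs = conn.execute(RUNTIME_RATING_SQL, {"min_votes": 100}).fetchall()
correlation = pearson(pairs)
print(top_rated, correlation)
```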

### 5. Actor and Role Analysis
- Who are the most frequent actors in the top-rated movies?
- What is the average rating of movies featuring actors with more than 10 appearances?
- Which actors appear across multiple genres?

SQL Skills:
- Joins (linking title.principals and title.ratings).
- Aggregation functions (`COUNT()`, `AVG()`).
- Subqueries (to rank actors by frequency).
- Grouping (`GROUP BY`) and filtering (`HAVING`).
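
A possible shape for the actor questions is sketched below, using a subquery to restrict to top-rated titles and `HAVING` to keep frequent actors. `principals` and `basic` are the table names used earlier in this notebook. `title_ratings`, the rating and appearance thresholds, and all rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE principals (tconst TEXT, nconst TEXT, category TEXT);
    CREATE TABLE basic (nconst TEXT, primaryName TEXT);
    CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
""")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO basic VALUES (?, ?)", [
    ("nm1", "Actor A"), ("nm2", "Actor B"), ("nm3", "Actor C"), ("nm4", "Director D"),
])
conn.executemany("INSERT INTO principals VALUES (?, ?, ?)", [
    ("tt1", "nm1", "actor"), ("tt2", "nm1", "actor"), ("tt3", "nm1", "actor"),
    ("tt1", "nm2", "actress"), ("tt2", "nm2", "actress"),
    ("tt3", "nm3", "actor"),
    ("tt1", "nm4", "director"),
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?)",
                 [("tt1", 9.0), ("tt2", 8.0), ("tt3", 5.0)])

FREQUENT_ACTORS_SQL = """
SELECT n.primaryName,
       COUNT(*) AS appearances,
       ROUND(AVG(r.averageRating), 2) AS avg_rating
FROM principals AS p
JOIN basic AS n ON n.nconst = p.nconst
JOIN title_ratings AS r ON r.tconst = p.tconst
WHERE p.category IN ('actor', 'actress')
  AND p.tconst IN (SELECT tconst
                   FROM title_ratings
                   WHERE averageRating >= :min_rating)
GROUP BY p.nconst, n.primaryName
HAVING COUNT(*) >= :min_appearances
ORDER BY appearances DESC, n.primaryName
"""


def frequent_actors(conn, min_rating=8.0, min_appearances=2):
    """Return (name, appearances, average rating) within top-rated titles."""
    params = {"min_rating": min_rating, "min_appearances": min_appearances}
    return conn.execute(FREQUENT_ACTORS_SQL, params).fetchall()


print(frequent_actors(conn))
```

For the "more than 10 appearances" question, the same query could be run with `min_rating=0` and `min_appearances=11`.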

### 6. TV Series and Movie Comparison
- What is the average rating of TV series compared to movies?
- Which TV series has the longest runtime and how does it compare to movies of similar genres?
- What percentage of releases in the last 5 years were TV series?

SQL Skills:
- Joins (linking title.basics with title.episode and title.ratings).
- Aggregation functions (`AVG()`, `COUNT()`).
- Date manipulation (using startYear and endYear).
- Sorting (`ORDER BY`) and filtering (`HAVING`).
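
The comparison questions could start from something like the sketch below, which averages ratings per title type and computes the share of TV series among recent releases. The table names `title_basics` and `title_ratings`, the `since_year` parameter, and all rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE title_basics (tconst TEXT, titleType TEXT, startYear INTEGER);
    CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
""")
# Made-up rows for illustration only.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?)", [
    ("tt1", "movie", 2015),
    ("tt2", "movie", 2021),
    ("tt3", "tvSeries", 2022),
    ("tt4", "tvSeries", 2023),
    ("tt5", "movie", 2023),
    ("tt6", "tvEpisode", 2023),  # ignored: neither a movie nor a series
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?)", [
    ("tt1", 6.0), ("tt2", 8.0), ("tt3", 9.0), ("tt4", 7.0), ("tt5", 7.0),
])

AVG_BY_TYPE_SQL = """
SELECT b.titleType, ROUND(AVG(r.averageRating), 2) AS avg_rating
FROM title_basics AS b
JOIN title_ratings AS r ON r.tconst = b.tconst
WHERE b.titleType IN ('movie', 'tvSeries')
GROUP BY b.titleType
ORDER BY b.titleType
"""

# Multiplying by 100.0 forces floating-point division.
SERIES_SHARE_SQL = """
SELECT ROUND(100.0 * SUM(titleType = 'tvSeries') / COUNT(*), 1) AS pct_series
FROM title_basics
WHERE titleType IN ('movie', 'tvSeries')
  AND startYear >= :since_year
"""

avg_by_type = conn.execute(AVG_BY_TYPE_SQL).fetchall()
series_share = conn.execute(SERIES_SHARE_SQL, {"since_year": 2019}).fetchone()[0]
print(avg_by_type, series_share)
```

Passing the cutoff year as a parameter keeps the result reproducible; a "last 5 years" window would set `since_year` relative to the current year.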