# About this project

This project demonstrates how SQL can answer hypothetical sales and marketing questions about a DVD rental store, using the MySQL sample database Sakila.

# Sakila Database

The Sakila database is a nicely normalised schema modelling a DVD rental store, featuring things like films, actors, film-actor relationships, and a central inventory table that connects films, stores, and rentals.

The Sakila MySQL sample database is available from http://dev.mysql.com/doc/index-other.html. 


## Sakila Database Entity Relationship Diagram (ERD)

<img src="https://www.jooq.org/img/sakila.png">


## Problem Description
   
    
- **Inventory summary**:
    - What is the total value of all inventory, and the inventory value of each store?
    - How many films are there in each category of each store, and what is the total inventory count?
    - How many films are there in each rating?
    - What does the film inventory level look like?
    - Which actor/actress is in the most films in store inventory?

        
- **Consumer behavior**:
    - What are the top 10 most popular films that customers rent?
    - How many days do customers usually rent a film for?
    - Do customers usually rent on weekdays or weekends?
    - What are the top 10 films that customers rented for the longest period?
    - Among the films rented for the longest period, which were rented most often?
    - What is the average rental period?
    - Which genres are most popular?
    - Which customers are the most loyal?
    - Which actors/actresses are most popular given our rental history?
    

- **Store performance**:
    - How many stores are there?
    - How many staff does each store have?
    - How many transactions does each store have each month?
    - Which store has more rentals?
    - Which store makes the most money? 
  
  
- **Sales summary**:
    - Do we make the most money from long or short rentals?
    - Monitor customers’ outstanding balances and find overdue DVDs.

In [1]:
# pip install ipython-sql  # provides the %sql / %%sql magics

In [2]:
# pip install pymysql

In [3]:
# Loading the SQL module
%load_ext sql

In [4]:
# Connect to database
%sql mysql+pymysql://root:rootpass@localhost/sakila

# Inventory summary

### What's the total value of all the inventory and total inventory value of each store?

In [5]:
%%sql

select distinct i.store_id,
       sum(f.replacement_cost) over (partition by i.store_id) store_total_value,
       sum(f.replacement_cost) over () total_value
from inventory i
join film f
on i.film_id = f.film_id

 * mysql+pymysql://root:***@localhost/sakila
2 rows affected.


store_id,store_total_value,total_value
1,46205.3,92621.19
2,46415.89,92621.19
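
The query above relies on window functions plus `select distinct` to collapse the result to one row per store. As a cross-check, the same per-store and overall totals can be computed with a plain `group by` and a scalar subquery. The sketch below is only an illustration: it uses an in-memory SQLite database in place of MySQL, and the rows are made up (the table and column names mirror Sakila, the data does not).

```python
import sqlite3

# Hypothetical toy data: SQLite stands in for MySQL and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table film (film_id integer primary key, replacement_cost real);
    create table inventory (inventory_id integer primary key,
                            film_id integer, store_id integer);
    insert into film values (1, 20.0), (2, 10.0);
    insert into inventory values (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 2, 2);
""")

# One row per store via GROUP BY; the grand total comes from a scalar subquery.
rows = conn.execute("""
    select i.store_id,
           sum(f.replacement_cost) as store_total_value,
           (select sum(f2.replacement_cost)
              from inventory i2
              join film f2 on i2.film_id = f2.film_id) as total_value
    from inventory i
    join film f on i.film_id = f.film_id
    group by i.store_id
    order by i.store_id
""").fetchall()

for store_id, store_total_value, total_value in rows:
    print(store_id, store_total_value, total_value)
```

Both formulations should agree on the real data; the `group by` version avoids computing the window over every inventory row before de-duplicating.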


### How many films are there in each category of each store, and what is the total inventory count?

In [6]:
%%sql

select distinct c.name category_name, 
       count(i.film_id) over (partition by c.name) category_inventory_count,
       count(i.film_id) over () total_inventory_count
from inventory i
left join film_category fc
on i.film_id = fc.film_id
left join category c
on c.category_id = fc.category_id
order by category_inventory_count desc

 * mysql+pymysql://root:***@localhost/sakila
16 rows affected.


category_name,category_inventory_count,total_inventory_count
Sports,344,4581
Animation,335,4581
Action,312,4581
Sci-Fi,312,4581
Family,310,4581
Drama,300,4581
Foreign,300,4581
Documentary,294,4581
Games,276,4581
New,275,4581


### How many films are there in each rating?

In [7]:
%%sql

select distinct f.rating, 
       count(i.inventory_id) over (partition by f.rating) num_of_film
from inventory i
left join film f
on i.film_id = f.film_id

 * mysql+pymysql://root:***@localhost/sakila
5 rows affected.


rating,num_of_film
G,791
PG,924
PG-13,1018
R,904
NC-17,944


### What does the film inventory level look like?

- Fewer than 2 copies: low inventory
- 2 to 5 copies: medium inventory
- More than 5 copies: high inventory
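
The thresholds above can be sketched as a small Python function, which makes the boundary cases (exactly 2 and exactly 5 copies) easy to check. The function name `inventory_level` is made up for this illustration.

```python
def inventory_level(copies: int) -> str:
    """Classify a film by the number of copies held in inventory."""
    if copies < 2:
        return "Low"
    if copies > 5:
        return "High"
    return "Medium"  # 2 to 5 copies inclusive


# Boundary values on both sides of each threshold.
for copies in (0, 1, 2, 5, 6):
    print(copies, inventory_level(copies))
```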

In [40]:
%%sql

with level as (select f.film_id, 
       case 
       when count(i.inventory_id)<2 then 'Low'
       when count(i.inventory_id)>5 then 'High'
       else 'Medium' end as inventory_level
from film f
left join inventory i
on f.film_id = i.film_id
group by f.film_id)

select inventory_level, count(inventory_level) num_of_film
from level
group by inventory_level

 * mysql+pymysql://root:***@localhost/sakila
3 rows affected.


inventory_level,num_of_film
High,375
Low,42
Medium,583


### Which actor/actress is in the most films in store inventory?

In [9]:
%%sql

select distinct concat(a.first_name, ' ', a.last_name) actor_name, 
       count(i.inventory_id) over (partition by fa.actor_id) num_of_film_played
from inventory i
left join film_actor fa
on i.film_id = fa.film_id
left join actor a
on fa.actor_id = a.actor_id
order by num_of_film_played desc
limit 10

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


actor_name,num_of_film_played
GINA DEGENERES,214
MATTHEW CARREY,198
MARY KEITEL,192
WALTER TORN,186
ANGELA WITHERSPOON,184
VAL BOLGER,177
JAYNE NOLTE,177
SANDRA KILMER,174
HENRY BERRY,170
WARREN NOLTE,168




# Consumer behavior

### What are the top 10 most popular films that customers rent?

In [10]:
%%sql

select f.title, 
       count(r.inventory_id) times_rented
from rental r
left join inventory i
on r.inventory_id = i.inventory_id
left join film f
on i.film_id = f.film_id
group by f.title
order by count(r.inventory_id) desc
limit 10

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


title,times_rented
BUCKET BROTHERHOOD,34
ROCKETEER MOTHER,33
RIDGEMONT SUBMARINE,32
GRIT CLOCKWORK,32
SCALAWAG DUCK,32
JUGGLER HARDLY,32
FORWARD TEMPLE,32
HOBBIT ALIEN,31
ROBBERS JOON,31
ZORRO ARK,31


### How many days do customers usually rent a film for?

In [26]:
%%sql 

select distinct datediff(return_date, rental_date) rental_days,
       count(datediff(return_date, rental_date)) over(partition by datediff(return_date, rental_date)) num_of_rental
from rental
order by num_of_rental desc

 * mysql+pymysql://root:***@localhost/sakila
12 rows affected.


rental_days,num_of_rental
7.0,1821
2.0,1795
6.0,1783
8.0,1762
5.0,1761
3.0,1714
9.0,1691
4.0,1681
1.0,1644
0.0,105
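
MySQL's `DATEDIFF()` compares only the date parts of its arguments, so a rental returned shortly after midnight counts as one day and a same-day return counts as zero. A minimal Python sketch of that behaviour, using hypothetical timestamps and a helper named `datediff` for illustration:

```python
from datetime import datetime


def datediff(return_date: datetime, rental_date: datetime) -> int:
    """Mimic MySQL DATEDIFF: subtract calendar dates, ignoring the time parts."""
    return (return_date.date() - rental_date.date()).days


# Hypothetical rental: out for 45 minutes, but it crosses midnight.
rented = datetime(2005, 5, 24, 23, 30)
returned = datetime(2005, 5, 25, 0, 15)
print(datediff(returned, rented))

# Hypothetical same-day return: out for 9 hours, zero calendar days.
print(datediff(datetime(2005, 5, 24, 18, 0), datetime(2005, 5, 24, 9, 0)))
```

This is worth keeping in mind when reading the distribution above: the day counts are calendar-day differences, not elapsed 24-hour periods.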


### Do customers usually rent on weekdays or weekends?

MySQL's `DAYOFWEEK()` returns 1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday.

In [33]:
%%sql

select case 
       when dayofweek(rental_date) in (2,3,4,5,6) then 'weekday'
       when dayofweek(rental_date) in (1,7) then 'weekend'
       end as rental_day,
       count(rental_id) num_of_rental
from rental
group by rental_day

 * mysql+pymysql://root:***@localhost/sakila
2 rows affected.


rental_day,num_of_rental
weekday,11413
weekend,4631
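
To sanity-check the weekday/weekend split, the `DAYOFWEEK()` numbering (1 = Sunday through 7 = Saturday) can be reproduced in Python. This is a sketch only; `mysql_dayofweek` and `rental_day` are names invented for the example.

```python
from datetime import date


def mysql_dayofweek(d: date) -> int:
    """Return the day index in MySQL DAYOFWEEK numbering (1=Sunday ... 7=Saturday)."""
    # isoweekday() gives Monday=1 ... Sunday=7; shift so that Sunday becomes 1.
    return d.isoweekday() % 7 + 1


def rental_day(d: date) -> str:
    """Label a date the same way the CASE expression in the query does."""
    return "weekend" if mysql_dayofweek(d) in (1, 7) else "weekday"


for d in (date(2005, 5, 24), date(2005, 5, 28), date(2005, 5, 29)):
    print(d, mysql_dayofweek(d), rental_day(d))
```

Note that five of the seven days are weekdays, so a higher weekday count does not by itself mean weekdays are busier per day.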


### What are the top 10 films that customers rented for the longest period?

In [14]:
%%sql

select f.title, 
       datediff(return_date, rental_date) rental_days
from rental r
left join inventory i
on r.inventory_id = i.inventory_id
left join film f
on i.film_id = f.film_id
order by datediff(return_date, rental_date) desc
limit 10

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


title,rental_days
ROBBERY BRIGHT,10
UPRISING UPTOWN,10
TRADING PINOCCHIO,10
LOVE SUICIDES,10
HOME PITY,10
BAREFOOT MANCHURIAN,10
JUGGLER HARDLY,10
FORWARD TEMPLE,10
SHOW LORD,10
MOSQUITO ARMAGEDDON,10


### Among the films rented for the longest period, which were rented most often?

In [15]:
%%sql

with rental_days as (
select f.title, datediff(return_date, rental_date) rental_days
from rental r
left join inventory i
on r.inventory_id = i.inventory_id
left join film f
on i.film_id = f.film_id
),

rental_times as (select f.title, count(r.inventory_id) times_rented
from rental r
left join inventory i
on r.inventory_id = i.inventory_id
left join film f
on i.film_id = f.film_id
group by f.title
)

select distinct rd.title, rental_days, times_rented
from rental_days rd
left join rental_times rt
on rd.title = rt.title
where rental_days = 10
order by times_rented desc
limit 10

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


title,rental_days,times_rented
ROCKETEER MOTHER,10,33
FORWARD TEMPLE,10,32
JUGGLER HARDLY,10,32
TIMBERLAND SKY,10,31
HARRY IDAHO,10,30
CAT CONEHEADS,10,30
CURTAIN VIDEOTAPE,10,27
BLACKOUT PRIVATE,10,27
SWARM GOLD,10,27
FORRESTER COMANCHEROS,10,27


### What is the average rental period?

In [16]:
%%sql

select round(avg(datediff(return_date, rental_date)),0) avg_rental_days
from rental

 * mysql+pymysql://root:***@localhost/sakila
1 rows affected.


avg_rental_days
5


### Which genres are most popular?

In [17]:
%%sql

select c.name, 
       count(r.inventory_id) times_rented
from rental r
left join inventory i
on r.inventory_id = i.inventory_id
left join film f
on i.film_id = f.film_id
left join film_category fc
on f.film_id = fc.film_id
left join category c
on fc.category_id = c.category_id
group by c.name
order by count(r.inventory_id) desc

 * mysql+pymysql://root:***@localhost/sakila
16 rows affected.


name,times_rented
Sports,1179
Animation,1166
Action,1112
Sci-Fi,1101
Family,1096
Drama,1060
Documentary,1050
Foreign,1033
Games,969
Children,945


### Which customers are the most loyal?

In [18]:
%%sql

select concat(first_name, ' ', last_name) name, 
       sum(amount) total_payment, 
       count(rental_id) times_rented, 
       round(sum(amount)/count(rental_id),2) average_spend
from customer c
left join payment p
on c.customer_id = p.customer_id
group by name
order by sum(amount) desc, count(rental_id) desc
limit 10

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


name,total_payment,times_rented,average_spend
KARL SEAL,221.55,45,4.92
ELEANOR HUNT,216.54,46,4.71
CLARA SHAW,195.58,42,4.66
RHONDA KENNEDY,194.61,39,4.99
MARION SNYDER,194.61,39,4.99
TOMMY COLLAZO,186.62,38,4.91
WESLEY BULL,177.6,40,4.44
TIM CARY,175.61,39,4.5
MARCIA DEAN,175.58,42,4.18
ANA BRADLEY,174.66,34,5.14


### Which actors/actresses are most popular given our rental history?

In [19]:
%%sql

select concat(first_name, ' ', last_name) name, count(r.inventory_id) num_of_rentals
from actor a
left join film_actor fa
on a.actor_id = fa.actor_id
right join inventory i 
on i.film_id = fa.film_id
right join rental r
on r.inventory_id = i.inventory_id
group by 1
order by count(r.rental_id) desc
limit 10

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


name,num_of_rentals
SUSAN DAVIS,825
GINA DEGENERES,753
MATTHEW CARREY,678
MARY KEITEL,674
ANGELA WITHERSPOON,654
WALTER TORN,640
HENRY BERRY,612
JAYNE NOLTE,611
VAL BOLGER,605
SANDRA KILMER,604
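
One caveat when grouping by a concatenated name: if two different actors happen to share the same first and last name, their counts are merged into a single row. Grouping by `actor_id` keeps them apart. The sketch below illustrates the difference with an in-memory SQLite database and invented rows (SQLite uses `||` where MySQL uses `concat()`, and the example counts films rather than rentals to stay short).

```python
import sqlite3

# Hypothetical toy data: two distinct actors share the name ALEX DOE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table actor (actor_id integer primary key,
                        first_name text, last_name text);
    create table film_actor (actor_id integer, film_id integer);
    insert into actor values (1, 'ALEX', 'DOE'), (2, 'ALEX', 'DOE'), (3, 'SAM', 'ROE');
    insert into film_actor values (1, 10), (1, 11), (2, 12),
                                  (3, 10), (3, 11), (3, 12);
""")

# Grouping by name merges the two ALEX DOE actors into one row.
by_name = conn.execute("""
    select a.first_name || ' ' || a.last_name as name, count(*) as num_of_film
    from actor a
    join film_actor fa on a.actor_id = fa.actor_id
    group by a.first_name, a.last_name
    order by num_of_film desc, name
""").fetchall()

# Grouping by actor_id keeps each person separate.
by_id = conn.execute("""
    select a.actor_id, a.first_name || ' ' || a.last_name as name,
           count(*) as num_of_film
    from actor a
    join film_actor fa on a.actor_id = fa.actor_id
    group by a.actor_id, a.first_name, a.last_name
    order by num_of_film desc, a.actor_id
""").fetchall()

print(by_name)
print(by_id)
```

If the top of a ranking looks unexpectedly high, checking whether the name maps to more than one `actor_id` is a quick way to rule this out.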


# Store Performance

### How many stores are there?

In [20]:
%%sql

select distinct store_id
from store

 * mysql+pymysql://root:***@localhost/sakila
2 rows affected.


store_id
1
2


### How many staff does each store have?

In [21]:
%%sql

select store_id, staff_id
from staff

 * mysql+pymysql://root:***@localhost/sakila
2 rows affected.


store_id,staff_id
1,1
2,2


### How many transactions does each store have each month?

In [22]:
%%sql

select distinct s.store_id,
       substr(p.payment_date, 1, 7) date,
       count(payment_id) over (partition by s.store_id, substr(p.payment_date, 1, 7)) num_of_transaction,
       count(payment_id) over (partition by s.store_id) store_total_transaction
from payment p
left join staff s
on s.staff_id = p.staff_id

 * mysql+pymysql://root:***@localhost/sakila
10 rows affected.


store_id,date,num_of_transaction,store_total_transaction
1,2005-08,2835,8054
1,2006-02,95,8054
1,2005-05,617,8054
1,2005-06,1163,8054
1,2005-07,3344,8054
2,2005-07,3365,7990
2,2005-08,2851,7990
2,2006-02,87,7990
2,2005-05,539,7990
2,2005-06,1148,7990
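
The monthly bucketing works because `substr(payment_date, 1, 7)` keeps only the `YYYY-MM` prefix of each timestamp. The same idea in Python, as a rough sketch with made-up payment rows:

```python
from collections import Counter

# Hypothetical (store_id, payment_date) pairs; the values are invented.
payments = [
    (1, "2005-05-25 11:30:37"),
    (1, "2005-05-28 10:35:23"),
    (2, "2005-06-15 00:54:12"),
    (1, "2005-06-15 18:02:53"),
]

# ts[:7] plays the role of substr(payment_date, 1, 7): it yields 'YYYY-MM'.
monthly = Counter((store_id, ts[:7]) for store_id, ts in payments)

for (store_id, month), count in sorted(monthly.items()):
    print(store_id, month, count)
```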


### Which store has more rentals?

In [23]:
%%sql

select i.store_id, 
       count(distinct r.rental_id) num_of_rentals
from rental r
left join inventory i
on r.inventory_id = i.inventory_id
group by i.store_id
order by count(distinct r.rental_id)

 * mysql+pymysql://root:***@localhost/sakila
2 rows affected.


store_id,num_of_rentals
1,7923
2,8121


### Which store makes the most money?

In [24]:
%%sql

select s.store_id, sum(p.amount) sale
from payment p
left join staff s
on s.staff_id = p.staff_id
group by s.store_id
order by sum(p.amount)

 * mysql+pymysql://root:***@localhost/sakila
2 rows affected.


store_id,sale
1,33482.5
2,33924.06


## Sales information

In [None]:
%%sql
# How many rentals happened from 2005-05 to 2005-08?

select count(rental_id) num_of_rental
from rental
where rental_date >= '2005-05-01' and rental_date < '2005-09-01'
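
When a `DATETIME` column is filtered by a date range, a date-only upper bound such as `'2005-08-31'` is read as midnight at the start of that day, so a closed range ending there misses anything later on 31 August. A half-open range that stops just before the following day covers the whole month. The Python sketch below mirrors the two comparisons; the function names and the timestamp are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def in_between(ts: datetime, start: date, end: date) -> bool:
    """Closed range with date-only bounds, each read as midnight (00:00:00)."""
    return datetime.combine(start, time.min) <= ts <= datetime.combine(end, time.min)


def in_half_open(ts: datetime, start: date, end: date) -> bool:
    """Half-open range: from the start of `start` up to, but excluding, the day after `end`."""
    upper = datetime.combine(end + timedelta(days=1), time.min)
    return datetime.combine(start, time.min) <= ts < upper


afternoon_rental = datetime(2005, 8, 31, 14, 0)  # hypothetical timestamp
print(in_between(afternoon_rental, date(2005, 5, 1), date(2005, 8, 31)))
print(in_half_open(afternoon_rental, date(2005, 5, 1), date(2005, 8, 31)))
```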

In [None]:
%%sql
# How many rentals were there each month?

select substr(rental_date, 1, 7) month, count(rental_id) num_of_rental
from rental
group by month

In [None]:
%%sql
# Rank the staff by total rental volume across the whole period

select first_name, last_name, count(rental_id) num_of_rental
from rental r
join staff s
on r.staff_id = s.staff_id
group by first_name, last_name
order by num_of_rental desc

## Sales 

In [None]:
%%sql
# How much revenue was made each month from 2005-05 to 2005-08?

select substr(payment_date, 1, 7) month, sum(amount) sales
from payment
where payment_date >= '2005-05-01' and payment_date < '2005-09-01'
group by month

In [None]:
%%sql
# How much revenue did each store make each month from 2005-05 to 2005-08?

select s.store_id, substr(payment_date, 1, 7) month, sum(amount) sales
from payment p
join staff s
on p.staff_id = s.staff_id
where payment_date >= '2005-05-01' and payment_date < '2005-09-01'
group by s.store_id, month

In [None]:
%%sql
# What are the popular film categories?


In [None]:
%%sql
# Which movies are unpopular? The manager could put these up for sale to free shelf space for newer titles.

select i.film_id, f.title, c.name category, count(r.inventory_id) num_of_rental
from inventory i
left join rental r 
on i.inventory_id = r.inventory_id
left join film f
on i.film_id = f.film_id
left join film_category fc
on f.film_id = fc.film_id
left join category c
on fc.category_id = c.category_id
group by 1,2,3
order by num_of_rental

In [None]:
%%sql 

select * 
from sales_by_store