
# SQL Exercises - DVD Rental Store

<br>

1. [Disclaimer](#disclaimer)
1. [Relevant Information](#info)
1. [Imports](#imports)
1. [Connection](#connection)
1. [Schema](#schema)
1. [Exercises](#Exercises)
    - [Easy Level](#easy)
    - [Medium Level](#medium)
    - [Hard Level](#hard)

<a id=disclaimer></a>

## Disclaimer
***

<div class="span5 alert alert-danger">
    <b>Note:</b> For these exercises I will be using a dataset that represents a DVD rental store. The dataset can be found on <a href=https://www.postgresqltutorial.com/postgresql-sample-database/>this page of postgresqltutorial.com</a>.
</div>

[Completely Uninstall & Install PostgreSQL](https://medium.com/@bitadj/completely-uninstall-and-reinstall-psql-on-osx-551390904b86)

**About the exercises** 
- I will be using PostgreSQL, which is a different SQL flavour from the options available in HackerRank.
- I will choose the questions I try to answer.
- Many of the questions come from Zachary Thomas's post [The Best Medium-Hard Data Analyst SQL Interview Questions](https://quip.com/2gwZArKuWk7W), which has great questions and great solutions to draw insights from.


<a id=info></a>

## Relevant Information
***

Here are some of the basic commands for macOS users

- `brew install postgresql` --> will install postgresql
- `brew services restart postgresql` --> will restart postgresql
- `initdb /usr/local/var/postgres` --> will initialize a new database cluster in that data directory
- `psql -U postgres` --> will connect as the `postgres` user (asking for the password if one is required)
- `\du` --> will show the users (roles)
- `\l` --> will show the existing databases
- `CREATE DATABASE hackerrank;` --> will create the database with the name hackerrank (see complete syntax below)
- `\c hackerrank` --> will connect to the database
- `\q` --> will close the connection to Postgres
- `CREATE TABLE tb_name (...);` --> will create a table in your database (the column definitions go inside the parentheses)
- `DROP TABLE tb_name;` --> Will delete a table from your database

**Complete syntax to create database**<br><br>
`CREATE DATABASE db_name
OWNER =  role_name
TEMPLATE = template
ENCODING = encoding
LC_COLLATE = collate
LC_CTYPE = ctype
TABLESPACE = tablespace_name
CONNECTION LIMIT = max_concurrent_connection`

<a id=imports></a>

## Imports
***

In [1]:
import pandas as pd
import psycopg2
import sqlalchemy

In [2]:
from sqlalchemy import Table, Column, Integer, String, MetaData, VARCHAR, insert, update
from sqlalchemy.orm import sessionmaker

<a id=connection></a>

## Connection
***

In [28]:
from sqlalchemy import create_engine

# Postgres username, password, and database name
POSTGRES_ADDRESS = 'localhost' 
POSTGRES_PORT = '5432'
POSTGRES_USERNAME = 'postgres' 
POSTGRES_PASSWORD = 'LCmd2020!'
POSTGRES_DBNAME = 'dvdrental' 

# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,
                                                                                        password=POSTGRES_PASSWORD,
                                                                                        ipaddress=POSTGRES_ADDRESS,
                                                                                        port=POSTGRES_PORT,
                                                                                        dbname=POSTGRES_DBNAME))
# Create the connection
engine = create_engine(postgres_str) 
Session = sessionmaker(bind=engine)
session = Session()
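
The connection string above embeds the password directly in the notebook, and a password containing characters such as `@` or `/` would break the URL. Below is a minimal sketch of one way to avoid both problems, using only the standard library. The helper name `build_postgres_url` and the environment variable `POSTGRES_PASSWORD` are assumptions made for illustration and are not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(username, password, host="localhost", port=5432, dbname="dvdrental"):
    """Build a PostgreSQL URL, percent-encoding the username and password.

    Hypothetical helper: it only formats the string, it does not connect.
    """
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password), host, port, dbname
    )


# Assumption: the password is exported in the shell as POSTGRES_PASSWORD
password = os.environ.get("POSTGRES_PASSWORD", "")
postgres_str = build_postgres_url("postgres", password)
```

The resulting `postgres_str` can be passed to `create_engine` in the same way as the string built above.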

<a id=schema></a>

## Schema

<br>
<img src="img/schema.png" style="width: 500px;"/>

**Note:** I build a dictionary that maps each table to its list of fields, so the field names are readily available when needed.

In [4]:
tables = ['category','inventory', 'customer', 'film_category', 'rental', 'address', 
          'film', 'payment', 'staff', 'city', 'country', 'store', 'actor', 'film_actor', 'language']

In [49]:
dic = dict()
for table in tables:
    # LIMIT 0 returns the column names without fetching any rows
    df = pd.read_sql_query('SELECT * FROM {} LIMIT 0'.format(table), engine)
    dic[table] = df.columns.to_list()


In [53]:
for key, value in dic.items():
    print(key,value)

category ['category_id', 'name', 'last_update']
inventory ['inventory_id', 'film_id', 'store_id', 'last_update']
customer ['customer_id', 'store_id', 'first_name', 'last_name', 'email', 'address_id', 'activebool', 'create_date', 'last_update', 'active']
film_category ['film_id', 'category_id', 'last_update']
rental ['rental_id', 'rental_date', 'inventory_id', 'customer_id', 'return_date', 'staff_id', 'last_update']
address ['address_id', 'address', 'address2', 'district', 'city_id', 'postal_code', 'phone', 'last_update']
film ['film_id', 'title', 'description', 'release_year', 'language_id', 'rental_duration', 'rental_rate', 'length', 'replacement_cost', 'rating', 'last_update', 'special_features', 'fulltext']
payment ['payment_id', 'customer_id', 'staff_id', 'rental_id', 'amount', 'payment_date']
staff ['staff_id', 'first_name', 'last_name', 'address_id', 'email', 'store_id', 'active', 'username', 'password', 'last_update', 'picture']
city ['city_id', 'city', 'country_id', 'last_updat

<a id=Exercises></a>

## Exercises

<a id=easy></a>

<div class="span5 alert alert-info">
    <h3> Median of inventory</h3>

**Information:** Get the median number of copies per film that this DVD rental company has in stock. Use the `inventory` table.
</div>

In [61]:
pd.read_sql_query('''
-- Copies per film; films with no rows in inventory are not counted
WITH cte AS (SELECT film_id, COUNT(film_id) AS counter
            FROM inventory
            GROUP BY film_id)

SELECT PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY counter) AS median
FROM cte
;''', engine)


Unnamed: 0,median
0,5.0
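
To make the logic of the query above easier to check by hand, here is a small sketch of the same computation in plain Python. The `inventory_film_ids` list is made-up illustration data, not rows from the real `inventory` table. With the fraction set to 0.5, `PERCENTILE_CONT` interpolates between the two middle values when the number of rows is even, which is the same thing `statistics.median` does.

```python
from collections import Counter
from statistics import median

# Hypothetical film_id values, one entry per inventory row (illustration only)
inventory_film_ids = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]

# Equivalent of GROUP BY film_id with COUNT(film_id): copies per film
copies_per_film = Counter(inventory_film_ids)

# Median of the per-film counts; for an even number of films this averages
# the two middle counts, like PERCENTILE_CONT(0.5)
median_copies = median(copies_per_film.values())
print(median_copies)
```

Here the per-film counts are 3, 2, 4 and 1, so the median falls halfway between 2 and 3.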


<div class="span5 alert alert-info">
    <h3> Monthly Active Users (MAU), Retained Users per Month, Churned Users </h3>

**Information:** Questions from [Zachary Thomas](https://quip.com/2gwZArKuWk7W) <br>
    - Task 1: Get the number of active users per month (customers who rented at least one movie in that month).<br>
    - Task 2: Write a query that gets the number of retained users per month. In this case, retention for a given month is defined as the number of users who rented in that month who also rented in the immediately previous month. <br>
    - Task 3: Now we’ll take retention and turn it on its head: write a query to find how many users from last month did not come back this month, i.e. the number of churned users.  <br>
</div>
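
The three definitions above can be sketched with Python sets before writing any SQL. This is only an illustration under stated assumptions: the `rentals` list is made-up data, and the helper names `month_start` and `previous_month` are hypothetical. Months with no rentals at all get no entry in `report`, just as they would produce no group in a query grouped by rental month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (customer_id, rental_date) pairs (illustration only)
rentals = [
    (1, date(2005, 5, 3)), (2, date(2005, 5, 20)), (3, date(2005, 5, 28)),
    (1, date(2005, 6, 1)), (2, date(2005, 6, 15)), (4, date(2005, 6, 30)),
    (1, date(2005, 7, 4)),
]


def month_start(d):
    """Truncate a date to the first day of its month (like DATE_TRUNC('month', ...))."""
    return d.replace(day=1)


def previous_month(m):
    """First day of the month immediately before month m."""
    return date(m.year - 1, 12, 1) if m.month == 1 else date(m.year, m.month - 1, 1)


# Set of distinct customers who rented in each month
active = defaultdict(set)
for customer_id, rental_date in rentals:
    active[month_start(rental_date)].add(customer_id)

report = {}
for month in sorted(active):
    before = active.get(previous_month(month), set())
    report[month] = {
        "mau": len(active[month]),                # Task 1: distinct active customers
        "retained": len(active[month] & before),  # Task 2: active now and last month
        "churned": len(before - active[month]),   # Task 3: active last month, not now
    }

for month, row in report.items():
    print(month, row)
```

Retention is a set intersection with the previous month, and churn is the set difference taken in the other direction, which is why the two queries differ only in how the months are joined and which side is checked for missing rows.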

In [64]:
dic['rental']

['rental_id',
 'rental_date',
 'inventory_id',
 'customer_id',
 'return_date',
 'staff_id',
 'last_update']

In [67]:
pd.read_sql_query('''
SELECT DATE_TRUNC('month', rental_date) as trunc, COUNT(DISTINCT(customer_id))
FROM rental
GROUP BY trunc
ORDER BY trunc
;''', engine)

Unnamed: 0,trunc,count
0,2005-05-01,520
1,2005-06-01,590
2,2005-07-01,599
3,2005-08-01,599
4,2006-02-01,158


In [77]:
# Retained users: t1 is the current month, t2 is the same customer in the previous month
pd.read_sql_query('''
SELECT DATE_PART('month', DATE_TRUNC('month', t1.rental_date)) as Month, COUNT(DISTINCT(t1.customer_id))
FROM rental as t1
JOIN rental as t2 ON t1.customer_id = t2.customer_id AND DATE_TRUNC('month', t1.rental_date) = DATE_TRUNC('month', t2.rental_date) + interval '1 month'
GROUP BY DATE_TRUNC('month', t1.rental_date)
ORDER BY DATE_TRUNC('month', t1.rental_date)
;''', engine)

Unnamed: 0,month,count
0,6.0,512
1,7.0,590
2,8.0,599


In [93]:
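# Note: with t2 as the previous month, this counts customers active in a month who did
# NOT rent in the month before (new or returning customers). To count churn as defined
# in Task 3, t2 would need to be the following month instead:
# DATE_TRUNC('month', t2.rental_date) = DATE_TRUNC('month', t1.rental_date) + interval '1 month'
pd.read_sql_query('''
SELECT DATE_TRUNC('month', t1.rental_date), COUNT(DISTINCT(t1.customer_id))
FROM rental as t1
LEFT JOIN rental as t2 ON t1.customer_id = t2.customer_id AND DATE_TRUNC('month', t1.rental_date) = DATE_TRUNC('month', t2.rental_date) + interval '1 month'
WHERE t2.customer_id IS NULL
GROUP BY DATE_TRUNC('month', t1.rental_date)
ORDER BY DATE_TRUNC('month', t1.rental_date)
;''', engine)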

Unnamed: 0,date_trunc,count
0,2005-05-01,520
1,2005-06-01,78
2,2005-07-01,9
3,2006-02-01,158
