In [62]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [64]:
%load_ext sql

from sqlalchemy import create_engine
engine = create_engine('postgresql://localhost/dvdrental')

%sql postgresql://localhost/dvdrental

'Connected: @dvdrental'

## Question 1

We want to compare the two stores by the number of rental orders each fulfilled in every month of every year for which we have data.

Write a query that returns the store ID, the year and month, and the number of rental orders each store fulfilled in that month.

Your table should include a column for each of the following: year, month, store ID and count of rental orders fulfilled during that month.

In [9]:
%%sql
SELECT DATE_PART('month', r.rental_date)::integer AS rental_month,
       DATE_PART('year', r.rental_date)::integer AS rental_year,
       st.store_id,
       COUNT(*) AS count_rentals
FROM rental r
JOIN staff sf ON sf.staff_id = r.staff_id   -- staff member who processed the rental
JOIN store st ON st.store_id = sf.store_id  -- store that staff member belongs to
GROUP BY 2, 1, 3
ORDER BY 4 DESC;

 * postgresql://localhost/dvdrental
10 rows affected.


rental_month,rental_year,store_id,count_rentals
7,2005,2,3367
7,2005,1,3342
8,2005,1,2892
8,2005,2,2794
6,2005,1,1163
6,2005,2,1148
5,2005,2,598
5,2005,1,558
2,2006,2,97
2,2006,1,85
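
To read the two stores side by side, the rows above can be pivoted so that each month carries one count per store. The following is a minimal sketch in plain Python that hard-codes the ten rows from the output; the names `rows` and `pivot_by_month` are illustrative and not part of the notebook.

```python
from collections import defaultdict

# (rental_month, rental_year, store_id, count_rentals), copied from the output above
rows = [
    (7, 2005, 2, 3367), (7, 2005, 1, 3342),
    (8, 2005, 1, 2892), (8, 2005, 2, 2794),
    (6, 2005, 1, 1163), (6, 2005, 2, 1148),
    (5, 2005, 2, 598), (5, 2005, 1, 558),
    (2, 2006, 2, 97), (2, 2006, 1, 85),
]


def pivot_by_month(rows):
    """Map (year, month) -> {store_id: count} so stores can be compared per month."""
    table = defaultdict(dict)
    for month, year, store_id, count in rows:
        table[(year, month)][store_id] = count
    return dict(table)


pivot = pivot_by_month(rows)
for (year, month), counts in sorted(pivot.items()):
    diff = counts[1] - counts[2]  # positive when store 1 fulfilled more orders
    print(f"{year}-{month:02d}  store 1: {counts[1]:>4}  store 2: {counts[2]:>4}  diff: {diff:+d}")
```

In these rows, store 2 leads in May and July 2005 and in February 2006, while store 1 leads in June and August 2005.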


## Question 2

We would like to know who our **top 10 paying customers** were, how many payments each of them made per month during 2007, and the total amount of those monthly payments.

Write a query that returns the customer name, the month and year of payment, the number of payments, and the total payment amount for each month for these top 10 paying customers.

In [43]:
%%sql
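-- t1: top 10 customers by total amount paid (ranked over all rows in payment)
WITH t1 AS
(SELECT customer_id,
        SUM(amount) AS total_paid
 FROM payment
 GROUP BY 1
 ORDER BY 2 DESC
 LIMIT 10),

-- t2: payments made during 2007
t2 AS
(SELECT customer_id,
        payment_date,
        amount
 FROM payment
 WHERE DATE_PART('year', payment_date) = 2007)

-- customer_id is added to GROUP BY so that two customers sharing a name are not merged
SELECT DATE_TRUNC('month', t2.payment_date) AS pay_mon,
       cus.first_name||' '||cus.last_name AS customer,
       COUNT(*) AS paycount,
       SUM(t2.amount) AS paysum
FROM customer cus
JOIN t1 ON t1.customer_id = cus.customer_id
JOIN t2 ON t2.customer_id = cus.customer_id
GROUP BY 1, 2, cus.customer_id
ORDER BY 2, 1;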

 * postgresql://localhost/dvdrental
34 rows affected.


pay_mon,customer,paycount,paysum
2007-02-01 00:00:00,Ana Bradley,4,19.96
2007-03-01 00:00:00,Ana Bradley,16,71.84
2007-04-01 00:00:00,Ana Bradley,12,72.88
2007-05-01 00:00:00,Ana Bradley,1,2.99
2007-02-01 00:00:00,Clara Shaw,6,22.94
2007-03-01 00:00:00,Clara Shaw,16,72.84
2007-04-01 00:00:00,Clara Shaw,18,93.82
2007-02-01 00:00:00,Curtis Irby,6,22.94
2007-03-01 00:00:00,Curtis Irby,17,86.83
2007-04-01 00:00:00,Curtis Irby,14,54.86
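
As a quick sanity check, the monthly figures can be summed back up per customer. The following is a minimal sketch that uses only the ten rows displayed above; the query reports 34 rows, so the display is truncated and the figures for the last customer shown may be incomplete. The names `monthly` and `totals` are illustrative, and `Decimal` is used here only to avoid floating-point rounding on currency values.

```python
from collections import defaultdict
from decimal import Decimal

# (pay_mon, customer, paycount, paysum), copied from the output above
monthly = [
    ("2007-02", "Ana Bradley", 4, "19.96"),
    ("2007-03", "Ana Bradley", 16, "71.84"),
    ("2007-04", "Ana Bradley", 12, "72.88"),
    ("2007-05", "Ana Bradley", 1, "2.99"),
    ("2007-02", "Clara Shaw", 6, "22.94"),
    ("2007-03", "Clara Shaw", 16, "72.84"),
    ("2007-04", "Clara Shaw", 18, "93.82"),
    ("2007-02", "Curtis Irby", 6, "22.94"),
    ("2007-03", "Curtis Irby", 17, "86.83"),
    ("2007-04", "Curtis Irby", 14, "54.86"),
]

# Accumulate the payment count and amount per customer across the listed months
totals = defaultdict(lambda: {"payments": 0, "amount": Decimal("0")})
for _, customer, paycount, paysum in monthly:
    totals[customer]["payments"] += paycount
    totals[customer]["amount"] += Decimal(paysum)

for customer, t in totals.items():
    print(f"{customer}: {t['payments']} payments, {t['amount']} total")
```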


## Question 3

Finally, for each of these **top 10 paying customers**, I would like to find the difference between their payment totals in successive months of 2007.

Please **write a query to compare the payment amounts in each successive month**, repeated for each of the 10 customers. It would also be very helpful to identify the customer with the largest month-to-month difference in payments.

In [77]:
%%sql

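-- t1: top 10 customers by total amount paid (ranked over all rows in payment)
WITH t1 AS
(SELECT customer_id,
        SUM(amount) AS total_paid
 FROM payment
 GROUP BY 1
 ORDER BY 2 DESC
 LIMIT 10),

-- t2: payments made during 2007
t2 AS
(SELECT customer_id,
        payment_date,
        amount
 FROM payment
 WHERE DATE_PART('year', payment_date) = 2007),

-- t3: monthly payment total for each top-10 customer
t3 AS
(SELECT t1.customer_id,
        DATE_TRUNC('month', payment_date) AS pay_mon,
        SUM(amount) AS pay_amount
 FROM t1
 JOIN t2 ON t1.customer_id = t2.customer_id
 GROUP BY 1, 2)

SELECT cus.first_name||' '||cus.last_name AS customer,
       pay_mon,
       pay_amount,
       -- next month's total; 0 when the customer has no later month
       COALESCE(LEAD(pay_amount, 1) OVER w1, 0) AS lead,
       (COALESCE(LEAD(pay_amount, 1) OVER w1, 0) - pay_amount) AS difference
FROM t3
JOIN customer cus
ON t3.customer_id = cus.customer_id
-- partition by customer_id rather than by name, so customers sharing a name stay separate
WINDOW w1 AS (PARTITION BY t3.customer_id ORDER BY pay_mon)
ORDER BY customer, pay_mon;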

 * postgresql://localhost/dvdrental
34 rows affected.


customer,pay_mon,pay_amount,lead,difference
Ana Bradley,2007-02-01 00:00:00,19.96,71.84,51.88
Ana Bradley,2007-03-01 00:00:00,71.84,72.88,1.04
Ana Bradley,2007-04-01 00:00:00,72.88,2.99,-69.89
Ana Bradley,2007-05-01 00:00:00,2.99,0.0,-2.99
Clara Shaw,2007-02-01 00:00:00,22.94,72.84,49.9
Clara Shaw,2007-03-01 00:00:00,72.84,93.82,20.98
Clara Shaw,2007-04-01 00:00:00,93.82,0.0,-93.82
Curtis Irby,2007-02-01 00:00:00,22.94,86.83,63.89
Curtis Irby,2007-03-01 00:00:00,86.83,54.86,-31.97
Curtis Irby,2007-04-01 00:00:00,54.86,2.99,-51.87
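
The query lists every month-to-month difference but does not single out the largest one. The following is a minimal sketch of that last step, using only the ten rows displayed above (the query reports 34). It assumes that a `lead` of 0 marks the `COALESCE` fill for a customer's final month and skips those rows, since they compare against a month with no payments rather than a real successive month. The names `rows` and `largest_change` are illustrative.

```python
from decimal import Decimal

# (customer, pay_mon, pay_amount, lead), copied from the output above
rows = [
    ("Ana Bradley", "2007-02", "19.96", "71.84"),
    ("Ana Bradley", "2007-03", "71.84", "72.88"),
    ("Ana Bradley", "2007-04", "72.88", "2.99"),
    ("Ana Bradley", "2007-05", "2.99", "0.0"),
    ("Clara Shaw", "2007-02", "22.94", "72.84"),
    ("Clara Shaw", "2007-03", "72.84", "93.82"),
    ("Clara Shaw", "2007-04", "93.82", "0.0"),
    ("Curtis Irby", "2007-02", "22.94", "86.83"),
    ("Curtis Irby", "2007-03", "86.83", "54.86"),
    ("Curtis Irby", "2007-04", "54.86", "2.99"),
]


def largest_change(rows):
    """Return (customer, pay_mon, difference) with the largest absolute change."""
    changes = []
    for customer, pay_mon, pay_amount, lead in rows:
        lead = Decimal(lead)
        if lead == 0:
            # Assumed to be the COALESCE fill: no later month to compare against
            continue
        changes.append((customer, pay_mon, lead - Decimal(pay_amount)))
    return max(changes, key=lambda change: abs(change[2]))


print(largest_change(rows))
```

On the displayed rows this picks Ana Bradley's drop of 69.89 from April to May 2007; running the same step over all 34 rows may give a different answer.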
