## Question 5
`Write a query to retrieve all customers (by CustomerId) who made a purchase in 2010 within 45 days of their previous purchase.`

(Hint: use the LAG() window function and the TIMESTAMPDIFF() function)
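
As a rough check on what the hint asks for, the day gap that `TIMESTAMPDIFF(DAY, previous, current)` reports for two date-only values can be reproduced with plain Python date arithmetic. The two date pairs are copied from the expected output (customers 3 and 7); the helper name `day_gap` is made up for this sketch and is not part of the notebook.

```python
from datetime import date


def day_gap(previous: date, current: date) -> int:
    """Whole days from previous to current (hypothetical helper for this sketch)."""
    return (current - previous).days


# Date pairs copied from the expected output table (customers 3 and 7).
gap_customer_3 = day_gap(date(2010, 3, 11), date(2010, 4, 21))
gap_customer_7 = day_gap(date(2009, 12, 8), date(2010, 1, 18))

# Both gaps are under the 45-day limit used in the final query.
print(gap_customer_3, gap_customer_7)
```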

## Expected Output:

|   CustomerId |   TotalPurchase | PurchaseDate   | PreviousPurchaseDate   |
|-------------:|----------------:|:---------------|:-----------------------|
|            3 |           13.86 | 2010-04-21     | 2010-03-11             |
|            7 |           18.86 | 2010-01-18     | 2009-12-08             |
|           12 |           13.86 | 2010-12-25     | 2010-11-14             |
|           16 |           13.86 | 2010-09-23     | 2010-08-13             |
|           20 |           13.86 | 2010-06-22     | 2010-05-12             |
|           24 |           15.86 | 2010-03-21     | 2010-02-08             |
|           33 |           13.86 | 2010-11-24     | 2010-10-14             |
|           37 |           13.86 | 2010-08-23     | 2010-07-13             |
|           41 |           13.86 | 2010-05-22     | 2010-04-11             |
|           45 |           21.86 | 2010-02-18     | 2010-01-08             |
|           54 |           13.86 | 2010-10-24     | 2010-09-13             |
|           58 |           13.86 | 2010-07-23     | 2010-06-12             |

In [1]:
%run ../utils/setup_notebook.ipynb
import sys 
sys.path.append('../')
from utils.format_sql_result import format_sql_result

In [2]:
%%sql

SELECT *
FROM customers 
LIMIT 1;

CustomerId,FirstName,LastName,Company,Address,City,State,Country,PostalCode,Phone,Fax,Email,SupportRepId
1,Luís,Gonçalves,Embraer - Empresa Brasileira de Aeronáutica S.A.,"Av. Brigadeiro Faria Lima, 2170",São José dos Campos,SP,Brazil,12227-000,+55 (12) 3923-5555,+55 (12) 3923-5566,luisg@embraer.com.br,3


In [3]:
%%sql

SELECT *
FROM invoices
LIMIT 1;

InvoiceId,CustomerId,InvoiceDate,BillingAddress,BillingCity,BillingState,BillingCountry,BillingPostalCode,Total
1,2,2009-01-01 00:00:00,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,1.98


In [9]:
%%sql 

SELECT c.CustomerId, SUM(i.Total)  AS TotalPurchase, 
DATE(i.InvoiceDate) AS PurchaseDate
FROM customers c 
JOIN invoices i ON c.CustomerId = i.CustomerId
GROUP BY c.CustomerId, PurchaseDate
LIMIT 5;



CustomerId,TotalPurchase,PurchaseDate
1,3.98,2010-03-11
1,3.96,2010-06-13
1,5.94,2010-09-15
1,0.99,2011-05-06
1,1.98,2012-10-27


In [16]:
%%sql 

-- works: PreviousPurchaseDate is only selected, not referenced in a WHERE clause
WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) AS PreviousPurchaseDate
FROM customer_purchases
LIMIT 5;

CustomerId,TotalPurchase,PurchaseDate,PreviousPurchaseDate
1,3.98,2010-03-11,
1,3.96,2010-06-13,2010-03-11
1,5.94,2010-09-15,2010-06-13
1,0.99,2011-05-06,2010-09-15
1,1.98,2012-10-27,2011-05-06


#### **Window functions** such as **LAG()** are evaluated after the **WHERE** clause. Therefore, the WHERE clause cannot reference a column alias defined using a window function.

#### ⛔️ The window function must be computed inside a **Common Table Expression (CTE)** or **subquery**, with the filter applied in the outer query.
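
The rule above can be tried out without the MySQL server. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in (an assumption: SQLite 3.25 or newer, which is when window functions were added). The `purchases` table and its four rows are invented for illustration. SQLite words its error differently from MySQL, but it likewise rejects a window function in `WHERE` and accepts the same filter once the window function lives in a CTE.

```python
import sqlite3

# In-memory SQLite stand-in for the notebook's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (CustomerId INTEGER, PurchaseDate TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "2010-01-05"), (1, "2010-02-01"), (1, "2010-06-01"), (2, "2010-03-10")],
)

# Window function used directly in WHERE: rejected when the statement is compiled.
error = None
try:
    conn.execute(
        """
        SELECT CustomerId, PurchaseDate
        FROM purchases
        WHERE LAG(PurchaseDate) OVER (
            PARTITION BY CustomerId ORDER BY PurchaseDate) IS NOT NULL
        """
    )
except sqlite3.OperationalError as exc:
    error = str(exc)

# Same window function computed in a CTE, then filtered by the outer query.
rows = conn.execute(
    """
    WITH lagged AS (
        SELECT CustomerId,
               PurchaseDate,
               LAG(PurchaseDate) OVER (
                   PARTITION BY CustomerId ORDER BY PurchaseDate
               ) AS PreviousPurchaseDate
        FROM purchases
    )
    SELECT CustomerId, PurchaseDate, PreviousPurchaseDate
    FROM lagged
    WHERE PreviousPurchaseDate IS NOT NULL
    ORDER BY CustomerId, PurchaseDate
    """
).fetchall()

print("WHERE with window function failed:", error is not None)
print(rows)
```

Customer 2 has a single purchase, so its only row has no previous date and is dropped by the outer `WHERE`.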

In [19]:
%%sql 

-- results in error 1054: a SELECT alias (PreviousPurchaseDate) cannot be referenced in WHERE
WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) AS PreviousPurchaseDate
FROM customer_purchases
WHERE PreviousPurchaseDate IS NOT NULL AND 
    TIMESTAMPDIFF(DAY, PreviousPurchaseDate, PurchaseDate) <= 30

(mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'PreviousPurchaseDate' in 'where clause'
[SQL: WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) AS PreviousPurchaseDate
FROM customer_purchases
WHERE PreviousPurchaseDate IS NOT NULL AND 
    TIMESTAMPDIFF(DAY, PreviousPurchaseDate, PurchaseDate) <= 30]
(Background on this error at: https://sqlalche.me/e/20/f405)


In [22]:
%%sql 

-- repeating the window function inside WHERE is difficult to read and still results in an error (3593)
WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) AS PreviousPurchaseDate
FROM customer_purchases
WHERE LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) IS NOT NULL AND 
    TIMESTAMPDIFF(DAY, LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate), PurchaseDate) <= 30


(mysql.connector.errors.DatabaseError) 3593 (HY000): You cannot use the window function 'lag' in this context.'
[SQL: -- results in error 
WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) AS PreviousPurchaseDate
FROM customer_purchases
WHERE LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate) IS NOT NULL AND 
    TIMESTAMPDIFF(DAY, LAG(PurchaseDate, 1) OVER(
        PARTITION BY CustomerId 
        ORDER BY PurchaseDate), PurchaseDate) <= 30]
(Background on this error at: https://sqlalche.me/e/20/4xp6)


In [38]:
%%sql 

-- clear sql_mode for the current session; this disables ONLY_FULL_GROUP_BY
-- (and every other mode), allowing non-aggregated columns in the SELECT statement

SET sql_mode = '';

WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate,
        LAG(DATE(i.InvoiceDate), 1) OVER (
            PARTITION BY c.CustomerId 
            ORDER BY DATE(i.InvoiceDate)) AS PreviousPurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    PreviousPurchaseDate
FROM customer_purchases
WHERE PreviousPurchaseDate IS NOT NULL 
    AND YEAR(PurchaseDate) = 2010
    AND TIMESTAMPDIFF(DAY, PreviousPurchaseDate, PurchaseDate) < 45;


CustomerId,TotalPurchase,PurchaseDate,PreviousPurchaseDate
3,13.86,2010-04-21,2010-03-11
7,18.86,2010-01-18,2009-12-08
12,13.86,2010-12-25,2010-11-14
16,13.86,2010-09-23,2010-08-13
20,13.86,2010-06-22,2010-05-12
24,15.86,2010-03-21,2010-02-08
33,13.86,2010-11-24,2010-10-14
37,13.86,2010-08-23,2010-07-13
41,13.86,2010-05-22,2010-04-11
45,21.86,2010-02-18,2010-01-08
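
One possible way to avoid changing `sql_mode` is to split the work into two CTEs: aggregate per customer and day first, then apply `LAG()` to the aggregated rows. The sketch below shows that shape against an in-memory SQLite stand-in (an assumption, SQLite 3.25 or newer) with a handful of invented invoices. The CTE names `daily_totals` and `with_previous` are made up here. SQLite has no `YEAR()` or `TIMESTAMPDIFF()`, so `strftime('%Y', ...)` and a `julianday()` difference take their place; in MySQL those two conditions would stay as written in the query above. This variant has not been run against the notebook's MySQL database.

```python
import sqlite3

# In-memory SQLite stand-in with invented invoices (not the notebook's data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (CustomerId INTEGER, InvoiceDate TEXT, Total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [
        (1, "2009-12-08 00:00:00", 2.0),
        (1, "2010-01-18 00:00:00", 3.0),
        (1, "2010-01-18 00:00:00", 4.0),  # second invoice on the same day
        (1, "2010-09-01 00:00:00", 1.0),  # more than 45 days after the previous one
        (2, "2010-05-01 00:00:00", 5.0),  # only purchase, so no previous date
    ],
)

query = """
WITH daily_totals AS (
    -- Step 1: one row per customer per day, no window function yet.
    SELECT CustomerId,
           DATE(InvoiceDate) AS PurchaseDate,
           SUM(Total) AS TotalPurchase
    FROM invoices
    GROUP BY CustomerId, DATE(InvoiceDate)
),
with_previous AS (
    -- Step 2: LAG() runs over the already aggregated rows.
    SELECT CustomerId,
           TotalPurchase,
           PurchaseDate,
           LAG(PurchaseDate) OVER (
               PARTITION BY CustomerId ORDER BY PurchaseDate
           ) AS PreviousPurchaseDate
    FROM daily_totals
)
SELECT CustomerId, TotalPurchase, PurchaseDate, PreviousPurchaseDate
FROM with_previous
WHERE PreviousPurchaseDate IS NOT NULL
  AND strftime('%Y', PurchaseDate) = '2010'
  AND julianday(PurchaseDate) - julianday(PreviousPurchaseDate) < 45
ORDER BY CustomerId, PurchaseDate
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Because the grouping and the window function sit in separate CTEs, every selected column is either grouped or aggregated in step 1, so the query does not rely on relaxed `GROUP BY` checks.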


#### Generate the markdown text for the `Expected Output` table.

In [39]:
%%sql lag_result << SET sql_mode = '';

WITH customer_purchases AS(
    SELECT c.CustomerId, 
        SUM(i.Total) AS TotalPurchase, 
        DATE(i.InvoiceDate) AS PurchaseDate,
        LAG(DATE(i.InvoiceDate), 1) OVER (
            PARTITION BY c.CustomerId 
            ORDER BY DATE(i.InvoiceDate)) AS PreviousPurchaseDate
    FROM customers c 
    JOIN invoices i ON c.CustomerId = i.CustomerId
    GROUP BY c.CustomerId, PurchaseDate
)
SELECT 
    CustomerId,
    TotalPurchase,
    PurchaseDate,
    PreviousPurchaseDate
FROM customer_purchases
WHERE PreviousPurchaseDate IS NOT NULL 
    AND YEAR(PurchaseDate) = 2010
    AND TIMESTAMPDIFF(DAY, PreviousPurchaseDate, PurchaseDate) < 45;

Returning data to local variable lag_result


In [40]:
format_sql_result(lag_result)

|   CustomerId |   TotalPurchase | PurchaseDate   | PreviousPurchaseDate   |
|-------------:|----------------:|:---------------|:-----------------------|
|            3 |           13.86 | 2010-04-21     | 2010-03-11             |
|            7 |           18.86 | 2010-01-18     | 2009-12-08             |
|           12 |           13.86 | 2010-12-25     | 2010-11-14             |
|           16 |           13.86 | 2010-09-23     | 2010-08-13             |
|           20 |           13.86 | 2010-06-22     | 2010-05-12             |
|           24 |           15.86 | 2010-03-21     | 2010-02-08             |
|           33 |           13.86 | 2010-11-24     | 2010-10-14             |
|           37 |           13.86 | 2010-08-23     | 2010-07-13             |
|           41 |           13.86 | 2010-05-22     | 2010-04-11             |
|           45 |           21.86 | 2010-02-18     | 2010-01-08             |
|           54 |           13.86 | 2010-10-24     | 2010-09-13             |