<img src="img/datacamp-vector-logo.png" style="width: 600px;"/>


# SQL Exercises - DataCamp

<br>

1. [Disclaimer](#disclaimer)
1. [Relevant Information](#info)
1. [Imports](#imports)
1. [Connections](#connection)
1. [Exercises](#Exercises)
    - [Delivr Company](#delivr)
     - [Revenue, Cost, Profit](#delivr)
     - [Registration and Active Users](#registration)
     - [Running Total](#running)
     - [Monthly Active Users](#mau)  
     - [Growth Over Time](#delta)  
     - [Unit Economics](#unit)  
           - [Average Revenue per User (ARPU)](#arpu)  
           - [Weekly Average Revenue per User (ARPU)](#warpu)  
     - [Histograms](#histo)           
     - [Percentiles](#percentiles)  
     - [Dates](#date)     
     - [Ranking](#rank)     
     - [Pivoting](#pivot)          

<a id=disclaimer></a>

## Disclaimer
***

<div class="span5 alert alert-danger">
    <b>Note:</b> The queries in these exercises come from DataCamp. Their courses are great for anyone who wants to learn how to code, and they offer many courses on SQL, so I decided to try some of them. The guided examples are very helpful, and here I keep the queries from the courses that I found useful. I recommend going to <a href=https://www.datacamp.com/courses/tech:sql>DataCamp's website</a> and taking the courses yourself.
</div>

[Completely Uninstall & Install PostgreSQL](https://medium.com/@bitadj/completely-uninstall-and-reinstall-psql-on-osx-551390904b86)

**About the exercises** 
- I will be using PostgreSQL.
- Some datasets were not available for download from DataCamp's website, so for those I only include the code.

<a id=info></a>

## Relevant Information
***

Here are some basic commands for macOS users:

- `brew install postgresql` --> installs PostgreSQL
- `brew services restart postgresql` --> restarts the PostgreSQL service
- `initdb /usr/local/var/postgres` --> initializes a new database cluster in that data directory
- `psql -U postgres` --> connects as the user `postgres` (prompts for the password if one is required)
- `\du` --> lists the users (roles)
- `\l` --> lists the existing databases
- `CREATE DATABASE hackerrank;` --> creates a database named `hackerrank` (see the complete syntax below)
- `\c hackerrank` --> connects to the `hackerrank` database
- `\q` --> quits `psql`
- `CREATE TABLE tb_name (col_name data_type);` --> creates a table in the current database
- `DROP TABLE tb_name;` --> deletes a table from the current database

**Complete syntax to create database**<br><br>
`CREATE DATABASE db_name
OWNER =  role_name
TEMPLATE = template
ENCODING = encoding
LC_COLLATE = collate
LC_CTYPE = ctype
TABLESPACE = tablespace_name
CONNECTION LIMIT = max_concurrent_connection`

<a id=imports></a>

## Imports
***

In [21]:
import pandas as pd
import numpy as np
import psycopg2
import sqlalchemy

In [22]:
from sqlalchemy import Table, Column, Integer, String, MetaData, VARCHAR, insert, update
from sqlalchemy.orm import sessionmaker

<a id=connection></a>

## Connection
***

In [23]:
from config import config
params = config()

In [24]:
from sqlalchemy import create_engine

# Postgres username, password, and database name
POSTGRES_ADDRESS = params['host']
POSTGRES_PORT = params['port']
POSTGRES_USERNAME = params['username']
POSTGRES_PASSWORD = params['password']
POSTGRES_DBNAME = 'datacamp'

# A long string that contains the necessary Postgres login information
postgres_str = ('postgresql://{username}:{password}@{ipaddress}:{port}/{dbname}'.format(username=POSTGRES_USERNAME,
                                                                                        password=POSTGRES_PASSWORD,
                                                                                        ipaddress=POSTGRES_ADDRESS,
                                                                                        port=POSTGRES_PORT,
                                                                                        dbname=POSTGRES_DBNAME))
# Create the connection
engine = create_engine(postgres_str) 
Session = sessionmaker(bind=engine)
session = Session()
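
The `config` module imported above is not shown in this notebook. As a rough sketch only, a helper of that kind might read the connection parameters from an INI file. The section name, the keys, and the names `sketch_config` and `example_params` are all assumptions made for illustration, chosen to match the `params[...]` lookups above.

```python
from configparser import ConfigParser

# Hypothetical INI contents; a real file would live outside version control.
EXAMPLE_INI = """
[postgresql]
host = localhost
port = 5432
username = postgres
password = secret
"""


def sketch_config(text=EXAMPLE_INI, section="postgresql"):
    """Return the key/value pairs of one INI section as a dict of strings."""
    parser = ConfigParser()
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"Section {section!r} not found")
    return dict(parser.items(section))


example_params = sketch_config()
print(example_params["host"], example_params["port"])
```

Keeping credentials in a separate file like this means the notebook itself never contains the password.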

<a id=schema></a>

<a id=delivr></a>

## Delivr Company

<div class="span5 alert alert-info">
    <h3> Revenue, Costs, Profit</h3>

**Information:** In this exercise you had to calculate the profit that the company generated each month. To do so, you needed to create two CTEs, one for revenue and one for cost, and then join them to calculate the profit by month.
</div>

In [26]:
pd.read_sql_query(
'''

-- Set up the revenue CTE
WITH revenue AS ( 
    SELECT
        DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
        SUM(meal_price*order_quantity) AS revenue
    FROM meals
    JOIN orders ON meals.meal_id = orders.meal_id
    GROUP BY delivr_month),

-- Set up the cost CTE
  cost AS (
    SELECT
        DATE_TRUNC('month', stocking_date) :: DATE AS delivr_month,
        SUM(meal_cost*stocked_quantity) AS cost
    FROM meals
    JOIN stock ON meals.meal_id = stock.meal_id
    GROUP BY delivr_month)

-- Calculate profit by joining the CTEs
    SELECT
        revenue.delivr_month,
        SUM(revenue-cost) AS profit
    FROM revenue
    JOIN cost ON revenue.delivr_month = cost.delivr_month
    GROUP BY revenue.delivr_month
    ORDER BY revenue.delivr_month ASC;

;'''
, engine)

Unnamed: 0,delivr_month,profit
0,2018-06-01,4073.5
1,2018-07-01,6575.5
2,2018-08-01,9974.25
3,2018-09-01,15339.5
4,2018-10-01,23087.5
5,2018-11-01,38743.0
6,2018-12-01,70300.5


<a id=filter></a>

<a id=registration></a>

<div class="span5 alert alert-info">
    <h3> Registrations and Active Users</h3>

**Information:** A user's registration date is the date they first became a customer (here, the date of their first order). Active users counts the distinct users who were active in a period of time: DAU (daily) or MAU (monthly). <br>
</div>

### Registrations

In [27]:
pd.read_sql_query('''

-- Wraps the query you wrote in a CTE named reg_dates
WITH reg_dates AS (
  SELECT
    user_id,
    MIN(order_date) AS reg_date
  FROM orders
  GROUP BY user_id)

SELECT
  -- Counts the unique user IDs by registration month
  DATE_TRUNC('month', reg_date)::DATE AS delivr_month,
  COUNT(DISTINCT(user_id)) AS regs
FROM reg_dates
GROUP BY delivr_month
ORDER BY delivr_month ASC; 

;''', engine)

Unnamed: 0,delivr_month,regs
0,2018-06-01,123
1,2018-07-01,140
2,2018-08-01,157
3,2018-09-01,176
4,2018-10-01,199
5,2018-11-01,231
6,2018-12-01,278


### Monthly Active Users

In [28]:
pd.read_sql_query('''

SELECT
  -- Truncate the order date to the nearest month
  DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
  -- Count the unique user IDs
  COUNT(DISTINCT(user_id)) AS mau
FROM orders
GROUP BY delivr_month
-- Order by month
ORDER BY delivr_month ASC;

;''', engine)

Unnamed: 0,delivr_month,mau
0,2018-06-01,123
1,2018-07-01,226
2,2018-08-01,337
3,2018-09-01,489
4,2018-10-01,689
5,2018-11-01,944
6,2018-12-01,1267


<a id=running></a>

<div class="span5 alert alert-info">
    <h3> Window Functions - Running Total </h3>

**Information:** Use a window function to compute a running total of registrations, which shows how the customer base grows each month.<br>
</div>

**Note:** unfortunately there are no different creation dates, only one month
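
As a quick cross-check of what a running total means, here is a small Python sketch that accumulates the monthly registration counts shown in the Registrations output above. It only illustrates the arithmetic and is not a substitute for the SQL window function.

```python
from itertools import accumulate

# Monthly registrations, copied from the Registrations output above
# (2018-06 through 2018-12).
regs = [123, 140, 157, 176, 199, 231, 278]

# accumulate() yields, for each position, the sum of that value and all
# values before it, which is what a running total over ordered months does.
regs_rt = list(accumulate(regs))
print(regs_rt)
```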

In [29]:
pd.read_sql_query('''

WITH reg_dates AS (
  SELECT
    user_id,
    MIN(order_date) AS reg_date
  FROM orders
  GROUP BY user_id),

  regs AS (
  SELECT
    DATE_TRUNC('month', reg_date) :: DATE AS delivr_month,
    COUNT(DISTINCT user_id) AS regs
  FROM reg_dates
  GROUP BY delivr_month)

SELECT
  -- Calculate the registrations running total by month
  DATE_TRUNC('month', delivr_month) :: DATE AS delivr_month,
  -- ORDER BY inside the window makes the accumulation order explicit
  SUM(regs) OVER(ORDER BY delivr_month ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS regs_rt
FROM regs
ORDER BY delivr_month ASC; 

;''', engine)

Unnamed: 0,delivr_month,regs_rt
0,2018-06-01,123.0
1,2018-07-01,263.0
2,2018-08-01,420.0
3,2018-09-01,596.0
4,2018-10-01,795.0
5,2018-11-01,1026.0
6,2018-12-01,1304.0


<a id=mau></a>

<div class="span5 alert alert-info">
    <h3> Monthly Active Users (MAU)</h3>

**Information:** In this exercise you had to calculate the number of monthly active users and compare it with the previous month's value using `LAG`.<br>
</div>

In [30]:
pd.read_sql_query('''

WITH mau AS (
  SELECT
    DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
    COUNT(DISTINCT user_id) AS mau
  FROM orders
  GROUP BY delivr_month)

SELECT
  -- Select the month and the MAU
  DATE_TRUNC('month', delivr_month) :: DATE AS delivr_month,
  mau,
  COALESCE(LAG(mau) OVER(ORDER BY delivr_month), 0) AS last_mau
FROM mau
-- Order by month in ascending order
ORDER BY delivr_month ASC;

;''', engine)

Unnamed: 0,delivr_month,mau,last_mau
0,2018-06-01,123,0
1,2018-07-01,226,123
2,2018-08-01,337,226
3,2018-09-01,489,337
4,2018-10-01,689,489
5,2018-11-01,944,689
6,2018-12-01,1267,944


<a id=delta></a>

<div class="span5 alert alert-info">
    <h3> Growth over time (deltas)</h3>

**Information:** Growth can be measured in absolute terms (the delta: current value - previous value) or in relative terms (the growth rate: (current value - previous value) / previous value). Both are useful for tracking how a KPI varies over time.<br>
</div>
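
To make the growth-rate arithmetic concrete, the following sketch applies the formula to the MAU values from the Monthly Active Users output above. The helper name `growth_rate` is made up for this illustration; the `max(previous, 1)` guard plays the same role as `GREATEST(..., 1)` in SQL.

```python
# MAU per month, copied from the Monthly Active Users output above.
mau = [123, 226, 337, 489, 689, 944, 1267]


def growth_rate(current, previous):
    """Relative growth: (current - previous) / previous, rounded to 2 places."""
    previous = max(previous, 1)  # avoid dividing by zero
    return round((current - previous) / previous, 2)


# Pair each month with the month before it.
rates = [growth_rate(cur, prev) for prev, cur in zip(mau, mau[1:])]
print(rates)
```

The first month has no predecessor, so it is left out here.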

### Growth Rate with Absolute Value

In [31]:
pd.read_sql_query('''

WITH mau AS (
  SELECT
    DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
    COUNT(DISTINCT user_id) AS mau
  FROM orders
  GROUP BY delivr_month),

  mau_with_lag AS (
  SELECT
    delivr_month,
    mau,
    -- Fetch the previous month's MAU
    COALESCE(
      LAG(mau) OVER(ORDER BY delivr_month),
    0) AS last_mau
  FROM mau)

SELECT
  -- Calculate each month's delta of MAUs
  delivr_month,
  mau-last_mau AS mau_delta
FROM mau_with_lag
-- Order by month in ascending order
ORDER BY delivr_month ASC;

;''', engine)

Unnamed: 0,delivr_month,mau_delta
0,2018-06-01,123
1,2018-07-01,103
2,2018-08-01,111
3,2018-09-01,152
4,2018-10-01,200
5,2018-11-01,255
6,2018-12-01,323


### Growth Rate with Relative Value

In [32]:
pd.read_sql_query('''

WITH mau AS (
  SELECT
    DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
    COUNT(DISTINCT user_id) AS mau
  FROM orders
  GROUP BY delivr_month),

  mau_with_lag AS (
  SELECT
    delivr_month,
    mau,
    GREATEST(
      LAG(mau) OVER (ORDER BY delivr_month ASC),
    1) AS last_mau
  FROM mau)

SELECT
  -- Calculate the MoM MAU growth rates
  delivr_month,
  ROUND(
    ((mau-last_mau)::NUMERIC/last_mau),
  2) AS growth
FROM mau_with_lag
-- Order by month in ascending order
ORDER BY delivr_month;

;''', engine)

Unnamed: 0,delivr_month,growth
0,2018-06-01,122.0
1,2018-07-01,0.84
2,2018-08-01,0.49
3,2018-09-01,0.45
4,2018-10-01,0.41
5,2018-11-01,0.37
6,2018-12-01,0.34


<a id=retention></a>

<div class="span5 alert alert-info">
    <h3> Retention of Users</h3>

**Information:** MAU can be broken down into three categories: new users (joined this month), retained users (active last month and again this month), and resurrected users (not active last month, but active before that and back this month). The retention rate is the share of one month's active users who are also active in the following month. <br>
    
**Note**: We use `GREATEST` in the query to avoid dividing by zero: `GREATEST(x, 1)` returns 1 when `x` is 0. 
    
</div>
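
The breakdown can be sketched with plain Python sets. The user IDs below are hypothetical toy data, and the function names are made up for this illustration. The sketch only shows how the categories and the retention rate relate to each other.

```python
# Hypothetical toy data: the set of user IDs that ordered in each month.
active = {
    "2018-06": {1, 2, 3},
    "2018-07": {2, 3, 4},
    "2018-08": {1, 3, 5},
}


def breakdown(months):
    """Split each month's active users into new, retained and resurrected."""
    seen = set()      # users active in any earlier month
    previous = set()  # users active in the immediately preceding month
    result = {}
    for month in sorted(months):
        users = months[month]
        result[month] = {
            "new": users - seen,
            "retained": users & previous,
            "resurrected": (users & seen) - previous,
        }
        seen |= users
        previous = users
    return result


def retention_rate(months, month, next_month):
    """Share of `month`'s users who are also active in `next_month`."""
    base = months[month]
    return round(len(base & months[next_month]) / max(len(base), 1), 2)


result = breakdown(active)
print(result["2018-08"])
print(retention_rate(active, "2018-06", "2018-07"))
```

In the toy data, user 1 is resurrected in August: active in June, absent in July, and back in August.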

In [33]:
pd.read_sql_query('''

WITH user_monthly_activity AS (
  SELECT DISTINCT
    DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
    user_id
  FROM orders)

SELECT
  -- Calculate the MoM retention rates
  previous.delivr_month,
  ROUND(
    (COUNT(DISTINCT(current.user_id))::NUMERIC) /
    GREATEST(COUNT(DISTINCT(previous.user_id)),1),
  2) AS retention_rate
FROM user_monthly_activity AS previous
LEFT JOIN user_monthly_activity AS current
-- Fill in the user and month join conditions
ON previous.user_id = current.user_id
AND previous.delivr_month = current.delivr_month - interval '1 month'
GROUP BY previous.delivr_month
ORDER BY previous.delivr_month ASC;

;''', engine)

Unnamed: 0,delivr_month,retention_rate
0,2018-06-01,0.7
1,2018-07-01,0.7
2,2018-08-01,0.76
3,2018-09-01,0.83
4,2018-10-01,0.9
5,2018-11-01,0.96
6,2018-12-01,0.0


<a id=unit></a>

<div class="span5 alert alert-info">
    <h3> Unit Economics</h3>

**Information:** Unit economics measures performance per unit, as opposed to overall performance. Example: Average Revenue per User ($\frac{\text{revenue}}{\text{count of users}}$).
    This section also covers distributions, histograms, and percentiles.<br>
    
</div>
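
A tiny worked example of the ARPU formula, using hypothetical order rows of the form (user ID, meal price, order quantity). The numbers are invented for illustration and do not come from the Delivr data.

```python
from collections import defaultdict

# Hypothetical orders: (user_id, meal_price, order_quantity).
orders = [(1, 4.0, 2), (1, 5.0, 1), (2, 6.0, 3), (3, 2.5, 2)]

# Revenue per user: sum of price * quantity over that user's orders.
revenue_by_user = defaultdict(float)
for user_id, meal_price, order_quantity in orders:
    revenue_by_user[user_id] += meal_price * order_quantity

# ARPU: total revenue divided by the number of distinct users.
arpu = round(sum(revenue_by_user.values()) / max(len(revenue_by_user), 1), 2)
print(arpu)
```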

<a id=arpu></a>

### ARPU (Average Revenue per User)

In [34]:
pd.read_sql_query('''

-- Create a CTE named kpi
WITH kpi AS (
  SELECT
    -- Select the user ID and calculate revenue
    user_id,
    SUM(m.meal_price * o.order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY user_id)
-- Calculate ARPU
SELECT ROUND(AVG(revenue) :: NUMERIC, 2) AS arpu
FROM kpi;

;''', engine)

Unnamed: 0,arpu
0,199.56


<a id=warpu></a>

### Weekly ARPU (Average Revenue per User)

In [35]:
pd.read_sql_query('''

WITH kpi AS (
  SELECT
    -- Select the week, revenue, and count of users
    DATE_TRUNC('week', order_date) :: DATE AS delivr_week,
    SUM(order_quantity*meal_price) AS revenue,
    COUNT(DISTINCT(user_id)) AS users
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY delivr_week)

SELECT
  delivr_week,
  -- Calculate ARPU
  ROUND(
    revenue :: NUMERIC / GREATEST(users,1),
  2) AS arpu
FROM kpi
-- Order by week in ascending order
ORDER BY delivr_week ASC;

;''', engine)

Unnamed: 0,delivr_week,arpu
0,2018-05-28,22.69
1,2018-06-04,28.71
2,2018-06-11,30.34
3,2018-06-18,27.19
4,2018-06-25,25.64
5,2018-07-02,25.77
6,2018-07-09,28.5
7,2018-07-16,25.12
8,2018-07-23,24.25
9,2018-07-30,27.78


### Average Orders by User

In [36]:
pd.read_sql_query('''

WITH kpi AS (
  SELECT
    -- Select the count of orders and users
    COUNT(DISTINCT(order_id)) AS orders,
    COUNT(DISTINCT(user_id)) AS users
  FROM orders)

SELECT
  -- Calculate the average orders per user
  ROUND(
    orders :: NUMERIC / GREATEST(users,1),
  2) AS orders_per_user
FROM kpi;

;''', engine)

Unnamed: 0,orders_per_user
0,8.7


<a id=histo></a>

<div class="span5 alert alert-info">
    <h3> Histograms</h3>

**Information:** You can create bins and count the frequency in SQL. Here is how to do it.<br>
    
</div>
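
The binning trick used in the queries below is to round each value to the nearest hundred and then count how many values land in each bin. Here is a minimal Python sketch of the same idea on hypothetical revenues. It uses `Decimal` with `ROUND_HALF_UP` so that ties such as 250 round up, which matches how PostgreSQL rounds `NUMERIC` values for the non-negative amounts used here.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical per-user revenues, invented for illustration.
revenues = [
    Decimal("35.00"), Decimal("149.99"), Decimal("150.00"),
    Decimal("210.50"), Decimal("249.99"), Decimal("250.00"),
]


def bucket_100(value):
    """Round to the nearest hundred, like ROUND(value :: NUMERIC, -2)."""
    return int(value.quantize(Decimal("1E2"), rounding=ROUND_HALF_UP))


# Frequency table: bin -> number of users in that bin.
histogram = Counter(bucket_100(r) for r in revenues)
for bucket in sorted(histogram):
    print(bucket, histogram[bucket])
```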

### Histograms of revenue per user

In [37]:
pd.read_sql_query('''

WITH user_revenues AS (
  SELECT
    -- Select the user ID and revenue
    user_id,
    SUM(meal_price*order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY user_id)

SELECT
  -- Return the frequency table of revenues by user
  ROUND(revenue :: NUMERIC,-2) AS revenue_100,
  COUNT(user_id) AS users
FROM user_revenues
GROUP BY revenue_100
ORDER BY revenue_100 ASC;

;''', engine)

Unnamed: 0,revenue_100,users
0,0.0,47
1,100.0,426
2,200.0,458
3,300.0,261
4,400.0,96
5,500.0,14
6,600.0,2


### Histogram of Orders

In [38]:
pd.read_sql_query('''

WITH user_orders AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS orders
  FROM orders
  GROUP BY user_id)

SELECT
  -- Return the frequency table of orders by user
  orders,
  COUNT(user_id) AS users
FROM user_orders
GROUP BY orders
ORDER BY orders ASC;

;''', engine)

Unnamed: 0,orders,users
0,1,7
1,2,42
2,3,65
3,4,88
4,5,123
5,6,112
6,7,130
7,8,107
8,9,110
9,10,128


### Bucketing (Binning)

#### By revenue

In [39]:
pd.read_sql_query('''

WITH user_revenues AS (
  SELECT
    -- Select the user IDs and the revenues they generate
    user_id,
    SUM(meal_price*order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY user_id)

SELECT
  -- Fill in the bucketing conditions
  CASE
    WHEN revenue < 150 THEN 'Low-revenue users'
    WHEN revenue < 300 THEN 'Mid-revenue users'
    ELSE 'High-revenue users'
  END AS revenue_group,
  COUNT(user_id) AS users
FROM user_revenues
GROUP BY revenue_group;

;''', engine)

Unnamed: 0,revenue_group,users
0,High-revenue users,225
1,Mid-revenue users,606
2,Low-revenue users,473


#### By orders

In [40]:
pd.read_sql_query('''

-- Store each user's count of orders in a CTE named user_orders
WITH user_orders AS (
  SELECT
    user_id,
    COUNT(DISTINCT(order_id)) AS orders
  FROM orders
  GROUP BY user_id)

SELECT
  -- Write the conditions for the three buckets
  CASE
    WHEN orders < 8 THEN 'Low-orders users'
    WHEN orders < 15 THEN 'Mid-orders users'
    ELSE 'High-orders users'
  END AS order_group,
  -- Count the distinct users in each bucket
  COUNT(user_id) AS users
FROM user_orders
GROUP BY order_group;

;''', engine)

Unnamed: 0,order_group,users
0,Low-orders users,567
1,High-orders users,125
2,Mid-orders users,612


<a id=percentiles></a>

<div class="span5 alert alert-info">
    <h3> Percentiles</h3>

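**Information:** Percentiles tell us what percentage of the data falls below a given value: the 25th percentile, for example, is the value below which 25% of the data lies. `PERCENTILE_CONT` takes the percentile as a fraction between 0 and 1.<br>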
    
</div>
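
`PERCENTILE_CONT` interpolates linearly between neighbouring values when the requested percentile falls between two rows. The sketch below reproduces that interpolation in plain Python on hypothetical values. The function is named after the SQL aggregate only for readability and is not part of any library.

```python
def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Hypothetical revenues.
sample = [10, 20, 30, 40]
print(percentile_cont(sample, 0.25), percentile_cont(sample, 0.5))
```

With four values, the median falls halfway between the second and third value, so the result is 25.0 rather than one of the inputs.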

### Mean and Quartiles

In [41]:
pd.read_sql_query('''

WITH user_revenues AS (
  -- Select the user IDs and their revenues
  SELECT
    user_id,
    SUM(meal_price*order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY user_id)

SELECT
  -- Calculate the first, second, and third quartile
  ROUND(
    PERCENTILE_CONT(0.25) WITHIN GROUP
    (ORDER BY revenue ASC) :: NUMERIC,
  2) AS revenue_p25,
  ROUND(
    PERCENTILE_CONT(0.5) WITHIN GROUP
    (ORDER BY revenue ASC) :: NUMERIC,
  2) AS revenue_p50,
  ROUND(
    PERCENTILE_CONT(0.75) WITHIN GROUP
    (ORDER BY revenue ASC) :: NUMERIC,
  2) AS revenue_p75,
  -- Calculate the average
  ROUND(AVG(revenue) :: NUMERIC, 2) AS avg_revenue
FROM user_revenues;

;''', engine)

Unnamed: 0,revenue_p25,revenue_p50,revenue_p75,avg_revenue
0,120.69,186.5,268.31,199.56


### Interquartile Range (number of users in the IQR)

In [42]:
pd.read_sql_query('''

WITH user_revenues AS (
  SELECT
    -- Select user_id and calculate revenue by user 
    user_id,
    SUM(m.meal_price * o.order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY user_id),

  quartiles AS (
  SELECT
    -- Calculate the first and third revenue quartiles
    ROUND(
      PERCENTILE_CONT(0.25) WITHIN GROUP
      (ORDER BY revenue ASC) :: NUMERIC,
    2) AS revenue_p25,
    ROUND(
      PERCENTILE_CONT(0.75) WITHIN GROUP
      (ORDER BY revenue ASC) :: NUMERIC,
    2) AS revenue_p75
  FROM user_revenues)

SELECT
  -- Count the number of users in the IQR
  COUNT(DISTINCT(user_id)) AS users
FROM user_revenues
CROSS JOIN quartiles
-- Only keep users with revenues in the IQR range
WHERE revenue :: NUMERIC >= revenue_p25
  AND revenue :: NUMERIC <= revenue_p75;


;''', engine)

Unnamed: 0,users
0,652


<a id=date></a>

<div class="span5 alert alert-info">
    <h3> Dates</h3>

**Information:** Sometimes we want dates in a more readable form. We can use `TO_CHAR` to format them.<br>
    
</div>

Simplified cheatsheet for dates:

General format: `TO_CHAR(date_column, format_text)`

Format options:
- `DD` --> Day as a number (05,23)
- `Dy` --> Abbreviated Day of the week (Fri, Sat)
- `FMDay` --> Full day of week (Friday, Saturday)

- `MM` --> Month as a number (02,12)
- `Mon` --> Abbreviated Month (Jan, Aug)
- `FMMonth` --> Full month (January, August)

- `YY` --> Last two digits of a year
- `YYYY` --> Full four digits
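
For comparison, the same formatting can be done on the Python side with `strftime`. The mapping below is my own rough correspondence between the `TO_CHAR` patterns in the cheatsheet and `strftime` codes, not an official table. Day and month names also depend on the active locale, and English is assumed here.

```python
from datetime import date

# Approximate TO_CHAR pattern -> strftime code (English locale assumed).
PG_TO_STRFTIME = {
    "DD": "%d",       # day of month, zero-padded
    "Dy": "%a",       # abbreviated day name
    "FMDay": "%A",    # full day name
    "MM": "%m",       # month number, zero-padded
    "Mon": "%b",      # abbreviated month name
    "FMMonth": "%B",  # full month name
    "YY": "%y",       # last two digits of the year
    "YYYY": "%Y",     # four-digit year
}

# Equivalent of TO_CHAR(order_date, 'FMDay DD, FMMonth YYYY').
formatted = date(2018, 6, 1).strftime("%A %d, %B %Y")
print(formatted)
```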

In [43]:
pd.read_sql_query('''

SELECT DISTINCT
  -- Select the order date
  order_date,
  -- Format the order date
  TO_CHAR(order_date,'FMDay DD, FMMonth YYYY') AS format_order_date
FROM orders
ORDER BY order_date ASC
LIMIT 3;

;''', engine)

Unnamed: 0,order_date,format_order_date
0,2018-06-01,"Friday 01, June 2018"
1,2018-06-02,"Saturday 02, June 2018"
2,2018-06-03,"Sunday 03, June 2018"


<a id=rank></a>

<div class="span5 alert alert-info">
    <h3> Ranking</h3>

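**Information:** `RANK()` is a window function that numbers rows according to the `ORDER BY` in its `OVER` clause; tied rows share a rank, and the ranks after a tie are skipped.<br>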
    
</div>
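
To show how ties behave under `RANK()`, here is a small Python sketch that ranks hypothetical users by their order counts in descending order. The function name and the data are invented for illustration.

```python
def rank_desc(counts):
    """Rank keys by value, highest first; ties share a rank and leave a gap."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_value = None
    current_rank = 0
    for position, (key, value) in enumerate(ordered, start=1):
        # A new value starts a new rank equal to its row position.
        if value != previous_value:
            current_rank = position
            previous_value = value
        ranks[key] = current_rank
    return ranks


# Hypothetical order counts per user.
ranks = rank_desc({"a": 5, "b": 7, "c": 5, "d": 2})
print(ranks)
```

Users `a` and `c` tie at rank 2, so the next user gets rank 4, not 3.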

In [44]:
pd.read_sql_query('''

-- Set up the user_count_orders CTE
WITH user_count_orders AS (
  SELECT
    user_id,
    COUNT(DISTINCT order_id) AS count_orders
  FROM orders
  -- Only keep orders in August 2018
  WHERE DATE_TRUNC('month', order_date) = '2018-08-01'
  GROUP BY user_id)

SELECT
  -- Select user ID, and rank user ID by count_orders
  user_id,
  RANK() OVER(ORDER BY count_orders DESC) AS count_orders_rank
FROM user_count_orders
ORDER BY count_orders_rank ASC
-- Limit the user IDs selected to 3
LIMIT 3;

;''', engine)

Unnamed: 0,user_id,count_orders_rank
0,76,1
1,296,2
2,291,3


<a id=pivot></a>

<div class="span5 alert alert-info">
    <h3> Pivoting</h3>

**Information:** Pivoting transposes rows into columns (it converts a long table into a wide table). We use `CROSSTAB()`, which takes a source query and pivots its result by one of its columns. It is not available by default in PostgreSQL, so we first need to enable the extension that provides it: `CREATE EXTENSION IF NOT EXISTS tablefunc;`<br>
    
The structure follows:
    
`SELECT * FROM CROSSTAB($$ the query $$) AS ct (col1 data_type1, col2 data_type2, etc.)`
    
</div>
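
The long-to-wide idea can be sketched in plain Python. The rows below are hypothetical (row key, category, value) triples, and the `pivot` helper is made up for this illustration. Unlike the one-argument form of `CROSSTAB`, which fills the value columns from left to right in the order the rows arrive, this sketch places each value under its own category and leaves missing ones as `None`.

```python
def pivot(rows, columns):
    """Turn (row_key, category, value) triples into {row_key: {category: value}}."""
    wide = {}
    for row_key, category, value in rows:
        # Start every row with all columns present and empty.
        row = wide.setdefault(row_key, dict.fromkeys(columns))
        if category in row:
            row[category] = value
    return wide


months = ["2018-06-01", "2018-07-01", "2018-08-01"]

# Hypothetical long-format revenue rows: (user_id, month, revenue).
long_rows = [
    (0, "2018-06-01", 50.0),
    (0, "2018-07-01", 40.0),
    (0, "2018-08-01", 30.0),
    (1, "2018-06-01", 10.0),
    (1, "2018-08-01", 20.0),
]

wide = pivot(long_rows, months)
for user_id in sorted(wide):
    print(user_id, wide[user_id])
```

User 1 has no July row, so that cell stays empty while the August value still lands in the August column.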

In [45]:
pd.read_sql_query('''

-- Import tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  SELECT
    user_id,
    DATE_TRUNC('month', order_date) :: DATE AS delivr_month,
    SUM(meal_price * order_quantity) :: FLOAT AS revenue
  FROM meals
  JOIN orders ON meals.meal_id = orders.meal_id
 WHERE user_id IN (0, 1, 2, 3, 4)
   AND order_date < '2018-09-01'
 GROUP BY user_id, delivr_month
 ORDER BY user_id, delivr_month;
$$)
-- Select user ID and the months from June to August 2018
AS ct (user_id INT,
       "2018-06-01" FLOAT,
       "2018-07-01" FLOAT,
       "2018-08-01" FLOAT)
ORDER BY user_id ASC;

;''', engine)

Unnamed: 0,user_id,2018-06-01,2018-07-01,2018-08-01
0,0,56.5,44.75,43.25
1,1,9.25,12.0,
2,2,80.25,12.5,10.75
3,3,78.25,21.5,
4,4,43.75,,


In [46]:
pd.read_sql_query('''

-- Import tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
  SELECT
    -- Select eatery and calculate total cost
    eatery,
    DATE_TRUNC('month', stocking_date) :: DATE AS delivr_month,
    SUM(meal_cost * stocked_quantity) :: FLOAT AS cost
  FROM meals
  JOIN stock ON meals.meal_id = stock.meal_id
  -- Keep only the records after October 2018
  WHERE DATE_TRUNC('month', stocking_date) > '2018-10-01'
  GROUP BY eatery, delivr_month
  ORDER BY eatery, delivr_month;
$$)

-- Select the eatery and November and December 2018 as columns
AS ct (eatery TEXT,
       "2018-11-01" FLOAT,
       "2018-12-01" FLOAT)
ORDER BY eatery ASC;

;''', engine)

Unnamed: 0,eatery,2018-11-01,2018-12-01
0,'Bean Me Up Scotty',3102.25,5810.5
1,'Burgatorio',7946.5,14197.75
2,'Leaning Tower of Pizza',3989.75,7256.0
3,'Life of Pie',523.5,946.5
4,'The Moon Wok',5825.0,10383.75


In [47]:
pd.read_sql_query('''

-- Import tablefunc
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Pivot the previous query by quarter
SELECT * FROM CROSSTAB($$
  WITH eatery_users AS  (
    SELECT
      eatery,
      -- Format the order date so "2018-06-01" becomes "Q2 2018"
      TO_CHAR(order_date, '"Q"Q YYYY') AS delivr_quarter,
      -- Count unique users
      COUNT(DISTINCT user_id) AS users
    FROM meals
    JOIN orders ON meals.meal_id = orders.meal_id
    GROUP BY eatery, delivr_quarter
    ORDER BY delivr_quarter, users)

  SELECT
    -- Select eatery and quarter
    eatery,
    delivr_quarter,
    -- Rank rows, partition by quarter and order by users
    RANK() OVER
      (PARTITION BY delivr_quarter
       ORDER BY users DESC) :: INT AS users_rank
  FROM eatery_users
  ORDER BY eatery, delivr_quarter;
$$)
-- Select the columns of the pivoted table
AS  ct (eatery TEXT,
        "Q2 2018" INT,
        "Q3 2018" INT,
        "Q4 2018" INT)
ORDER BY "Q4 2018";

;''', engine)

Unnamed: 0,eatery,Q2 2018,Q3 2018,Q4 2018
0,'The Moon Wok',1,1,1
1,'Burgatorio',2,2,2
2,'Bean Me Up Scotty',2,2,3
3,'Leaning Tower of Pizza',4,4,4
4,'Life of Pie',5,5,5

