### Load SQL Magics

In [22]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


### Load sqlalchemy to connect to the local PostgreSQL server

In [23]:
from sqlalchemy import create_engine
import pandas as pd

In [24]:
# %sql dialect+driver://username:password@host:port/database
%sql postgresql://jovyan:postgres@localhost:8765/rsm-docker

'Connected: jovyan@rsm-docker'
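The commented template above (`dialect+driver://username:password@host:port/database`) can be checked against the actual connection string by splitting it into its parts. This is a minimal sketch that uses only the standard library; the `connection` dictionary and its keys are illustrative names, not part of `ipython-sql` or SQLAlchemy.

```python
from urllib.parse import urlsplit

# Connection string copied from the cell above.
url = "postgresql://jovyan:postgres@localhost:8765/rsm-docker"

parts = urlsplit(url)
connection = {
    "dialect": parts.scheme,              # may also carry "+driver", e.g. "postgresql+psycopg2"
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,                   # parsed as an int
    "database": parts.path.lstrip("/"),   # path is "/rsm-docker"
}

for key, value in connection.items():
    print(f"{key:>8}: {value}")
```

Reading the URL this way makes it easy to spot a wrong port or database name before a connection attempt fails.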

### Create engine

In [25]:
engine = create_engine("postgresql://jovyan:postgres@localhost:8765/rsm-docker")

In [26]:
engine

Engine(postgresql://jovyan:***@localhost:8765/rsm-docker)

### Get files as data frames

In [27]:
accounts = pd.read_excel("data/accounts.xlsx")
orders = pd.read_excel("data/orders.xlsx")
region = pd.read_excel("data/region.xlsx")
sales_reps = pd.read_excel("data/sales_reps.xlsx")
web_events = pd.read_excel("data/web_events.xlsx")

### Ingest the data frames into database tables (here, the rsm-docker database)

In [28]:
accounts.to_sql("accounts", engine, if_exists="replace")
orders.to_sql("orders", engine, if_exists="replace")
region.to_sql("region", engine, if_exists="replace")
sales_reps.to_sql("sales_reps", engine, if_exists="replace")
web_events.to_sql("web_events", engine, if_exists="replace")

In [29]:
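# Engine.table_names() is deprecated since SQLAlchemy 1.4 and removed in 2.0;
# the inspector returns the same list of table names.
from sqlalchemy import inspect
inspect(engine).get_table_names()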

['web_events', 'orders', 'region', 'sales_reps', 'ACCOUNTS', 'accounts']

## Queries

### 1. When did the first order take place?

In [30]:
# The question asks for the full timestamp; there is no need to extract the date substring from each cell.

In [31]:
%%sql

SELECT MIN(occurred_at) 
FROM orders;

 * postgresql://jovyan:***@localhost:8765/rsm-docker
1 rows affected.


min
2013-12-04T04:22:44.000Z


#### An alternative version of the query above (brute force)

In [32]:
%%sql

SELECT occurred_at
FROM orders
ORDER BY occurred_at
LIMIT 1;

 * postgresql://jovyan:***@localhost:8765/rsm-docker
1 rows affected.


occurred_at
2013-12-04T04:22:44.000Z


### 2. When did the most recent web event take place?

In [33]:
%%sql

SELECT MAX(occurred_at)
FROM web_events;

 * postgresql://jovyan:***@localhost:8765/rsm-docker
1 rows affected.


max
2017-01-01T23:51:09.000Z


#### An alternative version of the query above (brute force)

In [34]:
%%sql

SELECT occurred_at
FROM web_events
ORDER BY occurred_at DESC
LIMIT 1;

 * postgresql://jovyan:***@localhost:8765/rsm-docker
1 rows affected.


occurred_at
2017-01-01T23:51:09.000Z
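Queries 1 and 2 each appear in two forms: an aggregate (`MIN` / `MAX`) and a sort with `LIMIT 1`. The sketch below checks that the two forms agree on a small example. It assumes an in-memory SQLite database as a stand-in for the PostgreSQL server, and the `events` table and its rows are toy data, not the notebook's tables.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table name and rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (occurred_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2016-05-01 10:00:00",), ("2013-12-04 04:22:44",), ("2017-01-01 23:51:09",)],
)

def scalar(query):
    """Run a query and return the first column of its first row."""
    return conn.execute(query).fetchone()[0]

# Aggregate form versus sort-and-limit form.
earliest_agg = scalar("SELECT MIN(occurred_at) FROM events")
earliest_sort = scalar("SELECT occurred_at FROM events ORDER BY occurred_at LIMIT 1")
latest_agg = scalar("SELECT MAX(occurred_at) FROM events")
latest_sort = scalar("SELECT occurred_at FROM events ORDER BY occurred_at DESC LIMIT 1")

print(earliest_agg, earliest_sort)
print(latest_agg, latest_sort)
```

One caveat worth checking on real data: PostgreSQL places NULLs first in a descending sort by default, so the `ORDER BY ... DESC LIMIT 1` form can return NULL when the column has missing values, whereas `MAX` skips them.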


### 3. What are the average quantity and average sales for the standard, gloss, and poster paper types?

In [35]:
%%sql

SELECT avg(standard_amt_usd) AS standard_avg_sales, 
       avg(gloss_amt_usd) AS gloss_avg_sales, 
       avg(poster_amt_usd) AS poster_avg_sales,   
       sum(standard_qty)/count(standard_qty) AS standard_avg_qty,
       avg(standard_qty) AS standard_avg_qty_1,
       sum(gloss_qty)/count(gloss_qty) AS gloss_avg_qty,
       sum(poster_qty)/count(poster_qty) AS poster_avg_qty
FROM orders;

 * postgresql://jovyan:***@localhost:8765/rsm-docker
1 rows affected.


standard_avg_sales,gloss_avg_sales,poster_avg_sales,standard_avg_qty,standard_avg_qty_1,gloss_avg_qty,poster_avg_qty
1399.35569155093,1098.54742042824,850.11653935185,280.4320023148148,280.4320023148148,146.6685474537037,104.6941550925926


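* `SUM(col)/COUNT(col)` and `AVG(col)` both skip NULL values, which is why `standard_avg_qty` and `standard_avg_qty_1` are identical. The manual form only makes the numerator and denominator of the mean explicit; dividing by `COUNT(*)` instead would count rows with NULLs and lower the result.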
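How NULLs enter an average is easy to check on a tiny table. The sketch below assumes an in-memory SQLite database in place of PostgreSQL (both follow the standard rule that aggregates skip NULLs), and the `toy_orders` table is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; toy_orders is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_orders (qty REAL)")
conn.executemany(
    "INSERT INTO toy_orders VALUES (?)",
    [(10.0,), (None,), (20.0,), (30.0,)],  # one NULL among four rows
)

avg_builtin, avg_manual, avg_all_rows = conn.execute(
    """
    SELECT AVG(qty),               -- skips the NULL row
           SUM(qty) / COUNT(qty),  -- COUNT(qty) also skips the NULL row
           SUM(qty) / COUNT(*)     -- COUNT(*) counts every row, NULL included
    FROM toy_orders
    """
).fetchone()

print(avg_builtin, avg_manual, avg_all_rows)
```

The first two values agree because both divide by the three non-NULL rows; only the `COUNT(*)` variant divides by all four.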

### 4. What is the median `total_amt_usd` across all orders?

In [36]:
%%sql

SELECT *
FROM (SELECT total_amt_usd
      FROM orders
      ORDER BY total_amt_usd
      LIMIT 3457) AS Table1
ORDER BY total_amt_usd DESC
LIMIT 2;

 * postgresql://jovyan:***@localhost:8765/rsm-docker
2 rows affected.


total_amt_usd
2483.16
2482.55


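* The table has an even number of rows, so the median is the average of the two middle values: (2483.16 + 2482.55) / 2 = 2482.855. The inner `LIMIT 3457` keeps the lower half of the sorted rows plus one, so the outer query returns exactly those two middle values.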
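The middle-two rule can be written out in a few lines of Python. This is a sketch for illustration: `middle_two_median` is a hypothetical helper, and the `toy` list is made-up data used only to compare the helper against `statistics.median`.

```python
from statistics import median

# The two middle values returned by the query above.
middle_two = [2483.16, 2482.55]
median_from_query = sum(middle_two) / len(middle_two)
print(round(median_from_query, 3))

def middle_two_median(values):
    """Median by sorting: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Toy data with an even number of entries.
toy = [5.0, 1.0, 9.0, 3.0]
print(middle_two_median(toy), median(toy))
```

Hard-coding the row count in `LIMIT` works for a fixed table, but the helper shows the general rule that any row count follows.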

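#### An alternative version of the query above (works only for an odd number of entries)

This query returns the value that has as many rows below it as above it. Because the `orders` table has an even number of rows and its two middle values differ, no value satisfies that condition and the query returns 0 rows.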

In [37]:
%%sql

SELECT o.total_amt_usd AS median_total_usd
FROM orders o
WHERE (SELECT COUNT(total_amt_usd)
       FROM orders
       WHERE total_amt_usd < o.total_amt_usd) =
      (SELECT COUNT(total_amt_usd)
       FROM orders
       WHERE total_amt_usd > o.total_amt_usd);

 * postgresql://jovyan:***@localhost:8765/rsm-docker
0 rows affected.


median_total_usd
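The empty result above follows from the balance condition itself: a value qualifies only if the number of smaller rows equals the number of larger rows. The sketch below runs the same condition on toy data with an odd and an even number of distinct values. It assumes an in-memory SQLite database in place of PostgreSQL; the `toy` table, the `amt` column, and `balanced_rows` are hypothetical names.

```python
import sqlite3

# Same balance condition as the query above, on a hypothetical table "toy".
MEDIAN_QUERY = """
SELECT o.amt
FROM toy o
WHERE (SELECT COUNT(*) FROM toy WHERE amt < o.amt) =
      (SELECT COUNT(*) FROM toy WHERE amt > o.amt)
"""

def balanced_rows(values):
    """Load values into a fresh in-memory table and return the balanced ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE toy (amt REAL)")
    conn.executemany("INSERT INTO toy VALUES (?)", [(v,) for v in values])
    rows = [row[0] for row in conn.execute(MEDIAN_QUERY)]
    conn.close()
    return rows

odd_result = balanced_rows([10.0, 20.0, 30.0, 40.0, 50.0])  # five distinct values
even_result = balanced_rows([10.0, 20.0, 30.0, 40.0])       # four distinct values

print(odd_result)
print(even_result)
```

With duplicated middle values an even-sized table can still return rows, so the balance query is best treated as a check for the odd case rather than a general median.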
