<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# SQL Together Lab

_Authors: Dave Yerrington (SF)_

---

> **This is a hybrid lecture/lab. Each exercise is intended to be completed by a student in front of the class.**


### Learning Objectives
*After this lesson, you will be able to:*
- Sort results by a column using `ORDER BY`
- Simplify our syntax using aliases (`AS`)
- Match patterns using `LIKE`
- Select distinct items using `DISTINCT`
- Aggregate values using `GROUP BY`
- Filter on aggregations using `HAVING`
- Apply `IF/THEN` logic using `CASE`
- Use `EXTRACT` to get date parts

### Lesson Guide
- [Install psycopg2](#install-psycopg2)
- [Connect to remote database](#connect-to-remote)
- [Some notes on syntax](#syntax-notes)
- [ORDER BY](#order-by)
- [Alias AS](#alias-as)
- [LIKE](#like-operator)
- [DISTINCT](#distinct)
- [LIMIT](#limit)
- [GROUP BY](#group-by)
- [HAVING](#having)
- [CASE statements](#case)
- [Working with dates](#dates)
- [Additional exercises](#additional-exercises)
- [Conclusion](#conclusion)
- [Additional resources](#additional-resources)


<a id='install-psycopg2'></a>
## Install `psycopg2`

---

Either:

`> conda install psycopg2`

Or:

`> pip install psycopg2`



<a id='connect-to-remote'></a>
## Connect to the remote database

---

```python
import pandas as pd
import psycopg2

# Connection details for the class's Northwind database on AWS RDS
conn_str = "host='dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com' dbname='northwind' user='dsi_student' password='gastudents'"
conn = psycopg2.connect(conn_str)
```

<a id='syntax-notes'></a>

## A couple of notes on syntax

---

The [Northwind Database Schema](https://northwinddatabase.codeplex.com/) will come in handy for writing your solutions to the problems below. 

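1. Wrap mixed-case column names in double quotes: **"column_name"**. PostgreSQL folds unquoted identifiers to lower case, so `ProductName` without quotes is looked up as `productname` and not found.
2. Comment out the rest of a line with a double dash: **--**
3. Wrap string literals in single quotes: **'string'**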

```*.sql
SELECT "ProductID" as "PID"
FROM Products
WHERE "ProductName" like '%a' 
--AND 
```

<a id='order-by'></a>

## `ORDER BY`

---

The `ORDER BY` keyword is used to sort the result-set by one or more columns. It sorts the records in ascending order by default; to sort in descending order, use the `DESC` keyword.

### SQL `ORDER BY` syntax

```*.sql
SELECT column_name, column_name
FROM table_name
ORDER BY column_name ASC|DESC, column_name ASC|DESC;
```
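The sketch below shows how a multi-column `ORDER BY` breaks ties. It uses Python's built-in `sqlite3` module as an in-memory stand-in for the Northwind Postgres database. The `products` rows are hypothetical sample data, and `ORDER BY` behaves the same way in PostgreSQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Northwind (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE products ("ProductName" TEXT, "SupplierID" INTEGER, "UnitPrice" REAL)')
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Alpha", 1, 30.0), ("Bravo", 2, 26.0), ("Charlie", 2, 40.0), ("Delta", 1, 12.0)],
)

# Primary sort: SupplierID descending. Rows sharing a SupplierID are
# then ordered by UnitPrice ascending.
rows = conn.execute(
    'SELECT "ProductName", "SupplierID", "UnitPrice" FROM products '
    'ORDER BY "SupplierID" DESC, "UnitPrice" ASC'
).fetchall()

for row in rows:
    print(row)
# ('Bravo', 2, 26.0)
# ('Charlie', 2, 40.0)
# ('Delta', 1, 12.0)
# ('Alpha', 1, 30.0)
```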

### Exercise 1:

Select the `ProductID`, `ProductName`, `SupplierID`, and `UnitPrice` for all `Products` with a `UnitPrice > 25` ordered by `SupplierID` descending and then `UnitPrice` ascending.

In [10]:
SQL_STRING='''
select "ProductID" "PID", "ProductName" "PD",
"SupplierID" "SID", "UnitPrice" "UP"
from Products as p
where p."UnitPrice" >25
order by "SID" DESC , "UP" ASC

'''
df=pd.read_sql(SQL_STRING,con=conn)
df

Unnamed: 0,PID,PD,SID,UP
0,61,Sirop d'érable,29,28.5
1,62,Tarte au sucre,29,49.3
2,60,Camembert Pierrot,28,34.0
3,59,Raclette Courdavault,28,55.0
4,56,Gnocchi di nonna Alice,26,38.0
5,53,Perth Pasties,24,32.8
6,51,Manjimup Dried Apples,24,53.0
7,43,Ipoh Coffee,20,46.0
8,38,Côte de Blaye,18,263.5
9,37,Gravad lax,17,26.0


<a id='alias-as'></a>
## Alias `AS`

---

SQL aliases are used to give a database table, or a column in a table, a temporary name. Aliases are often created to make column names more readable and to make queries more concise.

### SQL alias syntax for columns

```*.sql
SELECT column_name AS alias_name
FROM table_name;
```

### SQL alias syntax for tables

```*.sql
SELECT column_name(s)
FROM table_name AS alias_name;
```
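Here is a minimal sketch of both kinds of alias. It again uses an in-memory `sqlite3` database as a stand-in for Postgres, with two hypothetical supplier rows. A column alias changes the column name the client sees, which you can confirm through the DB-API `cursor.description`. A table alias only shortens references inside the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE suppliers ("SupplierID" INTEGER, "CompanyName" TEXT)')
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [(1, "Exotic Liquids"), (16, "Bigfoot Breweries")],
)

# Column aliases rename the output columns; the table alias S shortens references.
cur = conn.execute(
    'SELECT S."SupplierID" AS "Supplier No.", S."CompanyName" AS "Company Name" '
    'FROM suppliers AS S ORDER BY S."CompanyName" ASC'
)
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)  # ['Supplier No.', 'Company Name']
print(rows)          # [(16, 'Bigfoot Breweries'), (1, 'Exotic Liquids')]
```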


### Exercise 2

Select `SupplierID` and `CompanyName` from the `Suppliers` table, aliasing these columns as `Supplier No.` and `Company Name` respectively. Also alias the table as `S`. Order by `CompanyName` ascending.

```python
# A:
SQL_QUERY = '''
select "SupplierID" as "Supplier No.",
"CompanyName" as "Company Name"
from Suppliers S
order by S."CompanyName" ASC
'''
df = pd.read_sql(SQL_QUERY, con=conn)
df
```

|   | Supplier No. | Company Name |
|---|---|---|
| 0 | 18 | Aux joyeux ecclésiastiques |
| 1 | 16 | Bigfoot Breweries |
| 2 | 5 | Cooperativa de Quesos 'Las Cabras' |
| 3 | 27 | Escargots Nouveaux |
| 4 | 1 | Exotic Liquids |
| 5 | 29 | Forêts d'érables |
| 6 | 14 | Formaggi Fortini s.r.l. |
| 7 | 28 | Gai pâturage |
| 8 | 24 | G'day, Mate |
| 9 | 3 | Grandma Kelly's Homestead |


**Aliases can be useful when:**

- More than one table is involved in a query
- Functions are used in the query
- Column names are long or not very readable
- Two or more columns are combined together

<a id='like-operator'></a>
## SQL `LIKE` operator

---

The `LIKE` operator is used in a `WHERE` clause to search for a specified pattern in a column.


### SQL `LIKE` syntax

```*.sql
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
```

> **Tip**: In a `LIKE` pattern, `%` matches any sequence of zero or more characters and `_` matches exactly one character, so `'%ch%'` finds "ch" anywhere in the value. In PostgreSQL, `LIKE` is case-sensitive; use `ILIKE` for a case-insensitive match.
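To make the wildcard rules concrete, here is a sketch that translates a `LIKE` pattern into a Python regular expression. `like_to_regex` and `sql_like` are hypothetical helpers, not part of any library. They match case-sensitively against the whole value, as PostgreSQL's `LIKE` does, but they do not handle escape characters or an `ESCAPE` clause.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regex.

    % -> any run of zero or more characters; _ -> exactly one character.
    Escapes (backslash or an ESCAPE clause) are NOT handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # DOTALL lets % and _ also match newline characters.
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value: str, pattern: str) -> bool:
    """Case-sensitive match of the whole value against a LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(sql_like("Sasquatch Ale", "%ch%"))  # True
print(sql_like("Chai", "%ch%"))           # False: 'C' is uppercase
print(sql_like("Salerno", "S%"))          # True
print(sql_like("Chang", "Ch_ng"))         # True
```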

### Exercise 3

Select all products from the `Products` table whose `ProductName` contains "ch", in descending order. Alias this column as `Ch Products`.

```python
def Q(SQL_Query, con=conn):
    """Run a SQL query and return the result as a pandas DataFrame."""
    return pd.read_sql(SQL_Query, con=con)
```

```python
SQL_Query3 = '''
select "ProductName" as "Ch Products"
from products p
where "ProductName" like '%ch%'
order by "ProductName" DESC;
'''
Q(SQL_Query3, con=conn)
```

|   | Ch Products |
|---|---|
| 0 | Schoggi Schokolade |
| 1 | Sasquatch Ale |
| 2 | Queso Manchego La Pastora |
| 3 | Pâté chinois |
| 4 | Gumbär Gummibärchen |
| 5 | Gnocchi di nonna Alice |


### Exercise 4

Select all cities from the `Suppliers` table whose `City` starts with "S", in ascending order. Alias this column as `S Cities`.

```python
# A:
SQL_Q4 = '''
select "City" as "S Cities"
from Suppliers
where "City" like 'S%'
order by "City" ASC
'''
Q(SQL_Q4, con=conn)
```

|   | S Cities |
|---|---|
| 0 | Salerno |
| 1 | Sandvika |
| 2 | Sao Paulo |
| 3 | Singapore |
| 4 | Ste-Hyacinthe |
| 5 | Stockholm |
| 6 | Sydney |


<a id='distinct'></a>
## The `DISTINCT` operator

---

The `SELECT DISTINCT` statement is used to return only distinct (unique) values. In a table, a column may contain many duplicate values; sometimes you only want to list the unique ones.

### `SELECT DISTINCT` syntax

```*.sql
SELECT DISTINCT column_name, column_name
FROM table_name;
```
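The sketch below shows that `DISTINCT` applies to the whole selected row, not to the first column alone. It uses an in-memory `sqlite3` stand-in with hypothetical supplier rows. The same rule holds in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE suppliers ("City" TEXT, "Country" TEXT)')
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [("London", "UK"), ("Manchester", "UK"), ("Tokyo", "Japan"),
     ("Osaka", "Japan"), ("Sydney", "Australia")],
)

# One column: duplicates of "Country" collapse.
countries = [r[0] for r in conn.execute(
    'SELECT DISTINCT "Country" FROM suppliers ORDER BY "Country"')]

# Two columns: a row is a duplicate only if BOTH values repeat.
pairs = conn.execute('SELECT DISTINCT "City", "Country" FROM suppliers').fetchall()

print(countries)   # ['Australia', 'Japan', 'UK']
print(len(pairs))  # 5: every (City, Country) pair is already unique
```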

### Exercise 5

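`SELECT DISTINCT` `SupplierID`, `ProductName` and `UnitPrice` from the `Products` table, ordering by `UnitPrice` ascending so the cheapest products appear first. Note that `DISTINCT` removes only rows that are identical in all three columns, so this query does not by itself return exactly one cheapest product per supplier.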

```python
Query5 = '''
select DISTINCT "SupplierID",
"ProductName", "UnitPrice"
from Products p
order by p."UnitPrice" ASC
'''
Q(Query5, con=conn)
```

|   | SupplierID | ProductName | UnitPrice |
|---|---|---|---|
| 0 | 15 | Geitost | 2.50 |
| 1 | 10 | Guaraná Fantástica | 4.50 |
| 2 | 6 | Konbu | 6.00 |
| 3 | 24 | Filo Mix | 7.00 |
| 4 | 25 | Tourtière | 7.45 |
| 5 | 12 | Rhönbräu Klosterbier | 7.75 |
| 6 | 9 | Tunnbröd | 9.00 |
| 7 | 8 | Teatime Chocolate Biscuits | 9.20 |
| 8 | 21 | Rogede sild | 9.50 |
| 9 | 22 | Zaanse koeken | 9.50 |


<a id='limit'></a>

## The `LIMIT` operator

---

Sometimes we may want to only retrieve a fixed number of records from the database. This is where the `LIMIT` operator comes in.


### `LIMIT` syntax

```*.sql
SELECT column_name, column_name
FROM table_name
LIMIT number_of_records;
```
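`LIMIT` is applied after `ORDER BY`. A "top N" query therefore sorts first and then truncates. Without an `ORDER BY`, which rows come back is not guaranteed. The sketch below uses an in-memory `sqlite3` stand-in, with a few product names and prices taken from this lab's outputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE products ("ProductName" TEXT, "UnitPrice" REAL)')
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Chai", 18.0), ("Chang", 19.0), ("Carnarvon Tigers", 62.5), ("Boston Crab Meat", 18.4)],
)

# Top-N: sort by price (highest first), then keep only the first 2 rows.
top_two = conn.execute(
    'SELECT "ProductName", "UnitPrice" FROM products '
    'ORDER BY "UnitPrice" DESC LIMIT 2'
).fetchall()
print(top_two)  # [('Carnarvon Tigers', 62.5), ('Chang', 19.0)]
```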

### Exercise 6

Return the 5 highest-priced products that contain an **a** in the product name, listed in ascending order of price. Alias the product name column as `Top 5 A Products`.

```python
# A: take the 5 most expensive matches first, then re-sort them ascending by price
Query6 = '''
select * from (
    select "ProductName" as "Top 5 A Products", "UnitPrice"
    from Products p
    where p."ProductName" like '%a%'
    order by p."UnitPrice" DESC
    LIMIT 5
) as top5
order by "UnitPrice" ASC
'''
Q(Query6, con=conn)
```


<a id='group-by'></a>
## `GROUP BY` Operator

---

A table may contain several records that have a common key. 

The `GROUP BY` clause is used with aggregate functions to group the result-set by one or more columns. For example, we may want to know the total number of items purchased within each order.

### `GROUP BY` syntax

```*.sql
SELECT column_name, aggregate_function(column_name)  
FROM table_name  
WHERE column_name operator value  
GROUP BY column_name;
```

The most common aggregate functions used with `GROUP BY` are:
- **`COUNT`**
- **`MIN`**
- **`MAX`**
- **`SUM`**
- **`AVG`**
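Here is a small sketch of `GROUP BY` with `COUNT` and `SUM`. It uses an in-memory `sqlite3` stand-in for the `order_details` table. The rows are hypothetical, loosely modeled on the preview in Exercise 7, and the prices were chosen so the sums are exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER, '
    '"UnitPrice" REAL, "Quantity" INTEGER)'
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?, ?)",
    [(10248, 11, 14.0, 12), (10248, 42, 9.5, 10), (10249, 14, 20.0, 5)],
)

# One output row per OrderID: number of line items and total revenue.
per_order = conn.execute(
    'SELECT "OrderID", COUNT(*) AS n_items, SUM("UnitPrice" * "Quantity") AS revenue '
    'FROM order_details GROUP BY "OrderID" ORDER BY revenue'
).fetchall()
print(per_order)  # [(10249, 1, 100.0), (10248, 2, 263.0)]
```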

### Exercise 7

From the `order_details` table, show the number of line items per `OrderID` and the sum of the revenue (`UnitPrice * Quantity`). Order by the revenue.

```python
Querya = '''
select * from order_details
LIMIT 2
'''
Q(Querya, con=conn)
```

|   | OrderID | ProductID | UnitPrice | Quantity | Discount |
|---|---|---|---|---|---|
| 0 | 10248 | 11 | 14.0 | 12 | 0.0 |
| 1 | 10248 | 42 | 9.8 | 10 | 0.0 |

```python
Query7 = '''
select "OrderID", COUNT("OrderID"),
SUM("UnitPrice" * "Quantity") as "revenue"
from order_details
group by "OrderID"
order by "revenue"
'''
Q(Query7, con=conn)
```

|   | OrderID | count | revenue |
|---|---|---|---|
| 0 | 10782 | 1 | 12.500000 |
| 1 | 10807 | 1 | 18.400000 |
| 2 | 10767 | 1 | 28.000000 |
| 3 | 10586 | 1 | 28.000000 |
| 4 | 10898 | 1 | 30.000000 |
| 5 | 10883 | 1 | 36.000000 |
| 6 | 10815 | 1 | 40.000000 |
| 7 | 10674 | 1 | 45.000000 |
| 8 | 11057 | 1 | 45.000000 |
| 9 | 11051 | 1 | 45.000000 |


<a id='having'></a>
## The `HAVING` operator

---

`WHERE` filters individual rows before they are grouped, so it cannot refer to aggregate functions. The `HAVING` clause filters the groups after aggregation. For example, you would use `HAVING` to show only companies whose total revenue (computed with an aggregate function such as `SUM`) is greater than $10,000.

### `HAVING` Syntax

```*.sql
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;
```
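The difference between `WHERE` and `HAVING` comes down to when each filter runs. The sketch below shows this on an in-memory `sqlite3` stand-in with the same hypothetical `order_details` rows used for `GROUP BY`. PostgreSQL evaluates the two clauses in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER, '
    '"UnitPrice" REAL, "Quantity" INTEGER)'
)
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?, ?)",
    [(10248, 11, 14.0, 12), (10248, 42, 9.5, 10), (10249, 14, 20.0, 5)],
)

# HAVING filters groups after aggregation: keep orders with more than one line item.
multi_item = conn.execute(
    'SELECT "OrderID", COUNT(*) AS n_items, SUM("UnitPrice" * "Quantity") AS revenue '
    'FROM order_details GROUP BY "OrderID" HAVING COUNT(*) > 1'
).fetchall()
print(multi_item)  # [(10248, 2, 263.0)]

# WHERE runs first, on individual rows: only the Quantity-12 line of order
# 10248 survives, so no group reaches COUNT(*) > 1.
filtered_first = conn.execute(
    'SELECT "OrderID", COUNT(*) FROM order_details '
    'WHERE "Quantity" > 10 GROUP BY "OrderID" HAVING COUNT(*) > 1'
).fetchall()
print(filtered_first)  # []
```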

### Exercise 8

Show the revenue of all orders that have more than 1 item.

```python
Query8 = '''
select "OrderID", COUNT("OrderID"),
SUM("UnitPrice" * "Quantity") as "revenue"
from order_details
group by "OrderID"
having COUNT("OrderID") > 1
order by "revenue"
'''
Q(Query8, con=conn)
```

|   | OrderID | count | revenue |
|---|---|---|---|
| 0 | 10620 | 2 | 57.500000 |
| 1 | 11019 | 2 | 76.000000 |
| 2 | 10281 | 3 | 86.499998 |
| 3 | 10753 | 2 | 88.000000 |
| 4 | 10308 | 2 | 88.799999 |
| 5 | 10288 | 2 | 89.000001 |
| 6 | 10710 | 2 | 93.499999 |
| 7 | 10259 | 2 | 100.799999 |
| 8 | 10415 | 2 | 102.400002 |
| 9 | 10366 | 2 | 135.999994 |


<a id='case'></a>
## `CASE` statements

---

The `CASE` expression is SQL's way of applying if/then logic. It contains at least one `WHEN ... THEN ...` pair and must finish with `END`. The `ELSE` clause is optional and catches any value not matched by a `WHEN` condition. Without `ELSE`, unmatched rows get `NULL`.

### `CASE` syntax

```*.sql
-- Aliases take double quotes or no quotes; single quotes mean a string literal
SELECT
    CASE WHEN column_name operator value THEN 'string value'
         WHEN column_name operator value THEN 'string value'
         ELSE 'string value' END AS alias_name
FROM table_name
```

### Pseudo example

```*.sql
SELECT name,
    CASE WHEN age < 1 THEN 'infant'
         WHEN age < 2 THEN 'toddler'
         WHEN age < 5 THEN 'child'
         ELSE 'old as dirt' END AS "Persons Age"
FROM table_name
```
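The pseudo example can be run end to end against an in-memory `sqlite3` stand-in. The `people` table and its rows are hypothetical. The sketch also shows two rules: `WHEN` branches are checked top to bottom, and a missing `ELSE` yields `NULL`, which appears as `None` in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 0.5), ("Ben", 1.5), ("Cy", 3), ("Di", 40)],
)

# WHEN branches are tested top to bottom; the first true one wins.
labelled = conn.execute(
    """
    SELECT name,
           CASE WHEN age < 1 THEN 'infant'
                WHEN age < 2 THEN 'toddler'
                WHEN age < 5 THEN 'child'
                ELSE 'old as dirt' END AS "Persons Age"
    FROM people
    ORDER BY age
    """
).fetchall()
print(labelled)
# [('Ana', 'infant'), ('Ben', 'toddler'), ('Cy', 'child'), ('Di', 'old as dirt')]

# Without ELSE, rows that match no WHEN get NULL (None in Python).
no_else = conn.execute(
    "SELECT name, CASE WHEN age < 5 THEN 'young' END FROM people ORDER BY age"
).fetchall()
print(no_else[-1])  # ('Di', None)
```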

### Exercise 9

Select `CompanyName`, `City`, and `Country` from the `Suppliers` table. Add a new column `D_F` which has a value of "domestic" if the supplier is from USA and "foreign" otherwise.

```python
# A:
Queryb = '''
select * from Suppliers
LIMIT 2
'''
Q(Queryb, con=conn)
```

|   | SupplierID | CompanyName | ContactName | ContactTitle | Address | City | Region | PostalCode | Country | Phone | Fax | HomePage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Exotic Liquids | Charlotte Cooper | Purchasing Manager | 49 Gilbert St. | London |  | EC1 4SD | UK | (171) 555-2222 |  |  |
| 1 | 2 | New Orleans Cajun Delights | Shelley Burke | Order Administrator | P.O. Box 78934 | New Orleans | LA | 70117 | USA | (100) 555-4822 |  | #CAJUN.HTM# |


```python
Query9 = '''
select "CompanyName", "City", "Country",
    CASE WHEN "Country" = 'USA' THEN 'domestic'
    ELSE 'foreign' END AS "D_F"
from Suppliers
'''
Q(Query9, con=conn)
```

|   | CompanyName | City | Country | D_F |
|---|---|---|---|---|
| 0 | Exotic Liquids | London | UK | foreign |
| 1 | New Orleans Cajun Delights | New Orleans | USA | domestic |
| 2 | Grandma Kelly's Homestead | Ann Arbor | USA | domestic |
| 3 | Tokyo Traders | Tokyo | Japan | foreign |
| 4 | Cooperativa de Quesos 'Las Cabras' | Oviedo | Spain | foreign |
| 5 | Mayumi's | Osaka | Japan | foreign |
| 6 | Pavlova, Ltd. | Melbourne | Australia | foreign |
| 7 | Specialty Biscuits, Ltd. | Manchester | UK | foreign |
| 8 | PB Knäckebröd AB | Göteborg | Sweden | foreign |
| 9 | Refrescos Americanas LTDA | Sao Paulo | Brazil | foreign |


<a id='dates'></a>
## Working with dates

---

Take some time to look over the [PostgreSQL date/time function documentation](https://www.postgresql.org/docs/8.1/static/functions-datetime.html).

### Extracting date parts from a date object
```*.sql
SELECT my_date,
       EXTRACT('year'   FROM my_date) AS year,
       EXTRACT('month'  FROM my_date) AS month,
       EXTRACT('day'    FROM my_date) AS day,
       EXTRACT('hour'   FROM my_date) AS hour,
       EXTRACT('minute' FROM my_date) AS minute,
       EXTRACT('second' FROM my_date) AS second,
       EXTRACT('decade' FROM my_date) AS decade,
       EXTRACT('dow'    FROM my_date) AS day_of_week
  FROM table_name
```
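The sketch below maps each `EXTRACT` field above onto Python's `datetime` attributes, which helps when checking query results in pandas. `extract_parts` is a hypothetical helper written for illustration. It follows PostgreSQL's definitions: `decade` is the year divided by 10, `dow` counts Sunday as 0, and `second` includes fractional seconds. The test timestamp is the one used in the PostgreSQL documentation examples.

```python
from datetime import datetime


def extract_parts(ts: datetime) -> dict:
    """Rough Python analogue of the EXTRACT calls above (hypothetical helper)."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        # PostgreSQL's 'second' includes fractional seconds.
        "second": ts.second + ts.microsecond / 1_000_000,
        # 'decade' is the year divided by 10 (integer division for positive years).
        "decade": ts.year // 10,
        # 'dow' is Sunday=0 ... Saturday=6; isoweekday() is Monday=1 ... Sunday=7.
        "day_of_week": ts.isoweekday() % 7,
    }


parts = extract_parts(datetime(2001, 2, 16, 20, 38, 40))
print(parts)
# {'year': 2001, 'month': 2, 'day': 16, 'hour': 20, 'minute': 38,
#  'second': 40.0, 'decade': 200, 'day_of_week': 5}
```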

### Exercise 10

Select `OrderDate` and `Freight` from the `Orders` table, along with three new columns for Year, Month, and Day. Make sure these are [cast as integers and not floats](http://www.postgresqltutorial.com/postgresql-cast/).

After extracting the dates as integers, pull out the year, month, and sum of `Freight` aliased as "FreightPerMonth", grouping by the year and month and only where the freight per month is greater than 5000.

Order this dataframe by year and month, both descending.

```python
# A:
Query10 = '''
select "year", "month", SUM("Freight") as "FreightPerMonth"
from (select
          "OrderDate",
          cast(EXTRACT('year'  FROM "OrderDate") as int) AS year,
          cast(EXTRACT('month' FROM "OrderDate") as int) AS month,
          cast(EXTRACT('day'   FROM "OrderDate") as int) AS day,
          "Freight"
      from Orders) as sub
group by year, month
having SUM("Freight") > 5000
order by year desc, month desc
'''
Q(Query10, con=conn)
```

|   | year | month | FreightPerMonth |
|---|---|---|---|
| 0 | 1998 | 4 | 6393.57 |
| 1 | 1998 | 3 | 5379.02 |
| 2 | 1998 | 1 | 5463.44 |


<a id='additional-exercises'></a>
## Additional exercises

---

### Exercise 11

From the `orders` table find the average number of days it took to ship a package per `ShipCountry`. Only include orders that have a ship date, and only show the top 5.

```python
queryc = '''
select * from orders
limit 2
'''
Q(queryc, con=conn)
```

|   | OrderID | CustomerID | EmployeeID | OrderDate | RequiredDate | ShippedDate | ShipVia | Freight | ShipName | ShipAddress | ShipCity | ShipRegion | ShipPostalCode | ShipCountry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10248 | VINET | 5 | 1996-07-04 | 1996-08-01 | 1996-07-16 | 3 | 32.38 | Vins et alcools Chevalier | 59 rue de l'Abbaye | Reims |  | 51100 | France |
| 1 | 10249 | TOMSP | 6 | 1996-07-05 | 1996-08-16 | 1996-07-10 | 1 | 11.61 | Toms Spezialitäten | Luisenstr. 48 | Münster |  | 44087 | Germany |

```python
query11 = '''
select AVG("ShippedDate" - "OrderDate") as "Avg_no_days_shipping",
"ShipCountry"
from orders
where "ShippedDate" is not null
group by "ShipCountry"
order by AVG("ShippedDate" - "OrderDate") desc
limit 5
'''
Q(query11, con=conn)
```

|   | Avg_no_days_shipping | ShipCountry |
|---|---|---|
| 0 | 11.0 | Ireland |
| 1 | 10.216216 | Sweden |
| 2 | 9.941176 | Switzerland |
| 3 | 9.554622 | USA |
| 4 | 9.285714 | Argentina |
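In PostgreSQL, subtracting one `date` from another gives an integer number of days, which is why `AVG("ShippedDate" - "OrderDate")` produces a numeric average. This assumes both columns are `date`-typed, as the averages above suggest. The sketch below does the same computation in plain Python. The first two orders come from the `orders` preview, and the unshipped third order is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# (ShipCountry, OrderDate, ShippedDate); the third row is hypothetical and unshipped.
orders = [
    ("France", date(1996, 7, 4), date(1996, 7, 16)),
    ("Germany", date(1996, 7, 5), date(1996, 7, 10)),
    ("Germany", date(1996, 7, 8), None),
]

days_by_country = defaultdict(list)
for country, ordered, shipped in orders:
    if shipped is None:  # same role as WHERE "ShippedDate" IS NOT NULL
        continue
    days_by_country[country].append((shipped - ordered).days)

# GROUP BY country + AVG, then ORDER BY ... DESC LIMIT 5
avg_days = {country: mean(days) for country, days in days_by_country.items()}
top5 = sorted(avg_days.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top5)  # [('France', 12), ('Germany', 5)]
```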


### Exercise 12

Find the top 5 ship countries by average freight cost for orders shipped in 1998, using the `orders` table.

```python
query12 = '''
select AVG("Freight") as "average_freight_cost", "ShipCountry"
from orders
where EXTRACT('year' from "ShippedDate") = 1998
group by "ShipCountry"
order by AVG("Freight") DESC
limit 5
'''
Q(query12, con=conn)
```

|   | average_freight_cost | ShipCountry |
|---|---|---|
| 0 | 339.422489 | Ireland |
| 1 | 199.137275 | Austria |
| 2 | 166.906216 | USA |
| 3 | 131.617501 | Denmark |
| 4 | 102.470003 | Portugal |


### Exercise 13

From the `employees` table find the 2 women that were hired the most recently. Exclude employees where the gender is ambiguous.

```python
Query_d = '''
select * from employees
limit 1
'''
Q(Query_d, con=conn)
```

|   | EmployeeID | LastName | FirstName | Title | TitleOfCourtesy | BirthDate | HireDate | Address | City | Region | PostalCode | Country | HomePhone | Extension | Photo | Notes | ReportsTo | PhotoPath |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Davolio | Nancy | Sales Representative | Ms. | 1948-12-08 | 1992-05-01 | 507 - 20th Ave. E.\nApt. 2A | Seattle | WA | 98122 | USA | (206) 555-9857 | 5467 | [] | Education includes a BA in psychology from Col... | 2 | http://accweb/emmployees/davolio.bmp |

```python
Query_e = '''
select DISTINCT "TitleOfCourtesy"
from employees
'''
Q(Query_e, con=conn)
```

|   | TitleOfCourtesy |
|---|---|
| 0 | Mrs. |
| 1 | Mr. |
| 2 | Ms. |
| 3 | Dr. |

```python
query13 = '''
select "LastName", "FirstName"
from employees
where "TitleOfCourtesy" in ('Mrs.', 'Ms.')
order by "HireDate" desc
limit 2
'''
Q(query13, con=conn)
```

|   | LastName | FirstName |
|---|---|---|
| 0 | Dodsworth | Anne |
| 1 | Callahan | Laura |


### Exercise 14

Split products from the `Products` table into three price categories:
- Cheap: less than 10
- Fair: 10 to 50
- Expensive: greater than 50

Return the number of products in each price category, along with the min, max, and average `UnitPrice`.

```python
queryf = '''
select * from Products
limit 1
'''
Q(queryf, con=conn)
```

|   | ProductID | ProductName | SupplierID | CategoryID | QuantityPerUnit | UnitPrice | UnitsInStock | UnitsOnOrder | ReorderLevel | Discontinued |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Chai | 8 | 1 | 10 boxes x 30 bags | 18.0 | 39 | 0 | 10 | 1 |


```python
query14 = '''
select "Price_Indicator", count("Price_Indicator"),
       min("UnitPrice"), max("UnitPrice"), avg("UnitPrice")
from (select "UnitPrice",
          case
              when "UnitPrice" < 10 then 'cheap'
              -- BETWEEN is inclusive: 10 <= price <= 50
              when "UnitPrice" between 10 and 50 then 'fair'
              else 'expensive'
          end as "Price_Indicator"
      from Products) as origin
group by origin."Price_Indicator"
'''
Q(query14, con=conn)
```

<a id='conclusion'></a>
## Conclusion

---

In this lesson we have learned many new commands to make powerful SQL queries.

In particular we learned how to:

- Sort results by a column using `ORDER BY`
- Simplify our syntax using aliases
- Match patterns using `LIKE`
- Select distinct items using `DISTINCT`
- Aggregate values using `GROUP BY`
- Filter on aggregations using `HAVING`
- Apply `IF/THEN` logic using `CASE`
- Use `EXTRACT` to get date parts

**Can you think of a few more business cases where these are useful?**

<a id='additional-resources'></a>
## Additional resources

---

- [Postgres Documentation](https://www.postgresql.org/docs/)
- [Mode Analytics Tutorial](https://community.modeanalytics.com/sql/tutorial/introduction-to-sql/)