# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) SQL JOINS
Week 6 | Lesson 1.1

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- explain what a `JOIN` operation is
- visualize a `JOIN` operation as an operation between sets
- distinguish different types of `JOIN`
- perform `JOINS` in `SQL`

### STUDENT PRE-WORK
*Before this lesson, you should already be able to:*
- connect to a local or remote relational database
- perform SQL CRUD actions and queries
- merge DataFrames with `pandas.merge`

### LESSON GUIDE
| TIMING  | TYPE  | TOPIC  |
|:-:|---|---|
| 5 mins | [Opening](#opening) | Opening |
| 15 mins | [Introduction](#introduction) | Joining tables |
| 15 mins | [Demo](#demo) | Demo: Different types of JOIN |
| 15 mins | [Guided-practice](#guided-practice) | Guided Practice: Joins |
| 15 mins | [Demo](#demo_2) | Demo: Sub-queries |
| 25 minutes | [Ind-practice](#ind-practice) | Independent Practice: Other SQL Commands |
| 5 mins | [Conclusion](#conclusion) | Conclusion |

<a name="opening"></a>
## Opening (5 mins)
Last week we learned many things about databases, including:

- how to connect to a local or remote db
- how to add, remove, edit data
- how to perform simple queries
- how to aggregate, group and sort data

> **Check:** What SQL commands did we learn last week? What do they do?

We often need to use data stored in more than one table. Last week we did this using `Pandas merge`, but the time has come to learn about _`JOIN`_, which is the natural way to combine data within `SQL`.

<a name="introduction"></a>
## Joining tables (15 mins)

We will use the [Northwind sample database](https://northwinddatabase.codeplex.com/):

    psql -h dsi.c20gkj5cvu3l.us-east-1.rds.amazonaws.com -p 5432 -U dsi_student northwind
    password: gastudents

As a reminder, here is what some of the tables look like:

`customers`:

|CustomerID |CompanyName |ContactName | ContactTitle |Address|City | Region | PostalCode | Country |Phone | Fax|
|---|---|---|---|---|---|---|---|---|---|---|
|ALFKI| Alfreds Futterkiste| Maria Anders | Sales Representative | Obere Str. 57 | Berlin|| 12209| Germany | 030-0074321| 030-0076545|
|ANATR| Ana Trujillo Emparedados y helados | Ana Trujillo | Owner| Avda. de la Constitución 2222 | México D.F. || 05021| Mexico| (5) 555-4729 | (5) 555-3745|
|ANTON| Antonio Moreno Taquería| Antonio Moreno | Owner| Mataderos 2312 | México D.F. || 05023| Mexico| (5) 555-3932 | |
|...|...|...|...|...|...|...|...|...|...|...|



`orders`:

|OrderID | CustomerID | EmployeeID | OrderDate| RequiredDate | ShippedDate | ShipVia | Freight | ShipName|ShipAddress |ShipCity| ShipRegion | ShipPostalCode | ShipCountry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|10248 | VINET|5 | 1996-07-04 | 1996-08-01 | 1996-07-16| 3 | 32.38 | Vins et alcools Chevalier | 59 rue de l'Abbaye | Reims|| 51100| France|
|10249 | TOMSP|6 | 1996-07-05 | 1996-08-16 | 1996-07-10| 1 | 11.61 | Toms Spezialitäten| Luisenstr. 48| Münster|| 44087| Germany|
|10250 | HANAR|4 | 1996-07-08 | 1996-08-05 | 1996-07-12| 2 | 65.83 | Hanari Carnes | Rua do Paço, 67| Rio de Janeiro | RJ | 05454-876| Brazil|
|...|...|...|...|...|...|...|...|...|...|...|...|...|...|



`order_details`:

| OrderID |  ProductID |  UnitPrice | Quantity | Discount |
|---|---|---|---|---|
|10248|11|14|12|0|
|10248|42|9.8|10|0|
|10248|72|34.8|5|0|
|10249|14|18.6|9|0|
|10249|51|42.4|40|0|
|10250|41|7.7|10|0|
|...|...|...|...|...|



### Joins

_SQL joins_ are used when data is spread across different tables. A _join_ operation combines rows from two or more tables into a single result table. To do this, the tables need to share a common field.

Join operations can be thought of as operations between two sets, where records with the same key are combined and records missing in one set are either discarded or included as NULL values.
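
As a rough sketch of this set view, the Python snippet below compares the key values of two tables. The key values are made-up toy data, and real joins work on whole rows rather than bare keys (a key that matches several rows produces several output rows), so treat this only as an aid to intuition.

```python
# Hypothetical toy data: the CustomerID values that appear in each table.
order_keys = {2, 37, 77}     # customers referenced by rows in orders
customer_keys = {1, 2, 3}    # customers listed in customers

# Keys present in both tables: these records are combined.
in_both = order_keys & customer_keys

# Keys present in only one table: these records are either discarded
# or kept and padded with NULL, depending on the type of join.
only_in_orders = order_keys - customer_keys
only_in_customers = customer_keys - order_keys

# Every key from either table.
in_either = order_keys | customer_keys

print(sorted(in_both))            # [2]
print(sorted(only_in_orders))     # [37, 77]
print(sorted(only_in_customers))  # [1, 3]
print(sorted(in_either))          # [1, 2, 3, 37, 77]
```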

> **Check:** where have you encountered a similar functionality in Pandas? How have you used it?


#### INNER JOIN
The most common type of join is the SQL `INNER JOIN` (also called a simple join). An `INNER JOIN` returns only the rows for which the join condition is met in both tables.

Let's consider a few columns of the `orders` table (this simplified illustration uses numeric customer IDs rather than the five-letter codes shown above):

|OrderID|CustomerID|OrderDate|
|---|---|---|
|10308|2|1996-09-18|
|10309|37|1996-09-19|
|10310|77|1996-09-20|



In the `customers` table, let's focus on these columns:

|CustomerID|CompanyName|ContactName|Country|
|---|---|---|---|
|1|Alfreds Futterkiste|Maria Anders|Germany|
|2|Ana Trujillo Emparedados y helados|Ana Trujillo|Mexico|
|3|Antonio Moreno Taquería|Antonio Moreno|Mexico|




Notice that the `CustomerID` column in the `orders` table refers to the `CustomerID` in the `customers` table. We can _JOIN_ the two tables in order to obtain a table like the following:

|OrderID|CompanyName|OrderDate|
|---|---|---|
|10308|Ana Trujillo Emparedados y helados|9/18/1996|
|10365|Antonio Moreno Taquería|11/27/1996|
|10383|Around the Horn|12/16/1996|
|10355|Around the Horn|11/15/1996|
|10278|Berglunds snabbköp|8/12/1996|

A selection of the information contained in the two tables is _joined_ in a single table, using the common key of `CustomerID`.

#### Comparison with Pandas Merge

See [here](http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html) for a comparison between pandas and SQL and [here](http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging) for an in-depth review of Pandas merge.

<a name="demo"></a>
## Demo: Different types of JOIN (15 mins)

|OrderID|CompanyName|OrderDate|
|---|---|---|
|10308|Ana Trujillo Emparedados y helados|9/18/1996|
|10365|Antonio Moreno Taquería|11/27/1996|
|10383|Around the Horn|12/16/1996|
|10355|Around the Horn|11/15/1996|
|10278|Berglunds snabbköp|8/12/1996|

We created this result with this statement:


```sql
SELECT orders."OrderID", customers."CompanyName", orders."OrderDate"
FROM orders
INNER JOIN customers
ON orders."CustomerID" = customers."CustomerID";
```

An `INNER JOIN` takes the intersection of the two datasets: it keeps only the rows whose `CustomerID` appears in both tables. Rows without a match in the other table, including rows where `CustomerID` is NULL, are excluded.
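
To try this on a small scale without connecting to Northwind, here is a minimal sketch using Python's built-in `sqlite3` module. The tables are assumptions for illustration: they hold only the three-row excerpts with numeric IDs shown earlier, not the real data, and SQLite stands in for PostgreSQL.

```python
import sqlite3

# In-memory toy database (an assumption for illustration, not Northwind itself).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" INTEGER PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" INTEGER, "OrderDate" TEXT);
    INSERT INTO customers VALUES
        (1, 'Alfreds Futterkiste'),
        (2, 'Ana Trujillo Emparedados y helados'),
        (3, 'Antonio Moreno Taquería');
    INSERT INTO orders VALUES
        (10308, 2, '1996-09-18'),
        (10309, 37, '1996-09-19'),
        (10310, 77, '1996-09-20');
""")

# Same statement as in the lesson, plus ORDER BY for a stable row order.
inner_rows = conn.execute("""
    SELECT orders."OrderID", customers."CompanyName", orders."OrderDate"
    FROM orders
    INNER JOIN customers
    ON orders."CustomerID" = customers."CustomerID"
    ORDER BY orders."OrderID";
""").fetchall()

print(inner_rows)
# [(10308, 'Ana Trujillo Emparedados y helados', '1996-09-18')]
conn.close()
```

Only order 10308 survives: customers 37 and 77 are missing from the toy `customers` table, and customers 1 and 3 placed no orders in the toy `orders` table.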


There are several types of join operations.

- INNER JOIN: Returns only the rows that have a match in BOTH tables
- LEFT JOIN: Returns all rows from the left table, plus the matched rows from the right table (NULL where there is no match)
- RIGHT JOIN: Returns all rows from the right table, plus the matched rows from the left table (NULL where there is no match)
- FULL JOIN: Returns all rows from BOTH tables, matched where possible and filled with NULL where there is no match

(This is all rooted in [Relational Algebra](https://en.wikipedia.org/wiki/Relational_algebra). We won't go into it here, but it's an interesting topic.)



![Joins](./assets/images/joins.gif)

> Check: stop and jot! Can you annotate this diagram with examples of when you would use each? If you need a concrete starting point, imagine you have a table of Q1 orders for your company, and a second table of Q1's new sales leads.

### Left Join

The LEFT JOIN keyword returns all rows from the left table (table1), together with the matching rows from the right table (table2). When a left-hand row has no match, the right-hand columns are NULL.


#### Left Join Syntax
```sql
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name=table2.column_name;
```
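
As an illustration of the syntax, the sketch below lists every customer together with an order ID where one exists. It uses Python's built-in `sqlite3` module with made-up toy tables (an assumption for illustration, not the Northwind data).

```python
import sqlite3

# In-memory toy database (hypothetical rows, chosen to leave some customers unmatched).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" INTEGER PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" INTEGER);
    INSERT INTO customers VALUES
        (1, 'Alfreds Futterkiste'),
        (2, 'Ana Trujillo Emparedados y helados'),
        (3, 'Antonio Moreno Taquería');
    INSERT INTO orders VALUES (10308, 2), (10309, 37);
""")

# customers is the left table, so every customer appears in the result.
left_rows = conn.execute("""
    SELECT customers."CompanyName", orders."OrderID"
    FROM customers
    LEFT JOIN orders
    ON customers."CustomerID" = orders."CustomerID"
    ORDER BY customers."CustomerID";
""").fetchall()

for row in left_rows:
    print(row)
# ('Alfreds Futterkiste', None)
# ('Ana Trujillo Emparedados y helados', 10308)
# ('Antonio Moreno Taquería', None)
conn.close()
```

Python shows SQL NULL as `None`. Order 10309 does not appear at all, because its customer (37) is not in the left table.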

> **Check:** Consider the JOIN we performed between `orders` and `customers`. Which column might contain NULL values in the joined table if we performed a LEFT JOIN?

### Right Join

Similarly, the RIGHT JOIN keyword returns all rows from the right table (table2), together with the matching rows from the left table (table1). When a right-hand row has no match, the left-hand columns are NULL.

#### Right Join Syntax
```sql
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name=table2.column_name;
```
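
A RIGHT JOIN can always be rewritten as a LEFT JOIN with the two tables swapped. The sketch below checks this on made-up toy tables using Python's built-in `sqlite3` module. One assumption to flag: SQLite only supports `RIGHT JOIN` from version 3.39.0 onwards, so the direct form is run only when the installed version allows it.

```python
import sqlite3

# In-memory toy database (hypothetical rows, not the Northwind data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" INTEGER PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" INTEGER);
    INSERT INTO customers VALUES
        (1, 'Alfreds Futterkiste'),
        (2, 'Ana Trujillo Emparedados y helados'),
        (3, 'Antonio Moreno Taquería');
    INSERT INTO orders VALUES (10308, 2), (10309, 37);
""")

# customers RIGHT JOIN orders: keep every order, matched or not.
right_sql = """
    SELECT orders."OrderID", customers."CompanyName"
    FROM customers
    RIGHT JOIN orders
    ON customers."CustomerID" = orders."CustomerID"
    ORDER BY orders."OrderID";
"""

# The same query with the tables swapped and LEFT JOIN instead.
swapped_left_sql = """
    SELECT orders."OrderID", customers."CompanyName"
    FROM orders
    LEFT JOIN customers
    ON customers."CustomerID" = orders."CustomerID"
    ORDER BY orders."OrderID";
"""

all_orders = conn.execute(swapped_left_sql).fetchall()
print(all_orders)
# [(10308, 'Ana Trujillo Emparedados y helados'), (10309, None)]

# Run the direct RIGHT JOIN only where the SQLite version supports it.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    assert conn.execute(right_sql).fetchall() == all_orders
conn.close()
```

Order 10309 is kept even though its customer is unknown, and its `CompanyName` is NULL (`None` in Python).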
> **Check:** What use cases can you imagine for a RIGHT JOIN?

### Full (outer) Join

The FULL OUTER JOIN keyword returns all rows from the left table (table1) and from the right table (table2), combining the results of the LEFT and RIGHT joins. Rows without a match on the other side are filled with NULL, so NULL values can appear on both sides.

#### Full Join Syntax

```sql
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
```
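
On PostgreSQL the statement above can be used directly. The sketch below shows the same idea on made-up toy tables with Python's built-in `sqlite3` module. Because older SQLite versions lack `FULL OUTER JOIN`, it emulates one as a LEFT JOIN plus the unmatched rows of the other table; this emulation is an assumption made for portability, not part of the lesson's syntax.

```python
import sqlite3

# In-memory toy database (hypothetical rows, not the Northwind data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" INTEGER PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" INTEGER);
    INSERT INTO customers VALUES
        (1, 'Alfreds Futterkiste'),
        (2, 'Ana Trujillo Emparedados y helados'),
        (3, 'Antonio Moreno Taquería');
    INSERT INTO orders VALUES (10308, 2), (10309, 37);
""")

full_rows = conn.execute("""
    -- Part 1: every order, with its customer when one exists.
    SELECT orders."OrderID", customers."CompanyName"
    FROM orders
    LEFT JOIN customers
    ON orders."CustomerID" = customers."CustomerID"
    UNION ALL
    -- Part 2: customers that matched no order at all.
    SELECT NULL, customers."CompanyName"
    FROM customers
    LEFT JOIN orders
    ON orders."CustomerID" = customers."CustomerID"
    WHERE orders."OrderID" IS NULL;
""").fetchall()

for row in full_rows:
    print(row)
conn.close()
```

The result has four rows: one matched pair, one order with a NULL `CompanyName`, and two customers with a NULL `OrderID`, so NULL values appear on both sides.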

<a name="guided-practice"></a>
## Guided Practice: Joins (15 mins)

- How many products per category does the catalog contain? Print the answer with the `CategoryName`, and `Count`.


> Answer:
```sql
SELECT c."CategoryName", count(p."ProductID") AS "Count"
FROM products AS p
JOIN categories AS c
ON p."CategoryID" = c."CategoryID"
GROUP BY c."CategoryName"
```
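
To check the shape of this answer without connecting to Northwind, here is a small sketch using Python's built-in `sqlite3` module. The two tables are cut-down toy versions with a handful of illustrative rows (an assumption, not the full catalog), so the counts below are not the real answer.

```python
import sqlite3

# In-memory toy database with a tiny, illustrative catalog.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories ("CategoryID" INTEGER PRIMARY KEY, "CategoryName" TEXT);
    CREATE TABLE products (
        "ProductID" INTEGER PRIMARY KEY,
        "ProductName" TEXT,
        "CategoryID" INTEGER
    );
    INSERT INTO categories VALUES (1, 'Beverages'), (2, 'Condiments');
    INSERT INTO products VALUES
        (1, 'Chai', 1),
        (2, 'Chang', 1),
        (3, 'Aniseed Syrup', 2);
""")

# Join each product to its category, then count products per category name.
counts = conn.execute("""
    SELECT c."CategoryName", count(p."ProductID") AS "Count"
    FROM products AS p
    JOIN categories AS c
    ON p."CategoryID" = c."CategoryID"
    GROUP BY c."CategoryName"
    ORDER BY c."CategoryName";
""").fetchall()

print(counts)
# [('Beverages', 2), ('Condiments', 1)]
conn.close()
```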


- Which 5 customers generate the highest revenue? Print a table with `CustomerID` and `Revenue`.


> Answer:
```sql
SELECT "CustomerID",
       CAST(
       SUM("UnitPrice" *
           "Quantity" *
           (1.0 - "Discount"))
      AS numeric(36,2)) 
      AS "Revenue"
FROM orders AS o
JOIN order_details AS od
ON o."OrderID" = od."OrderID"
GROUP BY "CustomerID"
ORDER BY "Revenue" DESC
LIMIT 5
```

> Check: can you infer what CAST(...) and numeric(P,s) do?

- Which 5 suppliers supplied the most units, and in which country is each one based? Print a table with the supplier's `CompanyName`, `Country` and total number of units supplied. (This uses four tables!)

>Answer:
```sql
SELECT s."CompanyName", s."Country", sum(od."Quantity") AS "UnitsSupplied"
FROM orders AS o
JOIN order_details AS od
ON o."OrderID" = od."OrderID"
JOIN products AS p
ON od."ProductID" = p."ProductID"
JOIN suppliers AS s
ON s."SupplierID" = p."SupplierID"
GROUP BY s."SupplierID"
ORDER BY "UnitsSupplied" DESC
LIMIT 5
```
> Check: s."CompanyName" and s."Country" are [functionally dependent](https://en.wikipedia.org/wiki/Functional_dependency#Examples) on s."SupplierID". Why does that matter here? [(Answer)](http://stackoverflow.com/questions/5986127/do-all-columns-in-a-select-list-have-to-appear-in-a-group-by-clause)

<a name="demo_2"></a>
## Demo: Sub-queries (15 mins)

SQL is very versatile and it can be stretched a bit further than simple JOIN operations between two different tables.

### Subqueries

A _Subquery_ or _Inner query_ or _Nested query_ is a query within another SQL query. It is used to further restrict the data to be retrieved by returning data that will be used in the main query as a condition.

Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements, along with operators such as =, <, >, >=, <=, IN, and BETWEEN.






#### Syntax

Here is the general form of a subquery. The result of the subquery is used as a condition in the `WHERE` clause of the main query.

```sql
SELECT column_name1
    FROM table_name1
    WHERE column_name2 [Comparison Operator]
        (SELECT column_name3
         FROM table_name2
         WHERE condition);
```
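
As one possible instance of this pattern, the sketch below uses `>` as the comparison operator and a subquery that returns a single value (the average unit price). It runs on Python's built-in `sqlite3` module against a toy `order_details` table holding only the six rows shown earlier, which is an assumption for illustration. The subquery here reads the same table as the main query and has no `WHERE` clause of its own.

```python
import sqlite3

# In-memory toy table with the six order_details rows from the excerpt above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (
        "OrderID" INTEGER,
        "ProductID" INTEGER,
        "UnitPrice" REAL,
        "Quantity" INTEGER
    );
    INSERT INTO order_details VALUES
        (10248, 11, 14.0, 12),
        (10248, 42, 9.8, 10),
        (10248, 72, 34.8, 5),
        (10249, 14, 18.6, 9),
        (10249, 51, 42.4, 40),
        (10250, 41, 7.7, 10);
""")

# The inner query yields one number; the outer query compares each row to it.
above_average = conn.execute("""
    SELECT "ProductID", "UnitPrice"
    FROM order_details
    WHERE "UnitPrice" >
        (SELECT AVG("UnitPrice")
         FROM order_details)
    ORDER BY "ProductID";
""").fetchall()

print(above_average)
# [(51, 42.4), (72, 34.8)]
conn.close()
```

The average of the six prices is about 21.2, so only the two rows priced above it are returned.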




For example, let's extract all the `orders` from `customers` based in France.

```sql
SELECT "OrderID", "OrderDate" FROM orders
WHERE "CustomerID" IN
(SELECT "CustomerID"
 FROM customers
 WHERE "Country" = 'France')
```

> **Check:** How would you get the same result with a `JOIN` operation?


```sql
SELECT "OrderID", "OrderDate" FROM orders
JOIN customers
ON orders."CustomerID" = customers."CustomerID"
WHERE "Country" = 'France'
```
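
To confirm that the two formulations agree, the sketch below runs both on a toy database with Python's built-in `sqlite3` module. The rows reuse the three orders shown at the start of the lesson, and the customers' countries are an assumption for illustration taken from each order's `ShipCountry`.

```python
import sqlite3

# In-memory toy database (illustrative rows, not the full Northwind data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" TEXT PRIMARY KEY, "Country" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT, "OrderDate" TEXT);
    INSERT INTO customers VALUES
        ('VINET', 'France'),
        ('TOMSP', 'Germany'),
        ('HANAR', 'Brazil');
    INSERT INTO orders VALUES
        (10248, 'VINET', '1996-07-04'),
        (10249, 'TOMSP', '1996-07-05'),
        (10250, 'HANAR', '1996-07-08');
""")

# Version 1: filter orders with a subquery.
subquery_rows = conn.execute("""
    SELECT "OrderID", "OrderDate" FROM orders
    WHERE "CustomerID" IN
    (SELECT "CustomerID"
     FROM customers
     WHERE "Country" = 'France')
    ORDER BY "OrderID";
""").fetchall()

# Version 2: filter orders with a JOIN.
join_rows = conn.execute("""
    SELECT "OrderID", "OrderDate" FROM orders
    JOIN customers
    ON orders."CustomerID" = customers."CustomerID"
    WHERE "Country" = 'France'
    ORDER BY "OrderID";
""").fetchall()

print(subquery_rows)
# [(10248, '1996-07-04')]
print(subquery_rows == join_rows)
# True
conn.close()
```

The two versions match here because `CustomerID` is unique in `customers`. If the joined table could contain several rows per key, the `JOIN` version would repeat orders while the `IN` version would not.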

<a name="ind-practice"></a>
## Independent Practice: Other SQL Commands (25 minutes)

First, working in pairs: go to http://www.w3schools.com/sql and choose a command you have not heard of. Read about it for 5 minutes, then explain it to your partner (take 2.5 minutes each).

Next, get started on the `join` exercises at https://pgexercises.com/questions/joins/. Bookmark this -- it's a good resource for brushing up on the basic syntax before job interviews!

<a name="conclusion"></a>
## Conclusion (5 mins)

In this class we have started to discover the full power of relational databases through JOINs and sub-queries. These allow us to mix and match data from various tables in order to extract meaningful results.


### ADDITIONAL RESOURCES

- [SQL Join Documentation](http://www.w3schools.com/sql/sql_join.asp)
- [Relational Algebra](https://en.wikipedia.org/wiki/Relational_algebra)
- [Wikipedia on JOINS](https://en.wikipedia.org/wiki/Join_(SQL))
