# DAND Lesson 8 - SQL Joins

## Database Normalisation

Normalisation is about how data is stored and organised across tables. There are essentially three aspects to it:
1. Are the tables storing logical groupings of data?
2. Can I make changes in a single location, rather than in many tables for the same information?
3. Can I access and manipulate data quickly and efficiently?
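A minimal sketch of point 2, using Python's built-in `sqlite3` module and a toy in-memory database (the rows are hypothetical). The account name is stored once in `accounts`, and `orders` only holds `account_id`. Renaming the account is therefore a single-row update, and every order picks up the new name through a `JOIN` (covered in the next section).

```python
import sqlite3

# Toy in-memory database; all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart');
    INSERT INTO orders VALUES (10, 1, 100), (11, 1, 250), (12, 1, 75);
""")

# The name lives in exactly one row, so one UPDATE is enough...
conn.execute("UPDATE accounts SET name = 'Walmart Inc.' WHERE id = 1")

# ...and every order sees the new name when the tables are joined.
names = conn.execute("""
    SELECT accounts.name
    FROM orders
    JOIN accounts
    ON orders.account_id = accounts.id
""").fetchall()
print(names)  # [('Walmart Inc.',), ('Walmart Inc.',), ('Walmart Inc.',)]
```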




## Introduction to Joins - `JOIN` and `ON`

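This is similar to a VLOOKUP. Rows from two tables are matched on a common field, usually the primary key of one table and the foreign key of the other. The key words are:
- `JOIN` - names the second table to bring in alongside the table in the `FROM` clause.
- `ON` - sets the join condition, i.e. which columns must match (typically foreign key = primary key).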

*Example from orders and accounts tables if you wanted to pull all information from both tables:* 

`SELECT orders.*, accounts.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;`

If we want to pull only the account name and the dates in which that account placed an order, but none of the other columns, we can do this with the following query:

`SELECT accounts.name, orders.occurred_at
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;`
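To try the query above without the course database, here is a small sketch using Python's `sqlite3` module. The tables keep the lesson's column names, but the rows are made up. An `ORDER BY` was added only so the output order is predictable.

```python
import sqlite3

# Toy in-memory tables: column names follow the lesson, the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, occurred_at TEXT);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple');
    INSERT INTO orders VALUES
        (10, 1, '2016-10-01'), (11, 2, '2016-11-15'), (12, 1, '2016-12-24');
""")

# Only the two requested columns come back, one row per matching order.
rows = conn.execute("""
    SELECT accounts.name, orders.occurred_at
    FROM orders
    JOIN accounts
    ON orders.account_id = accounts.id
    ORDER BY orders.id
""").fetchall()
for name, occurred_at in rows:
    print(name, occurred_at)
```

Walmart appears twice because it placed two orders. The join produces one output row per matching pair of rows.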

**Note the `table.column` structure of the syntax in the SELECT statement.**

Alternatively, the query below uses `SELECT *` to pull all the columns from both the orders and accounts tables, just like `orders.*, accounts.*` above:

`SELECT *
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;`
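A quick check of that equivalence, sketched with Python's `sqlite3` on hypothetical toy rows. In a join, `SELECT *` returns the columns of each table in the order the tables are listed. The result is therefore the same as writing `orders.*, accounts.*`.

```python
import sqlite3

# Toy in-memory tables with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 250);
""")

star = conn.execute("""
    SELECT *
    FROM orders
    JOIN accounts
    ON orders.account_id = accounts.id
    ORDER BY orders.id
""").fetchall()
both = conn.execute("""
    SELECT orders.*, accounts.*
    FROM orders
    JOIN accounts
    ON orders.account_id = accounts.id
    ORDER BY orders.id
""").fetchall()

# Each row is the orders columns followed by the accounts columns.
print(star[0])       # (10, 1, 100, 1, 'Walmart')
print(star == both)  # True
```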


*Example:* Try pulling standard_qty, gloss_qty, and poster_qty from the orders table, and the website and the primary_poc from the accounts table.

`SELECT orders.standard_qty, orders.gloss_qty, 
       orders.poster_qty,  accounts.website, 
       accounts.primary_poc
FROM orders
JOIN accounts
ON orders.account_id = accounts.id`



### Primary Key

A primary key is a column that has a unique value for every row. By convention it is usually the first column in a table (often called `id`).

### Foreign Key

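A foreign key is a column in one table that refers to the primary key of another table. For example, `orders.account_id` is a foreign key that points to `accounts.id`.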
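As a sketch of what these keys enforce, the example below declares them explicitly in SQLite through Python's `sqlite3` module. The toy tables and values are hypothetical, and SQLite only checks foreign keys once `PRAGMA foreign_keys = ON` is set. A duplicate primary key and an order pointing at a non-existent account are both rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,                      -- primary key: unique per row
        name TEXT
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts(id)   -- foreign key to accounts.id
    );
    INSERT INTO accounts VALUES (1, 'Walmart');
    INSERT INTO orders VALUES (10, 1);
""")

def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert("INSERT INTO accounts VALUES (1, 'Apple')"),  # duplicate primary key
    try_insert("INSERT INTO orders VALUES (11, 99)"),        # no account with id 99
    try_insert("INSERT INTO orders VALUES (12, 1)"),         # valid foreign key
]
for result in results:
    print(result)
```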

### Entity Relationship Diagram (ERD)

You should create an Entity Relationship Diagram (ERD) to map the primary and foreign key relationships between the tables:


<img src="../SQL/ERD DAND.jpg" width="600" height="400">









    

### Primary Key - Foreign Key Link

<img src="../SQL/DAND_PK_FK_Example.png" width="600" height="400">

In the above image you can see that:

1. The **region_id** column in the **sales_reps** table is the foreign key.
2. The region_id is **linked** to **id**, the primary key of the **region** table. This is the primary-foreign key link that connects the two tables.
3. The crow's foot shows that the same **FK** value can appear in many rows of the **sales_reps** table.
4. The single line on the other end shows that the **PK** (id) appears only once in its table, so each id identifies exactly one row.
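The one-to-many link can be seen in a joined result. Below is a small sketch with Python's `sqlite3`, using hypothetical region and rep names. Region 1 appears once in `region` but twice in `sales_reps.region_id`, so its name repeats in the output.

```python
import sqlite3

# Toy tables: region names and rep names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    INSERT INTO region VALUES (1, 'Northeast'), (2, 'West');
    -- region_id 1 is used by two reps: the "many" side of the relationship.
    INSERT INTO sales_reps VALUES
        (101, 'Rep A', 1), (102, 'Rep B', 1), (103, 'Rep C', 2);
""")

rows = conn.execute("""
    SELECT region.name, sales_reps.name
    FROM sales_reps
    JOIN region
    ON sales_reps.region_id = region.id
    ORDER BY sales_reps.id
""").fetchall()
print(rows)  # [('Northeast', 'Rep A'), ('Northeast', 'Rep B'), ('West', 'Rep C')]
```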





### `JOIN` More than Two Tables (Multiple Joins)

<img src="../SQL/Join_three_tables.png" width="600" height="400">

The same logic applies for joining multiple tables with specific columns in each:

`SELECT web_events.channel, accounts.name, orders.total
FROM web_events
JOIN accounts
ON web_events.account_id = accounts.id
JOIN orders
ON accounts.id = orders.account_id`

Note that the order of the two columns in an `ON` statement does not matter. For the last line you could equally write `ON orders.account_id = accounts.id`.
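Here is a sketch that runs the three-table query twice on hypothetical toy data with Python's `sqlite3`, once with each side of the last `ON` swapped. Both runs return the same rows. The f-string only splices in one of two fixed conditions written in the script, never user input.

```python
import sqlite3

# Toy tables with one order and one web event per account (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    CREATE TABLE web_events (id INTEGER PRIMARY KEY, account_id INTEGER, channel TEXT);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 250);
    INSERT INTO web_events VALUES (20, 1, 'direct'), (21, 2, 'facebook');
""")

def run(last_on):
    """Run the three-table join with the given (fixed, trusted) final ON condition."""
    return conn.execute(f"""
        SELECT web_events.channel, accounts.name, orders.total
        FROM web_events
        JOIN accounts
        ON web_events.account_id = accounts.id
        JOIN orders
        ON {last_on}
        ORDER BY web_events.id, orders.id
    """).fetchall()

forward = run("accounts.id = orders.account_id")
swapped = run("orders.account_id = accounts.id")
print(forward)             # [('direct', 'Walmart', 100), ('facebook', 'Apple', 250)]
print(forward == swapped)  # True
```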

## Using an Alias in `JOIN`

Aliases allow you to rename tables so you don't have to keep typing out the whole name. You can then conveniently refer to each table by its alias:

`SELECT o.*, a.*
FROM orders o   /* o is the alias for orders */
JOIN accounts a /* a is the alias for accounts */
ON o.account_id = a.id`
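To confirm that an alias is only a shorter name, this sketch runs the same join with and without aliases on hypothetical toy rows, using Python's `sqlite3`. The results match.

```python
import sqlite3

# Toy tables with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 250), (12, 1, 75);
""")

full_names = conn.execute("""
    SELECT orders.total, accounts.name
    FROM orders
    JOIN accounts
    ON orders.account_id = accounts.id
    ORDER BY orders.id
""").fetchall()

# Same query, with o and a as aliases for orders and accounts.
aliased = conn.execute("""
    SELECT o.total, a.name
    FROM orders o
    JOIN accounts a
    ON o.account_id = a.id
    ORDER BY o.id
""").fetchall()

print(aliased)                # [(100, 'Walmart'), (250, 'Apple'), (75, 'Walmart')]
print(aliased == full_names)  # True
```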



## Using an Alias with Arithmetic Operators (with and without `AS`)

Example:

`SELECT col1 + col2 AS total, col3
FROM tablename AS t1
JOIN tablename2 AS t2
ON t1.id = t2.t1_id`

You can also write the above without the `AS` keyword, which is frequently done in practice:

`SELECT col1 + col2 total, col3
FROM tablename t1
JOIN tablename2 t2
ON t1.id = t2.t1_id`
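Below is a sketch of a derived column with and without `AS`, run through Python's `sqlite3` on a toy `orders` table with hypothetical quantities. `cursor.description` lists the result column names, and both forms name the sum `total`.

```python
import sqlite3

# Toy orders table (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, standard_qty INTEGER, gloss_qty INTEGER);
    INSERT INTO orders VALUES (10, 5, 3), (11, 2, 0);
""")

with_as = conn.execute("""
    SELECT o.id, o.standard_qty + o.gloss_qty AS total
    FROM orders AS o
    ORDER BY o.id
""")
without_as = conn.execute("""
    SELECT o.id, o.standard_qty + o.gloss_qty total
    FROM orders o
    ORDER BY o.id
""")

# description has one entry per result column; entry[0] is the column name.
name_with_as = with_as.description[1][0]
name_without_as = without_as.description[1][0]
rows_with_as = with_as.fetchall()
rows_without_as = without_as.fetchall()

print(name_with_as, name_without_as)     # total total
print(rows_with_as)                      # [(10, 8), (11, 2)]
print(rows_with_as == rows_without_as)   # True
```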




## Using an Alias for Columns in Resulting Table

While aliasing tables is the most common use case, you can also alias the selected columns so that the resulting table shows more readable names.

Example:

`SELECT t1.column1 aliasname, t2.column2 aliasname2
FROM tablename AS t1
JOIN tablename2 AS t2
ON t1.id = t2.t1_id`

The alias names will be what shows up in the returned table instead of `t1.column1` and `t2.column2`.

<img src="../SQL/alias-rename_column_names.png" width="300" height="200">
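Column aliases matter most when two joined tables use the same column name. In this sketch with Python's `sqlite3` and hypothetical rows, both `region` and `sales_reps` have a `name` column. The aliases give the output distinct headers.

```python
import sqlite3

# Toy tables with hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    INSERT INTO region VALUES (1, 'Northeast'), (2, 'West');
    INSERT INTO sales_reps VALUES (101, 'Rep A', 1), (102, 'Rep B', 2);
""")

# Both source columns are called "name"; the aliases tell them apart.
cur = conn.execute("""
    SELECT r.name region_name, s.name sales_rep_name
    FROM region AS r
    JOIN sales_reps AS s
    ON r.id = s.region_id
    ORDER BY s.id
""")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)  # ['region_name', 'sales_rep_name']
print(rows)          # [('Northeast', 'Rep A'), ('West', 'Rep B')]
```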



## `JOIN` Examples with `WHERE`, `ORDER BY`, `AS`, Derived Columns, Multiple Tables, and Aliases

<img src="../SQL/ERD DAND.jpg" width="600" height="400">

*Example:* Provide a table for all web_events associated with the account name Walmart. There should be three columns. Be sure to include the primary_poc, time of the event, and the channel for each event. Additionally, you might choose to add a fourth column to ensure only Walmart events were chosen. 

`SELECT a.primary_poc, w.occurred_at, w.channel, a.name
FROM web_events w
JOIN accounts a
ON w.account_id = a.id
WHERE a.name = 'Walmart';`

*Example:* Provide a table that provides the region for each sales_rep along with their associated accounts. Your final table should include three columns: the region name, the sales rep name, and the account name. Sort the accounts alphabetically (A-Z) according to account name. 

`SELECT r.name region_name, s.name sales_rep_name, a.name account_name
FROM region r
JOIN sales_reps s
ON r.id = s.region_id
JOIN accounts a
ON s.id = a.sales_rep_id
ORDER BY a.name ASC`


*Example:* Provide the name for each region for every order, as well as the account name and the unit price they paid (total_amt_usd/total) for the order. Your final table should have 3 columns: region name, account name, and unit price. A few accounts have 0 for total, so I divided by (total + 0.01) to ensure we never divide by zero.

`SELECT r.name region, a.name account, o.total_amt_usd/(o.total+0.01) unit_price  
FROM accounts a
JOIN sales_reps s
ON s.id = a.sales_rep_id
JOIN orders o
ON a.id = o.account_id
JOIN region r
ON r.id = s.region_id`

## Types of Joins


<img src="../SQL/types_of_joins.png" width="600" height="400">

### `INNER JOIN`

In a Venn diagram, an inner join is the area where the two tables intersect. It returns only the rows that exist in both tables (i.e. a match) based on the join condition, usually PK = FK. All of the joins explained above are inner joins. Writing `JOIN` on its own is the same as writing `INNER JOIN`.
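A small sketch of an inner join dropping unmatched rows, using Python's `sqlite3` and hypothetical toy data. The account Exxon has no orders, so it does not appear. The sketch also checks that `JOIN` and `INNER JOIN` give the same result.

```python
import sqlite3

# Toy tables with hypothetical rows; account 3 has never placed an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple'), (3, 'Exxon');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 250);
""")

plain = conn.execute("""
    SELECT a.name, o.total
    FROM accounts a
    JOIN orders o
    ON o.account_id = a.id
    ORDER BY a.id
""").fetchall()
inner = conn.execute("""
    SELECT a.name, o.total
    FROM accounts a
    INNER JOIN orders o
    ON o.account_id = a.id
    ORDER BY a.id
""").fetchall()

print(plain)           # [('Walmart', 100), ('Apple', 250)]  (no row for Exxon)
print(plain == inner)  # True
```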


### `LEFT JOIN`

<img src="../SQL/Left_join.png" width="600" height="400">

The table written in the `FROM` clause is considered the left table, and the table in the `JOIN` clause is the right table. Adding the word `LEFT` before `JOIN` tells the database to keep all rows from the LEFT table and include any matching data from the RIGHT table.

If a row in the left table has no match in the `JOIN`ed table, the columns coming from the right table are empty for that row. These empty cells are `NULL`, which marks a missing value (it is not a separate data type).
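Here is the same toy data as an inner join, but with `LEFT JOIN`, sketched with Python's `sqlite3` (the rows are hypothetical). Exxon is now kept, and its missing order total comes back as `NULL`, which Python shows as `None`.

```python
import sqlite3

# Toy tables with hypothetical rows; account 3 has never placed an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple'), (3, 'Exxon');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 250);
""")

# accounts is the left table, so every account is kept.
rows = conn.execute("""
    SELECT a.name, o.total
    FROM accounts a
    LEFT JOIN orders o
    ON o.account_id = a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [('Walmart', 100), ('Apple', 250), ('Exxon', None)]
```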

`LEFT OUTER JOIN` and `RIGHT OUTER JOIN` are exactly the same commands as `LEFT JOIN` and `RIGHT JOIN`; the word `OUTER` is optional.

**RIGHT JOIN** - This is the opposite of a `LEFT JOIN`: it keeps all rows from the right table. Since the two are interchangeable (swapping the order of the tables turns a `RIGHT JOIN` into a `LEFT JOIN`), you only need to worry about writing `LEFT JOIN`s.

**FULL OUTER JOIN** - Returns all data from both tables, both matching and non-matching rows. This can also be written as `FULL JOIN`, since the word `OUTER` is optional.

## `LEFT JOIN`s and Filtering Using `WHERE` and `AND`

A simple rule to remember: when the database executes a `JOIN` query, it performs the join, including everything in the `ON` clause, first. Think of this as building the new result set. That result set is then filtered using the `WHERE` clause.

<img src="../SQL/Left_join_with_WHERE.png" width="800" height="500">

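The fact that the above example is a left join is important. Because inner joins only return the rows for which the two tables match, moving this filter to the `ON` clause of an inner join produces the same result as keeping it in the `WHERE` clause. On a `LEFT JOIN`, however, a filter added to the `ON` clause with `AND` is applied while the tables are being matched, not after the `JOIN`. It only limits which rows of the right table can match, so every row of the left table is still returned (with `NULL`s where no right-table row passed the filter). The same condition in the `WHERE` clause is applied to the joined result and removes those rows instead.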

<img src="../SQL/Left_join_with_AND.png" width="800" height="500">
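This sketch contrasts the two placements of the same filter, using Python's `sqlite3` with hypothetical toy rows. On the `LEFT JOIN`, the `WHERE` version drops accounts whose orders fail the filter. The `AND` version keeps every account and fills in `None`. On an inner join, both placements return the same rows.

```python
import sqlite3

# Toy tables with hypothetical rows; account 3 has never placed an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, account_id INTEGER, total INTEGER);
    INSERT INTO accounts VALUES (1, 'Walmart'), (2, 'Apple'), (3, 'Exxon');
    INSERT INTO orders VALUES (10, 1, 100), (11, 2, 250);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# LEFT JOIN, filter in WHERE: applied after the join, so unmatched rows go too.
in_where = run("""
    SELECT a.name, o.total FROM accounts a
    LEFT JOIN orders o ON o.account_id = a.id
    WHERE o.total > 150
    ORDER BY a.id
""")
# LEFT JOIN, filter in ON: limits which orders can match; all accounts stay.
in_on = run("""
    SELECT a.name, o.total FROM accounts a
    LEFT JOIN orders o ON o.account_id = a.id AND o.total > 150
    ORDER BY a.id
""")
# INNER JOIN: both placements give the same answer.
inner_where = run("""
    SELECT a.name, o.total FROM accounts a
    JOIN orders o ON o.account_id = a.id
    WHERE o.total > 150
    ORDER BY a.id
""")
inner_on = run("""
    SELECT a.name, o.total FROM accounts a
    JOIN orders o ON o.account_id = a.id AND o.total > 150
    ORDER BY a.id
""")

print(in_where)               # [('Apple', 250)]
print(in_on)                  # [('Walmart', None), ('Apple', 250), ('Exxon', None)]
print(inner_where == inner_on)  # True
```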