# SQL Joins

In this webinar we will review SQL joins: the different types of joins, how they work in theory, and some exercises to practise using them. The joins we will look at are:

- INNER JOIN
- LEFT JOIN
- RIGHT JOIN (brief)
- FULL OUTER JOIN
- CROSS JOIN

## What are 'joins'?

SQL joins combine rows from two or more tables based on a common field between them. Say we need information from two different tables and want to return it in one output from a single query. We need to think about how these two tables are linked so that we can display the information we need.

Tables are combined using primary and foreign keys of the tables. Reminder:
- A primary key is a column or group of columns in a table that uniquely identifies each row in that table, e.g. a student number or ID number.
- A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It is a column (or columns) that references a column (most often the primary key) of another table.
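
To make the two definitions concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `students` and `enrolments` tables and their contents are hypothetical, invented only for this illustration. Note that SQLite enforces foreign keys only when `PRAGMA foreign_keys = ON` has been set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical tables, for illustration only.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,   -- primary key: unique per row
        name       TEXT NOT NULL
    );
    CREATE TABLE enrolments (
        enrolment_id INTEGER PRIMARY KEY,
        student_id   INTEGER NOT NULL REFERENCES students(student_id),  -- foreign key
        course       TEXT NOT NULL
    );
    INSERT INTO students VALUES (1, 'Alice');
    INSERT INTO enrolments VALUES (10, 1, 'SQL');
""")


def try_insert(sql):
    """Run an INSERT and report whether a key constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# A second student with student_id 1 violates the primary key.
duplicate_key = try_insert("INSERT INTO students VALUES (1, 'Bob')")
# An enrolment for a student that does not exist violates the foreign key.
orphan_row = try_insert("INSERT INTO enrolments VALUES (11, 99, 'SQL')")

print(duplicate_key)
print(orphan_row)
```

Both inserts should be rejected: the first because the primary key value already exists, the second because the foreign key points at no row in `students`.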

The general syntax of a JOIN statement is as follows:

```sql
SELECT column(s)
FROM table1
<join_type> JOIN table2
ON table1.key = table2.key
```

## Inner Join

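An inner join is conceptually similar to taking the intersection of two tables (not to be confused with SQL's `INTERSECT` operator, which compares whole rows). We return only the rows that have matching values along a particular column in BOTH of the tables we are referencing: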

![](https://i.ibb.co/kJPyJh8/Inner.png)

```sql
SELECT Names.id, Names.name, Streams.stream
FROM Names
INNER JOIN Streams
ON Names.id = Streams.id;
```
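
To see the behaviour on actual rows, the sketch below runs the same inner join from Python against an in-memory SQLite database. The `Names` and `Streams` table names follow the query above, but the rows are made up for illustration: ids 2 and 3 appear in both tables, id 1 only in `Names`, and id 4 only in `Streams`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are hypothetical.
    CREATE TABLE Names   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Streams (id INTEGER PRIMARY KEY, stream TEXT);
    INSERT INTO Names   VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Streams VALUES (2, 'Data Science'), (3, 'Data Engineering'), (4, 'Analytics');
""")

inner_rows = conn.execute("""
    SELECT Names.id, Names.name, Streams.stream
    FROM Names
    INNER JOIN Streams
        ON Names.id = Streams.id
    ORDER BY Names.id
""").fetchall()

# Only ids present in BOTH tables survive the join.
for row in inner_rows:
    print(row)
```

Alice (id 1) and the Analytics stream (id 4) have no partner in the other table, so neither appears in the result.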

## Left Join

When joining two tables, a LEFT JOIN returns all records from the left table and matched records from the right table.

If no match is found in the right table, then the result from the right table is NULL on that row.


![](https://i.ibb.co/9tK7ZwH/SQL-Joins-Left-Outer-Join.png)

```sql
SELECT Names.id, Names.name, Streams.stream
FROM Names
LEFT JOIN Streams
ON Names.id = Streams.id;
```
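
The NULL behaviour is easiest to see on a small example. The following sketch uses Python's `sqlite3` module with made-up rows in `Names` and `Streams`; in Python, a SQL NULL comes back as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are hypothetical.
    CREATE TABLE Names   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Streams (id INTEGER PRIMARY KEY, stream TEXT);
    INSERT INTO Names   VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Streams VALUES (2, 'Data Science'), (3, 'Data Engineering'), (4, 'Analytics');
""")

left_rows = conn.execute("""
    SELECT Names.id, Names.name, Streams.stream
    FROM Names
    LEFT JOIN Streams
        ON Names.id = Streams.id
    ORDER BY Names.id
""").fetchall()

# Every row of Names is kept; Alice has no stream, so her stream is None (NULL).
for row in left_rows:
    print(row)
```

All three rows of the left table are returned, while the unmatched Analytics stream from the right table is dropped.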

## Right Join

`RIGHT JOIN` is only supported in SQLite from version 3.39.0 onwards, so it may not be available in older environments. A right join works in the opposite way to a left join: it returns all records from the right table and the matched records from the left table. If no match is found in the left table, then the result from the left table is NULL on that row.

We can easily rearrange a right join to use a `LEFT JOIN` statement instead, simply by swapping where we place each of the tables in our query.
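
As a sketch of that rearrangement, the query below gets the result of `Names RIGHT JOIN Streams` by writing `Streams LEFT JOIN Names` instead, which works on any SQLite version. The rows are hypothetical, and `Streams.id` is selected rather than `Names.id` so that the unmatched stream still shows its id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are hypothetical.
    CREATE TABLE Names   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Streams (id INTEGER PRIMARY KEY, stream TEXT);
    INSERT INTO Names   VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Streams VALUES (2, 'Data Science'), (3, 'Data Engineering'), (4, 'Analytics');
""")

# Equivalent to: FROM Names RIGHT JOIN Streams ON Names.id = Streams.id
right_rows = conn.execute("""
    SELECT Streams.id, Names.name, Streams.stream
    FROM Streams
    LEFT JOIN Names
        ON Names.id = Streams.id
    ORDER BY Streams.id
""").fetchall()

# Every row of Streams is kept; Analytics has no matching name, so name is None.
for row in right_rows:
    print(row)
```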

## Full Outer Join (also 'Full Join')

`FULL JOIN` is also only supported in SQLite from version 3.39.0 onwards. Conceptually, it resembles the union of our tables: a full join returns all the rows from both tables, with NULL values where a row has no match in the other table. On older versions, we can imitate a full join by using `LEFT JOIN`s and the `UNION ALL` operator.

![](https://i.ibb.co/F89sQpL/SQL-Joins-Full-Outer-Join.png)

```sql
SELECT Names.id, Names.name, Streams.stream
FROM Names
FULL OUTER JOIN Streams
ON Names.id = Streams.id;
```

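Using the `UNION ALL` operator (the `WHERE` clause in the second query keeps only the rows without a match in Names, so that matched rows are not returned twice):


```sql
SELECT Names.id, Names.name, Streams.stream
FROM Names
LEFT JOIN Streams
    ON Names.id = Streams.id

UNION ALL

SELECT Names.id, Names.name, Streams.stream
FROM Streams 
LEFT JOIN Names
    ON Names.id = Streams.id
WHERE Names.id IS NULL;
```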
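
One detail to watch when emulating a full join: rows that match in both tables are produced by both halves of a plain `UNION ALL`. The sketch below, using made-up rows, compares the plain version with one whose second half is filtered with `WHERE Names.id IS NULL`, so that it contributes only the rows the first half cannot produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are hypothetical.
    CREATE TABLE Names   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Streams (id INTEGER PRIMARY KEY, stream TEXT);
    INSERT INTO Names   VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Streams VALUES (2, 'Data Science'), (3, 'Data Engineering'), (4, 'Analytics');
""")

FIRST_HALF = """
    SELECT Names.id, Names.name, Streams.stream
    FROM Names
    LEFT JOIN Streams ON Names.id = Streams.id
"""
SECOND_HALF = """
    SELECT Names.id, Names.name, Streams.stream
    FROM Streams
    LEFT JOIN Names ON Names.id = Streams.id
"""

# Plain UNION ALL: Bob and Carol match in both halves, so they appear twice.
naive_rows = conn.execute(FIRST_HALF + " UNION ALL " + SECOND_HALF).fetchall()

# Filtered second half: only streams with no matching name are added.
full_rows = conn.execute(
    FIRST_HALF + " UNION ALL " + SECOND_HALF + " WHERE Names.id IS NULL"
).fetchall()

print(len(naive_rows), "rows without the filter")
print(len(full_rows), "rows with the filter")
```

Replacing `UNION ALL` with `UNION` would also remove the repeats, at the cost of de-duplicating the whole result, which can hide genuinely duplicated rows.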

## Cross Join

The `CROSS JOIN` of two tables is their Cartesian product: the result contains every possible pairing of a row from the first table with a row from the second. Because every row is paired with every other row, a cross join takes no `ON` condition.

![](https://i.ibb.co/BPhDn1H/SQL-Joins-Cross-Join.png)

```sql
SELECT Names.id, Names.name, Streams.stream
FROM Names
CROSS JOIN Streams;
```
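
A quick way to check the "every pairing" claim is to count rows: a cross join of a table with n rows and a table with m rows yields n × m rows. The sketch below does this with hypothetical sample rows (3 names and 3 streams).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample rows are hypothetical.
    CREATE TABLE Names   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Streams (id INTEGER PRIMARY KEY, stream TEXT);
    INSERT INTO Names   VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Streams VALUES (2, 'Data Science'), (3, 'Data Engineering'), (4, 'Analytics');
""")

cross_rows = conn.execute("""
    SELECT Names.name, Streams.stream
    FROM Names
    CROSS JOIN Streams
""").fetchall()

n_names = conn.execute("SELECT COUNT(*) FROM Names").fetchone()[0]
n_streams = conn.execute("SELECT COUNT(*) FROM Streams").fetchone()[0]

# Each name is paired with each stream, regardless of the id values.
print(len(cross_rows), "=", n_names, "x", n_streams)
```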

---

### How to build up a query

##### Step 1 
- Create a basic query
    - SELECT * FROM table_name LIMIT 30

##### Step 2
- Does the current table have all the info we need to answer our question?
    - Yes -> then proceed to Step 4
    - No -> then proceed to Step 3
    
##### Step 3
- Let's get the information we need. How do we do this?
    - JOIN
    - INNER JOIN
    - LEFT JOIN
    - OUTER JOIN
- What table has the information we want? 
- Go back to step 2
    
##### Step 4 
- Do we need to filter our data in some way to answer our question? Do we need to isolate specific data from the rest of the data?
    - WHERE -> Done before data is aggregated or grouped
    - HAVING -> Used after a GROUP BY clause
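
The difference between the two filters can be sketched on a small example. The `sales` table and its rows below are hypothetical: `WHERE` first discards the non-Brazil rows, the remaining rows are grouped per agent, and `HAVING` then discards the groups whose total is too small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows, for illustration only.
    CREATE TABLE sales (agent TEXT, country TEXT, total INTEGER);
    INSERT INTO sales VALUES
        ('Jane', 'Brazil', 10), ('Jane', 'Brazil', 5), ('Jane', 'USA', 20),
        ('Steve', 'Brazil', 3), ('Steve', 'USA', 7),
        ('Margaret', 'Brazil', 30);
""")

filtered = conn.execute("""
    SELECT agent, SUM(total) AS brazil_total
    FROM sales
    WHERE country = 'Brazil'      -- row filter: applied before grouping
    GROUP BY agent
    HAVING SUM(total) > 10        -- group filter: applied after aggregation
    ORDER BY agent
""").fetchall()

for row in filtered:
    print(row)
```

Steve is removed by `HAVING` (his Brazil total is only 3), while Jane's USA sale never reaches the aggregation because `WHERE` removed it first.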

##### Step 5 
- Do we need to know certain properties of our data? How many data points meet our condition? What is the average value of the data points?
    - COUNT -> `COUNT(*)` counts all the rows; `COUNT(column)` counts the non-NULL values in a column
    - COUNT(DISTINCT column) -> counts the unique non-NULL entries of a column (no duplicates)
    - MAX -> returns the largest value in a specified column
    - MIN -> returns the lowest value in a specified column
    - "Basic Maths Operations" -> + ; - ; * ; /
    - SUM -> adds up all the values in a numeric column, excluding NULLs
    - AVG -> takes the sum of all the values divided by the number of values, excluding NULLs
    - GROUP BY -> groups rows that share the same value(s) in the given column(s), so that an aggregation is calculated once per group
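
How the aggregations treat NULLs is easier to see with numbers. The sketch below uses a hypothetical `scores` table in which one score is missing, first aggregating over the whole table and then once per student with `GROUP BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows; Bob's score is missing (NULL).
    CREATE TABLE scores (student TEXT, score INTEGER);
    INSERT INTO scores VALUES
        ('Alice', 80), ('Bob', NULL), ('Carol', 60), ('Alice', 100);
""")

# Whole-table aggregates: NULL is skipped by everything except COUNT(*).
summary = conn.execute("""
    SELECT COUNT(*),                 -- all rows
           COUNT(score),             -- non-NULL scores only
           COUNT(DISTINCT student),  -- unique students
           SUM(score),
           AVG(score),               -- 240 / 3, not 240 / 4
           MIN(score),
           MAX(score)
    FROM scores
""").fetchone()

# GROUP BY: one result row, and one average, per student.
per_student = conn.execute("""
    SELECT student, AVG(score)
    FROM scores
    GROUP BY student
    ORDER BY student
""").fetchall()

print(summary)
print(per_student)
```

Note that Bob's average is NULL (`None` in Python) rather than 0, because his group contains no non-NULL scores.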
    
##### Step 6 
- Finally, we may need to organise our results to find the answer we want:
    - ORDER BY -> sorts the results in ascending (ASC) or descending (DESC) order (NOTE: ascending is the default)
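
A short sketch of the sort directions, using a hypothetical `totals` table: leaving out the direction gives the same result as writing `ASC` explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows, for illustration only.
    CREATE TABLE totals (agent TEXT, total INTEGER);
    INSERT INTO totals VALUES ('a', 2), ('b', 3), ('c', 1);
""")


def agents_ordered_by(order_clause):
    """Return the agent names sorted by the given ORDER BY clause."""
    query = f"SELECT agent FROM totals ORDER BY {order_clause}"
    return [row[0] for row in conn.execute(query)]


default_order = agents_ordered_by("total")      # no direction given
ascending = agents_ordered_by("total ASC")
descending = agents_ordered_by("total DESC")

print(default_order, ascending, descending)
```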
    
    

```sql
-- Step 1

SELECT * 
FROM table_name_1
LIMIT 30; 

-- Step 3

SELECT * 
FROM table_name_1 AS t1
JOIN table_name_2 AS t2
    ON t1.common_column = t2.common_column

-- We can keep joining until we have all our data

JOIN table_name_3 AS t3
    ON t3.common_column = t2.common_column;

-- Step 4
SELECT * 
FROM table_name_1 AS t1
JOIN table_name_2 AS t2
    ON t1.common_column = t2.common_column
WHERE t1.column_name = 'Something' AND t2.column_name = 'Something else';

-- Step 5
-- Some_Aggregation is a placeholder for COUNT, SUM, AVG, MIN or MAX
SELECT t1.column_name, Some_Aggregation(t2.column_name) 
FROM table_name_1 AS t1
JOIN table_name_2 AS t2
    ON t1.common_column = t2.common_column
WHERE t1.column_name = 'Something'
GROUP BY t1.column_name;

-- Step 6
SELECT t1.column_name, Some_Aggregation(t2.column_name) 
FROM table_name_1 AS t1
JOIN table_name_2 AS t2
    ON t1.common_column = t2.common_column
WHERE t1.column_name = 'Something'
GROUP BY t1.column_name
ORDER BY t1.column_name;
```
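
The template above can be tried end to end on a toy database. In the sketch below, the three tables and all their rows are hypothetical (the column names are only loosely modelled on a sales schema): we join until we have all the data, filter with `WHERE`, aggregate per agent, and finally sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and rows, for illustration only.
    CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, SupportRepId INTEGER, Country TEXT);
    CREATE TABLE invoices  (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total INTEGER);
    INSERT INTO employees VALUES (1, 'Jane'), (2, 'Steve');
    INSERT INTO customers VALUES (1, 1, 'Brazil'), (2, 2, 'Brazil'), (3, 1, 'Canada');
    INSERT INTO invoices  VALUES (1, 1, 10), (2, 1, 5), (3, 2, 8), (4, 3, 100);
""")

report = conn.execute("""
    SELECT e.FirstName,
           COUNT(i.InvoiceId) AS InvoiceCount,   -- Step 5: aggregate
           SUM(i.Total)       AS TotalSales
    FROM employees AS e
    JOIN customers AS c                          -- Step 3: join ...
        ON c.SupportRepId = e.EmployeeId
    JOIN invoices AS i                           -- ... and keep joining
        ON i.CustomerId = c.CustomerId
    WHERE c.Country = 'Brazil'                   -- Step 4: filter rows
    GROUP BY e.EmployeeId, e.FirstName
    ORDER BY TotalSales DESC                     -- Step 6: sort
""").fetchall()

for row in report:
    print(row)
```

The large Canadian invoice is excluded by the `WHERE` clause before any totals are calculated, so it does not count towards Jane's sales.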

---

## Let's try some exercises

Load in our extension first:

In [2]:
%load_ext sql

Load in our data - we'll be using the chinook.db database

In [3]:
%%sql
sqlite:///chinook.db

Chinook database ER diagram:

<img src="https://github.com/Explore-AI/Pictures/blob/master/sqlite-sample-database-color.jpg?raw=true" width=70%/>

_[Image source](https://www.sqlitetutorial.net/sqlite-sample-database/)_

In [4]:
%%sql
SELECT name FROM sqlite_schema WHERE type='table' ORDER BY name

 * sqlite:///chinook.db
Done.


name
albums
artists
customers
employees
genres
invoice_items
invoices
media_types
playlist_track
playlists


---

1. Display the FirstName, LastName, InvoiceId, InvoiceDate and Country columns for all customers from Brazil.

**Let's think about the steps from before**

- Create a basic query

- Does the current table have all the info we need to answer our question?
    - Yes -> then proceed to Step 4
    - No -> then proceed to Step 3
    
- How do we get the information we need and from where?
    
- Do we need to filter our data in some way to answer our question? Do we need to isolate specific data from the rest of the data?

- Do we need to know certain properties of our data? How many data points meet our condition? What is the average value of the data points?
    
- Do we need to organise our results to find the answer we want?

In [5]:
%%sql
SELECT 
    c.FirstName, 
    c.LastName, 
    i.InvoiceId, 
    i.InvoiceDate, 
    c.Country 
FROM 
    Invoices i
INNER JOIN Customers c 
    ON c.CustomerId = i.CustomerId 
WHERE c.Country = 'Brazil'

 * sqlite:///chinook.db
Done.


FirstName,LastName,InvoiceId,InvoiceDate,Country
Luís,Gonçalves,98,2010-03-11 00:00:00,Brazil
Luís,Gonçalves,121,2010-06-13 00:00:00,Brazil
Luís,Gonçalves,143,2010-09-15 00:00:00,Brazil
Luís,Gonçalves,195,2011-05-06 00:00:00,Brazil
Luís,Gonçalves,316,2012-10-27 00:00:00,Brazil
Luís,Gonçalves,327,2012-12-07 00:00:00,Brazil
Luís,Gonçalves,382,2013-08-07 00:00:00,Brazil
Eduardo,Martins,25,2009-04-09 00:00:00,Brazil
Eduardo,Martins,154,2010-11-14 00:00:00,Brazil
Eduardo,Martins,177,2011-02-16 00:00:00,Brazil


---

2. Let's suppose that, as part of a new business strategy, Chinook wants to develop new product categories for their media items that are based on genre and media type. To do this, we write a query that will list all possible product categories **(i.e. all possible genre and media type combinations)**.

In [17]:
%%sql 
SELECT 
    g.Name AS "Genre", 
    m.Name AS "Media Type"
FROM 
    Genres AS g
CROSS JOIN Media_types AS m
LIMIT 30;

 * sqlite:///chinook.db
Done.


Genre,Media Type
Rock,MPEG audio file
Rock,Protected AAC audio file
Rock,Protected MPEG-4 video file
Rock,Purchased AAC audio file
Rock,AAC audio file
Jazz,MPEG audio file
Jazz,Protected AAC audio file
Jazz,Protected MPEG-4 video file
Jazz,Purchased AAC audio file
Jazz,AAC audio file


---

3. How many customers are assigned to the Sales Agent with the first name 'Jane'?

In [13]:
%%sql
SELECT 
    e.FirstName, 
    COUNT(c.SupportRepId) AS NumberOfCustomers
FROM 
    Employees e
INNER JOIN Customers c
    ON c.SupportRepId = e.EmployeeId 
WHERE e.FirstName = 'Jane';

 * sqlite:///chinook.db
Done.


FirstName,NumberOfCustomers
Jane,21


---

4. How many employees did not assist customers when they made their purchase?

In [None]:
%%sql
SELECT *
FROM employees
LIMIT 3

In [None]:
%%sql
SELECT *
FROM customers
LIMIT 3

In [11]:
%%sql
SELECT 
    COUNT(e.FirstName),  
    c.SupportRepId AS CustomerHelped
FROM 
    employees AS e
LEFT JOIN 
    customers AS c
    ON e.EmployeeId = c.SupportRepId
WHERE c.SupportRepId IS NULL

 * sqlite:///chinook.db
Done.


COUNT(e.FirstName),CustomerHelped
5,


---

5. Write a query to display the number of invoices associated with each sales agent. Display each agent's **full name** as "SalesAgentName", and the number of associated invoices as "AssociatedInvoice"

In [8]:
%%sql
SELECT 
    (e.FirstName || ' ' || e.LastName) AS SalesAgentName, 
    COUNT(i.InvoiceId) AS AssociatedInvoice
FROM 
    Invoices i 
JOIN Customers c 
    ON i.CustomerId = c.CustomerId
JOIN Employees e 
    ON e.EmployeeId = c.SupportRepId
GROUP BY e.FirstName || ' ' || e.LastName

 * sqlite:///chinook.db
Done.


SalesAgentName,AssociatedInvoice
Jane Peacock,146
Margaret Park,140
Steve Johnson,126


---

6. Which Sales Agent made the most in sales overall? (use the agent's first name only). Display the total sales column as "TotalSales"

In [9]:
%%sql
SELECT 
    e.FirstName, 
    SUM(i.Total) AS TotalSales
FROM 
    Employees e
JOIN Customers c 
    ON e.EmployeeId = c.SupportRepId
JOIN Invoices i
    ON c.CustomerId = i.CustomerId
GROUP BY e.FirstName 
ORDER BY SUM(i.Total) DESC

 * sqlite:///chinook.db
Done.


FirstName,TotalSales
Jane,833.0400000000013
Margaret,775.4000000000011
Steve,720.160000000001
