# Intro
In order to efficiently store data, we often spread related information across multiple tables.

For instance, imagine that we’re running a magazine company where users can have different types of subscriptions to different products. Different subscriptions might have many different properties. Each customer would also have lots of associated information.

We could have one table with all of the following information:

- `order_id`
- `customer_id`
- `customer_name`
- `customer_address`
- `subscription_id`
- `subscription_description`
- `subscription_monthly_price`
- `subscription_length`
- `purchase_date`

However, a lot of this information would be repeated. If the same customer has multiple subscriptions, that customer’s name and address will be repeated in every one of their rows. If the same subscription type is ordered by multiple customers, then the subscription price and subscription description will be repeated. This will make our table big and unmanageable.

So instead, we can split our data into three tables:

1. `orders` would contain just the information necessary to describe what was ordered:

    - `order_id`, `customer_id`, `subscription_id`, `purchase_date`
    
2. `subscriptions` would contain the information to describe each type of subscription:

    - `subscription_id`, `description`, `price_per_month`, `subscription_length`
    
3. `customers` would contain the information for each customer:

    - `customer_id`, `customer_name`, `address`
***

# Combining Tables Manually
Suppose we have the three tables:

- `orders`
- `subscriptions`
- `customers`

If we just look at the `orders` table, we can’t really tell what’s happened in each order. However, if we refer to the other tables, we can get a complete picture.

Let’s examine the order with an `order_id` of 2. It was purchased by the customer with a `customer_id` of 2.

To find out the customer’s name, we look at the `customers` table and look for the row with a `customer_id` value of 2. We can see that Customer 2’s name is ‘Jane Doe’ and that she lives at ‘456 Park Ave’.

Doing this kind of matching is called **joining** two tables.

***

# Combining Tables with SQL
Combining tables manually is time-consuming. Luckily, SQL gives us an easy way to do this: it’s called a `JOIN`.

If we want to combine `orders` and `customers`, we would type:

`SELECT *
FROM orders
JOIN customers
  ON orders.customer_id = customers.customer_id;`
  
Let’s break down this command:

1. The first line selects all columns from our combined table. If we only want to select certain columns, we can specify which ones we want.
2. The second line specifies the first table that we want to look in, `orders`.
3. The third line uses `JOIN` to say that we want to combine information from `orders` with `customers`.
4. The fourth line tells us how to combine the two tables. We want to match `orders` table’s `customer_id` column with `customers` table’s `customer_id` column.

Because column names are often repeated across multiple tables, we use the syntax `table_name.column_name` to be sure that our requests for columns are unambiguous. In our example, we use this syntax in the `ON` clause, but we will also use it in the `SELECT` clause or anywhere else we refer to column names.

For example: Instead of selecting all the columns using `*`, if we only wanted to select `orders` table’s `order_id` column and `customers` table’s `customer_name` column, we could use the following query:

`SELECT orders.order_id,
   customers.customer_name
FROM orders
JOIN customers
  ON orders.customer_id = customers.customer_id;`
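
To try this query without a database console, here is a minimal sketch that uses Python’s built-in `sqlite3` module and an in-memory database. The column types and the sample rows are assumptions made for illustration (only Jane Doe’s row mirrors the example above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, address TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                         subscription_id INTEGER, purchase_date TEXT);
    -- Hypothetical sample data
    INSERT INTO customers VALUES (1, 'John Smith', '123 Main St'),
                                 (2, 'Jane Doe', '456 Park Ave');
    INSERT INTO orders VALUES (1, 1, 3, '2017-01-01'),
                              (2, 2, 2, '2017-01-01');
""")

# Each order row is paired with the customer row that has the same customer_id.
rows = conn.execute("""
    SELECT orders.order_id, customers.customer_name
    FROM orders
    JOIN customers
      ON orders.customer_id = customers.customer_id
    ORDER BY orders.order_id;
""").fetchall()
print(rows)  # [(1, 'John Smith'), (2, 'Jane Doe')]
```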
  
***

# Inner Joins
Let’s revisit how we joined `orders` and `customers`. For every possible value of `customer_id` in `orders`, there was a corresponding row of `customers` with the same `customer_id`.

What if that wasn’t true?

For instance, imagine that our `customers` table was out of date, and was missing any information on customer 11. If that customer had an order in `orders`, what would happen when we joined the tables?

When we perform a simple `JOIN` (often called an *inner join*), our result only includes rows that match our `ON` condition.

Consider this [animation](https://content.codecademy.com/courses/learn-sql/multiple-tables/inner-join.gif), which illustrates an inner join of two tables on `table1.c2 = table2.c2`:

The first and last rows have matching values of `c2`. The middle rows do not match. The final result has all values from the first and last rows but does not include the non-matching middle row.
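
The missing-customer scenario can be sketched with Python’s built-in `sqlite3` module. The tables are cut down to the columns that matter and the rows are hypothetical: `customers` has no entry for customer 11, so the order placed by that customer drops out of the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (2, 'Jane Doe');   -- no row for customer 11
    INSERT INTO orders VALUES (1, 2), (2, 11);
""")

inner_rows = conn.execute("""
    SELECT orders.order_id, customers.customer_name
    FROM orders
    JOIN customers
      ON orders.customer_id = customers.customer_id;
""").fetchall()
print(inner_rows)  # [(1, 'Jane Doe')]  (order 2 has no match and is left out)
```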

***

# Left Joins
What if we want to combine two tables and keep some of the un-matched rows?

SQL lets us do this through a command called `LEFT JOIN`. A *left join* will keep all rows from the first table, regardless of whether there is a matching row in the second table.

Consider this [animation](https://content.codecademy.com/courses/learn-sql/multiple-tables/left-join.gif):

The first and last rows have matching values of `c2`. The middle rows do not match. The final result will keep all rows of the first table but will omit the un-matched row from the second table.

This animation represents a table operation produced by the following command:

`SELECT *
FROM table1
LEFT JOIN table2
  ON table1.c2 = table2.c2;`
  
1. The first line selects all columns from both tables.
2. The second line selects `table1` (the “left” table).
3. The third line performs a `LEFT JOIN` on `table2` (the “right” table).
4. The fourth line tells SQL how to perform the join (by looking for matching values in column `c2`).
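
Applied to the earlier scenario in which `customers` has no row for customer 11, a left join keeps the unmatched order and fills the customer columns with `NULL`. The sketch below uses Python’s built-in `sqlite3` module with hypothetical sample rows; `NULL` shows up as `None` in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (2, 'Jane Doe');   -- customer 11 is missing
    INSERT INTO orders VALUES (1, 2), (2, 11);
""")

left_rows = conn.execute("""
    SELECT orders.order_id, customers.customer_name
    FROM orders
    LEFT JOIN customers
      ON orders.customer_id = customers.customer_id
    ORDER BY orders.order_id;
""").fetchall()
print(left_rows)  # [(1, 'Jane Doe'), (2, None)]
```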

***
# Primary Key vs Foreign Key
Recall the three tables: `orders`, `subscriptions`, and `customers`.

Each of these tables has a column that uniquely identifies each row of that table:

- `order_id` for `orders`
- `subscription_id` for `subscriptions`
- `customer_id` for `customers`

These special columns are called **primary keys**.

Primary keys have a few requirements:

- None of the values can be `NULL`.
- Each value must be unique (i.e., you can’t have two customers with the same `customer_id` in the `customers` table).
- A table cannot have more than one primary key (although a single primary key may be made up of several columns).

Let’s reexamine the `orders` table:


| order_id | customer_id | subscription_id | purchase_date |
| --- | --- | --- | --- |
| 1 | 2 | 3 | 2017-01-01 |
| 2 | 2 | 2 | 2017-01-01 |
| 3 | 3 | 1 | 2017-01-01 |

Note that `customer_id` (the primary key for `customers`) and `subscription_id` (the primary key for `subscriptions`) both appear in this table.

When the primary key for one table appears in a different table, it is called a **foreign key**.

So `customer_id` is a primary key when it appears in `customers`, but a foreign key when it appears in `orders`.

In this example, our primary keys all had somewhat descriptive names. In practice, the primary key is often just called `id`, while foreign keys have more descriptive names.

*Why is this important?* The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance, when we join `orders` and `customers`, we join on `customer_id`, which is a foreign key in `orders` and the primary key in `customers`.
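
As a rough illustration of how these rules can be enforced, the sketch below declares the keys explicitly using Python’s built-in `sqlite3` module. The trimmed-down schema, the `rejected` helper, and the sample rows are assumptions for the example. Note that SQLite only checks foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id)  -- foreign key
    );
    INSERT INTO customers VALUES (2, 'Jane Doe');
    INSERT INTO orders VALUES (1, 2);
""")

def rejected(sql):
    """Return True if SQLite refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# A second customer with customer_id 2 would break primary key uniqueness.
duplicate_key = rejected("INSERT INTO customers VALUES (2, 'Someone Else')")
# An order pointing at a customer_id that does not exist breaks the foreign key.
dangling_ref = rejected("INSERT INTO orders VALUES (2, 99)")
print(duplicate_key, dangling_ref)  # True True
```
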
***
# Cross Join
So far, we’ve focused on matching rows that have some information in common.

Sometimes, we just want to combine all rows of one table with all rows of another table.

For instance, if we had a table of `shirts` and a table of `pants`, we might want to know all the possible combinations to create different outfits.

Our code might look like this:

`SELECT shirts.shirt_color,
   pants.pants_color
FROM shirts
CROSS JOIN pants;`

- The first two lines select the columns `shirt_color` and `pants_color`.
- The third line pulls data from the table `shirts`.
- The fourth line performs a `CROSS JOIN` with `pants`.

Notice that cross joins don’t require an `ON` statement. *You’re not really joining on any columns!* 

If we have 3 different shirts (white, grey, and olive) and 2 different pants (light denim and black), the results might look like this:

| shirt_color | pants_color |
| ----------- | ----------- |
| white | light denim |
| white | black |
| grey | light denim |
| grey | black |
| olive | light denim |
| olive | black |

3 shirts × 2 pants = 6 combinations!
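
The outfit example can be reproduced with Python’s built-in `sqlite3` module. This is only a sketch: the single-column table definitions are an assumption, and the order in which the database returns the combinations is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shirts (shirt_color TEXT);
    CREATE TABLE pants (pants_color TEXT);
    INSERT INTO shirts VALUES ('white'), ('grey'), ('olive');
    INSERT INTO pants VALUES ('light denim'), ('black');
""")

# Every shirt row is paired with every pants row; no ON condition is needed.
outfits = conn.execute("""
    SELECT shirts.shirt_color, pants.pants_color
    FROM shirts
    CROSS JOIN pants;
""").fetchall()
print(len(outfits))  # 6
```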

This clothing example is fun, but it’s not very practically useful.

A more common usage of `CROSS JOIN` is when we need to compare each row of a table to a list of values.

Let’s look at a table of `newspaper` subscriptions. This table contains two columns that we haven’t discussed yet:

- `start_month`: the first month where the customer subscribed to the print newspaper (i.e., `2` for February)
- `end_month`: the final month where the customer subscribed to the print newspaper

Suppose we wanted to know how many users were subscribed during each month of the year. For each month (`1`, `2`, `3`) we would need to know if a user was subscribed. Follow the steps below to see how we can use a `CROSS JOIN` to solve this problem.

1. Eventually, we’ll use a cross join to help us, but first, let’s try a simpler problem.

- Let’s start by counting the number of customers who were subscribed to the `newspaper` during March.

- Use `COUNT(*)` to count the number of rows and a `WHERE` clause to restrict to two conditions:

- `start_month <= 3`
- `end_month >= 3`

> `SELECT COUNT(*)
FROM newspaper
WHERE start_month <= 3
AND end_month >= 3;`

2. The previous query lets us investigate one month at a time. In order to check across all months, we’re going to need to use a cross join.

- Our database contains another table called `months` which contains the numbers between 1 and 12.

- Select all columns from the cross join of `newspaper` and `months`.

> `SELECT *
FROM newspaper
CROSS JOIN months;`

3. Create a third query where you add a `WHERE` clause to your cross join to restrict to two conditions:

- `start_month <= month`
- `end_month >= month`

This will select all months where a user was subscribed.

> `SELECT *
FROM newspaper
CROSS JOIN months
WHERE start_month <= month
AND end_month >= month;`

4. Create a final query where you aggregate over each month to count the number of subscribers.

> `SELECT month, COUNT(*) AS 'subscribers'
FROM newspaper
CROSS JOIN months
WHERE start_month <= month AND end_month >= month
GROUP BY month;`
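
The four steps above can be run end to end with Python’s built-in `sqlite3` module. This is a sketch under assumptions: the `id` column and the three subscribers are made up, and an `ORDER BY` is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newspaper (id INTEGER, start_month INTEGER, end_month INTEGER);
    CREATE TABLE months (month INTEGER);
    -- Hypothetical subscribers: (id, start_month, end_month)
    INSERT INTO newspaper VALUES (1, 1, 3), (2, 2, 2), (3, 3, 12);
""")
# Fill months with the numbers 1 through 12.
conn.executemany("INSERT INTO months VALUES (?)", [(m,) for m in range(1, 13)])

# Pair every subscriber with every month, keep the pairs where the
# subscription was active, then count the surviving rows per month.
counts = conn.execute("""
    SELECT month, COUNT(*) AS subscribers
    FROM newspaper
    CROSS JOIN months
    WHERE start_month <= month AND end_month >= month
    GROUP BY month
    ORDER BY month;
""").fetchall()
print(counts[:4])  # [(1, 1), (2, 2), (3, 2), (4, 1)]
```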

***
# Union
Sometimes we just want to stack one dataset on top of the other. The `UNION` operator allows us to do that.

Suppose we have two tables and they have the same columns.

`table1`:

| pokemon | type |
| ------- | ---- |
| Bulbasaur | Grass |
| Charmander | Fire |
| Squirtle | Water |

`table2`:

| pokemon | type |
| ------- | ---- |
| Snorlax | Normal |

If we combine these two with `UNION`:

`SELECT *
FROM table1
UNION
SELECT *
FROM table2;`

The result would be:

| pokemon | type |
| ------- | ---- |
| Bulbasaur | Grass |
| Charmander | Fire |
| Squirtle | Water |
| Snorlax | Normal |

SQL has strict rules for appending data:

- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.
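
A small sketch of the stacking, using Python’s built-in `sqlite3` module. It also tries a `UNION` with mismatched column counts to show the first rule being enforced. SQLite is lenient about data types, so the second rule is not demonstrated here; other databases may be stricter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pokemon TEXT, type TEXT);
    CREATE TABLE table2 (pokemon TEXT, type TEXT);
    INSERT INTO table1 VALUES ('Bulbasaur', 'Grass'), ('Charmander', 'Fire'),
                              ('Squirtle', 'Water');
    INSERT INTO table2 VALUES ('Snorlax', 'Normal');
""")

stacked = conn.execute("""
    SELECT * FROM table1
    UNION
    SELECT * FROM table2;
""").fetchall()
print(len(stacked))  # 4

# One column on the left, two on the right: the database refuses the query.
try:
    conn.execute("SELECT pokemon FROM table1 UNION SELECT * FROM table2;")
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```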

***
# With
Often, we want to combine two tables, but one of the tables is the result of another calculation.

Let’s return to our magazine order example. Our marketing department might want to know a bit more about our customers. For instance, they might want to know how many magazines each customer subscribes to. We can easily calculate this using our `orders` table:

`SELECT customer_id,
   COUNT(subscription_id) AS 'subscriptions'
FROM orders
GROUP BY customer_id;`

This query is good, but a `customer_id` isn’t terribly useful for our marketing department. They probably want to know the customer’s name.

We want to be able to join the results of this query with our `customers` table, which will tell us the name of each customer. We can do this by using a `WITH` clause.

`WITH previous_results AS (
   SELECT ...
   ...
   ...
   ...
)
SELECT *
FROM previous_results
JOIN customers
  ON _____ = _____;`
  
- The `WITH` statement allows us to perform a separate query (such as aggregating each customer’s subscriptions).
- `previous_results` is the alias that we will use to reference any columns from the query inside of the `WITH` clause.
- We can then go on to do whatever we want with this temporary table (such as join the temporary table with another table).

Essentially, we are putting a whole first query inside the parentheses `()` and giving it a name. After that, we can use this name as if it’s a table and write a new query *using* the first query.
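
One possible way to complete this pattern for the marketing question is sketched below with Python’s built-in `sqlite3` module. The join columns, the selected columns, the trimmed-down tables, and the customer named John Smith are assumptions for illustration, not the only valid answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, subscription_id INTEGER);
    -- Hypothetical sample data
    INSERT INTO customers VALUES (2, 'Jane Doe'), (3, 'John Smith');
    INSERT INTO orders VALUES (1, 2, 3), (2, 2, 2), (3, 3, 1);
""")

rows = conn.execute("""
    WITH previous_results AS (
        -- First query: count subscriptions per customer
        SELECT customer_id,
               COUNT(subscription_id) AS subscriptions
        FROM orders
        GROUP BY customer_id
    )
    -- Second query: use previous_results as if it were a table
    SELECT customers.customer_name, previous_results.subscriptions
    FROM previous_results
    JOIN customers
      ON previous_results.customer_id = customers.customer_id
    ORDER BY customers.customer_name;
""").fetchall()
print(rows)  # [('Jane Doe', 2), ('John Smith', 1)]
```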

***
## Summary
- `JOIN` will combine rows from different tables if the join condition is true.
- `LEFT JOIN` will return every row in the left table, and if the join condition is not met, `NULL` values are used to fill in the columns from the right table.
- A *primary key* is a column that serves as a unique identifier for the rows in the table.
- A *foreign key* is a column that contains the primary key of another table.
- `CROSS JOIN` lets us combine all rows of one table with all rows of another table.
- `UNION` stacks one dataset on top of another.
- `WITH` allows us to define one or more temporary tables that can be used in the final query.

***

# Intro
What happens when we query a database but we really only need a subset of the results returned? How is this situation handled when the subset of data needed spans across multiple tables?

One option that may immediately come to mind is a join. However, we can also use something called a subquery, which gives us much of the same functionality as a join, often in a more readable form.

***
# Subqueries
As the name suggests, a *subquery* is an **_internal_ query nested inside of an _external_ query**. They can be nested inside of `SELECT`, `INSERT`, `UPDATE`, or `DELETE` statements. Anytime a subquery is present, it gets executed before the external statement is run.

Subqueries are very similar to joins in terms of functionality; however, joins are often more efficient, while subqueries are typically more readable.

For example, if we had two tables listing students in two different clubs, `book_club` and `art_club`, we could find out which students are in both tables by using a join such as:

`SELECT book_club.id, book_club.first_name, book_club.last_name
FROM book_club
JOIN art_club
  ON book_club.id = art_club.id;`
  
However, a subquery can be used to achieve the same result and is more readable:

`SELECT id, first_name, last_name
FROM book_club
WHERE id IN (
   SELECT id 
   FROM art_club);`
   
In this statement, the subquery `SELECT` statement would be executed first, resulting in a list of student ids from the `art_club` table. Then, the outer query would run and select the students from the `book_club` table whose ids also appear in the subquery results.
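
To check that the two approaches agree, here is a minimal sketch using Python’s built-in `sqlite3` module. The students are hypothetical, and the join version qualifies its column names with `book_club.` because both tables are assumed to share the same column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book_club (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE art_club (id INTEGER, first_name TEXT, last_name TEXT);
    -- Hypothetical students
    INSERT INTO book_club VALUES (1, 'Ann', 'Lee'), (2, 'Ben', 'Kim'), (3, 'Cara', 'Diaz');
    INSERT INTO art_club VALUES (2, 'Ben', 'Kim'), (3, 'Cara', 'Diaz'), (4, 'Dev', 'Rao');
""")

with_join = conn.execute("""
    SELECT book_club.id, book_club.first_name, book_club.last_name
    FROM book_club
    JOIN art_club
      ON book_club.id = art_club.id
    ORDER BY book_club.id;
""").fetchall()

with_subquery = conn.execute("""
    SELECT id, first_name, last_name
    FROM book_club
    WHERE id IN (
        SELECT id
        FROM art_club)
    ORDER BY id;
""").fetchall()

print(with_join == with_subquery)  # True
print(with_subquery)  # [(2, 'Ben', 'Kim'), (3, 'Cara', 'Diaz')]
```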


Example problem:

Complete the subquery to find students taking both band and drama.

> `SELECT first_name, last_name
FROM band_students
WHERE id IN 
  (SELECT id
  FROM drama_students);`

***
# Inserts, Updates, and Deletes
Recall that subqueries are always executed prior to the external query being run.

In the same way that the external query selects from the internal query’s results, it is important to note that this same behavior takes place when the external query is an `INSERT`, `UPDATE`, or `DELETE`. Therefore, when a subquery is nested in a `DELETE` statement, the rows to be deleted will be among the results from the *subquery*.

For example, suppose students are unable to take both history and statistics. If we wanted to delete the rows for statistics students who are also enrolled in history, we could execute a statement such as:

`DELETE FROM statistics_students
WHERE id IN (
  SELECT id 
  FROM history_students);`
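
The effect of this `DELETE` can be checked with a small sketch using Python’s built-in `sqlite3` module. The two-column tables and the student rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statistics_students (id INTEGER, grade INTEGER);
    CREATE TABLE history_students (id INTEGER, grade INTEGER);
    -- Hypothetical students; only id 2 is enrolled in both classes
    INSERT INTO statistics_students VALUES (1, 10), (2, 11), (3, 12);
    INSERT INTO history_students VALUES (2, 11), (4, 9);
""")

# Remove every statistics row whose id also appears in history_students.
conn.execute("""
    DELETE FROM statistics_students
    WHERE id IN (
        SELECT id
        FROM history_students);
""")
remaining = conn.execute(
    "SELECT id FROM statistics_students ORDER BY id;"
).fetchall()
print(remaining)  # [(1,), (3,)]
```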
  
Example problem:

A memo was recently released stating that 9th grade students are unable to take both drama and band concurrently. The students currently enrolled in both classes will be dropped from drama and remain in band.

Write a `DELETE` query that will remove 9th grade students enrolled in both band and drama from the `drama_students` table.

> `DELETE FROM drama_students
WHERE id IN (
   SELECT id
   FROM band_students
   WHERE grade = 9);`
   
***
# Comparison Operators
Subqueries have the unique ability to take the place of expressions in SQL queries. As such, one way of using subqueries in SQL statements is with comparison operators.

We can use operators such as `<`, `>`, `=`, and `!=` to compare the results of the external query to those of the inner query.

For example, if Olivia (`id` 1 in `statistics_students`) decided to drop statistics and take history, we could find the history students who are at or below her grade level by performing the following query:

`SELECT * 
FROM history_students
WHERE grade <= (
  SELECT grade
  FROM statistics_students
  WHERE id = 1);`
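
Here is a minimal sketch of this comparison using Python’s built-in `sqlite3` module. It assumes that Olivia is the statistics student with `id` 1 and is in grade 10; the history students are also made up. The subquery yields a single value, which the outer query compares against each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statistics_students (id INTEGER, first_name TEXT, grade INTEGER);
    CREATE TABLE history_students (id INTEGER, first_name TEXT, grade INTEGER);
    -- Hypothetical data: Olivia is assumed to be id 1, grade 10
    INSERT INTO statistics_students VALUES (1, 'Olivia', 10);
    INSERT INTO history_students VALUES (2, 'Ben', 9), (3, 'Cara', 10), (4, 'Dev', 11);
""")

at_or_below = conn.execute("""
    SELECT *
    FROM history_students
    WHERE grade <= (
        SELECT grade
        FROM statistics_students
        WHERE id = 1)
    ORDER BY id;
""").fetchall()
print(at_or_below)  # [(2, 'Ben', 9), (3, 'Cara', 10)]
```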
  
Example problem:

Emlynne Torritti (`id` 20) has decided to drop band and join drama. She wants to know how many other students in her grade level are already enrolled in drama.

Use a subquery to find the students enrolled in drama that are in the same grade as Emlynne.

> `SELECT * 
FROM drama_students
WHERE grade = (
   SELECT grade
   FROM band_students
   WHERE id = 20);`
   
***

# In and Not In Clauses
One of the more common ways to use subqueries is with the use of an `IN` or `NOT IN` clause. Recall that the subquery is always executed first followed by the external query.

When an `IN` clause is used, results retrieved from the external query must appear within the subquery results. Similarly, when a `NOT IN` clause is used, results retrieved from the external query must not appear within the subquery results.

For example, we could use the query below to find out which students are enrolled in both statistics and history:

`SELECT * 
FROM statistics_students
WHERE id 
IN (
  SELECT id
  FROM history_students);`
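
The sketch below runs both variants side by side with Python’s built-in `sqlite3` module, using hypothetical single-column tables: `IN` keeps the statistics students who also appear in history, and `NOT IN` keeps the ones who do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statistics_students (id INTEGER);
    CREATE TABLE history_students (id INTEGER);
    -- Hypothetical enrollment: ids 2 and 3 take both classes
    INSERT INTO statistics_students VALUES (1), (2), (3);
    INSERT INTO history_students VALUES (2), (3), (4);
""")

in_both = conn.execute("""
    SELECT id FROM statistics_students
    WHERE id IN (SELECT id FROM history_students)
    ORDER BY id;
""").fetchall()

only_statistics = conn.execute("""
    SELECT id FROM statistics_students
    WHERE id NOT IN (SELECT id FROM history_students)
    ORDER BY id;
""").fetchall()

print(in_both)          # [(2,), (3,)]
print(only_statistics)  # [(1,)]
```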
  
Example problem:

Write a query that gives the first and last names of students enrolled in band but not in drama.

> `SELECT first_name, last_name
FROM band_students
WHERE id NOT IN 
  (SELECT id 
  FROM drama_students);`
  
***
# Exists and Not Exists
While `EXISTS/NOT EXISTS` are similar to `IN/NOT IN` clauses, there are some key differences.

When a subquery is included, the inner query runs before the external query. 

When the inner query is included using an `IN` or `NOT IN` clause, all rows meeting the inner query’s criteria are returned and then compared against the external query’s criteria. 

However, when the inner query is included using an `EXISTS` or `NOT EXISTS` clause, we are only checking for the presence of rows meeting the specified criteria, so the inner query only returns a true or false.

If we compare this functionality in terms of efficiency, `EXISTS/NOT EXISTS` are usually more efficient than `IN/NOT IN` clauses; this is because the `IN/NOT IN` clause has to return all rows meeting the specific criteria whereas the `EXISTS/NOT EXISTS` only needs to find the presence of one row to determine if a true or false value needs to be returned.
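
As an illustration, the sketch below uses Python’s built-in `sqlite3` module and hypothetical single-column tables. The inner query refers to the current row of the outer query, which is how `EXISTS` is commonly written; it only needs to report whether at least one matching row is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statistics_students (id INTEGER);
    CREATE TABLE history_students (id INTEGER);
    -- Hypothetical enrollment: ids 2 and 3 take both classes
    INSERT INTO statistics_students VALUES (1), (2), (3);
    INSERT INTO history_students VALUES (2), (3), (4);
""")

in_both = conn.execute("""
    SELECT id
    FROM statistics_students
    WHERE EXISTS (
        SELECT 1
        FROM history_students
        WHERE history_students.id = statistics_students.id)
    ORDER BY id;
""").fetchall()

only_statistics = conn.execute("""
    SELECT id
    FROM statistics_students
    WHERE NOT EXISTS (
        SELECT 1
        FROM history_students
        WHERE history_students.id = statistics_students.id)
    ORDER BY id;
""").fetchall()

print(in_both)          # [(2,), (3,)]
print(only_statistics)  # [(1,)]
```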

Example problem:

Write a query to find out which grade levels are represented in both band and drama.

> `SELECT DISTINCT grade
FROM band_students
WHERE EXISTS (
   SELECT 1
   FROM drama_students
   WHERE drama_students.grade = band_students.grade);`