## Inner Join Introduction
The most common way to join data using SQL is an **inner join**. The syntax for an inner join is:

```mysql
SELECT [column_names] FROM [table_name_one]
INNER JOIN [table_name_two] ON [join_constraint];
```
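- <code><font color="red">INNER JOIN</font></code>: followed by the name of the second table to join.
- <code><font color="red">ON</font></code>: the join constraint, i.e. which columns to use to match rows from the two tables.
- <code><font color="red">FROM</font></code>: names the first table; the join clause comes after it in the query.

For example, the following query joins the cities table to the facts table: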



```mysql
SELECT * FROM facts
INNER JOIN cities ON cities.facts_id = facts.id
LIMIT 5
```

- <code><font color="blue">INNER JOIN cities</font></code>: uses an inner join to join the table <code><font color="blue">cities</font></code> to facts.
- <code><font color="blue">ON cities.facts_id = facts.id</font></code>: tells the SQL engine which columns to use when joining the data, written as <code><font color="blue">table_name.column_name</font></code>.


You might presume that `SELECT * FROM facts` means the query returns only columns from the facts table. However, when used with a join, the `*` wildcard returns all columns from both tables.

This query gives us all columns from both tables and every row where there is a match between the id column from facts and the facts_id from cities, limited to the first 5 rows. We'll look at how the join itself works in detail in a moment, but first let's practice writing our first join.
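A simple way to check the wildcard behaviour is to run the join with Python's built-in `sqlite3` module. The sketch below uses tiny, made-up versions of the facts and cities tables, with only a few columns and invented rows. The column names reported by `cursor.description` include the columns of both tables.

```python
import sqlite3

# Hypothetical toy versions of the facts and cities tables (made-up rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                         capital INTEGER, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A', 1000), (2, 'Country B', 5000);
    INSERT INTO cities VALUES (10, 'City A1', 300, 1, 1), (11, 'City B1', 900, 1, 2);
""")

cur = conn.execute("""
    SELECT * FROM facts
    INNER JOIN cities ON cities.facts_id = facts.id
    LIMIT 5
""")
rows = cur.fetchall()
# cursor.description has one entry per result column; item 0 is the column name
column_names = [col[0] for col in cur.description]
print(column_names)  # 3 columns from facts followed by 5 from cities
```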

## Schema for two tables
<img src="images/schema.svg"/>


The cities table has the following columns:

- id: A unique ID for each city.
- name: The name of the city.
- population: The population of the city.
- capital: Whether the city is a capital city: 1 if it is, 0 if it isn't.
- facts_id: The ID of the country, from the facts table.


## Inner Join
Consider the same query with a limit of 10 rows:
```mysql
SELECT * FROM facts
INNER JOIN cities ON cities.facts_id = facts.id
LIMIT 10
```
The diagram below illustrates how a selection of rows from the two tables is matched:
<img src="images/inner_join.svg"/>

Our inner join will include:

- Rows from the cities table that have a cities.facts_id that matches a facts.id from facts.

Our inner join will not include:

- Rows from the cities table that have a cities.facts_id that doesn't match any facts.id from facts.
- Rows from the facts table that have a facts.id that doesn't match any cities.facts_id from cities.
You can see this represented as a Venn diagram:
<img src="images/venn_inner.svg"/>
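You can check the include/exclude rules above with a small sketch using Python's `sqlite3` module. The rows are invented for illustration. Country C has no city, and City Z1 points at a facts.id (99) that doesn't exist, so the inner join drops both of them.

```python
import sqlite3

# Made-up rows: Country C has no city; City Z1 references a missing facts.id (99)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City B1', 2), (12, 'City Z1', 99);
""")

rows = conn.execute("""
    SELECT facts.name, cities.name
    FROM facts
    INNER JOIN cities ON cities.facts_id = facts.id
    ORDER BY facts.id
""").fetchall()
# Only rows with a match on both sides survive the inner join
print(rows)  # [('Country A', 'City A1'), ('Country B', 'City B1')]
```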


In the SQL fundamentals course, we learned how to use [aliases](https://www.tutorialspoint.com/sqlite/sqlite_alias_syntax.htm) to specify custom names for columns, e.g.:

`SELECT AVG(population) AS AVERAGE_POPULATION`

We can also create aliases for table names, which makes queries with joins easier to both read and write. Instead of:
```mysql
SELECT * FROM facts
INNER JOIN cities ON cities.facts_id = facts.id
```
We can write:
```mysql
SELECT * FROM facts AS f
INNER JOIN cities AS c ON c.facts_id = f.id
```
Just like with column names, using AS is optional. We can get the same result by writing:
```mysql
SELECT * FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
```
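As a quick sanity check, the sketch below uses Python's `sqlite3` module and made-up rows to confirm that the three spellings of the join return the same rows. None of the queries has an ORDER BY clause, so the rows are sorted in Python before they are compared.

```python
import sqlite3

# Hypothetical toy tables with invented rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B');
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City B1', 2);
""")

queries = {
    "full names": "SELECT * FROM facts INNER JOIN cities ON cities.facts_id = facts.id",
    "AS aliases": "SELECT * FROM facts AS f INNER JOIN cities AS c ON c.facts_id = f.id",
    "bare aliases": "SELECT * FROM facts f INNER JOIN cities c ON c.facts_id = f.id",
}
# Sort in Python because row order is not guaranteed without ORDER BY
results = {label: sorted(conn.execute(sql).fetchall()) for label, sql in queries.items()}
print(all(r == results["full names"] for r in results.values()))  # True
```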



We can also combine aliases with wildcards. For instance, using the aliases created above, c.* would give us all columns from the cities table.

While our earlier queries included both columns from the ON clause in their output, we don't need to select either of them in our final list of columns. This is useful because it means we can show only the information we're interested in, rather than having to include the two join columns every time.
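The sketch below uses Python's `sqlite3` module and invented rows to show both ideas. `c.*` expands to the columns of cities only. The join still works when neither join column appears in the SELECT list.

```python
import sqlite3

# Hypothetical toy tables with invented rows
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B');
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City B1', 2);
""")

# c.* expands to every column of cities, followed here by one aliased facts column
cur = conn.execute("""
    SELECT c.*, f.name AS country_name
    FROM cities AS c
    INNER JOIN facts AS f ON c.facts_id = f.id
""")
star_columns = [col[0] for col in cur.description]
print(star_columns)  # ['id', 'name', 'facts_id', 'country_name']

# Neither join column (c.facts_id, f.id) needs to appear in the SELECT list
narrow = conn.execute("""
    SELECT c.name AS city, f.name AS country
    FROM cities AS c
    INNER JOIN facts AS f ON c.facts_id = f.id
    ORDER BY c.id
""").fetchall()
print(narrow)  # [('City A1', 'Country A'), ('City B1', 'Country B')]
```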

Let's use what we've learned to build on our original query.

**Instructions**

Write a query that:

- Joins cities to facts using an INNER JOIN.
- Uses aliases for table names.
- Includes, in order:
    - All columns from cities.
    - The name column from facts, aliased to country_name.
- Includes only the first 5 rows.


```mysql
SELECT c.*, f.name AS country_name FROM cities AS c 
INNER JOIN facts AS f 
ON c.facts_id = f.id 
LIMIT 5
```
Or 

```mysql
SELECT c.*, f.name country_name FROM cities c 
INNER JOIN facts f 
ON c.facts_id = f.id 
LIMIT 5
```

## Practicing Inner Join
Let's practice writing a query to answer a question from our database using an inner join. Say we want to produce a table of countries and their capital cities from our database using what we've learned so far. Our first step is to think about what columns we'll need in our final query. We'll need:

- The name column from facts
- The name column from cities

Given that we've identified that we need data from two tables, we need to think about how to join them. The schema diagram from earlier indicated that there is only one column in each table that links them together, so we can use an inner join with those columns to join the data.

So far, thinking through our question we can already write most of our query:
```mysql
SELECT f.name, c.name FROM cities c
INNER JOIN facts f ON f.id = c.facts_id
```

The last part of our process is to make sure we have the correct rows. From the previous sections, we know that a query like this will return all rows from cities that have a corresponding match from facts in the facts_id column. We're only interested in the capital cities from the cities table, so we'll need a WHERE clause on the capital column, which has a value of 1 if the city is a capital and 0 if it isn't:
```mysql
WHERE c.capital = 1
```
We can now put this all together to write a query that answers our question.

**Instructions**

- Write a query that returns, in order:
    - A column of country names, called country.
    - A column of each country's capital city, called capital_city.
- Use an INNER JOIN to join the two tables in your query.
```mysql
SELECT f.name country, c.name capital_city FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
WHERE c.capital = 1
```
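To see the WHERE clause at work, the sketch below runs the solution query with Python's `sqlite3` module on made-up data. The data includes a non-capital city, Town A2 (capital = 0). An ORDER BY is added so the output order is predictable.

```python
import sqlite3

# Invented rows: Town A2 belongs to Country A but is not its capital
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, capital INTEGER,
                         facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B');
    INSERT INTO cities VALUES (10, 'City A1', 1, 1), (11, 'Town A2', 0, 1),
                              (12, 'City B1', 1, 2);
""")

rows = conn.execute("""
    SELECT f.name country, c.name capital_city FROM facts f
    INNER JOIN cities c ON c.facts_id = f.id
    WHERE c.capital = 1
    ORDER BY f.id
""").fetchall()
# Town A2 (capital = 0) is filtered out by the WHERE clause
print(rows)  # [('Country A', 'City A1'), ('Country B', 'City B1')]
```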


## Left Join
As we mentioned earlier, an inner join will not include any rows where there is not a mutual match from both tables. This means there could be information we are not seeing in our query where rows don't match.

We can use the SQL console to run some queries to explore this:
```mysql
>>> SELECT COUNT(DISTINCT(name)) FROM facts;

   [["COUNT(DISTINCT(name))"], [261]]
```
```mysql
>>> SELECT COUNT(DISTINCT(facts_id)) FROM cities;

    [["COUNT(DISTINCT(facts_id))"], [210]]
```
By running these two queries, we can see that there are some countries in the facts table that don't have corresponding cities in the cities table, which indicates we may have some incomplete data.

Let's look at how we can create a query to explore the missing data using a new type of join: the left join.

A left join includes all the rows that an inner join will select, plus any rows from the first (or left) table that don't have a match in the second table. We can see this represented as a Venn diagram.

<img src="images/venn_left.svg"/>

Let's look at an example by replacing INNER JOIN with LEFT JOIN in the first query we wrote, and looking at the same selection of rows from our earlier diagram:

```mysql
SELECT * FROM facts
LEFT JOIN cities ON cities.facts_id = facts.id 
```
<img src="images/left_join.svg"/>

Here we can see that for the rows where facts.id doesn't match any values in cities.facts_id (237, 238, 240, and 244), the rows are still included in the results. When this happens, all of the columns from the cities table are populated with null values.
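In Python's `sqlite3` module, SQL NULL values come back as `None`, which makes the padding easy to see. This sketch uses invented rows in which Country C has no matching city.

```python
import sqlite3

# Made-up rows: Country C has no matching city
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City B1', 2);
""")

rows = conn.execute("""
    SELECT f.name, c.name, c.facts_id
    FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    ORDER BY f.id
""").fetchall()
# The unmatched facts row is kept, with every cities column set to NULL (None)
print(rows[-1])  # ('Country C', None, None)
```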

We can use these null values to filter our results to just the countries that don't exist in cities with a WHERE clause. When making a comparison to null in SQL, we use the IS keyword, rather than the = sign. If we want to select rows where a column is null we can write:

```mysql 
WHERE column_name IS NULL
```


If we want to select rows where a column isn't null, we use:
```mysql 
WHERE column_name IS NOT NULL
```
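The reason `=` doesn't work is that comparing anything to NULL with `=` yields NULL (unknown) rather than true, and WHERE only keeps rows where the condition is true. Here is a minimal sketch with Python's `sqlite3` module and a hypothetical one-column table `t`:

```python
import sqlite3

# Hypothetical table with one non-null and one null value
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (NULL);
""")

# NULL = NULL evaluates to NULL (unknown), which Python receives as None
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
# WHERE keeps only rows where the condition is true, so '= NULL' matches nothing
eq_count = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
is_count = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
is_not_count = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NOT NULL").fetchone()[0]
print(null_eq_null, eq_count, is_count, is_not_count)  # None 0 1 1
```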

Let's use a left join to explore the countries that don't exist in the cities table.




**Instructions**

- Write a query that returns the countries that don't exist in cities:
    - Your query should return two columns:
        - The country names, with the alias country.
        - The country population.
    - Use a LEFT JOIN to join cities to facts.
    - Include only the countries from facts that don't have a corresponding value in cities.
```mysql
SELECT f.name country, f.population population 
FROM facts f
LEFT JOIN cities c ON  c.facts_id = f.id
WHERE c.name IS NULL;
```
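You can try the same pattern (a left join followed by an IS NULL test, often called an anti-join) on invented data with Python's `sqlite3` module. This sketch tests `c.id`, the primary key of the toy cities table, as well as `c.name`. A key column is never NULL in a genuine match, whereas a nullable column such as name could be. Both filters give the same answer here.

```python
import sqlite3

# Made-up rows: Country C has no city
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A', 1000), (2, 'Country B', 5000),
                             (3, 'Country C', 700);
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City B1', 2);
""")

anti_join = """
    SELECT f.name country, f.population population
    FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    WHERE c.{column} IS NULL
"""
# Testing the key column is the more robust choice; name works on this data too
by_id = conn.execute(anti_join.format(column="id")).fetchall()
by_name = conn.execute(anti_join.format(column="name")).fetchall()
print(by_id)  # [('Country C', 700)]
```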


## Right Joins and Outer Joins
Looking through the results of the query we wrote in the previous section, we can see several different reasons that countries don't have corresponding values in cities:

- Countries with small populations and/or no major urban areas (which are defined as having populations of over 750,000), e.g. San Marino, Kosovo, and Nauru.
- City-states, such as Monaco and Singapore.
- Territories that are not themselves countries, such as Hong Kong, Gibraltar, and the Cook Islands.
- Regions and oceans that aren't countries, such as the European Union and the Pacific Ocean.
- Genuine cases of missing data, such as Taiwan.

It's important whenever you use inner joins to be mindful that you might be excluding important data, especially if you are joining based on columns that aren't linked in the database schema.

There are two less-common join types that you should be aware of, which SQLite did not support before version 3.39.0. The first is a right join. As the name indicates, a right join is exactly the opposite of a left join: where the left join includes all rows from the table before the JOIN clause, the right join includes all rows from the table named in the JOIN clause. We can see a right join in the Venn diagram below:

<img src="venn_right.svg"/>

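The following two queries, one using a left join and one using a right join, are equivalent: both keep every row from facts. (Without an ORDER BY clause, the row order, and therefore which 5 rows LIMIT returns, is not guaranteed.)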

```mysql
SELECT f.name country, c.name city
FROM facts f
LEFT JOIN cities c ON c.facts_id = f.id
LIMIT 5;
```

```mysql
SELECT f.name country, c.name city
FROM cities c
RIGHT JOIN facts f ON f.id = c.facts_id
LIMIT 5;
```
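The sketch below uses Python's `sqlite3` module and made-up rows to check the equivalence. Both queries use the same ORDER BY, so their rows can be compared directly. RIGHT JOIN only exists in SQLite 3.39.0 and later, so the right-join query is skipped when the SQLite library bundled with Python is older.

```python
import sqlite3

# Invented rows: Country C has no city
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City A2', 1), (12, 'City B1', 2);
""")

left_sql = """
    SELECT f.name country, c.name city
    FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    ORDER BY f.id, c.id
"""
right_sql = """
    SELECT f.name country, c.name city
    FROM cities c
    RIGHT JOIN facts f ON f.id = c.facts_id
    ORDER BY f.id, c.id
"""
left_rows = conn.execute(left_sql).fetchall()

# RIGHT JOIN needs SQLite 3.39.0 or later; skip it on older bundled versions
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_rows = conn.execute(right_sql).fetchall()
    print(right_rows == left_rows)  # True
else:
    right_rows = None
    print("RIGHT JOIN not supported by SQLite", sqlite3.sqlite_version)
```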

The main reason to use a right join is in a complex query that joins more than two tables. In these cases, a right join can save you from restructuring the whole query just to join one table. Outside of this, right joins are used relatively rarely, so for simple joins it's better to use a left join, as your query will be easier for others to read and understand.

The other join type that SQLite did not support before version 3.39.0 is a **full outer join**. A full outer join includes all rows from the tables on both sides of the join. We can see a full outer join in the Venn diagram below:
<img src="venn_full.svg"/>

Like right joins, full outer joins are relatively uncommon, and similar results can be achieved using the UNION operator (which we will cover in the next mission). The standard SQL syntax for a full outer join is:

```mysql
SELECT f.name country, c.name city
FROM cities c
FULL OUTER JOIN facts f ON f.id = c.facts_id
LIMIT 5;
```

When joining cities and facts with a full outer join, the result will be the same as with our left and right joins above, because there are no values in cities.facts_id that don't exist in facts.id.
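Until UNION is covered, here is a sketch of one common way to emulate a full outer join where FULL OUTER JOIN isn't available. It takes a left join and adds, via UNION ALL, the rows from the right-hand table that have no match. Unlike the course data, the invented rows below deliberately include a city whose facts_id (99) has no match in facts, so the extra rows actually appear. The sketch uses Python's `sqlite3` module.

```python
import sqlite3

# Made-up rows: Country C has no city, and City Z1 has no matching country
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES (10, 'City A1', 1), (11, 'City B1', 2), (12, 'City Z1', 99);
""")

full_sql = """
    -- Every facts row, matched to its cities where possible
    SELECT f.name AS country, c.name AS city
    FROM facts f
    LEFT JOIN cities c ON c.facts_id = f.id
    UNION ALL
    -- Plus the cities rows that have no matching facts row
    SELECT f.name, c.name
    FROM cities c
    LEFT JOIN facts f ON f.id = c.facts_id
    WHERE f.id IS NULL
"""
rows = set(conn.execute(full_sql).fetchall())
# None-safe sort key, just for readable output
print(sorted(rows, key=lambda r: (r[0] or "", r[1] or "")))
```

UNION ALL is used instead of UNION because the second query only returns unmatched cities, so no row is counted twice. Plain UNION would also remove legitimate duplicate rows.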

Let's look at the Venn diagrams of each join type side by side, which should help you compare the differences of each of the four joins we've discussed so far.

<img src="join_venn_diagram.svg"/>
Next, let's practice using joins to answer some questions about our data.

## Finding the Most Populous Capital Cities
Previously, we've used column names when specifying order for our query results, like so:
```mysql
SELECT name, migration_rate FROM FACTS
ORDER BY migration_rate desc;
```

There is a handy shortcut we can use in our queries which lets us skip the column names, and instead use the order in which the columns appear in the SELECT clause. In this instance, migration_rate is the second column in our SELECT clause so we can just use 2 instead of the column name:

```mysql
SELECT name, migration_rate FROM FACTS
ORDER BY 2 desc;
```

You can use this shortcut in either the ORDER BY or GROUP BY clause. Keep readability in mind, though: for more complex queries, typing the full column name may be clearer.
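Here is a small sketch using Python's `sqlite3` module and a made-up facts table. It shows that `ORDER BY 2` and `ORDER BY migration_rate` give the same order, and that the positional form also works in GROUP BY:

```python
import sqlite3

# Hypothetical facts table with made-up migration rates
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (name TEXT, migration_rate REAL);
    INSERT INTO facts VALUES ('Country A', 1.5), ('Country B', -0.3), ('Country C', 4.0);
""")

by_name = conn.execute(
    "SELECT name, migration_rate FROM facts ORDER BY migration_rate DESC"
).fetchall()
# 2 refers to the second column in the SELECT list (migration_rate)
by_position = conn.execute(
    "SELECT name, migration_rate FROM facts ORDER BY 2 DESC"
).fetchall()
print(by_position)  # [('Country C', 4.0), ('Country A', 1.5), ('Country B', -0.3)]

# Positional references also work in GROUP BY (here grouping on a 0/1 expression)
grouped = conn.execute(
    "SELECT migration_rate > 0 AS positive, COUNT(*) FROM facts GROUP BY 1 ORDER BY 1"
).fetchall()
print(grouped)  # [(0, 1), (1, 2)]
```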

Let's use what we've learned to produce a list of the top 10 capital cities by population. Remember that capital is a boolean column containing 1 or 0, depending on whether a city is a capital or not. We won't specify which join type you should use - you will need to think about what results you require and select an appropriate join type.

**Instructions**

- Write a query that returns the 10 capital cities with the highest population ranked from biggest to smallest population.
- You should include the following columns, in order:
    - capital_city, the name of the city.
    - country, the name of the country the city is from.
    - population, the population of the city.
- **Hint**: Because we are not interested in countries from facts that don't have corresponding cities in cities, we should use an INNER JOIN.

```mysql
SELECT c.name capital_city, f.name country, c.population
FROM facts f
INNER JOIN cities c ON c.facts_id = f.id
WHERE c.capital = 1
ORDER BY c.population DESC 
LIMIT 10
```



## Combining Joins with Subqueries
As we learned in the SQL fundamentals course, subqueries can be used to substitute parts of queries, allowing us to find the answers to more complex questions. We can also join to the result of a subquery, just like we could a table.

Here's an example of using a join and a subquery to produce a table of countries and their capital cities, like we did earlier in the mission.

<img src="explain_subquery.svg"/>

Reading subqueries can be overwhelming at first, so we'll break down what happens in this example in several steps. The important thing to remember is that, conceptually, a subquery's result is calculated first, so we read from the inside out.

- The subquery, in the red box, is calculated first. This simple query selects all columns from cities, keeping only the rows marked as capital cities (those with a capital value of 1).
- The INNER JOIN joins the subquery result, aliased as c, to the facts table based on the ON clause.
- Two columns are selected from the results of the join:
    - f.name, aliased as country.
    - c.name, aliased as capital_city.
- The results are limited to the first 10 rows.

Below is the output of this query:
<img src="7-inner-join-query.png"/>
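You can reproduce the walkthrough above as a sketch with Python's `sqlite3` module. The SQL is written from the description in the step list, so the exact query in the image may differ slightly, and it runs on invented rows. It is compared with the WHERE-based query from earlier, which should return the same rows.

```python
import sqlite3

# Invented rows: Town A2 is not a capital
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, capital INTEGER,
                         facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B');
    INSERT INTO cities VALUES (10, 'City A1', 1, 1), (11, 'Town A2', 0, 1),
                              (12, 'City B1', 1, 2);
""")

# Join to the result of a subquery, aliased as c, exactly as if it were a table
subquery_rows = conn.execute("""
    SELECT f.name AS country, c.name AS capital_city
    FROM facts f
    INNER JOIN (
        SELECT * FROM cities
        WHERE capital = 1
    ) c ON c.facts_id = f.id
    ORDER BY f.id
    LIMIT 10
""").fetchall()

# The earlier approach: join the whole table, then filter with WHERE
where_rows = conn.execute("""
    SELECT f.name AS country, c.name AS capital_city
    FROM facts f
    INNER JOIN cities c ON c.facts_id = f.id
    WHERE c.capital = 1
    ORDER BY f.id
    LIMIT 10
""").fetchall()
print(subquery_rows == where_rows)  # True
```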
Using this example as a model, we'll write a similar query to find the capital cities with populations of over 10 million.

**Instructions**
- Using a join and a subquery, write a query that returns capital cities with populations of over 10 million ordered from largest to smallest. Include the following columns:
    - capital_city - the name of the city.
    - country - the name of the country the city is the capital of.
    - population - the population of the city.
    
    
```mysql
SELECT c.name capital_city, f.name country, c.population
FROM facts f
INNER JOIN (SELECT * 
            FROM cities 
            WHERE capital = 1 ) c 
ON c.facts_id = f.id 
WHERE c.population > 10000000
ORDER BY c.population DESC
```

## Challenge: Complex Query with Joins and Subqueries
Let's take everything we've learned before and use it to write a more complex query. It's not uncommon to find that 'thinking in SQL' takes a bit of getting used to, so don't be discouraged if this challenge takes you a while. It will get easier with practice!

When you're writing complex queries with joins and subqueries, it helps to follow this process:

- Think about what data you need in your final output
- Work out which tables you'll need to join, and whether you will need to join to a subquery.
    - If you need to join to a subquery, write the subquery first.
- Then start writing your SELECT clause, followed by the join and any other clauses you will need.
- Don't be afraid to write your query in steps, running it as you go. For instance, you can run your subquery as a stand-alone query first to make sure it returns what you expect before writing the outer query.

We will be writing a query to find the countries where the urban center (city) population is more than half of the country's total population.

To help you out, the query you will write will include:

- A join to a subquery.
- A subquery to make a calculation.
- An aggregate function.
- A WHERE clause.
- A CAST expression.

Remember that there are multiple ways to write this query, and the list above is based on the approach we took in our solution.
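The CAST matters because dividing one integer by another in SQLite performs integer division. An urban_pop smaller than total_pop would therefore always give 0. Here is a minimal sketch with Python's `sqlite3` module; the table `t` and its values are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates in SQLite; making one side a float does not
int_div, cast_div, mult_div = conn.execute(
    "SELECT 3 / 4, CAST(3 AS FLOAT) / 4, 1.0 * 3 / 4"
).fetchone()
print(int_div, cast_div, mult_div)  # 0 0.75 0.75

# The same issue with hypothetical population columns
conn.executescript("""
    CREATE TABLE t (urban_pop INTEGER, total_pop INTEGER);
    INSERT INTO t VALUES (700, 1000);
""")
pct_int, pct_cast = conn.execute(
    "SELECT urban_pop / total_pop, CAST(urban_pop AS FLOAT) / total_pop FROM t"
).fetchone()
print(pct_int, pct_cast)  # 0 0.7
```

Multiplying by `1.0` is an equivalent trick, and it is the one used in the solutions below.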

**Instructions**

- Write a query that finds these countries. The query should include:

- The following columns, in order:
    - country, the name of the country.
    - urban_pop, the sum of the population in major urban areas belonging to that country.
    - total_pop, the total population of the country.
    - urban_pct, the percentage of the population within urban areas, calculated by dividing urban_pop by total_pop.
- Only countries that have an urban_pct greater than 0.5.
- Rows should be sorted by urban_pct in ascending order.




```mysql
SELECT
  f.name country,
  c.urban_pop,
  f.population total_pop,
  (1.0 * c.urban_pop) / f.population urban_pct
FROM
  facts f
  INNER JOIN (
    SELECT
      facts_id,
      SUM(population) urban_pop
    FROM
      cities
    GROUP BY
      facts_id
  ) c ON c.facts_id = f.id
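WHERE
  -- Standard SQL (and MySQL) don't allow SELECT aliases in WHERE, so repeat the expression
  (1.0 * c.urban_pop) / f.population > 0.5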
ORDER BY
  urban_pct ASC;
```

OR 

```mysql
SELECT
  f.name country,
  SUM(c.population) urban_pop,
  f.population total_pop,
  SUM(1.0 * c.population) / f.population urban_pct
FROM
  facts f
  INNER JOIN cities c ON f.id = c.facts_id
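GROUP BY
  -- Group by the key (plus the non-aggregated columns) so countries are never merged by name
  f.id, f.name, f.population
HAVING
  urban_pct > 0.5
ORDER BY
  urban_pct;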
```
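As a sketch, you can run both approaches side by side on invented data with Python's `sqlite3` module. In this version, the first query filters on the full expression rather than the urban_pct alias. The second groups by f.id, f.name and f.population. Both changes keep the queries valid outside SQLite. With the made-up rows below, Country B (20% urban) is filtered out.

```python
import sqlite3

# Hypothetical toy data: urban shares are 0.7 (A), 0.2 (B) and 0.625 (C)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, population INTEGER,
                         facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A', 1000), (2, 'Country B', 5000),
                             (3, 'Country C', 800);
    INSERT INTO cities VALUES (10, 'City A1', 400, 1), (11, 'City A2', 300, 1),
                              (12, 'City B1', 1000, 2), (13, 'City C1', 500, 3);
""")

# Approach 1: aggregate in a subquery, then join to it
sol1 = conn.execute("""
    SELECT f.name country, c.urban_pop, f.population total_pop,
           (1.0 * c.urban_pop) / f.population urban_pct
    FROM facts f
    INNER JOIN (
        SELECT facts_id, SUM(population) urban_pop
        FROM cities
        GROUP BY facts_id
    ) c ON c.facts_id = f.id
    WHERE (1.0 * c.urban_pop) / f.population > 0.5
    ORDER BY urban_pct ASC
""").fetchall()

# Approach 2: join first, then aggregate with GROUP BY / HAVING
sol2 = conn.execute("""
    SELECT f.name country, SUM(c.population) urban_pop, f.population total_pop,
           SUM(1.0 * c.population) / f.population urban_pct
    FROM facts f
    INNER JOIN cities c ON f.id = c.facts_id
    GROUP BY f.id, f.name, f.population
    HAVING urban_pct > 0.5
    ORDER BY urban_pct
""").fetchall()

print(sol1)  # [('Country C', 500, 800, 0.625), ('Country A', 700, 1000, 0.7)]
print(sol1 == sol2)  # True
```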
