# 1. Introducing Joins

In [1]:
%%capture

%load_ext sql

%sql sqlite:///factbook.db

In the SQL Fundamentals course, we worked exclusively with data that existed in a single table. In the real world, it's much more common for databases to have data in more than one table. To work with that data, we have to combine multiple tables within a query. In SQL, we do this using joins. As in the SQL Fundamentals course, we'll continue to use SQLite throughout this course.

In [6]:
%%sql

SELECT * FROM cities
LIMIT 10;

 * sqlite:///factbook.db
Done.


id,name,population,capital,facts_id
1,Oranjestad,37000,1,216
2,Saint John'S,27000,1,6
3,Abu Dhabi,942000,1,184
4,Dubai,1978000,0,184
5,Sharjah,983000,0,184
6,Kabul,3097000,1,1
7,Algiers,2916000,1,3
8,Oran,783000,0,3
9,Baku,2123000,1,11
10,Tirana,419000,1,2


* id - A unique ID for each city.
* name - The name of the city.
* population - The population of the city.
* capital - Whether the city is a capital city: 1 if it is, 0 if it isn't.
* facts_id - The ID of the country, from the facts table.

In [8]:
%%sql

SELECT *
    FROM facts
INNER JOIN cities ON cities.facts_id = facts.id
LIMIT 10;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,id_1,name_1,population_1,capital,facts_id
216,aa,Aruba,180,180,0,112162,1.33,12.56,8.18,8.92,1,Oranjestad,37000,1,216
6,ac,Antigua and Barbuda,442,442,0,92436,1.24,15.85,5.69,2.21,2,Saint John'S,27000,1,6
184,ae,United Arab Emirates,83600,83600,0,5779760,2.58,15.43,1.97,12.36,3,Abu Dhabi,942000,1,184
184,ae,United Arab Emirates,83600,83600,0,5779760,2.58,15.43,1.97,12.36,4,Dubai,1978000,0,184
184,ae,United Arab Emirates,83600,83600,0,5779760,2.58,15.43,1.97,12.36,5,Sharjah,983000,0,184
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51,6,Kabul,3097000,1,1
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92,7,Algiers,2916000,1,3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92,8,Oran,783000,0,3
11,aj,Azerbaijan,86600,82629,3971,9780780,0.96,16.64,7.07,0.0,9,Baku,2123000,1,11
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3,10,Tirana,419000,1,2


# 2. Understanding Inner Joins

We've now joined the two tables to give us extra information about each row in cities. Let's take a closer look at how this inner join works.

An inner join includes only the rows from each table that have a match, as specified by the ON clause. Let's look at a diagram of how our join from the previous screen works. We have included a selection of rows that best illustrate the join:


![](https://s3.amazonaws.com/dq-content/179/inner_join.svg)

Our inner join will include:

* Rows from the cities table that have a cities.facts_id that matches a facts.id from facts.

Our inner join will not include:

* Rows from the cities table that have a cities.facts_id that doesn't match any facts.id from facts.
* Rows from the facts table that have a facts.id that doesn't match any cities.facts_id from cities.

You can see this represented as a Venn diagram:


![](https://s3.amazonaws.com/dq-content/179/venn_inner.svg)
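The inclusion rules above can be checked on a small scale. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the two toy tables only mimic the shape of facts and cities, and every row, name, and id in them (including "Orphan City" and its `facts_id` of 99) is made up for illustration rather than taken from factbook.db.

```python
import sqlite3

# Toy in-memory tables that mimic the shape of facts and cities.
# All rows and ids are made up for illustration; this is not factbook.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES
        (1, 'City A1', 1),
        (2, 'City A2', 1),
        (3, 'City B1', 2),
        (4, 'Orphan City', 99);  -- no facts row has id 99
""")

# Only pairs where cities.facts_id matches a facts.id survive the join.
inner_rows = conn.execute("""
    SELECT f.name, c.name
        FROM facts AS f
    INNER JOIN cities AS c ON c.facts_id = f.id
    ORDER BY c.id;
""").fetchall()

for row in inner_rows:
    print(row)
```

In this toy data, 'Country C' has no city and 'Orphan City' has no country, so neither appears in `inner_rows`; 'Country A' appears twice because two cities point to it.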

In [22]:
%%sql

SELECT c.*, f.name AS country_name
    FROM facts AS f
INNER JOIN cities AS c ON c.facts_id = f.id
LIMIT 5;

 * sqlite:///factbook.db
Done.


id,name,population,capital,facts_id,country_name
1,Oranjestad,37000,1,216,Aruba
2,Saint John'S,27000,1,6,Antigua and Barbuda
3,Abu Dhabi,942000,1,184,United Arab Emirates
4,Dubai,1978000,0,184,United Arab Emirates
5,Sharjah,983000,0,184,United Arab Emirates


# 3. Practicing Inner Joins

In [24]:
%%sql

SELECT f.name AS country, c.name AS capital_city
    FROM facts AS f
INNER JOIN cities AS c ON c.facts_id = f.id
WHERE c.capital = 1
LIMIT 5;

 * sqlite:///factbook.db
Done.


country,capital_city
Aruba,Oranjestad
Antigua and Barbuda,Saint John'S
United Arab Emirates,Abu Dhabi
Afghanistan,Kabul
Algeria,Algiers


# 4. Left Joins

An inner join leaves out any country in facts that has no matching row in cities. Let's look at how we can create a query to explore this missing data using a new type of join: the left join.

A left join includes all the rows that an inner join will select, plus any rows from the first (or left) table that don't have a match in the second table. For those unmatched rows, the columns that come from the second table are filled with NULL. We can see this represented as a Venn diagram.

![](https://s3.amazonaws.com/dq-content/179/venn_left.svg)
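As a rough illustration of that behavior, here is a minimal sketch with Python's built-in `sqlite3` module. The in-memory tables and all of their rows are hypothetical stand-ins for facts and cities, not the real factbook.db data.

```python
import sqlite3

# Toy in-memory tables; all rows and ids are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES
        (1, 'City A1', 1),
        (2, 'City A2', 1),
        (3, 'City B1', 2);
""")

# facts is the left table, so every facts row is kept.
left_rows = conn.execute("""
    SELECT f.name, c.name
        FROM facts AS f
    LEFT JOIN cities AS c ON c.facts_id = f.id
    ORDER BY f.id, c.id;
""").fetchall()

# SQL NULL arrives in Python as None, so unmatched countries are easy to spot.
unmatched = [country for country, city in left_rows if city is None]

for row in left_rows:
    print(row)
print("Countries without a city:", unmatched)
```

Here 'Country C' is kept with `None` in the city column, which is why filtering on `IS NULL` for a column from the second table isolates the rows an inner join would have dropped.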

In [28]:
%%sql

SELECT f.name AS country, f.population
    FROM facts AS f
LEFT JOIN cities AS c ON c.facts_id = f.id
WHERE c.name IS NULL
LIMIT 10;

 * sqlite:///factbook.db
Done.


country,population
Kosovo,1870981.0
Monaco,30535.0
Nauru,9540.0
San Marino,33020.0
Singapore,5674472.0
Holy See (Vatican City),842.0
Taiwan,23415126.0
European Union,513949445.0
Ashmore and Cartier Islands,
Christmas Island,1530.0


# 5. Right Joins and Outer Joins

Looking through the results of the query we wrote in the previous screen, we can see a number of different reasons that countries don't have corresponding values in cities:

* Countries with small populations and/or no major urban areas (which are defined as having populations of over 750,000), e.g. San Marino, Kosovo, and Nauru.
* City-states, such as Monaco and Singapore.
* Territories that are not themselves countries, such as Hong Kong, Gibraltar, and the Cook Islands.
* Regions and oceans that aren't countries, such as the European Union and the Pacific Ocean.
* Genuine cases of missing data, such as Taiwan.

Whenever you use inner joins, be mindful that you might be excluding important data, especially if you are joining on columns that aren't linked in the database schema.

There are two less-common join types that you should be aware of, both of which SQLite did not support before version 3.39.0. The first is a right join. A right join, as the name indicates, is the mirror image of a left join. While the left join includes all rows in the table before the JOIN clause, the right join includes all rows in the table named in the JOIN clause. We can see a right join in the Venn diagram below:

![](https://s3.amazonaws.com/dq-content/179/venn_right.svg)

The main use for a right join is when you are joining more than two tables. In these cases, a right join can be preferable because it avoids restructuring your whole query just to join one more table. Outside of this, right joins are fairly rare, so for simple joins it's better to use a left join: your query will be easier for others to read and understand.
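For a simple two-table join, a right join can always be rewritten as a left join by swapping the order of the tables. The sketch below, using Python's built-in `sqlite3` module and hypothetical toy tables (all rows and ids are made up), keeps every row of cities by putting cities on the left. This is the result `facts RIGHT JOIN cities` would describe, and the rewrite runs whether or not the SQLite build in use accepts `RIGHT JOIN`.

```python
import sqlite3

# Toy in-memory tables; all rows and ids are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES
        (1, 'City A1', 1),
        (2, 'City A2', 1),
        (3, 'City B1', 2),
        (4, 'Orphan City', 99);  -- no facts row has id 99
""")

# "facts RIGHT JOIN cities" keeps all cities rows. Swapping the tables
# expresses the same thing as a left join with cities on the left.
all_cities = conn.execute("""
    SELECT c.name, f.name
        FROM cities AS c
    LEFT JOIN facts AS f ON f.id = c.facts_id
    ORDER BY c.id;
""").fetchall()

for row in all_cities:
    print(row)
```

In this toy data 'Orphan City' is kept with `None` for its country, while 'Country C', which has no city, does not appear at all.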

The other join type that SQLite did not support before version 3.39.0 is a full outer join. A full outer join will include all rows from the tables on both sides of the join. We can see a full outer join in the Venn diagram below:

![](https://s3.amazonaws.com/dq-content/179/venn_full.svg)

Like right joins, full outer joins are reasonably uncommon, and similar results can be achieved using a UNION clause (which we will cover in the next lesson). The standard SQL syntax for a full outer join is:

    SELECT f.name AS country, c.name AS city
        FROM cities AS c
    FULL OUTER JOIN facts AS f ON f.id = c.facts_id
    LIMIT 5;
    
![](https://s3.amazonaws.com/dq-content/179/join_venn_diagram.svg)
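One common way to get a similar result with a union is to combine two left joins: one that keeps every row of facts, and one that adds only the cities rows with no matching country. The following is a minimal sketch of that idea, not necessarily the approach the next lesson takes. It uses Python's built-in `sqlite3` module with hypothetical toy tables whose rows and ids are made up.

```python
import sqlite3

# Toy in-memory tables; all rows and ids are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, facts_id INTEGER);
    INSERT INTO facts VALUES (1, 'Country A'), (2, 'Country B'), (3, 'Country C');
    INSERT INTO cities VALUES
        (1, 'City A1', 1),
        (2, 'City A2', 1),
        (3, 'City B1', 2),
        (4, 'Orphan City', 99);  -- no facts row has id 99
""")

full_rows = conn.execute("""
    -- Part 1: every facts row, paired with its cities where they exist.
    SELECT f.name AS country, c.name AS city
        FROM facts AS f
    LEFT JOIN cities AS c ON c.facts_id = f.id
    UNION ALL
    -- Part 2: only the cities rows that part 1 could not reach.
    SELECT f.name AS country, c.name AS city
        FROM cities AS c
    LEFT JOIN facts AS f ON f.id = c.facts_id
    WHERE f.id IS NULL;
""").fetchall()

for row in full_rows:
    print(row)
```

The `WHERE f.id IS NULL` filter in the second part keeps matched pairs from being listed twice, so `UNION ALL` is enough here. The combined result holds the three matched pairs plus 'Country C' with no city and 'Orphan City' with no country.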

# 6. Finding the Most Populous Capital Cities

In [None]:
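SELECT c.name AS capital_city, f.name AS country, c.population
    FROM facts AS f
INNER JOIN cities AS c ON c.facts_id = f.id
WHERE c.capital = 1
ORDER BY c.population DESC
LIMIT 10;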