In this chapter, you'll be introduced to the concept of joining tables and explore the different ways you can enrich your queries using inner joins and self-joins. You'll also see how to use the CASE statement to split a field into different categories.

# Inner join

Throughout this course, you'll be working with the countries database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This countries database also contains information on languages spoken in each country.

```python
import numpy as np
import pandas as pd
```

```python
cities = pd.read_csv('cities.csv')
cities.head(10)
```

| Unnamed: 0 | name | country_code | city_proper_pop | metroarea_pop | urbanarea_pop |
|---|---|---|---|---|---|
| 0 | Abidjan | CIV | 4765000 | NaN | 4765000 |
| 1 | Abu Dhabi | ARE | 1145000 | NaN | 1145000 |
| 2 | Abuja | NGA | 1235880 | 6000000.0 | 1235880 |
| 3 | Accra | GHA | 2070463 | 4010054.0 | 2070463 |
| 4 | Addis Ababa | ETH | 3103673 | 4567857.0 | 3103673 |
| 5 | Ahmedabad | IND | 5570585 | NaN | 5570585 |
| 6 | Alexandria | EGY | 4616625 | NaN | 4616625 |
| 7 | Algiers | DZA | 3415811 | 5000000.0 | 3415811 |
| 8 | Almaty | KAZ | 1703481 | NaN | 1703481 |
| 9 | Ankara | TUR | 5271000 | 4585000.0 | 5271000 |


```python
countries = pd.read_csv('countries.csv')
countries.head(10)
```

| Unnamed: 0 | code | country_name | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Asia | Southern and Central Asia | 652090.0 | 1919.0 | Afganistan/Afqanestan | Islamic Emirate | Kabul | 69.1761 | 34.5228 |
| 1 | NLD | Netherlands | Europe | Western Europe | 41526.0 | 1581.0 | Nederland | Constitutional Monarchy | Amsterdam | 4.89095 | 52.3738 |
| 2 | ALB | Albania | Europe | Southern Europe | 28748.0 | 1912.0 | Shqiperia | Republic | Tirane | 19.8172 | 41.3317 |
| 3 | DZA | Algeria | Africa | Northern Africa | 2381740.0 | 1962.0 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.7397 |
| 4 | ASM | American Samoa | Oceania | Polynesia | 199.0 | NaN | Amerika Samoa | US Territory | Pago Pago | -170.691 | -14.2846 |
| 5 | AND | Andorra | Europe | Southern Europe | 468.0 | 1278.0 | Andorra | Parliamentary Coprincipality | Andorra la Vella | 1.5218 | 42.5075 |
| 6 | AGO | Angola | Africa | Central Africa | 1246700.0 | 1975.0 | Angola | Republic | Luanda | 13.242 | -8.81155 |
| 7 | ATG | Antigua and Barbuda | North America | Caribbean | 442.0 | 1981.0 | Antigua and Barbuda | Constitutional Monarchy | Saint John's | -61.8456 | 17.1175 |
| 8 | ARE | United Arab Emirates | Asia | Middle East | 83600.0 | 1971.0 | Al-Imarat al-´Arabiya al-Muttahida | Emirate Federation | Abu Dhabi | 54.3705 | 24.4764 |
| 9 | ARG | Argentina | South America | South America | 2780400.0 | 1816.0 | Argentina | Federal Republic | Buenos Aires | -58.4173 | -34.6118 |


```python
currencies = pd.read_csv('currencies.csv')
currencies.head()
```

| Unnamed: 0 | curr_id | code | basic_unit | curr_code | frac_unit | frac_perbasic |
|---|---|---|---|---|---|---|
| 0 | 1 | AFG | Afghan afghani | AFN | Pul | 100.0 |
| 1 | 2 | ALB | Albanian lek | ALL | Qindarke | 100.0 |
| 2 | 3 | DZA | Algerian dinar | DZD | Santeem | 100.0 |
| 3 | 4 | AND | Euro | EUR | Cent | 100.0 |
| 4 | 5 | AGO | Angolan kwanza | AOA | Centimo | 100.0 |


```python
economies = pd.read_csv('economies.csv')
economies.head()
```

| Unnamed: 0 | econ_id | code | year | income_group | gdp_percapita | gross_savings | inflation_rate | total_investment | unemployment_rate | exports | imports |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | AFG | 2010 | Low income | 539.667 | 37.133 | 2.179 | 30.402 | NaN | 46.394 | 24.381 |
| 1 | 2 | AFG | 2015 | Low income | 615.091 | 21.466 | -1.549 | 18.602 | NaN | -49.11 | -7.294 |
| 2 | 3 | AGO | 2010 | Upper middle income | 3599.27 | 23.534 | 14.48 | 14.433 | NaN | -3.266 | -21.076 |
| 3 | 4 | AGO | 2015 | Upper middle income | 3876.2 | -0.425 | 10.287 | 9.552 | NaN | 6.721 | -21.778 |
| 4 | 5 | ALB | 2010 | Upper middle income | 4098.13 | 20.011 | 3.605 | 31.305 | 14.0 | 10.645 | -8.013 |


```python
languages = pd.read_csv('languages.csv')
languages.head()
```

| Unnamed: 0 | lang_id | code | name | percent | official |
|---|---|---|---|---|---|
| 0 | 1 | AFG | Dari | 50.0 | True |
| 1 | 2 | AFG | Pashto | 35.0 | True |
| 2 | 3 | AFG | Turkic | 11.0 | False |
| 3 | 4 | AFG | Other | 4.0 | False |
| 4 | 5 | ALB | Albanian | 98.8 | True |


```python
population = pd.read_csv('populations.csv')
population.head()
```

| Unnamed: 0 | pop_id | country_code | year | fertility_rate | life_expectancy | size |
|---|---|---|---|---|---|---|
| 0 | 20 | ABW | 2010 | 1.704 | 74.953537 | 101597.0 |
| 1 | 19 | ABW | 2015 | 1.647 | 75.573585 | 103889.0 |
| 2 | 2 | AFG | 2010 | 5.746 | 58.970829 | 27962207.0 |
| 3 | 1 | AFG | 2015 | 4.653 | 60.717171 | 32526562.0 |
| 4 | 12 | AGO | 2010 | 6.416 | 50.654171 | 21219954.0 |


## Answer:
```SQL
-- 1. Select name fields (with alias) and region 
SELECT cities.name AS city, countries.name AS country, region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;
```
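
To try the query above without the course database, here is a minimal sketch that runs it with Python's built-in `sqlite3` module against a tiny in-memory sample. The table layout is an assumption: it keeps only the columns the query touches and names the country column `name`, as the SQL does (the CSV preview shows `country_name`). Ankara is included without a matching country row on purpose, to show that an inner join keeps only rows that match in both tables.

```python
import sqlite3

# In-memory database holding a hand-picked subset of the rows shown earlier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT, country_code TEXT);
    CREATE TABLE countries (code TEXT, name TEXT, region TEXT);
    INSERT INTO cities VALUES
        ('Abu Dhabi', 'ARE'), ('Algiers', 'DZA'), ('Ankara', 'TUR');
    -- No 'TUR' row here, so Ankara has nothing to match against.
    INSERT INTO countries VALUES
        ('ARE', 'United Arab Emirates', 'Middle East'),
        ('DZA', 'Algeria', 'Northern Africa');
""")

inner_rows = conn.execute("""
    SELECT cities.name AS city, countries.name AS country, region
    FROM cities
    INNER JOIN countries
    ON cities.country_code = countries.code
    ORDER BY city;
""").fetchall()
conn.close()

for row in inner_rows:
    print(row)
```

Only Abu Dhabi and Algiers come back; Ankara is dropped because the sample `countries` table has no `TUR` row.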

## Inner join (2)
Instead of writing the full table name, you can use table aliasing as a shortcut. As with fields, you add a table alias with AS, placed immediately after the table name and separated from it by a space. Check out the aliasing of cities and countries below.  
```SQL
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
```  
Notice that to select a field that appears in more than one of the joined tables, you need to say which table (or table alias) you mean by prefixing the field with it and a dot in your SELECT statement, as in c1.name.
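
The need to qualify shared field names can be checked directly. The sketch below uses Python's `sqlite3` module with a two-row sample (the reduced table layout is an assumption): selecting the bare `name` fails because both tables have that column, while the qualified version works. The exact error text is SQLite's and will differ in other database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (name TEXT, country_code TEXT);
    CREATE TABLE countries (code TEXT, name TEXT);
    INSERT INTO cities VALUES ('Algiers', 'DZA');
    INSERT INTO countries VALUES ('DZA', 'Algeria');
""")

join_clause = """
    FROM cities AS c1
    INNER JOIN countries AS c2
    ON c1.country_code = c2.code
"""

# Unqualified: `name` exists in both tables, so the engine cannot pick one.
try:
    conn.execute("SELECT name " + join_clause)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualified with the table aliases: unambiguous.
qualified_rows = conn.execute(
    "SELECT c1.name AS city, c2.name AS country " + join_clause
).fetchall()
conn.close()

print(ambiguous_error)
print(qualified_rows)
```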

You'll now explore a way to get data from both the countries and economies tables to examine the inflation rate for both 2010 and 2015.

Sometimes it's easier to write SQL code out of order: you write the SELECT statement after you've done the JOIN.

__Instructions__
- Join the tables countries (left) and economies (right) aliasing countries AS c and economies AS e.
- Specify the field to match the tables ON.
- From this join, SELECT:
    - c.code, aliased as country_code.
    - name, year, and inflation_rate, not aliased.

## Answer:
```SQL
-- 3. Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
-- 1. Join to economies (alias e)
INNER JOIN economies AS e
-- 2. Match on code
ON c.code = e.code;
```

## Inner join (3)
The ability to combine multiple joins in a single query is a powerful feature of SQL, for example:
```SQL
SELECT *
FROM left_table
  INNER JOIN right_table
    ON left_table.id = right_table.id
  INNER JOIN another_table
    ON left_table.id = another_table.id;
```
As you can see here, it becomes tedious to keep writing long table names in joins. This is when it becomes useful to alias each table using the first letter of its name (e.g. countries AS c). Aliasing this way is standard practice, and you should follow it whenever you choose to alias tables or an exercise in this course asks you to.

Now, for each country, you want to get the country name, its region, and the fertility rate and unemployment rate for both 2010 and 2015.

## Answer:

```SQL
-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to populations (as p)
INNER JOIN populations AS p
-- 3. Match on country code
ON c.code = p.country_code
-- 4. Join to economies (as e)
INNER JOIN economies AS e
-- 5. Match on country code and year
ON c.code = e.code 
AND e.year = p.year;
```
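
Step 5 matches on the year as well as the country code, and it is worth seeing why. In this hedged sketch (Python's `sqlite3`, a single-country sample with only the columns needed, both assumptions), joining on the code alone pairs every populations row with every economies row for that country, so 2010 data gets mixed with 2015 data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT, region TEXT);
    CREATE TABLE populations (country_code TEXT, year INTEGER, fertility_rate REAL);
    CREATE TABLE economies (code TEXT, year INTEGER, unemployment_rate REAL);
    INSERT INTO countries VALUES ('AFG', 'Afghanistan', 'Southern and Central Asia');
    INSERT INTO populations VALUES ('AFG', 2010, 5.746), ('AFG', 2015, 4.653);
    INSERT INTO economies VALUES ('AFG', 2010, NULL), ('AFG', 2015, NULL);
""")

base_query = """
    SELECT c.code, e.year, fertility_rate, unemployment_rate
    FROM countries AS c
    INNER JOIN populations AS p
    ON c.code = p.country_code
    INNER JOIN economies AS e
    ON c.code = e.code
"""

# Code only: 2 population rows x 2 economy rows = 4 rows for one country.
code_only_rows = conn.execute(base_query).fetchall()

# Code and year: one row per country and year.
code_and_year_rows = conn.execute(
    base_query + " AND e.year = p.year ORDER BY e.year"
).fetchall()
conn.close()

print(len(code_only_rows), len(code_and_year_rows))
```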

## Inner join with USING
When joining tables with a common field name, e.g.
```SQL
SELECT *
FROM countries
INNER JOIN economies
ON countries.code = economies.code;
```

__You can use USING as a shortcut:__
```SQL
SELECT *
FROM countries
INNER JOIN economies
USING(code);
```
You'll now explore how this can be done with the countries and languages tables.
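
Besides being shorter, USING changes what `SELECT *` returns. The sketch below (Python's `sqlite3`, with a reduced sample schema as an assumption) compares the column lists of the two forms: with ON, the shared `code` column appears once per table; with USING, it appears once. This is SQLite's behaviour, and PostgreSQL documents the same suppression of the duplicate column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT);
    CREATE TABLE economies (code TEXT, year INTEGER, inflation_rate REAL);
    INSERT INTO countries VALUES ('AFG', 'Afghanistan');
    INSERT INTO economies VALUES ('AFG', 2010, 2.179), ('AFG', 2015, -1.549);
""")

def column_names(query):
    """Return the result column names of a query, in order."""
    cursor = conn.execute(query)
    return [col[0] for col in cursor.description]

on_columns = column_names("""
    SELECT * FROM countries
    INNER JOIN economies
    ON countries.code = economies.code
""")
using_columns = column_names("""
    SELECT * FROM countries
    INNER JOIN economies
    USING(code)
""")
conn.close()

print(on_columns)
print(using_columns)
```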

__Instructions__
- Inner join countries on the left and languages on the right with USING(code).
- Select the fields corresponding to:
    - country name AS country,
    - continent name,
    - language name AS language, and
    - whether or not the language is official.
- Remember to alias your tables using the first letter of their names.

## Answer:
```SQL
-- 4. Select fields
SELECT c.name AS country, continent, l.name AS language, l.official
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to languages (as l)
INNER JOIN languages AS l
-- 3. Match using code
USING(code);
```

## Self-join
In this exercise, you'll use the populations table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!

Since you'll be joining the populations table to itself, you can alias populations as p1 and also populations as p2. This is good practice whenever you are aliasing and your tables have the same first letter. Note that aliasing is required in a self-join: without it, the two copies of the table could not be told apart.

## Answer:
```SQL
SELECT p1.country_code,
   p1.size AS size2010, 
   p2.size AS size2015,
   -- 1. calculate growth_perc
   ((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- 2. From populations (alias as p1)
FROM populations AS p1
-- 3. Join to itself (alias as p2)
INNER JOIN populations AS p2
-- 4. Match on country code
ON p1.country_code = p2.country_code
-- 5. and year (with calculation)
AND p1.year = p2.year - 5;
```
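
As a rough check of the self-join above, this sketch runs it with Python's `sqlite3` on the four populations rows shown in the preview (the three-column table is an assumption). The condition `p1.year = p2.year - 5` pairs each 2010 row (p1) with the same country's 2015 row (p2), so each country yields exactly one result row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- size is REAL, matching the float values in the preview.
    CREATE TABLE populations (country_code TEXT, year INTEGER, size REAL);
    INSERT INTO populations VALUES
        ('ABW', 2010, 101597.0), ('ABW', 2015, 103889.0),
        ('AFG', 2010, 27962207.0), ('AFG', 2015, 32526562.0);
""")

growth_rows = conn.execute("""
    SELECT p1.country_code,
           p1.size AS size2010,
           p2.size AS size2015,
           ((p2.size - p1.size) / p1.size * 100.0) AS growth_perc
    FROM populations AS p1
    INNER JOIN populations AS p2
    ON p1.country_code = p2.country_code
    AND p1.year = p2.year - 5
    ORDER BY p1.country_code;
""").fetchall()
conn.close()

for code, size2010, size2015, growth in growth_rows:
    print(code, size2010, size2015, round(growth, 2))
```

One caveat on types: if `size` were stored as an integer, `(p2.size - p1.size) / p1.size` would be evaluated with integer division in SQLite and PostgreSQL and truncate to 0 before the multiplication by 100.0. Multiplying by 100.0 first, or casting, avoids that.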

## CASE WHEN and THEN
Often it's useful to look at a numerical field not as raw data, but instead as being in different categories or groups.

You can use CASE with WHEN, THEN, ELSE, and END to define a new grouping field.

## Answer:
```SQL
SELECT name, continent, code, surface_area,
-- 1. First case
CASE WHEN surface_area > 2000000 THEN 'large'
    -- 2. Second case
    WHEN surface_area > 350000 THEN 'medium'
    -- 3. Else clause + end
    ELSE 'small' END
    -- 4. Alias name
    AS geosize_group
-- 5. From table
FROM countries;
```
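
The order of the WHEN branches matters, because CASE returns the result of the first condition that is true. The following sketch (Python's `sqlite3`, three countries from the preview, reduced schema as an assumption) runs the grouping as written and then with the two WHEN branches swapped, in which case no row can ever reach the 'large' branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (name TEXT, surface_area REAL);
    INSERT INTO countries VALUES
        ('Algeria', 2381740.0), ('Afghanistan', 652090.0), ('Andorra', 468.0);
""")

# Branches ordered from the largest threshold down, as in the answer above.
geosize_rows = conn.execute("""
    SELECT name,
        CASE WHEN surface_area > 2000000 THEN 'large'
             WHEN surface_area > 350000 THEN 'medium'
             ELSE 'small' END AS geosize_group
    FROM countries
    ORDER BY surface_area DESC;
""").fetchall()

# Swapped branches: anything above 2000000 is also above 350000,
# so the first branch catches it and 'large' is never returned.
swapped_rows = conn.execute("""
    SELECT name,
        CASE WHEN surface_area > 350000 THEN 'medium'
             WHEN surface_area > 2000000 THEN 'large'
             ELSE 'small' END AS geosize_group
    FROM countries
    ORDER BY surface_area DESC;
""").fetchall()
conn.close()

print(geosize_rows)
print(swapped_rows)
```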

---
## Inner challenge
The table you created with the added geosize_group field has been loaded for you here with the name countries_plus. Observe the use of (and the placement of) the INTO command to create this countries_plus table:
```SQL
SELECT name, continent, code, surface_area,
    CASE WHEN surface_area > 2000000
            THEN 'large'
       WHEN surface_area > 350000
            THEN 'medium'
       ELSE 'small' END
       AS geosize_group
INTO countries_plus
FROM countries;
```  
You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with CASE.

By the end of this exercise, you'll be writing two queries back-to-back in a single script. You got this!

## Answer:
```SQL
SELECT country_code, size,
  CASE WHEN size > 50000000
            THEN 'large'
       WHEN size > 1000000
            THEN 'medium'
       ELSE 'small' END
       AS popsize_group
INTO pop_plus       
FROM populations
WHERE year = 2015;

-- 5. Select fields
SELECT name, continent, geosize_group, popsize_group
-- 1. From countries_plus (alias as c)
FROM countries_plus AS c
  -- 2. Join to pop_plus (alias as p)
  INNER JOIN pop_plus AS p
    -- 3. Match on country code
    ON c.code = p.country_code
-- 4. Order the table    
ORDER BY geosize_group;
```
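
`SELECT ... INTO new_table` is not available in every database engine. SQLite, for one, does not support it, so the sketch below uses `CREATE TABLE ... AS SELECT` as a stand-in to build `pop_plus` and then queries it. It runs under Python's `sqlite3` with four rows from the populations preview; the reduced schema and the choice of SQLite are assumptions for illustration, not part of the course setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, size REAL);
    INSERT INTO populations VALUES
        ('ABW', 2010, 101597.0), ('ABW', 2015, 103889.0),
        ('AFG', 2010, 27962207.0), ('AFG', 2015, 32526562.0);

    -- Stand-in for "SELECT ... INTO pop_plus".
    CREATE TABLE pop_plus AS
    SELECT country_code, size,
        CASE WHEN size > 50000000 THEN 'large'
             WHEN size > 1000000 THEN 'medium'
             ELSE 'small' END AS popsize_group
    FROM populations
    WHERE year = 2015;
""")

# The new table can be queried (or joined) like any other table.
pop_plus_rows = conn.execute("""
    SELECT country_code, popsize_group
    FROM pop_plus
    ORDER BY country_code;
""").fetchall()
conn.close()

print(pop_plus_rows)
```

PostgreSQL accepts both forms, so the same `CREATE TABLE ... AS` statement should also work there.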