<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Setup" data-toc-modified-id="Setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Setup</a></span></li></ul></div>

# Chapter 1
Jupyter is awesome. You can run SQL straight from notebook cells and iterate on your statements.
## Setup
Install the SQL extension that provides the `%sql` and `%%sql` magic commands:

`conda install -c conda-forge ipython-sql`

Load the SQL magic commands:

In [3]:
%load_ext sql

Download and extract the data files with the cell below:

In [14]:
import zipfile
from urllib.request import urlretrieve

# Download the data archive into the working directory
file = 'countries2.zip'
url_root = 'https://assets.datacamp.com/production/repositories/1069/datasets/578834f5908e3b2fa575429a287586d1eaeb2e54/'
url = url_root + file
urlretrieve(url, file)

# Extract the archive contents next to the notebook
with zipfile.ZipFile(file, "r") as zip_ref:
    zip_ref.extractall("./")

The table creation and data import statements are in a separate SQL file. Run that file to create the database on disk:

`sqlite3 leaders.db < ch1.sql`

Then run a few queries to check that it works. `%sql` runs a single-line statement; `%%sql` runs a multi-line cell. The beauty of the extension is that it connects through SQLAlchemy and renders results with prettytable, so the output looks good and the results are also available to the surrounding Python code.

Updated for PostgreSQL: start the Vagrant machine that runs the database, then build the schema and import the data:
```
cd pg-app-dev-vm && vagrant up
PGUSER=datacamp PGPASSWORD=datacamp psql -h localhost -p 15432 datacamp < ch1-psql.sql
```

In [5]:
#%sql sqlite:///leaders.db
%sql postgres://datacamp:datacamp@localhost:15432/datacamp

'Connected: datacamp@datacamp'

In [6]:
result = %sql select * from cities limit 5

 * postgres://datacamp:***@localhost:15432/datacamp
5 rows affected.


In [7]:
print(result)

+-------------+--------------+-----------------+---------------+---------------+
|     name    | country_code | city_proper_pop | metroarea_pop | urbanarea_pop |
+-------------+--------------+-----------------+---------------+---------------+
|   Abidjan   |     CIV      |     4765000     |      None     |    4765000    |
|  Abu Dhabi  |     ARE      |     1145000     |      None     |    1145000    |
|    Abuja    |     NGA      |     1235880     |    6000000    |    1235880    |
|    Accra    |     GHA      |     2070463     |    4010054    |    2070463    |
| Addis Ababa |     ETH      |     3103673     |    4567857    |    3103673    |
+-------------+--------------+-----------------+---------------+---------------+
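
As a rough illustration of what having the results "available to include in code" can look like, here is a minimal sketch that uses Python's built-in `sqlite3` module instead of the `%sql` magic. The in-memory table holds two rows copied from the output above as stand-in data, and the helper name `rows_as_dicts` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the cities table (two rows copied from the output above)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (name TEXT, country_code TEXT, "
    "city_proper_pop INTEGER, metroarea_pop INTEGER, urbanarea_pop INTEGER)"
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?, ?)",
    [
        ("Abidjan", "CIV", 4765000, None, 4765000),
        ("Abuja", "NGA", 1235880, 6000000, 1235880),
    ],
)


def rows_as_dicts(cursor):
    """Pair each row with the column names reported by the cursor."""
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


cities = rows_as_dicts(conn.execute("SELECT * FROM cities ORDER BY name"))

# The query result is now ordinary Python data that later cells can reuse
missing_metro = [c["name"] for c in cities if c["metroarea_pop"] is None]
total_proper = sum(c["city_proper_pop"] for c in cities)
print(missing_metro, total_proper)
```

SQL `NULL` values arrive as Python `None`, which matches the `None` entries in the printed table.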


In [8]:
%%sql SELECT * 
FROM cities
  -- 1. Inner join to countries
  INNER JOIN countries
    -- 2. Match on the country codes
    ON cities.country_code = countries.code;

 * postgres://datacamp:***@localhost:15432/datacamp
230 rows affected.


name,country_code,city_proper_pop,metroarea_pop,urbanarea_pop,code,name_1,continent,region,surface_area,indep_year,local_name,gov_form,capital,cap_long,cap_lat
Abidjan,CIV,4765000,,4765000,CIV,Cote d'Ivoire,Africa,Western Africa,322463.0,1960,Cote dIvoire,Republic,Yamoussoukro,-4.0305,5.332
Abu Dhabi,ARE,1145000,,1145000,ARE,United Arab Emirates,Asia,Middle East,83600.0,1971,Al-Imarat al-´Arabiya al-Muttahida,Emirate Federation,Abu Dhabi,54.3705,24.4764
Abuja,NGA,1235880,6000000.0,1235880,NGA,Nigeria,Africa,Western Africa,923768.0,1960,Nigeria,Federal Republic,Abuja,7.48906,9.05804
Accra,GHA,2070463,4010054.0,2070463,GHA,Ghana,Africa,Western Africa,238533.0,1957,Ghana,Republic,Accra,-0.20795,5.57045
Addis Ababa,ETH,3103673,4567857.0,3103673,ETH,Ethiopia,Africa,Eastern Africa,1104300.0,-1000,YeItyop´iya,Republic,Addis Ababa,38.7468,9.02274
Ahmedabad,IND,5570585,,5570585,IND,India,Asia,Southern and Central Asia,3287260.0,1947,Bharat/India,Federal Republic,New Delhi,77.225,28.6353
Alexandria,EGY,4616625,,4616625,EGY,Egypt,Africa,Northern Africa,1001450.0,1922,Misr,Republic,Cairo,31.2461,30.0982
Algiers,DZA,3415811,5000000.0,3415811,DZA,Algeria,Africa,Northern Africa,2381740.0,1962,Al-Jazair/Algerie,Republic,Algiers,3.05097,36.7397
Almaty,KAZ,1703481,,1703481,KAZ,Kazakhstan,Asia,Southern and Central Asia,2724900.0,1991,Qazaqstan,Republic,Astana,71.4382,51.1879
Ankara,TUR,5271000,4585000.0,5271000,TUR,Turkey,Asia,Middle East,774815.0,1923,Turkiye,Republic,Ankara,32.3606,39.7153


In [9]:
%%sql -- 1. Select the city name (with alias)
SELECT cities.name as city
FROM cities
  INNER JOIN countries
    ON cities.country_code = countries.code;

 * postgres://datacamp:***@localhost:15432/datacamp
230 rows affected.


city
Abidjan
Abu Dhabi
Abuja
Accra
Addis Ababa
Ahmedabad
Alexandria
Algiers
Almaty
Ankara


In [10]:
%%sql -- 1. Select name fields (with alias) and region 
SELECT
    cities.name as city,
    countries.name as country,
    countries.region as region
FROM cities
  INNER JOIN countries
    ON cities.country_code = countries.code;

 * postgres://datacamp:***@localhost:15432/datacamp
230 rows affected.


city,country,region
Abidjan,Cote d'Ivoire,Western Africa
Abu Dhabi,United Arab Emirates,Middle East
Abuja,Nigeria,Western Africa
Accra,Ghana,Western Africa
Addis Ababa,Ethiopia,Eastern Africa
Ahmedabad,India,Southern and Central Asia
Alexandria,Egypt,Northern Africa
Algiers,Algeria,Northern Africa
Almaty,Kazakhstan,Southern and Central Asia
Ankara,Turkey,Middle East


In [11]:
%%sql -- 3. Select fields with aliases
SELECT c.code AS country_code, c.name, e.year, e.inflation_rate
FROM countries AS c
  -- 1. Join to economies (alias e)
  INNER JOIN economies AS e
    -- 2. Match on code
    ON c.code = e.code;

 * postgres://datacamp:***@localhost:15432/datacamp
370 rows affected.


country_code,name,year,inflation_rate
AFG,Afghanistan,2010,2.179
AFG,Afghanistan,2015,-1.549
AGO,Angola,2010,14.48
AGO,Angola,2015,10.287
ALB,Albania,2010,3.605
ALB,Albania,2015,1.896
ARE,United Arab Emirates,2010,0.878
ARE,United Arab Emirates,2015,4.07
ARG,Argentina,2010,10.461
ARG,Argentina,2015,


In [12]:
%%sql -- 4. Select fields
SELECT c.code, c.name, c.region, p.year, p.fertility_rate
-- 1. From countries (alias as c)
FROM countries AS c
  -- 2. Join with populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code;

 * postgres://datacamp:***@localhost:15432/datacamp
412 rows affected.


code,name,region,year,fertility_rate
ABW,Aruba,Caribbean,2010,1.704
ABW,Aruba,Caribbean,2015,1.647
AFG,Afghanistan,Southern and Central Asia,2010,5.746
AFG,Afghanistan,Southern and Central Asia,2015,4.653
AGO,Angola,Central Africa,2010,6.416
AGO,Angola,Central Africa,2015,5.996
ALB,Albania,Southern Europe,2010,1.663
ALB,Albania,Southern Europe,2015,1.793
AND,Andorra,Southern Europe,2010,1.27
AND,Andorra,Southern Europe,2015,


In [13]:
%%sql -- 4. Select fields
SELECT c.code, c.name, c.region, e.year, p.fertility_rate, e.unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
  -- 2. Join with populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code
  -- 4. Join to economies (as e)
  INNER JOIN economies AS e
    -- 5. Match on country code
    ON c.code = e.code;

 * postgres://datacamp:***@localhost:15432/datacamp
740 rows affected.


code,name,region,year,fertility_rate,unemployment_rate
AFG,Afghanistan,Southern and Central Asia,2010,4.653,
AFG,Afghanistan,Southern and Central Asia,2010,5.746,
AFG,Afghanistan,Southern and Central Asia,2015,4.653,
AFG,Afghanistan,Southern and Central Asia,2015,5.746,
AGO,Angola,Central Africa,2010,5.996,
AGO,Angola,Central Africa,2010,6.416,
AGO,Angola,Central Africa,2015,5.996,
AGO,Angola,Central Africa,2015,6.416,
ALB,Albania,Southern Europe,2010,1.793,14.0
ALB,Albania,Southern Europe,2010,1.663,14.0


In [14]:
%%sql -- 4. Select fields
SELECT c.code, c.name, c.region, e.year, p.fertility_rate, e.unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
  -- 2. Join with populations (as p)
  INNER JOIN populations AS p
    -- 3. Match on country code
    ON c.code = p.country_code
  -- 4. Join to economies (as e)
  INNER JOIN economies AS e
    -- 5. Match on country code and year
    ON c.code = e.code AND e.year = p.year;

 * postgres://datacamp:***@localhost:15432/datacamp
370 rows affected.


code,name,region,year,fertility_rate,unemployment_rate
AFG,Afghanistan,Southern and Central Asia,2010,5.746,
AFG,Afghanistan,Southern and Central Asia,2015,4.653,
AGO,Angola,Central Africa,2010,6.416,
AGO,Angola,Central Africa,2015,5.996,
ALB,Albania,Southern Europe,2010,1.663,14.0
ALB,Albania,Southern Europe,2015,1.793,17.1
ARE,United Arab Emirates,Middle East,2010,1.868,
ARE,United Arab Emirates,Middle East,2015,1.767,
ARG,Argentina,South America,2010,2.37,7.75
ARG,Argentina,South America,2015,2.308,
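
The drop from 740 to 370 rows comes from the extra `e.year = p.year` condition. Each country has a 2010 and a 2015 row in both tables, so matching on the code alone pairs every population year with every economy year. A minimal sketch of that effect, using Python's built-in `sqlite3` with a two-country sample copied from the outputs above (an in-memory stand-in that skips the `countries` table, not the course database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, fertility_rate REAL);
    CREATE TABLE economies (code TEXT, year INTEGER, unemployment_rate REAL);
    INSERT INTO populations VALUES
        ('AFG', 2010, 5.746), ('AFG', 2015, 4.653),
        ('ALB', 2010, 1.663), ('ALB', 2015, 1.793);
    INSERT INTO economies VALUES
        ('AFG', 2010, NULL), ('AFG', 2015, NULL),
        ('ALB', 2010, 14.0), ('ALB', 2015, 17.1);
""")

# Match on the country code only: every year pairs with every year
code_only = conn.execute("""
    SELECT p.country_code, e.year, p.fertility_rate, e.unemployment_rate
    FROM populations AS p
      INNER JOIN economies AS e
        ON p.country_code = e.code
""").fetchall()

# Match on the country code and the year: one row per country and year
code_and_year = conn.execute("""
    SELECT p.country_code, e.year, p.fertility_rate, e.unemployment_rate
    FROM populations AS p
      INNER JOIN economies AS e
        ON p.country_code = e.code AND e.year = p.year
""").fetchall()

print(len(code_only))      # 2 countries x 2 years x 2 years = 8
print(len(code_and_year))  # 2 countries x 2 years = 4
```

The code-only join also mixes values from different years in one row, for example the 2015 fertility rate next to the 2010 unemployment rate, which is what the duplicated rows in cell 13 show.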


In [15]:
%%sql -- 4. Select fields
select
    c.name as country,
    c.continent,
    l.name as language,
    l.official
    -- 1. From countries (alias as c)
    from countries as c
    -- 2. Join to languages (as l)
    inner join languages as l
    -- 3. Match using code
    using (code);

 * postgres://datacamp:***@localhost:15432/datacamp
914 rows affected.


country,continent,language,official
Afghanistan,Asia,Dari,True
Afghanistan,Asia,Pashto,True
Afghanistan,Asia,Turkic,False
Afghanistan,Asia,Other,False
Albania,Europe,Albanian,True
Albania,Europe,Greek,False
Albania,Europe,Other,False
Albania,Europe,unspecified,False
Algeria,Africa,Arabic,True
Algeria,Africa,French,False
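
`USING (code)` is shorthand for `ON c.code = l.code` when both tables give the join column the same name. The sketch below checks that equivalence with Python's built-in `sqlite3` and a few rows copied from the output above; the in-memory tables are stand-ins, and `official` is stored as 1/0 because SQLite has no boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (code TEXT, name TEXT, continent TEXT);
    CREATE TABLE languages (code TEXT, name TEXT, official INTEGER);
    INSERT INTO countries VALUES
        ('AFG', 'Afghanistan', 'Asia'), ('ALB', 'Albania', 'Europe');
    INSERT INTO languages VALUES
        ('AFG', 'Dari', 1), ('AFG', 'Pashto', 1),
        ('ALB', 'Albanian', 1), ('ALB', 'Greek', 0);
""")

select = (
    "SELECT c.name, l.name, l.official "
    "FROM countries AS c INNER JOIN languages AS l "
)
order = " ORDER BY c.name, l.name"

# Same rows either way
rows_using = conn.execute(select + "USING (code)" + order).fetchall()
rows_on = conn.execute(select + "ON c.code = l.code" + order).fetchall()

# With SELECT *, USING keeps a single copy of the shared column
star_using = conn.execute(
    "SELECT * FROM countries INNER JOIN languages USING (code)"
)
star_on = conn.execute(
    "SELECT * FROM countries INNER JOIN languages "
    "ON countries.code = languages.code"
)
cols_using = [d[0] for d in star_using.description]
cols_on = [d[0] for d in star_on.description]

print(rows_using == rows_on)
print(cols_using.count("code"), cols_on.count("code"))
```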


In [16]:
%%sql -- 4. Select fields with aliases
select
p1.country_code, p1.size as size2010,
p2.country_code, p2.size as size2015
-- 1. From populations (alias as p1)
from populations as p1
  -- 2. Join to itself (alias as p2)
  inner join populations as p2
    -- 3. Match on country code
    on p1.country_code = p2.country_code

 * postgres://datacamp:***@localhost:15432/datacamp
868 rows affected.


country_code,size2010,country_code_1,size2015
ABW,101597.0,ABW,103889.0
ABW,101597.0,ABW,101597.0
ABW,103889.0,ABW,103889.0
ABW,103889.0,ABW,101597.0
AFG,27962207.0,AFG,32526562.0
AFG,27962207.0,AFG,27962207.0
AFG,32526562.0,AFG,32526562.0
AFG,32526562.0,AFG,27962207.0
AGO,21219954.0,AGO,25021974.0
AGO,21219954.0,AGO,21219954.0


In [17]:
%%sql -- 5. Select fields with aliases
SELECT p1.country_code,
       p1.size AS size2010,
       p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations as p1
  -- 2. Join to itself (alias as p2)
  inner JOIN populations as p2
    -- 3. Match on country code
    ON p1.country_code = p2.country_code
        -- 4. and year (with calculation)
        and p1.year = p2.year - 5

 * postgres://datacamp:***@localhost:15432/datacamp
217 rows affected.


country_code,size2010,size2015
ABW,101597,103889.0
AFG,27962207,32526562.0
AGO,21219954,25021974.0
ALB,2913021,2889167.0
AND,84419,70473.0
ARE,8329453,9156963.0
ARG,41222875,43416755.0
ARM,2963496,3017712.0
ASM,55636,55538.0
ATG,87233,91818.0


In [18]:
%%sql SELECT p1.country_code,
       p1.size AS size2010, 
       p2.size AS size2015,
       -- 1. Calculate growth_perc (the 1.0 factor forces float arithmetic; needed in SQLite with integer sizes)
       ((1.0*p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- 2. From populations (alias as p1)
FROM populations AS p1
  -- 3. Join to itself (alias as p2)
  INNER JOIN populations AS p2
    -- 4. Match on country code
    ON p1.country_code = p2.country_code
        -- 5. and year (with calculation)
        AND p1.year = p2.year - 5;

 * postgres://datacamp:***@localhost:15432/datacamp
217 rows affected.


country_code,size2010,size2015,growth_perc
ABW,101597,103889.0,2.255972125161176
AFG,27962207,32526562.0,16.3233002316305
AGO,21219954,25021974.0,17.917192468937493
ALB,2913021,2889167.0,-0.8188749754979452
AND,84419,70473.0,-16.519977730131842
ARE,8329453,9156963.0,9.934746015134488
ARG,41222875,43416755.0,5.321996585633583
ARM,2963496,3017712.0,1.8294608799876904
ASM,55636,55538.0,-0.1761449421238047
ATG,87233,91818.0,5.256038425825089
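
Two things happen in the self-join above: the `p1.year = p2.year - 5` condition keeps only the pairing of each country's 2010 row with its 2015 row, and the `1.0` factor keeps the division from being done on integers. Here is a minimal sketch of both with Python's built-in `sqlite3`, assuming an integer `size` column and two countries copied from the output above (an in-memory stand-in, not the course database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, size INTEGER);
    INSERT INTO populations VALUES
        ('ABW', 2010, 101597), ('ABW', 2015, 103889),
        ('ALB', 2010, 2913021), ('ALB', 2015, 2889167);
""")

rows = conn.execute("""
    SELECT p1.country_code,
           p1.size AS size2010,
           p2.size AS size2015,
           -- integer / integer truncates in SQLite
           (p2.size - p1.size) / p1.size * 100 AS int_growth,
           -- multiplying by 1.0 first switches to float arithmetic
           (1.0 * p2.size - p1.size) / p1.size * 100.0 AS growth_perc
    FROM populations AS p1
      INNER JOIN populations AS p2
        ON p1.country_code = p2.country_code
          AND p1.year = p2.year - 5
    ORDER BY p1.country_code
""").fetchall()

for row in rows:
    print(row)
```

Without the `1.0` factor both sample countries would report a growth of 0, because the change in population is smaller than the 2010 population and the integer quotient truncates to zero.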


In [19]:
%%sql SELECT name, continent, code, surface_area,
    -- 1. First case
    CASE WHEN surface_area > 2000000 THEN 'large'
        -- 2. Second case
        WHEN surface_area > 350000 THEN 'medium'
        -- 3. Else clause + end
        ELSE 'small' END
        -- 4. Alias name
        AS geosize_group
-- 5. From table
FROM countries;

 * postgres://datacamp:***@localhost:15432/datacamp
206 rows affected.


name,continent,code,surface_area,geosize_group
Afghanistan,Asia,AFG,652090.0,medium
Netherlands,Europe,NLD,41526.0,small
Albania,Europe,ALB,28748.0,small
Algeria,Africa,DZA,2381740.0,large
American Samoa,Oceania,ASM,199.0,small
Andorra,Europe,AND,468.0,small
Angola,Africa,AGO,1246700.0,medium
Antigua and Barbuda,North America,ATG,442.0,small
United Arab Emirates,Asia,ARE,83600.0,small
Argentina,South America,ARG,2780400.0,large
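
A `CASE` expression checks its `WHEN` branches in order and returns the first one that matches, which is why the `> 2000000` test has to come before the `> 350000` test. The sketch below shows this with Python's built-in `sqlite3` and three surface areas copied from the output above; the in-memory table and the deliberately swapped second query are illustrations only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (name TEXT, surface_area REAL);
    INSERT INTO countries VALUES
        ('Algeria', 2381740.0), ('Afghanistan', 652090.0), ('Netherlands', 41526.0);
""")

# Largest threshold first, as in the cell above
groups = dict(conn.execute("""
    SELECT name,
        CASE WHEN surface_area > 2000000 THEN 'large'
             WHEN surface_area > 350000 THEN 'medium'
             ELSE 'small' END AS geosize_group
    FROM countries
"""))

# Branches swapped: the 'medium' test now catches large countries too
swapped = dict(conn.execute("""
    SELECT name,
        CASE WHEN surface_area > 350000 THEN 'medium'
             WHEN surface_area > 2000000 THEN 'large'
             ELSE 'small' END AS geosize_group
    FROM countries
"""))

print(groups)
print(swapped)
```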


In [20]:
%%sql INSERT INTO countries_plus
SELECT name, continent, code, surface_area,
    CASE WHEN surface_area > 2000000
            THEN 'large'
       WHEN surface_area > 350000
            THEN 'medium'
       ELSE 'small' END
       AS geosize_group
FROM countries;

 * postgres://datacamp:***@localhost:15432/datacamp
206 rows affected.


[]

In [21]:
%%sql SELECT country_code, size,
    -- 1. First case
    CASE WHEN size > 50000000 THEN 'large'
        -- 2. Second case
        WHEN size > 1000000 THEN 'medium'
        -- 3. Else clause + end
        ELSE 'small' END
        -- 4. Alias name
        AS popsize_group
-- 5. From table
FROM populations
-- 6. Focus on 2015
WHERE year = 2015;

 * postgres://datacamp:***@localhost:15432/datacamp
217 rows affected.


country_code,size,popsize_group
ABW,103889.0,small
AFG,32526562.0,medium
AGO,25021974.0,medium
ALB,2889167.0,medium
AND,70473.0,small
ARE,9156963.0,medium
ARG,43416755.0,medium
ARM,3017712.0,medium
ASM,55538.0,small
ATG,91818.0,small


In [22]:
%%sql INSERT INTO pop_plus
SELECT country_code, size,
    CASE WHEN size > 50000000 THEN 'large'
        WHEN size > 1000000 THEN 'medium'
        ELSE 'small' END
        AS popsize_group
-- 1. Into table
FROM populations
WHERE year = 2015;

-- 2. Select all columns of pop_plus
select * from pop_plus

 * postgres://datacamp:***@localhost:15432/datacamp
217 rows affected.
217 rows affected.


country_code,size,popsize_group
ABW,103889.0,small
AFG,32526562.0,medium
AGO,25021974.0,medium
ALB,2889167.0,medium
AND,70473.0,small
ARE,9156963.0,medium
ARG,43416755.0,medium
ARM,3017712.0,medium
ASM,55538.0,small
ATG,91818.0,small


In [23]:
%%sql
DELETE FROM pop_plus;
INSERT INTO pop_plus
SELECT country_code, size,
    CASE WHEN size > 50000000 THEN 'large'
        WHEN size > 1000000 THEN 'medium'
        ELSE 'small' END
        AS popsize_group
-- 1. Into table
FROM populations
WHERE year = 2015;

-- 5. Select fields
SELECT name, continent, geosize_group, popsize_group
-- 1. From countries_plus (alias as c)
from countries_plus as c
  -- 2. Join to pop_plus (alias as p)
  inner join pop_plus as p
    -- 3. Match on country code
    on c.code = p.country_code
-- 4. Order the table    
order by geosize_group, popsize_group;

 * postgres://datacamp:***@localhost:15432/datacamp
217 rows affected.
217 rows affected.
206 rows affected.


name,continent,geosize_group,popsize_group
India,Asia,large,large
"Congo, The Democratic Republic of the",Africa,large,large
United States,North America,large,large
Russian Federation,Europe,large,large
Brazil,South America,large,large
China,Asia,large,large
Saudi Arabia,Asia,large,medium
Kazakhstan,Asia,large,medium
Argentina,South America,large,medium
Canada,North America,large,medium
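
Cell 23 starts with `DELETE FROM pop_plus`, presumably because cell 22 had already filled the table and running `INSERT INTO ... SELECT` again would append the same rows a second time. A minimal sketch of that behaviour with Python's built-in `sqlite3`, using two countries copied from the outputs above; the in-memory tables and the `count_rows` helper are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE populations (country_code TEXT, year INTEGER, size REAL);
    CREATE TABLE pop_plus (country_code TEXT, size REAL, popsize_group TEXT);
    INSERT INTO populations VALUES
        ('ARG', 2010, 41222875), ('ARG', 2015, 43416755),
        ('AND', 2010, 84419), ('AND', 2015, 70473);
""")

INSERT_SQL = """
    INSERT INTO pop_plus
    SELECT country_code, size,
        CASE WHEN size > 50000000 THEN 'large'
             WHEN size > 1000000 THEN 'medium'
             ELSE 'small' END AS popsize_group
    FROM populations
    WHERE year = 2015
"""


def count_rows():
    """Return the current number of rows in pop_plus."""
    return conn.execute("SELECT COUNT(*) FROM pop_plus").fetchone()[0]


# Running the insert twice appends the same rows twice
conn.execute(INSERT_SQL)
conn.execute(INSERT_SQL)
after_double_insert = count_rows()

# Clearing the table first makes the cell safe to re-run
conn.execute("DELETE FROM pop_plus")
conn.execute(INSERT_SQL)
after_reset = count_rows()

groups = dict(conn.execute("SELECT country_code, popsize_group FROM pop_plus"))
print(after_double_insert, after_reset, groups)
```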
