In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

Having loaded the CIA Factbook database, we can take a quick look at its structure.

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


Besides sqlite_sequence, an internal table that SQLite maintains for AUTOINCREMENT columns, this database contains just one table: facts.
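
To list only user-created tables, the internal ones can be filtered out by name, since SQLite reserves the sqlite_ prefix for its own tables. A minimal sketch, assuming Python's standard sqlite3 module and a throwaway in-memory database with a cut-down stand-in for facts (not the real factbook.db):

```python
import sqlite3

# In-memory database used purely for illustration; the notebook itself uses factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name varchar(255) NOT NULL)"
)

# Every table, including the bookkeeping table SQLite adds for AUTOINCREMENT.
all_tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# Skip names that start with the reserved 'sqlite_' prefix.
user_tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
)]

print(all_tables)
print(user_tables)
```

The same WHERE clause could be used directly in the %%sql cell above.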

We will now take a look at the first 5 rows of that table, to see how it is formatted.

In [6]:
%%sql
SELECT *
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are the descriptions for some of the columns:

* name: the name of the country.
* area: the country's total area (both land and water) in square kilometers.
* area_land: the country's land area in square kilometers.
* area_water: the country's water area in square kilometers.
* population: the country's population.
* population_growth: the country's population growth as a percentage.
* birth_rate: the country's birth rate, or the number of births per year per 1,000 people.
* death_rate: the country's death rate, or the number of deaths per year per 1,000 people.
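
These descriptions imply that area should equal area_land plus area_water. A minimal sketch, assuming Python's standard sqlite3 module and an in-memory table holding only the five rows previewed above (a cut-down stand-in, not the full facts table), checks that relationship:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
# Values copied from the five-row preview of the facts table.
rows = [
    ("Afghanistan", 652230, 652230, 0),
    ("Albania", 28748, 27398, 1350),
    ("Algeria", 2381741, 2381741, 0),
    ("Andorra", 468, 468, 0),
    ("Angola", 1246700, 1246700, 0),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)

# Rows whose total differs from land plus water. Rows with a NULL in any of
# the three columns are not reported, because a comparison with NULL is never true.
mismatches = conn.execute(
    "SELECT name FROM facts WHERE area != area_land + area_water"
).fetchall()
print(mismatches)
```

An empty result means every sampled row is internally consistent; running the same WHERE clause against the full table would show whether that holds everywhere.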

In [7]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts;

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


It looks like one row has a population of zero and another has more than 7.2 billion. Let's use queries to identify these outliers.

First, we identify the rows with the minimum population, zero.

In [9]:
%%sql
SELECT name, population
    FROM facts
    WHERE population = (SELECT MIN(population) FROM facts);

Done.


name,population
Antarctica,0


The only entry with zero population in this table is Antarctica, which is plausible because it has no permanent residents.

Now let's look at the entry with the largest population.

In [10]:
%%sql
SELECT name, population
    FROM facts
    WHERE population = (SELECT MAX(population) FROM facts);

Done.


name,population
World,7256490011


We see that the entry with over 7.2 billion people is actually a summary row for the whole world. We need to exclude this entry when computing summary statistics; otherwise it would skew the results.

In [11]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
    FROM facts
    WHERE name != 'World';

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02


In [16]:
%%sql
SELECT AVG(population), AVG(area)
    FROM facts
    WHERE name != 'World';

Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739


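Here we identify densely populated countries: those with a greater-than-average population and a smaller-than-average area. Note that the subqueries below average over every row, including the World summary row, so the population threshold is higher than the 32.2 million computed above.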

In [19]:
%%sql
SELECT *
    FROM facts
    WHERE population > (SELECT AVG(population) FROM facts)
    AND area < (SELECT AVG(area) FROM facts)
    ORDER BY area DESC;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
173,th,Thailand,513120,510890,2230,67976405,0.34,11.19,7.8,0.0
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
192,vm,Vietnam,331210,310070,21140,94348835,0.97,15.96,5.93,0.3
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
185,uk,United Kingdom,243610,241930,1680,64088222,0.54,12.17,9.35,2.54
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
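
To see how much a summary row can shift an average-based threshold, here is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory table holding only the populations of the five countries from the first preview plus the World row (a stand-in, not the full facts table). It compares the two averages and then applies the World filter inside the subquery as well as in the outer query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Population figures copied from the outputs shown earlier in the notebook.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("World", 7256490011),
    ],
)

# Average over every row, summary row included.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]

# Average over individual entries only.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts WHERE name != 'World'"
).fetchone()[0]

# Above-average rows when World is excluded from both the outer query
# and the subquery that computes the threshold.
above_avg = [row[0] for row in conn.execute(
    "SELECT name FROM facts "
    "WHERE name != 'World' "
    "AND population > (SELECT AVG(population) FROM facts WHERE name != 'World') "
    "ORDER BY name"
)]
print(avg_all, avg_countries, above_avg)
```

On the full table, the same change to both subqueries would lower the thresholds and would probably return more rows than the seven listed above.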


In [30]:
%%sql
SELECT name, MAX(population)
    FROM facts
    WHERE name != 'World';

Done.


name,MAX(population)
China,1367485388


In [31]:
%%sql
SELECT name, MAX(population_growth)
    FROM facts
    WHERE name != 'World';

Done.


name,MAX(population_growth)
South Sudan,4.02
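
The two queries above pair a bare name column with MAX(...). SQLite documents that, when a query contains a single MIN or MAX aggregate, bare columns take their values from the row that holds the extreme; other databases may reject such a query or return an arbitrary name. A minimal sketch, assuming Python's standard sqlite3 module and an in-memory table with four rows copied from this notebook's outputs, compares that form with one that does not depend on the SQLite behaviour:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("India", 1251695584, 1.22),
    ("China", 1367485388, 0.45),
    ("Nigeria", 181562056, 2.45),
    ("Pakistan", 199085847, 1.46),
])

# SQLite-specific form: the bare column `name` is taken from the row with the maximum.
sqlite_style = conn.execute(
    "SELECT name, MAX(population) FROM facts"
).fetchone()

# Alternative form: sort by the value and keep only the top row.
sorted_style = conn.execute(
    "SELECT name, population FROM facts ORDER BY population DESC LIMIT 1"
).fetchone()
print(sqlite_style, sorted_style)
```

Both forms return the same row here; the sorted form also makes it easy to raise the LIMIT and inspect the runners-up.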


In [34]:
%%sql
SELECT name, MAX(area_water / area_land) AS 'Highest Water to Land Ratio'
    FROM facts
    WHERE name != 'World';

Done.


name,Highest Water to Land Ratio
British Indian Ocean Territory,905
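
Because area_water and area_land are both integer columns, SQLite performs integer division, so the ratio above is truncated (54340 / 60 is roughly 905.67, not 905). A minimal sketch, assuming Python's standard sqlite3 module and an in-memory table with the two territories that have more water than land, shows the difference once one operand is cast to REAL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("British Indian Ocean Territory", 60, 54340),
    ("Virgin Islands", 346, 1564),
])

# int_ratio uses integer division; real_ratio casts first, then rounds to 2 places.
rows = conn.execute(
    "SELECT name, "
    "area_water / area_land AS int_ratio, "
    "ROUND(CAST(area_water AS REAL) / area_land, 2) AS real_ratio "
    "FROM facts ORDER BY real_ratio DESC"
).fetchall()
for row in rows:
    print(row)
```

The same truncation affects any ratio of two integer columns, such as population divided by area_land.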


In [38]:
%%sql
SELECT name AS 'Countries with more water than land', area_water, area_land
    FROM facts
    WHERE area_water - area_land > 0;

Done.


Countries with more water than land,area_water,area_land
British Indian Ocean Territory,54340,60
Virgin Islands,1564,346


In [44]:
%%sql
SELECT name AS 'Countries with greatest annual population growth', population, population_growth, (population * population_growth / 100) AS 'Growth'
    FROM facts
    ORDER BY Growth DESC
    limit 5;

Done.


Countries with greatest annual population growth,population,population_growth,Growth
World,7256490011,1.08,78370092.1188
India,1251695584,1.22,15270686.1248
China,1367485388,0.45,6153684.246
Nigeria,181562056,2.45,4448270.372
Pakistan,199085847,1.46,2906653.3662
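
The World summary row tops this ranking because the query does not filter it out. A minimal sketch, assuming Python's standard sqlite3 module and an in-memory table holding only the five rows returned above, applies the same name != 'World' filter used earlier:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
# Rows copied from the output of the ranking query.
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("World", 7256490011, 1.08),
    ("India", 1251695584, 1.22),
    ("China", 1367485388, 0.45),
    ("Nigeria", 181562056, 2.45),
    ("Pakistan", 199085847, 1.46),
])

# Absolute yearly growth, with the summary row excluded before ranking.
ranked = conn.execute(
    "SELECT name, population * population_growth / 100 AS growth "
    "FROM facts WHERE name != 'World' "
    "ORDER BY growth DESC"
).fetchall()
names = [name for name, _ in ranked]
print(names)
```

Against the full table, the filtered query with LIMIT 5 would bring one more country into the top five; which one cannot be determined from the output shown here.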


In [52]:
%%sql
SELECT name AS 'Countries with higher deaths than births', birth_rate, death_rate, ROUND((birth_rate - death_rate), 2) AS 'Excess'
    FROM facts
    WHERE death_rate - birth_rate > 0
    ORDER BY Excess
    LIMIT 6;

Done.


Countries with higher deaths than births,birth_rate,death_rate,Excess
Bulgaria,8.92,14.44,-5.52
Serbia,9.08,13.66,-4.58
Latvia,10.0,14.31,-4.31
Lithuania,10.1,14.27,-4.17
Ukraine,10.72,14.46,-3.74
Hungary,9.16,12.73,-3.57


In [57]:
%%sql
SELECT name AS 'Populations with highest pop/land ratios', population/area_land AS 'Ratio'
    FROM facts
    ORDER BY Ratio DESC
    LIMIT 10;

Done.


Populations with highest pop/land ratios,Ratio
Macau,21168
Monaco,15267
Singapore,8259
Hong Kong,6655
Gaza Strip,5191
Gibraltar,4876
Bahrain,1771
Maldives,1319
Malta,1310
Bermuda,1299
