# Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth.

In [1]:
#!conda install -yc conda-forge ipython-sql

We'll use the following code to connect our Jupyter Notebook to our database file.

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

Write a query to return information on the tables in the database.

In [3]:
%%sql sqlite:///factbook.db
SELECT *
    FROM sqlite_master
    WHERE type = 'table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
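
The `sqlite_sequence` row in the output above is not part of the Factbook data. SQLite creates that bookkeeping table itself when a table declares an `AUTOINCREMENT` column, as `facts` does. A minimal sketch of this, assuming an in-memory database with a cut-down, hypothetical two-column version of `facts` stands in for `factbook.db`:

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db, and only
# two of the real columns are recreated here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "name varchar(255) NOT NULL)"
)

# Same catalog query as in the notebook, reduced to the table names.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
conn.close()
```

Only `facts` was created explicitly, yet the catalog lists both tables.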


Display the first five rows of the facts table in the database.

In [4]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Let's start by calculating some summary statistics in a single query: MAX(population), MIN(population), MAX(population_growth) and MIN(population_growth).

In [5]:
%%sql sqlite:///factbook.db
SELECT MAX(population),MIN(population),MAX(population_growth),MIN(population_growth)
    FROM facts;

Done.


MAX(population),MIN(population),MAX(population_growth),MIN(population_growth)
7256490011,0,4.02,0.0


We see a few interesting things in the summary statistics above:

- There's a country with a population of 0
- There's a country with a population of 7256490011

Let's use subqueries to zoom in on just these countries without hard-coding the specific values.

Write a query that returns the countries with the minimum population.

In [6]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE population = (SELECT MIN(population)
                        FROM facts
                       );

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


Write a query that returns the countries with the maximum population.

In [7]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE population = (SELECT MAX(population)
                        FROM facts
                       );

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,
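
The two queries above share one pattern: a scalar subquery computes the extreme value, and the outer query returns every row that matches it. A minimal sketch of that pattern, assuming an in-memory table seeded with four rows copied from the outputs in this notebook; the helper `names_at_extreme` is hypothetical and exists only for illustration:

```python
import sqlite3

# Assumption: a small in-memory sample stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("China", 1367485388),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

def names_at_extreme(aggregate):
    """Return names of rows whose population equals MIN or MAX of the column."""
    if aggregate not in ("MIN", "MAX"):
        raise ValueError("aggregate must be 'MIN' or 'MAX'")
    query = (
        "SELECT name FROM facts "
        f"WHERE population = (SELECT {aggregate}(population) FROM facts)"
    )
    return [row[0] for row in conn.execute(query)]

smallest = names_at_extreme("MIN")
largest = names_at_extreme("MAX")
print(smallest, largest)
```

Because the comparison is against the computed value, the query keeps working if the data changes, and it returns all tied rows rather than just one.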


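It seems the table contains a row for the whole world, which explains the huge maximum in the population column, and a row for Antarctica, which explains the population of 0. Let's recompute the summary statistics without the World row, and then calculate the average population and area.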

In [8]:
%%sql sqlite:///factbook.db
SELECT MAX(population),MIN(population),MAX(population_growth),MIN(population_growth)
    FROM facts
    WHERE name != 'World';

Done.


MAX(population),MIN(population),MAX(population_growth),MIN(population_growth)
1367485388,0,4.02,0.0


In [9]:
%%sql sqlite:///factbook.db
SELECT AVG(population),AVG(area)
    FROM facts
    WHERE name != 'World';

Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739
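
Two details matter when reading the filtered aggregates above. String literals in SQL belong in single quotes, and aggregate functions such as AVG skip NULL values, so AVG(area) is taken over fewer rows than AVG(population) whenever area is missing. A minimal sketch, assuming an in-memory table seeded with four rows copied from earlier outputs:

```python
import sqlite3

# Assumption: a small in-memory sample stands in for factbook.db.
# Antarctica and World have no value for area, as in the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Andorra", 468, 85580),
        ("China", 9596960, 1367485388),
        ("Antarctica", None, 0),
        ("World", None, 7256490011),
    ],
)

max_pop, avg_pop, avg_area = conn.execute(
    "SELECT MAX(population), AVG(population), AVG(area) "
    "FROM facts WHERE name != 'World'"
).fetchone()

# avg_pop divides by 3 rows; avg_area divides by 2, because the NULL
# area for Antarctica is ignored rather than counted as zero.
print(max_pop, avg_pop, avg_area)
conn.close()
```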


Write a query that finds all countries meeting both of the following criteria:

- The population is above average
- The area is below average

In [10]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE population > (SELECT AVG(population)
                        FROM facts
                        WHERE name != 'World'
                       )
      AND area < (SELECT AVG(area)
                  FROM facts
                  WHERE name != 'World'
                 );

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31
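
The query above compares each row against two independent scalar subqueries. A minimal sketch of the same structure, assuming an in-memory table seeded with five rows copied from earlier outputs; with so few rows the averages differ from the real ones, so only the mechanics carry over:

```python
import sqlite3

# Assumption: a small in-memory sample stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 28748, 3029278),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
        ("Bangladesh", 148460, 168957745),
        ("World", None, 7256490011),
    ],
)

dense = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
            FROM facts
            WHERE population > (SELECT AVG(population)
                                FROM facts
                                WHERE name != 'World')
              AND area < (SELECT AVG(area)
                          FROM facts
                          WHERE name != 'World')
        """
    )
]
print(dense)
conn.close()
```

The World row passes the population test but is still dropped: its area is NULL, so `area < ...` evaluates to NULL and the WHERE clause rejects it.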


Which country has the most people? Which country has the highest growth rate?

In [11]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE population = (SELECT MAX(population)
                       FROM facts
                       WHERE name != 'World'
                       );

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44


The country with the most people is China.

In [12]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE population_growth = (SELECT MAX(population_growth)
                       FROM facts
                       WHERE name != 'World'
                       );

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
162,od,South Sudan,644329,,,12042910,4.02,36.91,8.18,11.47


The country with the highest growth rate is South Sudan.

Which countries have the highest ratios of water to land? Which countries have more water than land?

In [13]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE area_water > area_land;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
228,io,British Indian Ocean Territory,54400,60,54340,,,,,
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67


In [14]:
%%sql sqlite:///factbook.db
SELECT *, round(cast(area_water as float)/cast(area_land as float),4) AS water_land_ratio
    FROM facts
    ORDER BY water_land_ratio DESC
    LIMIT 10;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,water_land_ratio
228,io,British Indian Ocean Territory,54400,60,54340,,,,,,905.6667
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67,4.5202
246,rq,Puerto Rico,13791,8870,4921,3598357.0,0.6,10.86,8.67,8.15,0.5548
12,bf,"Bahamas, The",13880,10010,3870,324597.0,0.85,15.5,7.05,0.0,0.3866
71,pu,Guinea-Bissau,36125,28120,8005,1726170.0,1.91,33.38,14.33,0.0,0.2847
106,mi,Malawi,118484,94080,24404,17964697.0,3.32,41.56,8.41,0.0,0.2594
125,nl,Netherlands,41543,33893,7650,16947904.0,0.41,10.83,8.66,1.95,0.2257
182,ug,Uganda,241038,197100,43938,37101745.0,3.24,43.79,10.69,0.74,0.2229
56,er,Eritrea,117600,101000,16600,6527689.0,2.25,30.0,7.52,0.0,0.1644
99,li,Liberia,111369,96320,15049,4195666.0,2.47,34.41,9.69,0.0,0.1562
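
The cast in the ratio query above matters because `area_water` and `area_land` are both integer columns, and SQLite divides two integers without keeping the fractional part. A minimal sketch, assuming an in-memory table seeded with four rows copied from earlier outputs, that also shows what happens to rows with missing areas:

```python
import sqlite3

# Assumption: a small in-memory sample stands in for factbook.db.
# South Sudan has no land or water figures, as in the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Netherlands", 33893, 7650),
        ("South Sudan", None, None),
        ("Virgin Islands", 346, 1564),
        ("British Indian Ocean Territory", 60, 54340),
    ],
)

rows = conn.execute(
    """
    SELECT name,
           ROUND(CAST(area_water AS FLOAT) / area_land, 4) AS water_land_ratio
        FROM facts
        ORDER BY water_land_ratio DESC
    """
).fetchall()

for name, ratio in rows:
    print(name, ratio)
conn.close()
```

Casting one operand is enough to force floating-point division. The row with missing areas gets a NULL ratio, which SQLite sorts last under `ORDER BY ... DESC`, so it cannot crowd out real values in a `LIMIT 10`.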


Which countries have a higher death rate than birth rate?

In [15]:
%%sql sqlite:///factbook.db
SELECT *
    FROM facts
    WHERE death_rate > birth_rate;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
10,au,Austria,83871,82445,1426,8665550,0.55,9.41,9.42,5.56
16,bo,Belarus,207600,202900,4700,9589689,0.2,10.7,13.36,0.7
22,bk,Bosnia and Herzegovina,51197,51187,10,3867055,0.13,8.87,9.75,0.38
26,bu,Bulgaria,110879,108489,2390,7186893,0.58,8.92,14.44,0.29
44,hr,Croatia,56594,55974,620,4464844,0.13,9.45,12.18,1.39
47,ez,Czech Republic,78867,77247,1620,10644842,0.16,9.63,10.34,2.33
57,en,Estonia,45228,42388,2840,1265420,0.55,10.51,12.4,3.6
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
67,gr,Greece,131957,130647,1310,10775643,0.01,8.66,11.09,2.32
75,hu,Hungary,93028,89608,3420,9897541,0.22,9.16,12.73,1.33


Which countries have the highest population/area ratio?

In [16]:
%%sql sqlite:///factbook.db
SELECT *,round(population/area) AS population_area_ratio
    FROM facts
    ORDER BY population_area_ratio DESC
    LIMIT 10;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,population_area_ratio
205,mc,Macau,28,28,0,592731,0.8,8.88,4.22,3.37,21168.0
117,mn,Monaco,2,2,0,30535,0.12,6.65,9.24,3.83,15267.0
156,sn,Singapore,697,687,10,5674472,1.89,8.27,3.43,14.05,8141.0
204,hk,Hong Kong,1108,1073,35,7141106,0.38,9.23,7.07,1.68,6445.0
251,gz,Gaza Strip,360,360,0,1869055,2.81,31.11,3.04,0.0,5191.0
233,gi,Gibraltar,6,6,0,29258,0.24,14.08,8.37,3.28,4876.0
13,ba,Bahrain,760,760,0,1346613,2.41,13.66,2.69,13.09,1771.0
108,mv,Maldives,298,298,0,393253,0.08,15.75,3.89,12.68,1319.0
110,mt,Malta,316,316,0,413965,0.31,10.18,9.09,1.98,1310.0
227,bd,Bermuda,54,54,0,70196,0.5,11.33,8.23,1.88,1299.0
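
Every ratio in the output above ends in `.0`. This is most likely because `population` and `area` are both integer columns, so `population/area` is an integer division whose fractional part is discarded before `round` is applied. A minimal sketch of the difference, assuming an in-memory table seeded with two rows copied from the output above:

```python
import sqlite3

# Assumption: a small in-memory sample stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Macau", 28, 592731),
        ("Monaco", 2, 30535),
    ],
)

densities = conn.execute(
    """
    SELECT name,
           ROUND(population / area) AS truncated_ratio,
           ROUND(CAST(population AS FLOAT) / area, 2) AS precise_ratio
        FROM facts
        ORDER BY precise_ratio DESC
    """
).fetchall()

for name, truncated_ratio, precise_ratio in densities:
    print(name, truncated_ratio, precise_ratio)
conn.close()
```

The truncated column reproduces the figures shown for Macau and Monaco, while the cast version keeps the fractional part. The ranking here is the same either way, but the cast matters when two densities differ by less than one person per unit of area.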
