# Analysis of CIA World Factbook

We will use data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), which contains various statistics about all the countries on Earth.

## 1.0 Setup & Exploration

In [17]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [18]:
%%sql
/* list the tables in the database */

SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [19]:
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

# Create the connection
con = sqlite3.connect(r'factbook.db')

# Create the DataFrame from a query
facts = pd.read_sql_query("SELECT * FROM facts", con)

display(facts.shape)
facts.head(10)


(261, 11)

,id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
0,1,af,Afghanistan,652230.0,652230.0,0.0,32564342.0,2.32,38.57,13.89,1.51
1,2,al,Albania,28748.0,27398.0,1350.0,3029278.0,0.3,12.92,6.58,3.3
2,3,ag,Algeria,2381741.0,2381741.0,0.0,39542166.0,1.84,23.67,4.31,0.92
3,4,an,Andorra,468.0,468.0,0.0,85580.0,0.12,8.13,6.96,0.0
4,5,ao,Angola,1246700.0,1246700.0,0.0,19625353.0,2.78,38.78,11.49,0.46
5,6,ac,Antigua and Barbuda,442.0,442.0,0.0,92436.0,1.24,15.85,5.69,2.21
6,7,ar,Argentina,2780400.0,2736690.0,43710.0,43431886.0,0.93,16.64,7.33,0.0
7,8,am,Armenia,29743.0,28203.0,1540.0,3056382.0,0.15,13.61,9.34,5.8
8,9,as,Australia,7741220.0,7682300.0,58920.0,22751014.0,1.07,12.15,7.14,5.65
9,10,au,Austria,83871.0,82445.0,1426.0,8665550.0,0.55,9.41,9.42,5.56


Explanation of the columns:
- `name` -- Name of the country
- `area` -- Total area of the country (land and water) in km^2
- `area_land` -- Land area in km^2
- `area_water` -- Water area in km^2
- `population` -- The country's population
- `population_growth` -- Annual population growth as a percentage
- `birth_rate` -- Number of births per year per 1000 people
- `death_rate` -- Number of deaths per year per 1000 people
- `migration_rate` -- Net migration: the difference between the number of people entering and leaving the country per year per 1000 people
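The three area columns can be checked against each other. The sketch below uses a hypothetical, cut-down `facts` table in an in-memory SQLite database. It is filled with a few rows copied from the outputs in this notebook and confirms that `area` equals `area_land + area_water` wherever all three values are present. It also shows why `pd.read_sql_query` displays integer columns such as `area` as floats (`652230.0`): a column that contains `NULL` is loaded as `float64`, with `NaN` in place of the missing values.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
# Hypothetical cut-down facts table holding only the area columns.
con.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
con.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Albania", 28748, 27398, 1350),
        ("Argentina", 2780400, 2736690, 43710),
        ("Austria", 83871, 82445, 1426),
        ("Antarctica", None, 280000, None),  # NULLs come back as NaN
    ],
)
df = pd.read_sql_query("SELECT * FROM facts", con)

# area should equal area_land + area_water wherever all three are present.
complete = df.dropna(subset=["area", "area_land", "area_water"])
consistent = complete["area"] == complete["area_land"] + complete["area_water"]
print(consistent.all())  # True
print(df["area"].dtype)  # float64, because of the NULL in Antarctica's row
```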

In [20]:
%%sql
SELECT MIN(population),
       MAX(population), MIN(population_growth), MAX(population_growth)
  FROM facts; 

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


According to the query above, at least one entry has a population of `0` and at least one has a population growth of `0.0`. There is also an entry with a population of over 7.2 billion. Let's investigate these values.
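The same lookup can also be written in pandas with boolean indexing. This is a minimal sketch on a hypothetical four-row DataFrame, with values copied from the query outputs in this notebook, not on the full `facts` table. Like the SQL subquery, it returns every row that ties for the minimum or maximum. `idxmin()`/`idxmax()`, by contrast, would return only the first such row.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the query outputs in this notebook.
facts = pd.DataFrame(
    {
        "name": ["Afghanistan", "China", "Antarctica", "World"],
        "population": [32564342, 1367485388, 0, 7256490011],
        "population_growth": [2.32, 0.45, None, 1.08],
    }
)

# Boolean indexing mirrors "WHERE population = (SELECT MIN(population) FROM facts)".
lowest = facts[facts["population"] == facts["population"].min()]
highest = facts[facts["population"] == facts["population"].max()]
print(lowest["name"].tolist())   # ['Antarctica']
print(highest["name"].tolist())  # ['World']
```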

In [21]:
%%sql
SELECT *
  FROM facts
WHERE population = (SELECT MIN(population)
                      FROM facts
                    );

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


In [22]:
%%sql
SELECT *
  FROM facts
WHERE population = (SELECT MAX(population)
                      FROM facts
                    );

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


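The 'country' with no population is Antarctica, a polar region with no sovereign countries and no permanent habitation outside of research stations. Its `population_growth` is empty, so the `0.0` minimum growth found earlier belongs to a different entry.
The database also contains an entry for `World`, the world total, which is the 'country' with a population of about 7.26 billion.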

Because `World` is an aggregate rather than a country, we need to recalculate the statistics above with this entry excluded.
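Here is a minimal sketch of the recalculation. It uses Python's built-in `sqlite3` module on a hypothetical in-memory table with four rows copied from the outputs above, and it passes the excluded name as a bound parameter. Note that `MIN(population_growth)` skips Antarctica's `NULL` growth value, because SQL aggregates ignore `NULL`s.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical cut-down facts table with rows copied from the outputs above.
con.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
con.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("China", 1367485388, 0.45),
        ("South Sudan", 12042910, 4.02),
        ("Antarctica", 0, None),
        ("World", 7256490011, 1.08),
    ],
)

STATS = """
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
  FROM facts
"""
with_world = con.execute(STATS).fetchone()
without_world = con.execute(STATS + " WHERE name != ?", ("World",)).fetchone()
print(with_world)     # (0, 7256490011, 0.45, 4.02)
print(without_world)  # (0, 1367485388, 0.45, 4.02)
```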

In [23]:
%%sql
SELECT MIN(population),
       MAX(population), MIN(population_growth), MAX(population_growth)
  FROM facts
 WHERE name != 'World'

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,1367485388,0.0,4.02


In [24]:
%%sql
SELECT *
  FROM facts
WHERE population = (SELECT MAX(population)
                      FROM facts
                     WHERE name != 'World'
                    );

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44


With `World` removed, the highest population is about 1.37 billion, which belongs to China.

In [25]:
%%sql
SELECT ROUND(AVG(population) / 1000000, 2) AS 'Avg. Population (in MM)',
       ROUND(AVG(area), 2) AS 'Avg. Area (in km^2)'
  FROM facts;

 * sqlite:///factbook.db
Done.


Avg. Population (in MM),Avg. Area (in km^2)
62.09,555093.55


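We'll run a query to find out which countries have *populations* above the average and *total areas* below the average. Note that `AVG(population)`, both in the query above and in the subquery below, still includes the `World` row, which pulls the average up. `AVG(area)` is unaffected because `World` has no area value.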
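Including `World` in `AVG(population)` can raise the "above average" threshold so high that almost no real country clears it. The sketch below illustrates this on a hypothetical six-row table. The table is a subset of the rows shown in this notebook and keeps only `name`, `area` and `population`. The sketch runs the same filter with and without `World` in the subqueries. With so few rows the numbers differ from the full table; the point is only the effect of the aggregate row. The f-string is safe here only because `flt` is a fixed literal, not user input.

```python
import sqlite3

# Hypothetical cut-down facts table: values copied from the notebook outputs.
ROWS = [
    ("Afghanistan", 652230, 32564342),
    ("Albania", 28748, 3029278),
    ("Andorra", 468, 85580),
    ("Argentina", 2780400, 43431886),
    ("Austria", 83871, 8665550),
    ("World", None, 7256490011),
]


def populous_and_small(con, exclude_world):
    """Names with above-average population and below-average total area."""
    flt = "WHERE name != 'World'" if exclude_world else ""
    query = f"""
        SELECT name
          FROM facts
         WHERE population > (SELECT AVG(population) FROM facts {flt})
           AND area < (SELECT AVG(area) FROM facts {flt})
         ORDER BY population DESC
    """
    return [row[0] for row in con.execute(query)]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
con.executemany("INSERT INTO facts VALUES (?, ?, ?)", ROWS)

print(populous_and_small(con, exclude_world=False))  # []  (World inflates the average)
print(populous_and_small(con, exclude_world=True))   # ['Afghanistan']
```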

In [26]:
%%sql
SELECT name, area, area_land, area_water,
       ROUND(population / 1000000, 2) AS 'population (in MM)'
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts                
                    ) 
         AND area < (SELECT AVG(area)
                       FROM facts
                    )
 ORDER BY population DESC, name DESC


 * sqlite:///factbook.db
Done.


name,area,area_land,area_water,population (in MM)
Bangladesh,148460,130170,18290,168.0
Japan,377915,364485,13430,126.0
Philippines,300000,298170,1830,100.0
Vietnam,331210,310070,21140,94.0
Germany,357022,348672,8350,80.0
Thailand,513120,510890,2230,67.0
United Kingdom,243610,241930,1680,64.0


In [27]:
%%sql
/* finding population/area ratio */
SELECT name, area, population,
       (population / area) AS 'population_area'
  FROM facts
 ORDER BY (population / area) DESC, name DESC
 LIMIT 15;


 * sqlite:///factbook.db
Done.


name,area,population,population_area
Macau,28,592731,21168
Monaco,2,30535,15267
Singapore,697,5674472,8141
Hong Kong,1108,7141106,6445
Gaza Strip,360,1869055,5191
Gibraltar,6,29258,4876
Bahrain,760,1346613,1771
Maldives,298,393253,1319
Malta,316,413965,1310
Bermuda,54,70196,1299


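The first query returns countries with large populations but small total areas. Both queries divide integer columns, so SQLite performs integer division. The populations in millions are truncated to whole numbers (`168.0` for Bangladesh), and the population/area ratios are truncated as well (Macau's 592731 / 28 is shown as `21168`). The second query is also cluttered by micro-states such as Monaco and Gibraltar, whose tiny areas produce very high population-to-area ratios even though their populations are small.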
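The truncation is easy to reproduce. This sketch uses a hypothetical one-row table holding Macau's `area` and `population` from the output above. It compares integer division with real division, done either with `CAST(... AS REAL)` or by dividing by the literal `1000000.0`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical one-row table with Macau's values from the query output.
con.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
con.execute("INSERT INTO facts VALUES ('Macau', 28, 592731)")

row = con.execute(
    """
    SELECT population / area,                          -- INTEGER / INTEGER: truncated
           ROUND(CAST(population AS REAL) / area, 2),  -- REAL division
           population / 1000000,                       -- truncated to whole millions
           ROUND(population / 1000000.0, 2)            -- millions with two decimals
      FROM facts
    """
).fetchone()
print(row)  # (21168, 21168.96, 0, 0.59)
```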

Earlier, we found that China is the country with the highest population. Now we will check which country has the highest growth rate.

In [28]:
%%sql
SELECT *
  FROM facts
 WHERE population_growth = (SELECT MAX(population_growth)
                              FROM facts
                           );

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
162,od,South Sudan,644329,,,12042910,4.02,36.91,8.18,11.47


Next, we will find the countries with the highest land-to-water ratios, and the countries that have more water than land.
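Here is a sketch of a more robust ratio query, run on a hypothetical in-memory table with four rows copied from this notebook's outputs. It converts to `REAL` before dividing so the ratio keeps its decimals. It also wraps the divisor in `NULLIF(area_water, 0)` so that countries with no water get `NULL`. SQLite already returns `NULL` for division by zero, so `NULLIF` only makes the intent explicit here; other databases would raise an error instead. In SQLite, `NULL` sorts last under `ORDER BY ... DESC`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical cut-down facts table with rows copied from the outputs.
con.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
con.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bosnia and Herzegovina", 51187, 10),
        ("Niger", 1266700, 300),
        ("Afghanistan", 652230, 0),  # no water: the ratio is undefined
        ("Virgin Islands", 346, 1564),
    ],
)

ratios = con.execute(
    """
    SELECT name,
           ROUND(CAST(area_land AS REAL) / NULLIF(area_water, 0), 2) AS land_water_ratio
      FROM facts
     ORDER BY land_water_ratio DESC
    """
).fetchall()
more_water = [r[0] for r in con.execute("SELECT name FROM facts WHERE area_water > area_land")]

print(ratios)      # Bosnia 5118.7, Niger 4222.33, Virgin Islands 0.22, Afghanistan None
print(more_water)  # ['Virgin Islands']
```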

In [29]:
%%sql
/* countries with high land:water ratio (integer division truncates the ratio; rows with area_water = 0 give NULL and sort last) */
SELECT name, area, area_land, area_water, ROUND(area_land / area_water, 2) AS 'land_water_ratio'
  FROM facts
ORDER BY land_water_ratio DESC
LIMIT 15


 * sqlite:///factbook.db
Done.


name,area,area_land,area_water,land_water_ratio
Bosnia and Herzegovina,51197.0,51187,10,5118.0
Niger,,1266700,300,4222.0
Morocco,446550.0,446300,250,1785.0
Guinea,245857.0,245717,140,1755.0
Costa Rica,51100.0,51060,40,1276.0
Djibouti,23200.0,23180,20,1159.0
"Korea, North",120538.0,120408,130,926.0
Cyprus,9251.0,9241,10,924.0
Namibia,824292.0,823290,1002,821.0
Burkina Faso,274200.0,273800,400,684.0


In [30]:
%%sql
/* countries with more water than land*/
SELECT *
  FROM facts
 WHERE area_water > area_land

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
228,io,British Indian Ocean Territory,54400,60,54340,,,,,
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67


Let's find the countries with the highest birth rates and see if these countries are also the countries with the highest population growth.

In [31]:
%%sql
/* countries with the highest birth rates */
SELECT name, population_growth, birth_rate, death_rate, migration_rate
  FROM facts
 ORDER BY birth_rate DESC, population_growth DESC
 LIMIT 15

 * sqlite:///factbook.db
Done.


name,population_growth,birth_rate,death_rate,migration_rate
Niger,3.25,45.45,12.42,0.56
Mali,2.98,44.99,12.89,2.26
Uganda,3.24,43.79,10.69,0.74
Zambia,2.88,42.13,12.67,0.68
Burkina Faso,3.03,42.03,11.72,0.0
Burundi,3.28,42.01,9.27,0.0
Malawi,3.32,41.56,8.41,0.0
Somalia,1.83,40.45,13.62,8.49
Angola,2.78,38.78,11.49,0.46
Mozambique,2.45,38.58,12.1,1.98


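Generally, the countries with the highest birth rates are also the ones with the highest population growth, but the ranking is shifted by the death rate and, in particular, by the migration rate. Somalia, for example, has the eighth-highest birth rate in the table but the lowest growth (1.83%).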
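The interplay can be made concrete with a demographic identity: annual growth in percent is roughly (births - deaths + net migration) per 1000 people, divided by 10. The sketch below treats that identity as an assumption rather than the Factbook's exact method, and `implied_growth` is a hypothetical helper. It plugs in Somalia's and Burkina Faso's rows from the table above. Somalia's reported growth only fits if its migration rate is subtracted, that is, if migration is an outflow.

```python
def implied_growth(birth_rate, death_rate, net_migration):
    """Annual growth in percent from rates per 1000 people (assumed identity)."""
    return (birth_rate - death_rate + net_migration) / 10


# Somalia: growth 1.83, birth 40.45, death 13.62, migration 8.49
print(round(implied_growth(40.45, 13.62, +8.49), 2))  # 3.53, far from the reported 1.83
print(round(implied_growth(40.45, 13.62, -8.49), 2))  # 1.83, matches if migration is an outflow

# Burkina Faso: growth 3.03, birth 42.03, death 11.72, migration 0.0
print(round(implied_growth(42.03, 11.72, 0.0), 2))    # 3.03
```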
Let's check the countries with the highest migration rates.

In [32]:
%%sql
/* countries with highest migration_rate */
SELECT name, population_growth, birth_rate, death_rate, migration_rate
  FROM facts
 ORDER BY migration_rate DESC, name DESC
 LIMIT 15

 * sqlite:///factbook.db
Done.


name,population_growth,birth_rate,death_rate,migration_rate
Qatar,3.07,9.84,1.53,22.39
American Samoa,0.3,22.89,4.75,21.13
"Micronesia, Federated States of",0.46,20.54,4.23,20.93
Syria,0.16,22.17,4.0,19.79
Tonga,0.03,23.0,4.85,17.84
British Virgin Islands,2.32,10.91,4.99,17.28
Luxembourg,2.13,11.37,7.24,17.16
Cayman Islands,2.1,12.11,5.53,14.4
Singapore,1.89,8.27,3.43,14.05
Nauru,0.55,24.95,5.87,13.63


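From the table we can see the countries with the highest migration rates. Only half of them (Qatar, the British Virgin Islands, Luxembourg, the Cayman Islands and Singapore) also have high population growth. The other half (American Samoa, Micronesia, Syria, Tonga and Nauru) grow by less than 0.6% a year despite birth rates above 20 per 1000, and their rates only add up if migration is an outflow. This suggests that the dataset stores `migration_rate` (and `population_growth`) as unsigned magnitudes, which would also explain why `MIN(population_growth)` is `0.0` rather than negative. In the second group, people appear to be leaving. This may indicate a lack of opportunity in those countries, possibly due to overcrowding.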
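The direction of migration can be inferred row by row. This rests on two assumptions: that growth is about (births - deaths + net migration) / 10, and that the `migration_rate` and `population_growth` columns hold unsigned magnitudes. The second assumption is an inference from the numbers, not something the dataset documents. The sketch below uses the ten rows copied from the migration table and a hypothetical helper `migration_direction`. For each row, it picks whichever sign of the migration rate better reproduces the reported growth.

```python
# Rows copied from the migration-rate table: (growth, birth, death, migration).
ROWS = {
    "Qatar": (3.07, 9.84, 1.53, 22.39),
    "American Samoa": (0.3, 22.89, 4.75, 21.13),
    "Micronesia, Federated States of": (0.46, 20.54, 4.23, 20.93),
    "Syria": (0.16, 22.17, 4.0, 19.79),
    "Tonga": (0.03, 23.0, 4.85, 17.84),
    "British Virgin Islands": (2.32, 10.91, 4.99, 17.28),
    "Luxembourg": (2.13, 11.37, 7.24, 17.16),
    "Cayman Islands": (2.1, 12.11, 5.53, 14.4),
    "Singapore": (1.89, 8.27, 3.43, 14.05),
    "Nauru": (0.55, 24.95, 5.87, 13.63),
}


def migration_direction(growth, birth, death, migration):
    """Hypothetical helper: return 'inflow' or 'outflow'.

    Assumes growth ~= |birth - death +/- migration| / 10, i.e. that the table
    stores unsigned magnitudes for both growth and migration.
    """
    natural = birth - death
    error_if_inflow = abs(abs(natural + migration) / 10 - growth)
    error_if_outflow = abs(abs(natural - migration) / 10 - growth)
    return "inflow" if error_if_inflow <= error_if_outflow else "outflow"


directions = {name: migration_direction(*vals) for name, vals in ROWS.items()}
for name, direction in directions.items():
    print(f"{name:32} {direction}")
```

Under these assumptions, the five high-growth countries come out as net inflows and the five low-growth countries as net outflows.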