# Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like:

- population - The population as of 2015.
- population_growth - The annual population growth rate, as a percentage.
- area - The total land and water area, in square kilometers.

We'll use the following code to connect our Jupyter Notebook to our database file:

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

Let's query the database to see the tables we can work with.

In [3]:
%%sql
SELECT * FROM sqlite_master WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
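The same inspection can be done without the `%%sql` magic. The sketch below is a minimal example using Python's built-in `sqlite3` module. It builds a small in-memory stand-in for `factbook.db` (an assumption so the example runs anywhere, with only a few of the real columns) and then lists the tables and the columns of `facts`.

```python
import sqlite3

# In-memory stand-in for factbook.db (assumption: a reduced copy of the schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)

# sqlite_master holds one row per schema object; keep only the tables.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(facts)")]

print(table_names)
print(columns)
```

Selecting only `name` keeps the listing readable, and `PRAGMA table_info` avoids having to parse the long `CREATE TABLE` string by eye.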


Let's query the first 5 rows of the facts table to take a closer look at the data.

In [4]:
%%sql
SELECT * FROM facts LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Let's start by calculating some summary statistics and looking for any outlier countries.

In [5]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts;

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


In [12]:
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MIN(population)
                    FROM facts);

Done.


name,population
Antarctica,0


In [13]:
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MAX(population)
                    FROM facts);

Done.


name,population
World,7256490011


The table contains a row for the whole world, which explains the maximum population of over 7.2 billion. It also contains a row for Antarctica, which explains the minimum population of 0. This matches the CIA Factbook [page for Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html):

![](https://s3.amazonaws.com/dq-content/257/fb_antarctica.png)
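Since the `World` and `Antarctica` rows are not ordinary countries, one option is to filter them out before computing the minimum and maximum. The sketch below illustrates this on an in-memory table holding four rows copied from the outputs above. Whether these rows should be excluded is a judgment call, so treat the filter as an assumption rather than the notebook's method.

```python
import sqlite3

# Small in-memory sample; the values are taken from the query outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Andorra", 85580),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Exclude the aggregate row and the unpopulated territory by name.
filtered = conn.execute(
    "SELECT MIN(population), MAX(population) "
    "FROM facts "
    "WHERE name NOT IN ('World', 'Antarctica')"
).fetchone()

print(filtered)
```

On this sample the minimum becomes Andorra's population and the maximum becomes Afghanistan's, instead of 0 and the world total.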

Next, let's calculate the average population and average area.

In [11]:
%%sql
SELECT AVG(population) avg_population, AVG(area) avg_area
FROM facts;

Done.


avg_population,avg_area
62094928.32231405,555093.546184739


Next, let's see if we can find all countries meeting both of the following conditions:

- The population is above average.
- The area is below average.

In [15]:
%%sql
SELECT name, population, area
FROM facts
WHERE population > (SELECT AVG(population) FROM facts)
AND area < (SELECT AVG(area) FROM facts);

Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Japan,126919659,377915
Philippines,100998376,300000
Thailand,67976405,513120
United Kingdom,64088222,243610
Vietnam,94348835,331210
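Because the table includes a `World` row, the averages used in the subqueries are pulled upward by the world total. The sketch below shows one possible variation that excludes `World` from both the averages and the result. It runs on a four-country in-memory sample taken from the outputs above. The `World` area is not shown in this notebook, so it is left as `NULL` here (an assumption for illustration only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Algeria", 39542166, 2381741),
        ("Bangladesh", 168957745, 148460),
        ("Germany", 80854408, 357022),
        ("World", 7256490011, None),  # area unknown in this sample
    ],
)

# Original query shape: World is part of both averages.
with_world = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM facts "
        "WHERE population > (SELECT AVG(population) FROM facts) "
        "AND area < (SELECT AVG(area) FROM facts) "
        "ORDER BY name"
    )
]

# Variation: drop World from the averages and from the result set.
without_world = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM facts "
        "WHERE name <> 'World' "
        "AND population > (SELECT AVG(population) FROM facts WHERE name <> 'World') "
        "AND area < (SELECT AVG(area) FROM facts WHERE name <> 'World') "
        "ORDER BY name"
    )
]

print(with_world)
print(without_world)
```

On this tiny sample the world total raises the average population so far that no country exceeds it. With `World` excluded, Bangladesh and Germany qualify. The effect on the full table is smaller, but the direction is the same.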


Let's find out which countries have the highest ratios of water to land.

In [18]:
%%sql
SELECT name, area_land, area_water, CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT) AS water_land_ratio
FROM facts
ORDER BY water_land_ratio DESC
LIMIT 5;

Done.


name,area_land,area_water,water_land_ratio
British Indian Ocean Territory,60,54340,905.6666666666666
Virgin Islands,346,1564,4.520231213872832
Puerto Rico,8870,4921,0.5547914317925592
"Bahamas, The",10010,3870,0.3866133866133866
Guinea-Bissau,28120,8005,0.2846728307254623
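The `CAST` in the query above matters because both area columns are integers, and SQLite truncates the result when dividing one integer by another. The sketch below compares the two forms on an in-memory sample with two rows taken from the outputs above. It also shows that casting only one operand is enough, which is a simplification of the query rather than a correction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 27398, 1350),
        ("British Indian Ocean Territory", 60, 54340),
    ],
)

rows = conn.execute(
    "SELECT name, "
    "area_water / area_land AS int_ratio, "                  # integer division
    "CAST(area_water AS FLOAT) / area_land AS float_ratio "  # one cast suffices
    "FROM facts "
    "ORDER BY float_ratio DESC"
).fetchall()

# SQLite returns NULL (None in Python) for division by zero instead of raising.
zero_division = conn.execute("SELECT CAST(10 AS FLOAT) / 0").fetchone()[0]

for row in rows:
    print(row)
print(zero_division)
```

Without the cast, Albania's ratio collapses to 0 and the British Indian Ocean Territory's to 905, so the ordering of small ratios would be lost. Rows with zero land area produce `NULL` rather than an error.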


Now let's find out which countries have more water than land.

In [19]:
%%sql
SELECT name, area_land, area_water
FROM facts
WHERE area_water > area_land;

Done.


name,area_land,area_water
British Indian Ocean Territory,60,54340
Virgin Islands,346,1564


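Next, let's see which countries will add the most people to their population next year. Note that `population_growth` is a percentage, so the `people_added` values below are 100 times the actual number of people; the ranking is unaffected.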

In [21]:
%%sql
SELECT name, population * population_growth people_added
FROM facts
ORDER BY people_added DESC;

Done.


name,people_added
World,7837009211.88
India,1527068612.48
China,615368424.6
Nigeria,444827037.2000001
Pakistan,290665336.62
Ethiopia,287456216.91
Bangladesh,270332392.0
United States,250667713.92
Indonesia,235514180.08
"Congo, Democratic Republic of the",194469083.2

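Since `population_growth` is described at the top of this notebook as a percentage, the projected number of people added is `population * population_growth / 100`. The sketch below applies that formula to an in-memory sample and also filters out the `World` row. The country rows come from the first five rows shown earlier. The `World` growth rate of 1.08 is inferred from the output above (7837009211.88 / 7256490011), so treat it as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Angola", 19625353, 2.78),
        ("World", 7256490011, 1.08),  # growth rate inferred, not shown directly
    ],
)

# Divide by 100.0 to turn the percentage into a fraction of the population.
people_added = conn.execute(
    "SELECT name, population * population_growth / 100.0 AS people_added "
    "FROM facts "
    "WHERE name <> 'World' "
    "ORDER BY people_added DESC"
).fetchall()

for name, added in people_added:
    print(name, round(added))
```

On this sample Afghanistan adds roughly 755 thousand people and Angola roughly 546 thousand, which is a more plausible scale than the unscaled product.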

Lastly, let's find which countries have a higher death rate than birth rate.

In [22]:
%%sql
SELECT name, death_rate, birth_rate
FROM facts
WHERE death_rate > birth_rate;

Done.


name,death_rate,birth_rate
Austria,9.42,9.41
Belarus,13.36,10.7
Bosnia and Herzegovina,9.75,8.87
Bulgaria,14.44,8.92
Croatia,12.18,9.45
Czech Republic,10.34,9.63
Estonia,12.4,10.51
Germany,11.42,8.47
Greece,11.09,8.66
Hungary,12.73,9.16
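The result above comes back in table order, which makes it hard to see where the gap is largest. As a possible follow-up, the sketch below computes the difference between the two rates and sorts by it, using three rows copied from the output above in an in-memory table. The `natural_change` column name is introduced here for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate FLOAT, death_rate FLOAT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Austria", 9.41, 9.42),
        ("Bulgaria", 8.92, 14.44),
        ("Germany", 8.47, 11.42),
    ],
)

# Negative values mean more deaths than births; most negative comes first.
gaps = conn.execute(
    "SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_change "
    "FROM facts "
    "WHERE death_rate > birth_rate "
    "ORDER BY natural_change ASC"
).fetchall()

for row in gaps:
    print(row)
```

Among these three rows, Bulgaria shows the widest gap and Austria is almost balanced.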
