# Analyzing CIA Factbook Data Using SQL
In this project we'll be working with data from [The CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), which provides statistics on all countries on earth. The SQLite `factbook.db` database that we'll be using can be downloaded [here](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db).

We will use SQL to explore and analyze data from the aforementioned database. Let's begin by connecting our notebook to the database file:

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

To see which tables are available, we query `sqlite_master`:

In [2]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


Here we have information on what tables are available, what columns are present and what properties they have.
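
The same kind of schema inspection can be done outside the notebook magics. Below is a minimal sketch using Python's standard `sqlite3` module; it assumes a small in-memory stand-in table (hypothetical, not the real `factbook.db`) so that it runs without the download.

```python
import sqlite3

# In-memory stand-in for factbook.db; this three-column table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, population INTEGER)"
)

# List the tables, as the sqlite_master query above does.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
print(columns)
```

`PRAGMA table_info` gives the column list in structured form, which can be easier to read than the raw `CREATE TABLE` statement stored in `sqlite_master`.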

The main table seems to be `facts` so let's turn our attention to that table and begin by displaying a few rows:

In [3]:
%%sql
SELECT *
    FROM facts
    LIMIT 10;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46
6,ac,Antigua and Barbuda,442,442,0,92436,1.24,15.85,5.69,2.21
7,ar,Argentina,2780400,2736690,43710,43431886,0.93,16.64,7.33,0.0
8,am,Armenia,29743,28203,1540,3056382,0.15,13.61,9.34,5.8
9,as,Australia,7741220,7682300,58920,22751014,1.07,12.15,7.14,5.65
10,au,Austria,83871,82445,1426,8665550,0.55,9.41,9.42,5.56


Here are the descriptions for some of the columns:
- `name` - The name of the country.
- `area` - The country's total area (both land and water) in square kilometers.
- `area_land` - The country's land area in square kilometers.
- `area_water` - The country's water area in square kilometers.
- `population` - The country's population.
- `population_growth` - The country's population growth as a percentage.
- `birth_rate` - The country's birth rate, or the number of births a year per 1,000 people.
- `death_rate` - The country's death rate, or the number of deaths a year per 1,000 people.

## Summary Statistics
Let's begin processing the data by calculating some initial summary statistics:
- Minimum population
- Maximum population
- Minimum population growth
- Maximum population growth

In [4]:
%%sql
SELECT MIN(population) AS "Min pop.",
       MAX(population) AS "Max pop.",
       MIN(population_growth) AS "Min pop. gr.",
       MAX(population_growth) AS "Max pop. gr."
    FROM facts;

 * sqlite:///factbook.db
Done.


Min pop.,Max pop.,Min pop. gr.,Max pop. gr.
0,7256490011,0.0,4.02


A few noteworthy observations:
- The minimum population is 0. A country without inhabitants is unusual and should be looked into.
- The maximum population is 7,256,490,011, which is close to the population of the entire earth. This could be an error or an aggregate of all countries' populations and should be looked into.

## Exploring outliers

In [5]:
%%sql
SELECT *
    FROM facts
    WHERE population=(SELECT MIN(population)
                         FROM facts);

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


Antarctica is the entry with a population of 0. It indeed has no permanent inhabitants, only scientists visiting on missions. This seems to match the CIA Factbook [page for Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html).

In [6]:
%%sql
SELECT *
    FROM facts
    WHERE population=(SELECT MAX(population)
                         FROM facts);

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


As suspected, we have an entry representing the entire world.
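
The two lookups above use the same pattern: a scalar subquery computes the extreme value and the outer query returns the row that holds it. Here is a minimal sketch of that pattern on a made-up four-row table (the names and numbers are hypothetical), fetching both extremes in one query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Hypothetical sample rows, including an empty region and an aggregate row.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Antarctica", 0), ("Smallville", 100), ("Bigland", 900), ("World", 1000)],
)

# Return every row whose population equals the table-wide minimum or maximum.
extremes = conn.execute("""
    SELECT name, population
      FROM facts
     WHERE population = (SELECT MIN(population) FROM facts)
        OR population = (SELECT MAX(population) FROM facts)
     ORDER BY population
""").fetchall()

print(extremes)
```

If several rows tie for the minimum or maximum, this pattern returns all of them, which is usually what you want when hunting for outliers.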

## Summary Statistics Revisited
Since the table contains an entry for the entire world, that entry should be excluded from our summary statistics:

In [7]:
%%sql
SELECT MIN(population) AS "Min pop.",
       MAX(population) AS "Max pop.",
       MIN(population_growth) AS "Min pop. gr.",
       MAX(population_growth) AS "Max pop. gr."
    FROM facts
    WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


Min pop.,Max pop.,Min pop. gr.,Max pop. gr.
0,1367485388,0.0,4.02


This seems more reasonable. The maximum population likely belongs to China.

Let's continue by looking at the averages for population and area:

In [11]:
%%sql
SELECT CAST(AVG(population) AS int) AS "Average population",
       CAST(AVG(area) AS int) AS "Average area"
    FROM facts;

 * sqlite:///factbook.db
Done.


Average population,Average area
62094928,555093


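As we see, the average population is around 62.1 million and the average area is around 0.56 million km^2. Note that this query does not filter out the `World` row, so the population average is inflated by it.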
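
To illustrate how much an aggregate row such as `World` can shift `AVG`, here is a minimal sketch on a made-up four-row table. The names and numbers are hypothetical and only demonstrate the effect; they say nothing about the real averages in `factbook.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Three hypothetical countries plus a row that sums them up.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("A", 10), ("B", 20), ("C", 30), ("World", 60)],
)

# Average over every row, aggregate row included.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]

# Average over the countries only.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts WHERE name <> 'World'"
).fetchone()[0]

print(avg_all, avg_countries)
```

In this toy table the aggregate row raises the average from 20.0 to 30.0, so adding `WHERE name <> 'World'` to the averaging query is worth considering.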

## Dense populations
Let's extend our previous result and explore which countries are densely populated by listing those whose population density (population divided by area) is above the overall ratio of total population to total area.

In [29]:
%%sql
SELECT name, population/area as pop_density
    FROM facts
    WHERE pop_density > (SELECT SUM(population)/SUM(area) FROM facts)
    ORDER BY pop_density DESC
    LIMIT 25;

 * sqlite:///factbook.db
Done.


name,pop_density
Macau,21168
Monaco,15267
Singapore,8141
Hong Kong,6445
Gaza Strip,5191
Gibraltar,4876
Bahrain,1771
Maldives,1319
Malta,1310
Bermuda,1299


Several of the above are small city-states. Because essentially the entire territory consists of a city, their population density is high.
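
One detail of the density query is that `population` and `area` are both integer columns, so SQLite performs integer division and drops the fractional part. A minimal sketch of the difference, assuming a made-up two-row table (names and values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("X", 1500, 1000), ("Y", 500, 1000)],
)

# Integer / integer truncates; casting one operand to REAL keeps the fraction.
rows = conn.execute("""
    SELECT name,
           population / area AS int_density,
           CAST(population AS REAL) / area AS real_density
      FROM facts
     ORDER BY name
""").fetchall()

print(rows)
```

For the very dense places listed above the truncation hardly matters, but for sparsely populated countries an integer density of 0 hides the real value.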

## Growth rates explored
Let's continue by exploring population growth rates, starting with a plain sort on `population_growth`:

In [36]:
%%sql
SELECT name, population_growth
    FROM facts
    ORDER BY population_growth DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24
Qatar,3.07
Burkina Faso,3.03
Mali,2.98
Cook Islands,2.95
Iraq,2.93


The `population_growth` column also has a lot of missing values, primarily for small island nations. In an ascending sort SQLite places these NULLs first, ahead of every real growth rate. Also, recall that Antarctica has a population of 0, so a relative population growth is not meaningful for it.
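
A minimal sketch of how SQLite orders missing values, using a made-up three-row table (names and values are hypothetical): NULL sorts before any number in ascending order, and an `IS NOT NULL` filter removes those rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("A", 0.5), ("B", None), ("C", 0.1)],  # None is stored as NULL
)

# Ascending sort: the NULL row comes first.
with_nulls = conn.execute(
    "SELECT name, population_growth FROM facts ORDER BY population_growth"
).fetchall()

# Same sort after filtering out missing values.
without_nulls = conn.execute("""
    SELECT name, population_growth
      FROM facts
     WHERE population_growth IS NOT NULL
     ORDER BY population_growth
""").fetchall()

print(with_nulls)
print(without_nulls)
```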

Let's redo our calculations, excluding the missing values:

In [37]:
%%sql
SELECT name, population_growth
    FROM facts
    WHERE population_growth IS NOT NULL
    ORDER BY population_growth
    LIMIT 15;

 * sqlite:///factbook.db
Done.


name,population_growth
Holy See (Vatican City),0.0
Cocos (Keeling) Islands,0.0
Greenland,0.0
Pitcairn Islands,0.0
Greece,0.01
Norfolk Island,0.01
Tokelau,0.01
Falkland Islands (Islas Malvinas),0.01
Guyana,0.02
Slovakia,0.02


This seems like a much more reasonable result. The nations with the lowest growth rate tend to be either of a "special" nature that prohibits growth, such as Vatican City, or remote and/or cold places.

Worth noting is that no country has a negative growth rate in this data. One would expect negative growth in countries afflicted by war, extreme poverty, or starvation. However, such conditions can make it hard to collect accurate data, possibly leaving those countries with missing or inaccurate values.

Let's also look at the countries with the highest growth rate:

In [38]:
%%sql
SELECT name, population_growth
    FROM facts
    WHERE population_growth IS NOT NULL
    ORDER BY population_growth DESC
    LIMIT 15;

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24
Qatar,3.07
Burkina Faso,3.03
Mali,2.98
Cook Islands,2.95
Iraq,2.93


As expected, the nations with the highest growth rate tend to be developing nations.

Worth noting is that no really prosperous countries are on either list.

Many additional questions could be explored, but we'll settle for this for now.