# Analyzing CIA Factbook Data Using SQL

In this project, we do a quick analysis of the CIA World Factbook data using SQL. The SQLite database contains 2015 data for the countries of the world, including the columns `code`, `name`, `area`, `area_land`, `area_water`, `population`, `population_growth`, `birth_rate`, `death_rate` and `migration_rate`. More information about the data can be found [here](https://www.cia.gov/library/publications/resources/the-world-factbook/).

-------

## Connecting to the database

We load the `sql` extension and connect to the SQLite database file `factbook.db`.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

We do a quick check of the tables in the database.

In [2]:
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


## Exploring the database

We query the first five rows of the table to understand the data.

In [3]:
%%sql
SELECT *
FROM facts
LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### Analyzing the population and the population growth

We query the minimum and maximum values of population and population growth. The result shows one row with a population of zero and another with more than 7.2 billion, so both need a closer look.

In [4]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts;

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


We check which row has zero population and see that it is Antarctica. We can assume this is correct, because only researchers live there and they are not permanent residents.

In [5]:
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MIN(population) FROM facts);

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0


We check which row has a population of over 7.2 billion and see that it is a row for the whole 'World', so we know to leave that row out of any country-specific analysis.

In [6]:
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MAX(population) FROM facts);

 * sqlite:///factbook.db
Done.


name,population
World,7256490011


We now check which country has the largest population, excluding 'World', and we can see that it is China.

In [8]:
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MAX(population) FROM facts WHERE name != 'World');

 * sqlite:///factbook.db
Done.


name,population
China,1367485388


According to the description, the 'population_growth' is calculated from the 'birth_rate', 'death_rate' and 'migration_rate'.

We query which country has the maximum population growth that we saw before, together with its three rates.

In [21]:
%%sql
SELECT name, population, population_growth, birth_rate, death_rate, migration_rate
FROM facts
WHERE population_growth = (SELECT MAX(population_growth) FROM facts);

 * sqlite:///factbook.db
Done.


name,population,population_growth,birth_rate,death_rate,migration_rate
South Sudan,12042910,4.02,36.91,8.18,11.47
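
The relationship between the growth figure and the three rates can be checked by hand on the South Sudan row. The sketch below is not taken from the Factbook documentation: it assumes that the three rates are expressed per 1,000 inhabitants and that `population_growth` is a percentage, so the net change per 1,000 is divided by 10. The function name `growth_percent` and its `net_immigration` flag are hypothetical names introduced only for this illustration.

```python
def growth_percent(birth_rate, death_rate, migration_rate, net_immigration=True):
    """Approximate annual population growth in percent.

    Assumption: all three rates are per 1,000 inhabitants, so the net
    change per 1,000 is divided by 10 to express it as a percentage.
    """
    # The rows shown in this notebook contain no negative migration rates,
    # so the direction of migration has to be supplied by the caller.
    signed_migration = migration_rate if net_immigration else -migration_rate
    net_per_thousand = birth_rate - death_rate + signed_migration
    return round(net_per_thousand / 10, 2)


# South Sudan row from the query above (reported population_growth: 4.02).
south_sudan = growth_percent(36.91, 8.18, 11.47)
print("South Sudan:", south_sudan)
```

Under these assumptions the South Sudan figure matches the stored value when its migration rate is counted as net immigration. Other rows may only match when the migration rate is subtracted instead, which is why the direction is left as a parameter.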


We do the same to find the countries with the minimum 'population_growth'. Most of the rows returned have no values for the three rates, so we cannot be sure about the results.

In [22]:
%%sql
SELECT name, population, population_growth, birth_rate, death_rate, migration_rate
FROM facts
WHERE population_growth = (SELECT MIN(population_growth) FROM facts);

 * sqlite:///factbook.db
Done.


name,population,population_growth,birth_rate,death_rate,migration_rate
Holy See (Vatican City),842,0.0,,,
Cocos (Keeling) Islands,596,0.0,,,
Greenland,57733,0.0,14.48,8.49,5.98
Pitcairn Islands,48,0.0,,,
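
One way to deal with the missing values is to keep only the rows where all three rates are present. The sketch below uses Python's built-in `sqlite3` module and an in-memory table holding just the four rows returned above, so it runs without `factbook.db`. The reduced table layout is an assumption made for the example, not the real schema.

```python
import sqlite3

# In-memory stand-in for the facts table, limited to the four rows above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL, "
    "birth_rate REAL, death_rate REAL, migration_rate REAL)"
)
rows = [
    ("Holy See (Vatican City)", 842, 0.0, None, None, None),
    ("Cocos (Keeling) Islands", 596, 0.0, None, None, None),
    ("Greenland", 57733, 0.0, 14.48, 8.49, 5.98),
    ("Pitcairn Islands", 48, 0.0, None, None, None),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same minimum-growth filter as above, restricted to rows with complete rates.
complete = conn.execute(
    """
    SELECT name
    FROM facts
    WHERE population_growth = (SELECT MIN(population_growth) FROM facts)
      AND birth_rate IS NOT NULL
      AND death_rate IS NOT NULL
      AND migration_rate IS NOT NULL
    """
).fetchall()
print(complete)
```

With these four rows, only Greenland passes the filter.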


### Analyzing the population and the land area

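We now query the average population and the average land area, which are around 62 million and 522,000 respectively. Note that the query averages over every row in the table, including the 'World' row, which inflates the population average.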

In [24]:
%%sql
SELECT ROUND(AVG(population)) AS average_population, ROUND(AVG(area_land)) AS average_land_area
FROM facts;

 * sqlite:///factbook.db
Done.


average_population,average_land_area
62094928.0,522703.0
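
Because the query above has no `WHERE` clause, the 'World' row is part of the average. The sketch below illustrates the effect on a small in-memory sample: the five countries from the first query plus the 'World' row. It is not the full table, so the numbers it prints differ from the notebook output; the point is only the comparison between the two averages.

```python
import sqlite3

# Six-row sample taken from outputs shown earlier in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
sample_rows = [
    ("Afghanistan", 32564342),
    ("Albania", 3029278),
    ("Algeria", 39542166),
    ("Andorra", 85580),
    ("Angola", 19625353),
    ("World", 7256490011),
]
conn.executemany("INSERT INTO facts VALUES (?, ?)", sample_rows)

# Average over every row, as in the query above.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]

# Average over countries only, leaving the aggregate row out.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts WHERE name != 'World'"
).fetchone()[0]

print(round(avg_all), round(avg_countries))
```

Adding `WHERE name != 'World'` to the notebook query would give a country-level average, and the same condition would then belong in the subqueries of the over-population query.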


We can now do a quick analysis of which countries might be over-populated, defined here as having an above-average population and a below-average land area. The same query calculates the population density of each of these countries.

In [28]:
%%sql
SELECT name, population, area_land, ROUND(population / CAST(area_land AS FLOAT), 2) AS population_density
FROM facts
WHERE population > (SELECT AVG(population) FROM facts) AND area_land < (SELECT AVG(area_land) FROM facts)
ORDER BY population_density DESC;

 * sqlite:///factbook.db
Done.


name,population,area_land,population_density
Bangladesh,168957745,130170,1297.98
Japan,126919659,364485,348.22
Philippines,100998376,298170,338.73
Vietnam,94348835,310070,304.28
United Kingdom,64088222,241930,264.9
Germany,80854408,348672,231.89
Thailand,67976405,510890,133.05
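
The `CAST(area_land AS FLOAT)` in the query matters because SQLite performs integer division when both operands are integers. A minimal sketch with Python's built-in `sqlite3` module, using the Bangladesh figures from the output above as literals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same division twice: integer / integer, then integer / float.
integer_result, float_result = conn.execute(
    "SELECT 168957745 / 130170, 168957745 / CAST(130170 AS FLOAT)"
).fetchone()

print(integer_result)           # fractional part is dropped
print(round(float_result, 2))   # matches the population_density column
```

Without the cast, the density would be truncated to a whole number before `ROUND` is applied.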


### Analyzing land area and water area

We want to know which countries have a larger area of water than area of land. We query this information and also calculate the share of the total area that is water. Despite the column name `water_percentage`, the value is a fraction between 0 and 1, not a percentage.

In [37]:
%%sql
SELECT name, area_land, area_water, ROUND(CAST(area_water AS FLOAT) / area, 3) AS water_percentage
FROM facts
WHERE area_water > area_land;

 * sqlite:///factbook.db
Done.


name,area_land,area_water,water_percentage
British Indian Ocean Territory,60,54340,0.999
Virgin Islands,346,1564,0.819
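
If a true percentage is preferred, the ratio can be multiplied by 100. The sketch below does this on an in-memory table holding the two rows above. Since the `area` values are not shown in the output, it assumes that the total area equals `area_land + area_water`; the column name `water_percent` is also just an illustrative choice.

```python
import sqlite3

# In-memory stand-in containing only the two rows returned above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
    ],
)

# 100.0 forces floating-point arithmetic, so no CAST is needed.
water_share = conn.execute(
    """
    SELECT name,
           ROUND(100.0 * area_water / (area_land + area_water), 1) AS water_percent
    FROM facts
    WHERE area_water > area_land
    ORDER BY water_percent DESC
    """
).fetchall()
print(water_share)
```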


### Analyzing birth rate and death rate

We now want to know which countries have a higher death rate than birth rate. We query this information and calculate the difference between the two rates, keeping only the countries whose difference is greater than 1.0 and ordering the results from the largest difference to the smallest. We can see that most of these countries are in Europe.

In [45]:
%%sql
SELECT name, death_rate, birth_rate, ROUND(death_rate - birth_rate, 2) AS difference
FROM facts
WHERE death_rate > birth_rate AND difference > 1.0
ORDER BY difference DESC;

 * sqlite:///factbook.db
Done.


name,death_rate,birth_rate,difference
Bulgaria,14.44,8.92,5.52
Serbia,13.66,9.08,4.58
Latvia,14.31,10.0,4.31
Lithuania,14.27,10.1,4.17
Ukraine,14.46,10.72,3.74
Hungary,12.73,9.16,3.57
Germany,11.42,8.47,2.95
Slovenia,11.37,8.42,2.95
Romania,11.9,9.14,2.76
Croatia,12.18,9.45,2.73
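
The query above refers to the alias `difference` inside the `WHERE` clause. SQLite accepts this, but standard SQL and engines such as PostgreSQL do not. A more portable form repeats the expression in the filter, which also makes the separate `death_rate > birth_rate` condition unnecessary. The sketch below runs that form on an in-memory table with a few rows taken from outputs earlier in this notebook; the reduced table is an assumption for the example.

```python
import sqlite3

# In-memory stand-in with two rows that qualify and two that do not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bulgaria", 8.92, 14.44),
        ("Germany", 8.47, 11.42),
        ("Afghanistan", 38.57, 13.89),
        ("Albania", 12.92, 6.58),
    ],
)

# The filter repeats the expression instead of using the SELECT alias.
declining = conn.execute(
    """
    SELECT name, ROUND(death_rate - birth_rate, 2) AS difference
    FROM facts
    WHERE death_rate - birth_rate > 1.0
    ORDER BY difference DESC
    """
).fetchall()
print(declining)
```

Using the alias in `ORDER BY` is fine in both cases, since that is allowed by standard SQL.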


### Analyzing the migration rate

We create a query to find the 10 countries with the highest migration rate.  

In [51]:
%%sql
SELECT name, migration_rate
FROM facts
ORDER BY migration_rate DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,migration_rate
Qatar,22.39
American Samoa,21.13
"Micronesia, Federated States of",20.93
Syria,19.79
Tonga,17.84
British Virgin Islands,17.28
Luxembourg,17.16
Cayman Islands,14.4
Singapore,14.05
Nauru,13.63


Now we create a query to see which countries have a higher migration rate than birth rate. 

In [53]:
%%sql
SELECT name, migration_rate, birth_rate
FROM facts
WHERE migration_rate > birth_rate
ORDER BY migration_rate DESC;

 * sqlite:///factbook.db
Done.


name,migration_rate,birth_rate
Qatar,22.39,9.84
"Micronesia, Federated States of",20.93,20.54
British Virgin Islands,17.28,10.91
Luxembourg,17.16,11.37
Cayman Islands,14.4,12.11
Singapore,14.05,8.27
Saint Pierre and Miquelon,8.49,7.42
