# Analyzing CIA Factbook Using SQL

# Introduction

In this project we will explore [CIA World Factbook data](https://www.cia.gov/library/publications/the-world-factbook/). The Factbook contains demographic information such as:
 * population - The population as of 2015.
 * population_growth - The annual population growth rate, as a percentage.
 * area - The total land and water area.

The purpose of this project is to explore and compare demographic information across different parts of the world.

# Overview of Data

Let's first set up our environment to work with SQLite and connect to the Factbook database.

In [1]:
import sqlite3 as sql

In [2]:
%%capture
%load_ext sql

In [3]:
%sql sqlite:///factbook.db

'Connected: @factbook.db'

Next, let's preview the data.

In [4]:
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
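
The same schema lookup can also be run without the `%sql` magic, through the `sqlite3` module imported in the first cell. The sketch below is only an illustration: it uses a throwaway in-memory database with a cut-down, hypothetical `facts` table instead of the real `factbook.db`, so that it runs on its own.

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)

# sqlite_master lists every schema object; keep only the tables.
tables = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

for table_name, ddl in tables:
    print(table_name, "->", ddl)

conn.close()
```

Because `id` is declared `AUTOINCREMENT`, SQLite also creates its internal `sqlite_sequence` bookkeeping table, which is why that table shows up next to `facts` in the listing.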


In [5]:
%%sql
-- Select the facts table and show the first 5 rows
SELECT *
FROM facts
LIMIT 5

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


# Summary Statistics

In [6]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
FROM facts

 * sqlite:///factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


In our data set, the lowest population is 0 and the highest is 7,256,490,011. The annual population growth rate ranges from 0% to 4.02%.

# Exploring Outliers

Let's look at the two extremes: the least and the most populated regions.

In [7]:
%%sql
SELECT *
FROM facts
WHERE population = (SELECT MAX(population) FROM facts)

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


Since World is an aggregate of all regions rather than a region itself, we can exclude it from our data set.

In [8]:
%%sql
SELECT *
FROM facts
WHERE population = (SELECT MAX(population) FROM facts WHERE name <> 'World')

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44


In [9]:
%%sql 
SELECT *
FROM facts
WHERE population = (SELECT MIN(population) FROM facts WHERE name <> 'World')

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


The most populated region is China, while Antarctica has a population of 0.

# Finding Densely Populated Regions

Let's look into which regions have:
 * Above average values for population
 * Below average values for area

In [10]:
%%sql
-- Calculate the average population and area
SELECT AVG(population), AVG(area)
FROM facts
WHERE name <> 'World' -- Exclude World since it's the aggregate of all regions

 * sqlite:///factbook.db
Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739


In [11]:
%%sql 
SELECT name, population, area 
  FROM facts
 WHERE name <> 'World'
   AND population > (SELECT AVG(population) FROM facts WHERE name <> 'World')
   AND area < (SELECT AVG(area) FROM facts WHERE name <> 'World')

 * sqlite:///factbook.db
Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Iraq,37056169,438317
Italy,61855120,301340
Japan,126919659,377915
"Korea, South",49115196,99720
Morocco,33322699,446550
Philippines,100998376,300000
Poland,38562189,312685
Spain,48146134,505370
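
The filter above is a proxy for density. A more direct measure is population divided by area, which the following sketch computes for a few rows copied from the outputs above. It is a minimal illustration under stated assumptions: the in-memory table is a hypothetical stand-in for `factbook.db`, and the column name `people_per_sq_km` assumes the Factbook reports area in square kilometres, which the table itself does not state.

```python
import sqlite3

# Assumption: an in-memory table with a handful of rows stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 168957745, 148460),
        ("Germany", 80854408, 357022),
        ("Japan", 126919659, 377915),
        ("Korea, South", 49115196, 99720),
        ("China", 1367485388, 9596960),
    ],
)

# CAST avoids integer division; NULLIF makes a zero area yield NULL explicitly.
density = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / NULLIF(area, 0) AS people_per_sq_km
      FROM facts
     ORDER BY people_per_sq_km DESC
    """
).fetchall()

for name, people_per_sq_km in density:
    print(f"{name}: {people_per_sq_km:.1f}")

conn.close()
```

Among these sample rows Bangladesh ranks first and China last, even though China has by far the largest population.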


# Land and Water Area by Region

Let's look into which regions have:
 * Highest ratios of water to land
 * More water than land

In [12]:
%%sql 
SELECT name, area, area_land, area_water, CAST(area_water AS float) / CAST(area_land AS float) AS water_land_ratio
  FROM facts
 WHERE name <> 'World'
 ORDER BY water_land_ratio DESC
 LIMIT 5

 * sqlite:///factbook.db
Done.


name,area,area_land,area_water,water_land_ratio
British Indian Ocean Territory,54400,60,54340,905.6666666666666
Virgin Islands,1910,346,1564,4.520231213872832
Puerto Rico,13791,8870,4921,0.5547914317925592
"Bahamas, The",13880,10010,3870,0.3866133866133866
Guinea-Bissau,36125,28120,8005,0.2846728307254623


British Indian Ocean Territory, Virgin Islands, Puerto Rico, The Bahamas, and Guinea-Bissau are the five regions with the highest water-to-land ratios. British Indian Ocean Territory and Virgin Islands are the only two with a ratio greater than 1, which means they are the only regions that have more water than land.
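
The "more water than land" question can also be answered with a direct comparison instead of reading it off the ratio. The sketch below assumes a hypothetical in-memory `facts` table filled with a few rows copied from the outputs above; it is not run against the real `factbook.db`.

```python
import sqlite3

# Assumption: an in-memory table with a handful of rows stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("British Indian Ocean Territory", 54400, 60, 54340),
        ("Virgin Islands", 1910, 346, 1564),
        ("Puerto Rico", 13791, 8870, 4921),
        ("Afghanistan", 652230, 652230, 0),
        ("Antarctica", None, 280000, None),  # missing values are stored as NULL
    ],
)

# A comparison against NULL is never true, so rows with missing areas drop out.
more_water = conn.execute(
    """
    SELECT name
      FROM facts
     WHERE area_water > area_land
     ORDER BY name
    """
).fetchall()

print([row[0] for row in more_water])

conn.close()
```

Rows with a missing `area_water`, such as Antarctica, are silently excluded by the comparison, so they need no separate filter.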

# Population by Region

Let's look into which regions have:
 * Highest population growth rate
 * Higher death rate than birth rate

In [13]:
%%sql 
SELECT name, population, population_growth
  FROM facts
 WHERE name <> 'World'
 ORDER BY population_growth DESC
 LIMIT 5

 * sqlite:///factbook.db
Done.


name,population,population_growth
South Sudan,12042910,4.02
Malawi,17964697,3.32
Burundi,10742276,3.28
Niger,18045729,3.25
Uganda,37101745,3.24


South Sudan has the highest population growth rate, at 4.02%. Note that the highest rate does not imply the largest absolute increase: a more populous region with a lower rate can add more people per year.
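
A growth rate is relative, so it does not translate directly into head count. As a rough check, the sketch below estimates the number of people added in one year as `population * population_growth / 100`, using rows copied from the outputs above. The in-memory table and the `people_added` column are hypothetical names introduced for illustration, and the estimate ignores any change in the rate during the year.

```python
import sqlite3

# Assumption: an in-memory table with a handful of rows stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("South Sudan", 12042910, 4.02),
        ("Malawi", 17964697, 3.32),
        ("Burundi", 10742276, 3.28),
        ("Niger", 18045729, 3.25),
        ("Uganda", 37101745, 3.24),
        ("China", 1367485388, 0.45),
    ],
)

# Approximate people added in one year = population * growth rate (%) / 100.
added = conn.execute(
    """
    SELECT name, CAST(population * population_growth / 100 AS INTEGER) AS people_added
      FROM facts
     ORDER BY people_added DESC
    """
).fetchall()

for name, people_added in added:
    print(f"{name}: {people_added:,}")

conn.close()
```

Under this estimate China and Uganda both add more people than South Sudan, despite their lower growth rates.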

In [14]:
%%sql 
SELECT name, death_rate, birth_rate
  FROM facts
 WHERE name <> 'World'
   AND death_rate > birth_rate

 * sqlite:///factbook.db
Done.


name,death_rate,birth_rate
Austria,9.42,9.41
Belarus,13.36,10.7
Bosnia and Herzegovina,9.75,8.87
Bulgaria,14.44,8.92
Croatia,12.18,9.45
Czech Republic,10.34,9.63
Estonia,12.4,10.51
Germany,11.42,8.47
Greece,11.09,8.66
Hungary,12.73,9.16


In each of these regions the death rate exceeds the birth rate, so the population is shrinking through natural change (before accounting for migration).
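
The size of that gap can be made explicit by subtracting the two rates. The sketch below does so for a few rows copied from the outputs above, using a hypothetical in-memory table and a hypothetical `natural_change` column. It assumes both rates are expressed per 1,000 inhabitants, which is how the Factbook usually reports them but is not stated in the table itself.

```python
import sqlite3

# Assumption: an in-memory table with a handful of rows stands in for factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Austria", 9.41, 9.42),
        ("Bulgaria", 8.92, 14.44),
        ("Germany", 8.47, 11.42),
    ],
)

# Natural change = births minus deaths; negative values mean natural decline.
natural_change = conn.execute(
    """
    SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_change
      FROM facts
     ORDER BY natural_change
    """
).fetchall()

for name, change in natural_change:
    print(f"{name}: {change:+.2f}")

conn.close()
```

Sorting by the difference shows how unevenly the decline is distributed: Austria's rates are nearly balanced, while Bulgaria's gap is several points wide.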