# <center> Analysing CIA Factbook Data Using SQL</center>
<div>
<img src="Data/Seal_of_the_Central_Intelligence_Agency.svg.png" width="300"/>
</div>

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:
- <mark>population</mark>: the population of each country or territory.
- <mark>population_growth</mark>: the annual population growth rate, as a percentage.
- <mark>area</mark>: the total land and water area, in square kilometres.

We are going to use SQL to access this factbook and answer questions on the attributes of different nations.

## Setting up SQL in the Jupyter Notebook environment

Because we are using SQLite to analyse the Factbook data, we first need to load the SQL extension and connect to the database file so that both are available later in the notebook.

In [1]:
import pandas as pd
import numpy as np
import sqlite3 as sql

In [2]:
%%capture
%load_ext sql
# Loading the factbook database
%sql sqlite:///Data/factbook.db

Now we want to see which tables are contained in the SQLite database we loaded above. The query below reads <mark>sqlite_master</mark>, the built-in table that lists every object in the database.

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///Data/factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
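
The column list can also be read programmatically instead of from the raw `CREATE TABLE` string. A minimal sketch, assuming an empty in-memory table built from the statement reported above as a stand-in for `Data/factbook.db`, and using SQLite's `PRAGMA table_info`:

```python
import sqlite3

# Stand-in for Data/factbook.db: an empty in-memory table built from the
# CREATE TABLE statement reported by sqlite_master above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL,
        "area" integer, "area_land" integer, "area_water" integer,
        "population" integer, "population_growth" float,
        "birth_rate" float, "death_rate" float, "migration_rate" float)
""")

# PRAGMA table_info returns one row per column:
# (cid, name, declared type, notnull, default value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]
for name, col_type in columns:
    print(f"{name:<18} {col_type}")
```

Against the real database, the same `PRAGMA` call could be issued through the `%sql` magic or a `sqlite3` connection to the file.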


To look more closely at the <mark>facts</mark> table, we list its first five rows.

In [4]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///Data/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


We now want to start investigating some statistics. We are going to write a single query which recovers the following information:

- <mark>Minimum population</mark>
- <mark>Maximum population</mark>
- <mark>Minimum population growth</mark>
- <mark>Maximum population growth</mark>

In [5]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth)
  FROM facts;

 * sqlite:///Data/factbook.db
Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02


We see a few interesting things in the summary statistics above:

- There's a country with a population of 0
- There's a country with a population of 7256490011 (or more than 7.2 billion people)

We are now going to use subqueries to zoom in on just these countries without using the specific values.

## Minimum population:

In [6]:
%%sql
SELECT name
  FROM facts
 WHERE population = (SELECT MIN(population)
                       FROM facts);

 * sqlite:///Data/factbook.db
Done.


name
Antarctica


From this subquery we can see that the entry with a population of zero is <mark>Antarctica</mark>, which has no permanent residents.

## Maximum population:

In [7]:
%%sql
SELECT name
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts);

 * sqlite:///Data/factbook.db
Done.


name
World


From this subquery we can see that the entry which returns a population of 7.2 billion is the <mark>World</mark> itself.

We now want to recalculate our summary statistics, but without including the whole world in the calculation.

In [8]:
%%sql
SELECT MIN(population) AS 'Minimum Population',
       MAX(population) AS 'Maximum Population',
       MIN(population_growth) AS 'Minimum Population Growth',
       MAX(population_growth) AS 'Maximum Population Growth'
  FROM facts
 WHERE population != (SELECT MAX(population)
                       FROM facts);

 * sqlite:///Data/factbook.db
Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,1367485388,0.0,4.02


In [9]:
%%sql
SELECT ROUND(AVG(population), 2) AS 'Average Population', ROUND(AVG(area),2) AS 'Average Area'
  FROM facts
 WHERE population != (SELECT MAX(population)
                       FROM facts);

 * sqlite:///Data/factbook.db
Done.


Average Population,Average Area
32242666.57,582949.85
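
This query excludes the <mark>World</mark> row by comparing against the maximum population, whereas the next query filters on the name instead. The two filters are not guaranteed to select the same rows: in SQL, any comparison with `NULL` is itself `NULL`, so `population != (...)` also drops rows whose population is missing. A minimal sketch, assuming a small in-memory table with two rows from the preview above, the <mark>World</mark> row, and one made-up row with a `NULL` population:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 3029278, 28748),            # from the facts preview
        ("Andorra", 85580, 468),                # from the facts preview
        ("World", 7256490011, None),            # area left NULL here (assumption)
        ("Hypothetical Ocean", None, 1000000),  # made-up row, NULL population
    ],
)

# Filter used in the query above: compare against the maximum population.
by_max = conn.execute(
    "SELECT COUNT(*) FROM facts "
    "WHERE population != (SELECT MAX(population) FROM facts)"
).fetchone()[0]

# Filter used in the later queries: exclude the row by name.
by_name = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE name != 'World'"
).fetchone()[0]

print(by_max, by_name)
```

In this toy table the name filter keeps the made-up row while the population filter drops it. Whether the real <mark>facts</mark> table contains such rows would need to be checked, for example with `WHERE population IS NULL`.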


We now want to write an SQL query that will return countries with above average populations and below average areas.

In [10]:
%%sql
SELECT name, population, area
  FROM facts
 WHERE population >= (SELECT AVG(population)
                       FROM facts
                      WHERE name != 'World')
       AND
       area <= (SELECT AVG(area)
                  FROM facts
                 WHERE name != 'World');

 * sqlite:///Data/factbook.db
Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Iraq,37056169,438317
Italy,61855120,301340
Japan,126919659,377915
"Korea, South",49115196,99720
Morocco,33322699,446550
Philippines,100998376,300000
Poland,38562189,312685
Spain,48146134,505370


Some of these countries are generally known to be densely populated, so we have confidence in our results!

## Further Analyses

- Which country has the most people? Which country has the highest growth rate?
- Which countries have the highest ratios of water to land? Which countries have more water than land?
- Which countries will add the most people to their populations next year?
- Which countries have a higher death rate than birth rate?
- Which countries have the highest population/area ratio, and how does it compare to the list we found above?

## Which country has the most people? Which country has the highest growth rate?

In [11]:
%%sql
SELECT name, population
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts
                      WHERE name != 'World');

 * sqlite:///Data/factbook.db
Done.


name,population
China,1367485388


In [12]:
%%sql
SELECT name, population_growth
  FROM facts
 WHERE population_growth = (SELECT MAX(population_growth)
                              FROM facts
                             WHERE name != 'World');

 * sqlite:///Data/factbook.db
Done.


name,population_growth
South Sudan,4.02


These results tell us that <mark>China</mark> has the largest population on Earth, with **1.37 billion** people. <mark>South Sudan</mark> has the fastest population growth at **4.02%** per annum.

## Which countries have the highest ratios of water to land? Which countries have more water than land?

In [13]:
%%sql
SELECT name, ROUND(CAST(area_water AS float) / CAST(area_land AS float), 2) AS water_ratio
  FROM facts
 ORDER BY water_ratio DESC
 LIMIT 10;

 * sqlite:///Data/factbook.db
Done.


name,water_ratio
British Indian Ocean Territory,905.67
Virgin Islands,4.52
Puerto Rico,0.55
"Bahamas, The",0.39
Guinea-Bissau,0.28
Malawi,0.26
Netherlands,0.23
Uganda,0.22
Eritrea,0.16
Liberia,0.16


In [14]:
%%sql
SELECT name, ROUND(CAST(area_water AS float) / CAST(area_land AS float), 2) AS water_ratio
  FROM facts
 WHERE area_water > area_land;

 * sqlite:///Data/factbook.db
Done.


name,water_ratio
British Indian Ocean Territory,905.67
Virgin Islands,4.52


The <mark>British Indian Ocean Territory</mark> has an anomalously high ratio of **905.67**, while the <mark>Virgin Islands</mark> also has a high ratio of **4.52**. These high ratios make sense, as both are archipelagos surrounded by vast areas of ocean.

The second query confirms this: these are the only two entries with more water than land.

However, it is curious that only these two have ratios above 1, since many other countries are islands surrounded by significant bodies of water.
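
When reading a ranking like this, it can help to check how the ratio behaves for entries with no land area. In SQLite, dividing by zero yields `NULL` rather than an error, and `NULL` values sort last under `ORDER BY ... DESC`, so such rows quietly drop to the bottom of the list. A minimal sketch, assuming an in-memory table with two rows from the <mark>facts</mark> preview and one made-up row with zero land area:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 0),      # from the facts preview
        ("Albania", 27398, 1350),        # from the facts preview
        ("Hypothetical Atoll", 0, 10),   # made-up row with no land area
    ],
)

# Same expression as the notebook query; the zero-land row evaluates to NULL.
rows = conn.execute("""
    SELECT name,
           ROUND(CAST(area_water AS float) / CAST(area_land AS float), 2) AS water_ratio
      FROM facts
     ORDER BY water_ratio DESC
""").fetchall()

for name, ratio in rows:
    print(name, ratio)
```

The made-up row comes back with a ratio of `None` and is listed last, so an entry like it would never appear in a top-ten ranking.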

## Which countries will add the most people to their populations next year?

In [15]:
%%sql
SELECT name, (ROUND(CAST(population AS float) * CAST(population_growth AS float), 2)) / 100 AS new_people
  FROM facts
 WHERE name != 'World'
 ORDER BY new_people DESC
 LIMIT 5;

 * sqlite:///Data/factbook.db
Done.


name,new_people
India,15270686.1248
China,6153684.246
Nigeria,4448270.372
Pakistan,2906653.3662
Ethiopia,2874562.1691


According to this estimate, <mark>India</mark> will see the largest population increase next year, adding about **15.3 million** people. Note that the growth rate reflects net change in population, not just births. The next country on the list is <mark>China</mark>, with an expected population increase of about **6.2 million** people.
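
The estimate is simply the population multiplied by the growth rate, divided by 100 because the rate is stored as a percentage. A minimal sketch, assuming an in-memory table that holds only the five rows from the <mark>facts</mark> preview (so the ranking differs from the full result above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Algeria", 39542166, 1.84),
        ("Andorra", 85580, 0.12),
        ("Angola", 19625353, 2.78),
    ],
)

# population * growth% / 100 gives the expected net change in people per year.
top = conn.execute("""
    SELECT name, ROUND(population * population_growth / 100.0, 2) AS new_people
      FROM facts
     ORDER BY new_people DESC
     LIMIT 3
""").fetchall()

for name, new_people in top:
    print(name, new_people)
```

For Afghanistan this works out to 32,564,342 × 2.32 / 100, or roughly 755,000 additional people.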

## Which countries have a higher death rate than birth rate?

In [16]:
%%sql
SELECT name, death_rate, birth_rate, ROUND((death_rate / birth_rate ), 2) AS death_over_life
  FROM facts
 WHERE death_over_life > 1
 ORDER BY death_over_life DESC;

 * sqlite:///Data/factbook.db
Done.


name,death_rate,birth_rate,death_over_life
Bulgaria,14.44,8.92,1.62
Serbia,13.66,9.08,1.5
Latvia,14.31,10.0,1.43
Lithuania,14.27,10.1,1.41
Hungary,12.73,9.16,1.39
Monaco,9.24,6.65,1.39
Germany,11.42,8.47,1.35
Slovenia,11.37,8.42,1.35
Ukraine,14.46,10.72,1.35
Saint Pierre and Miquelon,9.72,7.42,1.31


The above countries have a higher death rate than birth rate, so more of their people die each year than are born. Without net immigration, we would expect these populations to decline in overall size.
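
The difference between the two rates gives the natural change directly. The sketch below assumes, as in the Factbook's definitions, that both rates are expressed per 1,000 people. It uses an in-memory table with a few rows taken from the outputs above plus one made-up row, and it compares the columns directly rather than filtering on a rounded ratio, since rounding can hide ratios only slightly above 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate FLOAT, death_rate FLOAT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bulgaria", 8.92, 14.44),            # from the result above
        ("Germany", 8.47, 11.42),             # from the result above
        ("Albania", 12.92, 6.58),             # from the facts preview
        ("Hypothetical Land", 10.0, 10.04),   # made-up row, ratio 1.004
    ],
)

# Direct comparison: natural change per 1,000 people (assumed unit).
direct = conn.execute("""
    SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_change
      FROM facts
     WHERE death_rate > birth_rate
     ORDER BY natural_change
""").fetchall()

# Filtering on the rounded ratio drops the made-up row (1.004 rounds to 1.0).
rounded = [row[0] for row in conn.execute("""
    SELECT name
      FROM facts
     WHERE ROUND(death_rate / birth_rate, 2) > 1
     ORDER BY name
""")]

print(direct)
print(rounded)
```

In this toy table the direct comparison returns three rows while the rounded-ratio filter returns two, which shows how a borderline entry could be missed.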

## Which countries have the highest population/area ratio?

In [17]:
%%sql
SELECT name, population, area, population / area AS density
  FROM facts
 ORDER BY density DESC
 LIMIT 10;

 * sqlite:///Data/factbook.db
Done.


name,population,area,density
Macau,592731,28,21168
Monaco,30535,2,15267
Singapore,5674472,697,8141
Hong Kong,7141106,1108,6445
Gaza Strip,1869055,360,5191
Gibraltar,29258,6,4876
Bahrain,1346613,760,1771
Maldives,393253,298,1319
Malta,413965,316,1310
Bermuda,70196,54,1299


## Observations

We can see that <mark>Macau</mark> has the highest population density, with 21,168 $\frac{persons}{km^2}$. This is followed by <mark>Monaco</mark>, <mark>Singapore</mark> and <mark>Hong Kong</mark>. Notably, these are all city-states or city-sized territories, with the entire population living within a very small area ($2 - 1100\ km^2$), which results in high population densities.
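
The densities above are whole numbers because both <mark>population</mark> and <mark>area</mark> are integer columns, and SQLite truncates integer division. A minimal sketch, assuming an in-memory table with three rows from the result above, that casts one operand to a float to keep the fractional part:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Macau", 592731, 28),
        ("Monaco", 30535, 2),
        ("Singapore", 5674472, 697),
    ],
)

rows = conn.execute("""
    SELECT name,
           population / area AS int_density,                     -- integer division
           ROUND(CAST(population AS float) / area, 2) AS density -- float division
      FROM facts
     ORDER BY density DESC
""").fetchall()

for name, int_density, density in rows:
    print(name, int_density, density)
```

For Monaco, for example, integer division gives 15267 while the float version gives 15267.5. The ranking is unchanged here, but close entries could swap places in a longer list.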