# Analysing The CIA Factbook Using SQL
In this project we review the [CIA Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth.

The Factbook contains demographic information such as population, population growth rate, and land and water area for each country.

## Summary Of Results
- There is a great disparity between the most populated countries, China and India with over 1 billion people each, and the least populated entries, the Pitcairn Islands (~50 people) and Antarctica (0 people).
- The highest growth rates are seen in African countries.
- The lowest growth rates are generally seen on islands. Slovakia, Russia and Georgia are also among the lowest growth rates.
- An average country has about 30 million people and an area of about 540,000 sq km, but no country is similar (+/-10%) to the average on both measures.
- The most densely populated areas are Macau, Monaco, Singapore, Hong Kong and the Gaza Strip.
- Some of the small islands have very low land-to-water values.
- Unsurprisingly, some landlocked countries top the land-to-water list.
- Sahara desert countries are also not a surprise for high land-to-water values.
- Finland, the Netherlands and Bangladesh, which could by some measures be considered quite wet / humid, may be surprising entries; Finland, for example, has a land-to-water ratio of only about 8.8.
- The World is increasing by 78.4 million people a year through births minus deaths.
- African and South East Asian countries have disproportionate growth per population compared to China or the World. China has below average growth, but as we saw earlier it has a lot of people (~19% of the world's population lives in China).
- European countries, Japan and Russia are seen to have negative natural population change.
- The European Union, despite having no natural population increase listed, is experiencing a large amount of net migration (about 1.3 million a year for its population of 514 million, taking the migration rate as per 1,000 people).
- The United States has both high migration (comparable to the European Union) and a large natural increase.
- Italy and Russia are in the negative for births minus deaths, but both have significant migration: it more than offsets Italy's natural decrease and most of Russia's.
- About 9.6 million people migrated to a new country.


## Loading SQL and The Database 

In [1]:
%%capture 
%load_ext sql 
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

In [2]:
%%sql /*#(This needs to be run for each SQL query)*/
SELECT *
  FROM sqlite_master
 WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [3]:
%%sql /*# Lets take a peek at the first 5 rows*/
SELECT *
FROM facts
LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


The columns we have are:
- code - A two-letter abbreviation
- name - The English name of the country
- area - Size of land + water territory in square kilometers
- area_land and area_water - Sizes of land and water territory (respectively) in square kilometers
- population - A best estimate of the number of residents (from censuses, surveys and trends)
- population_growth - The annual percentage change in population. Positive values indicate positive growth.
- birth_rate, death_rate - The number of births and deaths (respectively) per 1,000 people (also called the crude birth rate and crude death rate)
- migration_rate - The net migration rate: the number of people migrating to the country minus the number of people migrating out of the country, per 1,000 people

## Initial Population Statistics
Let's explore the minimums and maximums of population and population growth.
Let's also look at some of the least populated countries.

In [4]:
%%sql 
SELECT Min(population),Max(population), 
Min(population_growth),Max(population_growth)
FROM facts;

Done.


Min(population),Max(population),Min(population_growth),Max(population_growth)
0,7256490011,0.0,4.02


There is at least one entry with a population of zero.

The maximum population is 7,256,490,011 (about 7.3 billion people), which is suspiciously large for a single country.

#### What are the least populated countries?

In [5]:
%%sql 
SELECT name, population, population_growth
FROM facts
ORDER by population
LIMIT 25;

Done.


name,population,population_growth
Ashmore and Cartier Islands,,
Coral Sea Islands,,
Heard Island and McDonald Islands,,
Clipperton Island,,
French Southern and Antarctic Lands,,
Bouvet Island,,
Jan Mayen,,
British Indian Ocean Territory,,
South Georgia and South Sandwich Islands,,
Navassa Island,,


`MIN` has ignored the 'None' (NULL) populations. Antarctica is the only entry listed with a population of zero.

Unsurprisingly, the oceans and military islands (such as Wake Island or the British Indian Ocean Territory) are listed as None for population.
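
The NULL behaviour described above can be reproduced outside the notebook. The following is a minimal sketch using Python's built-in `sqlite3` module and a small in-memory table; the table name `facts_sample` and the choice of three rows are assumptions for illustration, not the real `factbook.db`.

```python
import sqlite3

# Hypothetical three-row sample; the real facts table has many more rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts_sample (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?)",
    [("Antarctica", 0), ("Andorra", 85580), ("Bouvet Island", None)],
)

# Aggregate functions such as MIN skip NULL values.
min_population = conn.execute(
    "SELECT MIN(population) FROM facts_sample"
).fetchone()[0]

# In SQLite, NULLs sort before all other values in ascending order,
# which is why the 'None' rows appear first in the output above.
ordered = conn.execute(
    "SELECT name, population FROM facts_sample ORDER BY population"
).fetchall()

# Filtering NULLs explicitly keeps only rows with a recorded population.
non_null = conn.execute(
    "SELECT name FROM facts_sample "
    "WHERE population IS NOT NULL ORDER BY population"
).fetchall()

print(min_population)
print(ordered)
print(non_null)
conn.close()
```

Adding `WHERE population IS NOT NULL` to the notebook query would be one way to list the least populated entries that actually have a recorded population.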

#### What are the most populated countries?

In [6]:
%%sql 
SELECT name, population
FROM facts
ORDER by population DESC
LIMIT 5;

Done.


name,population
World,7256490011
China,1367485388
India,1251695584
European Union,513949445
United States,321368864


Now we can see clearly: the maximum population of 7+ billion is the world population stored in the database.
China and India are relatively close in population.
Their populations combined are 36% of the world's.
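
As a quick check of the 36% figure, here is a small sketch that computes each country's share of the World row with a scalar subquery. It uses Python's built-in `sqlite3` module; the populations are copied from the output above, while the table name `facts_sample` is an assumption for this example.

```python
import sqlite3

# Populations copied from the query output above; the table name
# facts_sample is an assumption for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts_sample (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?)",
    [
        ("World", 7256490011),
        ("China", 1367485388),
        ("India", 1251695584),
    ],
)

# Share of the world population, in percent, for each country.
shares = dict(
    conn.execute(
        """
        SELECT name,
               100.0 * population /
               (SELECT population FROM facts_sample WHERE name = 'World')
          FROM facts_sample
         WHERE name <> 'World'
        """
    ).fetchall()
)
combined_share = shares["China"] + shares["India"]

print(round(shares["China"], 1), round(shares["India"], 1))
print(round(combined_share, 1))
conn.close()
```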

#### What are the maximum population growth rates?
(We will remove the 'non-countries' of Antarctica, World and European Union in this search.)

In [7]:
%%sql 
SELECT name, population_growth
FROM facts
WHERE name NOT IN ('Antarctica', 'World', 'European Union')
ORDER BY population_growth DESC
LIMIT 5;

Done.


name,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24


The top 5 countries with the highest growth rate are all in Africa.

#### What are the minimum population growths?

In [55]:
%%sql 
SELECT name, population_growth
FROM facts
WHERE name NOT IN ('Antarctica', 'World', 'European Union')
ORDER BY population_growth
LIMIT 40;

Done.


name,population_growth
Kosovo,
Ashmore and Cartier Islands,
Coral Sea Islands,
Heard Island and McDonald Islands,
Clipperton Island,
French Southern and Antarctic Lands,
Saint Barthelemy,
Saint Martin,
Bouvet Island,
Jan Mayen,


Many of the islands listed with no population growth value are nature reserves or military bases.

Interestingly, there seem to be no negative growth rates in the Factbook data at the time it was captured for this DataQuest project.
The 2020 data is quite different, with 129 countries showing negative growth.
Lebanon in particular shows ~9% negative growth.

#### What is the average population and area?

In [9]:
%%sql 
SELECT AVG(CAST(population AS FLOAT)) AS Average_Population,
AVG(CAST (area AS FLOAT)) AS Average_Area
FROM facts
WHERE name NOT IN ('Antarctica', 'World', 'European Union')

Done.


Average_Population,Average_Area
30362063.58995816,539893.1895161291


The average country has about 30 million people and an area of about 540,000 sq km.

## Is There An Average Country?

Are any countries similar (+/-10%) to this 'Average Country' above?

In [10]:
%%sql
SELECT name, population, area
FROM facts
WHERE population BETWEEN 27300000 AND 33400000 
ORDER by population DESC;

Done.


name,population,area
Morocco,33322699,446550
Afghanistan,32564342,652230
Nepal,31551305,147181
Malaysia,30513848,329847
Peru,30444999,1285216
Venezuela,29275460,912050
Uzbekistan,29199942,447400
Saudi Arabia,27752316,2149690


In [11]:
%%sql
SELECT name, population, area
FROM facts
WHERE area BETWEEN 485000 AND 594000
ORDER by area DESC;

Done.


name,population,area
Madagascar,23812681,587041
Botswana,2182719,581730
Kenya,45925301,580367
Yemen,26737317,527968
Thailand,67976405,513120
Spain,48146134,505370
Turkmenistan,5231422,488100


In [12]:
%%sql
SELECT name, population, area, migration_rate
FROM facts
WHERE name IN ('Afghanistan', 'Brazil', 'Lebanon')
ORDER by area DESC;

Done.


name,population,area,migration_rate
Brazil,204259812,8515770,0.14
Afghanistan,32564342,652230,1.51
Lebanon,6184701,10400,1.1


8 countries have a similar population and 7 have a similar area.

No country matches both averages.

Clearly the combination of average population and average area is not represented by any one country.
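
The +/-10% bounds above were typed in by hand. As an alternative, the bounds can be derived from the averages inside the query itself. The sketch below uses Python's built-in `sqlite3` module with only the first five rows of the table, so its averages differ from the real ones; the table name `facts_sample` and the column aliases are assumptions for illustration.

```python
import sqlite3

# First five rows of facts (name, population, area) as shown earlier.
# With only five rows, the averages differ from those of the full table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts_sample (name TEXT, population INTEGER, area INTEGER)"
)
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
    ],
)

# Compute the averages once, then flag rows within +/-10% of each.
query = """
WITH avgs AS (
    SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
      FROM facts_sample
)
SELECT name,
       (population BETWEEN 0.9 * avg_pop AND 1.1 * avg_pop) AS similar_population,
       (area BETWEEN 0.9 * avg_area AND 1.1 * avg_area) AS similar_area
  FROM facts_sample, avgs
 ORDER BY name
"""
result = conn.execute(query).fetchall()

# Rows that are close to the average on both measures at once.
similar_on_both = [name for name, pop_ok, area_ok in result if pop_ok and area_ok]

print(result)
print(similar_on_both)
conn.close()
```

In this small sample only Angola falls within 10% of the average population, and no row is close to both averages, which mirrors the finding on the full table.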

## Countries With The Highest Population Density

In [59]:
%%sql
SELECT name, population, area,
(CAST(population AS FLOAT) / area_land) as Population_Density
FROM facts
ORDER by Population_Density DESC
LIMIT 10;

Done.


name,population,area,Population_Density
Macau,592731,28,21168.964285714286
Monaco,30535,2,15267.5
Singapore,5674472,697,8259.784570596797
Hong Kong,7141106,1108,6655.27120223672
Gaza Strip,1869055,360,5191.819444444444
Gibraltar,29258,6,4876.333333333333
Bahrain,1346613,760,1771.8592105263158
Maldives,393253,298,1319.6409395973155
Malta,413965,316,1310.01582278481
Bermuda,70196,54,1299.925925925926


Many on this list are islands or small city-states and territories, and several are in Asia.

These are countries with high population density. There will still be variation within countries, and some countries may have cities that are more densely populated than the countries on this list.

As to where to progress from here, let's have a brief recap of the columns available:

In [14]:
%%sql
SELECT *
FROM facts
LIMIT 1

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51


Areas still to explore:
- Water area to land area.
- Birth minus death rate
- Natural Population increase each year
- Migrant population added each year

## Greatest And Smallest Land To Water Ratio

In [15]:
%%sql
SELECT name, area_land, area_water, ROUND((CAST(area_land AS FLOAT) / area_water) ,4) AS area_land_per_water
FROM facts
WHERE name NOT IN ('Antarctica', 'World', 'European Union') 
AND ((area_land_per_water BETWEEN 0 AND 10) OR (area_land_per_water >1000))
ORDER BY area_land_per_water DESC;


Done.


name,area_land,area_water,area_land_per_water
Bosnia and Herzegovina,51187,10,5118.7
Niger,1266700,300,4222.3333
Morocco,446300,250,1785.2
Guinea,245717,140,1755.1214
Costa Rica,51060,40,1276.5
Djibouti,23180,20,1159.0
India,2973193,314070,9.4667
Finland,303815,34330,8.8498
Taiwan,32260,3720,8.672
"Gambia, The",10120,1180,8.5763


Some of the small islands have very low land-to-water values.

Unsurprisingly, some landlocked countries top this land-to-water list.

Sahara desert countries are also not a surprise for high land-to-water values.

Some surprises could be Finland, the Netherlands and Bangladesh, which could by some measures be considered quite wet / humid.
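
One detail worth keeping in mind with this ratio is what happens when `area_water` is 0, as it is for Afghanistan in the first rows of the table. SQLite returns NULL for division by zero, so such rows drop out of the filtered list above. The sketch below makes that explicit with `NULLIF`; it uses Python's built-in `sqlite3` module, and the table name `facts_sample` is an assumption for illustration.

```python
import sqlite3

# Land and water areas copied from outputs shown in this notebook;
# facts_sample is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts_sample (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 0),
        ("Finland", 303815, 34330),
        ("Niger", 1266700, 300),
    ],
)

# NULLIF turns a zero water area into NULL, so the ratio is NULL
# instead of relying on SQLite's division-by-zero behaviour.
query = """
SELECT name,
       ROUND(CAST(area_land AS FLOAT) / NULLIF(area_water, 0), 4)
           AS area_land_per_water
  FROM facts_sample
 ORDER BY area_land_per_water DESC
"""
ratios = conn.execute(query).fetchall()

for name, ratio in ratios:
    print(name, ratio)
conn.close()
```

Countries with no recorded water area therefore never appear in a ranking by this ratio, even though they are arguably the most extreme cases.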

## What is the Natural Population Increase per 1000 people?
This is the birth rate minus the death rate of a country.

In [16]:
%%sql
SELECT name, population, birth_rate, death_rate,
ROUND((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)), 3) AS births_minus_deaths
FROM facts
ORDER BY births_minus_deaths DESC
LIMIT 10

Done.


name,population,birth_rate,death_rate,births_minus_deaths
Malawi,17964697,41.56,8.41,33.15
Uganda,37101745,43.79,10.69,33.1
Niger,18045729,45.45,12.42,33.03
Burundi,10742276,42.01,9.27,32.74
Mali,16955536,44.99,12.89,32.1
Burkina Faso,18931686,42.03,11.72,30.31
Zambia,15066266,42.13,12.67,29.46
Ethiopia,99465819,37.27,8.19,29.08
South Sudan,12042910,36.91,8.18,28.73
Tanzania,51045882,36.39,8.0,28.39


African countries, which as we saw earlier dominate population growth, also dominate the births minus deaths rate (per 1,000 people).

## What is the Natural Population Increase?
This is represented by the birth rate minus the death rate (which are per 1000 people) multiplied by the population, divided by 1000.

Let's now see the countries with the largest natural population increase.
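
As a worked example of the formula, the sketch below applies it to Ethiopia using the population, birth rate and death rate shown in the previous output. The function name `natural_increase` is an assumption for illustration only.

```python
def natural_increase(population, birth_rate, death_rate):
    """Yearly births minus deaths, given crude rates per 1,000 people."""
    return (birth_rate - death_rate) * population / 1000


# Ethiopia: values taken from the births-minus-deaths output above.
ethiopia = natural_increase(99465819, 37.27, 8.19)
print(round(ethiopia))
```

Dividing by 1,000 is what converts the per-1,000 rates into a count of people.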

In [17]:
%%sql
SELECT name,
ROUND((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)), 3) AS births_minus_deaths,
(ROUND((((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)) * CAST(population AS FLOAT)) /1000),0)) AS natural_population_increase
FROM facts
ORDER BY natural_population_increase DESC
LIMIT 10

Done.


name,births_minus_deaths,natural_population_increase
World,10.8,78370092.0
India,12.23,15308237.0
China,4.96,6782728.0
Nigeria,24.74,4491845.0
Pakistan,16.09,3203291.0
Ethiopia,29.08,2892466.0
Indonesia,10.35,2649535.0
Bangladesh,15.53,2623914.0
"Congo, Democratic Republic of the",24.81,1969297.0
Philippines,18.16,1834131.0


The World is increasing by 78.4 million people a year through natural increase.

African, South Asian and South East Asian countries here have disproportionate growth per population compared to China or the World. China has below average growth, but as we saw earlier it has a lot of people (~19% of the world's population lives in China).

In [18]:
%%sql
SELECT name, population,
ROUND((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)), 3) AS births_minus_deaths,
(ROUND((((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)) * CAST(population AS FLOAT)) /1000),0)) AS natural_population_increase
FROM facts
WHERE births_minus_deaths IS NOT NULL
ORDER BY natural_population_increase
LIMIT 25

Done.


name,population,births_minus_deaths,natural_population_increase
Russia,142423773,-2.09,-297666.0
Germany,80854408,-2.95,-238521.0
Japan,126919659,-1.58,-200533.0
Ukraine,44429471,-3.74,-166166.0
Italy,61855120,-1.45,-89690.0
Romania,21666350,-2.76,-59799.0
Bulgaria,7186893,-5.52,-39672.0
Hungary,9897541,-3.57,-35334.0
Serbia,7176794,-4.58,-32870.0
Greece,10775643,-2.43,-26185.0


European countries, Japan and Russia are seen here to have negative natural population change. Saint Pierre and Miquelon is a French territory off the coast of Canada's Newfoundland.

It is interesting to note the European Union has no natural population change listed.

It is also worth noting this is merely the birth rate minus the death rate and doesn't account for any change due to migration.

So next we shall explore, 'Is population increase also reflected in migration increase?'

## Birth - Deaths Compared To Migration 

In [41]:
%%sql
SELECT name, population, migration_rate,
(ROUND((((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)) * CAST(population AS FLOAT)) /1000),0)) AS natural_population_increase
, ROUND((CAST(migration_rate AS FLOAT) * CAST(population AS FLOAT) / 1000),0) AS net_migration
FROM facts
ORDER BY net_migration DESC
LIMIT 10

Done.


name,population,migration_rate,natural_population_increase,net_migration
European Union,513949445,2.5,0.0,1284874.0
United States,321368864,3.86,1394741.0,1240484.0
China,1367485388,0.44,6782728.0,601694.0
Spain,48146134,8.31,28888.0,400094.0
Syria,17064854,19.79,310068.0,337713.0
Pakistan,199085847,1.54,3203291.0,306592.0
Indonesia,255993674,1.16,2649535.0,296953.0
Italy,61855120,4.1,-89690.0,253606.0
Russia,142423773,1.69,-297666.0,240696.0
Philippines,100998376,2.09,1834131.0,211087.0


The migration rate is expressed per 1,000 people, so it is divided by 1,000 here in the same way as the birth and death rates.

The European Union, despite having no natural population increase listed, is experiencing a large amount of net migration (about 1.3 million a year for its population of 514 million).

The United States has both high migration (about 1.2 million a year, comparable to the European Union) and a large natural increase (about 1.4 million a year).

Italy and Russia are both in the negative for births minus deaths, but both have significant migration. Italy's (about 254,000) more than offsets its natural decrease of about 90,000, while Russia's (about 241,000) offsets most of its natural decrease of about 298,000.

In [46]:
%%sql
SELECT name, population, migration_rate,
(ROUND((((CAST(birth_rate AS FLOAT) - CAST(death_rate AS FLOAT)) * CAST(population AS FLOAT)) /1000),0)) AS natural_population_increase
, ROUND((CAST(migration_rate AS FLOAT) * CAST(population AS FLOAT) / 1000),0) AS net_migration
FROM facts
ORDER BY net_migration 
LIMIT 50

Done.


name,population,migration_rate,natural_population_increase,net_migration
Kosovo,1870981.0,,,
Montenegro,647073.0,,641.0,
Holy See (Vatican City),842.0,,,
Ashmore and Cartier Islands,,,,
Christmas Island,1530.0,,,
Cocos (Keeling) Islands,596.0,,,
Coral Sea Islands,,,,
Heard Island and McDonald Islands,,,,
Norfolk Island,2210.0,,,
Clipperton Island,,,,


Lots of missing data here. There are no countries in the data set with negative migration, which is most likely not a true reflection of reality.

A mix of countries here: African, Asian, and both Central and South American countries.

It seems the database only includes positive migration rates.

## How Many People Are Migrating?

In [54]:
%%sql
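SELECT ROUND(SUM(migration_rate * population) / 1000, 0) AS Migration_Total
FROM facts

Done.


Migration_Total
9618624.0


About 9.6 million people migrated to a new country, taking the migration rate as per 1,000 people. This total also includes the European Union row, whose member states are listed separately as well.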
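
One caveat with an unfiltered `SUM` is that aggregate rows are counted alongside the countries they contain. The sketch below shows the effect on a three-row sample, where Spain and Italy are European Union member states. It uses Python's built-in `sqlite3` module and assumes that `migration_rate` is expressed per 1,000 people, as the Factbook defines the net migration rate; the table name `facts_sample` is hypothetical.

```python
import sqlite3

# Rows copied from the migration output above; facts_sample is a
# hypothetical table name used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts_sample "
    "(name TEXT, population INTEGER, migration_rate REAL)"
)
conn.executemany(
    "INSERT INTO facts_sample VALUES (?, ?, ?)",
    [
        ("European Union", 513949445, 2.5),
        ("Spain", 48146134, 8.31),
        ("Italy", 61855120, 4.1),
    ],
)

# Assumption: migration_rate is per 1,000 people, so divide by 1000.
total_all_rows = conn.execute(
    "SELECT SUM(migration_rate * population) / 1000.0 FROM facts_sample"
).fetchone()[0]

# Same exclusion list as the earlier queries, so aggregates are not
# counted on top of the countries they contain.
total_excluding_aggregates = conn.execute(
    """
    SELECT SUM(migration_rate * population) / 1000.0
      FROM facts_sample
     WHERE name NOT IN ('Antarctica', 'World', 'European Union')
    """
).fetchone()[0]

print(round(total_all_rows))
print(round(total_excluding_aggregates))
conn.close()
```

Applying the same `WHERE name NOT IN (...)` filter to the notebook query would give a total for individual countries only.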