## Analysing the CIA World Factbook
---

### Introduction

In this project we'll work with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), which contains a wide range of statistics about countries from all over the world.

We use the `factbook.db` database, which you can download [here](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db).

First, let's connect to the database and take a quick look at it.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

In [2]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type = 'table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


There are two tables in the database. Let's see what they contain.

In [3]:
%%sql
SELECT *
  FROM sqlite_sequence
 LIMIT 5; 

Done.


name,seq
facts,261


In [4]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;   

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


We'll work with the `facts` table. The column names are mostly self-explanatory, but here is a short description of each:
* `id` - the row's index number
* `code` - the country's code
* `name` - the country's name
* `area` - the country's total area in km2
* `area_land` - the country's land area in km2
* `area_water` - the country's water area in km2
* `population` - the country's population
* `population_growth` - the country's yearly population growth in %
* `birth_rate` - the number of births per 1000 people
* `death_rate` - the number of deaths per 1000 people
* `migration_rate` - the number of migrants per 1000 people

### Population analysis

Now we can start exploring the data. Let's begin with population and check the minimum and maximum values of both population and population growth.

In [5]:
%%sql
SELECT MIN(population) AS 'Min population',
       MAX(population) AS 'Max population',
       MIN(population_growth) AS 'Min population growth',
       MAX(population_growth) AS 'Max population growth'
  FROM facts;

Done.


Min population,Max population,Min population growth,Max population growth
0,7256490011,0.0,4.02


It is quite exciting to find at least one row with **zero population**. Also, the minimum growth value is 0.0, so no row in the table records negative population growth. Wonderful!

Now let's find the rows with the largest population and with zero population.

In [6]:
%%sql
SELECT name, population
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts
                      )
    OR population = (SELECT MIN(population)
                       FROM facts
                      );

Done.


name,population
Antarctica,0
World,7256490011


So the largest population (**7.26 billion**) is the population of the whole world. We might have guessed that.

There is an explanation for the zero population as well. It's Antarctica! This is consistent with the Factbook's [Antarctica page](https://www.cia.gov/the-world-factbook/countries/antarctica/):
> no indigenous inhabitants, but there are both permanent and summer-only staffed research stations

Before recomputing the figures without the `World` row, let's check whether the table contains other aggregate entries.

In [26]:
%%sql
SELECT name, population, area
  FROM facts
 WHERE name = 'European Union'

Done.


name,population,area
European Union,513949445,4324782


It turns out there is also a **European Union** row in the table. We'll exclude it, along with `World`, from most of the further analysis.

In [27]:
%%sql
SELECT MIN(population) AS 'Min population',
       MAX(population) AS 'Max population',
       MIN(population_growth) AS 'Min population growth',
       MAX(population_growth) AS 'Max population growth'
  FROM facts
 WHERE name NOT IN ('World', 'European Union');

Done.


Min population,Max population,Min population growth,Max population growth
0,1367485388,0.0,4.02


Now let's find the country with the largest population.

In [28]:
%%sql
SELECT name, population
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts
                      WHERE name NOT IN ('World', 'European Union')
                    );

Done.


name,population
China,1367485388


It's China! This is a widely known fact, so we might have guessed it as well. From here on we keep excluding the `World` and `European Union` rows.

### Densely populated countries

Let's calculate the average value for the `population` and `area` columns first.

In [66]:
%%sql
SELECT AVG(population) AS 'Avg population',
       AVG(area) AS 'Avg area',
       AVG(population)/AVG(area) AS 'Avg density'
  FROM facts
 WHERE name NOT IN ('World', 'European Union');

Done.


Avg population,Avg area,Avg density
30235554.991666667,539893.1895161291,56.00284570873141
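One caveat about the `Avg density` column: `AVG(population)/AVG(area)` is the ratio of two averages, which is not the same thing as the average of per-country densities. The following is a minimal sketch of the difference, assuming Python's built-in `sqlite3` module and an in-memory table (the name `sample` is hypothetical) filled with three rows copied from the `facts` preview above.

```python
import sqlite3

# Three rows copied from the `facts` preview; `sample` is a hypothetical table name.
rows = [
    ("Afghanistan", 32564342, 652230),
    ("Albania", 3029278, 28748),
    ("Andorra", 85580, 468),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, population INTEGER, area INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

# First column: ratio of averages. Second column: average of per-row densities.
ratio_of_avgs, avg_of_ratios = conn.execute(
    """
    SELECT AVG(population) / AVG(area),
           AVG(CAST(population AS FLOAT) / area)
      FROM sample
    """
).fetchone()
conn.close()

print(round(ratio_of_avgs, 2), round(avg_of_ratios, 2))
```

On this sample the second number is noticeably larger, because a small, dense country such as Andorra counts as much as a large one when the per-row densities are averaged. Either measure can be reasonable; it just helps to be clear about which one is reported.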


Now we can use these values to identify densely populated countries. They should meet the following criteria:
* Above-average population.
* Below-average area.

We'll also calculate the population density for these countries.

In [30]:
%%sql
SELECT name, population, area, ROUND(CAST(population AS Float)/area, 2) AS 'population density'
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name NOT IN ('World', 'European Union')
                      )
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name NOT IN ('World', 'European Union')
                      )
   AND name NOT IN ('World', 'European Union')
 ORDER BY "population density" DESC;

Done.


name,population,area,population density
Bangladesh,168957745,148460,1138.07
"Korea, South",49115196,99720,492.53
Philippines,100998376,300000,336.66
Japan,126919659,377915,335.84
Vietnam,94348835,331210,284.86
United Kingdom,64088222,243610,263.08
Germany,80854408,357022,226.47
Nepal,31551305,147181,214.37
Italy,61855120,301340,205.27
Uganda,37101745,241038,153.92


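Now we can see that **Bangladesh** is the most densely populated country among those that meet our criteria. Its density is more than twice that of second-placed **South Korea**.

Just out of curiosity, let's find the least densely populated countries, this time among those with below-average population and above-average area.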
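The query above repeats the `NOT IN ('World', 'European Union')` filter three times. One possible alternative, sketched here under assumptions, is to put the filter and the averages into common table expressions. The sketch uses Python's built-in `sqlite3` module and a hypothetical in-memory table `sample` with a handful of rows copied from this notebook's outputs, so the averages (and therefore the result) differ from those of the full `facts` table.

```python
import sqlite3

# Sample rows copied from this notebook's outputs; `sample` is a hypothetical table name.
rows = [
    ("Bangladesh", 168957745, 148460),
    ("Japan", 126919659, 377915),
    ("Australia", 22751014, 7741220),
    ("Mongolia", 2992908, 1564116),
    ("Greenland", 57733, 2166086),
    ("European Union", 513949445, 4324782),  # aggregate row, filtered out below
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, population INTEGER, area INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", rows)

query = """
WITH countries AS (
    -- apply the aggregate-row filter once
    SELECT * FROM sample
     WHERE name NOT IN ('World', 'European Union')
),
averages AS (
    SELECT AVG(population) AS avg_population, AVG(area) AS avg_area
      FROM countries
)
SELECT name, ROUND(CAST(population AS FLOAT) / area, 2) AS density
  FROM countries, averages
 WHERE population > avg_population
   AND area < avg_area
 ORDER BY density DESC
"""
dense = conn.execute(query).fetchall()
conn.close()

print(dense)
```

The `averages` CTE has exactly one row, so the comma join simply attaches both averages to every country row.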

In [31]:
%%sql
SELECT name, population, area, ROUND(CAST(population AS FLOAT)/area, 2) AS 'population density'
  FROM facts
 WHERE population < (SELECT AVG(population)
                       FROM facts
                      WHERE name NOT IN ('World', 'European Union')
                      )
   AND area > (SELECT AVG(area)
                 FROM facts
                WHERE name NOT IN ('World', 'European Union')
                      )
   AND name NOT IN ('World', 'European Union')
 ORDER BY "population density";

Done.


name,population,area,population density
Greenland,57733,2166086,0.03
Mongolia,2992908,1564116,1.91
Namibia,2212307,824292,2.68
Australia,22751014,7741220,2.94
Mauritania,3596702,1030700,3.49
Libya,6411776,1759540,3.64
Botswana,2182719,581730,3.75
Kazakhstan,18157122,2724900,6.66
Central African Republic,5391539,622984,8.65
Bolivia,10800882,1098581,9.83


**Greenland** has the lowest population density, but it is an [autonomous territory within the Kingdom of Denmark](https://en.wikipedia.org/wiki/Greenland).
So the sovereign country with the lowest population density in this list is **Mongolia**.

### Growth rate

Let's find the top 10 countries by growth rate. We'll also estimate the number of people each of these countries will add to its population next year.

In [32]:
%%sql
SELECT name, population, population_growth,
       ROUND(population*population_growth/100, 2) AS 'yearly change'
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
 ORDER BY population_growth DESC
 LIMIT 10;

Done.


name,population,population_growth,yearly change
South Sudan,12042910,4.02,484124.98
Malawi,17964697,3.32,596427.94
Burundi,10742276,3.28,352346.65
Niger,18045729,3.25,586486.19
Uganda,37101745,3.24,1202096.54
Qatar,2194817,3.07,67380.88
Burkina Faso,18931686,3.03,573630.09
Mali,16955536,2.98,505274.97
Cook Islands,9838,2.95,290.22
Iraq,37056169,2.93,1085745.75


All of these countries except **Cook Islands** are in **Africa** or the **Middle East**. This is probably due to the high fertility rate (births per woman) in these regions, but it needs further exploration.
![Image](https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Countriesbyfertilityrate.svg/2560px-Countriesbyfertilityrate.svg.png)


---

Let's also sort by the absolute value, `yearly change`.

In [40]:
%%sql
SELECT name, population, population_growth,
       ROUND(population*population_growth/100, 2) AS 'yearly change'
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
 ORDER BY "yearly change" DESC
 LIMIT 10;

Done.


name,population,population_growth,yearly change
India,1251695584,1.22,15270686.12
China,1367485388,0.45,6153684.25
Nigeria,181562056,2.45,4448270.37
Pakistan,199085847,1.46,2906653.37
Ethiopia,99465819,2.89,2874562.17
Bangladesh,168957745,1.6,2703323.92
United States,321368864,0.78,2506677.14
Indonesia,255993674,0.92,2355141.8
"Congo, Democratic Republic of the",79375136,2.45,1944690.83
Philippines,100998376,1.61,1626073.85


Here we see the countries with the largest current populations, such as **China** and **India**, which explains their huge yearly change numbers.

None of the top 10 countries by growth rate appear here because their current populations are relatively small.

To close this section, let's look at the **World** population growth.

In [39]:
%%sql
SELECT name, population, population_growth,
       ROUND(population*population_growth/100, 2) AS 'yearly change'
  FROM facts
 WHERE name = 'World'

Done.


name,population,population_growth,yearly change
World,7256490011,1.08,78370092.12
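Because `population_growth` is stored as a percentage, the yearly change is `population * population_growth / 100`. As a small worked check of that arithmetic, the sketch below recomputes two of the figures from the tables above in plain Python; the helper name `yearly_change` is hypothetical and not part of the notebook.

```python
def yearly_change(population: int, growth_percent: float) -> float:
    """Estimated number of people added in one year, with growth given in percent."""
    return round(population * growth_percent / 100, 2)


# Inputs copied from the query outputs above.
world = yearly_change(7256490011, 1.08)
india = yearly_change(1251695584, 1.22)

print(world, india)
```

Both values should agree with the `yearly change` column returned by SQLite for the `World` and `India` rows.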


### Birth and death rates

Let's take a closer look at the birth and death rates of the fastest-growing countries.

In [34]:
%%sql
SELECT name, population_growth, birth_rate, death_rate
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
 ORDER BY population_growth DESC
 LIMIT 10;

Done.


name,population_growth,birth_rate,death_rate
South Sudan,4.02,36.91,8.18
Malawi,3.32,41.56,8.41
Burundi,3.28,42.01,9.27
Niger,3.25,45.45,12.42
Uganda,3.24,43.79,10.69
Qatar,3.07,9.84,1.53
Burkina Faso,3.03,42.03,11.72
Mali,2.98,44.99,12.89
Cook Islands,2.95,14.33,8.03
Iraq,2.93,31.45,3.77


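Interesting: **South Sudan** has the highest growth rate but the lowest birth rate among the African countries in this top 10, while **Qatar** and **Cook Islands** grow fast despite low birth rates. So the birth rate alone does not explain growth; the death rate (and probably other factors) matters too.
Let's check the countries with the highest birth and death rates.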

In [35]:
%%sql
SELECT name, population_growth, birth_rate, death_rate
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
 ORDER BY birth_rate DESC
 LIMIT 10;

Done.


name,population_growth,birth_rate,death_rate
Niger,3.25,45.45,12.42
Mali,2.98,44.99,12.89
Uganda,3.24,43.79,10.69
Zambia,2.88,42.13,12.67
Burkina Faso,3.03,42.03,11.72
Burundi,3.28,42.01,9.27
Malawi,3.32,41.56,8.41
Somalia,1.83,40.45,13.62
Angola,2.78,38.78,11.49
Mozambique,2.45,38.58,12.1


All of these countries are in **Africa**. This region leads in birth rates, as we could see on the fertility map above.

Now the death rates.

In [36]:
%%sql
SELECT name, population_growth, birth_rate, death_rate
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
 ORDER BY death_rate DESC
 LIMIT 10;

Done.


name,population_growth,birth_rate,death_rate
Lesotho,0.32,25.47,14.89
Ukraine,0.6,10.72,14.46
Bulgaria,0.58,8.92,14.44
Guinea-Bissau,1.91,33.38,14.33
Latvia,1.06,10.0,14.31
Chad,1.89,36.6,14.28
Lithuania,1.04,10.1,14.27
Namibia,0.59,19.8,13.91
Afghanistan,2.32,38.57,13.89
Central African Republic,2.13,35.08,13.8


It is quite unexpected to find countries like **Bulgaria** or **Ukraine** here. It seems that **Eastern Europe** and the **Baltic region** suffer from death rates as high as those in **Africa**.
![Image](https://upload.wikimedia.org/wikipedia/commons/d/d7/Death_rate_world_map.PNG)
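The rates in this section can be tied together with a little arithmetic. Births minus deaths per 1000 people gives the natural increase, and dividing it by 10 expresses it as a percentage that can be compared with `population_growth`. The sketch below does this for a few rows copied from the tables above. The helper name is hypothetical, and reading the remaining gap as the effect of migration is an assumption that this notebook does not verify.

```python
# (name, population_growth in %, birth_rate, death_rate) copied from the tables above
rows = [
    ("South Sudan", 4.02, 36.91, 8.18),
    ("Qatar", 3.07, 9.84, 1.53),
    ("Niger", 3.25, 45.45, 12.42),
    ("Lesotho", 0.32, 25.47, 14.89),
]


def natural_increase_percent(birth_rate: float, death_rate: float) -> float:
    """Births minus deaths per 1000 people, expressed as a percentage."""
    return (birth_rate - death_rate) / 10


gaps = {}
for name, growth, births, deaths in rows:
    natural = natural_increase_percent(births, deaths)
    # Part of the reported growth that births and deaths do not account for.
    gaps[name] = round(growth - natural, 2)
    print(f"{name}: natural {natural:.2f}%, reported {growth:.2f}%, gap {gaps[name]:+.2f}")
```

For Niger the natural increase almost matches the reported growth, while for Qatar most of the growth is not explained by births and deaths at all.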

___
### Area. Land and water ratio

Now let's explore country territories a bit. First we'll find the top 10 countries by total area.

In [23]:
%%sql
SELECT name, area, area_land, area_water
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
 ORDER BY area DESC
 LIMIT 10;

Done.


name,area,area_land,area_water
Russia,17098242,16377742,720500
Canada,9984670,9093507,891163
United States,9826675,9161966,664709
China,9596960,9326410,270550
Brazil,8515770,8358140,157630
Australia,7741220,7682300,58920
India,3287263,2973193,314070
Argentina,2780400,2736690,43710
Kazakhstan,2724900,2699700,25200
Algeria,2381741,2381741,0


There is no surprise: [**Russia**](https://en.wikipedia.org/wiki/Russia) is the biggest country. The whole list broadly matches [Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_area).

![Image](https://travelnotes.org/Europe/images/russia.gif)
___

Now we'll find bottom 10.

In [24]:
%%sql
SELECT name, area, area_land, area_water
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
   AND area IS NOT NULL
 ORDER BY area
 LIMIT 10;

Done.


name,area,area_land,area_water
Holy See (Vatican City),0,0,0
Monaco,2,2,0
Coral Sea Islands,3,3,0
Ashmore and Cartier Islands,5,5,0
Navassa Island,5,5,0
Spratly Islands,5,5,0
Clipperton Island,6,6,0
Gibraltar,6,6,0
Wake Island,6,6,0
Paracel Islands,7,7,0


These are all city-states or small islands, which explains their area values. [**Vatican**](https://en.wikipedia.org/wiki/Vatican_City) is so small (less than 1 km2) that its area is recorded as 0 in the table.

![Image](https://maps-vatican.com/img/1200/map-of-vatican-city-in-rome.jpg)
___

The last thing we're going to do is calculate the land-to-water ratio and check the top 10 and bottom 10 countries by it as well.

But before that, let's find how many countries do not have any water area at all.

In [52]:
%%sql
SELECT COUNT(name) AS 'Countries without water area'
  FROM facts
 WHERE area_water = 0
    OR area_water IS NULL

Done.


Countries without water area
108


There are **108** rows with no water area or with no information about it in the table. We'll exclude them from the ratio calculation.
Let's start with the top 10.

In [53]:
%%sql
SELECT name, area, area_land, area_water,
       ROUND(CAST(area_land AS Float)/area_water, 2) AS land_water_ratio
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
   AND area_water != 0
   AND area_water IS NOT NULL
 ORDER BY land_water_ratio DESC
 LIMIT 10;

Done.


name,area,area_land,area_water,land_water_ratio
Bosnia and Herzegovina,51197.0,51187,10,5118.7
Niger,,1266700,300,4222.33
Morocco,446550.0,446300,250,1785.2
Guinea,245857.0,245717,140,1755.12
Costa Rica,51100.0,51060,40,1276.5
Djibouti,23200.0,23180,20,1159.0
"Korea, North",120538.0,120408,130,926.22
Cyprus,9251.0,9241,10,924.1
Namibia,824292.0,823290,1002,821.65
Burkina Faso,274200.0,273800,400,684.5
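When filtering rows for a ratio like this, it matters how the zero check and the `NULL` check are combined. The sketch below compares `OR` with `AND`, assuming Python's built-in `sqlite3` module and a hypothetical in-memory table `sample`. Two rows are copied from this notebook's outputs, and the third (`Nowhere`) is invented purely to show the `NULL` case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        ("Cyprus", 9241, 10),     # row from the output above
        ("Algeria", 2381741, 0),  # zero water area, from the `facts` preview
        ("Nowhere", 100, None),   # hypothetical row with missing water area
    ],
)

ratio = "CAST(area_land AS FLOAT) / area_water"

# OR: a zero is still "NOT NULL", so the zero-water row passes the filter.
with_or = conn.execute(
    f"SELECT name, {ratio} FROM sample "
    "WHERE area_water != 0 OR area_water IS NOT NULL ORDER BY name"
).fetchall()

# AND: both conditions must hold, so only usable, non-zero water areas remain.
with_and = conn.execute(
    f"SELECT name, {ratio} FROM sample "
    "WHERE area_water != 0 AND area_water IS NOT NULL ORDER BY name"
).fetchall()
conn.close()

print(with_or)
print(with_and)
```

With `OR`, Algeria is kept and its ratio comes back as `None`, because SQLite returns `NULL` for division by zero. In a descending sort such rows end up last, so they do not show up in a top 10, but they would lead an ascending sort.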


Now the bottom 10.

In [67]:
%%sql
SELECT name, area, area_land, area_water,
       ROUND(CAST(area_land AS Float)/area_water, 5) AS land_water_ratio
  FROM facts
 WHERE name NOT IN ('World', 'European Union')
   AND area_water != 0
   AND area_water IS NOT NULL
   AND area_land IS NOT NULL 
 ORDER BY land_water_ratio
 LIMIT 10;

Done.


name,area,area_land,area_water,land_water_ratio
British Indian Ocean Territory,54400,60,54340,0.0011
Virgin Islands,1910,346,1564,0.22123
Puerto Rico,13791,8870,4921,1.80248
"Bahamas, The",13880,10010,3870,2.58656
Guinea-Bissau,36125,28120,8005,3.5128
Malawi,118484,94080,24404,3.85511
Netherlands,41543,33893,7650,4.43046
Uganda,241038,197100,43938,4.48587
Eritrea,117600,101000,16600,6.08434
Liberia,111369,96320,15049,6.40043


Let's take a closer look at our leaders.

[**Bosnia and Herzegovina**](https://en.wikipedia.org/wiki/Bosnia_and_Herzegovina) is a country in Southeast Europe, located in the Balkans. Its only access to the sea is a narrow strip of coastline.

![Image](https://www.ecoi.net/en/file/local/1302548/4543_1438871079_bosnia-and-herzegovina-sm-2015.gif)
___

[**British Indian Ocean Territory**](https://en.wikipedia.org/wiki/British_Indian_Ocean_Territory) comprises the seven atolls of the Chagos Archipelago in the Indian Ocean.

![Image](https://res.cloudinary.com/fen-learning/image/upload/c_limit/infopls_images/images/mbritindian.gif)
___

There are also countries in the bottom 10 with a low land-to-water ratio but without any access to the sea. For example, [**Malawi**](https://www.researchgate.net/profile/Steven-Gondwe/publication/341480255/figure/fig1/AS:895036526981120@1590404745084/Map-of-Malawi-showing-districts-and-major-water-bodies.ppm), which has a huge lake.

![Image](https://www.countrycodeguide.com/worldmap/uploads/settings/map-malawi.jpg)
___

### Conclusions

In this project we've analysed some demographic and geographic data from the CIA World Factbook. Here are some findings:
##### Demographic
* **China** is the country with the largest population. It has **1.367B** people.
* **India** has the largest absolute yearly population growth, **15.27M**, followed by **China** with **6.15M** and **Nigeria** with **4.45M**.
* Measured in relative terms, however, the top three are:
    1. **South Sudan** with **4.02%**
    2. **Malawi** with **3.32%**
    3. **Burundi** with **3.28%**
* We went further and looked at birth and death rates. All 10 countries with the highest birth rates are in **Africa**; their birth rates lie between **38.58** and **45.45** births per 1000 people.
* The death rate analysis brought more surprising findings. It seems that **Eastern Europe** and the **Baltic region** suffer from death rates as high as those in **Africa**. The top 10 death rates lie between **13.8** and **14.89** deaths per 1000 people.

##### Geographic
* The largest country is **Russia** with an area of **17098242 km2**. **Canada** and the **USA** follow with **9984670 km2** and **9826675 km2** respectively.
* We've also explored the smallest entries in the table, which are a mix of small islands and city-states. **Vatican** is the smallest country with an area of less than **1 km2**.
* Finally, we've calculated the land-to-water ratio and analysed countries using it. There are countries all over the world with a high land-to-water ratio, which means that very little of their territory is covered by water: **Bosnia and Herzegovina**, **Namibia** and **North Korea**, for example.
* Countries with a low land-to-water ratio fall into two categories:
    1. Island and coastal countries like **British Indian Ocean Territory** or **Netherlands**.
    2. Countries with large freshwater bodies like **Malawi** or **Uganda**.