# CIA Factbook Data
---

## Introduction

The aim of this project is to find the countries with the largest populations and to look for patterns in population growth, land and water area, and birth and death rates.
The data comes from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. It contains demographic information such as:
- population (as of 2015),
- area - the total water and land area,
- population_growth - the annual population growth rate, as a percentage.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

We loaded the `sql` extension and connected to the `factbook.db` SQLite database. Next, we will write a query that returns information about the tables in the database.

In [2]:
%%sql
SELECT * FROM sqlite_master WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


Our database contains two tables: `sqlite_sequence` and `facts`. We would like to investigate the `facts` table in more detail.
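The schema can also be inspected column by column. The following is a minimal sketch using Python's standard `sqlite3` module; the in-memory database is an assumption that stands in for `factbook.db`, while the `CREATE TABLE` statement is the one reported by `sqlite_master` above. It also shows where `sqlite_sequence` comes from: SQLite creates it automatically for tables that use `AUTOINCREMENT`.

```python
import sqlite3

# In-memory stand-in for factbook.db (an assumption for this sketch);
# the CREATE TABLE statement is the one reported by sqlite_master above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL,
        "name" varchar(255) NOT NULL,
        "area" integer,
        "area_land" integer,
        "area_water" integer,
        "population" integer,
        "population_growth" float,
        "birth_rate" float,
        "death_rate" float,
        "migration_rate" float
    )
    """
)

# SQLite adds sqlite_sequence on its own for AUTOINCREMENT tables.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
for name, col_type in columns:
    print(f"{name}: {col_type}")
```

Against the real database, the same `PRAGMA table_info(facts)` statement should also work from a `%%sql` cell.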

In [3]:
%%sql
SELECT * FROM facts LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


The `facts` table contains various columns describing countries and their populations. The columns we will use are:
- name - the country name,
- area - the total land and water area of the country,
- population - the number of people living in the country,
- population_growth - the annual population growth rate, as a percentage,
- birth_rate - the number of births per year per 1,000 people,
- death_rate - the number of deaths per year per 1,000 people.

## Population Summary Statistics

Next, we analyze the `population` and `population_growth` columns using summary statistics.

In [4]:
%%sql
SELECT 
    MIN(population) min_pop,
    MAX(population) max_pop,
    MIN(population_growth) min_pop_grwth,
    MAX(population_growth) max_pop_grwth
FROM facts;

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_grwth,max_pop_grwth
0,7256490011,0.0,4.02


Some values in the table look suspicious. The minimum population is 0, and the maximum is close to the population of the whole world. We will look at each of these values in detail.

In [5]:
%%sql
SELECT name, MIN(population) FROM facts

 * sqlite:///factbook.db
Done.


name,MIN(population)
Antarctica,0


The minimum population belongs to Antarctica, which is a continent rather than a country.

In [6]:
%%sql
SELECT name, MAX(population) FROM facts

 * sqlite:///factbook.db
Done.


name,MAX(population)
World,7256490011


The maximum value in the `population` column is the population of the whole world, not of a single country.

In [17]:
%%sql
SELECT name, population FROM facts ORDER BY population DESC LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population
World,7256490011
China,1367485388
India,1251695584
European Union,513949445
United States,321368864
Indonesia,255993674
Brazil,204259812
Pakistan,199085847
Nigeria,181562056
Bangladesh,168957745


Excluding the World row, the largest populations are found in China, India, the United States, Indonesia, Brazil, Pakistan, Nigeria and Bangladesh. The European Union, which is an aggregate of its member states rather than a single country, would rank third.
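Rows that are not countries (World, European Union, Antarctica) can be filtered out before ranking. The following is a minimal sketch using Python's `sqlite3` module with an in-memory table that holds a few rows copied from the outputs above; the `NON_COUNTRIES` tuple is a hypothetical name, and the list of excluded rows is an assumption that may need extending for the full database.

```python
import sqlite3

# Small in-memory sample; the populations are copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts (name, population) VALUES (?, ?)",
    [
        ("World", 7256490011),
        ("China", 1367485388),
        ("India", 1251695584),
        ("European Union", 513949445),
        ("United States", 321368864),
        ("Antarctica", 0),
    ],
)

# Hypothetical exclusion list: rows that describe aggregates or continents.
NON_COUNTRIES = ("World", "European Union", "Antarctica")
placeholders = ", ".join("?" for _ in NON_COUNTRIES)

query = f"""
    SELECT name, population
    FROM facts
    WHERE name NOT IN ({placeholders})
    ORDER BY population DESC
    LIMIT 3
"""
top_countries = conn.execute(query, NON_COUNTRIES).fetchall()

for name, population in top_countries:
    print(f"{name}: {population:,}")
```

Passing the names as query parameters keeps the exclusion list in one place, so it can be reused for the minimum, maximum and average queries as well.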

## Average Population and Area

In this part we will calculate and explore the average values of the `population` and `area` columns.

In [8]:
%%sql
SELECT 
    AVG(population) avg_pop,
    AVG(area) avg_area
FROM facts;

 * sqlite:///factbook.db
Done.


avg_pop,avg_area
62094928.32231405,555093.546184739


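The average population is just over 62 million people, and the average area is about 555,000 square kilometers. The population average includes aggregate rows such as World and the European Union, so it overstates the population of a typical country.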
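When reading these averages, it helps to remember that SQLite's `AVG` skips `NULL` values, so two averages from the same table can be taken over different numbers of rows. The sketch below illustrates this with Python's `sqlite3` module; the two real rows are copied from the table preview above, while the third row, including its area, is hypothetical and exists only to provide a missing population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts (name, area, population) VALUES (?, ?, ?)",
    [
        ("Albania", 28748, 3029278),
        ("Andorra", 468, 85580),
        # Hypothetical row with an unknown (NULL) population.
        ("Hypothetical territory", 100, None),
    ],
)

count_all, count_pop, avg_pop, avg_area = conn.execute(
    """
    SELECT
        COUNT(*),           -- every row
        COUNT(population),  -- only rows with a non-NULL population
        AVG(population),    -- averaged over the non-NULL populations only
        AVG(area)
    FROM facts
    """
).fetchone()

print(count_all, count_pop, avg_pop, avg_area)
```

Comparing `COUNT(*)` with `COUNT(population)` on the real table would show how many rows the population average actually covers.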

## Densely Populated Countries

Next, we look for countries whose `population` is above average and whose `area` is below average.

In [9]:
%%sql
SELECT * FROM facts
WHERE population > (
    SELECT AVG(population)
    FROM facts
)
  AND area < (
    SELECT AVG(area)
    FROM facts
)

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
173,th,Thailand,513120,510890,2230,67976405,0.34,11.19,7.8,0.0
185,uk,United Kingdom,243610,241930,1680,64088222,0.54,12.17,9.35,2.54
192,vm,Vietnam,331210,310070,21140,94348835,0.97,15.96,5.93,0.3


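Seven countries meet both conditions: Bangladesh, Germany, Japan, the Philippines, Thailand, the United Kingdom and Vietnam. A large population on a relatively small area suggests that these countries are densely populated.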
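Population density can also be computed directly, as population divided by land area. The following is a minimal sketch using Python's `sqlite3` module and a few rows copied from the outputs above; it assumes that `area_land` is expressed in square kilometers, and the `density` alias is a name introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts (name, area_land, population) VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Bangladesh", 130170, 168957745),
        ("Germany", 348672, 80854408),
        ("Japan", 364485, 126919659),
    ],
)

# CAST avoids integer division; density is people per unit of land area.
densities = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / area_land AS density
    FROM facts
    WHERE area_land > 0
    ORDER BY density DESC
    """
).fetchall()

for name, density in densities:
    print(f"{name}: {density:.1f}")
```

Unlike the above-average and below-average filter, this ranking does not depend on the table-wide averages, so aggregate rows such as World do not shift the result.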

## Water to Land Ratio

In this part we investigate which countries have the highest ratio of water to land, and which have more water than land. Dividing `area_water` by `area_land` gives the ratio. A ratio above 1 means that the country's water area is larger than its land area; a ratio below 1 means that the land area is larger.

In [10]:
%%sql
SELECT name, CAST(area_water as float) / CAST(area_land as float) water_land_ratio FROM facts ORDER BY water_land_ratio DESC LIMIT 10;

 * sqlite:///factbook.db
Done.


name,water_land_ratio
British Indian Ocean Territory,905.6666666666666
Virgin Islands,4.520231213872832
Puerto Rico,0.5547914317925592
"Bahamas, The",0.3866133866133866
Guinea-Bissau,0.2846728307254623
Malawi,0.2593962585034013
Netherlands,0.2257103236656536
Uganda,0.2229223744292237
Eritrea,0.1643564356435643
Liberia,0.1562396179401993


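The highest ratios belong to the British Indian Ocean Territory and the Virgin Islands, the only two entries with a ratio above 1, that is, with more water than land. The remaining entries in the top ten (Puerto Rico, the Bahamas, Guinea-Bissau, Malawi, the Netherlands, Uganda, Eritrea and Liberia) have a significant water area but still more land than water.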
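The second part of the question, which entries have more water than land, can be answered with a direct comparison of the two columns. The sketch below uses Python's `sqlite3` module. The Albania and Bangladesh rows are copied from the outputs above; the land and water areas for the British Indian Ocean Territory and the Virgin Islands are assumptions chosen to reproduce the ratios shown in the table, since the notebook does not print them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts (name, area_land, area_water) VALUES (?, ?, ?)",
    [
        ("Albania", 27398, 1350),
        ("Bangladesh", 130170, 18290),
        # Assumed areas, chosen to match the ratios 905.67 and 4.52 above.
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
    ],
)

more_water = conn.execute(
    """
    SELECT name, CAST(area_water AS REAL) / area_land AS water_land_ratio
    FROM facts
    WHERE area_water > area_land
    ORDER BY water_land_ratio DESC
    """
).fetchall()

for name, ratio in more_water:
    print(f"{name}: {ratio:.2f}")
```

Filtering on `area_water > area_land` avoids relying on a `LIMIT`, so every entry with more water than land is returned regardless of how many there are.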

## Population Growth

Next, we look at the countries with the highest population growth.

In [11]:
%%sql
SELECT name, population_growth FROM facts ORDER BY population_growth DESC LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24
Qatar,3.07
Burkina Faso,3.03
Mali,2.98
Cook Islands,2.95
Iraq,2.93


The highest population growth is observed in South Sudan and Malawi. Now let's look at the opposite end of the scale.

In [12]:
%%sql
SELECT name, population_growth FROM facts WHERE population_growth IS NOT NULL ORDER BY population_growth LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population_growth
Holy See (Vatican City),0.0
Cocos (Keeling) Islands,0.0
Greenland,0.0
Pitcairn Islands,0.0
Greece,0.01
Norfolk Island,0.01
Tokelau,0.01
Falkland Islands (Islas Malvinas),0.01
Guyana,0.02
Slovakia,0.02


Population growth is zero or close to zero in the Vatican, the Cocos Islands, Greenland, Greece and Slovakia, among others.
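The `IS NOT NULL` filter in the query above matters because SQLite sorts `NULL` before any other value in ascending order. The sketch below shows the difference using Python's `sqlite3` module; the three real rows are copied from the output above, and the row with a missing growth rate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts (name, population_growth) VALUES (?, ?)",
    [
        ("Greece", 0.01),
        ("Greenland", 0.0),
        ("Slovakia", 0.02),
        # Hypothetical row with no recorded growth rate.
        ("Hypothetical territory", None),
    ],
)

# Without the filter, the NULL row comes first in ascending order.
unfiltered = conn.execute(
    "SELECT name, population_growth FROM facts ORDER BY population_growth"
).fetchall()

# With the filter, only rows with a known growth rate are ranked.
filtered = conn.execute(
    """
    SELECT name, population_growth
    FROM facts
    WHERE population_growth IS NOT NULL
    ORDER BY population_growth
    """
).fetchall()

print(unfiltered[0])
print(filtered[0])
```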

## Birth and Death Rates

Before drawing conclusions, we would like to compare birth and death rates across countries. In particular, it would be interesting to know whether there are countries where the death rate is higher than the birth rate.

In [13]:
%%sql
SELECT name, birth_rate FROM facts ORDER BY birth_rate DESC LIMIT 10;

 * sqlite:///factbook.db
Done.


name,birth_rate
Niger,45.45
Mali,44.99
Uganda,43.79
Zambia,42.13
Burkina Faso,42.03
Burundi,42.01
Malawi,41.56
Somalia,40.45
Angola,38.78
Mozambique,38.58


All ten countries with the highest birth rates are in Africa, led by Niger, Mali and Uganda. Let's see which countries have the highest death rates.

In [14]:
%%sql
SELECT name, death_rate FROM facts ORDER BY death_rate DESC LIMIT 10;

 * sqlite:///factbook.db
Done.


name,death_rate
Lesotho,14.89
Ukraine,14.46
Bulgaria,14.44
Guinea-Bissau,14.33
Latvia,14.31
Chad,14.28
Lithuania,14.27
Namibia,13.91
Afghanistan,13.89
Central African Republic,13.8


The highest death rates are found in Lesotho (Southern Africa), Ukraine (Eastern Europe) and Bulgaria (the Balkans, Europe). As the table above shows, these countries are spread across several continents. But what about their birth rates? Next, we check which countries have a higher death rate than birth rate, and whether any of the countries with the highest death rates appear there as well.

In [15]:
%%sql
SELECT name FROM facts WHERE death_rate > birth_rate ORDER BY death_rate DESC LIMIT 10;

 * sqlite:///factbook.db
Done.


name
Ukraine
Bulgaria
Latvia
Lithuania
Russia
Serbia
Belarus
Hungary
Moldova
Estonia


The death rate is higher than the birth rate in Ukraine, Bulgaria, Latvia and others. All ten countries on this list are in Europe. Four of them (Ukraine, Bulgaria, Latvia and Lithuania) also appear among the ten highest death rates. Since I live in Poland, let's check how things look there.

In [16]:
%%sql
SELECT * FROM facts WHERE name = 'Poland';

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46


In Poland, population growth is small and the birth rate is lower than the death rate, which matches the pattern seen in the European countries listed above.
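The gap between birth and death rates can be expressed as a single number, the rate of natural increase (births minus deaths per 1,000 people). The following is a minimal sketch using Python's `sqlite3` module with rows copied from the outputs above; `natural_increase` is a name introduced here for illustration and is not a column of the `facts` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts (name, birth_rate, death_rate) VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Bangladesh", 21.14, 5.61),
        ("Germany", 8.47, 11.42),
        ("Japan", 7.93, 9.51),
        ("Poland", 9.74, 10.19),
    ],
)

# Negative values mean more deaths than births per 1,000 people.
rows = conn.execute(
    """
    SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_increase
    FROM facts
    WHERE birth_rate IS NOT NULL AND death_rate IS NOT NULL
    ORDER BY natural_increase
    """
).fetchall()

more_deaths_than_births = [name for name, value in rows if value < 0]

for name, value in rows:
    print(f"{name}: {value:+.2f}")
```

Sorting by the difference ranks countries by the size of the gap, which the `death_rate > birth_rate` filter alone does not show.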

# Conclusions
---

In this project we investigated the data coming from the CIA Factbook. We made the following findings:
- the largest populations are found in China, India, the United States, Indonesia, Brazil, Pakistan, Nigeria and Bangladesh,
- the European Union, taken as a whole, would rank third by population, after China and India,
- the average population is just over 62 million people and the average area is about 555,000 square kilometers,
- countries with an above-average population and a below-average area include Bangladesh, Germany, Japan and the Philippines,
- the countries with a significant water area are Puerto Rico, the Bahamas, Guinea-Bissau, Malawi, the Netherlands, Uganda, Eritrea and Liberia,
- the highest population growth is observed in South Sudan and Malawi,
- population growth is zero or close to zero in the Vatican, the Cocos Islands, Greenland, Greece and Slovakia,
- the highest birth rates are found in African countries such as Niger, Mali and Uganda,
- the highest death rates are found in Lesotho, Ukraine and Bulgaria.

These findings are based on data from 2015.