# Analyzing CIA Factbook Data Using SQL
---

The CIA World Factbook is a collection of statistics and data about every country on Earth. It contains demographic, geographic, economic, and government-related data.

In this project, we will use SQL to analyze the data in the Factbook database.


Here are the descriptions for some of the columns:

- ```name``` — the name of the country.
- ```area``` — the country's total area (both land and water) in square kilometers.
- ```area_land``` — the country's land area in square kilometers.
- ```area_water``` — the country's water area in square kilometers.
- ```population``` — the country's population.
- ```population_growth``` — the country's population growth as a percentage.
- ```birth_rate``` — the country's birth rate, or the number of births per year per 1,000 people.
- ```death_rate``` — the country's death rate, or the number of deaths per year per 1,000 people.

In [1]:
#!conda install -yc conda-forge ipython-sql

### Prerequisite

In [2]:
%load_ext sql
%sql sqlite:///factbook.db

'Connected: @factbook.db'

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


### Analysis

In [4]:
%%sql
SELECT * FROM facts LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


#### 1. Summary Statistics

In [5]:
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
FROM facts;

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02


The summary raises two concerns: one entry has a population of 0, and another has a population of more than 7 billion. Both look like outliers.

#### 2. Exploring Outliers

In [6]:
%%sql
SELECT name, population 
    FROM facts
    WHERE population=(SELECT MIN(population) FROM facts) OR
    population=(SELECT MAX(population) FROM facts);

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0
World,7256490011


The two outliers turn out to be Antarctica and the World, neither of which is an actual country. We exclude both from the rest of the analysis.
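When several names are excluded, the conditions have to be joined with `AND` (or written as `NOT IN`); joining them with `OR` excludes nothing, because no name equals both values at once. The sketch below illustrates this with Python's standard `sqlite3` module and a small in-memory stand-in for `facts`. The table contents are a hand-picked subset of the rows shown above, and the helper `names` is hypothetical.

```python
import sqlite3

# Hypothetical in-memory stand-in for the facts table (subset of rows shown above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Antarctica", 0),
        ("World", 7256490011),
        ("Albania", 3029278),
        ("Andorra", 85580),
    ],
)


def names(where_clause):
    """Return the sorted names matched by a WHERE clause (illustrative helper)."""
    rows = conn.execute(f"SELECT name FROM facts WHERE {where_clause}")
    return sorted(row[0] for row in rows)


# AND keeps only the rows that differ from both names.
excluded_with_and = names("name <> 'Antarctica' AND name <> 'World'")
# NOT IN is an equivalent, more compact spelling.
excluded_with_not_in = names("name NOT IN ('Antarctica', 'World')")
# OR is true for every row here, so nothing is filtered out.
excluded_with_or = names("name <> 'Antarctica' OR name <> 'World'")

print(excluded_with_and)
print(excluded_with_not_in)
print(excluded_with_or)
```

Only the first two filters remove Antarctica and the World; the `OR` version returns all four rows.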

#### 3. Summary Statistics Without Outliers

In [7]:
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
 FROM facts
WHERE name <> 'Antarctica'
  AND name <> 'World';

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
48,1367485388,0.0,4.02


After excluding the two outliers, the minimum population at the time the data was collected is 48 and the maximum is about 1.37 billion. Population growth ranges from 0 to 4.02 percent.

#### 3. Finding Average Population and Area

In [8]:
%%sql
SELECT ROUND(AVG(population),3) AS Ave_pop,
       ROUND(AVG(area),3) AS Ave_area
    FROM facts
    WHERE name <> 'Antarctica'
      AND name <> 'World';

 * sqlite:///factbook.db
Done.


Ave_pop,Ave_area
32377011.013,555093.546


Based on the calculation above, the average population is around 32.4 million and the average area is about 555,094 square kilometres.
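One caveat when reading these averages: `AVG` in SQLite skips `NULL` values, so if some rows lack a population or an area, the two averages are computed over different sets of rows. Whether `facts` actually contains such rows is not checked here; the sketch below only illustrates the behaviour on a tiny in-memory table, and the row named `Hypothetical` is invented for the example.

```python
import sqlite3

# Hypothetical in-memory table; the third row is invented to show NULL handling.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 3029278, 28748),
        ("Andorra", 85580, 468),
        ("Hypothetical", None, 1000),  # missing population
    ],
)

# COUNT(column) counts only non-NULL values, which is the set AVG averages over.
avg_pop, avg_area, n_pop, n_area = conn.execute(
    "SELECT AVG(population), AVG(area), COUNT(population), COUNT(area) FROM facts"
).fetchone()

print(avg_pop, n_pop)    # average over the 2 rows with a population
print(avg_area, n_area)  # average over all 3 rows
```

Comparing `COUNT(population)` and `COUNT(area)` against `COUNT(*)` on the real table would show how many rows each average is based on.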

#### 4. Densest Country

In [9]:
%%sql
SELECT name,population,area,
    ROUND(CAST(population AS Float)/CAST(area AS FLoat),3) as 'density'
    FROM facts
    WHERE population>(SELECT AVG(population) FROM facts 
                      WHERE name <> 'Antarctica'
                        AND name <> 'World')
      AND area<(SELECT AVG(area) FROM facts 
                      WHERE name <> 'Antarctica'
                        AND name <> 'World')
    AND name <> 'Antarctica' AND name <> 'World'
    ORDER BY density DESC;

 * sqlite:///factbook.db
Done.


name,population,area,density
Bangladesh,168957745,148460,1138.069
"Korea, South",49115196,99720,492.531
Philippines,100998376,300000,336.661
Japan,126919659,377915,335.842
Vietnam,94348835,331210,284.861
United Kingdom,64088222,243610,263.077
Germany,80854408,357022,226.469
Italy,61855120,301340,205.267
Uganda,37101745,241038,153.925
Thailand,67976405,513120,132.477


Here we look for countries with an above-average population and a below-average area. The countries listed above meet both criteria.

In [10]:
%%sql
SELECT name,population,area,
    ROUND(CAST(population AS Float)/CAST(area AS FLoat),3) as 'density'
    FROM facts
    WHERE name <> 'Antarctica' AND name <> 'World'
    ORDER BY density DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area,density
Macau,592731,28,21168.964
Monaco,30535,2,15267.5
Singapore,5674472,697,8141.28
Hong Kong,7141106,1108,6445.042
Gaza Strip,1869055,360,5191.819
Gibraltar,29258,6,4876.333
Bahrain,1346613,760,1771.859
Maldives,393253,298,1319.641
Malta,413965,316,1310.016
Bermuda,70196,54,1299.926


If we rank purely by density, without requiring an above-average population and a below-average area, Macau is the densest entry in the dataset with about 21,169 people per square kilometre, followed by Monaco with about 15,268 people per square kilometre.
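The density queries cast the operands to a float before dividing. That matters because SQLite performs integer division when both operands are integers, which would silently drop the fractional part. A minimal sketch of the difference, using Python's `sqlite3` module and an in-memory table holding two of the rows shown above (the column alias `int_density` is made up for the comparison):

```python
import sqlite3

# In-memory stand-in for facts with two rows copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Macau", 592731, 28), ("Monaco", 30535, 2)],
)

densities = conn.execute(
    """
    SELECT name,
           population / area AS int_density,                        -- integer division
           ROUND(CAST(population AS FLOAT) / area, 3) AS density    -- real division
      FROM facts
     ORDER BY density DESC
    """
).fetchall()

for name, int_density, density in densities:
    print(name, int_density, density)
```

Casting only one operand is enough to force real division; the notebook casts both, which gives the same result.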

#### 5. Most Populated Country and Fastest Growing Country

In [11]:
%%sql
SELECT name,population
    FROM facts
    WHERE population=(SELECT max(population) FROM facts 
                      WHERE name <> 'Antarctica'
                        AND name <> 'World')
    AND name <> 'Antarctica' AND name <> 'World';

 * sqlite:///factbook.db
Done.


name,population
China,1367485388


It is no surprise that China is the most populous country on Earth, with over 1.3 billion inhabitants. What about the fastest-growing country?

In [12]:
%%sql
SELECT name,population_growth
    FROM facts
    WHERE population_growth=(SELECT max(population_growth) FROM facts 
                      WHERE name <> 'Antarctica'
                        AND name <> 'World')
    AND name <> 'Antarctica' AND name <> 'World';

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02


South Sudan was the fastest-growing country at the time the data was gathered, with a population growth rate of 4.02 percent.

#### 6. Water-Land Ratio

In [13]:
%%sql
SELECT name, area_water,area_land,
    ROUND(CAST(area_water as float)/CAST(area_land as float),2) AS 'WL_ratio'
    FROM facts
    WHERE name <> 'Antarctica' AND name <> 'World'
    ORDER BY WL_ratio DESC
    LIMIT 10
    ;

 * sqlite:///factbook.db
Done.


name,area_water,area_land,WL_ratio
British Indian Ocean Territory,54340,60,905.67
Virgin Islands,1564,346,4.52
Puerto Rico,4921,8870,0.55
"Bahamas, The",3870,10010,0.39
Guinea-Bissau,8005,28120,0.28
Malawi,24404,94080,0.26
Netherlands,7650,33893,0.23
Uganda,43938,197100,0.22
Eritrea,16600,101000,0.16
Liberia,15049,96320,0.16


The British Indian Ocean Territory has by far the largest water-to-land ratio, at about 906 to 1. The Virgin Islands come second at about 4.5 to 1. Next, we find the countries that have more water than land.
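The ratio divides by `area_land`, so it is worth knowing what happens when that value is 0: SQLite returns `NULL` for a division by zero rather than raising an error, and `NULL` sorts last under `ORDER BY ... DESC`. The sketch below shows this on an in-memory table with three rows taken from the outputs above plus one invented row (`Hypothetical`) that has no land area; whether the real table contains such a row is not checked here.

```python
import sqlite3

# In-memory stand-in for facts; the last row is invented to trigger division by zero.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Afghanistan", 652230, 0),
        ("Hypothetical", 0, 10),
    ],
)

ratios = conn.execute(
    """
    SELECT name,
           ROUND(CAST(area_water AS FLOAT) / area_land, 2) AS wl_ratio
      FROM facts
     ORDER BY wl_ratio DESC
    """
).fetchall()

for name, wl_ratio in ratios:
    print(name, wl_ratio)  # the division-by-zero row comes back as None
```

Rows with a `NULL` ratio therefore never reach the top of the ranking, but they can be filtered out explicitly with `WHERE area_land > 0` if desired.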

In [14]:
%%sql
SELECT name, area_water,area_land
    FROM facts
    WHERE name <> 'Antarctica' AND name <> 'World'
    AND area_water>area_land
    ORDER BY area_water DESC
    LIMIT 10
    ;

 * sqlite:///factbook.db
Done.


name,area_water,area_land
British Indian Ocean Territory,54340,60
Virgin Islands,1564,346


These two territories are the only entries in the dataset with more water area than land area.

#### 7. Which countries will add the most people to their populations next year?

In [15]:
%%sql
SELECT name,population,population_growth,
    CAST(population_growth/100*population as int) AS "population_increase"
    FROM facts
    WHERE name <> 'Antarctica' AND name <> 'World'
    ORDER BY population_increase DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,population_growth,population_increase
India,1251695584,1.22,15270686
China,1367485388,0.45,6153684
Nigeria,181562056,2.45,4448270
Pakistan,199085847,1.46,2906653
Ethiopia,99465819,2.89,2874562
Bangladesh,168957745,1.6,2703323
United States,321368864,0.78,2506677
Indonesia,255993674,0.92,2355141
"Congo, Democratic Republic of the",79375136,2.45,1944690
Philippines,100998376,1.61,1626073


Multiplying each country's population growth rate by its current population gives a rough estimate of next year's increase. By this estimate, India will add the most people (about 15 million), followed by China (about 6 million).
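As a worked check of that arithmetic, the sketch below reruns the same expression with Python's `sqlite3` module on an in-memory table holding three of the rows shown above. Because `population_growth` is a percentage, it is divided by 100 before being multiplied by the population, and `CAST(... AS int)` truncates the result to a whole number of people.

```python
import sqlite3

# In-memory stand-in for facts with three rows copied from the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("China", 1367485388, 0.45),
        ("India", 1251695584, 1.22),
        ("Nigeria", 181562056, 2.45),
    ],
)

increases = conn.execute(
    """
    SELECT name,
           CAST(population_growth / 100 * population AS int) AS population_increase
      FROM facts
     ORDER BY population_increase DESC
    """
).fetchall()

for name, population_increase in increases:
    print(name, population_increase)
```

India ranks first despite having fewer people than China because its growth rate is almost three times higher.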

#### 8. Countries With Higher Mortality Rate than Birth Rate

In [16]:
%%sql
SELECT name,birth_rate,death_rate,
    ROUND(death_rate-birth_rate,2) as "net_difference"
    FROM facts
    WHERE (name <> 'Antarctica' AND name <> 'World')
    AND death_rate>birth_rate
    ORDER BY net_difference DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,birth_rate,death_rate,net_difference
Bulgaria,8.92,14.44,5.52
Serbia,9.08,13.66,4.58
Latvia,10.0,14.31,4.31
Lithuania,10.1,14.27,4.17
Ukraine,10.72,14.46,3.74
Hungary,9.16,12.73,3.57
Germany,8.47,11.42,2.95
Slovenia,8.42,11.37,2.95
Romania,9.14,11.9,2.76
Croatia,9.45,12.18,2.73


As the table shows, Bulgaria has the largest gap between death rate and birth rate, followed by Serbia and Latvia. Most of the countries in the top ten are in Central and Eastern Europe, including several in the Balkans and the Baltic states.

### Conclusion

From this project, we can draw the following conclusions:
1. China is the most populous country, with over 1.3 billion people.
2. South Sudan is the fastest-growing country, at 4.02% per year.
3. The average population is around 32.4 million, and the average area is about 555,094 square kilometres.
4. Macau is the most densely populated entry in the dataset.
5. The British Indian Ocean Territory has the largest water-to-land ratio (905.67:1).
6. India will have the greatest population increase over the next year.
7. Bulgaria has the largest difference between death rate and birth rate.