## Analyzing CIA Factbook Data Using SQL
### Guided Project - Dataquest

The goal of this project is to analyse data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- `population`: The population of each country or territory.
- `population_growth`: The annual population growth rate, as a percentage.
- `area`: The total land and water area.

We downloaded the SQLite database `factbook.db` from [here](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db).

Firstly, we need to install *ipython-sql* to connect our Jupyter Notebook to our database file:

In [1]:
!conda install -yc conda-forge ipython-sql

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Retrieving notices: ...working... done


In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

Let's explore our database:

In [3]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


Our database has two tables: `sqlite_sequence` with two columns (`name` and `seq`) and `facts` with 11 columns (`id`, `code`, `name`, `area`, `area_land`, `area_water`, `population`, `population_growth`, `birth_rate`, `death_rate`, `migration_rate`). The main columns of `facts` are:

- `name`: The name of the country.
- `area`: The country's total area (both land and water) in square kilometers.
- `area_land`: The country's land area in square kilometers.
- `area_water`: The country's water area in square kilometers.
- `population`: The country's population.
- `population_growth`: The country's population growth as a percentage.
- `birth_rate`: The country's birth rate, or the number of births per year per 1,000 people.
- `death_rate`: The country's death rate, or the number of deaths per year per 1,000 people.
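
The column list can also be read programmatically instead of parsing the `sql` text in `sqlite_master`. The following is a minimal sketch using Python's built-in `sqlite3` module and `PRAGMA table_info`. It assumes an in-memory stand-in for `factbook.db`, created from the `CREATE TABLE` statement shown in the output above; against the real file you would connect to `factbook.db` instead.

```python
import sqlite3

# In-memory stand-in for factbook.db (an assumption for illustration),
# built from the CREATE TABLE statement shown in the sqlite_master output.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL,
        "name" varchar(255) NOT NULL,
        "area" integer,
        "area_land" integer,
        "area_water" integer,
        "population" integer,
        "population_growth" float,
        "birth_rate" float,
        "death_rate" float,
        "migration_rate" float
    )
""")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(facts)").fetchall()
column_names = [row[1] for row in columns]

for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<18} {col_type:<13} notnull={notnull} pk={pk}")
```
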

In [4]:
%%sql
SELECT *
 FROM facts
 LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### Summary statistics
We start by calculating some summary statistics and looking for outlier countries:
- Minimum population
- Maximum population
- Minimum population growth
- Maximum population growth

In [5]:
%%sql
SELECT MIN(population) AS 'Minimum population', 
    MAX(population) AS 'Maximum population', 
    MIN(population_growth) AS 'Minimum population growth',
    MAX(population_growth) AS 'Maximum population growth'
  FROM facts

 * sqlite:///factbook.db
Done.


Minimum population,Maximum population,Minimum population growth,Maximum population growth
0,7256490011,0.0,4.02


In [6]:
%%sql
SELECT name, population 
  FROM facts
  WHERE population = (SELECT MIN(population) FROM facts)

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0


In [7]:
%%sql
SELECT name, population 
  FROM facts
  WHERE population = (SELECT MAX(population) FROM facts)

 * sqlite:///factbook.db
Done.


name,population
World,7256490011


The table contains a row for the whole world, which explains the population of 7,256,490,011 people (a figure that makes no sense for a single country).

On the other hand, Antarctica appears as a country with a population of 0. According to the Factbook, Antarctica has no indigenous inhabitants (only research stations).

In [10]:
%%sql
SELECT MIN(population) AS 'Minimum population', 
    MAX(population) AS 'Maximum population', 
    MIN(population_growth) AS 'Minimum population growth',
    MAX(population_growth) AS 'Maximum population growth'
  FROM facts
  WHERE name != 'World'

 * sqlite:///factbook.db
Done.


Minimum population,Maximum population,Minimum population growth,Maximum population growth
0,1367485388,0.0,4.02


In [9]:
%%sql
SELECT name, population 
  FROM facts
  WHERE name != 'World'
  ORDER BY population DESC
  LIMIT 1;

 * sqlite:///factbook.db
Done.


name,population
China,1367485388


China is the country with the largest population (over 1.36 billion)!

Let's look at the average population and area, and then at which countries are densely populated (above-average values for population and below-average values for area).

In [11]:
%%sql
SELECT AVG(population) AS 'Average Population', AVG(area) AS 'Average Area'
    FROM facts
    WHERE name <> 'World'

 * sqlite:///factbook.db
Done.


Average Population,Average Area
32242666.56846473,555093.546184739


We look at each condition separately first (countries with above-average population, then countries with below-average area) and finally combine both conditions:

In [44]:
%%sql
SELECT name, population, area
 FROM facts
 WHERE name <> 'World' AND (population > (SELECT AVG(population) FROM facts WHERE name <> 'World'))
 ORDER BY population DESC
 LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area
China,1367485388,9596960
India,1251695584,3287263
European Union,513949445,4324782
United States,321368864,9826675
Indonesia,255993674,1904569
Brazil,204259812,8515770
Pakistan,199085847,796095
Nigeria,181562056,923768
Bangladesh,168957745,148460
Russia,142423773,17098242
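
The list above includes `European Union`, which aggregates its member states rather than being a single country. Below is a minimal sketch of excluding several aggregate rows at once with `NOT IN`. It uses Python's built-in `sqlite3` module and a tiny in-memory table filled with rows copied from the outputs above; the in-memory table and the decision to treat `European Union` as an aggregate to exclude are assumptions made for illustration.

```python
import sqlite3

# Tiny in-memory sample (an assumption for illustration); the values are
# copied from the query outputs shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("World", 7256490011),
        ("China", 1367485388),
        ("India", 1251695584),
        ("European Union", 513949445),
        ("United States", 321368864),
    ],
)

# Exclude every aggregate row in one condition instead of chaining <> tests.
top_countries = conn.execute(
    """
    SELECT name, population
      FROM facts
     WHERE name NOT IN ('World', 'European Union')
     ORDER BY population DESC
     LIMIT 3
    """
).fetchall()

for name, population in top_countries:
    print(name, population)
```
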


In [48]:
%%sql
SELECT name, population, area
 FROM facts
 WHERE name <> 'World' AND (area < (SELECT AVG(area) FROM facts WHERE name <> 'World'))
 ORDER BY area
 LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area
Holy See (Vatican City),842.0,0
Monaco,30535.0,2
Coral Sea Islands,,3
Ashmore and Cartier Islands,,5
Navassa Island,,5
Spratly Islands,,5
Clipperton Island,,6
Gibraltar,29258.0,6
Wake Island,,6
Paracel Islands,,7


In [39]:
%%sql
SELECT name, population, area, CAST(population AS Float) / area AS density
 FROM facts
 WHERE name <> 'World' 
      AND (area < (SELECT AVG(area) FROM facts WHERE name <> 'World')) 
      AND (population > (SELECT AVG(population) FROM facts WHERE name <> 'World'))
 ORDER BY density DESC

 * sqlite:///factbook.db
Done.


name,population,area,density
Bangladesh,168957745,148460,1138.0691432035565
"Korea, South",49115196,99720,492.531046931408
Philippines,100998376,300000,336.6612533333333
Japan,126919659,377915,335.8418136353413
Vietnam,94348835,331210,284.8610700160019
United Kingdom,64088222,243610,263.0771396904889
Germany,80854408,357022,226.4689795026637
Italy,61855120,301340,205.26687462666757
Uganda,37101745,241038,153.92487906471175
Thailand,67976405,513120,132.47662340193327
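
The combined query repeats the same two `AVG` subqueries and the `name <> 'World'` filter several times. One possible refactoring, sketched below, uses common table expressions (`WITH`) so the filter and the averages are each written once. It is a sketch, not the notebook's method: it runs through Python's built-in `sqlite3` module on a small in-memory sample copied from the outputs above, so the averages (and therefore the result) differ from those of the full database. The CTE names `countries` and `averages` are hypothetical.

```python
import sqlite3

# Small in-memory sample (an assumption for illustration); values are
# copied from the outputs above. The area of the World row is not shown
# in the notebook, so it is left NULL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("World", 7256490011, None),
        ("Bangladesh", 168957745, 148460),
        ("Japan", 126919659, 377915),
        ("Germany", 80854408, 357022),
        ("Monaco", 30535, 2),
        ("Andorra", 85580, 468),
        ("Algeria", 39542166, 2381741),
        ("Angola", 19625353, 1246700),
        ("Afghanistan", 32564342, 652230),
    ],
)

dense = conn.execute(
    """
    WITH countries AS (              -- filter out the World row once
        SELECT name, population, area
          FROM facts
         WHERE name <> 'World'
    ),
    averages AS (                    -- compute both averages once
        SELECT AVG(population) AS avg_population,
               AVG(area)       AS avg_area
          FROM countries
    )
    SELECT c.name, c.population, c.area,
           CAST(c.population AS FLOAT) / c.area AS density
      FROM countries AS c, averages AS a
     WHERE c.population > a.avg_population
       AND c.area < a.avg_area
     ORDER BY density DESC
    """
).fetchall()

dense_names = [row[0] for row in dense]
for name, population, area, density in dense:
    print(f"{name}: {density:.2f} people per square kilometer")
```
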


### Which countries have the highest population/area ratio, and how does it compare to the list we found in the previous screen?
We checked countries with above-average population and below-average area, but we can also compute density directly for all countries. This picks up very small, densely populated places whose population is below the average.
The list changes: it is now filled with city-states and small islands, which makes sense.

In [50]:
%%sql
SELECT name, population, area, CAST(population AS Float) / area AS density
 FROM facts
 WHERE name <> 'World'
 ORDER BY density DESC
 LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area,density
Macau,592731,28,21168.964285714286
Monaco,30535,2,15267.5
Singapore,5674472,697,8141.279770444763
Hong Kong,7141106,1108,6445.041516245487
Gaza Strip,1869055,360,5191.819444444444
Gibraltar,29258,6,4876.333333333333
Bahrain,1346613,760,1771.8592105263158
Maldives,393253,298,1319.6409395973155
Malta,413965,316,1310.01582278481
Bermuda,70196,54,1299.925925925926
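
One detail worth checking in the density calculation: an earlier output lists Holy See (Vatican City) with an area of 0 and several islands with no population value. In SQLite, dividing by zero yields `NULL`, and so does any arithmetic on a `NULL` operand, so those rows get no density rather than raising an error. The sketch below illustrates this with Python's built-in `sqlite3` module on a few rows copied from the outputs above; the in-memory table is an assumption for illustration.

```python
import sqlite3

# In-memory sample (an assumption for illustration); values are copied
# from the outputs above. Coral Sea Islands has no population value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Holy See (Vatican City)", 842, 0),
        ("Monaco", 30535, 2),
        ("Macau", 592731, 28),
        ("Coral Sea Islands", None, 3),
    ],
)

# Division by zero and arithmetic on NULL both produce NULL (None in Python).
densities = dict(
    conn.execute(
        "SELECT name, CAST(population AS FLOAT) / area FROM facts"
    ).fetchall()
)

# Keeping only rows where the density is defined makes the intent explicit.
ranked = conn.execute(
    """
    SELECT name, CAST(population AS FLOAT) / area AS density
      FROM facts
     WHERE area > 0 AND population IS NOT NULL
     ORDER BY density DESC
    """
).fetchall()

print(densities)
print(ranked)
```
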


### Which country has the most people?

In [52]:
%%sql
SELECT name, population
    FROM facts
    WHERE population = (SELECT MAX(population) FROM facts WHERE name<>'World')

 * sqlite:///factbook.db
Done.


name,population
China,1367485388


### Which country has the highest growth rate?

In [53]:
%%sql
SELECT name, population_growth
    FROM facts
    WHERE population_growth = (SELECT MAX(population_growth) FROM facts WHERE name<>'World')

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02


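### Which countries have the highest ratios of water to land?
The query below measures water as a share of total area (`area_water / area`), so a value above 0.5 means the country has more water than land.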

In [54]:
%%sql
SELECT name, area, area_land, area_water, CAST(area_water AS Float) / area AS area_water_ratio
    FROM facts
    WHERE name<>'World'
    ORDER BY area_water_ratio DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,area,area_land,area_water,area_water_ratio
British Indian Ocean Territory,54400,60,54340,0.9988970588235294
Virgin Islands,1910,346,1564,0.818848167539267
Puerto Rico,13791,8870,4921,0.3568269161047059
"Bahamas, The",13880,10010,3870,0.2788184438040346
Guinea-Bissau,36125,28120,8005,0.2215916955017301
Malawi,118484,94080,24404,0.2059687383950575
Netherlands,41543,33893,7650,0.1841465469513516
Uganda,241038,197100,43938,0.1822866104099768
Eritrea,117600,101000,16600,0.141156462585034
Liberia,111369,96320,15049,0.135127369375679
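
The query above divides by the total area. A literal water-to-land ratio would divide by `area_land` instead. The sketch below computes both side by side with Python's built-in `sqlite3` module on a few rows copied from the output above. The in-memory table and the column aliases `water_share` and `water_to_land` are assumptions for illustration, and `NULLIF` guards against a land area of 0.

```python
import sqlite3

# In-memory sample (an assumption for illustration); values are copied
# from the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("British Indian Ocean Territory", 54400, 60, 54340),
        ("Virgin Islands", 1910, 346, 1564),
        ("Puerto Rico", 13791, 8870, 4921),
        ("Netherlands", 41543, 33893, 7650),
    ],
)

ratios = conn.execute(
    """
    SELECT name,
           CAST(area_water AS FLOAT) / area                 AS water_share,
           CAST(area_water AS FLOAT) / NULLIF(area_land, 0) AS water_to_land
      FROM facts
     ORDER BY water_to_land DESC
    """
).fetchall()

for name, water_share, water_to_land in ratios:
    print(f"{name}: share of total={water_share:.3f}, water/land={water_to_land:.3f}")
```

On these rows both measures produce the same ranking; a water-to-land value above 1 corresponds to a share of total area above 0.5.
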


### Which countries have more water than land?

In [55]:
%%sql
SELECT name, area, area_land, area_water
    FROM facts
    WHERE name<>'World' AND area_land < area_water
    ORDER BY area_water DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,area,area_land,area_water
British Indian Ocean Territory,54400,60,54340
Virgin Islands,1910,346,1564


### Which countries will add the most people to their populations next year?

In [56]:
%%sql
SELECT name, population, population_growth, population_growth*population/100 AS New_people
    FROM facts
    WHERE name<>'World'
    ORDER BY New_people DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,population_growth,New_people
India,1251695584,1.22,15270686.1248
China,1367485388,0.45,6153684.246
Nigeria,181562056,2.45,4448270.372
Pakistan,199085847,1.46,2906653.3662
Ethiopia,99465819,2.89,2874562.1691
Bangladesh,168957745,1.6,2703323.92
United States,321368864,0.78,2506677.1392
Indonesia,255993674,0.92,2355141.8008000003
"Congo, Democratic Republic of the",79375136,2.45,1944690.832
Philippines,100998376,1.61,1626073.8536
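
The estimate above is `population * population_growth / 100`, because `population_growth` is a percentage. For India this gives 1,251,695,584 × 1.22 / 100 ≈ 15.27 million people. Since fractional people are not meaningful, the result can be rounded to a whole number. The sketch below does this with Python's built-in `sqlite3` module on three rows copied from the output above; the in-memory table and the alias `new_people` are assumptions for illustration.

```python
import sqlite3

# In-memory sample (an assumption for illustration); values are copied
# from the output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth FLOAT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("China", 1367485388, 0.45),
        ("India", 1251695584, 1.22),
        ("Nigeria", 181562056, 2.45),
    ],
)

# population_growth is a percentage, so divide by 100; round to whole people.
growth = conn.execute(
    """
    SELECT name,
           CAST(ROUND(population * population_growth / 100) AS INTEGER) AS new_people
      FROM facts
     ORDER BY new_people DESC
    """
).fetchall()

for name, new_people in growth:
    print(f"{name}: about {new_people:,} additional people")
```
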


### Which countries have a higher death rate than birth rate?

In [59]:
%%sql
SELECT name, birth_rate, death_rate, round(death_rate-birth_rate,2) AS dif
    FROM facts
    WHERE name<>'World' AND death_rate > birth_rate
    ORDER BY dif DESC
    LIMIT 10;

 * sqlite:///factbook.db
Done.


name,birth_rate,death_rate,dif
Bulgaria,8.92,14.44,5.52
Serbia,9.08,13.66,4.58
Latvia,10.0,14.31,4.31
Lithuania,10.1,14.27,4.17
Ukraine,10.72,14.46,3.74
Hungary,9.16,12.73,3.57
Germany,8.47,11.42,2.95
Slovenia,8.42,11.37,2.95
Romania,9.14,11.9,2.76
Croatia,9.45,12.18,2.73
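
Because `birth_rate` and `death_rate` are expressed per 1,000 people, their difference can be converted to a percentage by dividing by 10. For Bulgaria, (8.92 − 14.44) / 10 = −0.552, a natural decrease of roughly 0.55% per year before migration is taken into account. The sketch below applies this conversion with Python's built-in `sqlite3` module on rows copied from the outputs above; the in-memory table and the alias `natural_change_pct` are assumptions for illustration.

```python
import sqlite3

# In-memory sample (an assumption for illustration); values are copied
# from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate FLOAT, death_rate FLOAT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bulgaria", 8.92, 14.44),
        ("Serbia", 9.08, 13.66),
        ("Afghanistan", 38.57, 13.89),
    ],
)

# Rates are per 1,000 people, so dividing the difference by 10 gives percent.
shrinking = conn.execute(
    """
    SELECT name,
           ROUND((birth_rate - death_rate) / 10.0, 3) AS natural_change_pct
      FROM facts
     WHERE death_rate > birth_rate
     ORDER BY natural_change_pct
    """
).fetchall()

for name, pct in shrinking:
    print(f"{name}: {pct}% natural change per year")
```
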
