# Analysis of CIA Factbook Data Using SQL
----
<p style='text-align: justify;'>
In this project, we'll work with data from the <a href='https://www.cia.gov/library/publications/the-world-factbook/'>CIA World Factbook</a>, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like:
</p>
 - <code>population</code>: The population as of 2015.
 - <code>population_growth</code>: The annual population growth rate, as a percentage.
 - <code>area</code>: The total land and water area.
 
Let's connect our notebook to our database file below.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

We can also query the database to get the name of the table and what it looks like.

Note that <code>%%sql</code> must be the first line of the cell so that the rest of the cell is run as SQL.

In [9]:
%%sql
    SELECT * 
    FROM sqlite_master 
    WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
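
The same schema inspection can be done outside the notebook magic. The following is a minimal sketch using Python's standard `sqlite3` module. It assumes an in-memory stand-in for `factbook.db` with a trimmed column list, so it can run without the real file.

```python
import sqlite3

# In-memory stand-in for factbook.db (an assumption for illustration);
# the column list is a trimmed copy of the CREATE TABLE statement above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        code varchar(255) NOT NULL,
        name varchar(255) NOT NULL,
        area integer,
        population integer,
        population_growth float
    )
    """
)

# List user tables, skipping SQLite's internal bookkeeping tables
# such as sqlite_sequence.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
print(columns)
```

Against the real database, `sqlite3.connect("factbook.db")` would replace the in-memory connection and the `CREATE TABLE` step would be dropped.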


Let's look at the first five rows of the <code>facts</code> table.

In [12]:
%%sql
    SELECT * 
    FROM facts 
    LIMIT 5; 

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are the descriptions of some of the columns:
- <code>name</code>: The name of the country.
- <code>area</code>: The country's total area (both land and water).
- <code>population</code>: The country's population.
- <code>population_growth</code>: The country's population growth as a percentage.
- <code>birth_rate</code>: The country's birth rate, or the number of births a year per 1,000 people.
- <code>death_rate</code>: The country's death rate, or the number of deaths a year per 1,000 people.
- <code>area_land</code>: The country's land area in square kilometers.
- <code>area_water</code>: The country's water area in square kilometers.

### Summary Statistics

Let's first calculate some summary statistics while keeping our eyes open for any outlier countries.

In [13]:
%%sql
    SELECT
        MIN(population) as min_population,
        MAX(population) as max_population,
        MIN(population_growth) as min_population_growth,
        MAX(population_growth) as max_population_growth
    FROM facts;

Done.


min_population,max_population,min_population_growth,max_population_growth
0,7256490011,0.0,4.02


<p style='text-align: justify;'>
As we can see, there are suspicious data points: one row has a population of <code>0</code> and another has over <code>7.2</code> billion people. That's almost the global total (7.7 billion as of 2020)!
</p>

Let's try to find out which countries these are.

In [14]:
%%sql
    SELECT *
    FROM facts
    WHERE population = (SELECT MAX(population) FROM facts)
       OR population = (SELECT MIN(population) FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000.0,,0,,,,
261,xx,World,,,,7256490011,1.08,18.6,7.8,


<p style='text-align: justify;'>
The table contains a row for the whole world, which explains the population of over 7.2 billion, and a row for Antarctica, which explains the population of 0. The latter matches the CIA Factbook <a href='https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html'>page for Antarctica</a>, which states that there are "no indigenous inhabitants..." in the region; the only people there are the staff of research stations.
</p>
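
Once these two rows are identified, the summary statistics can be recomputed without them. The sketch below uses Python's `sqlite3` module with an in-memory table holding a handful of rows copied from the outputs above. Filtering by `name` is an assumption made for illustration; the full table may contain other non-country rows that need the same treatment.

```python
import sqlite3

# Small in-memory sample; population figures are copied from the query
# outputs shown earlier in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Andorra", 85580),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Recompute the extremes while skipping the two outlier rows.
min_pop, max_pop = conn.execute(
    """
    SELECT MIN(population), MAX(population)
    FROM facts
    WHERE name NOT IN ('World', 'Antarctica')
    """
).fetchone()

print(min_pop, max_pop)
```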

Let's continue by calculating some averages.

In [16]:
%%sql
    SELECT
        AVG(population) as avg_population,
        AVG(area) as avg_area
    FROM facts;

Done.


avg_population,avg_area
62094928.32231405,555093.546184739


Using these averages, let's find countries that have <b>above average values for population</b> and <b>below average values for area</b>.

In [20]:
%%sql
    SELECT *
    FROM facts
    WHERE population > (SELECT AVG(population) FROM facts)
      AND area < (SELECT AVG(area) FROM facts)
    ORDER BY population DESC;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
192,vm,Vietnam,331210,310070,21140,94348835,0.97,15.96,5.93,0.3
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
173,th,Thailand,513120,510890,2230,67976405,0.34,11.19,7.8,0.0
185,uk,United Kingdom,243610,241930,1680,64088222,0.54,12.17,9.35,2.54


The top three countries by population that also have a below-average total area are:

1. Bangladesh
2. Japan
3. Philippines

All three are in Asia.
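
One caveat is that `AVG(population)` above is computed over every row in `facts`, including the `World` row, which pulls the average upward. The following sketch shows one way to compare against averages taken over countries only, using a common table expression. It runs on a small in-memory sample with values copied from earlier outputs, and the `countries` CTE name is a hypothetical choice for this illustration.

```python
import sqlite3

# In-memory sample; values are copied from outputs shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("Japan", 377915, 126919659),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("World", None, 7256490011),
    ],
)

# The CTE drops the aggregate row once, so both subqueries and the outer
# query see the same filtered set of rows.
query = """
    WITH countries AS (
        SELECT * FROM facts WHERE name != 'World'
    )
    SELECT name
    FROM countries
    WHERE population > (SELECT AVG(population) FROM countries)
      AND area < (SELECT AVG(area) FROM countries)
    ORDER BY population DESC
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

In this sample, leaving the `World` row in would raise the average population above every individual country, so the filter would return nothing.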

### Additional Questions

Let's also try answering the following:
- Which countries have the highest ratios of water to land? Which countries have more water than land?
- Which countries will add the most people to their population next year?
- Which countries have a higher death rate than birth rate?

In [27]:
%%sql
    SELECT *, area_water/area_land as wat_land_rat
    FROM facts
    ORDER BY wat_land_rat DESC
    LIMIT 3;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,wat_land_rat
228,io,British Indian Ocean Territory,54400,60,54340,,,,,,905
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67,4
1,af,Afghanistan,652230,652230,0,32564342.0,2.32,38.57,13.89,1.51,0


<p style='text-align: justify;'>
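The <b>British Indian Ocean Territory</b> has the highest water-to-land ratio, followed by the <b>Virgin Islands</b>. Because <code>area_water</code> and <code>area_land</code> are both integer columns, SQLite performs integer division, so the ratios shown are truncated to whole numbers. The third row's ratio of 0 indicates that, among rows for which the ratio can be computed, these two are the only ones with more water than land.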
</p>
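
To keep the fractional part of the ratio, one operand can be cast to a floating-point type before dividing. The sketch below illustrates this with Python's `sqlite3` module on an in-memory sample whose area values are copied from the outputs above. The alias `water_land_ratio` is a hypothetical name, and the `WHERE` clause is one possible way to answer the "more water than land" question directly.

```python
import sqlite3

# In-memory sample; land and water areas are copied from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Afghanistan", 652230, 0),
        ("Albania", 27398, 1350),
    ],
)

# Integer / integer truncates in SQLite.
truncated = conn.execute("SELECT 1564 / 346").fetchone()[0]

# Casting one operand to REAL forces floating-point division.
rows = conn.execute(
    """
    SELECT name, CAST(area_water AS REAL) / area_land AS water_land_ratio
    FROM facts
    WHERE area_water > area_land
    ORDER BY water_land_ratio DESC
    """
).fetchall()

print(truncated)
for name, ratio in rows:
    print(name, round(ratio, 2))
```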

In [30]:
%%sql
    SELECT *
    FROM facts
    ORDER BY birth_rate DESC
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
128,ng,Niger,,1266700,300,18045729,3.25,45.45,12.42,0.56
109,ml,Mali,1240192.0,1220190,20002,16955536,2.98,44.99,12.89,2.26
182,ug,Uganda,241038.0,197100,43938,37101745,3.24,43.79,10.69,0.74
194,za,Zambia,752618.0,743398,9220,15066266,2.88,42.13,12.67,0.68
27,uv,Burkina Faso,274200.0,273800,400,18931686,3.03,42.03,11.72,0.0


<p style='text-align: justify;'>
The countries with the highest birth rates are <b>Niger, Mali, and Uganda</b>. These are all countries located on the African continent.
</p>
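
Birth rate alone does not say which countries will add the most people next year, since that also depends on the size of the current population. A rough estimate is `population * population_growth / 100`. The sketch below applies it to an in-memory sample with rows copied from earlier outputs. It assumes `population_growth` is a signed annual percentage, which the data shown here does not confirm, and the alias `projected_increase` is a hypothetical name. The ranking applies to the sample rows only, not to the full table.

```python
import sqlite3

# In-memory sample; figures are copied from outputs shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Niger", 18045729, 3.25),
        ("Mali", 16955536, 2.98),
        ("Uganda", 37101745, 3.24),
        ("Bangladesh", 168957745, 1.6),
        ("Japan", 126919659, 0.16),
        ("World", 7256490011, 1.08),
    ],
)

# Estimated people added in one year, excluding the aggregate World row.
rows = conn.execute(
    """
    SELECT name,
           CAST(ROUND(population * population_growth / 100) AS INTEGER)
               AS projected_increase
    FROM facts
    WHERE name != 'World'
    ORDER BY projected_increase DESC
    """
).fetchall()

for name, increase in rows:
    print(name, increase)
```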

In [32]:
%%sql
    SELECT *
    FROM facts
    WHERE death_rate > birth_rate
    ORDER BY death_rate DESC
    LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
183,up,Ukraine,603550,579330,24220,44429471,0.6,10.72,14.46,2.25
26,bu,Bulgaria,110879,108489,2390,7186893,0.58,8.92,14.44,0.29
96,lg,Latvia,64589,62249,2340,1986705,1.06,10.0,14.31,6.26
102,lh,Lithuania,65300,62680,2620,2884433,1.04,10.1,14.27,6.27
143,rs,Russia,17098242,16377742,720500,142423773,0.04,11.6,13.69,1.69


Among countries whose death rate exceeds their birth rate, <b>Ukraine</b> has the highest death rate, followed by <b>Bulgaria</b>.
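
Sorting by `death_rate` ranks these countries by mortality alone. An alternative is to sort by the gap between the two rates, which measures how quickly deaths outpace births. The sketch below does this on an in-memory sample with rates copied from earlier outputs; the alias `natural_decrease` is a hypothetical name introduced for illustration.

```python
import sqlite3

# In-memory sample; birth and death rates are copied from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Ukraine", 10.72, 14.46),
        ("Bulgaria", 8.92, 14.44),
        ("Latvia", 10.0, 14.31),
        ("Lithuania", 10.1, 14.27),
        ("Russia", 11.6, 13.69),
        ("Japan", 7.93, 9.51),
        ("Afghanistan", 38.57, 13.89),
    ],
)

# Rank by how far the death rate exceeds the birth rate (per 1,000 people).
rows = conn.execute(
    """
    SELECT name, death_rate - birth_rate AS natural_decrease
    FROM facts
    WHERE death_rate > birth_rate
    ORDER BY natural_decrease DESC
    """
).fetchall()

names = [name for name, _ in rows]
print(names)
```

On this sample the ordering differs from the one above: Bulgaria has the largest gap even though Ukraine has the highest death rate.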