# Analyzing CIA Factbook Data Using SQL
In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all the countries on Earth.
The Factbook contains demographic information like the following:
* name: the name of the country.
* area: the country's total area (both land and water).
* area_land: the country's land area in square kilometers.
* area_water: the country's water area in square kilometers.
* population: the country's population.
* population_growth: the country's population growth as a percentage.
* birth_rate: the country's birth rate, or the number of births per year per 1,000 people.
* death_rate: the country's death rate, or the number of deaths per year per 1,000 people.


In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [2]:
%%sql
select *
from facts
limit 2;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3


### Overview of the Data

In [3]:
%%sql

select *
from sqlite_master
where type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [4]:
%%sql
select *
from facts
limit 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### Summary Statistics
We'll start by calculating some summary statistics and looking for any outlier countries.

In [5]:
%%sql
select min(population) as min_population,
       max(population) as max_population,
       min(population_growth) as min_pop_growth,
       max(population_growth) as max_pop_growth
from facts;

 * sqlite:///factbook.db
Done.


min_population,max_population,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02


### Exploring Outliers
The query above shows that:
* One row has a population of 0.
* One row has a population of 7,256,490,011 (more than 7.2 billion people).

We'll look at both rows more closely.

In [6]:
%%sql
select name, min(population) as min_population
from facts;

 * sqlite:///factbook.db
Done.


name,min_population
Antarctica,0


In [7]:
%%sql
select name, max(population) as max_population
from facts;

 * sqlite:///factbook.db
Done.


name,max_population
World,7256490011


The table contains a row for the whole world, which explains the population of over 7.2 billion. It also contains a row for Antarctica, which explains the population of 0.

We'll recompute the summary statistics we found earlier while excluding the row for the whole world.

In [8]:
%%sql
select min(population) as min_population,
       max(population) as max_population,
       min(population_growth) as min_pop_growth,
       max(population_growth) as max_pop_growth
from facts
where name != 'World';

 * sqlite:///factbook.db
Done.


min_population,max_population,min_pop_growth,max_pop_growth
0,1367485388,0.0,4.02


### Exploring Average Population and Area
Next, we'll explore population density, which depends on a country's population and area. We'll start with the average values of these two columns.

We will discard the row for the whole world so that it does not skew the averages.
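
The averaging step described above could look like the following minimal sketch. It uses Python's built-in sqlite3 module and a small in-memory stand-in for the facts table rather than factbook.db. The three country rows are taken from the outputs shown earlier; the World row's area is left as NULL because it does not appear in this notebook.

```python
import sqlite3

# In-memory stand-in for the facts table (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("World", 7256490011, None),  # area is a placeholder (NULL)
    ],
)

# Average population and area, excluding the aggregate World row.
avg_population, avg_area = conn.execute(
    """
    SELECT avg(population), avg(area)
    FROM facts
    WHERE name != 'World'
    """
).fetchone()

print(round(avg_population), round(avg_area))
```

Against the real database, the same SELECT statement can be run in a `%%sql` cell.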

### Finding Densely Populated Countries
We'll use these averages to find countries that are densely populated. We'll identify countries that have the following:
* Above-average values for population.
* Below-average values for area.

In [57]:
%%sql
select name, population, area
from facts
where name != 'World'
and population > (select avg(population)
                    from facts
                    where population < (select max(population)
                                        from facts))
and area < (select avg(area)
            from facts
            where population < (select max(population)
                                from facts));

 * sqlite:///factbook.db
Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Iraq,37056169,438317
Italy,61855120,301340
Japan,126919659,377915
Kenya,45925301,580367
"Korea, South",49115196,99720
Morocco,33322699,446550
Philippines,100998376,300000
Poland,38562189,312685


### Next Steps
Here are some next steps to explore:
* Which country has the most people? Which country has the highest growth rate?
* Which countries have the highest ratios of water to land? Which countries have more water than land?
* Which countries will add the most people to their populations next year?
* Which countries have a higher death rate than birth rate?
* Which countries have the highest population/area ratio, and how does it compare to the list we found in the previous section?

Which country has the most people?

In [13]:
%%sql
select name, population
from facts
where population = (select max(population)
                    from facts
                    where name != 'World');

 * sqlite:///factbook.db
Done.


name,population
China,1367485388


China has the largest population, at more than 1.36 billion people.

In [18]:
%%sql
-- Which country has the highest growth rate?
select name, population, population_growth
from facts
where population_growth = (select max(population_growth)
                           from facts
                           where name != 'World');

 * sqlite:///factbook.db
Done.


name,population,population_growth
South Sudan,12042910,4.02


South Sudan has the highest population growth rate, at 4.02%. The shorter query below returns the same answer.

In [17]:
%%sql
-- same as query above
select name, max(population_growth)
from facts;

 * sqlite:///factbook.db
Done.


name,max(population_growth)
South Sudan,4.02
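
The shorter query works because of a SQLite-specific rule: when a query uses a single min() or max() aggregate, any bare column such as name is taken from the row that holds that minimum or maximum. Other database engines may reject the query or return an arbitrary name. The sketch below, on a small in-memory table built from values shown in this notebook, compares the shortcut with a more portable ORDER BY form.

```python
import sqlite3

# Toy stand-in for the facts table; values come from outputs shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Albania", 0.3), ("South Sudan", 4.02), ("Afghanistan", 2.32)],
)

# SQLite-specific: the bare column `name` comes from the row with the maximum.
bare = conn.execute(
    "SELECT name, max(population_growth) FROM facts"
).fetchone()

# Portable alternative: sort by the value and keep the first row.
portable = conn.execute(
    """
    SELECT name, population_growth
    FROM facts
    ORDER BY population_growth DESC
    LIMIT 1
    """
).fetchone()

print(bare, portable)
```

Both forms return South Sudan here; the ORDER BY form is the safer choice if the query may later run on another engine.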


In [24]:
%%sql
-- Which countries have the highest ratios of water to land?
select name, round(cast (area_water as float) / cast(area_land as float), 3) as water_to_land_ratios
from facts
order by water_to_land_ratios desc
limit 10;

 * sqlite:///factbook.db
Done.


name,water_to_land_ratios
British Indian Ocean Territory,905.667
Virgin Islands,4.52
Puerto Rico,0.555
"Bahamas, The",0.387
Guinea-Bissau,0.285
Malawi,0.259
Netherlands,0.226
Uganda,0.223
Eritrea,0.164
Liberia,0.156


Two entries have a water-to-land ratio above 1:
* British Indian Ocean Territory at 905.667, and
* The Virgin Islands at 4.52.

All other ratios are below 1.

We will conduct further analysis to find out which Virgin Islands this is.

In [49]:
%%sql
select name, round(cast (area_water as float) / cast(area_land as float), 3) as water_to_land_ratios
from facts
where name like '%Virgin Islands%';

 * sqlite:///factbook.db
Done.


name,water_to_land_ratios
British Virgin Islands,0.0
Virgin Islands,4.52


Since the British Virgin Islands have their own row in the dataset, the row named Virgin Islands most likely refers to the US Virgin Islands.

In [23]:
%%sql
-- Which countries have more water than land?
select name, area_water, area_land
from facts
where area_water > area_land
limit 10;

 * sqlite:///factbook.db
Done.


name,area_water,area_land
British Indian Ocean Territory,54340,60
Virgin Islands,1564,346


Only two entries have more water than land, and they are the same two that top the water-to-land ranking.
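
The ratio queries above cast the area columns to float before dividing. The sketch below illustrates why: in SQLite, dividing one integer by another truncates the result. It uses a small in-memory table with values taken from outputs in this notebook.

```python
import sqlite3

# Toy stand-in for the facts table; values come from outputs shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Albania", 27398, 1350),
    ],
)

# Compare integer division with division after casting to float.
rows = conn.execute(
    """
    SELECT name,
           area_water / area_land                AS int_ratio,
           cast(area_water AS float) / area_land AS float_ratio
    FROM facts
    ORDER BY float_ratio DESC
    """
).fetchall()

for name, int_ratio, float_ratio in rows:
    print(name, int_ratio, round(float_ratio, 3))
```

Without the cast, Albania's ratio truncates to 0 and the Virgin Islands' ratio to 4, which would hide most of the differences between countries.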

In [51]:
%%sql
-- Which countries will add the most people to their populations next year?
-- Using population increase
select name, birth_rate, migration_rate, population_growth, round(birth_rate + migration_rate, 2) as population_increase
from facts
where name != 'World'
order by population_increase desc
limit 10;

 * sqlite:///factbook.db
Done.


name,birth_rate,migration_rate,population_growth,population_increase
Somalia,40.45,8.49,1.83,48.94
South Sudan,36.91,11.47,4.02,48.38
Mali,44.99,2.26,2.98,47.25
Niger,45.45,0.56,3.25,46.01
Uganda,43.79,0.74,3.24,44.53
American Samoa,22.89,21.13,0.3,44.02
Sao Tome and Principe,34.23,8.63,1.84,42.86
Zambia,42.13,0.68,2.88,42.81
Burkina Faso,42.03,0.0,3.03,42.03
Burundi,42.01,0.0,3.28,42.01


In [52]:
%%sql
-- Which countries will add the most people to their populations next year?
-- Using population growth
select name, birth_rate, migration_rate, population_growth, round(birth_rate + migration_rate, 2) as population_increase
from facts
where name != 'World'
order by population_growth desc
limit 10;

 * sqlite:///factbook.db
Done.


name,birth_rate,migration_rate,population_growth,population_increase
South Sudan,36.91,11.47,4.02,48.38
Malawi,41.56,0.0,3.32,41.56
Burundi,42.01,0.0,3.28,42.01
Niger,45.45,0.56,3.25,46.01
Uganda,43.79,0.74,3.24,44.53
Qatar,9.84,22.39,3.07,32.23
Burkina Faso,42.03,0.0,3.03,42.03
Mali,44.99,2.26,2.98,47.25
Cook Islands,14.33,,2.95,
Iraq,31.45,1.62,2.93,33.07


We've considered two measures for assessing which countries will grow the most next year. Population increase adds the birth rate and the migration rate (both per 1,000 people, ignoring deaths), while population growth is the annual percentage change in population. Both are rates, so they rank countries by relative growth rather than by the absolute number of people added.
* Using population increase, Somalia, South Sudan, Mali, Niger and Uganda rank highest.
* Using population growth, South Sudan, Malawi, Burundi, Niger and Uganda rank highest.

South Sudan, Niger and Uganda appear in the top five under both measures.
For most of these countries, births contribute more to the increase than migration does.
A majority of the top 10 countries in both queries are African countries.
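
To estimate the absolute number of people a country adds in a year, one option is to multiply its population by its growth rate. The sketch below does this on a small in-memory table built from rows shown in this notebook. The people_added column is a name introduced here for illustration, and the calculation assumes population_growth is an annual percentage.

```python
import sqlite3

# Toy stand-in for the facts table; values come from outputs shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Algeria", 39542166, 1.84),
        ("Andorra", 85580, 0.12),
        ("Angola", 19625353, 2.78),
        ("South Sudan", 12042910, 4.02),
    ],
)

# Absolute yearly increase = population * growth percentage / 100.
top_adders = conn.execute(
    """
    SELECT name,
           round(population * population_growth / 100.0) AS people_added
    FROM facts
    WHERE name != 'World'
    ORDER BY people_added DESC
    LIMIT 3
    """
).fetchall()

for name, people_added in top_adders:
    print(name, int(people_added))
```

In this small sample, South Sudan has the highest growth rate but does not make the top three by people added, because its population is smaller than that of Afghanistan, Algeria and Angola.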

In [53]:
%%sql
-- Which countries have a higher death rate than birth rate?
select name, birth_rate, death_rate, round(death_rate - birth_rate, 2) as difference
from facts
where name != 'World'
and death_rate > birth_rate
order by death_rate desc
limit 10;

 * sqlite:///factbook.db
Done.


name,birth_rate,death_rate,difference
Ukraine,10.72,14.46,3.74
Bulgaria,8.92,14.44,5.52
Latvia,10.0,14.31,4.31
Lithuania,10.1,14.27,4.17
Russia,11.6,13.69,2.09
Serbia,9.08,13.66,4.58
Belarus,10.7,13.36,2.66
Hungary,9.16,12.73,3.57
Moldova,12.0,12.59,0.59
Estonia,10.51,12.4,1.89


In [54]:
%%sql
-- Which countries have a higher death rate than birth rate?
select name, birth_rate, death_rate, round(death_rate - birth_rate, 2) as difference
from facts
where name != 'World'
and death_rate > birth_rate
order by difference desc
limit 10;

 * sqlite:///factbook.db
Done.


name,birth_rate,death_rate,difference
Bulgaria,8.92,14.44,5.52
Serbia,9.08,13.66,4.58
Latvia,10.0,14.31,4.31
Lithuania,10.1,14.27,4.17
Ukraine,10.72,14.46,3.74
Hungary,9.16,12.73,3.57
Germany,8.47,11.42,2.95
Slovenia,8.42,11.37,2.95
Romania,9.14,11.9,2.76
Croatia,9.45,12.18,2.73


Most of the countries with death rates higher than birth rates are Eastern European countries.

In [58]:
%%sql
-- Which countries have the highest population/area ratio, and how does it compare to the list of densely populated countries?
select name, round(cast(population as float)/area, 3) as pop_area_ratio
from facts
where name != 'World'
order by pop_area_ratio desc
limit 10;

 * sqlite:///factbook.db
Done.


name,pop_area_ratio
Macau,21168.964
Monaco,15267.5
Singapore,8141.28
Hong Kong,6445.042
Gaza Strip,5191.819
Gibraltar,4876.333
Bahrain,1771.859
Maldives,1319.641
Malta,1310.016
Bermuda,1299.926


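The countries with the highest population-to-area ratios are mostly small territories and city-states such as Macau, Monaco and Singapore. None of them appear in the earlier list of countries with above-average population and below-average area.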