# Project: Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like:
* `population` - The population as of 2015.
* `population_growth` - The annual population growth rate, as a percentage.
* `area` - The total land and water area.

```python
%%capture
# Load the ipython-sql extension and connect to the SQLite database
%load_ext sql
%sql sqlite:///factbook.db
```

```python
%sql
```

`'Connected: @factbook.db'`

## 1. Overview of the Data
We'll begin by getting a sense of what the data looks like.

```sql
%%sql
SELECT *
FROM sqlite_master
WHERE type = 'table';
```

| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |
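The `%%sql` cells rely on the ipython-sql extension. Outside Jupyter, you can do a similar inspection with Python's built-in `sqlite3` module. To keep it self-contained, this sketch does not open `factbook.db`. It assumes an in-memory database with a trimmed-down `facts` table that has only a subset of the real columns. `list_tables` and `list_columns` are hypothetical helper names.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables recorded in sqlite_master, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def list_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names of `table`, in declaration order."""
    # PRAGMA arguments cannot be bound as parameters, so only accept existing tables.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table!r}")
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Hypothetical stand-in for factbook.db: an in-memory database with a few of the facts columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, code TEXT NOT NULL, name TEXT NOT NULL, "
    "area INTEGER, population INTEGER, population_growth REAL)"
)

print(list_tables(conn))            # ['facts']
print(list_columns(conn, "facts"))  # ['id', 'code', 'name', 'area', 'population', 'population_growth']
```

To run this against the real file, open it with `sqlite3.connect("factbook.db")` instead. `PRAGMA table_info` is SQLite's built-in way to list a table's columns.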


```sql
%%sql
SELECT *
FROM facts
LIMIT 5;
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


Here are the descriptions for some of the columns:
* `name` - The name of the country.
* `area` - The country's total area (both land and water) in square kilometers.
* `area_land` - The country's land area in [square kilometers](https://www.cia.gov/library/publications/the-world-factbook/rankorder/2147rank.html).
* `area_water` - The country's water area in square kilometers.
* `population` - The country's population.
* `population_growth` - The country's population growth as a percentage.
* `birth_rate` - The country's birth rate, or the number of births a year per 1,000 people.
* `death_rate` - The country's death rate, or the number of deaths a year per 1,000 people.

## 2. Summary Statistics

```sql
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
FROM facts;
```

| min_pop | max_pop | min_pop_growth | max_pop_growth |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |


A few things stick out from the summary statistics above:

* There's a country with a population of 0
* There's a country with a population of 7256490011 (or more than 7.2 billion people)

Let's use subqueries to zoom in on just these countries without using the specific values.
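The core idea is to compare a column against a scalar subquery instead of hard-coding a value. Here is a sketch of it with Python's `sqlite3` module, under two assumptions:

* The toy `facts` table holds four rows copied from outputs in this notebook.
* `rows_at_extreme` is a hypothetical helper.

The column and aggregate names are interpolated into the SQL text, so the helper only accepts whitelisted values.

```python
import sqlite3

# Only these names are ever interpolated into the SQL text below.
ALLOWED_COLUMNS = {"population", "population_growth", "area"}
ALLOWED_AGGREGATES = {"MIN", "MAX"}


def rows_at_extreme(conn, column, aggregate):
    """Return the names of rows whose `column` equals MIN/MAX(column), ties included."""
    if column not in ALLOWED_COLUMNS or aggregate not in ALLOWED_AGGREGATES:
        raise ValueError("unsupported column or aggregate")
    sql = (
        f"SELECT name FROM facts "
        f"WHERE {column} = (SELECT {aggregate}({column}) FROM facts) "
        f"ORDER BY name"
    )
    return [name for (name,) in conn.execute(sql)]


# Toy table: a few rows copied from the query outputs in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT NOT NULL, population INTEGER, "
    "population_growth REAL, area INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32, 652230),
        ("Albania", 3029278, 0.3, 28748),
        ("Antarctica", 0, None, None),
        ("World", 7256490011, 1.08, None),
    ],
)

print(rows_at_extreme(conn, "population", "MIN"))  # ['Antarctica']
print(rows_at_extreme(conn, "population", "MAX"))  # ['World']
```

Matching with `=` returns every row tied for the extreme value. `ORDER BY ... LIMIT 1` would silently keep just one of them.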

## 3. Exploring Outliers


```sql
%%sql
SELECT *
FROM facts
WHERE population = (SELECT MIN(population)
                    FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 250 | ay | Antarctica |  | 280000 |  | 0 |  |  |  |  |


Antarctica has a population of 0, which is consistent with the CIA Factbook [page for Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html).

```sql
%%sql
SELECT *
FROM facts
WHERE population = (SELECT MAX(population)
                    FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 261 | xx | World |  |  |  | 7256490011 | 1.08 | 18.6 | 7.8 |  |


The `World` row holds the population of the whole planet, over 7.2 billion people. It is an aggregate rather than a country, so we need to exclude it from any further summary statistics.

```sql
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
FROM facts
WHERE name <> 'World';
```

| min_pop | max_pop | min_pop_growth | max_pop_growth |
|---|---|---|---|
| 0 | 1367485388 | 0.0 | 4.02 |


```sql
%%sql
SELECT *
FROM facts
WHERE population = (SELECT MAX(population)
                    FROM facts
                    WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 37 | ch | China | 9596960 | 9326410 | 270550 | 1367485388 | 0.45 | 12.49 | 7.53 | 0.44 |


China has the largest population of any country, with over 1.36 billion people.

## 4. Exploring Average Population and Area
Let's explore population density, which depends on a country's population and its area. We'll start by looking at the average values of these two columns. As before, we exclude the `World` row so the planet-wide totals don't skew the averages.

```sql
%%sql
SELECT AVG(population) AS avg_pop,
       AVG(area) AS avg_area
FROM facts
WHERE name <> 'World';
```

| avg_pop | avg_area |
|---|---|
| 32242666.56846473 | 555093.546184739 |


We find that the average population is around 32 million and the average area is 555 thousand square kilometers.
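This average depends on how `AVG` handles missing data. SQL aggregates skip `NULL` values. Rows with an empty `area`, such as Antarctica and World in the outputs above, are therefore dropped from the denominator rather than counted as zero. The sketch below demonstrates this on a toy three-row table copied from earlier outputs. It is only an illustration and does not reproduce the Factbook averages.

```python
import sqlite3

# Toy table: three rows copied from earlier outputs (Antarctica's area is empty).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT NOT NULL, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Afghanistan", 652230), ("Albania", 28748), ("Antarctica", None)],
)

avg_area, non_null, total, avg_as_zero = conn.execute(
    """
    SELECT AVG(area),              -- NULL areas are ignored
           COUNT(area),            -- counts only non-NULL areas
           COUNT(*),               -- counts every row
           AVG(COALESCE(area, 0))  -- treats a missing area as 0 instead
    FROM facts
    """
).fetchone()

print(avg_area, non_null, total)  # 340489.0 2 3
print(round(avg_as_zero, 2))      # 226992.67
```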

## 5. Finding Densely Populated Countries

Next, we'll build on the query above to find countries that are densely populated. We'll identify countries that have:

1. Above average values for population.
2. Below average values for area.

```sql
%%sql
SELECT *
FROM facts
WHERE population > (SELECT AVG(population)
                    FROM facts
                    WHERE name <> 'World')
  AND area < (SELECT AVG(area)
              FROM facts
              WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 80 | iz | Iraq | 438317 | 437367 | 950 | 37056169 | 2.93 | 31.45 | 3.77 | 1.62 |
| 83 | it | Italy | 301340 | 294140 | 7200 | 61855120 | 0.27 | 8.74 | 10.19 | 4.1 |
| 85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
| 91 | ks | Korea, South | 99720 | 96920 | 2800 | 49115196 | 0.14 | 8.19 | 6.75 | 0.0 |
| 120 | mo | Morocco | 446550 | 446300 | 250 | 33322699 | 1.0 | 18.2 | 4.81 | 3.36 |
| 138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
| 139 | pl | Poland | 312685 | 304255 | 8430 | 38562189 | 0.09 | 9.74 | 10.19 | 0.46 |
| 163 | sp | Spain | 505370 | 498980 | 6390 | 48146134 | 0.89 | 9.64 | 9.04 | 8.31 |


The query returns 10 countries with an above-average population and a below-average area. Several of them, such as Bangladesh, South Korea, and Japan, are widely known for being densely populated, which is consistent with this result.
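The query above computes each average in its own subquery. An equivalent formulation, sketched below, computes both averages once in a common table expression (CTE) and cross-joins that single row against `facts`. The toy table is an assumption: six rows copied from outputs in this notebook. Its averages, and therefore its results, differ from the full dataset's.

```python
import sqlite3

DENSE_SQL = """
WITH averages AS (
    SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
    FROM facts
    WHERE name <> 'World'
)
SELECT f.name
FROM facts AS f
CROSS JOIN averages AS a
WHERE f.population > a.avg_pop
  AND f.area < a.avg_area
ORDER BY f.name
"""

# Toy table: rows copied from earlier outputs; World's area is NULL, as in the real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT NOT NULL, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Bangladesh", 148460, 168957745),
        ("Japan", 377915, 126919659),
        ("World", None, 7256490011),
    ],
)

print([name for (name,) in conn.execute(DENSE_SQL)])  # ['Bangladesh', 'Japan']
```

As in the original query, the outer filter does not exclude `World` explicitly. The row drops out only because its `NULL` area makes the `area < avg_area` comparison fail.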

## 6. Countries with the Highest Population and Highest Growth Rate


```sql
%%sql
SELECT name AS Highest_Population_Country, population
FROM facts
WHERE population = (SELECT MAX(population)
                    FROM facts
                    WHERE name <> 'World');
```

| Highest_Population_Country | population |
|---|---|
| China | 1367485388 |


```sql
%%sql
SELECT name, population_growth AS Highest_Growth_Rate
FROM facts
WHERE population_growth = (SELECT MAX(population_growth)
                           FROM facts
                           WHERE name <> 'World');
```

| name | Highest_Growth_Rate |
|---|---|
| South Sudan | 4.02 |


## 7. Top 5 Countries by Estimated Population Increase Next Year

```sql
%%sql
SELECT name, population, population_growth,
       -- population_growth is a percentage, so dividing by 100 gives the
       -- estimated number of people added over the next year
       CAST(population * population_growth / 100 AS INTEGER) AS Estimated_Increase
FROM facts
WHERE name <> 'World'
ORDER BY Estimated_Increase DESC
LIMIT 5;
```

| name | population | population_growth | Estimated_Increase |
|---|---|---|---|
| India | 1251695584 | 1.22 | 15270686 |
| China | 1367485388 | 0.45 | 6153684 |
| Nigeria | 181562056 | 2.45 | 4448270 |
| Pakistan | 199085847 | 1.46 | 2906653 |
| Ethiopia | 99465819 | 2.89 | 2874562 |


India is expected to add the most people next year. China is the most populous country, but it adds fewer people than India because its growth rate (0.45%) is the lowest of these five. That low rate is commonly attributed to China's [Family Planning Policy](https://en.wikipedia.org/wiki/Family_planning_policy).
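`population_growth` is a percentage. Assuming growth stays constant for one year, next year's population is `population * (1 + population_growth / 100)`, and the number of people added is `population * population_growth / 100`. The hypothetical helper below applies this to India's and China's figures from the table above. It truncates to whole people, as the SQL `CAST` does.

```python
def project_one_year(population: int, growth_pct: float) -> tuple[int, int]:
    """Return (estimated increase, estimated population next year), truncated to whole people.

    Assumes the growth rate stays constant over the year.
    """
    increase = population * growth_pct / 100
    return int(increase), int(population + increase)


print(project_one_year(1251695584, 1.22))  # India: (15270686, 1266966270)
print(project_one_year(1367485388, 0.45))  # China: (6153684, 1373639072)
```

By this estimate, China's population next year still stays above India's, even though India adds more than twice as many people.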

## 8. Country with the Highest Water-to-Land Ratio

```sql
%%sql
SELECT name, area_water, area_land,
       -- Both columns are integers, so "/" performs integer division and
       -- truncates the ratio; the CAST therefore adds nothing here.
       CAST(area_water/area_land AS INTEGER) AS ratio_water_to_land
FROM facts
ORDER BY ratio_water_to_land DESC
LIMIT 5;
```

| name | area_water | area_land | ratio_water_to_land |
|---|---|---|---|
| British Indian Ocean Territory | 54340 | 60 | 905 |
| Virgin Islands | 1564 | 346 | 4 |
| Afghanistan | 0 | 652230 | 0 |
| Albania | 1350 | 27398 | 0 |
| Algeria | 0 | 2381741 | 0 |


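The British Indian Ocean Territory (BIOT) has by far the highest water-to-land ratio: 54340 square kilometers of water to 60 of land, or roughly 905:1. The Virgin Islands come second at about 4.5:1 (1564 / 346), which the integer division in the query truncates to 4. These two territories are the only entries with more water than land. Every other row with both values recorded has a truncated ratio of 0, so its water area is smaller than its land area.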
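`area_water` and `area_land` are integer columns, so SQLite's `/` performs integer division. That is why the Virgin Islands' ratio shows as 4 rather than about 4.5. The fix is to cast one operand to `REAL` before dividing. This minimal sketch shows it on a toy in-memory table holding the top two rows from the table above. `NULLIF(area_land, 0)` is optional in SQLite, which already returns `NULL` for division by zero, but it makes the intent explicit.

```python
import sqlite3

# Toy table: the two top rows from the query output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT NOT NULL, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("British Indian Ocean Territory", 60, 54340), ("Virgin Islands", 346, 1564)],
)

rows = conn.execute(
    """
    SELECT name,
           area_water / area_land AS int_ratio,                          -- integer / integer truncates
           CAST(area_water AS REAL) / NULLIF(area_land, 0) AS real_ratio  -- NULL when land area is 0
    FROM facts
    ORDER BY real_ratio DESC
    """
).fetchall()

for name, int_ratio, real_ratio in rows:
    print(name, int_ratio, round(real_ratio, 2))
# British Indian Ocean Territory 905 905.67
# Virgin Islands 4 4.52
```

On the full table, this change could reorder the rows below the top two. Many ratios that truncate to 0 would become distinguishable.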

## 9. Countries with a Higher Death Rate than Birth Rate

```sql
%%sql
SELECT name, death_rate, birth_rate
FROM facts
WHERE death_rate > birth_rate
ORDER BY death_rate DESC
LIMIT 10;
```

| name | death_rate | birth_rate |
|---|---|---|
| Ukraine | 14.46 | 10.72 |
| Bulgaria | 14.44 | 8.92 |
| Latvia | 14.31 | 10.0 |
| Lithuania | 14.27 | 10.1 |
| Russia | 13.69 | 11.6 |
| Serbia | 13.66 | 9.08 |
| Belarus | 13.36 | 10.7 |
| Hungary | 12.73 | 9.16 |
| Moldova | 12.59 | 12.0 |
| Estonia | 12.4 | 10.51 |


The table above lists the 10 countries with the highest death rates among those whose death rate exceeds their birth rate.
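The query above ranks countries by `death_rate` alone. To rank them by how far deaths outpace births instead, one option is to order by the difference `death_rate - birth_rate`. Both rates are per 1,000 people, so the difference is the natural decrease per 1,000 people, ignoring migration. This sketch uses three rows copied from the table above, in a toy in-memory table. The order changes: Bulgaria's gap is larger than Ukraine's.

```python
import sqlite3

# Toy table: three rows copied from the query output above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT NOT NULL, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Ukraine", 10.72, 14.46), ("Bulgaria", 8.92, 14.44), ("Moldova", 12.0, 12.59)],
)

by_gap = conn.execute(
    """
    SELECT name, death_rate - birth_rate AS natural_decrease  -- per 1,000 people
    FROM facts
    WHERE death_rate > birth_rate
    ORDER BY natural_decrease DESC
    """
).fetchall()

print([name for name, _ in by_gap])  # ['Bulgaria', 'Ukraine', 'Moldova']
```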

## 10. Countries with the Highest Population/Area Ratio

```sql
%%sql
SELECT name, population, area,
       -- Integer division truncates densities to whole people per square kilometer
       CAST(population/area AS INTEGER) AS population_density
FROM facts
ORDER BY population_density DESC
LIMIT 10;
```

| name | population | area | population_density |
|---|---|---|---|
| Macau | 592731 | 28 | 21168 |
| Monaco | 30535 | 2 | 15267 |
| Singapore | 5674472 | 697 | 8141 |
| Hong Kong | 7141106 | 1108 | 6445 |
| Gaza Strip | 1869055 | 360 | 5191 |
| Gibraltar | 29258 | 6 | 4876 |
| Bahrain | 1346613 | 760 | 1771 |
| Maldives | 393253 | 298 | 1319 |
| Malta | 413965 | 316 | 1310 |
| Bermuda | 70196 | 54 | 1299 |


The table above lists the 10 countries and territories with the highest population density, measured as people per square kilometer of total area (land and water).