# Guided Project: Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- `population`: the population of the country.
- `population_growth`: the annual population growth rate, as a percentage.
- `area`: the total land and water area of the country.



## 2. Introduction
We'll use the following code to connect our Jupyter Notebook to our database file. Add it as the first cell in your notebook:

```
%%capture
%load_ext sql
%sql sqlite:///factbook.db
```

`%%capture` hides the cell's output, `%load_ext sql` loads the SQL magic extension, and `%sql sqlite:///factbook.db` connects to the SQLite database file.

## 3. Overview of the Data

1. Write a query to return information on the tables in the database.
2. In a different code cell, write and run another query that returns the first five rows of the facts table in the database.

This query returns information on the tables in the database:

```sql
%%sql
SELECT *
  FROM sqlite_master
 WHERE type = 'table';
```

| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |
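Outside the `%%sql` magic, the same metadata query can be run from plain Python with the standard-library `sqlite3` module. The sketch below assumes an in-memory database in place of `factbook.db` so it runs anywhere; it recreates the `facts` schema shown above, and SQLite adds `sqlite_sequence` on its own because the `id` column uses `AUTOINCREMENT`.

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db so this sketch
# is self-contained; sqlite3.connect("factbook.db") would open the real file.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, '
    '"code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, '
    '"area" integer, "area_land" integer, "area_water" integer, '
    '"population" integer, "population_growth" float, "birth_rate" float, '
    '"death_rate" float, "migration_rate" float)'
)

# Same idea as the %%sql cell: sqlite_master holds one row per schema object.
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table';"
).fetchall()
table_names = {name for name, _ in rows}
for name, _ in rows:
    print(name)
```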


```sql
%%sql
SELECT *
  FROM facts
 LIMIT 5;
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


## 4. Summary Statistics
Let's start by calculating some summary statistics and looking for any outlier countries.

```sql
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts;
```

| min_pop | max_pop | min_pop_growth | max_pop_growth |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |


## 5. Exploring Outliers

We see a few interesting things in the summary statistics above:

- There's a country with a `population` of 0
- There's a country with a `population` of 7256490011 (or more than 7.2 billion people)

Let's use subqueries to zoom in on just these countries without using the specific values.

```sql
%%sql
SELECT *
  FROM facts
 WHERE population = (SELECT MIN(population)
                       FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 250 | ay | Antarctica |  | 280000 |  | 0 |  |  |  |  |


```sql
%%sql
SELECT *
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 261 | xx | World |  |  |  | 7256490011 | 1.08 | 18.6 | 7.8 |  |
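The scalar subquery is what lets both queries avoid hard-coding `0` or `7256490011`: the inner `SELECT` computes the extreme value and the outer query keeps every row equal to it. A minimal sketch with Python's `sqlite3`, assuming a toy `facts` table that keeps only three columns and three rows copied from the outputs above:

```python
import sqlite3

# Assumption: toy table with a subset of columns and rows copied from the
# outputs above, not the full factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Afghanistan", 652230, 32564342),
     ("Antarctica", None, 0),
     ("World", None, 7256490011)],
)

# The inner query finds the extreme value; the outer query returns every
# row that matches it (ties would all be returned).
smallest = conn.execute(
    "SELECT name FROM facts "
    "WHERE population = (SELECT MIN(population) FROM facts);"
).fetchall()
largest = conn.execute(
    "SELECT name FROM facts "
    "WHERE population = (SELECT MAX(population) FROM facts);"
).fetchall()
print(smallest, largest)  # [('Antarctica',)] [('World',)]
```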


## 6. Exploring Average Population and Area

It seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. It also seems like the table contains a row for Antarctica, which explains the population of 0. 

Now that we know this, we should recalculate the summary statistics we calculated earlier — this time excluding the row for the whole world.


Recompute the summary statistics you found earlier while excluding the row for the whole world. Include the following:

- Minimum population
- Maximum population
- Minimum population growth
- Maximum population growth

In a different code cell, calculate the average value for the following columns:

- `population`
- `area`

First, we revisit the summary statistics, this time excluding the `World` row:

```sql
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts
 WHERE name <> 'World';
```

| min_pop | max_pop | min_pop_growth | max_pop_growth |
|---|---|---|---|
| 0 | 1367485388 | 0.0 | 4.02 |


Next, we find the average population and area, again excluding the `World` row:


```sql
%%sql
SELECT AVG(population) AS avg_population,
       AVG(area) AS avg_area
  FROM facts
 WHERE name <> 'World';
```

| avg_population | avg_area |
|---|---|
| 32242666.56846473 | 555093.546184739 |
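`AVG()` skips NULL values, so `avg_area` is averaged only over rows that actually have an `area` (Antarctica's `area` is empty, for example), while `avg_population` still counts Antarctica's population of 0. A minimal sketch of that denominator difference, assuming a toy table with rows copied from the outputs above:

```python
import sqlite3

# Assumption: toy table with a subset of columns and rows copied from the
# outputs above; the numbers are not the real factbook.db averages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Albania", 28748, 3029278),
     ("Andorra", 468, 85580),
     ("Antarctica", None, 0),
     ("World", None, 7256490011)],
)

avg_area, rows_with_area, all_rows = conn.execute(
    "SELECT AVG(area), COUNT(area), COUNT(*) FROM facts WHERE name <> 'World';"
).fetchone()
# AVG(area) divides by COUNT(area), the number of non-NULL values,
# not by COUNT(*), the number of rows.
print(avg_area, rows_with_area, all_rows)  # 14608.0 2 3
```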


## 7. Finding Densely Populated Countries

To finish, we'll build on the query from the previous step to find countries that are densely populated. We'll identify countries that have the following:

- Above-average values for population.
- Below-average values for area.

```sql
%%sql
SELECT *
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name <> 'World')
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name <> 'World');
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 80 | iz | Iraq | 438317 | 437367 | 950 | 37056169 | 2.93 | 31.45 | 3.77 | 1.62 |
| 83 | it | Italy | 301340 | 294140 | 7200 | 61855120 | 0.27 | 8.74 | 10.19 | 4.1 |
| 85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
| 91 | ks | Korea, South | 99720 | 96920 | 2800 | 49115196 | 0.14 | 8.19 | 6.75 | 0.0 |
| 120 | mo | Morocco | 446550 | 446300 | 250 | 33322699 | 1.0 | 18.2 | 4.81 | 3.36 |
| 138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
| 139 | pl | Poland | 312685 | 304255 | 8430 | 38562189 | 0.09 | 9.74 | 10.19 | 0.46 |
| 163 | sp | Spain | 505370 | 498980 | 6390 | 48146134 | 0.89 | 9.64 | 9.04 | 8.31 |
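Both subqueries above repeat the same `WHERE name <> 'World'` filter. One way to state it only once, sketched here as an alternative rather than the project's own solution, is a common table expression (CTE) that computes both averages together and is then joined with `facts`. The toy table below assumes five rows copied from the outputs above, so its averages (and therefore its answer) differ from the real database.

```python
import sqlite3

# Assumption: toy table with rows copied from the outputs above; its averages
# are not the factbook.db averages, so the result differs from the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Bangladesh", 148460, 168957745),
     ("Algeria", 2381741, 39542166),
     ("Andorra", 468, 85580),
     ("Angola", 1246700, 19625353),
     ("World", None, 7256490011)],
)

query = """
WITH averages AS (
    -- Both averages are computed once, with the World row excluded.
    SELECT AVG(population) AS avg_population,
           AVG(area)       AS avg_area
      FROM facts
     WHERE name <> 'World'
)
SELECT f.name
  FROM facts AS f, averages AS a   -- averages has exactly one row
 WHERE f.population > a.avg_population
   AND f.area < a.avg_area;        -- NULL area (World) never satisfies this
"""
dense = conn.execute(query).fetchall()
print(dense)  # [('Bangladesh',)]
```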


## EXTRAS

Which country has the most people? Which country has the highest growth rate?


```sql
%%sql
SELECT name,
       MAX(population) AS max_pop
  FROM facts
 WHERE name <> 'World';
```

| name | max_pop |
|---|---|
| China | 1367485388 |


```sql
%%sql
SELECT name,
       MAX(population_growth) AS max_pop_growth
  FROM facts
 WHERE name <> 'World';
```

| name | max_pop_growth |
|---|---|
| South Sudan | 4.02 |
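Selecting `name` next to `MAX(...)` works in these two queries because SQLite, when a query has a single `MIN()` or `MAX()` aggregate, fills bare columns from the row holding the extreme value; most other databases reject such a query. A more portable sketch sorts and keeps the top row. It assumes a toy table built from values shown above, leaving China's growth rate and South Sudan's population NULL because the outputs don't show them.

```python
import sqlite3

# Assumption: toy table built from values shown in the outputs above;
# values not shown there are left NULL (None).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("China", 1367485388, None),
     ("South Sudan", None, 4.02),
     ("Afghanistan", 32564342, 2.32),
     ("World", 7256490011, 1.08)],
)

# SQLite-specific form used in the notebook: the bare `name` column comes
# from the row that holds the maximum.
bare = conn.execute(
    "SELECT name, MAX(population) FROM facts WHERE name <> 'World';"
).fetchone()

# Portable form: sort descending and keep the first row. In SQLite, NULLs
# sort last under DESC, so they never win.
most_people = conn.execute(
    "SELECT name, population FROM facts WHERE name <> 'World' "
    "ORDER BY population DESC LIMIT 1;"
).fetchone()
fastest_growth = conn.execute(
    "SELECT name, population_growth FROM facts WHERE name <> 'World' "
    "ORDER BY population_growth DESC LIMIT 1;"
).fetchone()
print(bare, most_people, fastest_growth)
```

Note that `LIMIT 1` returns a single row even if several countries tie for first place.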


Which countries have the highest ratios of water to land? Which countries have more water than land?

```sql
%%sql
SELECT *
  FROM facts
 WHERE area_water > area_land;
```

The column names must not be wrapped in single quotes here. Written as `'area_water' > 'area_land'`, the condition compares two string literals instead of two columns, and since `'area_water'` sorts after `'area_land'` it is true for every row, so that form of the query simply returns the first rows of the table (Afghanistan, Albania, Algeria, and so on) regardless of how much water each country has.
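The quoting pitfall can be reproduced on a toy table. This sketch assumes four rows whose land and water areas are copied from the outputs above, and runs both forms of the `WHERE` clause.

```python
import sqlite3

# Assumption: toy table with land/water areas copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Afghanistan", 652230, 0),
     ("Albania", 27398, 1350),
     ("British Indian Ocean Territory", 60, 54340),
     ("Virgin Islands", 346, 1564)],
)

# Single quotes make string literals: this compares the text 'area_water'
# with the text 'area_land' ('w' sorts after 'l'), so every row passes.
quoted = conn.execute(
    "SELECT name FROM facts WHERE 'area_water' > 'area_land';"
).fetchall()

# Bare identifiers compare the two column values row by row.
unquoted = conn.execute(
    "SELECT name FROM facts WHERE area_water > area_land;"
).fetchall()
print(len(quoted), unquoted)
```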


```sql
%%sql
SELECT *,
       CAST(area_water AS REAL) / area_land AS water_land_ratio
  FROM facts
 ORDER BY water_land_ratio DESC
 LIMIT 10;
```

Both `area_water` and `area_land` are integer columns, so a plain `area_water/area_land` performs integer division in SQLite. With that version, British Indian Ocean Territory (54340 water vs. 60 land) and Virgin Islands (1564 water vs. 346 land) come out on top, but every country with less water than land gets a ratio of 0, so the remaining rows simply follow table order (Afghanistan, Albania, Algeria, and so on). Casting one operand to `REAL` keeps the fractional part, which makes the ranking below those two countries meaningful.
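A small sketch of the integer-division effect, assuming a toy table whose land and water areas are copied from the outputs above. It computes the ratio both ways.

```python
import sqlite3

# Assumption: toy table with land/water areas copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Albania", 27398, 1350),
     ("Argentina", 2736690, 43710),
     ("British Indian Ocean Territory", 60, 54340),
     ("Virgin Islands", 346, 1564)],
)

# INTEGER / INTEGER truncates, so any country with less water than land gets 0.
int_ratios = dict(
    conn.execute("SELECT name, area_water / area_land FROM facts;").fetchall()
)

# Casting one operand to REAL keeps the fractional part of the ratio.
real_ratios = conn.execute(
    "SELECT name, CAST(area_water AS REAL) / area_land AS ratio "
    "FROM facts ORDER BY ratio DESC;"
).fetchall()
print(int_ratios)
print(real_ratios)
```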
