In this project, I'll be working with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/ "CIA World Factbook"), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:
- **population** - the population of each country.
- **population_growth** - the annual population growth rate, as a percentage.
- **area** - the total land and water area.

I'll use SQL in a Jupyter Notebook to analyze data from this database.

The code below connects the Jupyter Notebook to the database file:

In [9]:


%%capture
%load_ext sql
%sql sqlite:///factbook.db



'Connected: None@factbook.db'

To run queries in this project, I add %%sql on its own line at the start of each query cell.
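Outside a notebook, the same kind of query can be run with Python's built-in sqlite3 module. The following is a minimal sketch, not the setup used in this project: instead of opening factbook.db, it assumes a small in-memory table holding three rows copied from the query output in this project, with only a subset of the columns.

```python
import sqlite3

# Assumption: a tiny in-memory stand-in for factbook.db, holding three rows
# copied from the query output in this project (not the full dataset).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, code TEXT, name TEXT, "
    "area INTEGER, population INTEGER)"
)
rows = [
    (1, "af", "Afghanistan", 652230, 32564342),
    (2, "al", "Albania", 28748, 3029278),
    (3, "ag", "Algeria", 2381741, 39542166),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?)", rows)

# Rough equivalent of a %%sql cell: run the query and fetch the result rows.
first_two = conn.execute(
    "SELECT name, population FROM facts ORDER BY id LIMIT 2"
).fetchall()
print(first_two)
```

To query the real file, `sqlite3.connect("factbook.db")` would replace the in-memory connection, assuming the file sits in the working directory.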


## Overview of the Data 

I'll begin by getting a sense of what the data looks like. 

In [10]:
%%sql

SELECT *
  FROM sqlite_master
 WHERE type='table';

-- Query sqlite_master for information about the tables in the database

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [11]:
%%sql 

SELECT *
  FROM facts
 LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are descriptions of some of the columns:

* name - the name of the country.
* area - the country's total land and water area in square kilometers.
* area_land - the country's land area in square kilometers.
* area_water - the country's water area in square kilometers.
* population - the country's population.
* population_growth - the country's population growth as a percentage.
* birth_rate - the country's birth rate, or the number of births per year per 1000 people.
* death_rate - the country's death rate, or the number of deaths per year per 1000 people.

Let's start by calculating some summary statistics and looking for any outlier countries.


In [12]:
%%sql 

SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts;
    

Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02


A few things stick out from the summary statistics above:

* One row has a population of 0.
* One row has a population of approximately 7.3 billion.

These results are interesting, so I'll use subqueries to zoom in on the rows behind them.

### Exploring Outliers


In [14]:
%%sql 

SELECT *
  FROM facts
 WHERE population = (SELECT MIN(population)
                       FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


The result above shows that the entry with zero population is Antarctica, which explains the minimum population of 0 in the summary statistics.

In [15]:
%%sql 

SELECT *
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts);


Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


The result above shows that the entry with approximately 7.3 billion people is a row for the whole world, which explains the maximum population found earlier. Armed with this knowledge, I'll rerun the summary statistics, excluding the World row.

In [18]:
%%sql 

SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts
 WHERE name <> 'World';

Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,1367485388,0.0,4.02


With the World row excluded, the largest population is approximately 1.4 billion. The minimum is still 0 because the Antarctica row remains in the table.
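The query above filters out only the World row, so the Antarctica row still contributes the minimum of 0. A minimal sketch of excluding both rows with NOT IN follows. It assumes an in-memory table built from five rows copied from the outputs in this project, so the numbers it returns describe that sample only, not the full Factbook data.

```python
import sqlite3

# Assumption: an in-memory stand-in for factbook.db with five rows copied
# from the outputs in this project; results describe this sample only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Andorra", 85580, 0.12),
        ("Bangladesh", 168957745, 1.6),
        ("Antarctica", 0, None),      # no growth figure recorded
        ("World", 7256490011, 1.08),  # aggregate row, not a country
    ],
)

# Exclude both special rows by name before aggregating.
summary = conn.execute(
    """
    SELECT MIN(population), MAX(population),
           MIN(population_growth), MAX(population_growth)
      FROM facts
     WHERE name NOT IN ('World', 'Antarctica')
    """
).fetchone()
print(summary)
```

In the notebook, the same WHERE clause could be dropped into the %%sql cell in place of the single `name <> 'World'` condition.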

### Exploring Average Population and Area

Let's explore density. Density depends on a country's population and its area, so let's look at the average values for these two columns.

As before, I'll discard the row for the whole world.


In [19]:
%%sql 

SELECT AVG(population) AS avg_population,
       AVG(area) AS avg_area
  FROM facts
 WHERE name <> 'World';

Done.


avg_population,avg_area
32242666.56846473,555093.546184739


The average population is approximately 32 million, and the average area is approximately 555 thousand square kilometers.
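One detail worth keeping in mind when reading these averages is that SQL's AVG skips NULL values, and some rows (Antarctica, for example) have no value in the area column. The sketch below illustrates that behavior on an assumed three-row in-memory table whose values are copied from the outputs in this project; it is not a computation over the real database.

```python
import sqlite3

# Assumption: a three-row in-memory sample; Antarctica's area is NULL,
# matching the empty area field in the output shown in this project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Albania", 28748), ("Andorra", 468), ("Antarctica", None)],
)

# COUNT(*) counts every row, COUNT(area) counts only non-NULL areas,
# and AVG(area) divides by the non-NULL count.
total_rows, rows_with_area, avg_area = conn.execute(
    "SELECT COUNT(*), COUNT(area), AVG(area) FROM facts"
).fetchone()
print(total_rows, rows_with_area, avg_area)
```

Here the average is taken over the two rows that have an area, so the NULL row neither raises nor lowers it.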

### Finding Densely Populated Countries

To finish, I'll build on the query above to find countries that are densely populated. I'll identify countries that have: 
 * Above-average values for population.
 * Below-average values for area.

In [20]:
%%sql 

SELECT *
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name <> 'World')
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name <> 'World');

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31


Many of the countries listed above, such as Bangladesh, Japan, and South Korea, are generally known to be densely populated.
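The filter above is an indirect way of finding dense countries. As a complementary sketch, density can also be computed directly as population divided by area. The example assumes an in-memory table with four rows copied from the outputs in this project and uses the total area column (land plus water), so the ranking it produces applies to that sample only.

```python
import sqlite3

# Assumption: a four-row in-memory sample copied from this project's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Bangladesh", 148460, 168957745),
        ("Japan", 377915, 126919659),
        ("Korea, South", 99720, 49115196),
    ],
)

# CAST avoids integer division; area > 0 also drops rows with NULL area.
densest = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / area AS density
      FROM facts
     WHERE name <> 'World' AND area > 0
     ORDER BY density DESC
     LIMIT 3
    """
).fetchall()
names = [name for name, _ in densest]
print(names)
```

Sorting by the computed ratio ranks countries by people per square kilometer, which avoids depending on where the two averages happen to fall.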