# Guided Project - CIA World Factbook
In this project, we'll work with data from the **CIA World Factbook**, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- `population` - the population of each country.
- `population_growth` - the annual population growth rate, as a percentage.
- `area` - the total land and water area.

## Installing ipython-sql

In [1]:
!conda install -yc conda-forge ipython-sql

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\anaconda3

  added / updated specs:
    - ipython-sql


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    conda-4.10.1               |   py38haa244fe_0         3.1 MB  conda-forge
    ipython-sql-0.3.9          |py38h32f6830_1002          28 KB  conda-forge
    prettytable-2.1.0          |     pyhd8ed1ab_0          23 KB  conda-forge
    python_abi-3.8             |           1_cp38           4 KB  conda-forge
    sqlparse-0.4.1             |     pyh9f0ad1d_0          34 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be INSTALLED:

  ipython-sql        conda-forge/win-64::ipython-sql-0.3.9-py38h32f6830_

### Connecting to factbook database

In [3]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

## Overview of data
To start, we will take a look at the tables available in the database:

In [6]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
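
The `sql` column above is hard to read as a single line. As a minimal sketch, the same schema can be inspected column by column with `PRAGMA table_info`. The example uses Python's built-in `sqlite3` module and an in-memory database that recreates the `facts` table from the `CREATE TABLE` statement shown above; the in-memory connection and the variable names `tables` and `columns` are assumptions for illustration, not part of the original notebook.

```python
import sqlite3

# Illustrative in-memory database (an assumption; the notebook uses factbook.db).
conn = sqlite3.connect(":memory:")

# Schema copied from the sqlite_master output above.
conn.execute("""
    CREATE TABLE "facts" (
        "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        "code" varchar(255) NOT NULL,
        "name" varchar(255) NOT NULL,
        "area" integer, "area_land" integer, "area_water" integer,
        "population" integer, "population_growth" float,
        "birth_rate" float, "death_rate" float, "migration_rate" float
    )
""")

# List the tables, as the %%sql cell above does.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(facts)")]

print(tables)
for name, declared_type in columns:
    print(name, declared_type)
```

The `sqlite_sequence` table appears because SQLite creates it automatically for tables that use `AUTOINCREMENT`.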


Now we will take a look at the `facts` table:

In [5]:
%%sql
SELECT *
  FROM facts
 LIMIT 10;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46
6,ac,Antigua and Barbuda,442,442,0,92436,1.24,15.85,5.69,2.21
7,ar,Argentina,2780400,2736690,43710,43431886,0.93,16.64,7.33,0.0
8,am,Armenia,29743,28203,1540,3056382,0.15,13.61,9.34,5.8
9,as,Australia,7741220,7682300,58920,22751014,1.07,12.15,7.14,5.65
10,au,Austria,83871,82445,1426,8665550,0.55,9.41,9.42,5.56


The columns are described below:

- `name` - The name of the country.
- `area` - The country's total area (both land and water).
- `area_land` - The country's land area in square kilometers.
- `area_water` - The country's water area in square kilometers.
- `population` - The country's population.
- `population_growth` - The country's population growth as a percentage.
- `birth_rate` - The country's birth rate, or the number of births per year per 1,000 people.
- `death_rate` - The country's death rate, or the number of deaths per year per 1,000 people.

Some values also look questionable. For instance, `area_water` is 0 for Afghanistan and Algeria, and it is highly improbable that either country has no water area at all, so these zeros may stand in for missing data.
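
One way to probe the area columns is to list the rows where `area_water` is 0 or NULL, and to check whether `area_land` and `area_water` add up to `area`. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory table holding four rows copied from the output above, and the trimmed table layout and the names `zero_water` and `inconsistent` are assumptions rather than part of the original notebook.

```python
import sqlite3

# Illustrative in-memory table with a subset of the facts columns (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Afghanistan", 652230, 652230, 0),
        ("Albania", 28748, 27398, 1350),
        ("Algeria", 2381741, 2381741, 0),
        ("Argentina", 2780400, 2736690, 43710),
    ],
)

# Countries whose recorded water area is zero or missing.
zero_water = [row[0] for row in conn.execute(
    "SELECT name FROM facts "
    "WHERE area_water = 0 OR area_water IS NULL ORDER BY name")]

# Countries where land + water does not add up to the total area.
inconsistent = [row[0] for row in conn.execute(
    "SELECT name FROM facts WHERE area_land + area_water <> area ORDER BY name")]

print("Zero or missing water area:", zero_water)
print("Inconsistent totals:", inconsistent)
```

On these sample rows the totals are internally consistent, so a zero in `area_water` cannot be detected from arithmetic alone; it would need to be checked against an outside source.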

For now we will start by calculating some summary statistics and look for any outlier countries.

#### Summary Statistics (Minimum and Maximum values)

In [7]:
%%sql
SELECT MIN(population) AS "Minimum Population", MAX(population) AS "Max Population", MIN(population_growth) AS "Minimum Population Growth", MAX(population_growth) AS "Maximum Population Growth"
    FROM facts;

 * sqlite:///factbook.db
Done.


Minimum Population,Max Population,Minimum Population Growth,Maximum Population Growth
0,7256490011,0.0,4.02


The summary statistics above show a few interesting things:

- There's a country with a population of `0`.
- There's a country with a population of `7256490011` (more than 7.2 billion people).

Let's find out about the countries with these values:

#### Finding Outliers

In [13]:
%%sql
SELECT *
    FROM facts
    WHERE population = (SELECT MIN(population) FROM facts)
       OR population = (SELECT MAX(population) FROM facts);

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000.0,,0,,,,
261,xx,World,,,,7256490011,1.08,18.6,7.8,


It seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. It also seems like the table contains a row for Antarctica, which explains the population of 0.

Next, we will recompute the summary statistics with the `World` row excluded, and then calculate the averages for `population` and `area`:

In [9]:
%%sql
SELECT MIN(population) AS "Minimum Population", MAX(population) AS "Max Population", MIN(population_growth) AS "Minimum Population Growth", MAX(population_growth) AS "Maximum Population Growth"
    FROM facts
    -- <> is the ANSI-standard spelling of "not equal"; SQLite accepts both <> and !=
    WHERE name != 'World';

 * sqlite:///factbook.db
Done.


Minimum Population,Max Population,Minimum Population Growth,Maximum Population Growth
0,1367485388,0.0,4.02


#### Average Summary Statistics

In [11]:
%%sql
SELECT ROUND(AVG(population),2) AS "Average Population", ROUND(AVG(area),2) AS "Average Area"
    FROM facts;

 * sqlite:///factbook.db
Done.


Average Population,Average Area
62094928.32,555093.55
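
An aggregate row such as `World` can dominate an average, and SQLite's `AVG` skips NULL values rather than counting them as zero. The following minimal sketch shows both effects using Python's built-in `sqlite3` module. It runs against an in-memory table with four rows whose values are copied from the outputs above; the trimmed table layout and the variable names are assumptions for illustration only.

```python
import sqlite3

# Illustrative in-memory table with a subset of the facts columns (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 28748, 3029278),
        ("Andorra", 468, 85580),
        ("Antarctica", None, 0),        # area is NULL in the source table
        ("World", None, 7256490011),    # aggregate row, area is NULL
    ],
)

# Averages over every row, including World.
avg_pop_all, avg_area = conn.execute(
    "SELECT AVG(population), AVG(area) FROM facts").fetchone()

# Average population with the World row filtered out.
(avg_pop_no_world,) = conn.execute(
    "SELECT AVG(population) FROM facts WHERE name != 'World'").fetchone()

print(f"Average population, all rows:      {avg_pop_all:,.2f}")
print(f"Average population, without World: {avg_pop_no_world:,.2f}")
print(f"Average area (NULLs skipped):      {avg_area:,.2f}")
```

In this sample the average area is taken over the two non-NULL values only, and removing the `World` row lowers the average population by several orders of magnitude.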


To finish, we'll build on the previous query to find countries that are densely populated. The averages above were computed over every row, including `World`, so the subqueries below exclude it. We'll identify countries that have the following:

- Above-average values for `population`.
- Below-average values for `area`.

We will also sort the results by population in descending order.

#### Finding densely populated countries

In [17]:
%%sql
SELECT *
    FROM facts
    WHERE population > (SELECT ROUND(AVG(population),2)
                        FROM facts
                       WHERE name != 'World')
    AND area < (SELECT ROUND(AVG(area),2)
                FROM facts
               WHERE name != 'World')
    ORDER BY population DESC;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
192,vm,Vietnam,331210,310070,21140,94348835,0.97,15.96,5.93,0.3
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
173,th,Thailand,513120,510890,2230,67976405,0.34,11.19,7.8,0.0
185,uk,United Kingdom,243610,241930,1680,64088222,0.54,12.17,9.35,2.54
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31
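
The filter above treats "above-average population and below-average area" as a proxy for density. A more direct measure is population divided by area. The sketch below is an alternative to the query above, not the approach this project uses. It relies on Python's built-in `sqlite3` module and an in-memory table with four rows copied from the result above, and it assumes that `area` is measured in square kilometers like `area_land`. The cast to `REAL` is there because SQLite performs integer division when both operands are integers.

```python
import sqlite3

# Illustrative in-memory table with a subset of the facts columns (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("Japan", 377915, 126919659),
        ("Korea, South", 99720, 49115196),
        ("Spain", 505370, 48146134),
    ],
)

# CAST avoids integer division; area > 0 also skips NULL areas.
query = """
SELECT name, CAST(population AS REAL) / area AS density
  FROM facts
 WHERE area > 0
 ORDER BY density DESC
"""
density_rows = conn.execute(query).fetchall()

for name, density in density_rows:
    print(f"{name}: {density:.1f} people per unit of area")
```

Ranked this way, the order differs from the population-sorted result: South Korea moves ahead of Japan because its area is much smaller.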
