# Dataquest - SQL Fundamentals: Analysing CIA Factbook Data Using SQL

## 1) Introduction

Provided by: [Dataquest.io](https://www.dataquest.io/)

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- **population**: the population of each country.
- **population_growth**: the annual population growth rate, as a percentage.
- **area**: the total land and water area.

We'll use the following code to connect our Jupyter Notebook to our database file:

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

In [2]:
%%sql
/*
Write a query to return information on the tables in the database.
*/

SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
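
The `sql` column above is hard to read as one long string. As a rough sketch of another way to view the same schema, `PRAGMA table_info` lists one row per column. The example below does not open `factbook.db`; it assumes an empty in-memory table rebuilt from the `CREATE TABLE` statement shown above, and uses Python's built-in `sqlite3` module instead of the `%%sql` magic.

```python
import sqlite3

# In-memory stand-in for factbook.db: the schema is copied from the
# sqlite_master output above, and no data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        code varchar(255) NOT NULL,
        name varchar(255) NOT NULL,
        area integer, area_land integer, area_water integer,
        population integer, population_growth float,
        birth_rate float, death_rate float, migration_rate float
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(facts)").fetchall()
column_names = [col[1] for col in columns]

for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<18} {col_type:<13} notnull={notnull} pk={pk}")
```

In the notebook itself, the same statement should also work directly in a `%%sql` cell as `PRAGMA table_info(facts);`.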


In [3]:
%%sql
/* 
Return the first five rows of the facts table.
*/

SELECT
    *
FROM
    facts
LIMIT
    5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## 2) Summary Statistics

Here are the descriptions for some of the columns:
- **name**: the name of the country.
- **area**: the country's total area (both land and water) in square kilometers.
- **area_land**: the country's land area in square kilometers.
- **area_water**: the country's water area in square kilometers.
- **population**: the country's population.
- **population_growth**: the country's population growth as a percentage.
- **birth_rate**: the country's birth rate, or the number of births per year per 1,000 people.
- **death_rate**: the country's death rate, or the number of deaths per year per 1,000 people.

Let's start by calculating some summary statistics and looking for any outlier countries.

In [39]:
%%sql
/*
Write a single query that returns the following:  
- Minimum population
- Maximum population
- Minimum population growth
- Maximum population growth
*/

SELECT
    MIN(population) AS min_population,
    MAX(population) AS max_population,
    MIN(population_growth) AS min_population_growth,
    MAX(population_growth) AS max_population_growth
FROM
    facts;

 * sqlite:///factbook.db
Done.


min_population,max_population,min_population_growth,max_population_growth
0,7256490011,0.0,4.02
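
One thing to keep in mind when reading these numbers: SQLite's aggregate functions skip `NULL` values, so any row with a missing population would not affect `MIN` or `MAX`. The output above does not show whether such rows exist in `facts`. The sketch below shows how we could check, using a small in-memory table. Three rows are taken from the first-five-rows output earlier; the fourth row, `Hypothetical Territory`, is made up to stand in for a row with missing data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Andorra", 85580, 0.12),
        ("Hypothetical Territory", None, None),  # invented row with NULLs
    ],
)

# COUNT(*) counts every row; COUNT(population) counts only non-NULL values.
total_rows, rows_with_population, min_pop, max_pop = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(population),
           MIN(population),
           MAX(population)
      FROM facts
    """
).fetchone()

print(total_rows, rows_with_population, min_pop, max_pop)
```

A gap between `COUNT(*)` and `COUNT(population)` would tell us how many rows the summary statistics silently leave out.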


## 3) Exploring Outliers

We see a few interesting things in the summary statistics above:

- There's a country with a population of 0.
- There's a country with a population of 7,256,490,011 (more than 7.2 billion people).

Let's use subqueries to zoom in on just these countries without hard-coding the specific values.

In [35]:
%%sql
/*
Write a query that returns the countries with the minimum population.
Write a query that returns the countries with the maximum population.
*/


SELECT
    facts.name,
    facts.population
FROM
    facts
WHERE
    facts.population =  (
                        SELECT
                            MIN(population) AS min_population
                        FROM
                            facts
                        )
    OR
    facts.population =  (
                        SELECT
                            MAX(population) AS max_population
                        FROM
                            facts
                        );

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0
World,7256490011
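
The two scalar subqueries joined by `OR` can also be folded into a single `IN` subquery. This is only one possible rewrite, not a better or more official form. The sketch runs against an in-memory table holding four rows whose populations appear in the outputs above, not against the full `facts` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Albania", 3029278),
        ("Andorra", 85580),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# The UNION yields the minimum and the maximum as a two-row result,
# and IN keeps every row whose population matches either value.
extremes = conn.execute(
    """
    SELECT name, population
      FROM facts
     WHERE population IN (
               SELECT MIN(population) FROM facts
               UNION
               SELECT MAX(population) FROM facts
           )
     ORDER BY population
    """
).fetchall()

print(extremes)
```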


## 4) Exploring Average Population And Area

- It seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. 
- It also seems like the table contains a row for Antarctica, which explains the population of 0.

Now that we know this, we should recalculate the earlier summary statistics, this time excluding the row for the whole world.

In [69]:
%%sql
/*
Recompute the summary statistics you found earlier while excluding the row for the whole world. Include the following:
- Minimum population
- Maximum population
- Minimum population growth
- Maximum population growth  
*/

WITH sub AS (
            SELECT
                *
            FROM
                facts
            WHERE
                population <    (
                                SELECT
                                    MAX(population)
                                FROM
                                    facts
                                )
            )

SELECT
    MIN(population) AS "Minimum population",
    MAX(population) AS "Maximum population",
    MIN(population_growth) AS "Minimum population growth",
    MAX(population_growth) AS "Maximum population growth"
FROM
    sub;

 * sqlite:///factbook.db
Done.


Minimum population,Maximum population,Minimum population growth,Maximum population growth
0,1367485388,0.0,4.02


In [75]:
%%sql
/*
Calculate the average value for the following columns:
- population
- area
*/

SELECT
    AVG(population),
    AVG(area)
FROM
    facts;

 * sqlite:///factbook.db
Done.


AVG(population),AVG(area)
62094928.32231405,555093.546184739
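
Note that the averages above are taken over every row in `facts`, so the `World` row (population 7,256,490,011) is counted as if it were a country and pulls the population mean upward. Below is a minimal sketch of the same calculation with that row filtered out by name. It uses a three-row in-memory table: the Albania and Andorra figures come from the earlier output, while leaving the `World` area as `NULL` is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Albania", 3029278, 28748),
        ("Andorra", 85580, 468),
        ("World", 7256490011, None),  # NULL area is an assumption
    ],
)

# Average over all rows, World included.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]

# Average over country rows only.
avg_countries = conn.execute(
    """
    SELECT AVG(population), AVG(area)
      FROM facts
     WHERE name <> 'World'
    """
).fetchone()

print(avg_all)
print(avg_countries)
```

On the real table, adding `WHERE name <> 'World'` to the query above would give an average population that reflects individual countries only.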


## 5) Finding Densely Populated Countries

To finish, we'll build on the previous query to find countries that are densely populated. We'll identify countries that meet **both** of the following criteria:

- Above-average values for population
- Below-average values for area

In [82]:
%%sql
/*
Write a query that finds all countries meeting both of the following criteria:
- The population is above average.
- The area is below average.
*/

SELECT
    name
FROM
    facts
WHERE
    population >    
                (
                SELECT
                    AVG(population)
                FROM
                    facts
                )
                AND
    area <  
            (
            SELECT
                AVG(area)
            FROM
                facts
            );

 * sqlite:///factbook.db
Done.


name
Bangladesh
Germany
Japan
Philippines
Thailand
United Kingdom
Vietnam
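
The query above computes each average in its own subquery. As a sketch of an alternative layout, a common table expression can compute both averages once and expose them under readable names. The CTE name `averages` and its column aliases are chosen here for illustration. The data is just the five sample rows from the first-five-rows output, so the result differs from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
    ],
)

# The CTE returns a single row, so the CROSS JOIN attaches both
# averages to every row of facts without multiplying rows.
dense = conn.execute(
    """
    WITH averages AS (
        SELECT AVG(population) AS avg_population,
               AVG(area)       AS avg_area
          FROM facts
    )
    SELECT facts.name
      FROM facts
     CROSS JOIN averages
     WHERE facts.population > averages.avg_population
       AND facts.area < averages.avg_area
     ORDER BY facts.name
    """
).fetchall()

dense_names = [row[0] for row in dense]
print(dense_names)
```

Within this five-row sample, only Afghanistan has both an above-average population and a below-average area.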


## 6) Conclusion

#### Potential areas for further analysis
- Which country has the most people? Which country has the highest growth rate?
- Which countries have the highest ratios of water to land? Which countries have more water than land?
- Which countries will add the most people to their populations next year?
- Which countries have a higher death rate than birth rate?
- Which countries have the highest **population/area** ratio, and how does that list compare to the one we found in the previous section?
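
As a starting point for the last question, here is a rough sketch of a population density ranking. It again uses only the five sample rows shown earlier, so the ranking says nothing about the full table. One detail worth knowing: `population` and `area` are both integer columns, and SQLite performs integer division on two integers, so the sketch casts one side to `REAL`. The `area > 0` filter is an added safeguard against division by zero and missing areas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
    ],
)

# Without a cast, INTEGER / INTEGER drops the fractional part.
integer_ratio = conn.execute(
    "SELECT population / area FROM facts WHERE name = 'Andorra'"
).fetchone()[0]

# People per square kilometer, highest first.
density = conn.execute(
    """
    SELECT name,
           ROUND(CAST(population AS REAL) / area, 2) AS density
      FROM facts
     WHERE area > 0
     ORDER BY density DESC
    """
).fetchall()

print(integer_ratio)
for name, people_per_km2 in density:
    print(name, people_per_km2)
```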