# Calculating Summary Statistics in SQL

## Objective

Calculate summary statistics in SQL to better understand population data, then project the following year's population for each country using SQL arithmetic.

## Database

The database comes [from GitHub](https://github.com/factbook/factbook.sql). It contains a compendium of facts about countries. The Factbook contains demographic information for each country in the world, including:

- name: The name of the country
- area: The total land and sea area of the country
- population: The country's population
- birth_rate: The country's birth rate
- created_at: The date the record was created
- updated_at: The date the record was updated
    
Here are the first few rows of the facts table:

| id | code | name        | area    | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate | created_at                 | updated_at                 |
|----|------|-------------|---------|-----------|------------|------------|-------------------|------------|------------|----------------|----------------------------|----------------------------|
| 1  | af   | Afghanistan | 652230  | 652230    | 0          | 32564342   | 2.32              | 38.57      | 13.89      | 1.51           | 2015-11-01 13:19:49.461734 | 2015-11-01 13:19:49.461734 |
| 2  | al   | Albania     | 28748   | 27398     | 1350       | 3029278    | 0.3               | 12.92      | 6.58       | 3.3            | 2015-11-01 13:19:54.431082 | 2015-11-01 13:19:54.431082 |
| 3  | ag   | Algeria     | 2381741 | 2381741   | 0          | 39542166   | 1.84              | 23.67      | 4.31       | 0.92           | 2015-11-01 13:19:59.961286 | 2015-11-01 13:19:59.961286 |


## Exploring the Database

Calculate the means of the population, population_growth, birth_rate, and death_rate columns.

In [1]:
import sqlite3
conn = sqlite3.connect("factbook.db")

averages = "SELECT avg(population), avg(population_growth), avg(birth_rate), avg(death_rate) FROM facts;"
avg_results = conn.execute(averages).fetchall()

pop_avg = avg_results[0][0]
pop_growth_avg = avg_results[0][1]
birth_rate_avg = avg_results[0][2]
death_rate_avg = avg_results[0][3]

print(avg_results)
print(pop_avg)
print(pop_growth_avg)
print(birth_rate_avg)
print(death_rate_avg)

[(62094928.32231405, 1.2009745762711865, 19.32855263157894, 7.8212719298245625)]
62094928.32231405
1.2009745762711865
19.32855263157894
7.8212719298245625
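
One detail worth keeping in mind when reading these averages: SQLite's avg() skips NULL values, so each column's mean is taken over only the rows where that column is filled in. The sketch below illustrates this on a small in-memory table; the table name avg_demo and its three rows are made up for illustration and are not part of factbook.db.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate NULL handling.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE avg_demo (name TEXT, birth_rate REAL)")
conn.executemany(
    "INSERT INTO avg_demo VALUES (?, ?)",
    [("a", 10.0), ("b", 20.0), ("c", None)],  # row "c" has no birth_rate
)

# avg() and count(column) ignore NULLs; count(*) counts every row.
avg_value, non_null, total = conn.execute(
    "SELECT avg(birth_rate), count(birth_rate), count(*) FROM avg_demo;"
).fetchone()

print(avg_value, non_null, total)  # 15.0 2 3
conn.close()
```

Running count(column) alongside avg(column) on the facts table would show how many rows each mean is actually based on.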


## Find Ranges

Calculate the minimum and maximum of each column to find the lower and upper bounds of the data and to look for outliers.

In [2]:
conn = sqlite3.connect("factbook.db")

minimum = "SELECT min(population), min(population_growth), min(birth_rate), min(death_rate) FROM facts;"
maximum = "SELECT max(population), max(population_growth), max(birth_rate), max(death_rate) FROM facts;"
min_tuple = conn.execute(minimum).fetchall()
max_tuple = conn.execute(maximum).fetchall()

pop_min = min_tuple[0][0]
pop_max = max_tuple[0][0]
pop_growth_min = min_tuple[0][1]
pop_growth_max = max_tuple[0][1]
birth_rate_min = min_tuple[0][2]
birth_rate_max = max_tuple[0][2]
death_rate_min = min_tuple[0][3]
death_rate_max = max_tuple[0][3]

print(pop_min)
print(pop_max)
print(pop_growth_min)
print(pop_growth_max)
print(birth_rate_min)
print(birth_rate_max)
print(death_rate_min)
print(death_rate_max)

0
7256490011
0.0
4.02
6.65
45.45
1.53
14.89


#### Observations:
Notice the outliers. The maximum population is 7,256,490,011, while the minimum is 0. China, the most populous country in the world, has fewer than 2 billion people, so a maximum of over 7 billion cannot belong to a single country. The minimum is also problematic, because no country has 0 people.

These quirks exist because the database contains rows for entities that aren't countries. There's a row representing the entire world, for example (hence the 7 billion population), and some rows representing oceanic areas (hence the population of 0).
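
One way to confirm which rows sit behind the extreme values is to select the names of the rows that match the minimum and maximum. The following is a minimal sketch on an in-memory table: the table name facts_demo and the row names "World" and "Example Ocean" are assumptions for illustration, while the Afghanistan and 7,256,490,011 figures are taken from the output above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the facts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts_demo (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts_demo VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("World", 7256490011),   # assumed name for the whole-world row
        ("Example Ocean", 0),    # assumed name for a non-country row
    ],
)

# Return the rows whose population equals the column's min or max.
extremes_query = '''
SELECT name, population FROM facts_demo
WHERE population = (SELECT max(population) FROM facts_demo)
   OR population = (SELECT min(population) FROM facts_demo)
ORDER BY population;
'''
extreme_rows = conn.execute(extremes_query).fetchall()
print(extreme_rows)  # [('Example Ocean', 0), ('World', 7256490011)]
conn.close()
```

The same query shape, pointed at the real facts table, would list the actual entries responsible for the 0 and 7 billion values.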

## Filter Values

Write a query that returns the minimum and maximum values of the same columns for rows where population is greater than 0 and less than 2 billion:

In [3]:
conn = sqlite3.connect("factbook.db")

min_and_max = '''
SELECT min(population), max(population), min(population_growth), max(population_growth),
min(birth_rate), max(birth_rate), min(death_rate), max(death_rate)
FROM facts WHERE population > 0 and population < 2000000000;
'''
results = conn.execute(min_and_max).fetchall()
print(results)

# population column
pop_min = results[0][0]
pop_max = results[0][1]
# population_growth column
pop_growth_min = results[0][2]
pop_growth_max = results[0][3]
# birth_rate column
birth_rate_min = results[0][4]
birth_rate_max = results[0][5]
# death_rate column
death_rate_min = results[0][6]
death_rate_max = results[0][7]

print(pop_min)
print(pop_max)
print(pop_growth_min)
print(pop_growth_max)
print(birth_rate_min)
print(birth_rate_max)
print(death_rate_min)
print(death_rate_max)

[(48, 1367485388, 0.0, 4.02, 6.65, 45.45, 1.53, 14.89)]
48
1367485388
0.0
4.02
6.65
45.45
1.53
14.89


## Predict Future Population Growth

Project next year's population for each country as population + population * (population_growth / 100), treating population_growth as an annual percentage. Rows with a missing or non-positive population, a missing growth rate, or the whole-world population are excluded.

In [4]:
conn = sqlite3.connect("factbook.db")

projected_population_query = '''
SELECT round(population + population * (population_growth/100), 0) FROM facts
WHERE population > 0 AND population < 7000000000 
AND population is not null and population_growth is not null;
'''

projected_population = conn.execute(projected_population_query).fetchall()
print(projected_population[0:10])

[(33319835.0,), (3038366.0,), (40269742.0,), (85683.0,), (20170938.0,), (93582.0,), (43835803.0,), (3060967.0,), (22994450.0,), (8713211.0,)]
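
As a sanity check, the same arithmetic can be reproduced in plain Python for the three sample rows shown in the Database section. This is only a sketch of the formula used in the query; the variable names sample_rows and projected are introduced here for illustration.

```python
# (name, population, population_growth) copied from the sample rows above.
sample_rows = [
    ("Afghanistan", 32564342, 2.32),
    ("Albania", 3029278, 0.3),
    ("Algeria", 39542166, 1.84),
]

# Same formula as the SQL query: population + population * (growth / 100).
projected = [
    round(population + population * (growth / 100))
    for _, population, growth in sample_rows
]

print(projected)  # [33319835, 3038366, 40269742]
```

These values agree with the first three entries returned by the query.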


## Explore Projected Population

To understand how global population would shift under the projections, calculate the minimum, maximum, and average values.

In [5]:
conn = sqlite3.connect("factbook.db")
proj_pop_query = '''
select round(min(population + population * (population_growth/100)), 0), 
round(max(population + population * (population_growth/100)), 0), 
round(avg(population + population * (population_growth/100)), 0)
from facts 
where population > 0 and population < 7000000000 and 
population is not null and population_growth is not null;
'''

proj_results = conn.execute(proj_pop_query).fetchall()

pop_proj_min = proj_results[0][0]
pop_proj_max = proj_results[0][1]
pop_proj_avg = proj_results[0][2]

print("Projected Population,", "Minimum: ", pop_proj_min)
print("Projected Population,", "Maximum: ", pop_proj_max)
print("Projected Population,", "Average: ", pop_proj_avg)

Projected Population, Minimum:  48.0
Projected Population, Maximum:  1373639072.0
Projected Population, Average:  33405469.0
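
The positional indexing used throughout (proj_results[0][0], proj_results[0][1], and so on) can be made easier to read by giving each expression an alias and fetching rows as sqlite3.Row objects. The sketch below shows one possible way to do this on an in-memory copy of the three sample rows from the Database section; the table name facts_demo and the aliases proj_min, proj_max, and proj_avg are assumptions, and the numbers differ from the full-table results above because only three rows are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be indexed by column name

# Hypothetical stand-in table holding only the three sample rows.
conn.execute(
    "CREATE TABLE facts_demo (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts_demo VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Algeria", 39542166, 1.84),
    ],
)

proj_query = '''
SELECT round(min(population + population * (population_growth / 100)), 0) AS proj_min,
       round(max(population + population * (population_growth / 100)), 0) AS proj_max,
       round(avg(population + population * (population_growth / 100)), 0) AS proj_avg
FROM facts_demo
WHERE population > 0 AND population_growth IS NOT NULL;
'''
row = conn.execute(proj_query).fetchone()

print("Projected Population, Minimum:", row["proj_min"])  # 3038366.0
print("Projected Population, Maximum:", row["proj_max"])  # 40269742.0
print("Projected Population, Average:", row["proj_avg"])  # 25542647.0
conn.close()
```

Named access removes the need to keep track of which tuple position corresponds to which statistic.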
