## 1: Introduction
In this challenge, you'll practice calculating summary statistics in SQL while exploring data from factbook.db. Recall that factbook.db contains information about all of the countries in the world. You'll work with the facts table, where each row represents a single country. Here are the descriptions for some of the columns:

- name - The name of the country
- area - The total land and sea area of the country
- population - The country's population
- birth_rate - The country's birth rate
- created_at - The date the record was created
- updated_at - The date the record was updated

Here are the first few rows of facts:

| id | code | name        | area    | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate | created_at                 | updated_at                 |
|----|------|-------------|---------|-----------|------------|------------|-------------------|------------|------------|----------------|----------------------------|----------------------------|
| 1  | af   | Afghanistan | 652230  | 652230    | 0          | 32564342   | 2.32              | 38.57      | 13.89      | 1.51           | 2015-11-01 13:19:49.461734 | 2015-11-01 13:19:49.461734 |
| 2  | al   | Albania     | 28748   | 27398     | 1350       | 3029278    | 0.3               | 12.92      | 6.58       | 3.3            | 2015-11-01 13:19:54.431082 | 2015-11-01 13:19:54.431082 |
| 3  | ag   | Algeria     | 2381741 | 2381741   | 0          | 39542166   | 1.84              | 23.67      | 4.31       | 0.92           | 2015-11-01 13:19:59.961286 | 2015-11-01 13:19:59.961286 |


In this challenge, you'll use the population values for each country to predict the populations for the following year. First, you'll need to explore the data and look for any quality issues.

#### Instructions:
- In SQL, calculate the means of the population, population_growth, birth_rate, and death_rate columns.
- Assign the mean of the population column to pop_avg.
- Assign the mean of the population_growth column to pop_growth_avg.
- Assign the mean of the birth_rate column to birth_rate_avg.
- Assign the mean of the death_rate column to death_rate_avg.

In [1]:
import sqlite3
conn = sqlite3.connect("data/factbook.db")

In [9]:
query = '''select avg(population), avg(population_growth), avg(birth_rate), avg(death_rate) 
from facts'''

results = conn.execute(query).fetchall()
print(results)

pop_avg = results[0][0]
pop_growth_avg = results[0][1]
birth_rate_avg = results[0][2]
death_rate_avg = results[0][3]

[(62094928.32231405, 1.2009745762711865, 19.32855263157894, 7.8212719298245625)]


## 2: Find Ranges
While the averages give you some sense of the values in these columns, you should also calculate the ranges so you know what their lower and upper bounds are. This will also allow you to look for outliers.

#### Instructions:
Calculate the minimum and maximum values for the columns from the previous screen:
- Assign the minimum of the population column to pop_min.
- Assign the maximum of the population column to pop_max.
- Assign the minimum of the population_growth column to pop_growth_min.
- Assign the maximum of the population_growth column to pop_growth_max.
- Assign the minimum of the birth_rate column to birth_rate_min.
- Assign the maximum of the birth_rate column to birth_rate_max.
- Assign the minimum of the death_rate column to death_rate_min.
- Assign the maximum of the death_rate column to death_rate_max.

You can observe these values using print statements, or the variables display below the output box.

In [10]:
query = '''select min(population), max(population), min(population_growth), 
max(population_growth), min(birth_rate), max(birth_rate), min(death_rate), 
max(death_rate) from facts;'''

results = conn.execute(query).fetchall()
print(results)

pop_min = results[0][0]
pop_max = results[0][1]
pop_growth_min = results[0][2]
pop_growth_max = results[0][3]
birth_rate_min = results[0][4]
birth_rate_max = results[0][5]
death_rate_min = results[0][6]
death_rate_max = results[0][7]

[(0, 7256490011, 0.0, 4.02, 6.65, 45.45, 1.53, 14.89)]


## 3: Filter Values

If you looked at the values on the previous screen, you may have noticed some outliers. The maximum for population is 7,256,490,011, while the minimum is 0. China, the most populous country in the world, has fewer than 2 billion people, so a maximum above 7 billion can't belong to a single country. The minimum is also problematic, because no country has 0 people.

These quirks exist because the database contains rows for entities that aren't countries. There's a row representing the entire world, for example (hence the 7 billion population), and some rows representing oceanic areas (hence the population of 0).
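
As a rough illustration of how such rows distort the range, here is a minimal sketch that uses a small in-memory table instead of factbook.db. The three country rows are copied from the preview of facts above. The names `World` and `Pacific Ocean` are assumptions made for this example; only the population values 7,256,490,011 and 0 come from the results on the previous screen.

```python
import sqlite3

# In-memory stand-in for the facts table (not the real factbook.db).
demo = sqlite3.connect(":memory:")
demo.execute("create table facts (name text, population integer)")
demo.executemany(
    "insert into facts values (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("World", 7256490011),   # assumed name for the whole-world row
        ("Pacific Ocean", 0),    # assumed name for an oceanic row
    ],
)

# Range over every row, including the non-country entities.
unfiltered = demo.execute(
    "select min(population), max(population) from facts"
).fetchone()

# Range after excluding the whole-world row and the zero-population rows.
filtered = demo.execute(
    "select min(population), max(population) from facts "
    "where population < 2000000000 and population > 0"
).fetchone()

print(unfiltered)  # (0, 7256490011)
print(filtered)    # (3029278, 39542166)
demo.close()
```

With the filter in place, the minimum and maximum both come from actual countries in the sample.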

#### Instructions: 

Write a single query that returns the following minimum and maximum values for countries where population is less than 2 billion and population is greater than 0:

- Assign the minimum of the population column to pop_min.
- Assign the maximum of the population column to pop_max.
- Assign the minimum of the population_growth column to pop_growth_min.
- Assign the maximum of the population_growth column to pop_growth_max.
- Assign the minimum of the birth_rate column to birth_rate_min.
- Assign the maximum of the birth_rate column to birth_rate_max.
- Assign the minimum of the death_rate column to death_rate_min.
- Assign the maximum of the death_rate column to death_rate_max.

In [12]:
query = '''select min(population), max(population), min(population_growth), 
max(population_growth), min(birth_rate), max(birth_rate), min(death_rate), 
max(death_rate) from facts where population < 2000000000 and population > 0;'''

results = conn.execute(query).fetchall()
print(results)

pop_min = results[0][0]
pop_max = results[0][1]
pop_growth_min = results[0][2]
pop_growth_max = results[0][3]
birth_rate_min = results[0][4]
birth_rate_max = results[0][5]
death_rate_min = results[0][6]
death_rate_max = results[0][7]

[(48, 1367485388, 0.0, 4.02, 6.65, 45.45, 1.53, 14.89)]


## 4: Predict Future Population Growth
These values align more closely with reality. Now let's predict next year's population for each country using the following formula:


``projected_population = population + (population * (population_growth/100))``

We need to divide by 100 because the values in population_growth are percentage values (e.g. 2.32) instead of proportional values (e.g. 0.0232).
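
To make the arithmetic concrete, here is a small worked example in plain Python that applies the formula to Afghanistan, using the population and population_growth values from the first row of the facts preview. The variable `growth_as_proportion` is a name introduced only for this sketch.

```python
# Values for Afghanistan, taken from the first row of the facts preview.
population = 32564342
population_growth = 2.32  # a percentage, not a proportion

# Convert the percentage to a proportion before applying it.
growth_as_proportion = population_growth / 100

projected_population = population + population * growth_as_proportion

# Round to the nearest whole person.
print(round(projected_population))  # 33319835
```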

#### Instructions:

Use SQL arithmetic to return the projected population values using the above formula and the following parameters:

- Round the values to the nearest whole number (population can't contain a fractional value).
- Filter out any rows with NULL as the value for either population or population_growth.
- Restrict the query to countries with a population that's less than 7 billion and greater than 0.
- Assign the resulting projections to projected_population.
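
The NULL filter matters because, in SQL, arithmetic involving NULL evaluates to NULL. The sketch below shows this on a small in-memory table rather than factbook.db. The Afghanistan row is copied from the facts preview; `Example Territory` and its values are hypothetical and exist only to show a missing growth rate.

```python
import sqlite3

# In-memory stand-in for the facts table (not the real factbook.db).
demo = sqlite3.connect(":memory:")
demo.execute(
    "create table facts (name text, population integer, population_growth real)"
)
demo.executemany(
    "insert into facts values (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),     # from the facts preview
        ("Example Territory", 1000, None),   # hypothetical row with a NULL growth rate
    ],
)

projection = "round(population + (population * (population_growth / 100)))"

# Without the filter, the NULL growth rate makes the whole projection NULL (None in Python).
all_rows = demo.execute(
    f"select name, {projection} from facts order by name"
).fetchall()

# With the filter, only rows that can actually be projected remain.
kept_rows = demo.execute(
    f"select name, {projection} from facts "
    "where population is not null and population_growth is not null "
    "order by name"
).fetchall()

print(all_rows)   # [('Afghanistan', 33319835.0), ('Example Territory', None)]
print(kept_rows)  # [('Afghanistan', 33319835.0)]
demo.close()
```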

In [19]:
query = '''select round(population + (population * (population_growth/100)))
from facts where population is not null and population_growth is not null and 
population < 7000000000 and population > 0'''

results = conn.execute(query).fetchall()
# Each row is a one-element tuple; keep only the projected value.
projected_population = [row[0] for row in results]
projected_population[:10]

[33319835.0,
 3038366.0,
 40269742.0,
 85683.0,
 20170938.0,
 93582.0,
 43835803.0,
 3060967.0,
 22994450.0,
 8713211.0]

## 5: Explore Projected Population

To understand how global population would shift under the projections, calculate the minimum, maximum, and average values.

#### Instructions: 

Write a single query that returns:

- the minimum of the projected population values, and assigns it to pop_proj_min.
- the maximum of the projected population values, and assigns it to pop_proj_max.
- the average of the projected population values, and assigns it to pop_proj_avg.

Be sure to:

- Round all fractional values to the nearest whole number.
- Filter out any rows with NULL as the value for either population or population_growth.
- Restrict the query to countries with a population of less than 7 billion and greater than 0.
- Use print statements or the variables display below the output box to observe these values.


In [25]:
query = '''select round(min(population + (population * (population_growth/100)))), 
round(max(population + (population * (population_growth/100)))),
round(avg(population + (population * (population_growth/100))))
from facts where population is not null and population_growth is not null and 
population < 7000000000 and population > 0'''

# Run the query once and unpack the single result row.
pop_proj_min, pop_proj_max, pop_proj_avg = conn.execute(query).fetchone()

print(pop_proj_min, pop_proj_max, pop_proj_avg)

48.0 1373639072.0 33405469.0


## 6: Next Steps

In this challenge, you calculated summary statistics to understand the data better, and then projected the following year's population for each country using SQL arithmetic. In the next mission, you'll learn about group summary techniques for segmenting data in your queries.