# 1. Jupyter

You'll need ipython-sql, which you can install by starting Jupyter and running the following in a code cell:

!conda install -yc conda-forge ipython-sql

# 2. Introduction

In this project, we'll work with data from the [CIA World Factbook](https://www.cia.gov/library/publications/the-world-factbook/), a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like:

* population - The population as of 2015.
* population_growth - The annual population growth rate, as a percentage.
* area - The total land and water area.

In this guided project, we'll use SQL in Jupyter Notebook to explore and analyze data from this database.

We'll use the following code to connect our Jupyter Notebook to our database file:

`%%capture
%load_ext sql
%sql sqlite:///factbook.db`

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
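
A misspelled filename in the connection string does not fail right away, because SQLite creates a new, empty database when the file is missing, and the first sign of trouble is a later `no such table: facts` error. The sketch below is one possible guard in plain Python using the standard `sqlite3` module. The helper names `open_existing_db` and `list_tables` are hypothetical and not part of ipython-sql.

```python
import sqlite3
from pathlib import Path


def open_existing_db(path):
    """Open an SQLite database read-only, failing if the file is missing."""
    # mode=ro asks SQLite not to create the file when it does not exist.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn):
    """Return the names of all tables in the connected database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Assuming `factbook.db` sits in the working directory, `list_tables(open_existing_db("factbook.db"))` should include `facts`, while a misspelled filename raises `sqlite3.OperationalError` instead of silently creating an empty file.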

# 3. Overview of the Data

In [18]:
%%sql

SELECT *
  FROM sqlite_master
 WHERE type='table'

In [19]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

Here are the descriptions for some of the columns:

* name - The name of the country.
* area - The country's total area (both land and water).
* population - The country's population.
* population_growth - The country's population growth as a percentage.
* birth_rate - The country's birth rate, or the number of births per year per 1,000 people.
* death_rate - The country's death rate, or the number of deaths per year per 1,000 people.
* area_land - The country's land area in square kilometers.
* area_water - The country's water area in square kilometers.
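
Since area is described as the sum of land and water area, one quick sanity check is to look for rows where the three columns disagree. The following is a minimal sketch on an in-memory table. The rows are made up for illustration and are not Factbook values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts "
    "(name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
# Made-up rows for illustration only; these are not Factbook values.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("Aland", 100, 90, 10),      # consistent: 90 + 10 = 100
        ("Bland", 200, 150, 40),     # inconsistent: 150 + 40 != 200
        ("Cland", 300, None, None),  # components missing
    ],
)

# Rows whose components are present but do not add up to the total.
# A NULL component makes the comparison NULL, so such rows are skipped.
mismatched = conn.execute(
    """
    SELECT name
      FROM facts
     WHERE area <> area_land + area_water
    """
).fetchall()
```

The same `WHERE` clause could be run against the real `facts` table in a `%%sql` cell; rows with missing land or water values would need a separate `IS NULL` check.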

Let's start by calculating some summary statistics and see what they tell us.

# 4. Summary Statistics

In [4]:
%%sql

SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts;

A few things stand out from these summary statistics:

* There's a country with a population of 0.
* There's a country with a population of 7,256,490,011 (more than 7.2 billion people).

Let's use subqueries to zoom in on just these countries without hard-coding the specific values.
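
The idea can be sketched on a tiny in-memory table: the inner query computes the extreme value, and the outer query returns the rows that match it. The country names and numbers below are invented for illustration and are not taken from the Factbook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Invented rows for illustration only.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Emptyland", 0),
        ("Midland", 5000),
        ("Bigland", 90000),
        ("Unknownland", None),  # MIN and MAX ignore NULL values
    ],
)

# The subquery finds the minimum; the outer query fetches the matching rows.
smallest = conn.execute(
    """
    SELECT name, population
      FROM facts
     WHERE population = (SELECT MIN(population) FROM facts)
    """
).fetchall()

# Same pattern for the maximum.
largest = conn.execute(
    """
    SELECT name, population
      FROM facts
     WHERE population = (SELECT MAX(population) FROM facts)
    """
).fetchall()
```

Because the comparison value comes from the subquery, the same statements keep working if the underlying data changes.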

# 5. Exploring Outliers

In [5]:
%%sql
SELECT *
  FROM facts
 WHERE population == (SELECT MIN(population)
                        FROM facts
                     );

In [6]:
%%sql
SELECT *
  FROM facts
 WHERE population == (SELECT MAX(population)
                        FROM facts
                     );

The table also contains a row for the whole world (named World), which explains the maximum population of over 7.2 billion we found earlier.

Knowing this, we should recompute the summary statistics while excluding that row.
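
To see why an aggregate row distorts the statistics, here is a small sketch that computes the population range with and without a World row. The table contents are invented, and `population_range` is a hypothetical helper written for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Invented rows; the 'World' row is the sum of the other two.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [("Aland", 10), ("Bland", 30), ("World", 40)],
)


def population_range(conn, exclude_world):
    """Return (min, max) population, optionally skipping the World row."""
    where = "WHERE name <> 'World'" if exclude_world else ""
    query = f"SELECT MIN(population), MAX(population) FROM facts {where}"
    return conn.execute(query).fetchone()


with_world = population_range(conn, exclude_world=False)
without_world = population_range(conn, exclude_world=True)
```

In this toy table the maximum drops from the World total to the largest single country once the aggregate row is filtered out, which is the effect the revised query below is after.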

**Summary Statistics Revisited**

In [7]:
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth 
  FROM facts
 WHERE name <> 'World';

# 6. Exploring Average Population and Area

Let's explore population density, which depends on a country's population and its area. We'll start with the average values of these two columns.

As before, we need to exclude the row for the whole world.

In [8]:
%%sql
SELECT AVG(population) AS avg_population, AVG(area) AS avg_area
  FROM facts
 WHERE name <> 'World';

# 7. Finding Densely Populated Countries


To finish, we'll build on the query above to find countries that are densely populated. We'll identify countries that have:

* Above-average values for population.
* Below-average values for area.

In [9]:
%%sql
SELECT *
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name <> 'World'
                    )
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name <> 'World'
              );

Some of these countries are generally known to be densely populated, so we have confidence in our results!
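
As a cross-check, the above-average/below-average filter can be compared with a direct ranking by population divided by area. The sketch below does this on an in-memory table with invented rows, excluding the World row from the averages as in section 6. The direct density ranking is an alternative added here for comparison and is not part of the original analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
# Invented rows for illustration only; 'World' is the sum of the others.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Crowded", 900, 10),
        ("Sparse", 100, 500),
        ("Vast", 800, 900),
        ("Tiny", 50, 5),
        ("World", 1850, 1415),
    ],
)

# Filter used in this section: above-average population, below-average area.
dense = conn.execute(
    """
    SELECT name
      FROM facts
     WHERE population > (SELECT AVG(population) FROM facts WHERE name <> 'World')
       AND area < (SELECT AVG(area) FROM facts WHERE name <> 'World')
    """
).fetchall()

# Alternative: rank by density directly (CAST avoids integer division).
ranked = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / area AS density
      FROM facts
     WHERE name <> 'World'
     ORDER BY density DESC
    """
).fetchall()
```

In this toy data the filter returns only Crowded, while the ranking also places Tiny near the top: a small country can be dense without having an above-average population, so the two approaches answer slightly different questions.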