# CIA World Factbook database.

The goal of the following analysis is to combine Jupyter, a SQL database (the SQLite file `factbook.db`), and plotting in a single workflow.

<br>

The [CIA World Factbook database](https://www.cia.gov/the-world-factbook/) is a compendium of statistics about all of the countries on Earth, including demographic information. As always, we start by creating a data dictionary, so that whenever we are in doubt it is easier to interpret the data we are working with.

<br>

- `name` — the name of the country.
- `area` — the country's total area (both land and water) in square kilometers.
- `area_land` — the country's land area in square kilometers.
- `area_water` — the country's water area in square kilometers.
- `population` — the country's population.
- `population_growth` — the country's population growth as a percentage.
- `birth_rate` — the country's birth rate, or the number of births per year per 1,000 people.
- `death_rate` — the country's death rate, or the number of deaths per year per 1,000 people.

<br>

The following mind map gives a clearer idea of the steps in this brief analysis.



![CIA_Factbook_Data](CIA_Factbook_Data.png)

In [2]:
!pwd

/home/ion/Formacion/Dataquest/Data Scientist in Python/Step-5/5-1-SQL Fundamentals/04_Guided Project: Analyzing CIA Factbook Data Using SQL


### Loading libraries

In [1]:
import seaborn as sns
import matplotlib.pyplot as plt

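To connect two independent systems such as **Jupyter** and **SQL**, we use two IPython [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-capture):

- `%%capture`, a cell magic, used here only to hide the output of the cell.
- `%load_ext sql`, a line magic that loads the `sql` extension (typically installed with the `ipython-sql` package).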

In [3]:
!systemctl | grep mysql.service

  mysql.service    loaded active     running   MySQL Community Server


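We see that the `MySQL Community Server` service is up and running. Note, however, that the `factbook.db` file queried below is a SQLite database: it is a single file and does not need a running server.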

-----

`%%capture`

`%%capture [--no-stderr] [--no-stdout] [--no-display] [output]`

*Run the cell, capturing stdout, stderr, and IPython's rich `display()` calls.*

**Positional arguments:**

- `output`: the name of the variable in which to store output. This is a `utils.io.CapturedIO` object with stdout/err attributes for the text of the captured output. `CapturedOutput` also has a `show()` method for displaying the output, and `__call__` as well, so you can use that to quickly display the output. If unspecified, captured output is discarded.

**Options:**

- `--no-stderr`: don't capture stderr.
- `--no-stdout`: don't capture stdout.
- `--no-display`: don't capture IPython's rich display.

-----

`%load_ext sql`

**Load an IPython extension** by its module name (here, the `sql` module).

In [None]:
%%capture
%load_ext sql

#### The `sql` extension provides the `%sql` and `%%sql` magics, which we use to connect to our database

In [None]:
%sql sqlite:///factbook.db 

### Overview of the Data

Information about the database we are going to work on: SQLite lists every table in its `sqlite_master` catalog.

In [None]:
%%sql
SELECT *
  FROM sqlite_master 
WHERE type='table';
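Outside Jupyter, the same catalog query can be run with Python's built-in `sqlite3` module. A minimal sketch, assuming an in-memory database with a hypothetical, cut-down `facts` table in place of `factbook.db`:

```python
import sqlite3

# In-memory database standing in for factbook.db (hypothetical schema subset)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, name TEXT, population INTEGER)")

# sqlite_master holds one row per table, index, view and trigger
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table';")]
print(tables)  # ['facts']
```

To inspect the real file instead, pass its path (e.g. `"factbook.db"`) to `sqlite3.connect`.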

### Column names:

In [None]:
%%sql

SELECT * FROM facts
LIMIT 0;
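`LIMIT 0` returns no rows, but the result still carries the column names. A minimal sketch of the same idea with Python's `sqlite3`, using a hypothetical three-column `facts` table; in SQLite, `PRAGMA table_info(facts)` is another way to list the columns.

```python
import sqlite3

# Hypothetical subset of the facts table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")

# No rows come back, but cursor.description still describes each result column
cur = conn.execute("SELECT * FROM facts LIMIT 0;")
columns = [desc[0] for desc in cur.description]
rows = cur.fetchall()
print(columns)  # ['name', 'area', 'population']
print(rows)     # []
```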

#### Missing values:

In [None]:
%%sql

SELECT COUNT(*) AS 'TOTAL ROWS',
        COUNT(*) - COUNT(code) AS "NULL's in CODE",
        COUNT(*) - COUNT(name) AS "NULL's in NAME",
        COUNT(*) - COUNT(area) AS "NULL's in AREA",
        COUNT(*) - COUNT(area_land) AS "NULL's in AREA LAND",
        COUNT(*) - COUNT(area_water) AS "NULL's in AREA H2O",
        COUNT(*) - COUNT(population) AS "NULL's in POP",
        COUNT(*) - COUNT(population_growth) AS "NULL's in POP_GRW",
        COUNT(*) - COUNT(birth_rate) AS "NULL's in BIRTH RATE",
        COUNT(*) - COUNT(death_rate) AS "NULL's in DEATH RATE",
        COUNT(*) - COUNT(migration_rate) AS "NULL's in MIGRATION RATE"
    FROM facts;
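The query above works because `COUNT(*)` counts every row while `COUNT(column)` skips `NULL`s, so their difference is the number of missing values. A minimal check with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER)")
# Hypothetical rows: two of the three areas are missing
conn.executemany("INSERT INTO facts VALUES (?, ?)",
                 [("A", 100), ("B", None), ("C", None)])

# COUNT(*) counts rows; COUNT(area) counts only non-NULL areas
total, nulls = conn.execute(
    "SELECT COUNT(*), COUNT(*) - COUNT(area) FROM facts;").fetchone()
print(total, nulls)  # 3 2
```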

These are the first five rows of the `facts` table in the database.

In [None]:
%%sql

SELECT *
    FROM facts
LIMIT 5;

### Summary Statistics.

A single query returning the minimum and maximum population and the minimum and maximum population growth.

In [None]:
%%sql

SELECT MIN(population) AS `Minimum population`,
    MAX(population) AS `Maximum population`, 
    MIN(population_growth) AS `Minimum population growth`,  
    MAX(population_growth) AS `Maximum population growth`
FROM facts;

Next, we run a series of queries against the database to obtain the information we need.

### Exploring Outliers:

#### Which country has a population of `0`?

In [None]:
%%sql

SELECT name AS 'Country', population AS Population
    FROM facts
    WHERE population = (SELECT MIN(population)
                       FROM facts)
    ORDER BY population DESC;

# 🐧

#### Which are the ten countries with the smallest population?

In [None]:
%%sql

SELECT name AS 'Country', population AS 'Population'
    FROM facts
    WHERE population > (SELECT MIN(population)
                       FROM facts)
    ORDER BY population ASC
    LIMIT 10;

In [None]:
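import matplotlib.pyplot as plt

# `_` holds the output of the previous cell (the %%sql result set)
smallest_pop = _
# Convert the result set to a pandas DataFrame
df_smallest_pop = smallest_pop.DataFrame()

countries = df_smallest_pop['Country']
population = df_smallest_pop['Population']

# Horizontal bar chart: one bar per country
plt.title("Countries vs Population")
plt.barh(countries, population)
plt.xlabel("Population")

plt.show()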

#### The total population of the World

In [None]:
%%sql 

SELECT name AS 'Country', population AS 'Population'
    FROM facts
    WHERE population = (SELECT MAX(population)
                       FROM facts)
    ORDER BY population DESC;

# 🌐

#### Which are the ten countries with the largest population?

In [None]:
%%sql

SELECT name AS 'Country', population AS 'Population'
    FROM facts
    WHERE population < (SELECT MAX(population)
                        FROM facts)
    ORDER BY population DESC
    LIMIT 10;
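The `population < (SELECT MAX(population) FROM facts)` filter drops the single largest row (the `World` entry, judging by the 🌐 query above) before ranking. A minimal sketch with hypothetical rows, where `World` plays the role of that aggregate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Hypothetical data: 'World' stands in for the aggregate row
conn.executemany("INSERT INTO facts VALUES (?, ?)",
                 [("World", 1000), ("X", 600), ("Y", 300), ("Z", 100)])

# The scalar subquery removes the maximum row, then we rank the rest
rows = conn.execute("""
    SELECT name FROM facts
    WHERE population < (SELECT MAX(population) FROM facts)
    ORDER BY population DESC
    LIMIT 2;""").fetchall()
print(rows)  # [('X',), ('Y',)]
```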

#### Is there a country whose population does not grow?

In [None]:
%%sql

SELECT name AS 'Country', population_growth AS 'Population growth'
    FROM facts
    WHERE population_growth = (SELECT MIN(population_growth)
                       FROM facts)
    ORDER BY population DESC;

- [Greenland](https://visitgreenland.com/)

- [Holy See (Vatican City)](https://www.vatican.va/content/vatican/en.html)

- [Cocos (Keeling) Islands](https://www.cocoskeelingislands.com.au/)

- [Pitcairn Islands](https://www.government.pn/)
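The query above compares `population_growth` with a scalar subquery, so it returns every row tied at the minimum, which is why several places appear. `ORDER BY ... LIMIT 1` would keep only one of them. A minimal sketch with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population_growth REAL)")
# Hypothetical rows with a tie at the minimum growth
conn.executemany("INSERT INTO facts VALUES (?, ?)",
                 [("A", 0.0), ("B", 0.0), ("C", 1.5)])

# Equality with a scalar subquery keeps every tied row
ties = conn.execute("""
    SELECT name FROM facts
    WHERE population_growth = (SELECT MIN(population_growth) FROM facts)
    ORDER BY name;""").fetchall()

# ORDER BY ... LIMIT 1 keeps a single row (which of the tied rows is unspecified)
first = conn.execute("""
    SELECT name FROM facts
    ORDER BY population_growth ASC
    LIMIT 1;""").fetchall()
print(ties)  # [('A',), ('B',)]
```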

#### Which are the ten countries with the highest birth rate on the planet?

In [None]:
%%sql

SELECT name, birth_rate AS 'Birth rate'
    FROM facts
    WHERE birth_rate IS NOT NULL
    ORDER BY birth_rate DESC
    LIMIT 10;

# 🐣

#### Exploring Average Population and Area

In [None]:
%%sql

SELECT  MIN(population) AS 'Min population',
        MAX(population) AS 'Max population', 
        MIN(population_growth) AS 'Min growth', 
        MAX(population_growth) AS 'Max growth'
    FROM facts
    WHERE (population < (SELECT MAX(population)
                       FROM facts)
          )
    AND (population_growth = (SELECT MAX(population_growth)
                       FROM facts)
        );

In [None]:
%%sql

SELECT  ROUND(AVG(population)) AS 'Avg # population',
        ROUND(AVG(area)) AS 'Avg area Km^2'
    FROM facts;
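`AVG` is computed over every row, so an aggregate row such as `World` (suggested by the 🌐 query above) would be included and pull the average up. A minimal sketch, with hypothetical numbers, comparing the plain average with one that excludes the largest row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Hypothetical data: three countries plus an aggregate row equal to their sum
conn.executemany("INSERT INTO facts VALUES (?, ?)",
                 [("X", 600), ("Y", 300), ("Z", 100), ("World", 1000)])

with_world = conn.execute("SELECT AVG(population) FROM facts;").fetchone()[0]
# Same scalar-subquery filter used elsewhere in this notebook
without_world = conn.execute("""
    SELECT AVG(population) FROM facts
    WHERE population < (SELECT MAX(population) FROM facts);""").fetchone()[0]
print(with_world, without_world)  # 500.0 and roughly 333.33
```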

#### Densely populated countries (above-average population, below-average area):

In [None]:
%%sql

SELECT name, population
    FROM facts
        WHERE population > (SELECT AVG(population) FROM facts) 
        AND area < (SELECT AVG(area) FROM facts)               
        ORDER BY population DESC;
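The filter above is a proxy: above-average population with below-average area points to high density, but it does not rank countries by density. A minimal sketch, with hypothetical rows and assuming `area` is in km², that computes people per km² directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
# Hypothetical rows (area assumed to be in km²)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)",
                 [("Tiny", 500, 1), ("Mid", 5000, 100), ("Big", 9000, 3000)])

# CAST forces real division; NULLIF turns a zero area into NULL instead of dividing by it
rows = conn.execute("""
    SELECT name, CAST(population AS FLOAT) / NULLIF(area, 0) AS density
    FROM facts
    ORDER BY density DESC;""").fetchall()
print(rows)  # [('Tiny', 500.0), ('Mid', 50.0), ('Big', 3.0)]
```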

#### Ten countries with the lowest population density:

In [None]:
%%sql

SELECT name, population, area
    FROM facts
        WHERE population < (SELECT AVG(population) FROM facts) 
        AND area > (SELECT AVG(area) FROM facts)               
        ORDER BY CAST(population AS FLOAT) / area ASC
        LIMIT 10;

#### Which country has the largest number of people?

In [None]:
%%sql

SELECT name, MAX(population)
FROM facts
WHERE population < (SELECT MAX(population) FROM facts);
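Selecting `name` next to `MAX(population)` without a `GROUP BY` relies on a SQLite-specific rule: when the aggregate is `max()` or `min()`, bare columns take their values from the row holding that extreme. Other databases such as PostgreSQL reject this query. A minimal sketch with hypothetical rows, next to a portable `ORDER BY ... LIMIT 1` form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Hypothetical rows; 'World' is excluded by the subquery filter
conn.executemany("INSERT INTO facts VALUES (?, ?)",
                 [("World", 1000), ("X", 600), ("Y", 300)])

# SQLite-only: the bare column 'name' comes from the row with the max population
sqlite_style = conn.execute("""
    SELECT name, MAX(population) FROM facts
    WHERE population < (SELECT MAX(population) FROM facts);""").fetchone()

# Portable equivalent that works in any SQL database
portable = conn.execute("""
    SELECT name, population FROM facts
    WHERE population < (SELECT MAX(population) FROM facts)
    ORDER BY population DESC
    LIMIT 1;""").fetchone()
print(sqlite_style, portable)  # ('X', 600) ('X', 600)
```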

#### Which country has the highest growth rate?

In [None]:
%%sql

SELECT name, population_growth
FROM facts
WHERE population_growth = (SELECT MAX(population_growth) FROM facts);

#### Countries with more water than land:

In [None]:
%%sql

SELECT name, area_water, area_land,
    CAST(area_water AS FLOAT) / area_land AS 'water/land ratio'
    FROM facts
    WHERE area_water > area_land
    ORDER BY CAST(area_water AS FLOAT) / area_land DESC
LIMIT 10;

- [British Indian Ocean Territory](https://www.biot.gov.io/)

- [British Virgin Islands](https://www.bvitourism.com/)
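The water/land ratio in the query above relies on `CAST(... AS FLOAT)`. In SQLite, dividing two integers truncates the result, and dividing by zero returns `NULL` instead of raising an error, so a row with `area_land = 0` (if any) would simply get a `NULL` ratio. A quick check with literal values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates in SQLite; casting one operand gives real division
int_ratio, real_ratio = conn.execute(
    "SELECT 3 / 4, CAST(3 AS FLOAT) / 4;").fetchone()

# Division by zero yields NULL, which Python receives as None
by_zero = conn.execute("SELECT CAST(3 AS FLOAT) / 0;").fetchone()[0]

print(int_ratio, real_ratio, by_zero)  # 0 0.75 None
```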

#### Which ten countries have the largest water area, and what are their water-to-land ratios?

In [None]:
%%sql

SELECT name, area_water, area_land,
    CAST(area_water AS FLOAT) / area_land AS 'H2O/Km²'
    FROM facts
    WHERE area_water <= (SELECT MAX(area_water) FROM facts)
    ORDER BY  area_water DESC
LIMIT 10;

In [None]:
data = _
info = data.DataFrame()

sns.relplot(data=info, y='area_water', x='area_land',
            hue = 'name', palette='RdYlGn',
           size = 'H2O/Km²', sizes = (1,1000))
plt.show()

#### Ten countries with the highest ratio of birth rate to death rate:

In [None]:
%%sql

SELECT name,birth_rate,death_rate, CAST (birth_rate AS FLOAT) / CAST(death_rate AS FLOAT) AS 'ratio birth'
    FROM facts
    WHERE birth_rate > (SELECT MAX(death_rate) FROM facts )
    ORDER BY (CAST (birth_rate AS FLOAT) / CAST(death_rate AS FLOAT)) DESC
LIMIT 10;

In [None]:
data = _
info = data.DataFrame()

sns.relplot(data=info, x='birth_rate', y='death_rate',
            hue = 'name', palette='RdYlGn',
           size = 'ratio birth', sizes = (1,1000))
plt.show()
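The query above ranks countries by the ratio of birth rate to death rate, which compares relative rates rather than absolute numbers of people. As a rough estimate of how many people a country adds in a year, one could multiply `population` by `population_growth` (a percentage, per the data dictionary). A minimal sketch with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
# Hypothetical rows: a small fast-growing country vs. a large slow-growing one
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)",
                 [("Small", 1_000_000, 4.0), ("Large", 100_000_000, 1.0)])

# population_growth is a percentage, hence the division by 100
rows = conn.execute("""
    SELECT name, ROUND(population * population_growth / 100) AS added
    FROM facts
    ORDER BY added DESC;""").fetchall()
print(rows)  # [('Large', 1000000.0), ('Small', 40000.0)]
```

The large country adds far more people despite its lower growth rate, which a ratio of rates alone would not show.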

#### Ten countries with a higher death rate than birth rate:

In [None]:
%%sql

SELECT name,birth_rate,death_rate, CAST (death_rate AS FLOAT) / CAST (birth_rate AS FLOAT) AS 'ratio death'
    FROM facts
    WHERE death_rate > birth_rate
    ORDER BY (CAST (death_rate AS FLOAT) / CAST (birth_rate AS FLOAT)) DESC
    LIMIT 10;

In [None]:
data = _
info = data.DataFrame()

sns.relplot(data=info, x='birth_rate', y='death_rate',
            hue = 'name', palette='RdYlGn',
           size = 'ratio death', sizes = (1,1000))
plt.show()

####  Ten countries with the highest population/area ratio:

In [None]:
%%sql

SELECT name, population, area, CAST (population AS FLOAT) / CAST (area AS FLOAT) AS 'population/area ratio'
    FROM facts
    WHERE population < (SELECT MAX(population) FROM facts)
    ORDER BY (CAST(population AS FLOAT) / CAST(area AS FLOAT)) DESC
LIMIT 10;

In [None]:
data = _
info = data.DataFrame()

sns.relplot(data=info, x='population', y='area',
            hue = 'name', palette='RdYlGn',
           size = 'population/area ratio', sizes = (1,1000))
plt.show()