# Analysing CIA Factbook Data Using SQL

## Introduction

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- **population**: the population of each country.
- **population_growth**: the annual population growth rate, as a percentage.
- **area**: the total land and water area, in square kilometers.

First, we must connect our Jupyter Notebook to the database file using the following code:

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

## Overview of the Data

Let's start off by writing a query to return information on the tables within the database.

In [2]:
%%sql
SELECT *
    FROM sqlite_master
    WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
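
The catalog query above can also be run outside the notebook with Python's built-in `sqlite3` module. The following is a minimal sketch, assuming an in-memory database in place of factbook.db and a cut-down, hypothetical version of the **facts** table:

```python
import sqlite3

# Assumption: an in-memory database stands in for factbook.db, and this
# facts table is a cut-down, hypothetical version of the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        name varchar(255) NOT NULL,
        population integer
    )
    """
)

# sqlite_master is SQLite's schema catalog; keep only the table entries.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
conn.close()
```

The list contains **sqlite_sequence** as well as **facts**: SQLite creates that bookkeeping table automatically when a table with an AUTOINCREMENT column is created, which is why it also appears in the output above.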


Now, we'll write and run a query that returns the first five rows of the **facts** table.

In [3]:
%%sql
SELECT * FROM facts
LIMIT 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


## Summary Statistics

To further explore the data, we'll calculate some summary statistics and look for any countries with outlier values relating to population.

In [4]:
%%sql
SELECT MIN(population) AS min_pop,
    MAX(population) AS max_pop,
    MIN(population_growth) AS min_growth,
    MAX(population_growth) AS max_growth
FROM facts;

Done.


min_pop,max_pop,min_growth,max_growth
0,7256490011,0.0,4.02


## Exploring Outliers

We see a few interesting things in the summary statistics above:

- There's a country with a population of 0.
- There's a country with a population of 7,256,490,011 (or **more than 7.2 billion people**).

Let's use subqueries to zoom in on just these countries without hard-coding the specific values.

First, let's find the countries with the minimum population:

In [5]:
%%sql
SELECT name FROM facts
WHERE population = (
    SELECT MIN(population)
    FROM facts
);

Done.


name
Antarctica


Now, we'll find the ones with the maximum population:

In [6]:
%%sql
SELECT name FROM facts
WHERE population = (
    SELECT MAX(population)
    FROM facts
);

Done.


name
World
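
The two subquery lookups can also be merged into a single query that returns both extremes at once. The sketch below assumes an in-memory table holding only a few sample rows whose values are taken from the outputs above, so it illustrates the pattern rather than reproducing the full dataset:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Sample rows only: values come from the outputs shown earlier,
# not from the complete facts table.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Albania", 3029278),
        ("Andorra", 85580),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Each scalar subquery computes one extreme; OR keeps rows matching either.
extremes = conn.execute(
    """
    SELECT name, population
    FROM facts
    WHERE population = (SELECT MIN(population) FROM facts)
       OR population = (SELECT MAX(population) FROM facts)
    ORDER BY population
    """
).fetchall()
print(extremes)
conn.close()
```

Because the comparison values come from subqueries, the query keeps working if the underlying data changes.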


## Exploring Population and Area

### Recalculate Population Summary Statistics

The table contains a row for the whole world, which explains the population of over 7.2 billion. It also contains a row for Antarctica, which explains the population of 0. This seems to match the CIA Factbook page for Antarctica.

Now that we know this, we should recalculate the summary statistics we calculated earlier, this time excluding the row for the whole world. Antarctica stays in the data, so the minimum population remains 0.

In [7]:
%%sql
SELECT MIN(population) AS min_pop,
    MAX(population) AS max_pop,
    MIN(population_growth) AS min_growth,
    MAX(population_growth) AS max_growth
FROM facts
WHERE name <> 'World';

Done.


min_pop,max_pop,min_growth,max_growth
0,1367485388,0.0,4.02


### Calculate Average Population and Area

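The average value for population is shown below. Note that this query has no WHERE clause, so the World row is still included and pulls the average upward: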

In [9]:
%%sql
SELECT AVG(population) AS avg_pop
FROM facts;

Done.


avg_pop
62094928.32231405


And the average value for area is:

In [10]:
%%sql
SELECT AVG(area) AS avg_area
FROM facts;

Done.


avg_area
555093.546184739
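
Since the average queries above run over every row, the World aggregate is counted alongside the individual countries. The following sketch shows how much a single aggregate row can skew AVG, assuming an in-memory table with just three sample rows taken from earlier outputs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# Sample rows only: two countries plus the World aggregate row.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Albania", 3029278),
        ("Andorra", 85580),
        ("World", 7256490011),
    ],
)

# Average over every row, including the World aggregate.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]

# Average over countries only, using the same filter as the summary statistics.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts WHERE name <> 'World'"
).fetchone()[0]

print(avg_all, avg_countries)
conn.close()
```

Adding the same `WHERE name <> 'World'` filter to the real queries would give per-country averages; the figures reported above do not apply it.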


## Finding Densely Populated Countries

To finish, we'll build on the subquery pattern from the previous sections to find countries that are densely populated. We'll check two conditions, each with its own query:

- Above-average values for population
- Below-average values for area

In [11]:
%%sql
SELECT name FROM facts
WHERE population > (
    SELECT AVG(population)
    FROM facts
);

Done.


name
Bangladesh
Brazil
China
"Congo, Democratic Republic of the"
Egypt
Ethiopia
France
Germany
India
Indonesia


In [12]:
%%sql
SELECT name FROM facts
WHERE area < (
    SELECT AVG(area)
    FROM facts
);

Done.


name
Albania
Andorra
Antigua and Barbuda
Armenia
Austria
Azerbaijan
"Bahamas, The"
Bahrain
Bangladesh
Barbados
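
The two conditions can be combined with AND so that a single query returns only the countries meeting both. This is a minimal sketch, assuming an in-memory table that holds just the five rows from the earlier preview of **facts**; with so few rows the result is not a meaningful density ranking and only demonstrates the query shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
# Only the five preview rows shown earlier, not the full table.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
    ],
)

# Keep rows with above-average population AND below-average area.
dense = [
    name
    for (name,) in conn.execute(
        """
        SELECT name
        FROM facts
        WHERE population > (SELECT AVG(population) FROM facts)
          AND area < (SELECT AVG(area) FROM facts)
        ORDER BY name
        """
    )
]
print(dense)
conn.close()
```

On the real table, the subqueries could also filter out the World row so that the averages reflect individual countries only.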
