# Analyzing CIA Factbook Data Using SQL

In this project, I work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic and geographic information such as:

 1. population - The population as of 2015.
 2. population_growth - The annual population growth rate, as a percentage.
 3. area - The total land and water area, in square kilometers.

```python
%%capture
%load_ext sql
# Connect the notebook to the factbook SQLite database
%sql sqlite:///factbook.db
```

'Connected: None@factbook.db'

```sql
%%sql
SELECT *
FROM sqlite_master
WHERE type = 'table'
```

| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |

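Outside Jupyter, the same catalog query can be run with Python's built-in `sqlite3` module. The sketch below is a minimal, hypothetical setup: instead of opening `factbook.db`, it creates an in-memory database using the `facts` DDL shown above and then lists the tables recorded in `sqlite_master`.

```python
import sqlite3

# Hypothetical in-memory stand-in for factbook.db; the DDL is copied
# from the sqlite_master output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, '
    '"code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, '
    '"area" integer, "area_land" integer, "area_water" integer, '
    '"population" integer, "population_growth" float, "birth_rate" float, '
    '"death_rate" float, "migration_rate" float)'
)

# Same query as the notebook cell. SQLite creates sqlite_sequence
# automatically because the facts table uses AUTOINCREMENT.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(table_names)
```

This explains why `sqlite_sequence` appears in the catalog even though the project only works with `facts`.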

```sql
%%sql
SELECT *
FROM facts
LIMIT 5
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


### Summary Statistics

```sql
%%sql
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
FROM facts
```

| MIN(population) | MAX(population) | MIN(population_growth) | MAX(population_growth) |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |


- One row has a population of 0.
- One row has a population of 7,256,490,011 (more than 7.2 billion people).

### Exploring Outliers

```sql
%%sql
SELECT name
FROM facts
WHERE population = (SELECT MIN(population) FROM facts)
```

| name |
|---|
| Antarctica |


```sql
%%sql
SELECT name
FROM facts
WHERE population = (SELECT MAX(population) FROM facts)
```

| name |
|---|
| World |


These results show that the table contains a row for the whole world, which explains the population of over 7.2 billion, and a row for Antarctica, which explains the population of 0.

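The two lookups above can also be combined into a single query that returns both extreme rows. This sketch uses a hypothetical, trimmed-down `facts` table with only `name` and `population`, filled with a few populations taken from the outputs in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-column version of the facts table.
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Keep every row whose population equals either the minimum or the maximum.
extremes = conn.execute(
    """
    SELECT name, population
    FROM facts
    WHERE population = (SELECT MIN(population) FROM facts)
       OR population = (SELECT MAX(population) FROM facts)
    ORDER BY population
    """
).fetchall()
print(extremes)  # [('Antarctica', 0), ('World', 7256490011)]
```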
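```sql
%%sql
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
FROM facts
WHERE name != 'World'
```

| MIN(population) | MAX(population) | MIN(population_growth) | MAX(population_growth) |
|---|---|---|---|
| 0 | 1367485388 | 0.0 | 4.02 |

Excluding the World row lowers the maximum population to 1,367,485,388, but the minimum is still 0 because the Antarctica row remains.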


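```sql
%%sql
SELECT AVG(population), AVG(area)
FROM facts
```

| AVG(population) | AVG(area) |
|---|---|
| 62094928.32231405 | 555093.546184739 |

This query has no `WHERE` clause, so both averages include the World row (and Antarctica); the density comparison below uses these same averages.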

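The World row alone holds more than 7.2 billion people, so it can pull `AVG(population)` far upward. The sketch below illustrates the effect on a hypothetical six-row table built from populations shown earlier (the five rows from the `LIMIT 5` output plus World), comparing the average with and without a `WHERE name != 'World'` filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset: five country populations from the LIMIT 5 output
# plus the aggregate World row.
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("World", 7256490011),
    ],
)

# Average over every row, including World.
avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]
# Average over countries only.
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts WHERE name != 'World'"
).fetchone()[0]

print(f"with World:    {avg_all:,.1f}")
print(f"without World: {avg_countries:,.1f}")
```

On this small sample the World row raises the average by a factor of more than 60; on the full table the effect is smaller but still substantial.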

### Finding Densely Populated Countries

In this analysis, a country is considered densely populated if its population is above the average and its area is below the average.

```sql
%%sql
SELECT name
FROM facts
WHERE population > (SELECT AVG(population) FROM facts)
  AND area < (SELECT AVG(area) FROM facts)
```

| name |
|---|
| Bangladesh |
| Germany |
| Japan |
| Philippines |
| Thailand |
| United Kingdom |
| Vietnam |


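The same filter can be written with a common table expression (CTE), so each average is computed once and the aggregate World row is excluded explicitly. This is a sketch on a hypothetical five-row table holding the first rows of `facts`; because excluding World changes both averages, the result on the full table may differ from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of facts: the five rows from the LIMIT 5 output.
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
    ],
)

# Compute both averages once (without the World row), then keep rows that
# are above average in population and below average in area.
dense = [
    row[0]
    for row in conn.execute(
        """
        WITH averages AS (
            SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
            FROM facts
            WHERE name != 'World'
        )
        SELECT name
        FROM facts, averages
        WHERE population > avg_pop
          AND area < avg_area
        ORDER BY name
        """
    )
]
print(dense)  # ['Afghanistan']
```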
The country with the highest ratio of water to land.

```sql
%%sql
SELECT name,
       MAX(CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT)) AS MAX_WATER_LAND
FROM facts
```

| name | MAX_WATER_LAND |
|---|---|
| British Indian Ocean Territory | 905.6666666666666 |

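Two details of the water-to-land query are easy to miss. The `CAST` matters because SQLite performs integer division when both operands are integers. Selecting `name` next to `MAX(...)` relies on an SQLite-specific rule: when a query has exactly one `min()` or `max()` aggregate, a bare column takes its value from the row that produced the extreme. The sketch below uses a hypothetical five-row table (land and water areas from the first rows of `facts`) to show the truncation and a more portable `ORDER BY ... LIMIT 1` alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of facts: land/water areas from the LIMIT 5 output.
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 0),
        ("Albania", 27398, 1350),
        ("Algeria", 2381741, 0),
        ("Andorra", 468, 0),
        ("Angola", 1246700, 0),
    ],
)

# INTEGER / INTEGER in SQLite truncates: 1350 / 27398 becomes 0.
int_ratio = conn.execute(
    "SELECT area_water / area_land FROM facts WHERE name = 'Albania'"
).fetchone()[0]

# Portable top-1 query: sort by the REAL ratio instead of relying on
# SQLite's bare-column behavior with MAX().
top = conn.execute(
    """
    SELECT name, CAST(area_water AS REAL) / area_land AS water_land_ratio
    FROM facts
    ORDER BY water_land_ratio DESC
    LIMIT 1
    """
).fetchone()

print(int_ratio)  # 0
print(top[0])     # Albania
```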

Countries with more water than land.

```sql
%%sql
SELECT name
FROM facts
WHERE area_water > area_land
```

| name |
|---|
| British Indian Ocean Territory |
| Virgin Islands |


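Countries with an above-average population growth rate (only the first ten rows of the result are shown). A high growth rate alone does not identify the countries that will add the most people next year, since that also depends on population size.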

```sql
%%sql
SELECT name
FROM facts
WHERE population_growth > (SELECT AVG(population_growth) FROM facts)
```

| name |
|---|
| Afghanistan |
| Algeria |
| Angola |
| Antigua and Barbuda |
| Bahrain |
| Bangladesh |
| Belize |
| Benin |
| Bolivia |
| Botswana |

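Because `population_growth` is an annual percentage, the number of people a country would add next year (assuming the rate holds) is roughly `population * population_growth / 100`. Ranking by that product, rather than by the rate alone, addresses the "most people added" question directly. The sketch below runs it on a hypothetical five-row table built from the first rows of `facts`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of facts: the five rows from the LIMIT 5 output.
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Algeria", 39542166, 1.84),
        ("Andorra", 85580, 0.12),
        ("Angola", 19625353, 2.78),
    ],
)

# population_growth is a percentage, so divide by 100 to estimate people added.
# The World filter matters only on the full table, which has an aggregate row.
ranking = conn.execute(
    """
    SELECT name, ROUND(population * population_growth / 100) AS people_added
    FROM facts
    WHERE name != 'World'
    ORDER BY people_added DESC
    """
).fetchall()

for name, added in ranking:
    print(f"{name:<12} {added:>12,.0f}")
```

In this sample, Afghanistan ranks above Angola even though Angola has the higher growth rate, because Afghanistan's population is larger.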

Countries with a higher death rate than birth rate (only the first ten rows of the result are shown).

```sql
%%sql
SELECT name
FROM facts
WHERE death_rate > birth_rate
```

| name |
|---|
| Austria |
| Belarus |
| Bosnia and Herzegovina |
| Bulgaria |
| Croatia |
| Czech Republic |
| Estonia |
| Germany |
| Greece |
| Hungary |

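The condition `death_rate > birth_rate` is equivalent to a negative natural increase, `birth_rate - death_rate`. Computing that difference as a column also ranks countries by how close they are to natural decline. The sketch below assumes, as in the Factbook, that both rates are expressed per 1,000 people, and uses a hypothetical five-row table from the first rows of `facts`; none of these rows has more deaths than births.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of facts: birth and death rates from the LIMIT 5 output.
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),
        ("Albania", 12.92, 6.58),
        ("Algeria", 23.67, 4.31),
        ("Andorra", 8.13, 6.96),
        ("Angola", 38.78, 11.49),
    ],
)

# Natural increase (assumed per 1,000 people); negative means deaths exceed births.
rows = conn.execute(
    """
    SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_increase
    FROM facts
    ORDER BY natural_increase
    """
).fetchall()

declining = [name for name, increase in rows if increase < 0]
print(rows[0][0])  # Andorra
print(declining)   # []
```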

The country with the highest ratio of population to area (population density).

```sql
%%sql
SELECT name,
       MAX(CAST(population AS FLOAT) / CAST(area AS FLOAT)) AS MAX_POPULATION_AREA
FROM facts
```

| name | MAX_POPULATION_AREA |
|---|---|
| Macau | 21168.964285714286 |

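Instead of `MAX(...)` with a bare `name` column, an `ORDER BY ... LIMIT` query can list several of the densest entries at once. `NULLIF(area, 0)` turns a zero area into `NULL`, so the division yields `NULL` instead of an error in databases that reject division by zero (SQLite already returns `NULL` in that case). The sketch uses a hypothetical five-row table from the first rows of `facts`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subset of facts: the five rows from the LIMIT 5 output.
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
    ],
)

# People per unit of area; NULLIF guards against a zero area, and in SQLite
# NULL densities sort last under DESC ordering.
densest = conn.execute(
    """
    SELECT name, CAST(population AS REAL) / NULLIF(area, 0) AS density
    FROM facts
    ORDER BY density DESC
    LIMIT 3
    """
).fetchall()

for name, density in densest:
    print(f"{name:<12} {density:>10.2f}")
```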