# Analyzing CIA World Factbook Data Using SQL
In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

- population — the population of each country.
- population_growth — the annual population growth rate, as a percentage.
- area — the total land and water area.

In [2]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
# Connect the Jupyter notebook to the database file

'Connected: None@factbook.db'

In [3]:
!conda install -yc conda-forge ipython-sql
# Install ipython-sql

/bin/sh: 1: conda: not found


In [4]:
%%sql  
SELECT *
    from sqlite_master
    where type = 'table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


In [5]:
%%sql  
select *
    from facts
    limit 5;

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### Data Dictionary
Here are the descriptions for some of the columns:

- name — the name of the country.
- area — the country's total area (both land and water).
- area_land — the country's land area in square kilometers.
- area_water — the country's water area in square kilometers.
- population — the country's population.
- population_growth — the country's population growth as a percentage.
- birth_rate — the country's birth rate, or the number of births per year per 1,000 people.
- death_rate — the country's death rate, or the number of deaths per year per 1,000 people.

Let's start by calculating some summary statistics and looking for any outlier countries.

### Summary Statistics

In [6]:
%%sql
select count(*)
    from facts

Done.


count(*)
261


The facts table has a total of 261 rows, including rows that contain NULL values.
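
The difference between counting rows and counting non-null values can be sketched with Python's built-in sqlite3 module. The in-memory table below is a hypothetical three-row sample that reuses values shown in this notebook; it is not the real factbook.db.

```python
import sqlite3

# Hypothetical in-memory sample, not the real factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Kosovo", 1870981, None),      # growth not recorded
        ("Bouvet Island", None, None),  # nothing recorded
    ],
)

# count(*) counts every row; count(column) skips rows where the column is NULL.
total_rows, with_population, with_growth = conn.execute(
    "SELECT count(*), count(population), count(population_growth) FROM facts"
).fetchone()

print(total_rows, with_population, with_growth)
conn.close()
```

In this sample, count(*) reports all three rows, while the column counts drop the rows with missing values.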

In [7]:
%%sql
select min(population),  max(population), min(population_growth), max(population_growth)
    from facts;

Done.


min(population),max(population),min(population_growth),max(population_growth)
0,7256490011,0.0,4.02


### Section 2. Exploring Outliers

In [8]:
%%sql
select name
    from facts
    where population == 0

Done.


name
Antarctica


In [9]:
%%sql
select name
    from facts
    where population == 7256490011

Done.


name
World


Now we know that Antarctica has no population. The maximum population belongs to the row for the whole world, which explains the figure of over 7.2 billion. So let's recalculate the summary statistics, this time excluding the rows for the whole world and Antarctica.
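
As a rough illustration of how those two rows distort the summary, the sketch below compares the minimum and maximum with and without them. It uses a hypothetical four-row in-memory table built from values shown in this notebook, and `NOT IN` is just one way to write the exclusion.

```python
import sqlite3

# Hypothetical in-memory sample, not the real factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("China", 1367485388),
        ("Pitcairn Islands", 48),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Without a filter, the aggregate rows define both extremes.
unfiltered = conn.execute(
    "SELECT min(population), max(population) FROM facts"
).fetchone()

# Excluding the two special rows leaves only real countries and territories.
filtered = conn.execute(
    "SELECT min(population), max(population) FROM facts "
    "WHERE name NOT IN ('World', 'Antarctica')"
).fetchone()

print(unfiltered)
print(filtered)
conn.close()
```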

In [10]:
%%sql
select min(population),  max(population), min(population_growth), max(population_growth)
    from facts
    where name != 'World'
    and name != 'Antarctica';

Done.


min(population),max(population),min(population_growth),max(population_growth)
48,1367485388,0.0,4.02


In [11]:
%%sql
select name as 'Country', population, population * 100 / 
        (select population from facts where name == 'World')  as '% from World Population'
    from facts
    where population = 
        (select min(population) from facts where name != 'Antarctica')
          or population = 
        (select max(population) from facts where name != 'World');      
  

Done.


Country,population,% from World Population
China,1367485388,18
Pitcairn Islands,48,0
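
The percentages above are whole numbers because SQLite performs integer division when both operands are integers. A minimal sketch of the difference, assuming a hypothetical three-row in-memory table with values taken from this notebook: multiplying by `100.0` instead of `100` keeps the fractional part.

```python
import sqlite3

# Hypothetical in-memory sample, not the real factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("China", 1367485388),
        ("Pitcairn Islands", 48),
        ("World", 7256490011),
    ],
)

rows = conn.execute(
    """
    SELECT name,
           -- integer * integer / integer: the fraction is truncated
           population * 100 /
               (SELECT population FROM facts WHERE name = 'World') AS int_pct,
           -- 100.0 makes the expression floating point
           population * 100.0 /
               (SELECT population FROM facts WHERE name = 'World') AS real_pct
    FROM facts
    WHERE name != 'World'
    ORDER BY population DESC
    """
).fetchall()

for name, int_pct, real_pct in rows:
    print(name, int_pct, real_pct)
conn.close()
```

With the floating-point version, the Pitcairn Islands share is tiny but no longer reported as exactly zero.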


In [19]:
%%sql
select name as 'Country', population_growth as 'Growth of Population'
    from facts
    where population_growth = (select max(population_growth) from facts)
    or population_growth = (select min(population_growth) from facts)

Done.


Country,Growth of Population
South Sudan,4.02
Holy See (Vatican City),0.0
Cocos (Keeling) Islands,0.0
Greenland,0.0
Pitcairn Islands,0.0


Here is what I have observed:
- China is the most populous country, accounting for about 18% of the world's population.
- Pitcairn Islands is the least populous country, with a population of only 48 and zero population growth.
- Three more entries also have zero population growth, as shown in the table above.

Let's take a closer look at the countries with zero population growth.

In [22]:
%%sql
select name as 'Country', population, population_growth, birth_rate, death_rate
    from facts
    where population_growth = (select min(population_growth) from facts)

Done.


Country,population,population_growth,birth_rate,death_rate
Holy See (Vatican City),842,0.0,,
Cocos (Keeling) Islands,596,0.0,,
Greenland,57733,0.0,14.48,8.49
Pitcairn Islands,48,0.0,,


As we can see, the entries with zero population growth have populations under 1,000 and no recorded birth or death rates. Greenland is the exception, with a population of 57,733 and both rates recorded.
Next, let's look at the countries that have no recorded value for population growth.

In [30]:
%%sql
select name as 'Country', population, population_growth, birth_rate, death_rate
    from facts
    where population_growth is null

Done.


Country,population,population_growth,birth_rate,death_rate
Kosovo,1870981.0,,,
Ashmore and Cartier Islands,,,,
Coral Sea Islands,,,,
Heard Island and McDonald Islands,,,,
Clipperton Island,,,,
French Southern and Antarctic Lands,,,,
Saint Barthelemy,7237.0,,,
Saint Martin,31754.0,,,
Bouvet Island,,,,
Jan Mayen,,,,


In [32]:
%%sql
select count(name) 
    from facts
    where population_growth is null

Done.


count(name)
25


There are 25 entries in the dataset with no recorded population_growth. Five of them have a population value but are otherwise null. The list also includes the five oceans and Antarctica.
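
A growth of zero and a missing growth value are different things in SQL, which is why the two queries above return different rows. The sketch below, using a hypothetical three-row in-memory table with values from this notebook, shows that NULL has to be matched with `IS NULL` and that `min()` skips NULLs.

```python
import sqlite3

# Hypothetical in-memory sample, not the real factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Greenland", 57733, 0.0),   # recorded as zero growth
        ("Kosovo", 1870981, None),   # growth not recorded
    ],
)

def names(where_clause):
    """Return the names of rows matching the given WHERE clause."""
    query = f"SELECT name FROM facts WHERE {where_clause}"
    return [row[0] for row in conn.execute(query)]

zero_growth = names("population_growth = 0")
missing_growth = names("population_growth IS NULL")
# Comparing with '= NULL' never matches, because NULL = NULL is not true.
equals_null = names("population_growth = NULL")
# Aggregates such as min() ignore NULL values.
min_growth = conn.execute("SELECT min(population_growth) FROM facts").fetchone()[0]

print(zero_growth, missing_growth, equals_null, min_growth)
conn.close()
```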

### Section 3. Exploring Average Population and Area
Next, I will explore the countries that are densely populated. I'll identify countries that have the following:
- Above-average values for population
- Below-average values for area

In [34]:
%%sql
select round(avg(population), 2), round(avg(area),2)
    from facts
   where name != 'World';

Done.


"round(avg(population), 2)","round(avg(area),2)"
32242666.57,555093.55


In [44]:
%%sql
select name as 'Country', area, population
    from facts
    where population > (select avg(population) from facts where name != 'World')
    and area < (select avg(area) from facts  where name != 'World')

Done.


Country,area,population
Bangladesh,148460,168957745
Germany,357022,80854408
Iraq,438317,37056169
Italy,301340,61855120
Japan,377915,126919659
"Korea, South",99720,49115196
Morocco,446550,33322699
Philippines,300000,100998376
Poland,312685,38562189
Spain,505370,48146134


In [43]:
%%sql
select count(name)
    from facts
   where population > (select avg(population) from facts where name != 'World')
    and area < (select avg(area) from facts  where name != 'World')

Done.


count(name)
14


A total of 14 countries meet both criteria.
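
The two criteria are an indirect proxy for density. One possible way to check density directly, which is an assumption on my part and not a step in the analysis above, is to divide population by area. The sketch uses a hypothetical in-memory table with four rows copied from the outputs in this notebook.

```python
import sqlite3

# Hypothetical in-memory sample, not the real factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Angola", 1246700, 19625353),
        ("Bangladesh", 148460, 168957745),
        ("Germany", 357022, 80854408),
        ("Japan", 377915, 126919659),
    ],
)

# CAST to REAL avoids integer division; density is people per square kilometer.
rows = conn.execute(
    """
    SELECT name,
           round(CAST(population AS REAL) / area, 1) AS density
    FROM facts
    WHERE area > 0
    ORDER BY density DESC
    """
).fetchall()

for name, density in rows:
    print(name, density)
conn.close()
```

The `area > 0` filter guards against division by zero for entries with no recorded land or water area.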

### Section 4. Exploring Death and Birth Rates
To finish, I will explore death and birth rates to see:
- which countries have a higher death rate than birth rate
- which countries have the lowest death rate relative to their birth rate

In [52]:
%%sql
select name as 'Country', death_rate, birth_rate,  (death_rate - birth_rate) as Dif
    from facts
    where name != 'World'
    order by Dif DESC
    LIMIT 5;

Done.


Country,death_rate,birth_rate,Dif
Bulgaria,14.44,8.92,5.52
Serbia,13.66,9.08,4.58
Latvia,14.31,10.0,4.3100000000000005
Lithuania,14.27,10.1,4.17
Ukraine,14.46,10.72,3.74
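
The long decimal in the Latvia row is a floating-point artifact of the subtraction. One possible way to tidy the output is to wrap the difference in SQLite's `round()`, sketched here on a hypothetical three-row in-memory table with values copied from the result above.

```python
import sqlite3

# Hypothetical in-memory sample, not the real factbook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bulgaria", 8.92, 14.44),
        ("Latvia", 10.0, 14.31),
        ("Ukraine", 10.72, 14.46),
    ],
)

# round(x, 2) keeps two decimal places and hides the binary rounding noise.
rows = conn.execute(
    """
    SELECT name,
           round(death_rate - birth_rate, 2) AS dif
    FROM facts
    ORDER BY dif DESC
    """
).fetchall()

for name, dif in rows:
    print(name, dif)
conn.close()
```

Switching `DESC` to `ASC` in the same query would list the entries with the lowest death rate relative to their birth rate first.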


## Conclusion
From the CIA World Factbook dataset, we have identified:
- the most and least populous countries
- the countries with above-average population and below-average area
- the 5 countries with the highest death rate relative to their birth rate