# Analyzing CIA Data

This project uses SQL to analyze a database built from the CIA World Factbook, a compendium of statistics about all of the countries on Earth.

In [3]:
%%capture
%load_ext sql
# Connect the notebook to the database file
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

# First Glance

In [8]:
%%sql
SELECT *
  FROM facts
 LIMIT 5; -- shows the first 5 rows of the facts table

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


It looks like these are the columns we'll be working with:
- name : The name of the country
- area : Country's total area (both land and water) in square km
- area_land : Country's land area in square km
- area_water : Country's water area in square km
- population : Country's population
- population_growth : Country's population growth as a %
- birth_rate : Country's birth rate, or the number of births per year per 1,000 people.
- death_rate : Country's death rate, or the number of deaths per year per 1,000 people. 
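
Rather than reading the column names off the first few rows, they can also be listed directly. The sketch below is only an illustration: it builds a hypothetical in-memory stand-in for `factbook.db` with the same column names as the output above (the column types are assumed, not taken from the real file) and lists them with SQLite's `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical stand-in for factbook.db: same column names as the output
# above, but the column types here are an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE facts (
        id INTEGER PRIMARY KEY,
        code TEXT,
        name TEXT,
        area INTEGER,
        area_land INTEGER,
        area_water INTEGER,
        population INTEGER,
        population_growth REAL,
        birth_rate REAL,
        death_rate REAL,
        migration_rate REAL
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in conn.execute("PRAGMA table_info(facts)")]
conn.close()

print(columns)
```

Against the real database, the same `PRAGMA table_info(facts)` statement could be run from a `%%sql` cell.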

# Min/Maxes

Let's explore some of SQL's basic aggregate functions by finding the minimum and maximum values of population and population growth.

In [11]:
%%sql
SELECT MIN(population) AS "Minimum Population",
       MAX(population) AS "Maximum Population",
       MIN(population_growth) AS "Minimum Population Growth",
       MAX(population_growth) AS "Maximum Population Growth"
  FROM facts;

Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,7256490011,0.0,4.02


The results show one entry with a population of 0 and another with about 7.26 billion, which is probably the whole world rather than a single country. Population growth ranges from 0.0 to 4.02 percent.
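
One detail worth keeping in mind when reading these numbers: SQL aggregates such as `MIN` and `MAX` skip `NULL` values, so a minimum of 0 or 0.0 is a stored value, not a missing one. The following is a minimal sketch on a toy in-memory table, not the real database. Three rows are copied from the first-glance output, and the row named `Nowhere` is hypothetical, added only to show a stored 0 next to a missing growth value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Andorra", 85580, 0.12),
        ("Nowhere", 0, None),  # hypothetical row: population 0, growth missing
    ],
)

# MIN and MAX ignore the NULL growth value but do see the stored 0 population.
summary = conn.execute(
    """
    SELECT MIN(population), MAX(population),
           MIN(population_growth), MAX(population_growth)
      FROM facts
    """
).fetchone()
conn.close()

print(summary)
```

In this toy table the minimum growth comes from Andorra, because the missing growth value of the hypothetical row is never considered.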

# Identification 

Let's identify which countries have these extreme populations by using subqueries (nested queries).

In [17]:
%%sql
SELECT *
  FROM facts
 WHERE population = (SELECT MIN(population)
                       FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


Antarctica is the entry with no recorded population.

In [18]:
%%sql
SELECT *
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts);

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


For some reason the world is counted as a country in this dataset. That feels a little cheap to me, but it is still a cool statistic to look at.
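
The subquery pattern used in the two cells above can be compared with a sort-and-limit alternative. This is a rough sketch on a toy in-memory table holding four rows copied from the outputs in this notebook, not a query against the real `factbook.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Andorra", 85580),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

# Subquery form: the inner SELECT produces a single value, and the outer
# query returns every row whose population equals it.
by_subquery = conn.execute(
    """
    SELECT name
      FROM facts
     WHERE population = (SELECT MAX(population) FROM facts)
    """
).fetchall()

# Sort-and-limit form: returns exactly one row, even if several rows tie.
by_sort = conn.execute(
    "SELECT name FROM facts ORDER BY population DESC LIMIT 1"
).fetchall()
conn.close()

print(by_subquery, by_sort)
```

Both forms pick out the same row here. The subquery form is the safer choice when ties are possible, since it returns all matching rows instead of an arbitrary one.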

# Removing the Whole World

Although this section title sounds dramatic, we'll simply exclude the World row from the previous calculations, since it is a huge outlier.

In [22]:
%%sql
SELECT MIN(population) AS "Minimum Population",
       MAX(population) AS "Maximum Population",
       MIN(population_growth) AS "Minimum Population Growth",
       MAX(population_growth) AS "Maximum Population Growth"
  FROM facts
 WHERE name <> 'World'; -- <> means "not equal to"

Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,1367485388,0.0,4.02


With the World row removed, the maximum population drops from about 7.26 billion to about 1.37 billion, because the earlier maximum was the world total rather than any single country.

In [30]:
%%sql
SELECT ROUND(AVG(population), 2) AS "Average Population",
       ROUND(AVG(area), 2) AS "Average area of Country"
  FROM facts
 WHERE name <> 'World';

Done.


Average Population,Average area of Country
32242666.57,555093.55


The average population is about 32.2 million, and the average area is about 555,000 square kilometers.
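
These averages only cover rows that actually have a value: `AVG(area)` is the sum of the non-`NULL` areas divided by the number of non-`NULL` areas, so rows with a blank area (such as Antarctica in the output earlier) do not pull the average down. The sketch below checks this on a toy in-memory table built from rows shown in this notebook; it is an illustration, not a recomputation of the real averages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Andorra", 468, 85580),
        ("Antarctica", None, 0),        # area is blank in the earlier output
        ("World", None, 7256490011),    # excluded by the WHERE clause below
    ],
)

total_rows, area_rows, area_sum, area_avg = conn.execute(
    """
    SELECT COUNT(*), COUNT(area), SUM(area), AVG(area)
      FROM facts
     WHERE name <> 'World'
    """
).fetchone()
conn.close()

# COUNT(*) counts every remaining row; COUNT(area) counts only non-NULL areas.
print(total_rows, area_rows)
# AVG(area) agrees with SUM(area) / COUNT(area), not SUM(area) / COUNT(*).
print(area_avg, area_sum / area_rows)
```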

# Densely Populated Areas

To finish, we'll write a query that looks for countries meeting both of the following criteria:
- Above-average values for population.
- Below-average values for area.

In [32]:
%%sql
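SELECT *
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name <> 'World')
       -- Note: ">" keeps countries with above-average area, which is what the
       -- output below shows; the below-average criterion needs "<" instead.
   AND area > (SELECT AVG(area)
                 FROM facts
                WHERE name <> 'World')
 ORDER BY area DESC;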

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
143,rs,Russia,17098242,16377742.0,720500.0,142423773,0.04,11.6,13.69,1.69
32,ca,Canada,9984670,9093507.0,891163.0,35099836,0.75,10.28,8.42,5.66
186,us,United States,9826675,9161966.0,664709.0,321368864,0.78,12.49,8.15,3.86
37,ch,China,9596960,9326410.0,270550.0,1367485388,0.45,12.49,7.53,0.44
24,br,Brazil,8515770,8358140.0,157630.0,204259812,0.77,14.46,6.58,0.14
197,ee,European Union,4324782,,,513949445,0.25,10.2,10.2,2.5
77,in,India,3287263,2973193.0,314070.0,1251695584,1.22,19.55,7.32,0.04
7,ar,Argentina,2780400,2736690.0,43710.0,43431886,0.93,16.64,7.33,0.0
3,ag,Algeria,2381741,2381741.0,0.0,39542166,1.84,23.67,4.31,0.92
40,cg,"Congo, Democratic Republic of the",2344858,2267048.0,77810.0,79375136,2.45,34.88,10.07,0.27
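
The output above lists very large countries because the query compares `area >` the average, while the stated goal was below-average area. The sketch below shows how the direction of that one comparison changes the result. It runs on a toy in-memory table made of the five first-glance rows plus the World row, so the names it returns are specific to this toy data and say nothing about what the real database would return. The `QUERY` template and the `large` and `compact` variable names are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Angola", 1246700, 19625353),
        ("World", None, 7256490011),  # NULL area never satisfies > or <
    ],
)

# Same shape as the notebook query; only the area comparison is swapped in.
QUERY = """
SELECT name
  FROM facts
 WHERE population > (SELECT AVG(population) FROM facts WHERE name <> 'World')
   AND area {op} (SELECT AVG(area) FROM facts WHERE name <> 'World')
 ORDER BY area DESC
"""

# Above-average population AND above-average area (what the cell above ran).
large = [row[0] for row in conn.execute(QUERY.format(op=">"))]
# Above-average population AND below-average area (the stated criteria).
compact = [row[0] for row in conn.execute(QUERY.format(op="<"))]
conn.close()

print(large)
print(compact)
```

Swapping `>` for `<` in the notebook cell and rerunning it against `factbook.db` would be the way to get the densely populated list the section title describes.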
