In [1]:
%cd C:\Users\debie\Documents\anaconda_space

C:\Users\debie\Documents\anaconda_space


# Analyzing CIA Factbook Data Using SQL

In this project, we'll work with data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information like the following:

    population: the population of the country or territory.
    population_growth: the annual population growth rate, as a percentage.
    area: the total land and water area.
    
We will manipulate the data using SQL in order to get a clear view of what it means.

The database can be downloaded here: https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db

In [2]:
%%capture
# Connect the Jupyter notebook to the database file.
# (A cell magic such as %%capture must be the first line of the cell.)
%load_ext sql
%sql sqlite:///factbook.db

First, let's have a look at how the data is presented by showing the first 5 rows of the table.

In [3]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


The data is presented as a table in which each row corresponds to a country and the columns give information about that country's geography and demography.

We are interested in the population and population growth values; more specifically, we want to find their minimum and maximum.

In [4]:
%%sql
SELECT MIN(population) AS min_pop, MAX(population) AS max_pop, 
       MIN(population_growth) AS min_pop_growth, 
       MAX(population_growth) AS max_pop_growth
  FROM facts;

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02


From the result above, we can see that one row has a population of 0 and another has a population of 7,256,490,011. Let's find out which rows these are.

In [5]:
%%sql
SELECT *
  FROM facts
 WHERE population = 0.0;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


In [6]:
%%sql
SELECT *
  FROM facts
 WHERE population = 7256490011;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


The "country" with no population is Antarctica, and the one with more than 7 billion people is the whole world (listed as "World" in the table).

Let's exclude the row corresponding to the whole world.

In [7]:
%%sql
SELECT MIN(population) AS min_pop, MAX(population) AS max_pop, 
       MIN(population_growth) AS min_pop_growth, 
       MAX(population_growth) AS max_pop_growth
  FROM facts
 WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,1367485388,0.0,4.02


With the World row excluded, the maximum population drops to 1,367,485,388, while the minimum population and both growth-rate extremes are unchanged.
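
To make the effect of the summary row concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It does not touch `factbook.db`: it builds a small in-memory table (a stand-in, with only two columns) holding four rows copied from this notebook's outputs, then runs the aggregate with and without the `WHERE` filter.

```python
import sqlite3

# In-memory stand-in for the facts table; only name and population are kept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Antarctica", 0),
        ("China", 1367485388),
        ("World", 7256490011),
    ],
)

# Same aggregate, first over every row, then with the summary row filtered out.
with_world = conn.execute(
    "SELECT MIN(population), MAX(population) FROM facts"
).fetchone()
without_world = conn.execute(
    "SELECT MIN(population), MAX(population) FROM facts WHERE name <> 'World'"
).fetchone()

print(with_world)     # (0, 7256490011)
print(without_world)  # (0, 1367485388)
```

The minimum stays at 0 in both cases because Antarctica is still part of the table.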

Let's look at the average population and the average area of countries.

In [8]:
%%sql
SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
  FROM facts
 WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


avg_pop,avg_area
32242666.56846473,555093.546184739


Now, we want to know which countries have a population above average and an area below average:

In [9]:
%%sql
SELECT *
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name <> 'World')
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name <> 'World');

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31
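
The two subqueries above repeat the same filter. One possible alternative, shown here only as a sketch, is to compute both averages once in a common table expression and join it to the table. The example runs against a small in-memory table built with Python's `sqlite3` module from four rows seen in this notebook, so its averages differ from those of the full database; the CTE name `averages` and the variable `matches` are illustrative.

```python
import sqlite3

# In-memory stand-in for the facts table with a handful of rows from the notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Andorra", 468, 85580),
        ("Bangladesh", 148460, 168957745),
        ("World", None, 7256490011),  # area is empty for the World row
    ],
)

query = """
WITH averages AS (
    SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area
      FROM facts
     WHERE name <> 'World'
)
SELECT f.name
  FROM facts AS f
 CROSS JOIN averages AS a
 WHERE f.population > a.avg_pop
   AND f.area < a.avg_area;
"""

matches = [row[0] for row in conn.execute(query)]
print(matches)  # ['Bangladesh']
```

As in the original query, the outer `SELECT` does not filter out World explicitly. The row drops out anyway because its area is NULL, so the comparison `area < avg_area` is never true for it.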


Which country has the most people? Which country has the highest growth rate?

In [10]:
%%sql    
SELECT *
  FROM facts
 WHERE population = (SELECT MAX(population)
                       FROM facts
                      WHERE name <> 'World');

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44


In [11]:
%%sql    
SELECT *
  FROM facts
 WHERE population_growth = (SELECT MAX(population_growth)
                              FROM facts
                             WHERE name <> 'World');

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
162,od,South Sudan,644329,,,12042910,4.02,36.91,8.18,11.47


The most populous country is China, and the country with the highest population growth rate is South Sudan.

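Which countries have the highest ratio of water to total area? Which countries have more water than land? The query below divides area_water by area (the total land and water area), so a ratio above 0.5 means that a country has more water than land.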

In [20]:
%%sql
SELECT *, ROUND(CAST(area_water AS FLOAT) / CAST(area AS FLOAT), 3) AS ratio_water
  FROM facts
 WHERE name <> 'World'
 ORDER BY ratio_water DESC
 LIMIT 10;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,ratio_water
228,io,British Indian Ocean Territory,54400,60,54340,,,,,,0.999
247,vq,Virgin Islands,1910,346,1564,103574.0,0.59,10.31,8.54,7.67,0.819
246,rq,Puerto Rico,13791,8870,4921,3598357.0,0.6,10.86,8.67,8.15,0.357
12,bf,"Bahamas, The",13880,10010,3870,324597.0,0.85,15.5,7.05,0.0,0.279
71,pu,Guinea-Bissau,36125,28120,8005,1726170.0,1.91,33.38,14.33,0.0,0.222
106,mi,Malawi,118484,94080,24404,17964697.0,3.32,41.56,8.41,0.0,0.206
125,nl,Netherlands,41543,33893,7650,16947904.0,0.41,10.83,8.66,1.95,0.184
182,ug,Uganda,241038,197100,43938,37101745.0,3.24,43.79,10.69,0.74,0.182
56,er,Eritrea,117600,101000,16600,6527689.0,2.25,30.0,7.52,0.0,0.141
99,li,Liberia,111369,96320,15049,4195666.0,2.47,34.41,9.69,0.0,0.135
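
The `CAST(... AS FLOAT)` calls in the query matter: when both operands are integers, SQLite performs integer division and truncates the result. The sketch below illustrates this with Python's `sqlite3` module on an in-memory table containing the Virgin Islands row. It assumes the two columns are stored as integers, which the notebook does not show explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: area and area_water are integer columns, as the CASTs suggest.
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, area_water INTEGER)")
conn.execute("INSERT INTO facts VALUES ('Virgin Islands', 1910, 1564)")

int_ratio, float_ratio = conn.execute(
    """
    SELECT area_water / area,
           ROUND(CAST(area_water AS FLOAT) / CAST(area AS FLOAT), 3)
      FROM facts
    """
).fetchone()

print(int_ratio)    # 0      (integer division truncates 0.818... to 0)
print(float_ratio)  # 0.819  (matches the ratio_water value in the table above)
```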


Which countries will add the most people to their populations next year?

In [23]:
%%sql
SELECT *, ROUND(population * (population_growth / 100), 3) AS population_increase
  FROM facts
 WHERE name <> 'World'
 ORDER BY population_increase DESC;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,population_increase
77,in,India,3287263.0,2973193.0,314070.0,1251695584.0,1.22,19.55,7.32,0.04,15270686.125
37,ch,China,9596960.0,9326410.0,270550.0,1367485388.0,0.45,12.49,7.53,0.44,6153684.246
129,ni,Nigeria,923768.0,910768.0,13000.0,181562056.0,2.45,37.64,12.9,0.22,4448270.372
132,pk,Pakistan,796095.0,770875.0,25220.0,199085847.0,1.46,22.58,6.49,1.54,2906653.366
58,et,Ethiopia,1104300.0,,104300.0,99465819.0,2.89,37.27,8.19,0.22,2874562.169
14,bg,Bangladesh,148460.0,130170.0,18290.0,168957745.0,1.6,21.14,5.61,0.46,2703323.92
186,us,United States,9826675.0,9161966.0,664709.0,321368864.0,0.78,12.49,8.15,3.86,2506677.139
78,id,Indonesia,1904569.0,1811569.0,93000.0,255993674.0,0.92,16.72,6.37,1.16,2355141.801
40,cg,"Congo, Democratic Republic of the",2344858.0,2267048.0,77810.0,79375136.0,2.45,34.88,10.07,0.27,1944690.832
138,rp,Philippines,300000.0,298170.0,1830.0,100998376.0,1.61,24.27,6.11,2.09,1626073.854


Which countries have a higher death rate than birth rate?

In [24]:
%%sql
SELECT *
  FROM facts
 WHERE name <> 'World' AND death_rate > birth_rate;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
10,au,Austria,83871,82445,1426,8665550,0.55,9.41,9.42,5.56
16,bo,Belarus,207600,202900,4700,9589689,0.2,10.7,13.36,0.7
22,bk,Bosnia and Herzegovina,51197,51187,10,3867055,0.13,8.87,9.75,0.38
26,bu,Bulgaria,110879,108489,2390,7186893,0.58,8.92,14.44,0.29
44,hr,Croatia,56594,55974,620,4464844,0.13,9.45,12.18,1.39
47,ez,Czech Republic,78867,77247,1620,10644842,0.16,9.63,10.34,2.33
57,en,Estonia,45228,42388,2840,1265420,0.55,10.51,12.4,3.6
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
67,gr,Greece,131957,130647,1310,10775643,0.01,8.66,11.09,2.32
75,hu,Hungary,93028,89608,3420,9897541,0.22,9.16,12.73,1.33


Which countries have the highest population density, measured as population divided by land area (area_land)?

In [29]:
%%sql
SELECT *, ROUND(CAST(population AS FLOAT) / CAST(area_land AS FLOAT), 3) AS density
  FROM facts
 WHERE name <> 'World'
 ORDER BY density DESC
 LIMIT 10;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate,density
205,mc,Macau,28,28,0,592731,0.8,8.88,4.22,3.37,21168.964
117,mn,Monaco,2,2,0,30535,0.12,6.65,9.24,3.83,15267.5
156,sn,Singapore,697,687,10,5674472,1.89,8.27,3.43,14.05,8259.785
204,hk,Hong Kong,1108,1073,35,7141106,0.38,9.23,7.07,1.68,6655.271
251,gz,Gaza Strip,360,360,0,1869055,2.81,31.11,3.04,0.0,5191.819
233,gi,Gibraltar,6,6,0,29258,0.24,14.08,8.37,3.28,4876.333
13,ba,Bahrain,760,760,0,1346613,2.41,13.66,2.69,13.09,1771.859
108,mv,Maldives,298,298,0,393253,0.08,15.75,3.89,12.68,1319.641
110,mt,Malta,316,316,0,413965,0.31,10.18,9.09,1.98,1310.016
227,bd,Bermuda,54,54,0,70196,0.5,11.33,8.23,1.88,1299.926
