The following code connects our Jupyter Notebook to our database file, *factbook.db*.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

'Connected: None@factbook.db'

To get information on the tables in the database, we query the *sqlite_master* table as follows (thanks DQ!). For this project, we will use the table called *facts*.

In [2]:
%%sql
SELECT * FROM sqlite_master WHERE type='table';

Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
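
The same lookup can be sketched outside the notebook with Python's built-in sqlite3 module. This is only a minimal sketch: it assumes an in-memory database with a trimmed-down *facts* table instead of the real *factbook.db*. It also shows where the extra *sqlite_sequence* table comes from: SQLite creates it on its own as soon as a table with an AUTOINCREMENT column exists.

```python
import sqlite3

# In-memory stand-in for factbook.db (assumption: a trimmed-down facts table).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL, "
    "population integer)"
)

# sqlite_master has one row per schema object; keep only the tables.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
print(tables)
```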


Query to return the first 5 rows of the facts table in the database, in order to get familiar with the data.

In [4]:
%%sql
SELECT * FROM facts LIMIT 5

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Let's use some aggregate functions on the data to find some interesting characteristics, like which country has the
- least population
- most population
- least population growth
- most population growth

Thankfully, columns by those names are readily available, so we do not need to generate any new "virtual" columns. We can simply apply the MIN and MAX functions to those columns. These functions are applied in the SELECT part of the query. Note that this returns only the extreme values, not the countries they belong to.

In [5]:
%%sql
SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth) FROM facts

Done.


MIN(population),MAX(population),MIN(population_growth),MAX(population_growth)
0,7256490011,0.0,4.02
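
To see how these aggregates behave on a small scale, here is a minimal sketch using Python's sqlite3 module. It assumes an in-memory table holding only the five rows from the LIMIT 5 preview above, so the numbers differ from the full-table result. Each MIN or MAX is computed independently, so the four values can come from different rows.

```python
import sqlite3

# Assumption: an in-memory table with just the five preview rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Afghanistan", 32564342, 2.32),
    ("Albania", 3029278, 0.3),
    ("Algeria", 39542166, 1.84),
    ("Andorra", 85580, 0.12),
    ("Angola", 19625353, 2.78),
])

# One result row; every aggregate scans the whole table on its own.
row = conn.execute(
    "SELECT MIN(population), MAX(population), "
    "MIN(population_growth), MAX(population_growth) FROM facts"
).fetchone()
print(row)  # (85580, 39542166, 0.12, 2.78)
```

Here the smallest population belongs to Andorra, the largest to Algeria, and the highest growth to Angola, which is why a subquery is needed to get back to the country itself.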


Now that we have these values available, let's use subqueries to
- Return the country that has the least population
- Return the country that has the most population

Since *World* came up as a country with MAX population and *World* is not really a country, I am going to discount that name in my queries.

For the result, we will display ALL columns, in case other information stands out (and not just the *name* of the country). Hence the SELECT *.

In [16]:
%%sql
SELECT * FROM facts WHERE population = (SELECT MIN(population) FROM facts)

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


In [47]:
%%sql
SELECT * FROM facts WHERE population = (SELECT MAX(population) FROM facts WHERE name != 'World')

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
37,ch,China,9596960,9326410,270550,1367485388,0.45,12.49,7.53,0.44
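
The subquery pattern above can be sketched in plain Python as follows. This is a minimal sketch under the assumption of an in-memory table with a handful of rows taken from the results in this notebook, including the *World* row that has to be excluded. The string literal uses single quotes, which is the standard SQL form.

```python
import sqlite3

# Assumption: a few (name, population) pairs taken from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?)", [
    ("Antarctica", 0),
    ("Bangladesh", 168957745),
    ("China", 1367485388),
    ("Japan", 126919659),
    ("World", 7256490011),
])

# The inner SELECT yields a single value; the outer WHERE matches rows against it.
least = conn.execute(
    "SELECT name FROM facts "
    "WHERE population = (SELECT MIN(population) FROM facts)"
).fetchall()
most = conn.execute(
    "SELECT name FROM facts "
    "WHERE population = (SELECT MAX(population) FROM facts WHERE name != 'World')"
).fetchall()
print(least, most)  # [('Antarctica',)] [('China',)]
```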


Let's calculate some averages. AVG, like MIN and MAX, is also an aggregate function that we can apply to columns in the SELECT part of the query (talking to myself about concepts I just learned from DQ). Again, I am ignoring the country name *World*.

In [46]:
%%sql
SELECT AVG(population), AVG(area) FROM facts WHERE name != 'World'

Done.


AVG(population),AVG(area)
32242666.56846473,555093.546184739
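
One detail worth keeping in mind when reading these averages: AVG skips NULL values, so a row like Antarctica, whose *area* is empty in the output earlier, does not count toward AVG(area). The sketch below illustrates this with Python's sqlite3 module. It assumes an in-memory table with four rows taken from this notebook, not the full dataset.

```python
import sqlite3

# Assumption: four (name, area) pairs from the outputs above; Antarctica's area is NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?)", [
    ("Afghanistan", 652230),
    ("Albania", 28748),
    ("Andorra", 468),
    ("Antarctica", None),
])

# COUNT(*) counts rows, COUNT(area) counts non-NULL areas, AVG(area) averages only those.
total_rows, rows_with_area, avg_area = conn.execute(
    "SELECT COUNT(*), COUNT(area), AVG(area) FROM facts"
).fetchone()
print(total_rows, rows_with_area, avg_area)
```

The average is taken over three rows rather than four, so each column's average may rest on a different number of countries.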


Query to find all the countries that meet a certain set of conditions
- Their population is above the average
- Their area is below the average 

Again, let's not count the row whose *name* is "World". The query result will display the entire row, i.e. using SELECT *.

In [51]:
%%sql
SELECT * FROM facts WHERE
population > (SELECT AVG(population) FROM facts WHERE name != 'World') AND
area < (SELECT AVG(area) FROM facts WHERE name != 'World')

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
14,bg,Bangladesh,148460,130170,18290,168957745,1.6,21.14,5.61,0.46
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
80,iz,Iraq,438317,437367,950,37056169,2.93,31.45,3.77,1.62
83,it,Italy,301340,294140,7200,61855120,0.27,8.74,10.19,4.1
85,ja,Japan,377915,364485,13430,126919659,0.16,7.93,9.51,0.0
91,ks,"Korea, South",99720,96920,2800,49115196,0.14,8.19,6.75,0.0
120,mo,Morocco,446550,446300,250,33322699,1.0,18.2,4.81,3.36
138,rp,Philippines,300000,298170,1830,100998376,1.61,24.27,6.11,2.09
139,pl,Poland,312685,304255,8430,38562189,0.09,9.74,10.19,0.46
163,sp,Spain,505370,498980,6390,48146134,0.89,9.64,9.04,8.31
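
To illustrate why leaving out *World* matters here, the following sketch runs the same kind of filter twice, once with and once without *World* in the population average. It assumes an in-memory table with a few rows from this notebook, and it assumes that the *World* row has no area value (NULL). The helper name `matches` is made up for this example.

```python
import sqlite3

# Assumption: sample rows from the outputs above; World's area is assumed to be NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Algeria", 2381741, 39542166),
    ("Andorra", 468, 85580),
    ("Angola", 1246700, 19625353),
    ("Bangladesh", 148460, 168957745),
    ("Germany", 357022, 80854408),
    ("World", None, 7256490011),
])

def matches(population_filter):
    """Hypothetical helper: names with above-average population and below-average area."""
    query = (
        "SELECT name FROM facts "
        f"WHERE population > (SELECT AVG(population) FROM facts {population_filter}) "
        "AND area < (SELECT AVG(area) FROM facts WHERE name != 'World') "
        "ORDER BY name"
    )
    return [name for (name,) in conn.execute(query)]

without_world = matches("WHERE name != 'World'")
with_world = matches("")  # World inflates the population average
print(without_world, with_world)  # ['Bangladesh', 'Germany'] []
```

With *World* included, the average population in this sample jumps to over a billion and no country passes the filter.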


Let us answer the question **Which countries have the highest ratios of water to land? Which countries have more water than land?**

First, let's see what the total number of countries in the table is, as we may or may not want to display all of them.

To answer the question, let's display the ratio of water to land along with the columns *name* (of the country), *area_water* and *area_land*. Both areas are stored as integers, so we cast them to float before dividing to avoid integer division.

Of the total number of countries, we will have the query return only the top 25, ordered from highest to lowest ratio, with the ratio rounded to two decimal places.

In [26]:
%%sql
SELECT COUNT(name) FROM facts;


Done.


COUNT(name)
261


In [54]:
%%sql
SELECT name, area_water, area_land, ROUND(CAST(area_water AS FLOAT)/CAST(area_land AS FLOAT), 2) AS water_land_ratio FROM facts ORDER BY water_land_ratio DESC LIMIT 25

Done.


name,area_water,area_land,water_land_ratio
British Indian Ocean Territory,54340,60,905.67
Virgin Islands,1564,346,4.52
Puerto Rico,4921,8870,0.55
"Bahamas, The",3870,10010,0.39
Guinea-Bissau,8005,28120,0.28
Malawi,24404,94080,0.26
Netherlands,7650,33893,0.23
Uganda,43938,197100,0.22
Eritrea,16600,101000,0.16
Liberia,15049,96320,0.16
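
The CAST in the query above is what keeps the ratio from being truncated: in SQLite, dividing one integer by another gives an integer. The sketch below shows both versions side by side, using an in-memory table (an assumption) with the top three rows from the result. It also answers the "more water than land" question directly with a WHERE clause. The column name `integer_ratio` is made up for this example.

```python
import sqlite3

# Assumption: the top three rows of the result above, in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("British Indian Ocean Territory", 60, 54340),
    ("Virgin Islands", 346, 1564),
    ("Puerto Rico", 8870, 4921),
])

# integer_ratio shows the truncated result; water_land_ratio uses the CAST.
ratios = conn.execute(
    "SELECT name, "
    "area_water / area_land AS integer_ratio, "
    "ROUND(CAST(area_water AS FLOAT) / CAST(area_land AS FLOAT), 2) AS water_land_ratio "
    "FROM facts ORDER BY water_land_ratio DESC"
).fetchall()

# Comparing the two columns directly avoids the division altogether.
more_water = [name for (name,) in conn.execute(
    "SELECT name FROM facts WHERE area_water > area_land ORDER BY name"
)]
for row in ratios:
    print(row)
print(more_water)
```

Without the CAST, Puerto Rico's ratio would come out as 0 instead of 0.55.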


Only two entries, British Indian Ocean Territory (ratio 905.67) and Virgin Islands (ratio 4.52), have more water than land.

Next, we will try to answer the question **Which countries will add the most people to their population next year?**

To estimate the number of people each country will add, I will apply the percentage population growth to the current population that's available to us. Since we want the countries that will add the most people, we can look at the top 25, where the *name* of the country is not equal to "World". We will also round the resulting value to 2 decimal places.


In [59]:
%%sql
SELECT name, birth_rate, death_rate, population_growth, ROUND((population_growth/100)*population, 2) AS people_added FROM facts WHERE name != 'World' ORDER BY people_added DESC LIMIT 25

Done.


name,birth_rate,death_rate,population_growth,people_added
India,19.55,7.32,1.22,15270686.12
China,12.49,7.53,0.45,6153684.25
Nigeria,37.64,12.9,2.45,4448270.37
Pakistan,22.58,6.49,1.46,2906653.37
Ethiopia,37.27,8.19,2.89,2874562.17
Bangladesh,21.14,5.61,1.6,2703323.92
United States,12.49,8.15,0.78,2506677.14
Indonesia,16.72,6.37,0.92,2355141.8
"Congo, Democratic Republic of the",34.88,10.07,2.45,1944690.83
Philippines,24.27,6.11,1.61,1626073.85
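
As a quick check of the arithmetic behind *people_added*, here is a minimal sketch with Python's sqlite3 module. It assumes an in-memory table with three countries whose population and growth figures both appear earlier in this notebook. The sample has no *World* row, so that filter is left out.

```python
import sqlite3

# Assumption: three rows whose population and growth both appear in earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Bangladesh", 168957745, 1.6),
    ("China", 1367485388, 0.45),
    ("Philippines", 100998376, 1.61),
])

# population_growth is a percentage, so divide by 100 before multiplying.
added = conn.execute(
    "SELECT name, ROUND((population_growth / 100) * population, 2) AS people_added "
    "FROM facts ORDER BY people_added DESC"
).fetchall()
for name, people_added in added:
    print(name, people_added)
```

For China this is 1,367,485,388 × 0.0045, or roughly 6.15M, which agrees with the value in the table above.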


From the query result, we find that India will add the most to its population, over 15M people. China and Nigeria are next highest with about 6M and 4M respectively. The other countries in the top ten add between 1.6M and 2.9M people each.

The birth rates among this list of countries are also higher than the death rates, which supports the idea that the populations of these countries will have a natural increase.

Finally, let's find out which countries have death rates higher than birth rates.

In [55]:
%%sql
SELECT * FROM facts WHERE death_rate > birth_rate LIMIT 25

Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
10,au,Austria,83871,82445,1426,8665550,0.55,9.41,9.42,5.56
16,bo,Belarus,207600,202900,4700,9589689,0.2,10.7,13.36,0.7
22,bk,Bosnia and Herzegovina,51197,51187,10,3867055,0.13,8.87,9.75,0.38
26,bu,Bulgaria,110879,108489,2390,7186893,0.58,8.92,14.44,0.29
44,hr,Croatia,56594,55974,620,4464844,0.13,9.45,12.18,1.39
47,ez,Czech Republic,78867,77247,1620,10644842,0.16,9.63,10.34,2.33
57,en,Estonia,45228,42388,2840,1265420,0.55,10.51,12.4,3.6
65,gm,Germany,357022,348672,8350,80854408,0.17,8.47,11.42,1.24
67,gr,Greece,131957,130647,1310,10775643,0.01,8.66,11.09,2.32
75,hu,Hungary,93028,89608,3420,9897541,0.22,9.16,12.73,1.33
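
The comparison of two columns in the WHERE clause can also be sketched in Python. This minimal sketch assumes an in-memory table with four rows taken from this notebook. The column *natural_change* (birth rate minus death rate) is a name made up for this example to show how far apart the two rates are.

```python
import sqlite3

# Assumption: four rows with birth and death rates taken from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
    ("Afghanistan", 38.57, 13.89),
    ("Angola", 38.78, 11.49),
    ("Austria", 9.41, 9.42),
    ("Germany", 8.47, 11.42),
])

# Keep rows where deaths outpace births; the most negative difference sorts first.
shrinking = conn.execute(
    "SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_change "
    "FROM facts WHERE death_rate > birth_rate ORDER BY natural_change"
).fetchall()
for name, natural_change in shrinking:
    print(name, natural_change)
```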


Most of the countries in this list are European, with death rates higher than birth rates. They also show low *population_growth*, which is consistent with the previous query, where the **European Union** as a whole ranks much lower in population growth.