# Analyzing CIA Factbook Data Using SQL
This project is one of the guided projects on [Dataquest.io](https://dataquest.io), a learning platform I use to study data science. It uses data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/) and is my first project working with SQL. The Factbook contains demographic information like the following:

- `population`: the population of the country.
- `population_growth`: the annual population growth rate, as a percentage.
- `area`: the total land and water area.

## 1. Connecting to `factbook.db`
Before we can connect to the database and run SQL from Jupyter, `ipython-sql` must be installed on the local machine. Run the following command in a Jupyter cell: `!conda install -yc conda-forge ipython-sql`. The command only has to be run once. After `ipython-sql` has been installed successfully, let's import `sqlalchemy`, connect to `factbook.db`, and query the database to see what it contains.

In [1]:
import sqlalchemy
sqlalchemy.create_engine("sqlite:///factbook.db")
%load_ext sql
%sql sqlite:///factbook.db

'Connected: @factbook.db'

In [2]:
%%sql
SELECT *
  FROM sqlite_master
 WHERE type='table';

 * sqlite:///factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"
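
The table listing above can also be reproduced outside Jupyter. The following is a minimal sketch that uses Python's built-in `sqlite3` module instead of `ipython-sql`. The helper `list_tables` and the cut-down in-memory `facts` table are assumptions made for illustration; with the real file you would connect with `sqlite3.connect("factbook.db")` instead.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables recorded in sqlite_master."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in for sqlite3.connect("factbook.db"): an in-memory database
# holding a hypothetical, cut-down version of the facts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (id INTEGER PRIMARY KEY, code TEXT, name TEXT, population INTEGER);"
)

print(list_tables(conn))
```

Because the stand-in table does not use `AUTOINCREMENT`, SQLite does not create the `sqlite_sequence` helper table here, so only `facts` is listed.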


## 2. Inspecting the `facts` table
The query below displays the first five rows of the `facts` table. Here is a description of selected columns:

|Header Name|Description|
|-----|-----|
|`name`|The name of the country.|
|`area`|The country's total area (both land and water) in square kilometers.|
|`area_land`|The country's land area in square kilometers.|
|`area_water`|The country's water area in square kilometers.|
|`population`|The country's population.|
|`population_growth`|The country's population growth as a percentage.|
|`birth_rate`|The country's birth rate, or the number of births per year per 1,000 people.|
|`death_rate`|The country's death rate, or the number of deaths per year per 1,000 people.|

In [3]:
%%sql
SELECT *
  FROM facts
 LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


### 2.1 Summary Statistics About the Countries
The next query computes four summary statistics:

- Minimum population
- Maximum population
- Minimum population growth
- Maximum population growth

In [4]:
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts;


 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
0,7256490011,0.0,4.02


The query shows that at least one country has a population of 0 and another has more than 7.2 billion people. Neither value looks right. Let's find out which country has zero inhabitants. The second value looks like the total for the whole world, but let's confirm that as well.

In [5]:
%%sql
SELECT *
  FROM facts
 WHERE population == (SELECT MIN(population)
                      FROM facts
                     );

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
250,ay,Antarctica,,280000,,0,,,,


The country with a population of 0 is [Antarctica](https://www.cia.gov/the-world-factbook/countries/antarctica/), which according to the CIA Factbook has "no indigenous inhabitants, but there are both permanent and summer-only staffed research stations". Let's check the second outlier, the one with a population of 7.2 billion.

In [6]:
%%sql
SELECT *
  FROM facts
 WHERE population == (SELECT MAX(population)
                      FROM facts
                     );

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
261,xx,World,,,,7256490011,1.08,18.6,7.8,


The 7.2 billion is indeed the whole [World](https://www.cia.gov/the-world-factbook/countries/world/). Let's now recalculate the statistics so that they exclude the rows for Antarctica and World.

### 2.2 Correct Summary Statistics About Countries

The query below repeats the earlier summary statistics, this time excluding the rows for Antarctica and the whole world.

In [7]:
%%sql
SELECT MIN(population) AS min_pop,
       MAX(population) AS max_pop,
       MIN(population_growth) AS min_pop_growth,
       MAX(population_growth) AS max_pop_growth
  FROM facts
 WHERE name <> 'Antarctica'
   AND name <> 'World';

 * sqlite:///factbook.db
Done.


min_pop,max_pop,min_pop_growth,max_pop_growth
48,1367485388,0.0,4.02
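
As a sketch of the same filtering idea, the exclusion can also be written with `NOT IN`, which stays readable if more aggregate rows need to be dropped later. The example runs against a small in-memory table built from a handful of rows shown earlier in this notebook, so its results describe only this sample and not the full `facts` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL);"
)
# Sample rows copied from the outputs above; Antarctica has no growth value.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?);",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Andorra", 85580, 0.12),
        ("Antarctica", 0, None),
        ("World", 7256490011, 1.08),
    ],
)

query = """
SELECT MIN(population), MAX(population),
       MIN(population_growth), MAX(population_growth)
  FROM facts
 WHERE name NOT IN ('Antarctica', 'World');
"""
summary = conn.execute(query).fetchone()
print(summary)
```

With the two rows excluded, the minimum and maximum come from Andorra and Afghanistan, the smallest and largest real countries in the sample.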


### 2.3 Density
Population density depends on a country's population and its area, so let's first look at the averages of both.

In [8]:
%%sql
SELECT ROUND(AVG(population), 2) AS avg_population,
       ROUND(AVG(area), 2) AS avg_area
  FROM facts
 WHERE name <> 'World';

 * sqlite:///factbook.db
Done.


avg_population,avg_area
32242666.57,555093.55


The minimum population is 48 people, and the most populous country has more than 1.3 billion people. The average population is about 32 million, and the average area is about 555 thousand square kilometers. To finish, we'll identify countries that have the following:

- Above-average values for population
- Below-average values for area

In [9]:
%%sql
SELECT name, population, area, population/area AS density
  FROM facts
 WHERE population > (SELECT AVG(population)
                       FROM facts
                      WHERE name <> 'World'
                    )
   AND area < (SELECT AVG(area)
                 FROM facts
                WHERE name <> 'World')
ORDER BY density DESC;

 * sqlite:///factbook.db
Done.


name,population,area,density
Bangladesh,168957745,148460,1138
"Korea, South",49115196,99720,492
Philippines,100998376,300000,336
Japan,126919659,377915,335
Vietnam,94348835,331210,284
United Kingdom,64088222,243610,263
Germany,80854408,357022,226
Italy,61855120,301340,205
Uganda,37101745,241038,153
Thailand,67976405,513120,132


The countries above are well known for their high population density. The densest of them is Bangladesh, with 1138 people per square kilometer.
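
One detail worth noting: `population` and `area` are both integer columns, and SQLite truncates the result when it divides one integer by another, which is why the densities come out as whole numbers. The sketch below compares integer division with a `CAST` to `REAL`, using two rows from the result above in an in-memory table. The column aliases `int_density` and `real_density` are names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER);")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?);",
    [("Bangladesh", 168957745, 148460), ("Japan", 126919659, 377915)],
)

densities = conn.execute(
    """
    SELECT name,
           population / area AS int_density,              -- integer division truncates
           ROUND(CAST(population AS REAL) / area, 2) AS real_density
      FROM facts
     ORDER BY real_density DESC;
    """
).fetchall()

for name, int_density, real_density in densities:
    print(name, int_density, real_density)
```

The integer column reproduces the 1138 and 335 shown above, while the `REAL` version keeps the fractional part.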

****
<p align="center">
END OF FULLY GUIDED PART
</p>

****


### 3.1 People Statistics
This section looks at the countries with the most people, the countries with the highest population growth, and the countries where the death rate exceeds the birth rate.

In [10]:
%%sql
SELECT name, population
  FROM facts
 WHERE name <> 'World'
   AND name <> 'European Union'
ORDER BY population DESC
 LIMIT 5;

 * sqlite:///factbook.db
Done.


name,population
China,1367485388
India,1251695584
United States,321368864
Indonesia,255993674
Brazil,204259812


In [11]:
%%sql
SELECT name, population_growth
  FROM facts
 WHERE name <> 'World'
ORDER BY population_growth DESC
 LIMIT 5;

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02
Malawi,3.32
Burundi,3.28
Niger,3.25
Uganda,3.24


In [12]:
%%sql
SELECT name, birth_rate, death_rate, ROUND(death_rate - birth_rate,1) AS extintion_rate
  FROM facts
 WHERE death_rate > birth_rate
ORDER BY extintion_rate DESC
 LIMIT 5;

 * sqlite:///factbook.db
Done.


name,birth_rate,death_rate,extintion_rate
Bulgaria,8.92,14.44,5.5
Serbia,9.08,13.66,4.6
Latvia,10.0,14.31,4.3
Lithuania,10.1,14.27,4.2
Ukraine,10.72,14.46,3.7


Not surprisingly, the countries with the most people are China, India, the United States, Indonesia, and Brazil. The five countries with the highest population growth, however, are all in Africa. There are also countries where the death rate is higher than the birth rate: in European countries such as Bulgaria, Serbia, Latvia, Lithuania, and Ukraine, deaths exceed births.
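
The comparison of death and birth rates can be sketched on a few of the rows shown above. Both rates are expressed per 1,000 people per year, so their difference is the natural decrease per 1,000 people. The alias `natural_decrease` is a name chosen for this illustration, and the in-memory table holds only a small sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL);")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?);",
    [
        ("Afghanistan", 38.57, 13.89),  # births exceed deaths, filtered out
        ("Bulgaria", 8.92, 14.44),
        ("Serbia", 9.08, 13.66),
        ("Latvia", 10.0, 14.31),
    ],
)

shrinking = conn.execute(
    """
    SELECT name,
           ROUND(death_rate - birth_rate, 1) AS natural_decrease
      FROM facts
     WHERE death_rate > birth_rate
     ORDER BY natural_decrease DESC;
    """
).fetchall()

for name, natural_decrease in shrinking:
    print(name, natural_decrease)
```

Afghanistan drops out of the result because its birth rate is higher than its death rate, and the remaining rows keep the same order as in the full query.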

### 3.2 Land & Water
Which countries have the highest ratios of water to land? Which countries have more water than land?

In [13]:
%%sql
SELECT name, area_land
  FROM facts
 WHERE name <> 'World'
ORDER BY area_land DESC
LIMIT 5; 

 * sqlite:///factbook.db
Done.


name,area_land
Russia,16377742
China,9326410
United States,9161966
Canada,9093507
Brazil,8358140


In [14]:
%%sql
SELECT name, area_water
  FROM facts
 WHERE name <> 'World'
ORDER BY area_water DESC
LIMIT 5; 

 * sqlite:///factbook.db
Done.


name,area_water
Canada,891163
Russia,720500
United States,664709
India,314070
China,270550


In [15]:
%%sql
SELECT name, ROUND(CAST(area_water AS Float) / CAST(area_land AS Float), 3) AS water_land
  FROM facts
 WHERE name <> 'World'
ORDER BY water_land DESC
LIMIT 5; 

 * sqlite:///factbook.db
Done.


name,water_land
British Indian Ocean Territory,905.667
Virgin Islands,4.52
Puerto Rico,0.555
"Bahamas, The",0.387
Guinea-Bissau,0.285


In [16]:
%%sql
SELECT name, area_water, area_land
  FROM facts
 WHERE name <> 'World'
   AND area_water > area_land
ORDER BY area_water DESC
LIMIT 5; 

 * sqlite:///factbook.db
Done.


name,area_water,area_land
British Indian Ocean Territory,54340,60
Virgin Islands,1564,346


Large countries have the largest land and water areas. But when we calculate the ratio of water to land, small territories such as the British Indian Ocean Territory, the Virgin Islands, or the Bahamas come out on top. Only two entries have more water area than land area: the British Indian Ocean Territory and the Virgin Islands.
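
The ratio calculation can be sketched on a few rows from this notebook. Some rows, such as Antarctica, have no value for `area_water`, and any arithmetic with `NULL` in SQLite yields `NULL`, so this sketch filters those rows out explicitly. The extra `area_land > 0` guard is an assumption added here to avoid dividing by zero; it is not part of the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_land INTEGER, area_water INTEGER);")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?);",
    [
        ("British Indian Ocean Territory", 60, 54340),
        ("Virgin Islands", 346, 1564),
        ("Albania", 27398, 1350),
        ("Antarctica", 280000, None),  # missing water area, excluded below
    ],
)

ratios = conn.execute(
    """
    SELECT name,
           ROUND(CAST(area_water AS REAL) / area_land, 3) AS water_land
      FROM facts
     WHERE area_water IS NOT NULL
       AND area_land > 0
     ORDER BY water_land DESC;
    """
).fetchall()

for name, water_land in ratios:
    print(name, water_land)
```

Casting only one operand to `REAL` is enough for SQLite to perform floating-point division, so the second `CAST` used in the query above is optional.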