# Analyzing CIA Factbook Data Using SQL

---

## 1. Introduction

In this project, we will analyze data from the [CIA World Factbook](https://www.cia.gov/the-world-factbook/), a compendium of demographic statistics about all of the countries on Earth.

---

## 2. Open and Explore the Data

Let's start by connecting Jupyter Notebook to the database file.

In [1]:
%%capture
%load_ext sql
%sql sqlite:///factbook.db

We can now query the first five rows of the facts table in the database using SQL.

In [2]:
%%sql
SELECT * 
FROM facts
LIMIT 5;

 * sqlite:///factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


Here are the descriptions for some of the columns above:

| Column | Description | 
| - | - |
| `name` | the name of the country |
| `area` | the country's total area (both land and water) in square kilometers |
| `area_land` | the country's land area in square kilometers |
| `area_water` | the country's water area in square kilometers |
| `population` | the country's population |
| `population_growth` | the country's population growth as a percentage |
| `birth_rate` | the country's birth rate, or the number of births per year per 1,000 people |
| `death_rate` | the country's death rate, or the number of deaths per year per 1,000 people |
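
As a quick sanity check on the column descriptions, the sketch below tests whether `area` equals `area_land + area_water`. It is a minimal example, assuming an in-memory SQLite table that holds only the five rows shown above rather than the real `factbook.db`; other rows in the full database may not satisfy the same relationship.

```python
import sqlite3

# Five rows copied from the query output above.
# The in-memory table is a stand-in for factbook.db (an assumption for illustration).
rows = [
    ("Afghanistan", 652230, 652230, 0),
    ("Albania", 28748, 27398, 1350),
    ("Algeria", 2381741, 2381741, 0),
    ("Andorra", 468, 468, 0),
    ("Angola", 1246700, 1246700, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, area INTEGER, area_land INTEGER, area_water INTEGER)"
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)

# Rows where the total area differs from land area plus water area.
mismatches = conn.execute(
    "SELECT name FROM facts WHERE area != area_land + area_water"
).fetchall()
print(mismatches)  # → []
```

For these five rows the total area is exactly the sum of the land and water areas.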

---

## 3. Analyze the Data

**a. Population statistics**

We will proceed to calculate some summary statistics and look for any outlier countries.

In [3]:
%%sql
SELECT MIN(population) AS 'Minimum Population', 
       MAX(population) AS 'Maximum Population', 
       MIN(population_growth) AS 'Minimum Population Growth',
       MAX(population_growth) AS 'Maximum Population Growth'
FROM facts;

 * sqlite:///factbook.db
Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,7256490011,0.0,4.02


It seems odd that there is:
- a country with a population of 0 and 
- a country with a population of 7,256,490,011 (or more than 7.2 billion people)

Let's zoom in on just these countries to understand why.

In [4]:
%%sql
SELECT name, population
FROM facts
WHERE population == (
    SELECT MIN(population)
    FROM facts
);

 * sqlite:///factbook.db
Done.


name,population
Antarctica,0


Next, let's look at the row with the maximum population.

In [5]:
%%sql
SELECT name, population
FROM facts
WHERE population == (
    SELECT MAX(population)
    FROM facts
);

 * sqlite:///factbook.db
Done.


name,population
World,7256490011


We observe that:

- the row with a population of 0 corresponds to Antarctica and 
- the row with a population of 7,256,490,011 corresponds to the total population of the whole world

With this knowledge, let's recalculate the same summary statistics as before, but excluding the row for the whole world.

In [6]:
%%sql
SELECT MIN(population) AS 'Minimum Population', 
       MAX(population) AS 'Maximum Population', 
       MIN(population_growth) AS 'Minimum Population Growth',
       MAX(population_growth) AS 'Maximum Population Growth'
FROM facts
WHERE name != 'World';

 * sqlite:///factbook.db
Done.


Minimum Population,Maximum Population,Minimum Population Growth,Maximum Population Growth
0,1367485388,0.0,4.02


We can now identify the countries with the largest population and largest population growth.

In [7]:
%%sql
SELECT name, population
FROM facts
WHERE population == (
    SELECT MAX(population)
    FROM facts
    WHERE name != 'World'
);

 * sqlite:///factbook.db
Done.


name,population
China,1367485388


China has the largest population in the world of 1,367,485,388 people.

In [8]:
%%sql
SELECT name, population_growth
FROM facts
WHERE population_growth == (
    SELECT MAX(population_growth)
    FROM facts
    WHERE name != 'World'
);

 * sqlite:///factbook.db
Done.


name,population_growth
South Sudan,4.02


South Sudan has the highest population growth in the world at 4.02%.
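
The subquery pattern used above can also be written with `ORDER BY ... LIMIT 1`. The sketch below is one possible alternative, assuming an in-memory table populated with a few rows taken from the outputs above instead of the real `factbook.db`. Note that `LIMIT 1` returns a single row even when several rows tie for the maximum, whereas the subquery version returns all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# A handful of rows copied from earlier outputs in this notebook.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("World", 7256490011),
        ("China", 1367485388),
        ("Afghanistan", 32564342),
        ("Antarctica", 0),
    ],
)

# Sort by population, skip the aggregate 'World' row, and keep the top entry.
largest = conn.execute(
    """
    SELECT name, population
    FROM facts
    WHERE name != 'World'
    ORDER BY population DESC
    LIMIT 1
    """
).fetchone()
print(largest)  # → ('China', 1367485388)
```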

**b. Population changes**

Let's now identify which countries will add the most people to their populations next year. Since `population_growth` is stored as a percentage, we divide by 100 to get an absolute number of people.

In [9]:
%%sql
SELECT name, ROUND(population * population_growth / 100) AS increase_population
FROM facts
WHERE name != 'World'
ORDER BY 2 DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,increase_population
India,15270686.0
China,6153684.0
Nigeria,4448270.0
Pakistan,2906653.0
Ethiopia,2874562.0
Bangladesh,2703324.0
United States,2506677.0
Indonesia,2355142.0
"Congo, Democratic Republic of the",1944691.0
Philippines,1626074.0


The countries with the largest absolute population growth are those with the largest populations (e.g. India, China) and those with high growth rates (e.g. Nigeria, Pakistan).
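
Because `population_growth` is a percentage, turning it into a head count means multiplying by the population and dividing by 100. The sketch below works through that arithmetic for three rows from the first query's output, assuming an in-memory table in place of `factbook.db`; the column alias `projected_increase` is a name chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)"
)
# Rows copied from the first query's output earlier in this notebook.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Angola", 19625353, 2.78),
    ],
)

# population_growth is in percent, so divide by 100 to get people per year.
projected = conn.execute(
    """
    SELECT name, ROUND(population * population_growth / 100.0) AS projected_increase
    FROM facts
    ORDER BY projected_increase DESC
    """
).fetchall()

for name, increase in projected:
    print(name, increase)
# Afghanistan 755493.0
# Angola 545585.0
# Albania 9088.0
```

Afghanistan, for example, grows by about 32,564,342 × 2.32 / 100 ≈ 755,493 people in a year at this rate.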

On the other hand, we can also see which countries have a higher death rate than birth rate.

In [10]:
%%sql
SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_increase
FROM facts
WHERE natural_increase < 0
ORDER BY natural_increase
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,natural_increase
Bulgaria,-5.52
Serbia,-4.58
Latvia,-4.31
Lithuania,-4.17
Ukraine,-3.74
Hungary,-3.57
Germany,-2.95
Slovenia,-2.95
Romania,-2.76
Croatia,-2.73


All ten countries with the steepest natural population decrease are in Europe.
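
The query above filters on the `natural_increase` alias directly in the `WHERE` clause. SQLite accepts this, but standard SQL does not, so other database engines may reject it. The sketch below shows a more portable form that computes the alias in a subquery first. It assumes an in-memory table with three rows from the first query's output plus one clearly hypothetical row (`Exampleland`, invented here so that the filter has something to return).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 38.57, 13.89),  # from the notebook output
        ("Albania", 12.92, 6.58),       # from the notebook output
        ("Andorra", 8.13, 6.96),        # from the notebook output
        ("Exampleland", 9.0, 14.5),     # hypothetical row, not real data
    ],
)

# Compute the derived column in an inner query, then filter on it outside.
shrinking = conn.execute(
    """
    SELECT name, natural_increase
    FROM (
        SELECT name, ROUND(birth_rate - death_rate, 2) AS natural_increase
        FROM facts
    )
    WHERE natural_increase < 0
    ORDER BY natural_increase
    """
).fetchall()
print(shrinking)  # → [('Exampleland', -5.5)]
```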

**c. Population density**

Next, we will compute the average value for the `population` and `area` columns.

In [11]:
%%sql
SELECT AVG(population) AS 'Average Population', 
       AVG(area) AS 'Average Area'
FROM facts
WHERE name != 'World';

 * sqlite:///factbook.db
Done.


Average Population,Average Area
32242666.56846473,555093.546184739


- The average population is around 32 million people.
- The average area is about 555 thousand square kilometres.

We can identify countries that are densely populated by selecting rows with:
- above-average values for population and
- below-average values for area

In [12]:
%%sql
SELECT name, population, area
FROM facts
WHERE population > (
    SELECT AVG(population)
    FROM facts
    WHERE name != 'World'
)
AND area < (
    SELECT AVG(area)
    FROM facts
    WHERE name != 'World'
);

 * sqlite:///factbook.db
Done.


name,population,area
Bangladesh,168957745,148460
Germany,80854408,357022
Iraq,37056169,438317
Italy,61855120,301340
Japan,126919659,377915
"Korea, South",49115196,99720
Morocco,33322699,446550
Philippines,100998376,300000
Poland,38562189,312685
Spain,48146134,505370


The list above consists mostly of countries in Europe and Asia (Morocco is the exception) with large populations and areas that, while below average, are still sizeable.
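
The two average subqueries can also be computed once in a common table expression and then joined to every row. The sketch below is one way to write that, assuming an in-memory table with only the five rows from the first query's output. The averages it computes therefore differ from the full-database values above, and the CTE name `averages` is chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
# Five rows copied from the first query's output; not the full dataset.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 28748),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
    ],
)

# Compute both averages once, then compare every row against them.
dense = conn.execute(
    """
    WITH averages AS (
        SELECT AVG(population) AS avg_population, AVG(area) AS avg_area
        FROM facts
        WHERE name != 'World'
    )
    SELECT f.name, f.population, f.area
    FROM facts AS f
    CROSS JOIN averages AS a
    WHERE f.population > a.avg_population
      AND f.area < a.avg_area
    ORDER BY f.name
    """
).fetchall()
print(dense)  # → [('Afghanistan', 32564342, 652230)]
```

Within this five-row sample, only Afghanistan has an above-average population and a below-average area.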

Another indicator of a densely populated country is a high ratio of population to area. We will search for the top 10 countries with the highest population to area ratio and compare them to the list above.

In [13]:
%%sql
SELECT name, population, area, population / area AS population_density
FROM facts
ORDER BY population_density DESC
LIMIT 10;

 * sqlite:///factbook.db
Done.


name,population,area,population_density
Macau,592731,28,21168
Monaco,30535,2,15267
Singapore,5674472,697,8141
Hong Kong,7141106,1108,6445
Gaza Strip,1869055,360,5191
Gibraltar,29258,6,4876
Bahrain,1346613,760,1771
Maldives,393253,298,1319
Malta,413965,316,1310
Bermuda,70196,54,1299


Most of the places in this list are also in Europe and Asia (Bermuda is the exception), but they are small territories and city-states with far less area than the countries in the previous list.
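
The densities above are whole numbers because SQLite performs integer division when both operands are integers, truncating the fractional part. The sketch below contrasts that with real-valued division via `CAST`, assuming an in-memory table holding three rows from the output above; the aliases `density_int` and `density_real` are names chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
# Rows copied from the density output above.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Macau", 592731, 28), ("Monaco", 30535, 2), ("Singapore", 5674472, 697)],
)

# INTEGER / INTEGER truncates; casting one operand to REAL keeps the fraction.
densities = conn.execute(
    """
    SELECT name,
           population / area AS density_int,
           ROUND(CAST(population AS REAL) / area, 1) AS density_real
    FROM facts
    ORDER BY density_int DESC
    """
).fetchall()

for row in densities:
    print(row)
```

Monaco illustrates the difference: 30535 / 2 is 15267 under integer division but 15267.5 as a real number.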

**d. Ratio of water to land**

Lastly, we want to find out which countries have more water than land.

In [14]:
%%sql
SELECT name, area_water / area_land AS ratio_water_land, area_water, area_land
FROM facts
WHERE area_water IS NOT NULL
AND area_land IS NOT NULL
AND area_water > area_land;

 * sqlite:///factbook.db
Done.


name,ratio_water_land,area_water,area_land
British Indian Ocean Territory,905,54340,60
Virgin Islands,4,1564,346


The only two entries with more water than land are both island territories.
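
A note on missing values: the notebook displays SQL `NULL` as `None`, so missing areas are most likely `NULL` rather than the text `'None'` (an assumption about `factbook.db`, not something verified here). The sketch below shows that `IS NOT NULL` is the idiomatic test, and that the `>` comparison already drops `NULL` rows on its own. It uses an in-memory table with three rows from earlier outputs and one hypothetical row with a missing water area.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area_water INTEGER, area_land INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("British Indian Ocean Territory", 54340, 60),  # from the output above
        ("Virgin Islands", 1564, 346),                  # from the output above
        ("Afghanistan", 0, 652230),                     # from the first query's output
        ("Hypothetical territory", None, 100),          # invented row with a NULL area
    ],
)

# Explicit NULL checks, the idiomatic way to exclude missing values.
explicit = conn.execute(
    """
    SELECT name, area_water, area_land
    FROM facts
    WHERE area_water IS NOT NULL
      AND area_land IS NOT NULL
      AND area_water > area_land
    ORDER BY name
    """
).fetchall()

# Without the checks: NULL > 100 evaluates to NULL, so the row is dropped anyway.
implicit = conn.execute(
    """
    SELECT name, area_water, area_land
    FROM facts
    WHERE area_water > area_land
    ORDER BY name
    """
).fetchall()

print(explicit == implicit)  # → True
```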

---

## 4. Conclusion

- China has the largest population in the world of 1,367,485,388 people.
- South Sudan has the highest population growth in the world at 4.02%.


- The average population is around 32 million people.
- The average area is about 555 thousand square kilometres.


- The countries with the largest absolute population growth are those with the largest populations (e.g. India, China) and those with high growth rates (e.g. Nigeria, Pakistan).
- All ten countries with the steepest natural population decrease are in Europe.


- The most densely populated places are small territories and city-states, mostly in Europe and Asia.
- Only the British Indian Ocean Territory and the Virgin Islands have more water than land.