# Exploring the CIA World Factbook using SQL: a look at our world in numbers

`category: exploratory data analysis, SQL`

## Introduction

The CIA World Factbook is a reference resource produced by the Central Intelligence Agency (CIA), the main intelligence agency of the United States. It contains fundamental facts about the history, economy, government, geography and demography of each country in the world. Originally accessible only to US government officers, it has been published in a public domain version since 1971. Nowadays it has a dedicated website; you can find it [here](https://www.cia.gov/the-world-factbook/). It's a constantly updated source of information and one of the most accessed websites of the US government.

In this project, we're going to analyse a subset of the World Factbook data regarding demographics and geography. **The dataset is contained in a single-table SQLite [database](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db)** (<- by clicking you'll download it) and dates back to the 2015 edition.

(<u>Note</u>: since this project was included in the Dataquest path, the data dictionary pages have apparently disappeared and we couldn't find any official documentation. Most columns in the table are self-explanatory and the consistency of the data can be checked; the `migration_rate` column, however, requires attention. More on this in the third section.)

**Project's goals**:
* learn how to connect a database to a Jupyter notebook;
* practice basic SQL commands, subqueries and SQL summary statistics tools;
* investigate some demographic data about world countries, doing further research on the aspects that sparked our curiosity.


## Connecting to the database

In order to use a SQL database in a Jupyter notebook, the **ipython-sql** package needs to be installed. It is based on SQLAlchemy and is available on PyPI (the official repository for third-party Python packages). Look [here](https://pypi.org/project/ipython-sql/) for the official documentation.<br />
The installation is performed easily:

In [2]:
# if you use Anaconda
# conda install -yc conda-forge ipython-sql

# if you use pip
# pip install ipython-sql

One of the advantages of ipython-sql is that it introduces the `%sql` line magic and the `%%sql` cell magic. The former interprets the rest of the current line as SQL, the latter does the same for the whole cell.<br />
Next, we load the package and connect the notebook to the `factbook` database:

In [3]:
%%capture
%load_ext sql
%sql sqlite:///datasets/factbook.db

## A quick look at the data

Let's collect some information about the database by accessing its schema (if you want to know more about SQLite schemas, look [here](https://www.sqlite.org/schematab.html)).

In [28]:
%%sql

SELECT *
   FROM sqlite_schema;    

 * sqlite:///datasets/factbook.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,3,"CREATE TABLE sqlite_sequence(name,seq)"
table,facts,facts,47,"CREATE TABLE ""facts"" (""id"" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, ""code"" varchar(255) NOT NULL, ""name"" varchar(255) NOT NULL, ""area"" integer, ""area_land"" integer, ""area_water"" integer, ""population"" integer, ""population_growth"" float, ""birth_rate"" float, ""death_rate"" float, ""migration_rate"" float)"


We see that the database has **only one table with data**, called `facts` (`sqlite_sequence` is automatically generated and of no interest to us).
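
As a side note, `sqlite_schema` is a relatively recent alias (to our knowledge it appeared in SQLite 3.33.0); on older versions the same table is reachable as `sqlite_master`. The following sketch assumes a small in-memory database with an abbreviated, hypothetical version of the `facts` table, and lists the tables it contains. It also shows why `sqlite_sequence` is present: SQLite creates it automatically as soon as a table uses `AUTOINCREMENT`.

```python
import sqlite3

# Stand-in for factbook.db: an in-memory database with an abbreviated,
# hypothetical version of the `facts` table (only three of its columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "
    "code varchar(255) NOT NULL, "
    "name varchar(255) NOT NULL)"
)

# sqlite_master is the older name of the schema table and works on both
# old and new SQLite versions.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```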

By reading the documentation, we learn that the ipython-sql package allows storing the results of queries (result sets) in objects, and that these objects have useful methods. One of them is the **.DataFrame()** method, which returns a Pandas dataframe. If we chose this approach, the following analysis would rely on the same techniques we used in past projects. Since this project's main goal is to practice basic SQL commands, we'll use queries to answer our questions instead.

Let's look at the content of the `facts` table:

In [5]:
%%sql

SELECT *
   FROM facts
  LIMIT 5;

 * sqlite:///datasets/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
2,al,Albania,28748,27398,1350,3029278,0.3,12.92,6.58,3.3
3,ag,Algeria,2381741,2381741,0,39542166,1.84,23.67,4.31,0.92
4,an,Andorra,468,468,0,85580,0.12,8.13,6.96,0.0
5,ao,Angola,1246700,1246700,0,19625353,2.78,38.78,11.49,0.46


`id`, `code` and `name` are self-explanatory. The other columns have the following meaning:
- `area`: the total area of the country in square km;
- `area_land`: the area of the country's land in square km;
- `area_water`: the area of the country's waters in square km;
- `population`: the country's population;
- `population_growth`: the population's growth in percentage per year;
- `birth_rate`: the number of births per year per 1000 people;
- `death_rate`: the number of deaths per year per 1000 people;
- `migration_rate`: it is reported as the net migration rate, i.e. the difference between immigrants and emigrants per year per 1000 people.

All the data refer to the 2015 edition of the World Factbook.

The table has 261 records:

In [6]:
%%sql

SELECT COUNT(*)
    FROM facts;

 * sqlite:///datasets/factbook.db
Done.


COUNT(*)
261


### - The `migration_rate` column

If we accept the definition of the `migration_rate` column given above, we expect some countries to have a negative value, i.e. an excess of people leaving the country. You can check this in [Wikipedia's list](https://en.wikipedia.org/wiki/List_of_countries_by_net_migration_rate) of countries by net migration rate. Instead, the table contains no negative values:

In [7]:
%%sql

SELECT *
   FROM facts
  WHERE migration_rate < 0;  

 * sqlite:///datasets/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate


That's an empty set.<br />
Maybe `migration_rate` is reported as an absolute value? In that case, a consistency check is given by the following relation, where the factor of 10 converts the growth percentage into a rate per 1000 people:

`birth_rate` - `death_rate` ± `migration_rate` = `population_growth` × 10

Let's check it on a few countries:

In [8]:
%%sql

SELECT *
   FROM facts
  WHERE name IN ('Afghanistan', 'Tonga', 'Syria');  

 * sqlite:///datasets/factbook.db
Done.


id,code,name,area,area_land,area_water,population,population_growth,birth_rate,death_rate,migration_rate
1,af,Afghanistan,652230,652230,0,32564342,2.32,38.57,13.89,1.51
170,sy,Syria,185180,183630,1550,17064854,0.16,22.17,4.0,19.79
176,tn,Tonga,747,717,30,106501,0.03,23.0,4.85,17.84


For *Afghanistan*: 38.57 - 13.89 - 1.51 = 23.17<br />
This is close to the reported growth (2.32 × 10 = 23.2), so for this country the migration rate must be taken as **negative** for the numbers to make sense.

For *Tonga*: 23 - 4.85 - 17.84 = 0.31<br />
The same applies here (0.03 × 10 = 0.3).

For *Syria*: 22.17 - 4 - 19.79 = -1.62<br />
Here we spot another problem: the resulting population growth is negative (about -0.16%), but it is reported as positive (0.16) in the table.

We can conclude that `migration_rate` and `population_growth` are recorded as **absolute values**.
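
The same check can be automated. The sketch below is a minimal illustration, assuming an in-memory table that holds only the three rows shown above; `TOLERANCE` is an arbitrary allowance for rounding in the source data, not something defined by the Factbook. For each country it tests both signs of the migration rate against the absolute value of the growth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population_growth REAL, "
    "birth_rate REAL, death_rate REAL, migration_rate REAL)"
)
# Values copied from the result set above.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", 2.32, 38.57, 13.89, 1.51),
        ("Syria", 0.16, 22.17, 4.0, 19.79),
        ("Tonga", 0.03, 23.0, 4.85, 17.84),
    ],
)

# Rates are per 1000 people and growth is a percentage, hence the factor 10.
query = """
SELECT name,
       ROUND(birth_rate - death_rate - migration_rate, 2) AS minus_migration,
       ROUND(birth_rate - death_rate + migration_rate, 2) AS plus_migration,
       ROUND(population_growth * 10, 2)                    AS growth_per_1000
  FROM facts
 ORDER BY name;
"""

TOLERANCE = 0.1  # assumed allowance for rounding, in units per 1000 people
results = {}
for name, minus, plus, growth in conn.execute(query):
    # Growth is stored without sign, so compare absolute values.
    minus_ok = abs(abs(minus) - growth) <= TOLERANCE
    plus_ok = abs(abs(plus) - growth) <= TOLERANCE
    results[name] = (minus_ok, plus_ok)
    print(f"{name}: minus={minus} plus={plus} growth x 10={growth} "
          f"-> minus matches: {minus_ok}, plus matches: {plus_ok}")
```

On these three rows only the version that subtracts the migration rate agrees with the reported growth.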

## Exploratory data analysis
### 1) Population outliers

Let's first check for **outliers** in the `population` column:

In [9]:
%%sql

SELECT name, population
   FROM facts
  WHERE population IN ((SELECT MAX(population) FROM facts),
                       (SELECT MIN(population) FROM facts));

 * sqlite:///datasets/factbook.db
Done.


name,population
Antarctica,0
World,7256490011


**Antarctica** is the Earth's southernmost continent (not a country) and has no indigenous population. Its inhabitants are mainly research staff members, who live there temporarily. The number of residents varies from about 1000 in winter to about 5000 during summer, plus about 1000 more people working on ships in the nearby waters.

The table also reports the entire **world** population. The figure of 7.256 billion is broadly in line with the one reported on the World Bank [site](https://data.worldbank.org/indicator/SP.POP.TOTL).

We will exclude these two "countries" (and other aggregate entries) from our population statistics; otherwise the averages would be skewed, especially by the world population.
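
To get a feeling for how strong this effect is, here is a small sketch. It assumes an in-memory table containing just the five countries from the `LIMIT 5` output plus the two outliers, so the averages it prints are not those of the full dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
# A small sample: five rows from the LIMIT 5 output plus the two outliers.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Afghanistan", 32564342),
        ("Albania", 3029278),
        ("Algeria", 39542166),
        ("Andorra", 85580),
        ("Angola", 19625353),
        ("Antarctica", 0),
        ("World", 7256490011),
    ],
)

avg_all = conn.execute("SELECT AVG(population) FROM facts").fetchone()[0]
avg_countries = conn.execute(
    "SELECT AVG(population) FROM facts "
    "WHERE name NOT IN ('Antarctica', 'World')"
).fetchone()[0]

print(f"all rows:       {avg_all:,.0f}")
print(f"countries only: {avg_countries:,.0f}")
```

In this sample the single `World` row is enough to push the average from about 19 million to over one billion.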

### 2) Most and least populated countries

Let's determine the 10 most and least populated countries:

In [10]:
%%sql

/* most populated */
SELECT name, population
   FROM facts
  WHERE name != 'World'
  ORDER BY population DESC
  LIMIT 11  /*including European Union*/;  

 * sqlite:///datasets/factbook.db
Done.


name,population
China,1367485388
India,1251695584
European Union,513949445
United States,321368864
Indonesia,255993674
Brazil,204259812
Pakistan,199085847
Nigeria,181562056
Bangladesh,168957745
Russia,142423773


**China** and **India** are by far the most populated countries, each with well over one billion inhabitants.<br />
The **United States** follows, if we don't consider the whole European Union (included for reference).<br />
**Nigeria** is the seventh most populous country in the world and the **most populous country in Africa**.<br />
We can also note that **Russia** and **Japan** have comparable populations, but their land areas are very different: Japan is a relatively small country, Russia is huge. We will investigate the average population density in the next section.

Now we consider the least populated countries. The dataset includes 19 records (mainly uninhabited islands and the oceans) whose population is reported as None (i.e. NULL), which we exclude from the summary.

In [11]:
%%sql

/* least populated */
SELECT name, population
   FROM facts
  WHERE name != 'Antarctica' AND population IS NOT NULL
  ORDER BY population
  LIMIT 10;  

 * sqlite:///datasets/factbook.db
Done.


name,population
Pitcairn Islands,48
Cocos (Keeling) Islands,596
Holy See (Vatican City),842
Niue,1190
Tokelau,1337
Christmas Island,1530
Svalbard,1872
Norfolk Island,2210
Falkland Islands (Islas Malvinas),3361
Montserrat,5241
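
The `population IS NOT NULL` condition in the query above matters because SQLite treats NULL as smaller than any other value, so in an ascending sort the NULL rows come first. The sketch below illustrates this on a tiny in-memory table; the `Atlantic Ocean` row is a hypothetical stand-in for the records without a population.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("Niue", 1190),
        ("Pitcairn Islands", 48),
        ("Antarctica", 0),
        # Hypothetical row standing in for the records with no population.
        ("Atlantic Ocean", None),
    ],
)

# Without filters, the NULL row and the zero-population row come first.
unfiltered = conn.execute(
    "SELECT name, population FROM facts ORDER BY population LIMIT 2"
).fetchall()

# With the same filters as the query above, real territories come first.
filtered = conn.execute(
    "SELECT name, population FROM facts "
    "WHERE name != 'Antarctica' AND population IS NOT NULL "
    "ORDER BY population LIMIT 2"
).fetchall()

print(unfiltered)
print(filtered)
```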


We'll talk about the least populated territory (Pitcairn Islands) in section 2.1.<br />
**Cocos Islands** and **Christmas Island** are Australian external territories in the Indian Ocean, while **Norfolk Island** is an Australian external territory in the Pacific Ocean, not far from New Zealand.<br />
**Vatican City** is an independent city-state and enclave located within the territory of Rome, Italy. It's <u>the smallest country in the world</u>, both by population and by area.<br />
**Niue** is an island in free association with New Zealand, while **Tokelau** is a group of three islands that is a dependent territory of New Zealand.<br />
The **Svalbard** Islands are Norwegian territory in the Arctic Ocean, and **the northernmost inhabited land** on Earth.<br />
The **Falkland Islands** are a British Overseas Territory in the southern Atlantic Ocean; they have long been claimed by Argentina.

The island of **Montserrat** is another British Overseas Territory, located in the Caribbean Sea. Its former capital, Plymouth, is nowadays a ghost town after being covered in ash and mud by the eruptions of the Soufrière Hills volcano, which started in 1995 and are still continuing. The population now occupies the northernmost part of the island, which is protected from the eruptions by mountains. Look [here](https://www.theatlantic.com/photo/2013/05/soufriere-hills-volcano/100509) for some striking pictures.


In [12]:
%%capture
%%sql

SELECT name, population_growth
   FROM facts
  WHERE population IS NOT NULL AND population_growth IS NOT NULL
  ORDER BY population_growth
  LIMIT 20;  

### 2.1) Pitcairn Islands and the *HMS Bounty* mutineers

The least populated territory included in the dataset is the Pitcairn Islands, a group of four volcanic islands which don't constitute an independent country. Instead, they are the only remaining British Overseas Territory in the Pacific Ocean. Of the four islands, only Pitcairn Island has inhabitants. Its modern history is fascinating!

In 1787, the British ship HMS Bounty set sail for Tahiti, where the crew was to collect breadfruit plants and bring them to the West Indies (Caribbean). After a five-month layover in Tahiti, during which the ship's captain William Bligh had imposed very strict discipline on his sailors, **part of the ship's crew decided to take control of the vessel** and set captain Bligh and 18 loyalists adrift in a small lifeboat.<br />
The mutineers returned to Tahiti and set sail again, taking along six men, eleven women and a baby. In 1790 they settled on Pitcairn Island, more than a thousand miles southeast of Tahiti. There, they set the Bounty on fire.

The island is today mostly inhabited by the descendants of the nine mutineers and their Tahitian captives, a biracial ethnic group. This is also evident from the surnames of the inhabitants.<br /> 
[source](https://www.wikiwand.com/en/Pitcairn_Islanders#)

<img src="imgs/Pitcairn_Islanders_1916.jpg" width=400 height=350 />
<p style="text-align: center;">Pitcairn islanders in 1916</p>

### 3) Population density

Let's now consider the ratio of population to land area, which gives the average population density in inhabitants per square km.<br /> 

### 3.1) Most densely populated territories
We first consider the most densely populated territories, without constraints:

In [13]:
%%sql

/* most densely populated */
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/area_land, 0) AS pop_density
   FROM facts
  WHERE population IS NOT NULL AND
        area_land IS NOT NULL AND
        population * area_land != 0
  ORDER BY pop_density DESC
  LIMIT 15;

 * sqlite:///datasets/factbook.db
Done.


name,population,area_land,pop_density
Macau,592731,28,21169.0
Monaco,30535,2,15268.0
Singapore,5674472,687,8260.0
Hong Kong,7141106,1073,6655.0
Gaza Strip,1869055,360,5192.0
Gibraltar,29258,6,4876.0
Bahrain,1346613,760,1772.0
Maldives,393253,298,1320.0
Malta,413965,316,1310.0
Bermuda,70196,54,1300.0


The highest density is by far that of **Macau**, a Chinese Special Administrative Region (like **Hong Kong**, also in the list) with more than 21,000 inhabitants per square km.<br /> 
**Monaco** and **Singapore** are second and third, respectively.
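
A remark on the `CAST(population AS Float)` used in the query: both columns are integers, and in SQLite dividing an integer by an integer truncates the result. The sketch below, which assumes an in-memory table with just the Macau and Monaco rows, compares the two forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, area_land INTEGER)"
)
# Values copied from the result set above.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Macau", 592731, 28), ("Monaco", 30535, 2)],
)

rows = conn.execute(
    """
    SELECT name,
           population / area_land                          AS int_density,
           ROUND(CAST(population AS FLOAT) / area_land, 0) AS pop_density
      FROM facts
     ORDER BY pop_density DESC;
    """
).fetchall()

for name, int_density, pop_density in rows:
    print(name, int_density, pop_density)
```

The integer version silently drops the fractional part (21168 instead of 21169 for Macau), which is negligible here but would turn every density below 1 into 0 for sparsely populated territories.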

From the result set above, we see that the highest-density territories are typically those where a large number of people live in a very small area. The exception in the list above is **Bangladesh**, whose numbers are closer to those of a typical country.<br />
In order to select bigger countries and territories, we're going to select those records for which:
* *the population is greater than the world average*;
* *the area of land is below the world average*.

Let's first find these two averages:

In [14]:
%%sql

SELECT ROUND(AVG(population), 0) AS avg_population, ROUND(AVG(area_land), 0) AS avg_land_area
   FROM facts
  WHERE name NOT LIKE '%Ocean' AND
        name NOT IN ('Antarctica', 'European Union', 'World') AND
        population IS NOT NULL AND
        area_land NOT NULL AND
        population * area_land != 0;

 * sqlite:///datasets/factbook.db
Done.


avg_population,avg_land_area
30641707.0,553017.0


We can now write our query, but we'll use subqueries instead of hard-coding these averages.

In [15]:
%%sql

/* most densely populated territories, with constraints */
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/area_land, 0) AS pop_density
   FROM facts
  WHERE population IS NOT NULL AND
        area_land IS NOT NULL AND
        population * area_land != 0 AND
        population > (SELECT AVG(population) 
                         FROM facts 
                        WHERE name NOT LIKE '%Ocean' AND
                        name NOT IN ('Antarctica', 'European Union', 'World') AND
                        population IS NOT NULL AND
                        population != 0) AND
        area_land < (SELECT AVG(area_land) 
                         FROM facts 
                        WHERE name NOT LIKE '%Ocean' AND
                        name NOT IN ('Antarctica', 'European Union', 'World') AND
                        area_land IS NOT NULL AND
                        area_land != 0)
  ORDER BY pop_density DESC;

 * sqlite:///datasets/factbook.db
Done.


name,population,area_land,pop_density
Bangladesh,168957745,130170,1298.0
"Korea, South",49115196,96920,507.0
Japan,126919659,364485,348.0
Philippines,100998376,298170,339.0
Vietnam,94348835,310070,304.0
United Kingdom,64088222,241930,265.0
Germany,80854408,348672,232.0
Nepal,31551305,143351,220.0
Italy,61855120,294140,210.0
Uganda,37101745,197100,188.0


Under those constraints, **Bangladesh** is by far the country with the highest density, followed by **South Korea** (with less than half that value).<br /> 
Only two African countries, **Uganda** and **Morocco**, appear in the result set, along with five **European countries**.
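
The query above repeats the same exclusion list in two subqueries. One possible alternative, sketched below, is a common table expression (CTE) that defines the set of "real" countries once and derives both averages from it. This is a variant and not the query used in this notebook: it filters population and area together, so on the full table it may return slightly different rows. The sketch runs on an in-memory sample of nine rows taken from the outputs in this notebook, so its averages differ from the real ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (name TEXT, population INTEGER, area_land INTEGER)"
)
# Sample rows copied from result sets shown in this notebook.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 652230),
        ("Albania", 3029278, 27398),
        ("Algeria", 39542166, 2381741),
        ("Andorra", 85580, 468),
        ("Angola", 19625353, 1246700),
        ("Bangladesh", 168957745, 130170),
        ("Japan", 126919659, 364485),
        ("Greenland", 57733, 2166086),
        ("Mongolia", 2992908, 1553556),
    ],
)

query = """
WITH countries AS (          -- the filter is written only once
    SELECT name, population, area_land
      FROM facts
     WHERE name NOT LIKE '%Ocean'
       AND name NOT IN ('Antarctica', 'European Union', 'World')
       AND population IS NOT NULL
       AND area_land IS NOT NULL
       AND population * area_land != 0
),
averages AS (                -- a single row holding both averages
    SELECT AVG(population) AS avg_pop, AVG(area_land) AS avg_area
      FROM countries
)
SELECT c.name,
       ROUND(CAST(c.population AS FLOAT) / c.area_land, 0) AS pop_density
  FROM countries AS c
 CROSS JOIN averages AS a
 WHERE c.population > a.avg_pop
   AND c.area_land < a.avg_area
 ORDER BY pop_density DESC;
"""
dense = conn.execute(query).fetchall()
print(dense)
```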

Let's now perform a similar analysis for territories with low population density.

### 3.2) Least densely populated territories

In [16]:
%%sql

SELECT name, population, area_land,
       ROUND(CAST(population AS float) / area_land, 2) as pop_density
    FROM facts
   WHERE population NOT NULL AND
         area_land NOT NULL AND
         name NOT IN ('World', 'Antarctica', 'European Union') AND
         population * area_land != 0
   ORDER BY pop_density
   LIMIT 15;          

 * sqlite:///datasets/factbook.db
Done.


name,population,area_land,pop_density
Greenland,57733,2166086,0.03
Svalbard,1872,62045,0.03
Falkland Islands (Islas Malvinas),3361,12173,0.28
Pitcairn Islands,48,47,1.02
Mongolia,2992908,1553556,1.93
Western Sahara,570866,266000,2.15
Namibia,2212307,823290,2.69
Australia,22751014,7682300,2.96
Iceland,331918,100250,3.31
Mauritania,3596702,1030700,3.49


**Greenland** and **Svalbard** (the latter also being the northernmost inhabited territory) have by far the lowest densities. Look at the difference from Macau in paragraph 3.1!<br />
There are also **five African countries** in the top 15.


Let's now enforce the following constraints:
* population below the world average;
* area above world average.

In [17]:
%%sql

/* least densely populated territories, with constraints */
SELECT name, population, area_land, 
       ROUND(CAST(population AS Float)/area_land, 2) AS pop_density
   FROM facts
  WHERE population IS NOT NULL AND
        area_land IS NOT NULL AND
        population * area_land != 0 AND
        population < (SELECT AVG(population) 
                         FROM facts 
                        WHERE name NOT LIKE '%Ocean' AND
                        name NOT IN ('Antarctica', 'European Union', 'World') AND
                        population IS NOT NULL AND
                        population != 0) AND
        area_land > (SELECT AVG(area_land) 
                         FROM facts 
                        WHERE name NOT LIKE '%Ocean' AND
                        name NOT IN ('Antarctica', 'European Union', 'World') AND
                        area_land IS NOT NULL AND
                        area_land != 0)
  ORDER BY pop_density
  LIMIT 15;

 * sqlite:///datasets/factbook.db
Done.


name,population,area_land,pop_density
Greenland,57733,2166086,0.03
Mongolia,2992908,1553556,1.93
Namibia,2212307,823290,2.69
Australia,22751014,7682300,2.96
Mauritania,3596702,1030700,3.49
Libya,6411776,1759540,3.64
Botswana,2182719,566730,3.85
Kazakhstan,18157122,2699700,6.73
Central African Republic,5391539,622984,8.65
Chad,11631456,1259200,9.24


This result set selects relatively big countries. **Greenland** is still the least densely populated, but now **Mongolia** follows. The latter is an enormous country between China and Russia whose capital, Ulan Bator, is home to about half of the country's population. A great part of the country is uninhabited steppe.<br />
The third place is occupied by **Namibia**, in Africa.

### 4) Population growth rates

The population growth is obtained by considering the yearly variation in the population (by combining birth rate, death rate and migration rate), dividing by the population size at the start of the year and expressing this ratio as a percentage.

Let's first find the **top 15 countries with the highest population growth**:

In [18]:
%%sql

/* top 15 pop. growth countries */
SELECT name, population, population_growth AS pop_growth_pct
   FROM facts
  ORDER BY population_growth DESC
  LIMIT 15;

 * sqlite:///datasets/factbook.db
Done.


name,population,pop_growth_pct
South Sudan,12042910,4.02
Malawi,17964697,3.32
Burundi,10742276,3.28
Niger,18045729,3.25
Uganda,37101745,3.24
Qatar,2194817,3.07
Burkina Faso,18931686,3.03
Mali,16955536,2.98
Cook Islands,9838,2.95
Iraq,37056169,2.93


**Eleven** out of the fifteen countries with the highest population growth **are African**!<br />
The first non-African country is **Qatar**, with a growth larger than 3%; the **Cook Islands** and **Iraq** follow, both with almost 3%.

By searching the web for further context, we found that the data in the CIA database differ from those reported on sites like [knoema](https://knoema.com/atlas) and [World Bank open data](https://data.worldbank.org/) (which agree with each other).<br /> 
As an illustration, below we report the population growth data from 2004 to 2020, taken from the World Bank site, for three of our top 15 countries.<br />
We see that in 2015 South Sudan (1.5%) has a smaller value than Malawi (2.8%). They are both surpassed by Iraq (3.3%). These differences may be explained by different definitions of population growth, but we lack an official data dictionary for our database, so we can't be sure. 

In [19]:
from IPython.display import IFrame
IFrame('https://data.worldbank.org/share/widget?end=2020&indicators=SP.POP.GROW&locations=SS-MW-IQ&start=2004', width=450, height=300)

Another point that the previous graph highlights is that the yearly population growth contained in our database is just a snapshot of the population dynamics; it doesn't give any information about how the population is changing over a longer period.<br />
For example, in 2011 South Sudan and Iraq have very similar values, but the former has a decreasing growth rate from 2008 to 2018, while the latter has an increasing one from 2008 to 2013.

Let's now find the **15 countries with the smallest population growth**:

In [20]:
%%sql

SELECT name, population, population_growth AS pop_growth_pct
   FROM facts
  WHERE pop_growth_pct NOT NULL  
  ORDER BY pop_growth_pct
  LIMIT 15;

 * sqlite:///datasets/factbook.db
Done.


name,population,pop_growth_pct
Holy See (Vatican City),842,0.0
Cocos (Keeling) Islands,596,0.0
Greenland,57733,0.0
Pitcairn Islands,48,0.0
Greece,10775643,0.01
Norfolk Island,2210,0.01
Tokelau,1337,0.01
Falkland Islands (Islas Malvinas),3361,0.01
Guyana,735222,0.02
Slovakia,5445027,0.02


Again, we see that our data do not completely agree with the data taken from the World Bank site. We report below the population growth of Greenland from 2004 to 2020. In 2015, the growth is -0.322%, while our dataset contains a value of 0.0%.

In [21]:
IFrame("https://data.worldbank.org/share/widget?end=2020&indicators=SP.POP.GROW&locations=GL&start=2004", width=450, height=300)

We find that **Greenland** is the least densely populated territory (as we found in section 3) and is among the territories where the population remained constant in 2015, according to our dataset. This means that the natural increase (births minus deaths) was almost exactly offset by the migration rate (which is negative, in this case), as we see in the next cell:

In [22]:
%%sql

SELECT name, population_growth, birth_rate, death_rate, migration_rate
   FROM facts
  WHERE name = 'Greenland'  

 * sqlite:///datasets/factbook.db
Done.


name,population_growth,birth_rate,death_rate,migration_rate
Greenland,0.0,14.48,8.49,5.98


We also find European territories in our result set: **Greece**, **Slovakia**, **Svalbard**.

### 5) Analysis of birth rates

Birth rate is the number of births <u>per year per 1000 inhabitants</u>.

### 5.1) Highest birth rates

Let's first find the 20 territories with the highest birth rate in our dataset:

In [23]:
%%sql

SELECT name, birth_rate AS top20_birth_rates
   FROM facts
  ORDER BY top20_birth_rates DESC
  LIMIT 20;  

 * sqlite:///datasets/factbook.db
Done.


name,top20_birth_rates
Niger,45.45
Mali,44.99
Uganda,43.79
Zambia,42.13
Burkina Faso,42.03
Burundi,42.01
Malawi,41.56
Somalia,40.45
Angola,38.78
Mozambique,38.58


All of the countries in the top 20 for birth rate are **in Africa**, except for **Afghanistan**.<br />
They are all well over 30 births per year per 1000 inhabitants.

The 2015 birth rate for each country is reported in the colormap below (source: [Our World In Data](https://ourworldindata.org/grapher/crude-birth-rate?time=2015&country=~OWID_WRL)). We can clearly observe that most African countries have birth rates greater than 30.

<img src="imgs/crude-birth-rate-2015.png" width=800 height=600 />

### 5.2) Lowest birth rate

Let's now consider the top 20 territories with lowest birth rates:

In [24]:
%%sql

SELECT name, birth_rate AS top20_lowest_birth_rate
   FROM facts
  WHERE birth_rate NOT NULL
  ORDER BY top20_lowest_birth_rate
  LIMIT 20;

 * sqlite:///datasets/factbook.db
Done.


name,top20_lowest_birth_rate
Monaco,6.65
Saint Pierre and Miquelon,7.42
Japan,7.93
Andorra,8.13
"Korea, South",8.19
Singapore,8.27
Slovenia,8.42
Germany,8.47
Taiwan,8.47
San Marino,8.63


The birth rates found here are several times smaller than those of the African countries in the previous paragraph: Niger's 45.45 is almost seven times Monaco's 6.65.

**Monaco** is the city-state with the lowest birth rate. Its value has decreased slowly in the last five years and much more sharply over the 2004-2015 period. This is shown in the following graph, taken from [knoema](https://knoema.com/atlas/Monaco/Birth-rate).

<img src='imgs/monaco_br.png' width=500 height=400 />

Among the 20 countries with the lowest birth rates, we decided to look more closely at Japan, a highly developed and modern country. We are interested in knowing more about the factors which can explain the present situation. (Recall that we already met Japan in section 3.1, as one of the most densely populated countries.)

### 5.2.1) Japan and its birth rate fall

The third lowest birth rate is that of **Japan**, whose trend over the last 70 years is shown in the following graph (source: [knoema](https://knoema.com/search?query=japan+birth+rate&pageIndex=&scope=&term=&correct=&source=Header)).

<img src='imgs/japan_br.png' width=900 height=800 />

Japan had a **baby boom** in the immediate aftermath of World War II, in the years 1947-49. A prolonged period of low fertility followed, which still continues today. From the graph above, we see that the birth rate had a moderate recovery in the 1960s, but after that the fall has been relentless.

The birth rate crisis contributes to a larger phenomenon: **the aging of the country**. It is described in detail in this [Wikipedia article](https://en.wikipedia.org/wiki/Aging_of_Japan).<br />
According to it, several factors contribute to Japan's growing share of elderly people (more than 20% of its population is older than 65 today). Some of them are:
* **high life expectancy**: improved living conditions (healthcare, nutrition, prolonged peace) have inevitably led to a longer expected lifespan. In 2016, it was 85 years on average.
* **low fertility rate**: it is defined as the expected number of children per woman during her lifetime. Throughout the 2000s, it has stayed below the "replacement threshold" of about 2 children per woman. This factor clearly correlates with the birth rate.
    * **economic and cultural factors**: these can be considered a direct cause of the low birth rate. Later and fewer marriages, poor work-life balance, declining wages and the higher cost of raising children are just some of them. Perhaps most importantly, in Japan a rising share of men (about 40%) have part-time and temporary jobs.<br />
In this [article](https://www.theatlantic.com/business/archive/2017/07/japan-mystery-low-birth-rate/534291/) by The Atlantic, the author suggests that, in a country where men are still largely considered the breadwinners, the difficulty of finding a steady job is perhaps the main obstacle to starting a family at a young age. Furthermore, those who have a stable job find themselves under pressure from employers who very often ask them to work overtime in order to keep their job. This way, having the time, energy and will to date someone may be impossible, even for economically secure workers (death by overwork is not so rare in Japan; there is even a word for it: *karoshi*).<br />
As an aside, we also note that with an increasingly old population, more pressure is put on young workers for paying pensions and healthcare for the elderly.

Another problem closely connected to the low birth rate is the **shrinking of the population**: during the 2000s, the death rate in Japan has regularly exceeded the birth rate. This, combined with very low immigration rates, is causing a decrease in the population. We can get a glimpse of that in our data:

In [26]:
%%sql

SELECT name, population, birth_rate, death_rate, migration_rate
   FROM facts
   WHERE name = 'Japan' 

 * sqlite:///datasets/factbook.db
Done.


name,population,birth_rate,death_rate,migration_rate
Japan,126919659,7.93,9.51,0.0
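
As a back-of-the-envelope illustration, the row above can be turned into an approximate yearly change in the number of inhabitants. This is only a rough sketch: it assumes that the three rates apply uniformly to the reported population, and the variable names are ours. Since the migration rate is 0 here, its unknown sign does not matter.

```python
# Values copied from the Japan row above.
population = 126_919_659
birth_rate = 7.93       # births per year per 1000 people
death_rate = 9.51       # deaths per year per 1000 people
migration_rate = 0.0    # net migration per year per 1000 people

# Natural change (births minus deaths) and overall change, per 1000 people.
natural_change_per_1000 = birth_rate - death_rate
net_change_per_1000 = natural_change_per_1000 + migration_rate

# Scale from "per 1000 people" to the whole population.
yearly_change = round(population * net_change_per_1000 / 1000)

print(f"net change per 1000 people: {net_change_per_1000:.2f}")
print(f"approximate yearly change:  {yearly_change:,}")
```

With these numbers the population would shrink by roughly 200,000 people per year.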


This demographic trend is causing concern in the Japanese government, which is trying to take some action:

>Japanese Prime Minister Shinzo Abe wants to prevent the population from **dropping below 100 million by 2060**. In 2017, the government announced a 2 trillion yen ($18 billion) spending package to expand free preschool for children aged 3 to 5 -- and for children aged 2 and under from low-income families -- and cut waiting times at day care centers.

(taken from the CNN article [Japan's birth rate hits another record low in 2019](https://edition.cnn.com/2019/12/25/asia/japan-birthrate-hnk-intl/index.html))

## Conclusions

In this project, we analysed the subset of the CIA World Factbook containing demographic and geographic data about each country in the world.<br /> 
We used SQLite to query our single-table database.

The **results** can be summarised as follows:
- *China* and *India* are by far the most populated countries, each with well over one billion inhabitants.
- *Vatican City* is the smallest and least populated country, although a very special one. We investigated the fascinating history of *Pitcairn Islands*, the territory with the smallest population.
- *Macau* is the most densely populated region (a part of China); *Bangladesh* is the most densely populated among the countries with population greater than world average.
- *Greenland* and *Svalbard Islands* are the least densely populated territories.
- *Africa* hosts most of the countries with the highest population growth, but *Qatar* and *Iraq* are also notable.
- *Niger* is the country with the highest birth rate, closely followed by many other African countries and *Afghanistan*.
- We investigated *Japan*'s very low birth rate, understanding that it is an important part of the aging and shrinking of its population. We read and reported about the economic, social and cultural factors which can cause and worsen a long-standing problem, dating back five decades.