# Analyzing CIA Factbook Data using SQL

In this project, we will analyze CIA Factbook data, available from [OpenIntro](https://www.openintro.org/data/index.php?data=cia_factbook).

The data is stored in a SQLite database, which we will query with SQL to explore it and answer questions about it.

## Exploring the Data

First, let's set up the notebook to run SQL queries with the ipython-sql extension and connect to the `factbook.db` SQLite database.

```
%%capture
%load_ext sql
%sql sqlite:///factbook.db
```

Next, let's list the tables in the database by querying SQLite's `sqlite_master` schema table.

```sql
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
```

| type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|
| table | sqlite_sequence | sqlite_sequence | 3 | `CREATE TABLE sqlite_sequence(name,seq)` |
| table | facts | facts | 47 | `CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float)` |

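Outside Jupyter, you can do the same inspection with Python's built-in `sqlite3` module. The sketch below is an illustration, not part of the original notebook. It recreates the `facts` schema listed above in an in-memory database so it runs without `factbook.db`. The names `FACTS_DDL` and `tables` are our own.

```python
import sqlite3

# Schema copied from the sqlite_master output above.
FACTS_DDL = """
CREATE TABLE "facts" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    "code" varchar(255) NOT NULL,
    "name" varchar(255) NOT NULL,
    "area" integer,
    "area_land" integer,
    "area_water" integer,
    "population" integer,
    "population_growth" float,
    "birth_rate" float,
    "death_rate" float,
    "migration_rate" float
)
"""

# In-memory stand-in for factbook.db; pass "factbook.db" instead to inspect the real file.
conn = sqlite3.connect(":memory:")
conn.execute(FACTS_DDL)

# Same question as the %%sql cell: which tables exist?
# Creating a table with an AUTOINCREMENT column makes SQLite add sqlite_sequence.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)  # ['facts', 'sqlite_sequence']
```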

Let's look at the first five rows of the `facts` table.

```sql
%%sql
SELECT *
FROM facts
LIMIT 5;
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
| 2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
| 3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
| 4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
| 5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |


### Population Queries

Now, let's find the minimum and maximum population and population growth rate.

```sql
%%sql
SELECT
    MIN(population) "Minimum Population",
    MAX(population) "Maximum Population",
    MIN(population_growth) "Minimum Population Growth",
    MAX(population_growth) "Maximum Population Growth"
FROM facts;
```

| Minimum Population | Maximum Population | Minimum Population Growth | Maximum Population Growth |
|---|---|---|---|
| 0 | 7256490011 | 0.0 | 4.02 |


Let's find the record with the lowest population.

```sql
%%sql
SELECT
    *
FROM facts
WHERE population = (SELECT MIN(population) FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 250 | ay | Antarctica |  | 280000 |  | 0 |  |  |  |  |


Let's find the record with the highest population.

```sql
%%sql
SELECT
    *
FROM facts
WHERE population = (SELECT MAX(population) FROM facts);
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 261 | xx | World |  |  |  | 7256490011 | 1.08 | 18.6 | 7.8 |  |


Neither record is a country. Antarctica has no permanent population, and World is the aggregate for all countries. Let's exclude both and recompute the minimum and maximum population and population growth.

```sql
%%sql
SELECT
    MIN(population) "Minimum Population",
    MAX(population) "Maximum Population",
    MIN(population_growth) "Minimum Population Growth",
    MAX(population_growth) "Maximum Population Growth"
FROM facts
WHERE
    name <> 'World'
    AND name <> 'Antarctica';
```

| Minimum Population | Maximum Population | Minimum Population Growth | Maximum Population Growth |
|---|---|---|---|
| 48 | 1367485388 | 0.0 | 4.02 |

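Chaining `<>` conditions gets repetitive as the list of non-country rows grows. Below is a minimal sketch of the same minimum/maximum query using `NOT IN` with bound parameters, run through Python's `sqlite3` module. It uses a small in-memory table of rows copied from the outputs above as a stand-in for `factbook.db`. `EXCLUDED` is a hypothetical name.

```python
import sqlite3

# In-memory stand-in for factbook.db, filled with rows copied from the outputs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Andorra", 85580, 0.12),
        ("Angola", 19625353, 2.78),
        ("Antarctica", 0, None),  # growth is missing (NULL) in the data
        ("World", 7256490011, 1.08),
    ],
)

# Hypothetical list of non-country rows, kept in one place.
EXCLUDED = ("World", "Antarctica")
# Only "?" placeholders are formatted into the SQL; the names are bound as parameters.
placeholders = ", ".join("?" for _ in EXCLUDED)
query = f"""
    SELECT MIN(population), MAX(population),
           MIN(population_growth), MAX(population_growth)
    FROM facts
    WHERE name NOT IN ({placeholders})
"""
result = conn.execute(query, EXCLUDED).fetchone()
print(result)  # (85580, 32564342, 0.12, 2.78)
```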

### Population and Area

Now, let's look at the average population and area, again excluding World and Antarctica.

```sql
%%sql
SELECT
    AVG(population) "Average Population",
    AVG(area) "Average Area"
FROM facts
WHERE
    name <> 'World'
    AND name <> 'Antarctica';
```

| Average Population | Average Area |
|---|---|
| 32377011.0125 | 555093.546184739 |

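Excluding World and Antarctica affects the two averages differently. `AVG` skips `NULL` values, and neither row has an `area`, so the area average is the same with or without them. World's global population total, however, pulls the population average upward. The small sketch below illustrates this with Python's `sqlite3` module and a handful of rows from the outputs above. The in-memory table is a stand-in for `factbook.db`.

```python
import sqlite3

# In-memory stand-in for factbook.db; World and Antarctica have no area (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Albania", 28748, 3029278),
        ("Algeria", 2381741, 39542166),
        ("Antarctica", None, 0),
        ("World", None, 7256490011),
    ],
)

all_rows = conn.execute("SELECT AVG(area), AVG(population) FROM facts").fetchone()
countries = conn.execute(
    "SELECT AVG(area), AVG(population) FROM facts "
    "WHERE name NOT IN ('World', 'Antarctica')"
).fetchone()

# AVG ignores NULLs, so the area average does not change...
print(all_rows[0] == countries[0])  # True
# ...but the World row's global total raises the population average.
print(all_rows[1] > countries[1])   # True
```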

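Next, let's find the countries whose population is above average but whose area is below average. Note that the subqueries below average over every row, including World. Because World's population is the global total, the population threshold is higher than the average computed above.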

```sql
%%sql
SELECT *
FROM facts
WHERE population >
        (SELECT AVG(population)
        FROM facts)
    AND area <
        (SELECT AVG(area)
        FROM facts)
    AND name <> 'World';
```

| id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
|---|---|---|---|---|---|---|---|---|---|---|
| 14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
| 65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
| 85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
| 138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
| 173 | th | Thailand | 513120 | 510890 | 2230 | 67976405 | 0.34 | 11.19 | 7.8 | 0.0 |
| 185 | uk | United Kingdom | 243610 | 241930 | 1680 | 64088222 | 0.54 | 12.17 | 9.35 | 2.54 |
| 192 | vm | Vietnam | 331210 | 310070 | 21140 | 94348835 | 0.97 | 15.96 | 5.93 | 0.3 |

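One way to make these thresholds match the country-only averages is to compute them once, in a common table expression. The sketch below is a hypothetical rewrite, not the query that produced the table above. It runs through Python's `sqlite3` module on a few rows copied from this notebook's outputs, so its result is only illustrative.

```python
import sqlite3

# In-memory stand-in for factbook.db with a few rows from the notebook's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 652230, 32564342),
        ("Algeria", 2381741, 39542166),
        ("Andorra", 468, 85580),
        ("Bangladesh", 148460, 168957745),
        ("Germany", 357022, 80854408),
        ("Japan", 377915, 126919659),
        ("Antarctica", None, 0),
        ("World", None, 7256490011),
    ],
)

# Hypothetical rewrite: both averages are computed over countries only,
# then each country is compared against them.
query = """
WITH countries AS (
    SELECT * FROM facts WHERE name NOT IN ('World', 'Antarctica')
),
averages AS (
    SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area FROM countries
)
SELECT c.name
FROM countries AS c, averages AS a
WHERE c.population > a.avg_pop
  AND c.area < a.avg_area
ORDER BY c.name
"""
dense = [name for (name,) in conn.execute(query)]
print(dense)  # ['Bangladesh', 'Germany', 'Japan']
```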

## Further Questions

We can look further into things such as:
- Which country has the most people? Which country has the highest growth rate?

```sql
%%sql
SELECT
    name "Most People",
    MAX(population) "Population"
FROM facts
WHERE name <> 'World';
```

| Most People | Population |
|---|---|
| China | 1367485388 |


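Next, the country with the highest population growth rate. Like the previous query, this one relies on a SQLite-specific rule. When a query contains exactly one `MIN()` or `MAX()` aggregate, bare columns such as `name` take their values from the row that holds the extreme value. Other databases, such as PostgreSQL, reject this form.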

```sql
%%sql
SELECT
    name "Highest Growth Rate",
    MAX(population_growth) "Growth Rate"
FROM facts
WHERE name <> 'World';
```

| Highest Growth Rate | Growth Rate |
|---|---|
| South Sudan | 4.02 |

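A portable alternative that does not depend on SQLite's bare-column behavior matches the maximum with a scalar subquery. This is the same pattern used earlier to find the lowest and highest population records. The sketch below compares the two forms on a few rows from this notebook, using Python's `sqlite3` module and an in-memory table in place of `factbook.db`.

```python
import sqlite3

# In-memory stand-in for factbook.db with rows from the notebook's outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Algeria", 39542166, 1.84),
        ("Angola", 19625353, 2.78),
        ("World", 7256490011, 1.08),
    ],
)

# SQLite-only form: the bare `name` comes from the row holding the MAX.
bare = conn.execute(
    "SELECT name, MAX(population) FROM facts WHERE name <> 'World'"
).fetchone()

# Portable form: the subquery must exclude World too, or its maximum would match no country.
# Unlike the bare-column form, ties would return every tied row.
portable = conn.execute(
    """
    SELECT name, population FROM facts
    WHERE name <> 'World'
      AND population = (SELECT MAX(population) FROM facts WHERE name <> 'World')
    """
).fetchall()

print(bare)      # ('Algeria', 39542166)
print(portable)  # [('Algeria', 39542166)]
```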

- Which countries have the highest ratios of water to land? Which countries have more water than land?

```sql
%%sql
SELECT
    name "Highest Water to Land Ratio",
    (CAST(area_water AS FLOAT)/area_land) "Water to Land Ratio"
FROM facts
WHERE name <> 'World'
ORDER BY 2 DESC
LIMIT 5;
```

| Highest Water to Land Ratio | Water to Land Ratio |
|---|---|
| British Indian Ocean Territory | 905.6666666666666 |
| Virgin Islands | 4.520231213872832 |
| Puerto Rico | 0.5547914317925592 |
| Bahamas, The | 0.3866133866133866 |
| Guinea-Bissau | 0.2846728307254623 |


```sql
%%sql
SELECT
    name "More Water than Land",
    area_water "Area of Water",
    area_land "Area of Land"
FROM facts
WHERE
    name <> 'World'
    AND area_water > area_land;
```

| More Water than Land | Area of Water | Area of Land |
|---|---|---|
| British Indian Ocean Territory | 54340 | 60 |
| Virgin Islands | 1564 | 346 |


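- Which countries will add the most people to their populations next year?

The query below overstates the answer. `population_growth` is an annual percentage, so `population*population_growth` is 100 times the yearly increase. Adding the term built from `birth_rate`, `death_rate`, and `migration_rate` then counts the same change a second time. The error shows in the output: India's figure exceeds 1.5 billion, more than its entire population. A closer estimate of next year's increase is `population * population_growth / 100`.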

```sql
%%sql
SELECT
    name "Most Population Growth",
    CAST((population*population_growth + (birth_rate-death_rate+migration_rate)*population/1000) as int) "Growth Amount"
FROM facts
WHERE
    name <> 'World'
ORDER BY 2 DESC
LIMIT 5;
```

| Most Population Growth | Growth Amount |
|---|---|
| India | 1542426917 |
| China | 622752845 |
| Nigeria | 449358826 |
| Pakistan | 294175220 |
| Ethiopia | 290370565 |

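Below is a sketch of the corrected estimate. It assumes, as the CIA Factbook defines it, that `population_growth` is an annual percentage that already reflects births, deaths, and net migration. It runs on the five sample rows shown at the start of the notebook, loaded into an in-memory table, so the ranking is illustrative rather than a replacement for the output above.

```python
import sqlite3

# In-memory stand-in for factbook.db with the first five rows shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, population_growth REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Afghanistan", 32564342, 2.32),
        ("Albania", 3029278, 0.3),
        ("Algeria", 39542166, 1.84),
        ("Andorra", 85580, 0.12),
        ("Angola", 19625353, 2.78),
    ],
)

# population_growth is a percentage, so divide by 100 for the yearly increase.
# CAST(... AS INTEGER) truncates toward zero.
query = """
SELECT name, CAST(population * population_growth / 100 AS INTEGER) AS added
FROM facts
WHERE name <> 'World'
ORDER BY added DESC
LIMIT 3
"""
top = conn.execute(query).fetchall()
print(top)  # [('Afghanistan', 755492), ('Algeria', 727575), ('Angola', 545584)]
```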

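- Which countries have a higher death rate than birth rate?

The output below lists only ten countries, but the full result is longer. Japan, for example, has a death rate of 9.51 and a birth rate of 7.93, so it meets the condition, yet it is not shown.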

```sql
%%sql
SELECT
    name "Higher Death than Birth Rate",
    death_rate "Death Rate",
    birth_rate "Birth Rate"
FROM facts
WHERE
    name <> 'World'
    AND death_rate > birth_rate;
```

| Higher Death than Birth Rate | Death Rate | Birth Rate |
|---|---|---|
| Austria | 9.42 | 9.41 |
| Belarus | 13.36 | 10.7 |
| Bosnia and Herzegovina | 9.75 | 8.87 |
| Bulgaria | 14.44 | 8.92 |
| Croatia | 12.18 | 9.45 |
| Czech Republic | 10.34 | 9.63 |
| Estonia | 12.4 | 10.51 |
| Germany | 11.42 | 8.47 |
| Greece | 11.09 | 8.66 |
| Hungary | 12.73 | 9.16 |

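A notebook display can cut a long result short, so counting the matches is a quick check on how many countries actually qualify. The sketch below uses Python's `sqlite3` module on rows from earlier outputs, with an in-memory table standing in for `factbook.db`. It also orders the matches by natural decrease, the excess of deaths over births per 1,000 people.

```python
import sqlite3

# In-memory stand-in for factbook.db; rates per 1,000 people from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, birth_rate REAL, death_rate REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Austria", 9.41, 9.42),
        ("Bangladesh", 21.14, 5.61),
        ("Germany", 8.47, 11.42),
        ("Japan", 7.93, 9.51),
        ("Philippines", 24.27, 6.11),
    ],
)

# How many rows match, regardless of how many a display shows.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM facts WHERE name <> 'World' AND death_rate > birth_rate"
).fetchone()

# Matches ordered by natural decrease (deaths minus births per 1,000), largest first.
rows = conn.execute(
    """
    SELECT name, ROUND(death_rate - birth_rate, 2) AS natural_decrease
    FROM facts
    WHERE name <> 'World' AND death_rate > birth_rate
    ORDER BY natural_decrease DESC
    """
).fetchall()
print(count)                     # 3
print([n for n, _ in rows])      # ['Germany', 'Japan', 'Austria']
```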

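- Which countries have the highest population density (people per square kilometer)?

Because `population` and `area` are both integer columns, SQLite's `/` performs integer division, so the densities below are truncated to whole numbers.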

```sql
%%sql
SELECT
    name "Highest Population to Area",
    population/area "Population to Area"
FROM facts
WHERE
    name <> 'World'
ORDER BY 2 DESC
LIMIT 5;
```

| Highest Population to Area | Population to Area |
|---|---|
| Macau | 21168 |
| Monaco | 15267 |
| Singapore | 8141 |
| Hong Kong | 6445 |
| Gaza Strip | 5191 |

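To keep the fractional part of the density, cast one operand to `REAL` before dividing, as the water-to-land query does. The sketch below compares integer and real division on two rows from earlier outputs, using Python's `sqlite3` module and an in-memory table in place of `factbook.db`.

```python
import sqlite3

# In-memory stand-in for factbook.db with rows from earlier outputs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, area INTEGER, population INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Bangladesh", 148460, 168957745),
        ("Japan", 377915, 126919659),
        ("World", None, 7256490011),
    ],
)

query = """
SELECT name,
       population / area AS truncated,                      -- integer division
       ROUND(CAST(population AS REAL) / area, 1) AS density  -- real division
FROM facts
WHERE name <> 'World'
ORDER BY density DESC
"""
results = conn.execute(query).fetchall()
for name, truncated, density in results:
    print(name, truncated, density)
# Bangladesh 1138 1138.1
# Japan 335 335.8
```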

## Conclusion

Using SQL queries against `factbook.db`, we answered questions about population size and growth, birth and death rates, land and water area, and population density.