# Project: Investigate a Dataset (How have world trends changed over 200 years in terms of population, life expectancy, fertility, and income per person in each country?)

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

This report shows how the world has changed over the last 200 years with respect to population, income per person, life expectancy, and fertility (children per woman) in each country.

I use four datasets obtained from Gapminder World (https://www.gapminder.org/data/).
Their file names and contents are listed below.

1. "children_per_woman_total_fertility.csv" -> Total fertility (children per woman) for each country
2. "income_per_person_gdppercapita_ppp_inflation_adjusted.csv" -> Income per person (GDP per capita, PPP, inflation-adjusted) for each country
3. "life_expectancy_years.csv" -> Life expectancy in years for each country
4. "population_total.csv" -> Total population of each country

In [2]:
# Use this cell to set up import statements for all of the packages that you
#   plan to use.

# Remember to include a 'magic word' so that your visualizations are plotted
#   inline with the notebook. See this page for more:
#   http://ipython.readthedocs.io/en/stable/interactive/magics.html
# Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

<a id='wrangling'></a>
## Data Wrangling

In this section of the report, I load the data, check it for cleanliness, and then trim and clean the datasets for analysis. Each step is documented, and the cleaning decisions are justified where they are made.

### General Properties

In [3]:
# Load your data and print out a few lines. Perform operations to inspect data
#   types and look for instances of missing or possibly errant data.

# Load the four Gapminder CSV files
total_fertility_df = pd.read_csv('children_per_woman_total_fertility.csv', encoding="utf-8")
income_gdppercapita_df = pd.read_csv('income_per_person_gdppercapita_ppp_inflation_adjusted.csv', encoding="utf-8")
life_expectancy_df = pd.read_csv('life_expectancy_years.csv', encoding="utf-8")
population_df = pd.read_csv('population_total.csv', encoding="utf-8")

In [4]:
# Preview and inspect each dataset, starting with total fertility
total_fertility_df.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,...,6.04,5.82,5.6,5.38,5.17,4.98,4.8,4.64,4.48,4.33
1,Albania,4.6,4.6,4.6,4.6,4.6,4.6,4.6,4.6,4.6,...,1.65,1.65,1.67,1.69,1.7,1.71,1.71,1.71,1.71,1.71
2,Algeria,6.99,6.99,6.99,6.99,6.99,6.99,6.99,6.99,6.99,...,2.83,2.89,2.93,2.94,2.92,2.89,2.84,2.78,2.71,2.64
3,Angola,6.93,6.93,6.93,6.93,6.93,6.93,6.93,6.94,6.94,...,6.24,6.16,6.08,6.0,5.92,5.84,5.77,5.69,5.62,5.55
4,Antigua and Barbuda,5.0,5.0,4.99,4.99,4.99,4.98,4.98,4.97,4.97,...,2.15,2.13,2.12,2.1,2.09,2.08,2.06,2.05,2.04,2.03


In [5]:
total_fertility_df.tail()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
179,Venezuela,5.63,5.63,5.64,5.64,5.64,5.65,5.65,5.65,5.66,...,2.5,2.47,2.44,2.42,2.39,2.37,2.34,2.32,2.29,2.27
180,Vietnam,4.7,4.7,4.7,4.7,4.7,4.7,4.7,4.7,4.7,...,1.94,1.95,1.95,1.96,1.96,1.96,1.96,1.95,1.95,1.95
181,Yemen,6.88,6.88,6.88,6.88,6.88,6.88,6.88,6.88,6.88,...,4.8,4.67,4.55,4.44,4.33,4.22,4.1,4.0,3.89,3.79
182,Zambia,6.71,6.71,6.71,6.71,6.71,6.71,6.71,6.71,6.71,...,5.48,5.4,5.32,5.24,5.17,5.1,5.04,4.98,4.93,4.87
183,Zimbabwe,6.75,6.75,6.75,6.75,6.75,6.75,6.75,6.75,6.75,...,4.02,4.03,4.02,4.0,3.96,3.9,3.84,3.76,3.68,3.61


In [6]:
total_fertility_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 184 entries, 0 to 183
Columns: 220 entries, country to 2018
dtypes: float64(219), object(1)
memory usage: 316.3+ KB


In [7]:
total_fertility_df.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
count,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,...,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0
mean,6.110707,6.107663,6.111033,6.110054,6.110435,6.110217,6.105815,6.104511,6.096359,6.084457,...,3.022717,2.994783,2.962283,2.932337,2.902717,2.869457,2.835924,2.801957,2.769348,2.737609
std,0.791456,0.795118,0.789068,0.788346,0.788456,0.784976,0.790403,0.79144,0.80628,0.834796,...,1.546635,1.516852,1.489848,1.459351,1.428993,1.400419,1.372184,1.34367,1.314814,1.28603
min,4.04,4.04,3.91,4.05,3.94,4.06,4.07,4.05,4.0,3.21,...,1.18,1.19,1.21,1.22,1.24,1.25,1.24,1.24,1.23,1.23
25%,5.67,5.67,5.67,5.67,5.67,5.67,5.67,5.67,5.67,5.67,...,1.8175,1.81,1.7975,1.7975,1.79,1.7875,1.7775,1.75,1.7575,1.75
50%,6.19,6.19,6.19,6.19,6.19,6.19,6.19,6.18,6.16,6.16,...,2.495,2.475,2.455,2.41,2.38,2.36,2.335,2.315,2.285,2.265
75%,6.7175,6.7175,6.7175,6.7175,6.7175,6.7175,6.7175,6.7175,6.7175,6.7175,...,4.0575,4.0425,4.0275,4.0025,3.96,3.9025,3.8425,3.7675,3.69,3.6225
max,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,8.1,...,7.52,7.49,7.46,7.42,7.38,7.34,7.29,7.24,7.18,7.13


In [16]:
total_fertility_df_count_row = total_fertility_df.shape[0]  # gives number of row count
total_fertility_df_count_col = total_fertility_df.shape[1]  # gives number of col count
print("Total Fertility Dataset : {} rows and {} columns "
      .format(total_fertility_df_count_row,total_fertility_df_count_col))

Total Fertility Dataset : 184 rows and 220 columns 


In [12]:
income_gdppercapita_df.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Afghanistan,603,603,603,603,603,603,603,603,603,...,2420,2470,2520,2580,2640,2700,2760,2820,2880,2940
1,Albania,667,667,667,667,667,668,668,668,668,...,18500,18900,19300,19700,20200,20600,21100,21500,22000,22500
2,Algeria,715,716,717,718,719,720,721,722,723,...,15600,15900,16300,16700,17000,17400,17800,18200,18600,19000
3,Andorra,1200,1200,1200,1200,1210,1210,1210,1210,1220,...,73200,74800,76400,78100,79900,81600,83400,85300,87200,89100
4,Angola,618,620,623,626,628,631,634,637,640,...,6270,6410,6550,6700,6850,7000,7150,7310,7470,7640


In [13]:
income_gdppercapita_df.tail()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
188,Venezuela,682,682,682,682,682,682,682,682,682,...,14400,14700,15100,15400,15700,16100,16400,16800,17200,17600
189,Vietnam,861,861,861,861,861,861,861,861,862,...,10100,10300,10600,10800,11000,11300,11500,11800,12100,12300
190,Yemen,877,879,882,884,887,889,892,894,897,...,3250,3320,3390,3470,3550,3620,3700,3790,3870,3960
191,Zambia,663,665,667,668,670,671,673,675,676,...,5410,5530,5650,5780,5910,6040,6170,6310,6450,6590
192,Zimbabwe,869,870,871,872,873,874,875,876,877,...,2630,2690,2750,2810,2880,2940,3000,3070,3140,3210


In [14]:
income_gdppercapita_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Columns: 242 entries, country to 2040
dtypes: int64(241), object(1)
memory usage: 365.0+ KB


In [15]:
income_gdppercapita_df.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
count,193.0,193.0,193.0,193.0,193.0,193.0,193.0,193.0,193.0,193.0,...,193.0,193.0,193.0,193.0,193.0,193.0,193.0,193.0,193.0,193.0
mean,947.782383,948.26943,951.217617,950.911917,952.772021,953.202073,954.305699,953.979275,950.911917,952.202073,...,23851.284974,24367.823834,24912.518135,25465.181347,26028.062176,26607.435233,27199.07772,27804.968912,28415.119171,29039.73057
std,508.348372,506.753967,516.692581,511.552526,518.97469,513.492023,514.667191,508.132446,490.318368,492.600302,...,24946.900512,25460.500225,26036.659984,26630.9532,27203.647785,27809.921089,28446.0449,29092.365685,29694.419993,30348.528109
min,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,350.0,...,718.0,734.0,750.0,767.0,784.0,801.0,819.0,837.0,856.0,875.0
25%,608.0,608.0,608.0,609.0,609.0,609.0,610.0,610.0,610.0,611.0,...,4840.0,4950.0,5060.0,5170.0,5290.0,5400.0,5520.0,5650.0,5770.0,5900.0
50%,847.0,847.0,847.0,847.0,848.0,848.0,849.0,850.0,850.0,851.0,...,15600.0,15900.0,16300.0,16700.0,17000.0,17400.0,17800.0,18200.0,18600.0,19000.0
75%,1100.0,1100.0,1110.0,1110.0,1110.0,1120.0,1120.0,1130.0,1130.0,1140.0,...,35900.0,36700.0,37500.0,38400.0,39200.0,40100.0,41000.0,41900.0,42800.0,43800.0
max,4230.0,4160.0,4390.0,4300.0,4500.0,4240.0,4270.0,3910.0,3480.0,3430.0,...,150000.0,153000.0,156000.0,160000.0,163000.0,167000.0,171000.0,175000.0,178000.0,182000.0


In [17]:
income_gdppercapita_df_count_row = income_gdppercapita_df.shape[0]  # gives number of row count
income_gdppercapita_df_count_col = income_gdppercapita_df.shape[1]  # gives number of col count
print("Income GDP per capita Dataset : {} rows and {} columns "
      .format(income_gdppercapita_df_count_row,income_gdppercapita_df_count_col))

Income GDP per capita Dataset : 193 rows and 242 columns 


In [18]:
life_expectancy_df.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,28.2,28.2,28.2,28.2,28.2,28.2,28.1,28.1,28.1,...,55.7,56.2,56.7,57.2,57.7,57.8,57.9,58.0,58.4,58.7
1,Albania,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,...,75.9,76.3,76.7,77.0,77.2,77.4,77.6,77.7,77.9,78.0
2,Algeria,28.8,28.8,28.8,28.8,28.8,28.8,28.8,28.8,28.8,...,76.3,76.5,76.7,76.8,77.0,77.1,77.3,77.4,77.6,77.9
3,Andorra,,,,,,,,,,...,82.7,82.7,82.6,82.6,82.6,82.6,82.5,82.5,,
4,Angola,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,...,59.3,60.1,60.9,61.7,62.5,63.3,64.0,64.7,64.9,65.2


In [19]:
life_expectancy_df.tail()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
182,Venezuela,32.2,32.2,32.2,32.2,32.2,32.2,32.2,32.2,32.2,...,75.0,75.4,75.4,75.3,75.4,75.5,75.5,75.5,75.7,75.9
183,Vietnam,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,32.0,...,72.8,73.1,73.3,73.6,73.8,74.1,74.3,74.5,74.7,74.9
184,Yemen,23.4,23.4,23.4,23.4,23.4,23.4,23.4,23.4,23.4,...,67.0,67.5,67.7,67.9,68.4,68.4,67.2,66.7,66.9,67.1
185,Zambia,32.6,32.6,32.6,32.6,32.6,32.6,32.6,32.6,32.6,...,50.7,52.0,53.2,54.5,55.7,57.0,58.1,58.8,59.1,59.5
186,Zimbabwe,33.7,33.7,33.7,33.7,33.7,33.7,33.7,33.7,33.7,...,47.5,49.6,51.9,54.1,55.6,57.0,58.3,59.3,59.8,60.2


In [20]:
life_expectancy_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 187 entries, 0 to 186
Columns: 220 entries, country to 2018
dtypes: float64(219), object(1)
memory usage: 321.5+ KB


In [21]:
life_expectancy_df.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
count,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,184.0,...,187.0,187.0,187.0,187.0,187.0,187.0,187.0,187.0,184.0,184.0
mean,31.502717,31.461957,31.478804,31.383152,31.459239,31.586413,31.644565,31.59837,31.383152,31.310326,...,70.00107,70.225668,70.659358,71.05615,71.399465,71.622995,71.93369,72.206952,72.422283,72.658152
std,3.814689,3.806303,3.938674,3.962376,3.934674,4.010884,4.110598,3.981247,4.087872,4.04058,...,8.832102,9.05071,8.439841,8.18101,7.996165,7.889169,7.605557,7.414169,7.33104,7.252807
min,23.4,23.4,23.4,19.6,23.4,23.4,23.4,23.4,12.5,13.4,...,45.4,32.1,47.5,47.9,48.0,48.4,49.6,50.3,50.8,51.1
25%,29.075,28.975,28.9,28.9,28.975,29.075,29.075,29.075,28.975,28.875,...,63.4,63.9,64.3,65.0,65.35,65.55,66.05,66.65,66.9,67.1
50%,31.75,31.65,31.55,31.5,31.55,31.65,31.75,31.75,31.55,31.5,...,72.5,72.6,72.7,72.8,72.9,73.0,73.3,73.5,73.7,74.05
75%,33.825,33.9,33.825,33.625,33.725,33.825,33.925,33.925,33.725,33.625,...,76.6,76.7,76.9,77.0,77.2,77.35,77.5,77.65,77.825,78.025
max,42.9,40.3,44.4,44.8,42.8,44.3,45.8,43.6,43.5,41.7,...,82.7,82.8,82.9,83.2,83.4,83.6,83.8,83.9,84.0,84.2


In [25]:
life_expectancy_df_count_row = life_expectancy_df.shape[0]  # gives number of row count
life_expectancy_df_count_col = life_expectancy_df.shape[1]  # gives number of col count
print("Life Expectancy Dataset : {} rows and {} columns "
      .format(life_expectancy_df_count_row,life_expectancy_df_count_col))

Life Expectancy Dataset : 187 rows and 220 columns 


In [26]:
population_df.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,3280000,3280000,3280000,3280000,3280000,3280000,3280000,3280000,3280000,...,71900000,71800000,71600000,71500000,71300000,71200000,71000000,70800000,70600000,70400000
1,Albania,410000,412000,413000,414000,416000,417000,418000,420000,421000,...,1820000,1800000,1780000,1760000,1740000,1720000,1710000,1690000,1670000,1660000
2,Algeria,2500000,2510000,2520000,2530000,2540000,2550000,2560000,2570000,2580000,...,62800000,62800000,62800000,62800000,62800000,62800000,62700000,62700000,62600000,62600000
3,Andorra,2650,2650,2650,2650,2650,2650,2650,2650,2650,...,64300,64200,64100,63900,63800,63700,63500,63400,63300,63100
4,Angola,1570000,1570000,1570000,1570000,1570000,1570000,1570000,1570000,1570000,...,156000000,158000000,160000000,162000000,164000000,166000000,167000000,169000000,171000000,173000000


In [27]:
population_df.tail()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
190,Venezuela,718000,718000,718000,718000,718000,718000,718000,718000,718000,...,42500000,42500000,42400000,42300000,42200000,42100000,42000000,41900000,41800000,41600000
191,Vietnam,6550000,6550000,6550000,6550000,6550000,6550000,6550000,6550000,6550000,...,110000000,109000000,109000000,109000000,109000000,109000000,108000000,108000000,108000000,108000000
192,Yemen,2590000,2590000,2590000,2590000,2590000,2590000,2590000,2590000,2590000,...,54700000,54600000,54500000,54400000,54200000,54100000,54000000,53800000,53700000,53500000
193,Zambia,747000,747000,747000,747000,747000,747000,747000,747000,747000,...,84600000,85700000,86800000,87900000,89000000,90100000,91200000,92300000,93300000,94400000
194,Zimbabwe,1090000,1090000,1090000,1090000,1090000,1090000,1090000,1090000,1090000,...,40000000,40100000,40200000,40300000,40400000,40500000,40500000,40600000,40600000,40700000


In [28]:
population_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Columns: 302 entries, country to 2100
dtypes: int64(301), object(1)
memory usage: 460.2+ KB


In [29]:
population_df.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
count,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,...,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0
mean,4858439.0,4875184.0,4896420.0,4921854.0,4944249.0,4965603.0,4987121.0,5008500.0,5029397.0,5056351.0,...,56594980.0,56625400.0,56757020.0,56798060.0,56921940.0,56946540.0,57002010.0,57072810.0,57132750.0,57196150.0
std,26232980.0,26359250.0,26548600.0,26769260.0,26959290.0,27149560.0,27308240.0,27498550.0,27689430.0,27911490.0,...,156460000.0,155865000.0,156109900.0,155531800.0,155770200.0,155180400.0,154907000.0,154815900.0,154555400.0,154463500.0
min,905.0,905.0,905.0,905.0,905.0,905.0,905.0,905.0,905.0,905.0,...,806.0,805.0,806.0,802.0,803.0,804.0,802.0,799.0,800.0,798.0
25%,112000.0,112000.0,112000.0,112000.0,112000.0,112000.0,112000.0,112000.0,112000.0,112000.0,...,2035000.0,2025000.0,2010000.0,1995000.0,1985000.0,1975000.0,1960000.0,1950000.0,1940000.0,1925000.0
50%,713000.0,713000.0,713000.0,713000.0,713000.0,713000.0,713000.0,713000.0,713000.0,717000.0,...,12600000.0,12600000.0,12500000.0,12400000.0,12400000.0,12300000.0,12300000.0,12200000.0,12100000.0,12100000.0
75%,2120000.0,2120000.0,2120000.0,2120000.0,2120000.0,2120000.0,2120000.0,2120000.0,2120000.0,2120000.0,...,49150000.0,49150000.0,49000000.0,48800000.0,48600000.0,48400000.0,48250000.0,48050000.0,47950000.0,48050000.0
max,322000000.0,324000000.0,327000000.0,330000000.0,333000000.0,336000000.0,338000000.0,341000000.0,344000000.0,347000000.0,...,1570000000.0,1560000000.0,1560000000.0,1550000000.0,1550000000.0,1540000000.0,1530000000.0,1530000000.0,1520000000.0,1520000000.0


In [31]:
population_df_count_row = population_df.shape[0]  # gives number of row count
population_df_count_col = population_df.shape[1]  # gives number of col count
print("Population Dataset : {} rows and {} columns "
      .format(population_df_count_row,population_df_count_col))

Population Dataset : 195 rows and 302 columns 
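The four shape checks above repeat the same pattern, so they could be folded into one helper. The following is a minimal sketch, not part of the original notebook: `summarize_shapes` is a hypothetical helper name, and the toy frames stand in for the real Gapminder data.

```python
import pandas as pd


def summarize_shapes(frames):
    """Return one row per dataset with its row and column counts."""
    rows = [
        {"dataset": name, "n_rows": df.shape[0], "n_cols": df.shape[1]}
        for name, df in frames.items()
    ]
    return pd.DataFrame(rows)


# Toy stand-ins for the real frames (hypothetical values, for illustration only)
toy_frames = {
    "fertility": pd.DataFrame({"country": ["A", "B"], "1800": [7.0, 4.6]}),
    "income": pd.DataFrame(
        {"country": ["A", "B", "C"], "1800": [603, 667, 715], "1801": [603, 667, 716]}
    ),
}

shape_summary = summarize_shapes(toy_frames)
print(shape_summary)
```

With the real data, the dictionary would map a label to each of `total_fertility_df`, `income_gdppercapita_df`, `life_expectancy_df`, and `population_df`, which makes the differing row counts (184, 193, 187, 195) visible side by side.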


In [33]:
# Print the data type of every column in each data frame
print("Total Fertility Column Data Type")
print(total_fertility_df.dtypes)

Total Fertility Column Data Type
country     object
1800       float64
1801       float64
1802       float64
1803       float64
1804       float64
1805       float64
1806       float64
1807       float64
1808       float64
1809       float64
1810       float64
1811       float64
1812       float64
1813       float64
1814       float64
1815       float64
1816       float64
1817       float64
1818       float64
1819       float64
1820       float64
1821       float64
1822       float64
1823       float64
1824       float64
1825       float64
1826       float64
1827       float64
1828       float64
            ...   
1989       float64
1990       float64
1991       float64
1992       float64
1993       float64
1994       float64
1995       float64
1996       float64
1997       float64
1998       float64
1999       float64
2000       float64
2001       float64
2002       float64
2003       float64
2004       float64
2005       float64
2006       float64
2007       float64
2008       float6

In [34]:
print("Income GDP per capita Column Data Type")
print(income_gdppercapita_df.dtypes)

Income GDP per capita Column Data Type
country    object
1800        int64
1801        int64
1802        int64
1803        int64
1804        int64
1805        int64
1806        int64
1807        int64
1808        int64
1809        int64
1810        int64
1811        int64
1812        int64
1813        int64
1814        int64
1815        int64
1816        int64
1817        int64
1818        int64
1819        int64
1820        int64
1821        int64
1822        int64
1823        int64
1824        int64
1825        int64
1826        int64
1827        int64
1828        int64
            ...  
2011        int64
2012        int64
2013        int64
2014        int64
2015        int64
2016        int64
2017        int64
2018        int64
2019        int64
2020        int64
2021        int64
2022        int64
2023        int64
2024        int64
2025        int64
2026        int64
2027        int64
2028        int64
2029        int64
2030        int64
2031        int64
2032        int64
2033   

In [35]:
print("Life Expectancy Column Data Type")
print(life_expectancy_df.dtypes)

Life Expectancy Column Data Type
country     object
1800       float64
1801       float64
1802       float64
1803       float64
1804       float64
1805       float64
1806       float64
1807       float64
1808       float64
1809       float64
1810       float64
1811       float64
1812       float64
1813       float64
1814       float64
1815       float64
1816       float64
1817       float64
1818       float64
1819       float64
1820       float64
1821       float64
1822       float64
1823       float64
1824       float64
1825       float64
1826       float64
1827       float64
1828       float64
            ...   
1989       float64
1990       float64
1991       float64
1992       float64
1993       float64
1994       float64
1995       float64
1996       float64
1997       float64
1998       float64
1999       float64
2000       float64
2001       float64
2002       float64
2003       float64
2004       float64
2005       float64
2006       float64
2007       float64
2008       float6

In [36]:
print("Population Column Data Type")
print(population_df.dtypes)

Population Column Data Type
country    object
1800        int64
1801        int64
1802        int64
1803        int64
1804        int64
1805        int64
1806        int64
1807        int64
1808        int64
1809        int64
1810        int64
1811        int64
1812        int64
1813        int64
1814        int64
1815        int64
1816        int64
1817        int64
1818        int64
1819        int64
1820        int64
1821        int64
1822        int64
1823        int64
1824        int64
1825        int64
1826        int64
1827        int64
1828        int64
            ...  
2071        int64
2072        int64
2073        int64
2074        int64
2075        int64
2076        int64
2077        int64
2078        int64
2079        int64
2080        int64
2081        int64
2082        int64
2083        int64
2084        int64
2085        int64
2086        int64
2087        int64
2088        int64
2089        int64
2090        int64
2091        int64
2092        int64
2093        int64
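The inspection above shows that the datasets do not cover the same countries or years: fertility and life expectancy end in 2018, income runs to 2040, and population runs to 2100, with row counts between 184 and 195. Before combining them, it may help to compute the overlap explicitly. The sketch below assumes each frame has a `country` column and year columns stored as strings, as in the outputs above. `shared_coverage` is a hypothetical helper, and the toy frames use made-up values.

```python
import pandas as pd

# Toy frames with hypothetical values, shaped like the Gapminder files
fertility = pd.DataFrame(
    {"country": ["A", "B"], "2017": [2.0, 3.0], "2018": [1.9, 2.9]}
)
population = pd.DataFrame(
    {"country": ["A", "B", "C"], "2018": [10, 20, 30], "2019": [11, 21, 31]}
)


def shared_coverage(frames):
    """Return the countries and year columns that appear in every frame."""
    country_sets = [set(df["country"]) for df in frames]
    year_sets = [set(df.columns) - {"country"} for df in frames]
    common_countries = sorted(set.intersection(*country_sets))
    # Year labels are strings, so sort them by their integer value
    common_years = sorted(set.intersection(*year_sets), key=int)
    return common_countries, common_years


common_countries, common_years = shared_coverage([fertility, population])
print(common_countries)
print(common_years)
```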




### Data Cleaning, Data Melting, Data Merging

In [39]:
# After discussing the structure of the data and any problems that need to be
#   cleaned, perform those cleaning steps in the second part of this section.
pd.isna(total_fertility_df).sum().sum()

0

In [57]:
total_fertility_df = total_fertility_df.dropna(axis=0)

In [74]:
pd.isna(total_fertility_df).sum().sum()

0

In [75]:
total_fertility_df.head(2)

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,7.0,...,6.04,5.82,5.6,5.38,5.17,4.98,4.8,4.64,4.48,4.33
1,Albania,4.6,4.6,4.6,4.6,4.6,4.6,4.6,4.6,4.6,...,1.65,1.65,1.67,1.69,1.7,1.71,1.71,1.71,1.71,1.71


In [76]:
pd.isna(income_gdppercapita_df).sum().sum()

0

In [60]:
income_gdppercapita_df = income_gdppercapita_df.dropna(axis=0)

In [78]:
pd.isna(income_gdppercapita_df).sum().sum()

0

In [79]:
income_gdppercapita_df.head(2)

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Afghanistan,603,603,603,603,603,603,603,603,603,...,2420,2470,2520,2580,2640,2700,2760,2820,2880,2940
1,Albania,667,667,667,667,667,668,668,668,668,...,18500,18900,19300,19700,20200,20600,21100,21500,22000,22500


In [62]:
pd.isna(life_expectancy_df).sum().sum()

516

In [63]:
life_expectancy_df = life_expectancy_df.dropna(axis=0)

In [80]:
pd.isna(life_expectancy_df).sum().sum()

0
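Because `dropna(axis=0)` removes every row that contains at least one missing value, it drops whole countries from the life expectancy data (Andorra, for example, has gaps in the `head()` output above). One way to see which countries are affected before dropping them is sketched here on a toy frame with made-up values. The variable names are illustrative and not from the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; country "B" has missing years
toy_life = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "1800": [28.2, np.nan, 27.0],
        "2018": [58.7, np.nan, 65.2],
    }
)

# Count missing values per row, labelled by country
missing_per_country = toy_life.set_index("country").isna().sum(axis=1)
countries_with_gaps = missing_per_country[missing_per_country > 0]
print(countries_with_gaps)

# Same cleaning step as in the notebook: drop rows with any missing value
cleaned = toy_life.dropna(axis=0)
print(cleaned.shape)
```

Listing the affected countries first makes the cleaning decision easier to justify, since the analysis then excludes them entirely.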

In [81]:
life_expectancy_df.head(2)

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,28.2,28.2,28.2,28.2,28.2,28.2,28.1,28.1,28.1,...,55.7,56.2,56.7,57.2,57.7,57.8,57.9,58.0,58.4,58.7
1,Albania,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,35.4,...,75.9,76.3,76.7,77.0,77.2,77.4,77.6,77.7,77.9,78.0


In [100]:
# Reshape from wide (one column per year) to long (one row per country and year),
# keeping only the years 1980-2018.
# https://stackoverflow.com/questions/36667548/how-to-create-a-series-of-numbers-using-pandas-in-python
total_fertility_df_melt = pd.melt(total_fertility_df,
        id_vars=['country'],
        value_vars=pd.Series(range(1980, 2019)).astype(str),
        var_name='Year',
        value_name='Total_Fertility')
total_fertility_df_melt


Unnamed: 0,country,Year,Total_Fertility
0,Afghanistan,1980,7.45
1,Albania,1980,3.62
2,Algeria,1980,6.79
3,Angola,1980,7.50
4,Antigua and Barbuda,1980,2.12
5,Argentina,1980,3.33
6,Armenia,1980,2.39
7,Australia,1980,1.90
8,Austria,1980,1.65
9,Azerbaijan,1980,3.50
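The same melt could be applied to the other datasets, and the long tables could then be merged on `country` and `Year`. The sketch below is one possible way to do this and is not the notebook's actual merging step. `melt_years` is a hypothetical helper, the column name `Life_Expectancy` is an assumption, and the toy frames contain made-up values apart from the two 1980 fertility figures taken from the output above.

```python
import pandas as pd


def melt_years(df, value_name, start=1980, end=2018):
    """Reshape a wide country-by-year frame into long (country, Year, value) rows."""
    # Keep only the requested years that actually exist as columns
    years = [str(y) for y in range(start, end + 1) if str(y) in df.columns]
    return pd.melt(
        df,
        id_vars=["country"],
        value_vars=years,
        var_name="Year",
        value_name=value_name,
    )


# Toy frames (hypothetical values); country "B" and "C" each appear in only one
toy_fertility = pd.DataFrame(
    {"country": ["A", "B"], "1980": [7.45, 3.62], "1981": [7.40, 3.50]}
)
toy_life = pd.DataFrame(
    {"country": ["A", "C"], "1980": [43.0, 70.0], "1981": [43.5, 70.2]}
)

fertility_long = melt_years(toy_fertility, "Total_Fertility", 1980, 1981)
life_long = melt_years(toy_life, "Life_Expectancy", 1980, 1981)

# An inner merge keeps only country-year pairs present in both tables
merged = fertility_long.merge(life_long, on=["country", "Year"], how="inner")
print(merged)
```

An inner join is used here so that every remaining row has a value for each indicator. An outer join would keep all countries but reintroduce missing values.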


<a id='eda'></a>
## Exploratory Data Analysis

> **Tip**: Now that you've trimmed and cleaned your data, you're ready to move on to exploration. Compute statistics and create visualizations with the goal of addressing the research questions that you posed in the Introduction section. It is recommended that you be systematic with your approach. Look at one variable at a time, and then follow it up by looking at relationships between variables.

### Research Question 1 (Replace this header name!)

In [None]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.


### Research Question 2  (Replace this header name!)

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


<a id='conclusions'></a>
## Conclusions

> **Tip**: Finally, summarize your findings and the results that have been performed. Make sure that you are clear with regards to the limitations of your exploration. If you haven't done any statistical tests, do not imply any statistical conclusions. And make sure you avoid implying causation from correlation!

> **Tip**: Once you are satisfied with your work, you should save a copy of the report in HTML or PDF form via the **File** > **Download as** submenu. Before exporting your report, check over it to make sure that the flow of the report is complete. You should probably remove all of the "Tip" quotes like this one so that the presentation is as tidy as possible. Congratulations!