# Project 2

## Data Set 2 - Population

For this data set, the goal is to calculate the percent change in population for each country.

In [1]:
# Import pandas

import pandas as pd

# Read the data from the .csv file into a pandas DataFrame

pop = pd.read_csv('Population.csv')

In [2]:
# Display the population data in the dataframe

pop.head()

,COUNTRY,2000,2010,2020
0,UNITED STATES,282200000,309300000,329500000
1,CHINA,1263000000,1338000000,1411000000
2,RUSSIA,146600000,142800000,144100000
3,INDIA,1060000000,1241000000,1396000000


The data is in wide format: each year has its own column, so nothing in the table labels the cell values as populations. To make the data easier to analyze, the best course of action is to transform it into long format, with one row per country and year, using the .melt() function.

In [3]:
# Unpivot the wide-format data to long format with melt(), sort by country and year, and reset the index
# (sorting on Year as well keeps each country's rows in chronological order for pct_change later)

tidy_pop = pop.melt(id_vars=['COUNTRY'], var_name='Year', value_name='Population').sort_values(by=['COUNTRY', 'Year']).reset_index(drop=True)
tidy_pop.head(12)

,COUNTRY,Year,Population
0,CHINA,2000,1263000000
1,CHINA,2010,1338000000
2,CHINA,2020,1411000000
3,INDIA,2000,1060000000
4,INDIA,2010,1241000000
5,INDIA,2020,1396000000
6,RUSSIA,2000,146600000
7,RUSSIA,2010,142800000
8,RUSSIA,2020,144100000
9,UNITED STATES,2000,282200000
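
One way to confirm that the melt kept every value is to pivot the long data back to wide and compare it with the original. The following is a minimal sketch, not part of the original notebook: it assumes an inline copy of the four rows printed by `pop.head()` in place of `Population.csv`, and the names `round_trip` and `original` are introduced only for illustration.

```python
import pandas as pd

# Inline copy of the table printed by pop.head(); stands in for Population.csv (assumption)
pop = pd.DataFrame({
    "COUNTRY": ["UNITED STATES", "CHINA", "RUSSIA", "INDIA"],
    "2000": [282200000, 1263000000, 146600000, 1060000000],
    "2010": [309300000, 1338000000, 142800000, 1241000000],
    "2020": [329500000, 1411000000, 144100000, 1396000000],
})

# Same melt as in the notebook, sorted by country and year
tidy_pop = (
    pop.melt(id_vars=["COUNTRY"], var_name="Year", value_name="Population")
       .sort_values(by=["COUNTRY", "Year"])
       .reset_index(drop=True)
)

# Long format holds one row per (country, year) pair: 4 countries x 3 years
assert len(tidy_pop) == 12

# Pivot back to wide format and compare with the original values
round_trip = tidy_pop.pivot(index="COUNTRY", columns="Year", values="Population")
original = pop.set_index("COUNTRY").sort_index()
assert (round_trip.values == original.values).all()
```

Note that after melting, the `Year` values are the original column names, so they are strings such as `"2000"` rather than integers.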


With the data in long format, the percent change in population can now be calculated for each country.
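
For reference, the percent change between two consecutive values is (current - previous) / previous * 100, which is what `pct_change()` returns once multiplied by 100. The sketch below checks that by hand on China's three figures from the table above; the names `population`, `manual`, and `builtin` are illustrative and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# China's population figures, copied from the table above
population = pd.Series([1263000000, 1338000000, 1411000000], index=[2000, 2010, 2020])

# Manual calculation: (current - previous) / previous * 100
previous = population.shift(1)
manual = (population - previous) / previous * 100

# Built-in equivalent
builtin = population.pct_change() * 100

# Both give NaN for 2000 (no earlier value) and the same numbers afterwards
assert np.allclose(manual, builtin, equal_nan=True)
print(manual.round(6))
```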

In [4]:
# Calculate the percent change in population from year to year for each country

tidy_pop["percent_change"] = tidy_pop.groupby('COUNTRY')['Population'].pct_change()*100
tidy_pop

,COUNTRY,Year,Population,percent_change
0,CHINA,2000,1263000000,
1,CHINA,2010,1338000000,5.938242
2,CHINA,2020,1411000000,5.455904
3,INDIA,2000,1060000000,
4,INDIA,2010,1241000000,17.075472
5,INDIA,2020,1396000000,12.489927
6,RUSSIA,2000,146600000,
7,RUSSIA,2010,142800000,-2.592087
8,RUSSIA,2020,144100000,0.910364
9,UNITED STATES,2000,282200000,


In [5]:
# Calculate the percent change in population from 2000 to 2020 for each country
# (periods=2 compares each row with the row two steps earlier, so only the 2020 rows get a value)

tidy_pop["percent_change"] = tidy_pop.groupby('COUNTRY')['Population'].pct_change(periods=2)*100
tidy_pop

,COUNTRY,Year,Population,percent_change
0,CHINA,2000,1263000000,
1,CHINA,2010,1338000000,
2,CHINA,2020,1411000000,11.718131
3,INDIA,2000,1060000000,
4,INDIA,2010,1241000000,
5,INDIA,2020,1396000000,31.698113
6,RUSSIA,2000,146600000,
7,RUSSIA,2010,142800000,
8,RUSSIA,2020,144100000,-1.705321
9,UNITED STATES,2000,282200000,


China's population grew by 5.94% from 2000 to 2010 and by 5.46% from 2010 to 2020, an overall increase of 11.72% from 2000 to 2020.

India's population grew by 17.08% from 2000 to 2010 and by 12.49% from 2010 to 2020, an overall increase of 31.70% from 2000 to 2020.

Russia's population fell by 2.59% from 2000 to 2010 and then grew by 0.91% from 2010 to 2020, an overall decrease of 1.71% from 2000 to 2020.

The population of the United States grew by 9.60% from 2000 to 2010 and by 6.53% from 2010 to 2020, an overall increase of 16.76% from 2000 to 2020.
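
As a cross-check on the figures above, the same percentages can be computed directly from the wide table, without melting. This is a sketch under assumptions rather than the notebook's method: the data is an inline copy of the `pop.head()` output instead of `Population.csv`, and `wide` and `summary` are hypothetical names.

```python
import pandas as pd

# Inline copy of the table printed by pop.head(); stands in for Population.csv (assumption)
wide = pd.DataFrame({
    "COUNTRY": ["UNITED STATES", "CHINA", "RUSSIA", "INDIA"],
    "2000": [282200000, 1263000000, 146600000, 1060000000],
    "2010": [309300000, 1338000000, 142800000, 1241000000],
    "2020": [329500000, 1411000000, 144100000, 1396000000],
}).set_index("COUNTRY")

# Percent change for each interval: (later / earlier - 1) * 100
summary = pd.DataFrame({
    "2000-2010": (wide["2010"] / wide["2000"] - 1) * 100,
    "2010-2020": (wide["2020"] / wide["2010"] - 1) * 100,
    "2000-2020": (wide["2020"] / wide["2000"] - 1) * 100,
}).round(2)

print(summary)
```

The `summary` frame has one row per country and one column per interval, which makes it easy to compare the four countries side by side.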