# Project 2
### GDP Growth vs Unemployment Rate in the U.S. After 2000

*Columbia University*


**Author**: Sherly Tuo

>This project walks you through two datasets, both from the World Bank.
>
> One dataset is [GDP growth (annual %)](https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart).
>
>URL: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart
>
>
> The other dataset is [Unemployment, total (% of total labor force) (modeled ILO estimate)](https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS).
>
>URL:https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS
>
>> You can also access and explore both datasets in [World Bank Open Data](https://data.worldbank.org/).
>>
>>URL:https://data.worldbank.org/


## Step 1

Let us take a look at GDP growth.

In [1]:
import pandas as pd
GDP=pd.read_csv('Project2_GDP growth.csv',header=2)
GDP.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
0,Aruba,ABW,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,,,,,,,...,1.719625,7.048533,2.397086,-2.232442,-26.21182,24.132627,8.517918,4.263719,,
1,Africa Eastern and Southern,AFE,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,,0.469708,7.868623,5.622472,4.689533,5.159536,...,2.195991,2.696238,2.665038,2.20034,-2.859784,4.563568,3.555769,1.891307,2.766804,
2,Afghanistan,AFG,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,,,,,,,...,2.260314,2.647003,1.189228,3.911603,-2.351101,-20.738839,-6.240172,2.266944,,
3,Africa Western and Central,AFW,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,,1.869637,3.725941,7.039191,5.364761,4.105616,...,0.194177,2.296168,2.904654,3.282163,-0.984117,4.03,3.974964,3.357987,4.176103,
4,Angola,AGO,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,,,,,,,...,-2.58005,-0.147213,-1.316362,-0.702273,-5.638215,1.199211,3.044727,1.0781,4.423905,
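
The `header=2` argument and the trailing `Unnamed: 69` column both appear to come from the layout of the downloaded file. The sketch below reproduces this with a tiny in-memory CSV. Its layout (two metadata lines separated by blank lines, then a header row ending in a trailing comma) is an assumption about the World Bank export, and the values in it are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical miniature of a World Bank export; layout and values are assumed.
raw = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2025-01-01",\n'
    '\n'
    '"Country Name","Country Code","1960","1961",\n'
    '"Aruba","ABW","","1.5",\n'
)

# Blank lines are skipped by default, so header=2 picks the third non-blank line.
demo = pd.read_csv(io.StringIO(raw), header=2)

# The trailing comma in the header row yields an extra, empty "Unnamed" column.
print(demo.columns.tolist())
```

If the real file follows this layout, the extra column carries no data and can be ignored or dropped.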


In this project I focus on the U.S., so let's select 'United States' from the 'Country Name' column.

The years are spread across columns (wide format), so before analyzing the data we use the melt function to reshape it into long format, with one row per year.

In [2]:
GDP_US=GDP[GDP['Country Name']=='United States']
GDP_US=pd.melt(
    GDP_US,
    id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],
    var_name='Year',
    value_name='GDP_Growth_Rate'
)
GDP_US.head()


Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,Year,GDP_Growth_Rate
0,United States,USA,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,1960,
1,United States,USA,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,1961,2.565343
2,United States,USA,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,1962,6.129637
3,United States,USA,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,1963,4.357286
4,United States,USA,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,1964,5.762747
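
To see what `pd.melt` does in isolation, here is a minimal sketch on a made-up two-year table. The country name and values are hypothetical and only illustrate the wide-to-long reshape.

```python
import pandas as pd

# Made-up wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "Country Name": ["Exampleland"],
    "2001": [1.0],
    "2002": [2.0],
})

# melt keeps id_vars as they are and stacks the remaining columns into rows.
long = pd.melt(wide, id_vars=["Country Name"], var_name="Year", value_name="Value")
print(long)
```

Each former year column becomes one row, and the column label ends up as a string in `Year`, which is why the year needs a type conversion later on.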


Let's inspect the dataset to check the type of each column, so that we can clean the data if needed.

In [3]:
GDP_US.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Country Name     66 non-null     object 
 1   Country Code     66 non-null     object 
 2   Indicator Name   66 non-null     object 
 3   Indicator Code   66 non-null     object 
 4   Year             66 non-null     object 
 5   GDP_Growth_Rate  64 non-null     float64
dtypes: float64(1), object(5)
memory usage: 3.2+ KB


We want to select the years after 2000, so we have to convert Year from string to integer.

In [4]:
GDP_US['Year'].unique()

array(['1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967',
       '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975',
       '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983',
       '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991',
       '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999',
       '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007',
       '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
       '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023',
       '2024', 'Unnamed: 69'], dtype=object)

Replace 'Unnamed: 69' with NA, then convert the column to the nullable integer type Int64.

In [5]:
import numpy as np
GDP_US['Year']=GDP_US['Year'].replace('Unnamed: 69',np.nan)
GDP_US['Year']=GDP_US['Year'].astype('Int64')
GDP_US.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Country Name     66 non-null     object 
 1   Country Code     66 non-null     object 
 2   Indicator Name   66 non-null     object 
 3   Indicator Code   66 non-null     object 
 4   Year             65 non-null     Int64  
 5   GDP_Growth_Rate  64 non-null     float64
dtypes: Int64(1), float64(1), object(4)
memory usage: 3.3+ KB
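
An alternative sketch of the same conversion, not the approach used in this notebook: `pd.to_numeric` with `errors="coerce"` turns any non-numeric label into a missing value in one step, without naming the stray label explicitly. The short series below is illustrative only.

```python
import pandas as pd

# Illustrative labels, including one non-numeric entry like the stray column name.
years = pd.Series(["1999", "2000", "2001", "Unnamed: 69"])

# Non-numeric strings become NaN, then the column is cast to nullable Int64.
clean = pd.to_numeric(years, errors="coerce").astype("Int64")

# Missing years are treated as False in the comparison mask, so they drop out.
after_2000 = clean[clean > 2000]
print(after_2000.tolist())
```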


Let's draw the line chart.

In [6]:
import plotly.express as px
from IPython.display import HTML
fig=px.line(GDP_US[GDP_US['Year']>2000],
        x='Year',
        y='GDP_Growth_Rate',
        title='The GDP growth of the US After 2000',
        labels={'GDP_Growth_Rate':'GDP Growth Rate'})
HTML(fig.to_html(include_plotlyjs="cdn", full_html=False))

## Step 2
Let us take a look at the unemployment rate.

In [7]:
unemployment=pd.read_csv('Project2_unemployment.csv',header=2)
unemployment.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
0,Aruba,ABW,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,,,,,,,...,,,,,,,,,,
1,Africa Eastern and Southern,AFE,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,,,,,,,...,7.194666,7.346331,7.360513,7.584419,8.191395,8.577385,7.985202,7.806365,7.772654,
2,Afghanistan,AFG,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,,,,,,,...,10.133,11.184,11.196,11.185,11.71,11.994,14.1,13.991,13.295,
3,Africa Western and Central,AFW,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,,,,,,,...,4.157574,4.274196,4.323631,4.395271,4.852393,4.736732,3.658573,3.277245,3.218313,
4,Angola,AGO,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,,,,,,,...,16.575,16.61,16.594,16.497,16.69,15.799,14.602,14.537,14.464,


Take a look at the U.S. data.

In [8]:
ueply_US=unemployment[unemployment['Country Name']=='United States']
ueply_US.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2016,2017,2018,2019,2020,2021,2022,2023,2024,Unnamed: 69
251,United States,USA,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,,,,,,,...,4.869,4.355,3.896,3.669,8.055,5.349,3.65,3.638,4.106,


Melt the data into long format, as we did for GDP growth.

In [9]:
ueply_US=pd.melt(
    ueply_US,
    id_vars=['Country Name','Country Code','Indicator Name','Indicator Code'],
    var_name='Year',
    value_name='unemployment_rate'
            )
ueply_US.head()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,Year,unemployment_rate
0,United States,USA,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,1960,
1,United States,USA,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,1961,
2,United States,USA,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,1962,
3,United States,USA,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,1963,
4,United States,USA,"Unemployment, total (% of total labor force) (...",SL.UEM.TOTL.ZS,1964,


In [10]:
ueply_US['Year'].unique()


array(['1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967',
       '1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975',
       '1976', '1977', '1978', '1979', '1980', '1981', '1982', '1983',
       '1984', '1985', '1986', '1987', '1988', '1989', '1990', '1991',
       '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999',
       '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007',
       '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015',
       '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023',
       '2024', 'Unnamed: 69'], dtype=object)

Convert this column into an integer column.

In [11]:
ueply_US['Year']=ueply_US['Year'].replace('Unnamed: 69',np.nan).astype('Int64')

Draw the graph

In [12]:
fig2=px.line(ueply_US[ueply_US['Year']>2000],
            x='Year',
            y='unemployment_rate',
            title='The Unemployment Rate of the US After 2000',
            labels={'unemployment_rate':'Unemployment Rate'})
HTML(fig2.to_html(include_plotlyjs="cdn", full_html=False))

## Step 3

Let's put these two line charts into one graph.

Start by selecting the data after 2000.

In [15]:
GDP_US_2000= GDP_US[GDP_US['Year']>2000]
ueply_US_2000=ueply_US[ueply_US['Year']>2000]


In [16]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=GDP_US_2000['Year'], y= GDP_US_2000['GDP_Growth_Rate'], name="GDP Growth Rate"),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=ueply_US_2000['Year'], y=ueply_US_2000['unemployment_rate'], name="Unemployment Rate"),
    secondary_y=True,
)

# Add figure title
fig.update_layout(
    title_text="GDP Growth Rate against Unemployment Rate in the US After 2000"
)

# Set x-axis title
fig.update_xaxes(title_text="Year")

# Set y-axes titles
fig.update_yaxes(title_text="GDP Growth Rate", secondary_y=False)
fig.update_yaxes(title_text="Unemployment Rate", secondary_y=True)
fig.update_traces(mode="lines")

HTML(fig.to_html(include_plotlyjs="cdn", full_html=False))

Judging from this graph alone, there is no clear relationship between GDP growth and the unemployment rate.
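
A visual impression can be checked with a number. The sketch below is one possible way to do that, not part of the original analysis: it merges the two long tables on `Year` and computes the Pearson correlation. The helper name `growth_unemployment_corr` is hypothetical, and the toy numbers are made up purely to show the call; they are not World Bank values.

```python
import pandas as pd

def growth_unemployment_corr(gdp, unemp):
    """Pearson correlation between the two series over their shared years."""
    merged = gdp.merge(unemp, on="Year", how="inner")
    # Keep only years where both values are present.
    merged = merged.dropna(subset=["GDP_Growth_Rate", "unemployment_rate"])
    return merged["GDP_Growth_Rate"].astype(float).corr(
        merged["unemployment_rate"].astype(float)
    )

# Made-up numbers, only to demonstrate the function; not real data.
toy_gdp = pd.DataFrame({"Year": [1, 2, 3, 4], "GDP_Growth_Rate": [1.0, 2.0, 3.0, 4.0]})
toy_unemp = pd.DataFrame({"Year": [1, 2, 3, 4], "unemployment_rate": [8.0, 6.0, 4.0, 2.0]})
toy_corr = growth_unemployment_corr(toy_gdp, toy_unemp)
print(toy_corr)
```

With the notebook's data, the call would be `growth_unemployment_corr(GDP_US_2000[['Year','GDP_Growth_Rate']], ueply_US_2000[['Year','unemployment_rate']])`. A value near zero would be consistent with the impression from the chart, while a value far from zero would suggest taking a closer look.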