**DATA CLEANING**

Step 1: Aligning time frames  
All datasets should start and end in the same year. Currently, the wage data covers 1973-2022, the inflation data 1913-2022, and the GDP data 1973-2021, so we trim every dataset to 1973-2021, the range they all share. (The fed funds rate data, which starts in 1954, is trimmed the same way in Step 4.)
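The alignment rule above amounts to intersecting the year ranges of the datasets. A minimal sketch, assuming each dataset has a year column named `Year` (the wage data uses lowercase `year` until it is renamed later in this notebook). The helpers `common_year_range` and `trim_to_range` are hypothetical names, and the toy frames contain only the year spans quoted above:

```python
import pandas as pd

def common_year_range(frames, year_col='Year'):
    """Return the (start, end) years that every frame covers."""
    start = max(int(df[year_col].min()) for df in frames)
    end = min(int(df[year_col].max()) for df in frames)
    return start, end

def trim_to_range(df, start, end, year_col='Year'):
    """Keep only rows whose year lies in [start, end], inclusive."""
    return df[df[year_col].between(start, end)].reset_index(drop=True)

# Toy frames holding only the year spans of the real datasets
wages = pd.DataFrame({'Year': range(1973, 2023)})      # 1973-2022
inflation = pd.DataFrame({'Year': range(1913, 2023)})  # 1913-2022
gdp = pd.DataFrame({'Year': range(1973, 2022)})        # 1973-2021

start, end = common_year_range([wages, inflation, gdp])
print(start, end)  # 1973 2021

aligned = [trim_to_range(df, start, end) for df in (wages, inflation, gdp)]
print([len(df) for df in aligned])  # [49, 49, 49]
```

Computing the overlap, rather than hard-coding 1973 and 2021, lets the range update automatically if a source file is refreshed; for a fixed snapshot like this one, hard-coded bounds work equally well.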

Step 2: Averaging CPI data and calculating rates     
The inflation data gives the CPI for each month of each year, but our analysis is year by year, not month by month. So we average the twelve monthly values to get an annual CPI for each year. We then use the annual CPI values to calculate the annual inflation rate (the percent change in CPI from the previous year), which is the variable our regressions use; you can see those in the notebook where we run our tests.
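Both calculations in this step can be checked by hand on the first two years. The sketch below copies the 1973 and 1974 monthly CPI values from the inflation table in this notebook. It uses pandas' `pct_change()` for the year-over-year percent change, which is one way to compute the rate and not necessarily how the cells below do it:

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Monthly CPI for 1973 and 1974, copied from the inflation table in this notebook
cpi = pd.DataFrame(
    [[1973, 42.6, 42.9, 43.3, 43.6, 43.9, 44.2, 44.3, 45.1, 45.2, 45.6, 45.9, 46.2],
     [1974, 46.6, 47.2, 47.8, 48.0, 48.6, 49.0, 49.4, 50.0, 50.6, 51.1, 51.5, 51.9]],
    columns=['Year'] + months,
)

# Annual CPI = mean of the twelve monthly readings
cpi['Annual_CPI'] = cpi[months].mean(axis=1).round(2)

# Inflation rate = percent change from the previous year's annual CPI
cpi['Annual_Inflation'] = (cpi['Annual_CPI'].pct_change() * 100).round(2)

print(cpi[['Year', 'Annual_CPI', 'Annual_Inflation']])
```

The results (44.4 and 49.31 for annual CPI, 11.06% inflation for 1974) match the notebook's output. The first year's rate comes out as NaN because the frame has no earlier year, which is why the notebook hard-codes the previous year's CPI (41.8) to get the 1973 rate.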

Step 3: Dropping unnecessary wage data  
For this assignment, we want to capture the most notable and extreme trends: the large disparities and the significant shifts. We are not trying to analyze lower-middle-class to middle-class wages, so we drop the intermediate wage bands (0-75%, 75-100%, 100-125%, 125-200%, and 200-300% of poverty wages) and keep these columns: 
- annual_poverty-level_wage
- hourly_poverty-level_wage
- share_below_poverty_wages
- 300%+_of_poverty_wages
- men_share_below_poverty_wages
- men_300%+_of_poverty_wages
- women_share_below_poverty_wages
- women_300%+_of_poverty_wages
- white_share_below_poverty_wages
- white_men_share_below_poverty_wages
- white_women_share_below_poverty_wages
- black_share_below_poverty_wages
- black_men_share_below_poverty_wages
- black_women_share_below_poverty_wages
- hispanic_share_below_poverty_wages
- hispanic_men_share_below_poverty_wages
- hispanic_women_share_below_poverty_wages  
It is our group's belief that these columns capture the most important insights.  
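Instead of listing every column to drop, the same selection can be written as a pattern over the column names. A sketch, assuming the kept columns are exactly `Year` plus those ending in `poverty-level_wage`, `share_below_poverty_wages`, or `300%+_of_poverty_wages` (true of the list above). The two rows are the 1973 and 1974 values for a subset of columns from the wage table in this notebook:

```python
import pandas as pd

# Subset of the wage columns; values are the 1973 and 1974 rows shown in this notebook
wages = pd.DataFrame({
    'Year': [1973, 1974],
    'annual_poverty-level_wage': [4701, 5158],
    'hourly_poverty-level_wage': [2.26, 2.48],
    '0-75%_of_poverty_wages': [9.3, 7.5],
    '75-100%_of_poverty_wages': [16.3, 16.8],
    'share_below_poverty_wages': [25.6, 24.3],
    '100-125%_of_poverty_wages': [12.7, 15.2],
    '125-200%_of_poverty_wages': [32.9, 31.0],
    '200-300%_of_poverty_wages': [19.8, 20.6],
    '300%+_of_poverty_wages': [9.0, 8.9],
})

# Keep Year, the poverty-level wage thresholds, the below-poverty shares,
# and the 300%+ shares; '\+' matches a literal plus sign, so the
# '200-300%' band is not caught by the last alternative
keep_pattern = (r'^Year$|poverty-level_wage$|share_below_poverty_wages$'
                r'|300%\+_of_poverty_wages$')
kept = wages.filter(regex=keep_pattern)

print(list(kept.columns))
```

A pattern is shorter than an explicit drop list but can silently keep or drop a column if the source file gains one, so it is worth asserting the expected column count (18 including `Year` for the real data) after filtering.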

Step 4: Average the fed funds rate dataset   
The fed funds rate dataset is more granular than we need: it reports a rate for each day. To match the annual frequency of the other datasets, we drop days with no reported rate, average the daily rates within each year, and keep only 1973-2021.
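A sketch of the annual averaging on a handful of rows. The 1954 rows and the empty 4/2/2021 row come from the fed funds table in this notebook; the two 1955 rows are made-up values added only so there is a second year to average. This version parses the strings with `pd.to_datetime`, an alternative to splitting them on `/`:

```python
import numpy as np
import pandas as pd

# 1954 rows and the NaN row are from the table in this notebook;
# the 1955 rows are made-up values for illustration
rates = pd.DataFrame({
    'date': ['7/1/1954', '7/2/1954', '7/3/1954', '7/4/1954', '7/5/1954',
             '1/3/1955', '1/4/1955', '4/2/2021'],
    'value': [1.13, 1.25, 1.25, 1.25, 0.88, 1.30, 1.50, np.nan],
})

# Drop days with no reported rate, then parse the M/D/YYYY strings as dates
rates = rates.dropna(subset=['value'])
dates = pd.to_datetime(rates['date'], format='%m/%d/%Y')

# Average all daily readings that fall in the same calendar year
annual = (rates.groupby(dates.dt.year)['value'].mean()
          .rename_axis('Year')
          .reset_index(name='Interest_Rate'))
print(annual)
```

Each day is weighted equally, so a year's value is the plain mean of its daily readings (1.152 for the five 1954 days here). The 2021 row disappears entirely because its only reading was missing.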

Step 5: Create analysis dataset  
Now that each dataset is cleaned, we merge them into a single table on the Year column. 
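With several tables sharing a `Year` key, the merges can be chained in one expression. A sketch using `functools.reduce` and the `validate` argument of `pd.merge`, which raises `MergeError` if a year appears more than once in either table. The values are the 1973 and 1974 entries from the tables in this notebook, one column per source:

```python
from functools import reduce

import pandas as pd

# One column from each cleaned table, 1973-1974 values from this notebook
inflation = pd.DataFrame({'Year': [1973, 1974], 'Annual_Inflation': [6.22, 11.06]})
wages = pd.DataFrame({'Year': [1973, 1974], 'share_below_poverty_wages': [25.6, 24.3]})
interest = pd.DataFrame({'Year': [1973, 1974], 'Interest_Rate': [8.742274, 10.511397]})
gdp = pd.DataFrame({'Year': [1973, 1974], 'GDP_Per_Capita': [6726.36, 7225.69]})

# Merge every table on Year; validate='one_to_one' rejects duplicate years
analysis = reduce(
    lambda left, right: pd.merge(left, right, on='Year', how='outer',
                                 validate='one_to_one'),
    [inflation, wages, interest, gdp],
)

# With aligned inputs, an outer join should introduce no missing values
assert not analysis.isna().any().any()
print(analysis)
```

Because the years were aligned first, an outer join should add no NaN rows; the `assert` makes that assumption explicit instead of leaving missing values to surface later.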

Step 6: Add a final column  
We add a column, diff_poverty_share, that contains the absolute difference between "men_share_below_poverty_wages" and "women_share_below_poverty_wages". Our analyses rely on this column; phase5.ipynb explains why we created it.
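A sketch of the new column on the first three years of the wage table, showing that it is an absolute gap in percentage points. The input shares are copied from this notebook, and the results match its `diff_poverty_share` output:

```python
import pandas as pd

# 1973-1975 shares copied from the wage table in this notebook
df = pd.DataFrame({
    'Year': [1973, 1974, 1975],
    'men_share_below_poverty_wages': [15.4, 14.3, 16.0],
    'women_share_below_poverty_wages': [40.1, 38.6, 39.7],
})

# Absolute gap in percentage points between the two shares
df['diff_poverty_share'] = (df['men_share_below_poverty_wages']
                            - df['women_share_below_poverty_wages']).abs().round(2)
print(df['diff_poverty_share'].tolist())  # [24.7, 24.3, 23.7]
```

Taking the absolute value drops the sign. In every row displayed in this notebook the women's share is higher, so nothing is lost there, but a signed difference (women minus men) would keep the direction if the gap ever reversed.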

These are the main edits. Throughout the code you may also see some smaller preprocessing steps, which we explain as they come up.

In [522]:
import pandas as pd
import numpy as np

In [523]:
inflation_data = pd.read_csv('inflation.csv')
inflation_data

Unnamed: 0.1,Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
0,0,1913.0,9.800,9.800,9.800,9.800,9.700,9.800,9.900,9.900,10.000,10.000,10.100,10.000
1,1,1914.0,10.000,9.900,9.900,9.800,9.900,9.900,10.000,10.200,10.200,10.100,10.200,10.100
2,2,1915.0,10.100,10.000,9.900,10.000,10.100,10.100,10.100,10.100,10.100,10.200,10.300,10.300
3,3,1916.0,10.400,10.400,10.500,10.600,10.700,10.800,10.800,10.900,11.100,11.300,11.500,11.600
4,4,1917.0,11.700,12.000,12.000,12.600,12.800,13.000,12.800,13.000,13.300,13.500,13.500,13.700
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
105,105,2018.0,247.867,248.991,249.554,250.546,251.588,251.989,252.006,252.146,252.439,252.885,252.038,251.233
106,106,2019.0,251.712,252.776,254.202,255.548,256.092,256.143,256.571,256.558,256.759,257.346,257.208,256.974
107,107,2020.0,257.971,258.678,258.115,256.389,256.394,257.797,259.101,259.918,260.280,260.388,260.229,260.474
108,108,2021.0,261.582,263.014,264.877,267.054,269.195,271.696,273.003,273.567,274.310,276.589,277.948,278.802


In [524]:
wage_data = pd.read_csv('poverty_level_wages.csv')
wage_data

Unnamed: 0,year,annual_poverty-level_wage,hourly_poverty-level_wage,0-75%_of_poverty_wages,75-100%_of_poverty_wages,share_below_poverty_wages,100-125%_of_poverty_wages,125-200%_of_poverty_wages,200-300%_of_poverty_wages,300%+_of_poverty_wages,...,women_300%+_of_poverty_wages,white_share_below_poverty_wages,white_men_share_below_poverty_wages,white_women_share_below_poverty_wages,black_share_below_poverty_wages,black_men_share_below_poverty_wages,black_women_share_below_poverty_wages,hispanic_share_below_poverty_wages,hispanic_men_share_below_poverty_wages,hispanic_women_share_below_poverty_wages
0,2022,27733,13.33,4.7,7.5,12.2,15.2,32.7,19.5,20.4,...,16.7,10.7,8.1,13.5,16.2,14.5,17.7,15.4,12.3,19.2
1,2021,25688,12.35,3.8,9.1,12.9,15.0,30.3,20.4,21.5,...,17.4,11.1,8.7,13.8,18.0,15.7,19.9,16.8,14.2,20.0
2,2020,24544,11.8,3.4,8.6,12.0,13.0,31.5,20.8,22.6,...,18.1,10.4,7.9,13.1,18.1,15.4,20.5,14.9,12.0,18.6
3,2019,24242,11.65,4.4,11.3,15.7,14.2,31.2,18.7,20.3,...,16.0,13.4,10.5,16.6,22.3,19.2,24.9,19.7,15.8,24.6
4,2018,23809,11.45,5.0,13.0,18.1,14.2,29.3,19.1,19.4,...,15.2,15.0,11.6,18.7,26.0,23.1,28.5,24.3,19.3,30.7
5,2017,23244,11.18,5.1,15.3,20.4,12.0,29.8,18.4,19.4,...,15.1,16.8,13.4,20.5,29.1,26.4,31.4,28.5,23.7,34.8
6,2016,22755,10.94,5.6,15.2,20.8,13.2,28.4,18.8,18.9,...,14.9,17.3,14.0,20.8,28.8,25.1,32.0,29.2,24.7,35.1
7,2015,22464,10.8,6.7,16.3,23.0,13.0,27.1,18.7,18.2,...,14.2,18.9,15.2,22.9,31.8,28.3,34.8,33.5,29.1,39.2
8,2014,22422,10.78,8.2,16.2,24.4,12.3,27.7,18.7,16.9,...,13.2,20.2,16.0,24.6,32.4,30.2,34.3,36.2,31.7,42.1
9,2013,22048,10.6,6.4,18.6,25.0,12.2,27.5,18.0,17.3,...,13.2,20.8,16.9,24.9,31.9,29.0,34.4,38.2,34.8,42.7


In [525]:
interest_data = pd.read_csv('fed-funds-rate.csv')
interest_data

Unnamed: 0,date,value
0,7/1/1954,1.13
1,7/2/1954,1.25
2,7/3/1954,1.25
3,7/4/1954,1.25
4,7/5/1954,0.88
...,...,...
24060,4/2/2021,
24061,4/3/2021,
24062,4/4/2021,
24063,4/5/2021,


In [526]:
gdp_data = pd.read_csv('gdp_per_capita.csv')
gdp_data

Unnamed: 0,Year,GDP_Per_Capita
0,1973,6726.36
1,1974,7225.69
2,1975,7801.46
3,1976,8592.25
4,1977,9452.58
5,1978,10564.95
6,1979,11674.18
7,1980,12574.79
8,1981,13976.11
9,1982,14433.79


DATA CLEANING - STEP 1

In [527]:
# INFLATION DATA

# First, change the type of the Year column from float to int
inflation_data['Year'] = inflation_data['Year'].astype(int)

# Then drop rows not in the interval of 1973-2021
inflation_data = inflation_data[(inflation_data['Year'] >= 1973) \
                                & (inflation_data['Year'] <= 2021)]

# While we're at it, we'll also drop the "Unnamed: 0" 
# column that was read in weirdly since we don't want it
inflation_data = inflation_data.drop('Unnamed: 0', axis=1)

inflation_data

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
60,1973,42.6,42.9,43.3,43.6,43.9,44.2,44.3,45.1,45.2,45.6,45.9,46.2
61,1974,46.6,47.2,47.8,48.0,48.6,49.0,49.4,50.0,50.6,51.1,51.5,51.9
62,1975,52.1,52.5,52.7,52.9,53.2,53.6,54.2,54.3,54.6,54.9,55.3,55.5
63,1976,55.6,55.8,55.9,56.1,56.5,56.8,57.1,57.4,57.6,57.9,58.0,58.2
64,1977,58.5,59.1,59.5,60.0,60.3,60.7,61.0,61.2,61.4,61.6,61.9,62.1
65,1978,62.5,62.9,63.4,63.9,64.5,65.2,65.7,66.0,66.5,67.1,67.4,67.7
66,1979,68.3,69.1,69.8,70.6,71.5,72.3,73.1,73.8,74.6,75.2,75.9,76.7
67,1980,77.8,78.9,80.1,81.0,81.8,82.7,82.7,83.3,84.0,84.8,85.5,86.3
68,1981,87.0,87.9,88.5,89.1,89.8,90.6,91.6,92.3,93.2,93.4,93.7,94.0
69,1982,94.3,94.6,94.5,94.9,95.8,97.0,97.5,97.7,97.9,98.2,98.0,97.6


In [528]:
# WAGE DATA

# First, let's reverse the order of the years, because currently
# it starts with 2022 and goes back to 1973
wage_data.sort_values(by='year', ascending=True, inplace=True)

# Then, we just have to delete the 2022 row because it 
# already starts at 1973
wage_data = wage_data[wage_data['year'] != 2022]

# Let's also just rename 'year' to 'Year' since we're 
# using uppercase for other dataframes
wage_data= wage_data.rename(columns={'year' : 'Year'})

wage_data

Unnamed: 0,Year,annual_poverty-level_wage,hourly_poverty-level_wage,0-75%_of_poverty_wages,75-100%_of_poverty_wages,share_below_poverty_wages,100-125%_of_poverty_wages,125-200%_of_poverty_wages,200-300%_of_poverty_wages,300%+_of_poverty_wages,...,women_300%+_of_poverty_wages,white_share_below_poverty_wages,white_men_share_below_poverty_wages,white_women_share_below_poverty_wages,black_share_below_poverty_wages,black_men_share_below_poverty_wages,black_women_share_below_poverty_wages,hispanic_share_below_poverty_wages,hispanic_men_share_below_poverty_wages,hispanic_women_share_below_poverty_wages
49,1973,4701,2.26,9.3,16.3,25.6,12.7,32.9,19.8,9.0,...,2.4,23.8,13.7,38.5,37.3,25.3,50.6,34.9,27.1,48.2
48,1974,5158,2.48,7.5,16.8,24.3,15.2,31.0,20.6,8.9,...,2.3,22.7,12.8,37.0,34.6,22.8,47.4,33.7,24.5,49.7
47,1975,5595,2.69,10.2,15.7,25.9,13.9,32.0,19.5,8.7,...,2.5,24.2,14.4,38.2,36.7,25.1,49.1,35.0,27.3,47.5
46,1976,5914,2.84,7.1,18.0,25.1,15.2,31.2,19.6,9.0,...,2.8,23.6,14.1,36.7,34.8,23.8,46.2,34.8,27.2,46.3
45,1977,6284,3.02,6.2,20.7,26.9,13.6,30.3,20.5,8.7,...,2.4,25.4,15.5,38.7,36.7,28.0,45.7,36.0,25.6,52.8
44,1978,6718,3.23,5.4,20.6,26.0,15.4,29.6,19.9,9.1,...,2.3,24.5,14.7,37.5,35.5,25.2,46.0,32.6,23.8,46.5
43,1979,7360,3.54,5.1,20.7,25.8,13.6,31.1,20.7,8.9,...,2.4,24.3,14.3,37.3,33.7,25.1,42.9,34.6,25.1,49.6
42,1980,8181,3.93,4.6,21.1,25.7,13.8,31.8,19.9,8.8,...,2.4,24.2,14.4,36.5,34.1,25.3,43.0,33.6,25.3,46.5
41,1981,8956,4.31,5.3,22.5,27.8,14.4,29.1,20.5,8.2,...,2.4,26.2,16.4,38.4,35.7,26.8,44.7,37.1,29.1,49.4
40,1982,9499,4.57,10.6,17.6,28.3,13.5,29.4,19.7,9.1,...,2.8,26.6,17.3,37.8,36.9,29.3,44.4,37.5,30.6,47.7


In [529]:
# GDP DATA: already starts at 1973 and ends at 2021 so no editing is needed

DATA CLEANING - STEP 2

In [530]:
# First, we'll create the Annual_CPI column by averaging the twelve monthly values
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep',
          'Oct', 'Nov', 'Dec']
inflation_data['Annual_CPI'] = inflation_data[months].mean(axis=1).round(2)

# Drop all the month columns since they are no longer necessary
inflation_data = inflation_data.drop(columns=months)

# Now we'll create the Annual_Inflation column: the percent change in
# Annual_CPI from the previous year. The 1972 row was dropped in Step 1,
# so the 1972 annual CPI (41.8) is hard-coded to get the 1973 rate.
cpi_1972 = 41.8
previous_cpi = inflation_data['Annual_CPI'].shift(1).fillna(cpi_1972)
inflation_data['Annual_Inflation'] = (
    (inflation_data['Annual_CPI'] - previous_cpi) / previous_cpi * 100
).round(2)

inflation_data

Unnamed: 0,Year,Annual_CPI,Annual_Inflation
60,1973,44.4,6.22
61,1974,49.31,11.06
62,1975,53.82,9.15
63,1976,56.91,5.74
64,1977,60.61,6.5
65,1978,65.23,7.62
66,1979,72.58,11.27
67,1980,82.41,13.54
68,1981,90.92,10.33
69,1982,96.5,6.14


DATA CLEANING - STEP 3

In [531]:
# Create a list of the intermediate wage-band columns to drop
drop_categories = [
    '0-75%_of_poverty_wages', '75-100%_of_poverty_wages',
    '100-125%_of_poverty_wages', '125-200%_of_poverty_wages',
    '200-300%_of_poverty_wages',
    'men_0-75%_of_poverty_wages', 'men_75-100%_of_poverty_wages',
    'men_100-125%_of_poverty_wages', 'men_125-200%_of_poverty_wages',
    'men_200-300%_of_poverty_wages',
    'women_0-75%_of_poverty_wages', 'women_75-100%_of_poverty_wages',
    'women_100-125%_of_poverty_wages', 'women_125-200%_of_poverty_wages',
    'women_200-300%_of_poverty_wages',
]
wage_data = wage_data.drop(columns=drop_categories)

wage_data

Unnamed: 0,Year,annual_poverty-level_wage,hourly_poverty-level_wage,share_below_poverty_wages,300%+_of_poverty_wages,men_share_below_poverty_wages,men_300%+_of_poverty_wages,women_share_below_poverty_wages,women_300%+_of_poverty_wages,white_share_below_poverty_wages,white_men_share_below_poverty_wages,white_women_share_below_poverty_wages,black_share_below_poverty_wages,black_men_share_below_poverty_wages,black_women_share_below_poverty_wages,hispanic_share_below_poverty_wages,hispanic_men_share_below_poverty_wages,hispanic_women_share_below_poverty_wages
49,1973,4701,2.26,25.6,9.0,15.4,13.6,40.1,2.4,23.8,13.7,38.5,37.3,25.3,50.6,34.9,27.1,48.2
48,1974,5158,2.48,24.3,8.9,14.3,13.5,38.6,2.3,22.7,12.8,37.0,34.6,22.8,47.4,33.7,24.5,49.7
47,1975,5595,2.69,25.9,8.7,16.0,13.1,39.7,2.5,24.2,14.4,38.2,36.7,25.1,49.1,35.0,27.3,47.5
46,1976,5914,2.84,25.1,9.0,15.6,13.6,38.0,2.8,23.6,14.1,36.7,34.8,23.8,46.2,34.8,27.2,46.3
45,1977,6284,3.02,26.9,8.7,17.1,13.5,40.0,2.4,25.4,15.5,38.7,36.7,28.0,45.7,36.0,25.6,52.8
44,1978,6718,3.23,26.0,9.1,16.2,14.3,38.7,2.3,24.5,14.7,37.5,35.5,25.2,46.0,32.6,23.8,46.5
43,1979,7360,3.54,25.8,8.9,16.0,13.9,38.4,2.4,24.3,14.3,37.3,33.7,25.1,42.9,34.6,25.1,49.6
42,1980,8181,3.93,25.7,8.8,16.1,13.8,37.6,2.4,24.2,14.4,36.5,34.1,25.3,43.0,33.6,25.3,46.5
41,1981,8956,4.31,27.8,8.2,18.2,12.9,39.6,2.4,26.2,16.4,38.4,35.7,26.8,44.7,37.1,29.1,49.4
40,1982,9499,4.57,28.3,9.1,19.3,14.3,38.9,2.8,26.6,17.3,37.8,36.9,29.3,44.4,37.5,30.6,47.7


DATA CLEANING - STEP 4

In [532]:
# First, let's get rid of NaNs
interest_data = interest_data.dropna()

# Second, pull the year out of each M/D/YYYY date string. Doing this in one
# vectorized step (rather than assigning row by row in a loop) avoids the
# chained-assignment warning and keeps each year on its own row even though
# dropna() leaves gaps in the index.
years = interest_data['date'].str.split('/').str[2].astype(int)

# Third, let's get the average value for each year
interest_data = (interest_data.groupby(years)['value'].mean()
                 .rename_axis('Year')
                 .reset_index(name='Interest_Rate'))

# Finally, keep only 1973-2021 to match the other datasets
interest_data = interest_data[(interest_data['Year'] >= 1973)
                              & (interest_data['Year'] <= 2021)]

interest_data

Unnamed: 0,Year,Interest_Rate
19,1973,8.742274
20,1974,10.511397
21,1975,5.821178
22,1976,5.045082
23,1977,5.542301
24,1978,7.936877
25,1979,11.202795
26,1980,13.349727
27,1981,16.386356
28,1982,12.237671


DATA CLEANING - STEP 5

In [533]:
# First, we'll merge inflation, wage and interest together
analysis_df = pd.merge(inflation_data, wage_data, on='Year', how='outer')
analysis_df = pd.merge(analysis_df, interest_data, on='Year', how='outer')

analysis_df

Unnamed: 0,Year,Annual_CPI,Annual_Inflation,annual_poverty-level_wage,hourly_poverty-level_wage,share_below_poverty_wages,300%+_of_poverty_wages,men_share_below_poverty_wages,men_300%+_of_poverty_wages,women_share_below_poverty_wages,...,white_share_below_poverty_wages,white_men_share_below_poverty_wages,white_women_share_below_poverty_wages,black_share_below_poverty_wages,black_men_share_below_poverty_wages,black_women_share_below_poverty_wages,hispanic_share_below_poverty_wages,hispanic_men_share_below_poverty_wages,hispanic_women_share_below_poverty_wages,Interest_Rate
0,1973,44.4,6.22,4701,2.26,25.6,9.0,15.4,13.6,40.1,...,23.8,13.7,38.5,37.3,25.3,50.6,34.9,27.1,48.2,8.742274
1,1974,49.31,11.06,5158,2.48,24.3,8.9,14.3,13.5,38.6,...,22.7,12.8,37.0,34.6,22.8,47.4,33.7,24.5,49.7,10.511397
2,1975,53.82,9.15,5595,2.69,25.9,8.7,16.0,13.1,39.7,...,24.2,14.4,38.2,36.7,25.1,49.1,35.0,27.3,47.5,5.821178
3,1976,56.91,5.74,5914,2.84,25.1,9.0,15.6,13.6,38.0,...,23.6,14.1,36.7,34.8,23.8,46.2,34.8,27.2,46.3,5.045082
4,1977,60.61,6.5,6284,3.02,26.9,8.7,17.1,13.5,40.0,...,25.4,15.5,38.7,36.7,28.0,45.7,36.0,25.6,52.8,5.542301
5,1978,65.23,7.62,6718,3.23,26.0,9.1,16.2,14.3,38.7,...,24.5,14.7,37.5,35.5,25.2,46.0,32.6,23.8,46.5,7.936877
6,1979,72.58,11.27,7360,3.54,25.8,8.9,16.0,13.9,38.4,...,24.3,14.3,37.3,33.7,25.1,42.9,34.6,25.1,49.6,11.202795
7,1980,82.41,13.54,8181,3.93,25.7,8.8,16.1,13.8,37.6,...,24.2,14.4,36.5,34.1,25.3,43.0,33.6,25.3,46.5,13.349727
8,1981,90.92,10.33,8956,4.31,27.8,8.2,18.2,12.9,39.6,...,26.2,16.4,38.4,35.7,26.8,44.7,37.1,29.1,49.4,16.386356
9,1982,96.5,6.14,9499,4.57,28.3,9.1,19.3,14.3,38.9,...,26.6,17.3,37.8,36.9,29.3,44.4,37.5,30.6,47.7,12.237671


In [534]:
# Then, merge this with GDP
analysis_df = pd.merge(analysis_df, gdp_data, on='Year', how='outer')

analysis_df

Unnamed: 0,Year,Annual_CPI,Annual_Inflation,annual_poverty-level_wage,hourly_poverty-level_wage,share_below_poverty_wages,300%+_of_poverty_wages,men_share_below_poverty_wages,men_300%+_of_poverty_wages,women_share_below_poverty_wages,...,white_men_share_below_poverty_wages,white_women_share_below_poverty_wages,black_share_below_poverty_wages,black_men_share_below_poverty_wages,black_women_share_below_poverty_wages,hispanic_share_below_poverty_wages,hispanic_men_share_below_poverty_wages,hispanic_women_share_below_poverty_wages,Interest_Rate,GDP_Per_Capita
0,1973,44.4,6.22,4701,2.26,25.6,9.0,15.4,13.6,40.1,...,13.7,38.5,37.3,25.3,50.6,34.9,27.1,48.2,8.742274,6726.36
1,1974,49.31,11.06,5158,2.48,24.3,8.9,14.3,13.5,38.6,...,12.8,37.0,34.6,22.8,47.4,33.7,24.5,49.7,10.511397,7225.69
2,1975,53.82,9.15,5595,2.69,25.9,8.7,16.0,13.1,39.7,...,14.4,38.2,36.7,25.1,49.1,35.0,27.3,47.5,5.821178,7801.46
3,1976,56.91,5.74,5914,2.84,25.1,9.0,15.6,13.6,38.0,...,14.1,36.7,34.8,23.8,46.2,34.8,27.2,46.3,5.045082,8592.25
4,1977,60.61,6.5,6284,3.02,26.9,8.7,17.1,13.5,40.0,...,15.5,38.7,36.7,28.0,45.7,36.0,25.6,52.8,5.542301,9452.58
5,1978,65.23,7.62,6718,3.23,26.0,9.1,16.2,14.3,38.7,...,14.7,37.5,35.5,25.2,46.0,32.6,23.8,46.5,7.936877,10564.95
6,1979,72.58,11.27,7360,3.54,25.8,8.9,16.0,13.9,38.4,...,14.3,37.3,33.7,25.1,42.9,34.6,25.1,49.6,11.202795,11674.18
7,1980,82.41,13.54,8181,3.93,25.7,8.8,16.1,13.8,37.6,...,14.4,36.5,34.1,25.3,43.0,33.6,25.3,46.5,13.349727,12574.79
8,1981,90.92,10.33,8956,4.31,27.8,8.2,18.2,12.9,39.6,...,16.4,38.4,35.7,26.8,44.7,37.1,29.1,49.4,16.386356,13976.11
9,1982,96.5,6.14,9499,4.57,28.3,9.1,19.3,14.3,38.9,...,17.3,37.8,36.9,29.3,44.4,37.5,30.6,47.7,12.237671,14433.79


DATA CLEANING - STEP 6

In [535]:
analysis_df['diff_poverty_share'] = abs(analysis_df['men_share_below_poverty_wages'] - analysis_df['women_share_below_poverty_wages']).round(2)
analysis_df

Unnamed: 0,Year,Annual_CPI,Annual_Inflation,annual_poverty-level_wage,hourly_poverty-level_wage,share_below_poverty_wages,300%+_of_poverty_wages,men_share_below_poverty_wages,men_300%+_of_poverty_wages,women_share_below_poverty_wages,...,white_women_share_below_poverty_wages,black_share_below_poverty_wages,black_men_share_below_poverty_wages,black_women_share_below_poverty_wages,hispanic_share_below_poverty_wages,hispanic_men_share_below_poverty_wages,hispanic_women_share_below_poverty_wages,Interest_Rate,GDP_Per_Capita,diff_poverty_share
0,1973,44.4,6.22,4701,2.26,25.6,9.0,15.4,13.6,40.1,...,38.5,37.3,25.3,50.6,34.9,27.1,48.2,8.742274,6726.36,24.7
1,1974,49.31,11.06,5158,2.48,24.3,8.9,14.3,13.5,38.6,...,37.0,34.6,22.8,47.4,33.7,24.5,49.7,10.511397,7225.69,24.3
2,1975,53.82,9.15,5595,2.69,25.9,8.7,16.0,13.1,39.7,...,38.2,36.7,25.1,49.1,35.0,27.3,47.5,5.821178,7801.46,23.7
3,1976,56.91,5.74,5914,2.84,25.1,9.0,15.6,13.6,38.0,...,36.7,34.8,23.8,46.2,34.8,27.2,46.3,5.045082,8592.25,22.4
4,1977,60.61,6.5,6284,3.02,26.9,8.7,17.1,13.5,40.0,...,38.7,36.7,28.0,45.7,36.0,25.6,52.8,5.542301,9452.58,22.9
5,1978,65.23,7.62,6718,3.23,26.0,9.1,16.2,14.3,38.7,...,37.5,35.5,25.2,46.0,32.6,23.8,46.5,7.936877,10564.95,22.5
6,1979,72.58,11.27,7360,3.54,25.8,8.9,16.0,13.9,38.4,...,37.3,33.7,25.1,42.9,34.6,25.1,49.6,11.202795,11674.18,22.4
7,1980,82.41,13.54,8181,3.93,25.7,8.8,16.1,13.8,37.6,...,36.5,34.1,25.3,43.0,33.6,25.3,46.5,13.349727,12574.79,21.5
8,1981,90.92,10.33,8956,4.31,27.8,8.2,18.2,12.9,39.6,...,38.4,35.7,26.8,44.7,37.1,29.1,49.4,16.386356,13976.11,21.4
9,1982,96.5,6.14,9499,4.57,28.3,9.1,19.3,14.3,38.9,...,37.8,36.9,29.3,44.4,37.5,30.6,47.7,12.237671,14433.79,19.6


DATA CLEANING COMPLETE - NOW WE EXPORT THE ANALYSIS-READY DATAFRAME TO A CSV

In [536]:
analysis_df.to_csv('analysis.csv', index=False)
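Writing with `index=False` keeps the row index out of the file, so reading it back does not produce an extra `Unnamed: 0` column like the one `inflation.csv` carried (which suggests that file was saved with its index). A minimal round-trip check on a toy frame, using an in-memory buffer in place of `analysis.csv`; the values are the 1973-1974 rows from this notebook:

```python
import io

import pandas as pd

# A small stand-in for analysis_df (1973-1974 values from this notebook)
analysis_df = pd.DataFrame({
    'Year': [1973, 1974],
    'Annual_CPI': [44.4, 49.31],
    'diff_poverty_share': [24.7, 24.3],
})

# Write to an in-memory buffer instead of 'analysis.csv' so the check has no side effects
buffer = io.StringIO()
analysis_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# No 'Unnamed: 0' column appears, and the data survives the round trip
pd.testing.assert_frame_equal(reloaded, analysis_df)
print(list(reloaded.columns))
```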