It's sometimes argued that raising the minimum wage makes employing people more costly and that, as a result, employment may drop.

In [14]:
import pandas as pd

unemp_county = pd.read_csv("Unemployment by county.csv")
unemp_county.head()

Unnamed: 0,Year,Month,State,County,Rate
0,2015,February,Mississippi,Newton County,6.1
1,2015,February,Mississippi,Panola County,9.4
2,2015,February,Mississippi,Monroe County,7.9
3,2015,February,Mississippi,Hinds County,6.1
4,2015,February,Mississippi,Kemper County,10.6


### We want to map the min wage by state to this df:
Let's first load the Minimum Wage Data and drop the columns that are missing data

In [24]:
df = pd.read_csv("Minimum Wage Data.csv")

act_min_wage = pd.DataFrame()

for name,group in df.groupby("State"):
    if act_min_wage.empty:
        act_min_wage = group.set_index("Year")["State.Minimum.Wage"].to_frame().rename(columns={"State.Minimum.Wage":name})
    else:
        act_min_wage = act_min_wage.join(group.set_index("Year")["State.Minimum.Wage"].to_frame().rename(columns={"State.Minimum.Wage":name}))
        
act_min_wage.head()

Year,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,District of Columbia,Florida,...,Tennessee,Texas,U.S. Virgin Islands,Utah,Vermont,Virginia,Washington,West Virginia,Wisconsin,Wyoming
1968,0.0,2.1,0.468,0.15625,1.65,1.0,1.4,1.25,1.25,0.0,...,0.0,0.0,0.0,1.0,1.4,0.0,1.6,1.0,1.25,1.2
1969,0.0,2.1,0.468,0.15625,1.65,1.0,1.4,1.25,1.25,0.0,...,0.0,0.0,0.0,1.0,1.4,0.0,1.6,1.0,1.25,1.2
1970,0.0,2.1,0.468,1.1,1.65,1.0,1.6,1.25,1.6,0.0,...,0.0,0.0,0.0,1.0,1.6,0.0,1.6,1.0,1.3,1.3
1971,0.0,2.1,0.468,1.1,1.65,1.0,1.6,1.25,1.6,0.0,...,0.0,0.0,0.0,1.0,1.6,0.0,1.6,1.0,1.3,1.3
1972,0.0,2.1,0.468,1.2,1.65,1.0,1.85,1.6,1.6,0.0,...,0.0,1.4,0.0,1.2,1.6,0.0,1.6,1.2,1.45,1.5
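
As an aside, the groupby-and-join loop above builds a wide table with one column per state. A minimal sketch of the same reshape with `DataFrame.pivot`, using a tiny made-up frame in place of the real CSV (the column names match the ones used above; the values are only illustrative). It assumes each (Year, State) pair appears exactly once, otherwise `pivot` raises a ValueError:

```python
import pandas as pd

# Tiny stand-in for "Minimum Wage Data.csv" (illustrative values only).
df = pd.DataFrame({
    "Year": [1968, 1968, 1969, 1969],
    "State": ["Alaska", "Alabama", "Alaska", "Alabama"],
    "State.Minimum.Wage": [2.1, 0.0, 2.1, 0.0],
})

# One row per Year, one column per State, values taken from the wage column.
act_min_wage = df.pivot(index="Year", columns="State", values="State.Minimum.Wage")
print(act_min_wage)
```

The result has the same shape as the loop's output: years in the index and state names as columns (sorted alphabetically).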


In [12]:
import numpy as np

# Treat 0 as "no data", then drop every state (column) that has any missing year.
act_min_wage = act_min_wage.replace(0, np.nan).dropna(axis=1)
act_min_wage.head()

Year,Alaska,Arkansas,California,Colorado,Connecticut,Delaware,District of Columbia,Guam,Hawaii,Idaho,...,Oregon,Pennsylvania,Rhode Island,South Dakota,Utah,Vermont,Washington,West Virginia,Wisconsin,Wyoming
1968,2.1,0.15625,1.65,1.0,1.4,1.25,1.25,1.25,1.25,1.15,...,1.25,1.15,1.4,0.425,1.0,1.4,1.6,1.0,1.25,1.2
1969,2.1,0.15625,1.65,1.0,1.4,1.25,1.25,1.25,1.25,1.15,...,1.25,1.15,1.4,0.425,1.0,1.4,1.6,1.0,1.25,1.2
1970,2.1,1.1,1.65,1.0,1.6,1.25,1.6,1.6,1.6,1.25,...,1.25,1.3,1.6,1.0,1.0,1.6,1.6,1.0,1.3,1.3
1971,2.1,1.1,1.65,1.0,1.6,1.25,1.6,1.6,1.6,1.25,...,1.25,1.3,1.6,1.0,1.0,1.6,1.6,1.0,1.3,1.3
1972,2.1,1.2,1.65,1.0,1.85,1.6,1.6,1.9,1.6,1.4,...,1.25,1.6,1.6,1.0,1.2,1.6,1.6,1.2,1.45,1.5
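
It's worth seeing how aggressive this cleaning step is: `dropna(axis=1)` removes a state if even a single year is missing. A small sketch on a made-up three-state frame (values are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up wide table: Alabama has no data at all, Texas is missing one year.
wages = pd.DataFrame(
    {"Alabama": [0.0, 0.0], "Alaska": [2.1, 2.1], "Texas": [0.0, 1.4]},
    index=pd.Index([1968, 1972], name="Year"),
)

# 0 -> NaN, then drop any column containing at least one NaN.
cleaned = wages.replace(0, np.nan).dropna(axis=1)
print(list(cleaned.columns))  # ['Alaska']
```

Texas is dropped even though it has a value for 1972. That is why any state with a zero in some year never gets a min_wage value later on.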


Our end goal is to see whether there's any relationship between unemployment and the minimum wage:
- We might need to call something like .corr() on some DataFrame
- which is likely to be the unemp_county dataframe
- Perhaps add a min_wage column to the unemp_county dataframe

**Preliminary solution:**
- go row by row in unemp_county, check the state and year, and set the minimum wage column to that state's value in act_min_wage (which we need to look up first)
    - In this case, we're mapping values from one dataframe to another, but another time the source might not be a dataframe: it could be a sensor value, some sort of custom user input, or even something that requires further calculation.
    - For reusability, map functions to columns, based on row values.
    
Let's just first create a function that can handle this:

In [30]:
def get_min_wage(year, state):
    # Look up one (year, state) cell; a missing year or a dropped state gives NaN.
    try:
        return act_min_wage.loc[year, state]
    except KeyError:
        return np.nan
    
get_min_wage(2009, "Maine")     # 7.25

7.25

**Now, we map!**

- about python map(): https://bit.ly/3fQAnAg

In [20]:
%%time    
# time will give us the total time to perform some cell's operation.

unemp_county["min_wage"] = list(map(get_min_wage, unemp_county["Year"], unemp_county["State"]))

Wall time: 1min 1s


We can use this method to map just about any function, with as many parameters as we want, to a column. This method will basically always work, but won't necessarily be the most efficient. Often, we can use .map or .apply on a column instead, or some other built-in method, but the above is always an option.
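
For this particular lookup, one built-in alternative is to reshape the wage table to long form and merge on Year and State. The following is a minimal sketch on made-up stand-ins for `act_min_wage` and `unemp_county` (the `wage_long` and `merged` names are hypothetical, and no timing is claimed here):

```python
import numpy as np
import pandas as pd

# Toy stand-in for act_min_wage: index = Year, columns = state names.
act_min_wage = pd.DataFrame(
    {"Maine": [7.25, 7.50], "Idaho": [7.25, 7.25]},
    index=pd.Index([2009, 2015], name="Year"),
)

# Toy stand-in for unemp_county.
unemp_county = pd.DataFrame({
    "Year": [2015, 2009, 2015],
    "State": ["Mississippi", "Maine", "Maine"],
    "Rate": [6.1, 10.5, 8.4],
})

# Wide -> long: one row per (Year, State) pair with its minimum wage.
wage_long = act_min_wage.reset_index().melt(
    id_vars="Year", var_name="State", value_name="min_wage"
)

# A left merge keeps every unemployment row; states without data get NaN.
merged = unemp_county.merge(wage_long, on=["Year", "State"], how="left")
print(merged)
```

The merge does the matching in one vectorized step instead of calling a Python function once per row, and unmatched rows (Mississippi here) end up as NaN, just like with get_min_wage.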

In [27]:
unemp_county.head()

Unnamed: 0,Year,Month,State,County,Rate,min_wage
0,2015,February,Mississippi,Newton County,6.1,
1,2015,February,Mississippi,Panola County,9.4,
2,2015,February,Mississippi,Monroe County,7.9,
3,2015,February,Mississippi,Hinds County,6.1,
4,2015,February,Mississippi,Kemper County,10.6,


In [39]:
unemp_county.tail()

Unnamed: 0,Year,Month,State,County,Rate,min_wage
885543,2009,November,Maine,Somerset County,10.5,7.25
885544,2009,November,Maine,Oxford County,10.5,7.25
885545,2009,November,Maine,Knox County,7.5,7.25
885546,2009,November,Maine,Piscataquis County,11.3,7.25
885547,2009,November,Maine,Aroostook County,9.0,7.25


In [40]:
# isinstance(unemp_county[['Rate','min_wage']], pd.DataFrame)    # True
unemp_county[["Rate","min_wage"]]

Unnamed: 0,Rate,min_wage
0,6.1,
1,9.4,
2,7.9,
3,6.1,
4,10.6,
...,...,...
885543,10.5,7.25
885544,10.5,7.25
885545,7.5,7.25
885546,11.3,7.25


In [42]:
unemp_county[["Rate","min_wage"]].corr()

Unnamed: 0,Rate,min_wage
Rate,1.0,0.140909
min_wage,0.140909,1.0


In [43]:
unemp_county[["Rate","min_wage"]].cov()

Unnamed: 0,Rate,min_wage
Rate,9.687873,0.722753
min_wage,0.722753,2.720391


- It looks like there's a weak positive correlation (about 0.14) between the unemployment rate and the minimum wage. The covariance (about 0.72) is positive too, but its size depends on the units of both columns, so by itself it doesn't tell us how strong the relationship is
- **The two do tend to move together slightly, but the relationship is weak, and correlation alone doesn't tell us how much one actually affects the other.**
- **Next we have to ask which comes first: the increased unemployment or the minimum wage increases.**
- Also, I'd like to look at **election data** by county and see if there's a relationship between voting, minimum wage, and unemployment
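
To relate the two tables above: the correlation is the covariance divided by the product of the two standard deviations. A quick check using the numbers printed by .cov() (the result differs from .corr() in the fourth decimal place, presumably because the variance of Rate in .cov() also includes rows where min_wage is NaN):

```python
import math

# Values copied from the .cov() output above.
cov_rate_wage = 0.722753
var_rate = 9.687873
var_wage = 2.720391

# Pearson correlation = cov(x, y) / (std(x) * std(y)).
r = cov_rate_wage / math.sqrt(var_rate * var_wage)
print(round(r, 3))

# Rescaling Rate (e.g. percent -> fraction) shrinks the covariance 100x
# but leaves the correlation unchanged.
r_scaled = (cov_rate_wage / 100) / math.sqrt((var_rate / 100**2) * var_wage)
print(round(r_scaled, 3))
```

This is why the size of a covariance can't be read as "strong" or "weak" on its own, while the correlation can.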

In [54]:
election_2016 = pd.read_csv("Election-2016-results.csv")
election_2016.head()

Unnamed: 0,county,fips,cand,st,pct_report,votes,total_votes,pct,lead
0,,US,Donald Trump,US,0.9951,60350241.0,127592176.0,0.472993,Donald Trump
1,,US,Hillary Clinton,US,0.9951,60981118.0,127592176.0,0.477938,Donald Trump
2,,US,Gary Johnson,US,0.9951,4164589.0,127592176.0,0.03264,Donald Trump
3,,US,Jill Stein,US,0.9951,1255968.0,127592176.0,0.009844,Donald Trump
4,,US,Evan McMullin,US,0.9951,451636.0,127592176.0,0.00354,Donald Trump


This data starts with the aggregate results for the entire US, then breaks down by state and county, as well as by candidate.

In [61]:
election_2016["county"].unique()

array([nan, 'Los Angeles County', 'Cook County', ..., 'St. Croix Island',
       'St. John Island', 'St. Thomas Island'], dtype=object)

Let's include the top 10 candidates. 

To grab their names:

In [57]:
top_candidates = election_2016.head(10)["cand"].values
top_candidates

array(['Donald Trump', 'Hillary Clinton', 'Gary Johnson', 'Jill Stein',
       'Evan McMullin', 'Darrell Castle', 'Gloria La Riva',
       'Rocky De La Fuente', ' None of these candidates',
       'Richard Duncan'], dtype=object)

In [74]:
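# Python's `and` does not work element-wise on Series (it raises a ValueError),
# so the commented-out line below fails; use `&` with parentheses instead:
#county_2015 = unemp_county[ (unemp_county['Year']==2015 and unemp_county["Month"]=="February") ]
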
unemp_county_2015 = unemp_county[ (unemp_county['Year']==2015) & (unemp_county["Month"]=="February")]
unemp_county_2015

Unnamed: 0,Year,Month,State,County,Rate,min_wage
0,2015,February,Mississippi,Newton County,6.1,
1,2015,February,Mississippi,Panola County,9.4,
2,2015,February,Mississippi,Monroe County,7.9,
3,2015,February,Mississippi,Hinds County,6.1,
4,2015,February,Mississippi,Kemper County,10.6,
...,...,...,...,...,...,...
2797,2015,February,Maine,Somerset County,8.4,7.5
2798,2015,February,Maine,Oxford County,6.8,7.5
2799,2015,February,Maine,Knox County,6.1,7.5
2800,2015,February,Maine,Piscataquis County,7.0,7.5
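
A small sketch of why the filter is written with `&` rather than `and`, using a made-up three-row frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2015, 2015, 2009],
    "Month": ["February", "March", "February"],
})

# `and` asks for a single True/False from a whole Series, which pandas refuses.
try:
    df[(df["Year"] == 2015) and (df["Month"] == "February")]
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# `&` combines the two boolean Series element by element.
# The parentheses matter because & binds tighter than ==.
mask = (df["Year"] == 2015) & (df["Month"] == "February")
subset = df[mask]
print(subset)
```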


Now, for unemp_county_2015, we'd like to convert State to the all-caps abbreviation that election_2016 uses. We can do that with the abbreviations table we used before:

In [64]:
state_abbv = pd.read_csv("state_abbv.csv", index_col=0)    # Use the first column as index instead of adding new ones
state_abbv.head()

State/District,Postal Code
Alabama,AL
Alaska,AK
Arizona,AZ
Arkansas,AR
California,CA


In [73]:
state_abbv_dict = state_abbv.to_dict()["Postal Code"]
state_abbv_dict

{'Alabama': 'AL',
 'Alaska': 'AK',
 'Arizona': 'AZ',
 'Arkansas': 'AR',
 'California': 'CA',
 'Colorado': 'CO',
 'Connecticut': 'CT',
 'Delaware': 'DE',
 'District of Columbia': 'DC',
 'Florida': 'FL',
 'Georgia': 'GA',
 'Hawaii': 'HI',
 'Idaho': 'ID',
 'Illinois': 'IL',
 'Indiana': 'IN',
 'Iowa': 'IA',
 'Kansas': 'KS',
 'Kentucky': 'KY',
 'Louisiana': 'LA',
 'Maine': 'ME',
 'Maryland': 'MD',
 'Massachusetts': 'MA',
 'Michigan': 'MI',
 'Minnesota': 'MN',
 'Mississippi': 'MS',
 'Missouri': 'MO',
 'Montana': 'MT',
 'Nebraska': 'NE',
 'Nevada': 'NV',
 'New Hampshire': 'NH',
 'New Jersey': 'NJ',
 'New Mexico': 'NM',
 'New York': 'NY',
 'North Carolina': 'NC',
 'North Dakota': 'ND',
 'Ohio': 'OH',
 'Oklahoma': 'OK',
 'Oregon': 'OR',
 'Pennsylvania': 'PA',
 'Rhode Island': 'RI',
 'South Carolina': 'SC',
 'South Dakota': 'SD',
 'Tennessee': 'TN',
 'Texas': 'TX',
 'Utah': 'UT',
 'Vermont': 'VT',
 'Virginia': 'VA',
 'Washington': 'WA',
 'West Virginia': 'WV',
 'Wisconsin': 'WI',
 'Wyoming': 'WY'}

In [75]:
unemp_county_2015["State"] = unemp_county_2015["State"].map(state_abbv_dict)
unemp_county_2015.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  unemp_county_2015["State"] = unemp_county_2015["State"].map(state_abbv_dict)


Unnamed: 0,Year,Month,State,County,Rate,min_wage
0,2015,February,MS,Newton County,6.1,
1,2015,February,MS,Panola County,9.4,
2,2015,February,MS,Monroe County,7.9,
3,2015,February,MS,Hinds County,6.1,
4,2015,February,MS,Kemper County,10.6,
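
The SettingWithCopyWarning above appears because unemp_county_2015 was created by filtering unemp_county, so pandas can't tell whether the assignment is meant to reach the original frame. One common way to avoid it, sketched here on a made-up frame, is to take an explicit `.copy()` when filtering:

```python
import pandas as pd

unemp_county = pd.DataFrame({
    "Year": [2015, 2015, 2009],
    "State": ["Mississippi", "Maine", "Maine"],
    "Rate": [6.1, 8.4, 10.5],
})

# .copy() makes the subset an independent frame, so assigning to it is unambiguous.
subset = unemp_county[unemp_county["Year"] == 2015].copy()
subset["State"] = subset["State"].map({"Mississippi": "MS", "Maine": "ME"})

print(subset)
print(unemp_county)  # the original still has the full state names
```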


***In the case of single-parameter functions, we can just use a .map.*** Or... as you just saw here, ***if you want to map a key to a value using a dict, you can do the same thing and pass the dictionary to .map***. Cool, huh?
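
One thing to keep in mind with a dict-based .map: any value that isn't a key in the dict becomes NaN. A minimal sketch with a made-up Series and a two-entry dict:

```python
import pandas as pd

states = pd.Series(["Mississippi", "Maine", "Guam"])
abbv = {"Mississippi": "MS", "Maine": "ME"}  # deliberately has no entry for Guam

mapped = states.map(abbv)
print(mapped)
```

Since state_abbv_dict as printed above lists only the 50 states plus the District of Columbia, rows for any other State value would end up as NaN after mapping.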

Now let's map the county's candidate percentages to this. To do this, we have quite a few columns, and really, everything just needs to match up by county and state.

In [77]:
print(len(unemp_county_2015))
print(len(election_2016))

2802
18475


#### Notice:
- election_2016 is longer; we'll map that to unemp_county_2015 where there are matches. **Instead of a map, however, we'll combine with a join**
- Both dataframes can be keyed by state AND county. So we'll give those columns the same names in both, and then set them as the index.

In [99]:
# Rename election_2016's "county" and "st" columns to match unemp_county_2015.
election_2016.rename(columns={"county":"County","st":"State"}, inplace=True)
election_2016.head()

Unnamed: 0,County,fips,cand,State,pct_report,votes,total_votes,pct,lead
0,,US,Donald Trump,US,0.9951,60350241.0,127592176.0,0.472993,Donald Trump
1,,US,Hillary Clinton,US,0.9951,60981118.0,127592176.0,0.477938,Donald Trump
2,,US,Gary Johnson,US,0.9951,4164589.0,127592176.0,0.03264,Donald Trump
3,,US,Jill Stein,US,0.9951,1255968.0,127592176.0,0.009844,Donald Trump
4,,US,Evan McMullin,US,0.9951,451636.0,127592176.0,0.00354,Donald Trump


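#### Some issues:
- The cell below was accidentally run earlier, so both dfs have already had their index set in place:

    for df in [unemp_county_2015, election_2016]:
        df.set_index(["County", "State"], inplace=True)

- So please treat both dfs as already indexed by County and State, and join them directly. Running set_index again raises the KeyError shown next, because those columns have already been moved into the index.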
  

In [105]:
election_2016.set_index(["County", "State"], inplace=True)

KeyError: "None of ['County', 'State'] are in the columns"

In [127]:
unemp_county_2015 = unemp_county_2015.join(election_2016)

ValueError: columns overlap but no suffix specified: Index(['fips', 'cand', 'pct_report', 'votes', 'total_votes', 'pct', 'lead'], dtype='object')
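
The ValueError above comes from running the join cell a second time: after the first run, unemp_county_2015 already contains the election columns, so joining again produces overlapping column names. A minimal sketch on made-up frames (the pct values are invented for illustration):

```python
import pandas as pd

unemp = pd.DataFrame({
    "County": ["Ada County", "Adair County"],
    "State": ["ID", "KY"],
    "Rate": [4.0, 8.8],
}).set_index(["County", "State"])

election = pd.DataFrame({
    "County": ["Ada County", "Ada County", "Adair County"],
    "State": ["ID", "ID", "KY"],
    "cand": ["Donald Trump", "Hillary Clinton", "Donald Trump"],
    "pct": [0.5, 0.4, 0.8],  # made-up numbers
}).set_index(["County", "State"])

# First join: matches on the shared (County, State) index.
# Each county row is repeated once per candidate.
joined = unemp.join(election)
print(joined)

# Second join: "cand" and "pct" now exist on both sides, so pandas raises.
try:
    joined.join(election)
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```

This also explains why the joined frame has several rows per county: one for each candidate in election_2016.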

In [129]:
unemp_county_2015.head()   # Check the result

County,State,Year,Month,Rate,min_wage,fips,cand,pct_report,votes,total_votes,pct,lead
Abbeville County,SC,2015,February,7.4,,45001,Donald Trump,1.0,6742.0,10724.0,0.628683,Donald Trump
Abbeville County,SC,2015,February,7.4,,45001,Hillary Clinton,1.0,3712.0,10724.0,0.34614,Donald Trump
Abbeville County,SC,2015,February,7.4,,45001,Gary Johnson,1.0,128.0,10724.0,0.011936,Donald Trump
Abbeville County,SC,2015,February,7.4,,45001,Evan McMullin,1.0,56.0,10724.0,0.005222,Donald Trump
Abbeville County,SC,2015,February,7.4,,45001,Darrell Castle,1.0,38.0,10724.0,0.003543,Donald Trump


In [125]:
election_2016.head()    # Check the result

County,State,fips,cand,pct_report,votes,total_votes,pct,lead
,US,US,Donald Trump,0.9951,60350241.0,127592176.0,0.472993,Donald Trump
,US,US,Hillary Clinton,0.9951,60981118.0,127592176.0,0.477938,Donald Trump
,US,US,Gary Johnson,0.9951,4164589.0,127592176.0,0.03264,Donald Trump
,US,US,Jill Stein,0.9951,1255968.0,127592176.0,0.009844,Donald Trump
,US,US,Evan McMullin,0.9951,451636.0,127592176.0,0.00354,Donald Trump


Let's take only Donald Trump's rows for each County & State first:

    - [IMPORTANT] unemp_county_2015 is now the merged dataframe

In [143]:
unemp_county_2015_Donald = unemp_county_2015[ unemp_county_2015["cand"]=="Donald Trump" ]
unemp_county_2015_Donald.dropna(inplace=True)
unemp_county_2015_Donald.head(10)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  unemp_county_2015_Donald.dropna(inplace=True)


County,State,Year,Month,Rate,min_wage,fips,cand,pct_report,votes,total_votes,pct,lead
Ada County,ID,2015,February,4.0,7.25,16001,Donald Trump,1.0,93748.0,195587.0,0.479316,Donald Trump
Adair County,KY,2015,February,8.8,7.25,21001,Donald Trump,1.0,6637.0,8231.0,0.806342,Donald Trump
Adair County,OK,2015,February,6.8,2.0,40001,Donald Trump,1.0,4753.0,6468.0,0.734848,Donald Trump
Adams County,CO,2015,February,5.2,8.23,8001,Donald Trump,1.0,73807.0,175125.0,0.421453,Hillary Clinton
Adams County,ID,2015,February,10.8,7.25,16003,Donald Trump,1.0,1556.0,2183.0,0.712781,Donald Trump
Adams County,IN,2015,February,4.3,7.25,18001,Donald Trump,1.0,9642.0,13039.0,0.739474,Donald Trump
Adams County,ND,2015,February,3.7,7.25,38001,Donald Trump,1.0,904.0,1206.0,0.749585,Donald Trump
Adams County,NE,2015,February,3.2,8.0,31001,Donald Trump,1.0,9205.0,13172.0,0.698831,Donald Trump
Adams County,OH,2015,February,9.8,7.25,39001,Donald Trump,1.0,8445.0,11063.0,0.763355,Donald Trump
Adams County,PA,2015,February,4.8,7.25,42001,Donald Trump,1.0,31249.0,47138.0,0.662926,Donald Trump


Finally, drop the columns that are of no help:

    - We can drop "cand", since every value will be "Donald Trump"

In [147]:
unemp_county_2015_Donald.drop(["fips","cand","pct_report","votes","total_votes","lead"],axis=1,inplace=True)


KeyError: "['fips' 'cand' 'pct_report' 'votes' 'total_votes' 'lead'] not found in axis"
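
The KeyError above most likely means this cell was run twice: the columns were already dropped in place the first time. If a cell like this needs to be safe to re-run, one option is `errors="ignore"`, sketched here on a made-up one-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Rate": [4.0], "fips": [16001], "cand": ["Donald Trump"]})

# First drop removes the columns and returns a new frame.
trimmed = df.drop(columns=["fips", "cand"])

# Dropping them again would raise a KeyError; errors="ignore" skips missing labels.
again = trimmed.drop(columns=["fips", "cand"], errors="ignore")
print(list(again.columns))  # ['Rate']
```

Returning a new frame instead of using inplace=True also sidesteps the copy-of-a-slice warning shown for the next cell.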

In [150]:
unemp_county_2015_Donald.drop("Year", axis=1,inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


In [151]:
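# Select the numeric columns explicitly; recent pandas versions no longer skip text columns such as Month.
unemp_county_2015_Donald[["Rate","min_wage","pct"]].corr()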

Unnamed: 0,Rate,min_wage,pct
Rate,1.0,0.186703,-0.085985
min_wage,0.186703,1.0,-0.325007
pct,-0.085985,-0.325007,1.0


#### Observations:
- min_wage appears to have a moderate negative correlation (about -0.33) with the pct vote for Trump, while the unemployment Rate is only weakly related to it (about -0.09)