# Which jobs grew in pay after adjusting for inflation from 2003 to 2018 in Louisville, KY?

To answer this, I use data from the Occupational Information Network (https://www.onetonline.org/). The data covers Louisville, Kentucky only, sampled at five-year intervals from 2003 to 2018 (2003, 2008, 2013, and 2018).

**Why not look at the highest paying jobs?**

The highest-paying jobs show which occupations are valued at a single point in time. A job whose pay keeps rising, however, suggests that the supply of workers is not meeting demand, which pushes pay up. Jobs in high demand should therefore show growth in pay over time.


In [1]:
import sqlite3
import os
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

**I cleaned the data in a separate file; this notebook loads only the cleaned dataset.**

In [2]:
occupation_file_name = os.path.join('new_occupation.csv')

In [3]:
occupation = pd.read_csv(occupation_file_name, index_col=0)

In [4]:
occupation["OCC_TITLE"] = occupation["OCC_TITLE"].str.lower()
pop=occupation.pop('Unnamed: 19')

## Converting to 2018 dollars
In economics, real dollars are currency values that have been adjusted for inflation. I convert everything to 2018 dollars, using conversion factors from the inflation calculator at the following website:
https://www.usinflationcalculator.com/

Adjusted for inflation, 1.00 in 2003 is equal to 1.36 in 2018, with a 36.5% cumulative rate of inflation.

Adjusted for inflation, 1.00 in 2008 is equal to 1.17 in 2018, with a 16.6% cumulative rate of inflation.

Adjusted for inflation, 1.00 in 2013 is equal to 1.08 in 2018, with a 7.8% cumulative rate of inflation. 

In [5]:
#This loop converts each year's wage columns to 2018 dollars, which controls for inflation.
occupation.columns[5:18]

for x in occupation.columns[5:18]:
    occupation.loc[(occupation.YEAR == 2013), x] *=1.08
    occupation.loc[(occupation.YEAR == 2008), x] *=1.17
    occupation.loc[(occupation.YEAR == 2003), x] *=1.36
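
The same adjustment can be written without a loop by looking up each row's factor from its year. This is only a sketch on made-up data: the `toy` frame, its wage values, and the `FACTORS` and `H_MEAN_2018` names are assumptions for illustration, while the factors themselves are the ones quoted above.

```python
import pandas as pd

# Cumulative inflation factors to 2018 dollars, taken from the text above
FACTORS = {2003: 1.36, 2008: 1.17, 2013: 1.08, 2018: 1.00}

# Toy data: the wage values are made up for illustration
toy = pd.DataFrame({
    "YEAR": [2003, 2008, 2013, 2018],
    "H_MEAN": [20.0, 20.0, 20.0, 20.0],
})

# Look up each row's factor by year and scale the wage column
toy["H_MEAN_2018"] = (toy["H_MEAN"] * toy["YEAR"].map(FACTORS)).round(2)
print(toy)
```

A nominal $20.00 hourly wage in 2003 becomes $27.20 in 2018 dollars, while a 2018 wage is left unchanged.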


## Checking to see how many occupations existed across all four sample years

As shown below, 389 jobs appear in all four sample years, 128 in three, 139 in two, and 80 in one. In the future, I can use this to look at jobs that became obsolete or were newly created. For this project I will only look at jobs present in all four years.

In [6]:
#This counts the number of instances of a specific job code.  
occ_count=occupation.groupby('OCC_CODE')
occ_count_all = occ_count.size()
occ_count_all.value_counts()

4    389
2    139
3    128
1     80
dtype: int64

In [12]:
#Build one dictionary per count: the job code is the key and the number of sample years is the value.
#Dictionaries convert easily to pandas objects, and I may also use them in for loops.
total_instance=occupation.groupby(['OCC_CODE'])
total_instance=total_instance.size()
dict4={}
dict3={}
dict2={}
dict1={}
for y, x in total_instance.items():
    if x == 4:
        dict4.update({y:x})
    elif x == 3:
        dict3.update({y:x})
    elif x == 2:
        dict2.update({y:x})
    else:
        dict1.update({y:x})   
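
The four dictionaries above could also be collected into a single lookup keyed by the number of sample years. A minimal sketch on made-up data follows; the toy codes and the `codes_by_count` name are hypothetical and not part of the dataset.

```python
import pandas as pd

# Toy data: hypothetical occupation codes and sample years
toy = pd.DataFrame({
    "OCC_CODE": ["A", "A", "A", "A", "B", "B", "C"],
    "YEAR": [2003, 2008, 2013, 2018, 2013, 2018, 2018],
})

# Number of sample years in which each code appears
year_counts = toy.groupby("OCC_CODE").size()

# Group the codes by how many sample years they appear in
codes_by_count = {}
for code, n in year_counts.items():
    codes_by_count.setdefault(int(n), []).append(code)

print(codes_by_count)
```

With this layout, `codes_by_count[4]` plays the role of `dict4`, and counts that never occur are simply absent from the lookup.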


In [13]:
#Sort values for percent change
occupation.sort_values(['YEAR', 'OCC_CODE'], ascending=[1, 1], inplace=True)

#Create percent change for the hourly mean (H_MEAN) and annual mean (A_MEAN) wages
occupation['H_CHANGE'] = occupation.groupby('OCC_CODE').H_MEAN.pct_change()
occupation['A_CHANGE'] = occupation.groupby('OCC_CODE').A_MEAN.pct_change()
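
To see what the grouped `pct_change` produces, here is a small sketch on made-up wages (the `toy` frame and its values are assumptions for illustration). The first sample year of each occupation has no earlier value to compare against, so its change is NaN, and a later average skips it.

```python
import pandas as pd

# Toy data: two hypothetical occupations over three sample years
toy = pd.DataFrame({
    "OCC_CODE": ["A", "B", "A", "B", "A", "B"],
    "YEAR": [2003, 2003, 2008, 2008, 2013, 2013],
    "H_MEAN": [10.0, 20.0, 11.0, 18.0, 12.1, 18.0],
})

# Sort by year so each change compares against the previous sample year
toy = toy.sort_values(["YEAR", "OCC_CODE"])

# Percent change within each occupation; the first year of each is NaN
toy["H_CHANGE"] = toy.groupby("OCC_CODE")["H_MEAN"].pct_change()

# Average change per occupation (NaN values are skipped by mean)
avg_change = toy.groupby("OCC_CODE")["H_CHANGE"].mean()
print(toy)
print(avg_change)
```

Occupation A grows by about 10% in each period, while B falls by 10% and then stays flat, for an average of about -5%.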

In [14]:
#Flag each row by how many sample years its occupation appears in.
#ASSUMPTION: dictgap is not built in any earlier cell, so it is defined here as the
#occupations whose sample years are not consecutive five-year steps
#(for example, present in 2003 and 2013 but missing in 2008).
dictgap={}
for code, years in occupation.groupby('OCC_CODE')['YEAR']:
    ordered=sorted(years.unique())
    if any(later - earlier != 5 for earlier, later in zip(ordered, ordered[1:])):
        dictgap[code]=len(ordered)

#Populate the new columns with booleans based on which dictionary holds the job code.
occupation['FOUR_YEAR'] = occupation['OCC_CODE'].isin(list(dict4))
occupation['THREE_YEAR'] = occupation['OCC_CODE'].isin(list(dict3))
occupation['TWO_YEAR'] = occupation['OCC_CODE'].isin(list(dict2))
occupation['ONE_YEAR'] = occupation['OCC_CODE'].isin(list(dict1))
occupation['GAP_YEAR'] = occupation['OCC_CODE'].isin(list(dictgap))

#Subsets of the occupation data, based on the different ways I separated them.
occupation_4 = occupation[occupation['FOUR_YEAR']]
occupation_3 = occupation[occupation['THREE_YEAR'] & ~occupation['GAP_YEAR']]
occupation_2 = occupation[occupation['TWO_YEAR'] & ~occupation['GAP_YEAR']]
occupation_1 = occupation[occupation['ONE_YEAR'] & ~occupation['GAP_YEAR']]
occupation_gap = occupation[occupation['GAP_YEAR']]

#This can help if I need to search for terms. The result is stored separately so the full dataset is kept.
econ_jobs = occupation[occupation["OCC_TITLE"].str.contains("econ", na=False)]

### SQL STUFF

In [None]:
occupation_4=occupation_4.sort_values(by='YEAR', ascending=False)
occupation_4.to_sql("occ_table",sqlite3.connect('occ.db'), if_exists ="replace")

In [None]:
con = sqlite3.connect("occ.db")

In [None]:
#This sorts the data by the average percent change in hourly pay. Occupations with the highest average change fared best.
highest_h_ave= pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE, H_MEAN, AVG(H_CHANGE) as H_Change_Average, YEAR FROM occ_table GROUP BY OCC_CODE ORDER BY H_Change_Average DESC LIMIT 10", con)
highest_h_ave

In [None]:
highest_h_mean= pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE,H_MEAN,YEAR FROM occ_table WHERE OCC_CODE IN ('27-2012','11-9061','49-9098','19-4091','11-2011','11-3061','29-1066','39-9032','13-2082','27-4032')", con)
highest_h_mean.set_index('YEAR', inplace=True)
highest_h_mean.groupby('OCC_CODE')['H_MEAN'].plot(legend=True)

In [None]:
highest_h_mean

In [None]:
#Offset the results to skip occupations with no percent-change data (NULL averages sort first in ascending order)
lowest_h_ave=pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE,AVG(H_CHANGE) as H_average FROM occ_table GROUP BY OCC_CODE ORDER BY H_average ASC LIMIT 10 OFFSET 12", con)
lowest_h_ave
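
The fixed `OFFSET 12` in the query above only works while exactly twelve occupations have a NULL average. One possible alternative is to filter those groups out with `HAVING`. The sketch below runs on a made-up in-memory table: it reuses the `occ_table` name, but the rows and the reduced column set are assumptions for illustration.

```python
import sqlite3

# In-memory database with a cut-down, made-up version of occ_table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE occ_table (OCC_CODE TEXT, YEAR INTEGER, H_CHANGE REAL)")
rows = [
    ("A", 2003, None), ("A", 2008, 0.10), ("A", 2013, 0.20),
    ("B", 2003, None), ("B", 2008, -0.10), ("B", 2013, 0.00),
    ("C", 2003, None), ("C", 2008, None), ("C", 2013, None),  # no usable data
]
con.executemany("INSERT INTO occ_table VALUES (?, ?, ?)", rows)

# AVG ignores NULL rows; a group with only NULLs averages to NULL and is dropped by HAVING
query = """
    SELECT OCC_CODE, AVG(H_CHANGE) AS H_average
    FROM occ_table
    GROUP BY OCC_CODE
    HAVING AVG(H_CHANGE) IS NOT NULL
    ORDER BY H_average ASC
    LIMIT 10
"""
lowest = con.execute(query).fetchall()
print(lowest)
con.close()
```

Occupation C, which has no change values at all, is excluded without needing to know how many such occupations exist.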

In [None]:
lowest_h_mean= pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE,H_MEAN,YEAR FROM occ_table WHERE OCC_CODE IN ('29-1011','13-1121','13-2021','51-3091','17-3025','51-4023','53-7021','49-3022','25-1194','51-4122')", con)
lowest_h_mean.set_index('YEAR', inplace=True)
lowest_h_mean.groupby('OCC_CODE')['H_MEAN'].plot(legend=True)

In [None]:
#This sorts the data by the average percent change in annual pay. Occupations with the highest average change fared best.
highest_a_ave=pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE, AVG(A_CHANGE) as A_average FROM occ_table GROUP BY OCC_CODE ORDER BY A_average DESC LIMIT 10", con)
highest_a_ave

In [None]:
highest_a_mean= pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE,A_MEAN,YEAR FROM occ_table WHERE OCC_CODE IN ('11-9061','25-1121','49-9098','53-2012','19-4091','11-2011','11-3061','29-1066','39-9032','13-2082')", con)
highest_a_mean.set_index('YEAR', inplace=True)
highest_a_mean.groupby('OCC_CODE')['A_MEAN'].plot(legend=True)

In [None]:
#This sorts the data by the average percent change in annual pay. Occupations with the lowest average change fared worst.
lowest_a_ave=pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE, AVG(A_CHANGE) as A_average FROM occ_table GROUP BY OCC_CODE ORDER BY A_average ASC LIMIT 10", con)
lowest_a_ave

In [None]:
lowest_a_mean= pd.read_sql_query("SELECT OCC_CODE,OCC_TITLE,A_MEAN,YEAR FROM occ_table WHERE OCC_CODE IN ('29-1011','13-1121','13-2021','29-9091','51-3091','17-3025','51-4023','53-7021','49-3022','25-1194')", con)
lowest_a_mean.set_index('YEAR', inplace=True)
lowest_a_mean.groupby('OCC_CODE')['A_MEAN'].plot(legend=True)