# How to join on the Onet code to get salary and occupation title

##### Data sources used:
+ Linkup Raw Job Records.  We will be using a slice of raw data.
+ All-data XLSX file from the Bureau of Labor Statistics.  https://www.bls.gov/oes/tables.htm
+ Onet 2010 to 2018 Crosswalk.  https://www.onetcenter.org/crosswalks.html

##### Warning:  The Bureau of Labor Statistics only includes salary data for the United States.  The salary figures joined here are estimates based on United States data.

##### There are two ways to join these tables together in Python:

1. The first approach is using the pandas library to join these tables.
2. The second approach uses SQL to join these tables.  I will be using the sqlite3 library to create a SQL structure in memory, but the query can be taken and used in any SQL database.

##### For this tutorial I am going to use a selection of BLS columns that are most commonly used by our clients.  Feel free to look through the file and choose the data points most relevant to your use case.  The columns I will join are:
- occ_code:  This is the join key
- occ_title:  Human-readable Onet description
- h_mean & a_mean:  Hourly and annual mean income
- h_median & a_median:  Hourly and annual median income
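Before using the wage columns numerically, it may be worth checking that they actually hold numbers. As far as I understand, the BLS spreadsheets can mark suppressed or top-coded wages with symbols such as `*` and `#`; treat that as an assumption and confirm it against the field descriptions that ship with the file. The sketch below uses a small made-up frame (`bls_sample` and its values are hypothetical, not taken from the real file) to show one way of coercing such columns to numeric:

```python
import pandas as pd

# Toy stand-in for the BLS wage columns; all values here are made up.
bls_sample = pd.DataFrame({
    'occ_code': ['00-0001', '00-0002', '00-0003'],
    'h_mean': ['93.20', 47.49, '*'],
    'a_mean': ['193,850', 98770, '#'],
})

wage_cols = ['h_mean', 'a_mean']
for col in wage_cols:
    # Remove thousands separators, then turn anything non-numeric into NaN
    cleaned = bls_sample[col].astype(str).str.replace(',', '', regex=False)
    bls_sample[col] = pd.to_numeric(cleaned, errors='coerce')

print(bls_sample.dtypes)
print(bls_sample)
```

Placeholder symbols end up as `NaN`, so means and medians computed downstream skip them rather than raising an error.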

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os
import sqlite3

# Display parameters for dataframes for tutorial display purposes
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [2]:
# Load Linkup Job Records
JobRecords = pd.read_csv('../raw-sample/raw-sample-jobs.csv', low_memory = False)

# Load BLS data and keep only the national, cross-industry rows
BLS_data = pd.read_excel('../OnetCodes/all_data_M_2018.xlsx')
BLS_data = BLS_data[(BLS_data.area_title == 'U.S.') &
                    (BLS_data.naics_title == 'Cross-industry')]

# Load the crosswalk and rename the 2018 SOC column to match the BLS join key
OnetCrosswalk = pd.read_excel('../OnetCodes/2010_to_2018_SOC_Crosswalk.xls')[['O*NET-SOC 2010 Code', '2018 SOC Code']]
OnetCrosswalk.columns = ['O*NET-SOC 2010 Code', 'occ_code']

# Attach the BLS columns of interest to the crosswalk.
# Note: the result has one row per O*NET-SOC 2010 code, so occ_code can repeat.
BLS_data = OnetCrosswalk.merge(BLS_data[['occ_code', 'occ_title', 'h_mean', 'a_mean', 'h_median', 'a_median']],
                               how = 'left', on = 'occ_code')

# Crosswalk Onet Codes in Job Records to 2018 Version

In [3]:
JobRecords = JobRecords.merge(OnetCrosswalk, 
                              how = 'left',
                              left_on = 'onet_occupation_code', 
                              right_on = 'O*NET-SOC 2010 Code')
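Because this is a left join, job records whose code is missing from the crosswalk are kept but get an empty `occ_code`. A minimal sketch of how one might count those records, using the `indicator` option of `merge` on toy frames (`jobs` and `crosswalk` below are hypothetical stand-ins with illustrative values, not the real data):

```python
import pandas as pd

# Toy job records: one has no Onet code at all
jobs = pd.DataFrame({
    'hash': ['a1', 'b2', 'c3'],
    'onet_occupation_code': ['41-3031.02', '99-9999.00', None],
})
# Toy crosswalk containing only one of the codes above
crosswalk = pd.DataFrame({
    'O*NET-SOC 2010 Code': ['41-3031.02'],
    'occ_code': ['41-3031'],
})

checked = jobs.merge(crosswalk,
                     how='left',
                     left_on='onet_occupation_code',
                     right_on='O*NET-SOC 2010 Code',
                     indicator=True)  # adds a '_merge' column

# 'left_only' marks job records that found no crosswalk row
unmatched = checked[checked['_merge'] == 'left_only']
print(len(unmatched), 'of', len(checked), 'records have no 2018 code')
```

Records flagged this way will also come back without salary data in the joins that follow.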

# Approach 1

##### Use the pandas Python library to join the tables together

In [4]:
New_JobRecords = JobRecords.merge(
    BLS_data,
    on = 'occ_code',
    how = 'left')
New_JobRecords.head(3)

Unnamed: 0,hash,title,company_id,company_name,city,state,zip,country,created,last_checked,last_updated,delete_date,unmapped_location,onet_occupation_code,url,O*NET-SOC 2010 Code_x,occ_code,O*NET-SOC 2010 Code_y,occ_title,h_mean,a_mean,h_median,a_median
0,0f659e59f8967f986ab53b898e543095,Financial Solutions Advisor - Bilingual Mandar...,381,Bank of America Corporation,El Cerrito,CA,94530,USA,2015-04-16T12:25:01+00:00,2015-04-23T08:16:47+00:00,,2015-04-25T13:20:05+00:00,False,41-3031.02,https://bacfhrs.taleo.net/careersection/2/jobd...,41-3031.02,41-3031,41-3031.00,"Securities, Commodities, and Financial Service...",47.49,98770,30.83,64120
1,0f659e59f8967f986ab53b898e543095,Financial Solutions Advisor - Bilingual Mandar...,381,Bank of America Corporation,El Cerrito,CA,94530,USA,2015-04-16T12:25:01+00:00,2015-04-23T08:16:47+00:00,,2015-04-25T13:20:05+00:00,False,41-3031.02,https://bacfhrs.taleo.net/careersection/2/jobd...,41-3031.02,41-3031,41-3031.01,"Securities, Commodities, and Financial Service...",47.49,98770,30.83,64120
2,0f659e59f8967f986ab53b898e543095,Financial Solutions Advisor - Bilingual Mandar...,381,Bank of America Corporation,El Cerrito,CA,94530,USA,2015-04-16T12:25:01+00:00,2015-04-23T08:16:47+00:00,,2015-04-25T13:20:05+00:00,False,41-3031.02,https://bacfhrs.taleo.net/careersection/2/jobd...,41-3031.02,41-3031,41-3031.02,"Securities, Commodities, and Financial Service...",47.49,98770,30.83,64120
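In the output above, the same job record (same `hash`) appears three times, once for each `O*NET-SOC 2010 Code_y` value that maps to `occ_code` 41-3031. That follows from `BLS_data` holding one row per 2010 code after it was merged with the crosswalk. If one row per job record is wanted, one option is to reduce `BLS_data` to one row per `occ_code` before joining. The sketch below shows the idea on toy frames (`jobs`, `bls`, and `bls_unique` are hypothetical names):

```python
import pandas as pd

jobs = pd.DataFrame({'hash': ['a1'], 'occ_code': ['41-3031']})
# Mimics BLS_data after the crosswalk merge: occ_code repeats per 2010 code
bls = pd.DataFrame({
    'O*NET-SOC 2010 Code': ['41-3031.00', '41-3031.01', '41-3031.02'],
    'occ_code': ['41-3031'] * 3,
    'a_mean': [98770] * 3,
})

# Joining as-is multiplies the job record by the number of matching rows
fanned_out = jobs.merge(bls, on='occ_code', how='left')

# Drop the 2010 code and keep a single row per occ_code
bls_unique = (bls.drop(columns=['O*NET-SOC 2010 Code'])
                 .drop_duplicates(subset='occ_code'))

# validate='many_to_one' raises a MergeError if occ_code repeats on the right
one_per_job = jobs.merge(bls_unique, on='occ_code', how='left',
                         validate='many_to_one')

print(len(fanned_out), len(one_per_job))
```

Whether the repeated rows are a problem depends on the use case; any count or aggregate over job records would be inflated by them.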


# Approach 2

##### Use a SQL query to join these tables.  I will be using the sqlite3 library to create a SQL structure in memory, but the query can be taken and used in any SQL database.

In [5]:
# Create the database in memory
conn = sqlite3.connect(':memory:')
# Write both DataFrames to the database as tables
JobRecords.to_sql('JobRecords', conn, index=False)
BLS_data.to_sql('BLS_data', conn, index=False)

qry = '''
    SELECT *
    FROM JobRecords
    LEFT JOIN BLS_data
    ON JobRecords.occ_code = BLS_data.occ_code;
    '''

New_JobRecords = pd.read_sql_query(qry, conn)
New_JobRecords.head(3)


Unnamed: 0,hash,title,company_id,company_name,city,state,zip,country,created,last_checked,last_updated,delete_date,unmapped_location,onet_occupation_code,url,O*NET-SOC 2010 Code,occ_code,O*NET-SOC 2010 Code.1,occ_code.1,occ_title,h_mean,a_mean,h_median,a_median
0,0f659e59f8967f986ab53b898e543095,Financial Solutions Advisor - Bilingual Mandar...,381,Bank of America Corporation,El Cerrito,CA,94530,USA,2015-04-16T12:25:01+00:00,2015-04-23T08:16:47+00:00,,2015-04-25T13:20:05+00:00,0.0,41-3031.02,https://bacfhrs.taleo.net/careersection/2/jobd...,41-3031.02,41-3031,41-3031.00,41-3031,"Securities, Commodities, and Financial Service...",47.49,98770,30.83,64120
1,0f659e59f8967f986ab53b898e543095,Financial Solutions Advisor - Bilingual Mandar...,381,Bank of America Corporation,El Cerrito,CA,94530,USA,2015-04-16T12:25:01+00:00,2015-04-23T08:16:47+00:00,,2015-04-25T13:20:05+00:00,0.0,41-3031.02,https://bacfhrs.taleo.net/careersection/2/jobd...,41-3031.02,41-3031,41-3031.01,41-3031,"Securities, Commodities, and Financial Service...",47.49,98770,30.83,64120
2,0f659e59f8967f986ab53b898e543095,Financial Solutions Advisor - Bilingual Mandar...,381,Bank of America Corporation,El Cerrito,CA,94530,USA,2015-04-16T12:25:01+00:00,2015-04-23T08:16:47+00:00,,2015-04-25T13:20:05+00:00,0.0,41-3031.02,https://bacfhrs.taleo.net/careersection/2/jobd...,41-3031.02,41-3031,41-3031.02,41-3031,"Securities, Commodities, and Financial Service...",47.49,98770,30.83,64120
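Because the query uses `SELECT *`, columns that exist in both tables come back twice, which is why the output above contains `occ_code.1` and `O*NET-SOC 2010 Code.1`. One way to avoid that is to name the BLS columns explicitly. The sketch below assumes the same table names as above but fills them with toy rows (the `occ_title` value is a placeholder):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')

# Toy versions of the two tables
pd.DataFrame({'hash': ['a1'], 'occ_code': ['41-3031']}
             ).to_sql('JobRecords', conn, index=False)
pd.DataFrame({'occ_code': ['41-3031'],
              'occ_title': ['Example title'],
              'a_mean': [98770]}
             ).to_sql('BLS_data', conn, index=False)

# Take every job column, but only the BLS columns that are not already present
qry = '''
    SELECT j.*, b.occ_title, b.a_mean
    FROM JobRecords AS j
    LEFT JOIN BLS_data AS b
    ON j.occ_code = b.occ_code;
    '''

result = pd.read_sql_query(qry, conn)
conn.close()
print(result.columns.tolist())
```

The same pattern extends to `h_mean`, `h_median`, and `a_median`; add them to the `SELECT` list as needed.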
