### bd econ CPS extract

bd_CPS_ONET.ipynb

February 21, 2019

Contact: Brian Dew, @bd_econ

Requires: `Abilities.txt`, `Work Activities.txt`, `Work Context.txt`, `Occupation Data.txt`, and `2010_to_SOC_Crosswalk.xls` from [O*NET](https://www.onetcenter.org/database.html#individual-files)

-----

See [readme](https://github.com/bdecon/econ_data/tree/master/bd_CPS) for bd CPS documentation.

In [1]:
import pandas as pd
import numpy as np
import os

os.chdir('/home/brian/Documents/CPS/data/db_23_2_text/')

In [2]:
# Abilities: flag occupations where any listed ability has importance (IM) >= 4.0
df = pd.read_csv('Abilities.txt', sep='\t')

pd_skills = ['Static Strength', 'Explosive Strength', 
             'Dynamic Strength', 'Trunk Strength', 
             'Reaction Time', 'Gross Body Equilibrium']

pds = df.loc[(df['Scale ID'] == 'IM') & 
             (df['Element Name'].isin(pd_skills)) & 
             (df['Data Value'] >= 4.0), 'O*NET-SOC Code'].to_list()

df = pd.read_csv('Work Activities.txt', sep='\t')

pd_activities = ['Performing General Physical Activities', 
                 'Handling and Moving Objects']

pda = df.loc[(df['Scale ID'] == 'IM') & 
             (df['Element Name'].isin(pd_activities)) & 
             (df['Data Value'] >= 4.0), 'O*NET-SOC Code'].to_list()

df = pd.read_csv('Work Context.txt', sep='\t')

pd_context = ['Spend Time Bending or Twisting the Body',
              'Spend Time Kneeling, Crouching, Stooping, or Crawling',
              'Spend Time Standing', 'Spend Time Walking and Running',
              'Spend Time Making Repetitive Motions']

pdc = df.loc[(df['Scale ID'] == 'CX') & 
             (df['Element Name'].isin(pd_context)) & 
             (df['Data Value'] >= 4.0), 'O*NET-SOC Code'].to_list()

# Union of the three lists, de-duplicated: a job is PD if it meets any one criterion
pd_jobs = list(set(pds + pda + pdc))

In [3]:
len(pd_jobs)

334
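
The three filters above share one pattern: keep rows on a given scale, for a given set of elements, with a value at or above 4.0. A minimal sketch of that pattern as a reusable function follows. The name `flag_codes` and the small `demo` table are hypothetical and not part of the original notebook; the data values are invented for illustration, and only the column names match the O*NET text files.

```python
import pandas as pd

def flag_codes(df, elements, scale_id, threshold=4.0):
    """Return the set of O*NET-SOC codes with any listed element at or above threshold."""
    mask = ((df['Scale ID'] == scale_id) &
            (df['Element Name'].isin(elements)) &
            (df['Data Value'] >= threshold))
    return set(df.loc[mask, 'O*NET-SOC Code'])

# Made-up rows that mimic the layout of the O*NET files (values are not real)
demo = pd.DataFrame({
    'O*NET-SOC Code': ['11-1011.00', '11-1011.00', '47-2061.00', '47-2061.00'],
    'Element Name': ['Static Strength', 'Static Strength',
                     'Static Strength', 'Trunk Strength'],
    'Scale ID': ['IM', 'LV', 'IM', 'IM'],
    'Data Value': [1.5, 4.2, 4.1, 3.9],
})

# Only the IM row at 4.1 passes; the LV row is on the wrong scale
demo_flagged = flag_codes(demo, ['Static Strength', 'Trunk Strength'], 'IM')
print(sorted(demo_flagged))
```

With a helper like this, the union in `pd_jobs` could be built from three calls, one per file, combined with set union.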

In [4]:
job_titles = pd.read_csv('Occupation Data.txt', sep='\t')

#### List of Physically Demanding Jobs

In [5]:
job_titles.loc[job_titles['O*NET-SOC Code'].isin(pd_jobs), 'Title'].to_list()

['Nursery and Greenhouse Managers',
 'Food Service Managers',
 'Claims Examiners, Property and Casualty Insurance',
 'Web Developers',
 'Geospatial Information Scientists and Technologists',
 'Architectural Drafters',
 'Food Science Technicians',
 'Chemical Technicians',
 'Court Reporters',
 'Kindergarten Teachers, Except Special Education',
 'Elementary School Teachers, Except Special Education',
 'Adapted Physical Education Specialists',
 'Museum Technicians and Conservators',
 'Art Directors',
 'Floral Designers',
 'Merchandise Displayers and Window Trimmers',
 'Actors',
 'Athletes and Sports Competitors',
 'Coaches and Scouts',
 'Umpires, Referees, and Other Sports Officials',
 'Dancers',
 'Choreographers',
 'Musicians, Instrumental',
 'Technical Writers',
 'Chiropractors',
 'Dentists, General',
 'Oral and Maxillofacial Surgeons',
 'Prosthodontists',
 'Pharmacists',
 'Surgeons',
 'Dermatologists',
 'Sports Medicine Physicians',
 'Physical Therapists',
 'Radiation Therapists',
 'Exe

#### How many job categories are only partially physically demanding (PD)?

In [6]:
on_soc = pd.read_excel('2010_to_SOC_Crosswalk.xls', header=3)

In [7]:
on_soc['PD'] = np.where(on_soc['O*NET-SOC 2010 Code'].isin(pd_jobs), 1, 0)

In [8]:
# Share of O*NET occupations flagged PD within each 2010 SOC code
pd_cats = on_soc.groupby('2010 SOC Code')['PD'].mean()

In [9]:
# Strictly between 0 and 1: some, but not all, occupations in the category are PD
pd_cats[(pd_cats > 0) & (pd_cats < 1)]

2010 SOC Code
11-9013    0.250000
13-1031    0.333333
15-1199    0.076923
17-3011    0.333333
19-4011    0.333333
25-2059    0.500000
27-2042    0.333333
29-1069    0.153846
29-1141    0.200000
29-2011    0.750000
29-2099    0.200000
31-9099    0.333333
33-1021    0.666667
33-2011    0.666667
33-2021    0.333333
43-4041    0.333333
43-5081    0.400000
43-9041    0.333333
45-2092    0.333333
47-2031    0.666667
47-2152    0.333333
49-2021    0.500000
49-3023    0.666667
49-9021    0.333333
51-4121    0.333333
51-8099    0.200000
51-9195    0.600000
51-9199    0.500000
53-6051    0.250000
Name: PD, dtype: float64
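
Because `PD` is a 0/1 flag, its mean within a SOC code is the share of that code's O*NET occupations that are flagged. The sketch below shows the same calculation on a made-up table (the `soc` labels and flags are invented). It uses strict inequalities rather than `Series.between`, since the `inclusive` argument of `between` changed from a boolean to a string in later pandas releases.

```python
import pandas as pd

# Hypothetical categories: A is partly PD, B is fully PD, C is not PD
demo = pd.DataFrame({
    'soc': ['A', 'A', 'A', 'B', 'B', 'C'],
    'PD':  [1, 0, 0, 1, 1, 0],
})

# Mean of a 0/1 flag is the flagged share within each group
share = demo.groupby('soc')['PD'].mean()

# Keep categories where the share is strictly between 0 and 1
partial = share[(share > 0) & (share < 1)]
print(partial)
```

Only category `A` remains, with one of its three occupations flagged.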

#### An example where only a subset of a CPS job category is PD

In [10]:
on_soc[on_soc['2010 SOC Code'] == '15-1199']

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,2010 SOC Code,2010 SOC Title,PD
135,15-1199.00,"Computer Occupations, All Other",15-1199,"Computer Occupations, All Other",0
136,15-1199.01,Software Quality Assurance Engineers and Testers,15-1199,"Computer Occupations, All Other",0
137,15-1199.02,Computer Systems Engineers/Architects,15-1199,"Computer Occupations, All Other",0
138,15-1199.03,Web Administrators,15-1199,"Computer Occupations, All Other",0
139,15-1199.04,Geospatial Information Scientists and Technolo...,15-1199,"Computer Occupations, All Other",1
140,15-1199.05,Geographic Information Systems Technicians,15-1199,"Computer Occupations, All Other",0
141,15-1199.06,Database Architects,15-1199,"Computer Occupations, All Other",0
142,15-1199.07,Data Warehousing Specialists,15-1199,"Computer Occupations, All Other",0
143,15-1199.08,Business Intelligence Analysts,15-1199,"Computer Occupations, All Other",0
144,15-1199.09,Information Technology Project Managers,15-1199,"Computer Occupations, All Other",0


In [11]:
# Why are GIS jobs PD?
GIS = '15-1199.04'

df = pd.read_csv('Work Context.txt', sep='\t')

In [12]:
df.loc[(df['O*NET-SOC Code'] == GIS) & 
       (df['Scale ID'] == 'CX') & 
       (df['Element Name'].isin(pd_context)) & 
       (df['Data Value'] >= 4.0)]

Unnamed: 0,O*NET-SOC Code,Element ID,Element Name,Scale ID,Category,Data Value,N,Standard Error,Lower CI Bound,Upper CI Bound,Recommend Suppress,Not Relevant,Date,Domain Source
41820,15-1199.04,4.C.2.d.1.i,Spend Time Making Repetitive Motions,CX,,4.0,23.0,,,,,,07/2016,Occupational Expert


#### Another example

In [13]:
on_soc[on_soc['2010 SOC Code'] == '29-1069']

Unnamed: 0,O*NET-SOC 2010 Code,O*NET-SOC 2010 Title,2010 SOC Code,2010 SOC Title,PD
469,29-1069.00,"Physicians and Surgeons, All Other",29-1069,"Physicians and Surgeons, All Other",0
470,29-1069.01,Allergists and Immunologists,29-1069,"Physicians and Surgeons, All Other",0
471,29-1069.02,Dermatologists,29-1069,"Physicians and Surgeons, All Other",1
472,29-1069.03,Hospitalists,29-1069,"Physicians and Surgeons, All Other",0
473,29-1069.04,Neurologists,29-1069,"Physicians and Surgeons, All Other",0
474,29-1069.05,Nuclear Medicine Physicians,29-1069,"Physicians and Surgeons, All Other",0
475,29-1069.06,Ophthalmologists,29-1069,"Physicians and Surgeons, All Other",0
476,29-1069.07,Pathologists,29-1069,"Physicians and Surgeons, All Other",0
477,29-1069.08,Physical Medicine and Rehabilitation Physicians,29-1069,"Physicians and Surgeons, All Other",0
478,29-1069.09,Preventive Medicine Physicians,29-1069,"Physicians and Surgeons, All Other",0


In [14]:
# Why is Dermatology physically demanding?
DERM = '29-1069.02'

df = pd.read_csv('Work Context.txt', sep='\t')

In [15]:
df.loc[(df['O*NET-SOC Code'] == DERM) & 
       (df['Scale ID'] == 'CX') & 
       (df['Element Name'].isin(pd_context)) & 
       (df['Data Value'] >= 4.0)]

Unnamed: 0,O*NET-SOC Code,Element ID,Element Name,Scale ID,Category,Data Value,N,Standard Error,Lower CI Bound,Upper CI Bound,Recommend Suppress,Not Relevant,Date,Domain Source
138484,29-1069.02,4.C.2.d.1.b,Spend Time Standing,CX,,4.08,21.0,0.25,3.56,4.6,N,,07/2012,Incumbent


In [16]:
os.chdir('..')

In [17]:
pd.read_feather('clean/cps2018.ft')['OCC'].unique()

[-1, 9140, 4920, 350, 5700, ..., 8860, 8210, 6750, 4410, 8420]
Length: 485
Categories (485, int64): [-1, 9140, 4920, 350, ..., 8210, 6750, 4410, 8420]

#### Mapping SOC codes to Census codes

In [18]:
pd.read_html('https://www.bls.gov/cps/cenocc2010.htm')[0]

Unnamed: 0,Occupation title,2010 Census code(s),2010 SOC code(s)
0,"Management, professional, and related occupations",0010–3540,11-0000–29-0000
1,"Management, business, and financial operations...",0010–0950,11-0000–13-0000
2,Management occupations,0010–0430,11-0000
3,Chief executives,0010,11-1011
4,General and operations managers,0020,11-1021
5,Legislators,0030,11-1031
6,Advertising and promotions managers,0040,11-2011
7,Marketing and sales managers,0050,11-2020
8,Public relations and fundraising managers,0060,11-2031
9,Administrative services managers,0100,11-3011
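
One possible next step, sketched here under assumptions, is to turn this table into a lookup from SOC code to Census code. The `demo` frame copies a few rows from the output above; the name `soc_to_census` is hypothetical. The sketch keeps only rows with a single four-digit Census code, dropping summary rows that hold a range, and converts the codes to integers because the `OCC` values shown earlier are integers.

```python
import pandas as pd

# A few rows copied from the table output above; the full table is much longer
demo = pd.DataFrame({
    'Occupation title': ['Management occupations', 'Chief executives',
                         'General and operations managers', 'Legislators'],
    '2010 Census code(s)': ['0010–0430', '0010', '0020', '0030'],
    '2010 SOC code(s)': ['11-0000', '11-1011', '11-1021', '11-1031'],
})

# Detailed occupations have exactly four digits; ranges fail this match
single = demo['2010 Census code(s)'].str.fullmatch(r'\d{4}')

# Map each SOC code to its integer Census code
soc_to_census = (demo.loc[single]
                 .set_index('2010 SOC code(s)')['2010 Census code(s)']
                 .astype(int)
                 .to_dict())
print(soc_to_census)
```

Some rows in the full table list aggregated SOC codes such as `11-2020`, which would not match the detailed codes in the crosswalk directly, so a real mapping would likely need extra handling for those cases.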
