# O\*NET Occupational Categories

* O\*NET classifies detailed occupations into categories such as __Bright Outlook__, __Green Economy Sector__, __STEM__ and __New & Emerging__.
* The first three categories are available on [O\*NET OnLine](https://www.onetonline.org/find/), and _New & Emerging_ occupations are listed [here](https://www.onetcenter.org/reports/NewEmerging.html).
* The aim is to merge all categories into one file and match the occupational titles with O\*NET data to identify the important _abilities, knowledge, skills_ and _work activities_ for each category. The merged data will also be used to measure whether there is an earnings difference (premium) between categories.

# Table of Contents

1. [Bright Occupations](#Bright-Occupations)
2. [Green Occupations](#Green-Occupations)
3. [STEM Occupations](#STEM-Occupations)
4. [New & Emerging Occupations](#New-&-Emerging-Occupations)
5. [All Categories into One](#All-Categories-into-One)
6. [O\*NET-SOC Occupations and Occupation Categories](#O\*NET-SOC-Occupations-and-Occupation-Categories)
    
    6.1. [Number of Occupations in Each Category](#Number-of-Occupations-in-Each-Category)
    
    6.2. [Number of O\*NET-SOC Occupations in Each Category](#Number-of-O\*NET-SOC-Occupations-in-Each-Category)
    
    6.3. [Number of Occupations with O\*NET data in Each Category](#Number-of-Occupations-with-O\*NET-data-in-Each-Category)
    
    6.4. [Number of Overlapping Occupations in Each Category](#Number-of-Overlapping-Occupations-in-Each-Category)

In [1]:
from IPython.display import display, HTML
# widen the notebook container to 80% of the browser window
display(HTML('<style>.container { width:80% !important; }</style>'))
from sklearn.preprocessing import OneHotEncoder
import pandas as pd
import numpy as np
enc = OneHotEncoder()  # reused below to one-hot encode each category column

# Bright Occupations

Data can be obtained from the [link](https://www.onetonline.org/find/bright?b=0&g=Go)

* __Rapid Growth__: projected to grow faster than average (employment increase of 7% or more) over the period 2018-2028
* __Numerous Job Openings__:  projected to have 100,000 or more job openings over the period 2018-2028
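Both criteria above reduce to threshold tests on projected figures. The following is a minimal sketch that only encodes the thresholds stated in the text (7% growth, 100,000 openings over 2018-2028). The function names and sample figures are hypothetical, and this is not O\*NET's own designation procedure.

```python
# Threshold tests for the two Bright Outlook criteria described above.
# Thresholds come from the text; function names and figures are hypothetical.

RAPID_GROWTH_THRESHOLD = 0.07          # employment increase of 7% or more
NUMEROUS_OPENINGS_THRESHOLD = 100_000  # 100,000 or more openings, 2018-2028


def is_rapid_growth(employment_2018: float, employment_2028: float) -> bool:
    """Return True if projected employment grows by at least 7%."""
    growth = (employment_2028 - employment_2018) / employment_2018
    return growth >= RAPID_GROWTH_THRESHOLD


def has_numerous_openings(projected_openings: float) -> bool:
    """Return True if the projection period has at least 100,000 openings."""
    return projected_openings >= NUMEROUS_OPENINGS_THRESHOLD


# Hypothetical occupation: 50,000 jobs in 2018, 54,000 projected for 2028 (+8%)
print(is_rapid_growth(50_000, 54_000))  # True
print(has_numerous_openings(30_000))    # False
```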

* `onetsoccode`: O\*NET detailed occupation code
* `title`: O\*NET detailed occupation title
* `num_job_open`: (dummy) 1 if the occupation is a __Numerous Job Openings__ occupation, 0 otherwise
* `rapid_growth`: (dummy) 1 if the occupation is a __Rapid Growth__ occupation, 0 otherwise

In [2]:
df_bright = pd.read_csv('csv_files/ONET_occ_categories/All_Bright_Outlook_Occupations.csv')
# one-hot encode 'Categories'; OneHotEncoder orders the values alphabetically, giving
# columns 0: Numerous Job Openings, 1: Rapid Growth, 2: both categories combined
df_encoded = pd.DataFrame(enc.fit_transform(df_bright[['Categories']]).toarray())
df_bright = df_bright.join(df_encoded) # add encoded columns
df_bright.columns = ['onetsoccode', 'title', 'categories',
                     'num_job_open', 'rapid_growth', 'both']
# occupations in both categories count toward each dummy
df_bright['num_job_open'] = df_bright['num_job_open'] + df_bright['both']
df_bright['rapid_growth'] = df_bright['rapid_growth'] + df_bright['both']
df_bright.drop(['categories', 'both'], axis=1, inplace=True)
df_bright.head()

|   | onetsoccode | title | num_job_open | rapid_growth |
|---|---|---|---|---|
| 0 | 13-2011.01 | Accountants | 1.0 | 1.0 |
| 1 | 13-2011.00 | Accountants and Auditors | 1.0 | 1.0 |
| 2 | 27-2011.00 | Actors | 0.0 | 1.0 |
| 3 | 15-2011.00 | Actuaries | 0.0 | 1.0 |
| 4 | 29-1199.01 | Acupuncturists | 0.0 | 1.0 |
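The cell above relies on the positional order of the encoder's output columns, so the names `num_job_open`, `rapid_growth` and `both` are correct only if the raw values sort in the expected order. The following is a sketch of an order-independent alternative that derives each dummy from the category text itself. The toy rows, and especially the exact spelling of the combined value, are assumptions.

```python
import pandas as pd

# Hypothetical rows mimicking the raw 'Categories' column of the Bright Outlook
# export; the spelling of the combined value is an assumption.
raw = pd.DataFrame({
    'onetsoccode': ['13-2011.00', '27-2011.00', '11-1021.00'],
    'Categories': ['Rapid Growth; Numerous Job Openings',
                   'Rapid Growth',
                   'Numerous Job Openings'],
})

# Each flag is 1 when its label appears anywhere in the string, so a combined
# value sets both flags without a separate 'both' column.
raw['rapid_growth'] = raw['Categories'].str.contains('Rapid Growth', regex=False).astype(int)
raw['num_job_open'] = raw['Categories'].str.contains('Numerous Job Openings', regex=False).astype(int)
print(raw[['onetsoccode', 'num_job_open', 'rapid_growth']])
```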


# Green Occupations

Data can be obtained from the [link](https://www.onetcenter.org/dictionary/22.0/excel/green_occupations.html)

* __Green New & Emerging__: The impact of green economy activities and technologies is sufficient to create the need for unique work and worker requirements, which results in the generation of new occupations.
* __Green Enhanced Skills__: The impact of green economy activities and technologies results in a significant change to the work and worker requirements of an existing O\*NET-SOC occupation.
* __Green Increased Demand__: The impact of green economy activities and technologies results in an increase in employment demand, but does not entail significant changes in the work and worker requirements of the occupation.

For detailed information, see the [report](https://www.onetcenter.org/dl_files/Green.pdf).

* `onetsoccode`: O\*NET detailed occupation code
* `title`: O\*NET detailed occupation title
* `green_enhanced`: (dummy) 1 if the occupation is a __Green Enhanced Skills__ occupation, 0 otherwise
* `green_increased`: (dummy) 1 if the occupation is a __Green Increased Demand__ occupation, 0 otherwise
* `green_new_emerging`: (dummy) 1 if the occupation is a __Green New & Emerging__ occupation, 0 otherwise
* `sectors`: the green economy sector(s) the occupation belongs to, separated by `; `

In [3]:
df_green = pd.read_csv('csv_files/ONET_occ_categories/All_Green_Economy_Sectors.csv')
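# one dummy per green category, in alphabetical order: Enhanced Skills, Increased Demand, New & Emerging
df_encoded = pd.DataFrame(enc.fit_transform(df_green[['Category']]).toarray())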
df_green = df_green.join(df_encoded)
df_green.columns = ['category', 'onetsoccode', 'title', 'sectors',
                    'green_enhanced', 'green_increased', 'green_new_emerging']
df_green.drop('category', axis=1, inplace=True)
df_green.head()

|   | onetsoccode | title | sectors | green_enhanced | green_increased | green_new_emerging |
|---|---|---|---|---|---|---|
| 0 | 17-2011.00 | Aerospace Engineers | Research, Design, and Consulting Services; Tra... | 1.0 | 0.0 | 0.0 |
| 1 | 45-2011.00 | Agricultural Inspectors | Agriculture and Forestry; Governmental and Reg... | 0.0 | 1.0 | 0.0 |
| 2 | 19-4011.01 | Agricultural Technicians | Agriculture and Forestry | 1.0 | 0.0 | 0.0 |
| 3 | 51-2011.00 | Aircraft Structure, Surfaces, Rigging, and Sys... | Manufacturing | 1.0 | 0.0 | 0.0 |
| 4 | 23-1022.00 | Arbitrators, Mediators, and Conciliators | Governmental and Regulatory Administration; Re... | 1.0 | 0.0 | 0.0 |
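The `sectors` column packs one or more sector names into a single string separated by `'; '`. Sector names themselves contain commas, so splitting on `','` would break them. The following sketch reshapes the column to one row per occupation-sector pair and counts occupations per sector. The rows are hypothetical and use sector names visible in the output above.

```python
import pandas as pd

# Hypothetical rows shaped like df_green
green = pd.DataFrame({
    'onetsoccode': ['17-2011.00', '45-2011.00', '19-4011.01'],
    'sectors': ['Research, Design, and Consulting Services; Manufacturing',
                'Agriculture and Forestry; Governmental and Regulatory Administration',
                'Agriculture and Forestry'],
})

# One row per (occupation, sector) pair: split on '; ', then explode the lists
pairs = (green.assign(sector=green['sectors'].str.split('; '))
              .explode('sector')
              .drop(columns='sectors'))

# Number of distinct occupations listed under each sector
per_sector = pairs.groupby('sector')['onetsoccode'].nunique().sort_values(ascending=False)
print(per_sector)
```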


# STEM Occupations

Data can be obtained from the [link](https://www.onetonline.org/find/stem?t=0)

* STEM occupations are divided into 5 categories:
    1. Managerial STEM Occupations
    2. Postsecondary Teaching STEM Occupations
    3. Research, Development, Design and Practitioners STEM Occupations
    4. Sales STEM Occupations
    5. Technologists and Technicians STEM Occupations

* `onetsoccode`: O\*NET detailed occupation code
* `title`: O\*NET detailed occupation title
* `managerial`: (dummy) 1 if the occupation is a __Managerial STEM Occupation__, 0 otherwise
* `post_teaching`: (dummy) 1 if the occupation is a __Postsecondary Teaching STEM Occupation__, 0 otherwise
* `research_dev`: (dummy) 1 if the occupation is a __Research, Development, Design and Practitioners STEM Occupation__, 0 otherwise
* `sales`: (dummy) 1 if the occupation is a __Sales STEM Occupation__, 0 otherwise
* `tech`: (dummy) 1 if the occupation is a __Technologists and Technicians STEM Occupation__, 0 otherwise

In [4]:
df_stem = pd.read_csv('csv_files/ONET_occ_categories/All_STEM_Occupations.csv')
# one dummy per STEM occupation type, in alphabetical order of the type names
df_encoded = pd.DataFrame(enc.fit_transform(df_stem[['Occupation Types']]).toarray())
df_stem = df_stem.join(df_encoded)
df_stem.columns = ['onetsoccode', 'title', 'categories', 'managerial',
                   'post_teaching', 'research_dev', 'sales', 'tech']
df_stem.drop('categories', axis=1, inplace=True)
df_stem.head()

|   | onetsoccode | title | managerial | post_teaching | research_dev | sales | tech |
|---|---|---|---|---|---|---|---|
| 0 | 15-2011.00 | Actuaries | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 1 | 29-1199.01 | Acupuncturists | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 2 | 29-1141.01 | Acute Care Nurses | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | 29-1141.02 | Advanced Practice Psychiatric Nurses | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 4 | 17-3021.00 | Aerospace Engineering and Operations Technicians | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
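Because each STEM occupation has a single `Occupation Types` value, exactly one of the five dummies should be 1 in every row. The following sketch checks that property and collapses the dummies back into a single label column, which can be convenient for `groupby` summaries. The `stem_type` column and the toy rows are hypothetical.

```python
import pandas as pd

stem_types = ['managerial', 'post_teaching', 'research_dev', 'sales', 'tech']

# Hypothetical rows shaped like df_stem (one dummy per STEM occupation type)
stem = pd.DataFrame({
    'onetsoccode': ['15-2011.00', '17-3021.00'],
    'managerial': [0.0, 0.0], 'post_teaching': [0.0, 0.0],
    'research_dev': [1.0, 0.0], 'sales': [0.0, 0.0], 'tech': [0.0, 1.0],
})

# Sanity check: each occupation carries exactly one STEM type
assert (stem[stem_types].sum(axis=1) == 1).all()

# Collapse the dummies into one label: the name of the column holding the 1
stem['stem_type'] = stem[stem_types].idxmax(axis=1)
print(stem[['onetsoccode', 'stem_type']])
```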


# New & Emerging Occupations

Data can be obtained from the [link](https://www.onetcenter.org/reports/UpdatingTaxonomy2009.html).

* `onetsoccode`: O\*NET detailed occupation code
* `title`: O\*NET detailed occupation title

In [5]:
df_taxonomy2009 = pd.read_excel('csv_files/ONET_occ_categories/UpdatingTaxonomy2009_AppB.xls')
# skip the first three rows (spreadsheet header text) and drop empty rows;
# .copy() avoids SettingWithCopyWarning when modifying the result below
df_taxonomy2009 = df_taxonomy2009.iloc[3:, :].dropna().copy()
df_taxonomy2009.columns = ['onetsoccode', 'title']
# remove a trailing '*' marker from the titles
df_taxonomy2009['title'] = df_taxonomy2009['title'].apply(
                                lambda x: x[:-1] if x.endswith('*') else x)
df_taxonomy2009.set_index('onetsoccode', inplace=True)
df_taxonomy2009.head()

| onetsoccode | title |
|---|---|
| 11-1011.03 | Chief Sustainability Officers |
| 11-2011.01 | Green Marketers |
| 11-3051.01 | Quality Control Systems Managers |
| 11-3051.02 | Geothermal Production Managers |
| 11-3051.03 | Biofuels Production Managers |


* Since these titles belong to the O\*NET-SOC 2009 taxonomy, their codes and titles must be updated to the O\*NET-SOC 2010 taxonomy. The __2009-2010__ crosswalk is used for this purpose: it tracks changes in codes and titles, as well as occupations that were split into several titles.

In [6]:
df_emerging = pd.read_csv('csv_files/Crosswalks/2009_to_2010_Crosswalk.csv')
df_emerging.head()

|   | O\*NET-SOC 2009 Code | O\*NET-SOC 2009 Title | O\*NET-SOC 2010 Code | O\*NET-SOC 2010 Title |
|---|---|---|---|---|
| 0 | 11-1011.00 | Chief Executives | 11-1011.00 | Chief Executives |
| 1 | 11-1011.03 | Chief Sustainability Officers | 11-1011.03 | Chief Sustainability Officers |
| 2 | 11-1021.00 | General and Operations Managers | 11-1021.00 | General and Operations Managers |
| 3 | 11-1031.00 | Legislators | 11-1031.00 | Legislators |
| 4 | 11-2011.00 | Advertising and Promotions Managers | 11-2011.00 | Advertising and Promotions Managers |


In [7]:
df_emerging.columns = ['onetsoccode', 'onetsoc2009title',
                       'onetsoc2010code', 'onetsoc2010title']
df_emerging.set_index('onetsoccode', inplace=True)
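# keep only crosswalk rows whose 2009 code is a new & emerging occupation;
# a 2009 code mapped to several 2010 codes yields several rows
df_emerging = df_emerging.join(df_taxonomy2009, how='inner')
# keep only the 2010 code and title
df_emerging.reset_index(drop=True, inplace=True)
df_emerging.drop(['onetsoc2009title', 'title'], axis=1, inplace=True)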
df_emerging.columns = ['onetsoccode', 'title']
df_emerging.head()

|   | onetsoccode | title |
|---|---|---|
| 0 | 11-1011.03 | Chief Sustainability Officers |
| 1 | 11-2011.01 | Green Marketers |
| 2 | 11-3051.01 | Quality Control Systems Managers |
| 3 | 11-3051.02 | Geothermal Production Managers |
| 4 | 11-3051.03 | Biofuels Production Managers |


__Updating the `onetsoccode` values of the new & emerging occupations for database 22.3__
* The number of occupations increased to 160 because _Ophthalmic Medical Technologists and Technicians_ was split into two separate titles.
* The following occupations changed their title and/or code in the 2010 taxonomy:

|ONETSOCCODE 2009|                 TITLE 2009                     |ONETSOCCODE 2010|              TITLE 2010                  |
|----------------|------------------------------------------------|----------------|------------------------------------------|
|      11-3051.04|Biomass Production Managers                     |      11-3051.04|Biomass Power Plant Managers              |
|      25-3099.01|Adaptive Physical Education Specialists         |      25-2059.01|Adapted Physical Education Specialists    |
|      29-2099.01|Electroneurodiagnostic Technologists            |      29-2099.01|Neurodiagnostic Technologists             |
|      15-1099.12|Electronic Commerce Specialists                 |      15-1199.10|Search Marketing Strategists              |
|      33-9099.02|Loss Prevention Specialists                     |      33-9099.02|Retail Loss Prevention Specialists        |
|      15-1099.03|Network Designers                               |      15-1143.00|Computer Network Architects               |
|      29-2099.03|Ophthalmic Medical Technologists and Technicians|      29-2057.00|Ophthalmic Medical Technicians            |
|      29-2099.03|Ophthalmic Medical Technologists and Technicians|      29-2099.05|Ophthalmic Medical Technologists          |
|      15-1081.01|Telecommunications Specialists                  |      15-1143.01|Telecommunications Engineering Specialists|
|      33-9099.01|Transportation Security Officers                |      33-9093.00|Transportation Security Screeners         |
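A split explains why an inner join with the crosswalk (as in cell In [7]) can return more rows than the 2009 list contains. The following toy reproduction uses the codes from the table above. The two-row 2009 list is hypothetical.

```python
import pandas as pd

# Toy 2009 list of new & emerging occupations, indexed by 2009 code
taxonomy2009 = pd.DataFrame(
    {'title': ['Ophthalmic Medical Technologists and Technicians',
               'Transportation Security Officers']},
    index=pd.Index(['29-2099.03', '33-9099.01'], name='onetsoccode'))

# Toy crosswalk rows from the table above: 29-2099.03 maps to two 2010 codes
crosswalk = pd.DataFrame(
    {'onetsoc2010code': ['29-2057.00', '29-2099.05', '33-9093.00'],
     'onetsoc2010title': ['Ophthalmic Medical Technicians',
                          'Ophthalmic Medical Technologists',
                          'Transportation Security Screeners']},
    index=pd.Index(['29-2099.03', '29-2099.03', '33-9099.01'], name='onetsoccode'))

# The inner join repeats a 2009 code once per matching 2010 code,
# so two 2009 occupations become three 2010 occupations.
updated = crosswalk.join(taxonomy2009, how='inner')
print(len(taxonomy2009), '->', len(updated))
print(updated[['onetsoc2010code', 'onetsoc2010title']].reset_index(drop=True))
```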

# All Categories into One

__Merge Bright and Green Occupations__

In [8]:
df_final = df_bright.merge(df_green, on='onetsoccode', how='outer', indicator='merge_green')
df_final['bright'] = np.where(df_final.merge_green.isin(['left_only', 'both']), 1, 0)
df_final['green'] = np.where(df_final.merge_green.isin(['right_only', 'both']), 1, 0)
df_final['onet_title'] = np.where(df_final.title_x.isnull(), df_final.title_y, df_final.title_x)
df_final.drop(['title_x', 'title_y', 'merge_green', 'sectors'], axis=1, inplace=True)
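The `indicator` argument of `merge` adds a column recording whether each key came from the left frame, the right frame, or both. The category flags are built from that column. The following self-contained sketch uses two hypothetical mini-lists and also shows `fillna` as an equivalent way to pick whichever side supplied a title.

```python
import numpy as np
import pandas as pd

# Hypothetical category lists sharing one code
bright = pd.DataFrame({'onetsoccode': ['11-1021.00', '11-2031.00'],
                       'title': ['General and Operations Managers',
                                 'Public Relations and Fundraising Managers']})
green = pd.DataFrame({'onetsoccode': ['11-1021.00', '11-1011.03'],
                      'title': ['General and Operations Managers',
                                'Chief Sustainability Officers']})

merged = bright.merge(green, on='onetsoccode', how='outer', indicator='source')
# 'left_only' -> only in bright, 'right_only' -> only in green, 'both' -> in both
merged['bright'] = np.where(merged.source.isin(['left_only', 'both']), 1, 0)
merged['green'] = np.where(merged.source.isin(['right_only', 'both']), 1, 0)
# Use the left title when present, otherwise the right one
merged['onet_title'] = merged['title_x'].fillna(merged['title_y'])
print(merged[['onetsoccode', 'onet_title', 'bright', 'green']])
```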

__Merge STEM Occupations__

In [9]:
df_final = df_final.merge(df_stem, on='onetsoccode', how='outer', indicator='merge_stem')
df_final['stem'] = np.where(df_final.merge_stem.isin(['right_only', 'both']), 1, 0)
df_final['onet_title'] = np.where(df_final.onet_title.isnull(), df_final.title, df_final.onet_title)
df_final.drop(['title', 'merge_stem'], axis=1, inplace=True)

__Merge New & Emerging Occupations__

In [10]:
df_final = df_final.merge(df_emerging, on='onetsoccode', how='outer', indicator='merge_emerging')
df_final['emerging'] = np.where(df_final.merge_emerging.isin(['right_only', 'both']), 1, 0)
df_final['onet_title'] = np.where(df_final.onet_title.isnull(), df_final.title, df_final.onet_title)
df_final.drop(['title', 'merge_emerging'], axis=1, inplace=True)

In [11]:
ordered = ['onetsoccode', 'onet_title', 'bright', 'green', 'stem', 'emerging']
ordered.extend([column for column in df_final.columns if column not in ordered])
df_final = df_final[ordered]
df_final.sort_values('onetsoccode', inplace=True)
df_final.reset_index(drop=True, inplace=True)
df_final.fillna(0, inplace=True)
df_final.head()

|   | onetsoccode | onet_title | bright | green | stem | emerging | num_job_open | rapid_growth | green_enhanced | green_increased | green_new_emerging | managerial | post_teaching | research_dev | sales | tech |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11-1011.03 | Chief Sustainability Officers | 0.0 | 1.0 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 11-1021.00 | General and Operations Managers | 1.0 | 1.0 | 0.0 | 0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 11-2011.01 | Green Marketers | 0.0 | 1.0 | 0.0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 11-2021.00 | Marketing Managers | 1.0 | 1.0 | 0.0 | 0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 11-2031.00 | Public Relations and Fundraising Managers | 1.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [12]:
df_final.to_csv('onet_occupational_categories.csv', index=False)

# O\*NET-SOC Occupations and Occupation Categories

## Number of Occupations in Each Category

In [13]:
print('Number of Bright Occupations: {}'.format(
        df_final[df_final.bright == 1].shape[0]))
print('Number of Green Occupations: {}'.format(
        df_final[df_final.green == 1].shape[0]))
print('Number of STEM Occupations: {}'.format(
        df_final[df_final.stem == 1].shape[0]))
print('Number of New & Emerging Occupations: {}'.format(
        df_final[df_final.emerging == 1].shape[0]))

Number of Bright Occupations: 434
Number of Green Occupations: 204
Number of STEM Occupations: 308
Number of New & Emerging Occupations: 160


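## Number of O\*NET-SOC Occupations in Each Category

An O\*NET-SOC code ending in `.00` corresponds to a BLS SOC detailed occupation, while other suffixes (`.01`, `.02`, ...) mark more detailed, O\*NET-specific occupations. The helper column `if_bls` stores the last two digits of the code, so `if_bls == '00'` selects the BLS-SOC occupations.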

In [14]:
df_final['if_bls'] = df_final.onetsoccode.apply(lambda x: x[-2:])
print('Number of BLS-SOC Occupations in Bright category: {}'.format(
        df_final[(df_final.bright == 1) & (df_final.if_bls == '00')].shape[0]))
print('Number of BLS-SOC Occupations in Green category: {}'.format(
        df_final[(df_final.green == 1) & (df_final.if_bls == '00')].shape[0]))
print('Number of BLS-SOC Occupations in STEM category: {}'.format(
        df_final[(df_final.stem == 1) & (df_final.if_bls == '00')].shape[0]))
print('Number of BLS-SOC Occupations in Emerging category: {}'.format(
        df_final[(df_final.emerging == 1) & (df_final.if_bls == '00')].shape[0]))

Number of BLS-SOC Occupations in Bright category: 330
Number of BLS-SOC Occupations in Green category: 104
Number of BLS-SOC Occupations in STEM category: 184
Number of BLS-SOC Occupations in Emerging category: 11
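The `.00` test can also be written with `str.endswith`, and the part of the code before the dot gives the six-digit SOC code. The following sketch uses sample codes taken from the outputs above. Mapping O\*NET-specific occupations to their parent SOC code this way could help when matching with BLS earnings data, but that use is an assumption, not a step in this notebook.

```python
import pandas as pd

codes = pd.Series(['13-2011.00', '13-2011.01', '29-1199.01', '15-2011.00'])

# '.00' codes are SOC-level occupations; other suffixes are O*NET-specific breakouts
is_soc_level = codes.str.endswith('.00')
# The six-digit SOC code is everything before the '.'
soc_code = codes.str.split('.').str[0]
print(pd.DataFrame({'code': codes, 'soc_level': is_soc_level, 'soc_code': soc_code}))
```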


## Number of Occupations with O\*NET data in Each Category

In [15]:
df_onet = pd.read_csv('onet_data.csv')
# one row per occupation code, so the merge below cannot duplicate rows
df_onet = df_onet[['onetsoccode']].drop_duplicates()
# left merge keeps every categorized occupation; the indicator is 'both' when the
# code also appears in the O*NET data and 'left_only' otherwise
df_final = df_final.merge(df_onet, on='onetsoccode', how='left', indicator='onet_data')
df_final['onet_data'] = np.where(df_final.onet_data == 'both', 1, 0)

In [16]:
print('Number of Bright Occupations with O*NET data: {}'.format(
        df_final[(df_final.bright == 1) & (df_final.onet_data == 1)].shape[0]))
print('Number of Green Occupations with O*NET data: {}'.format(
        df_final[(df_final.green == 1) & (df_final.onet_data == 1)].shape[0]))
print('Number of STEM Occupations with O*NET data: {}'.format(
        df_final[(df_final.stem == 1) & (df_final.onet_data == 1)].shape[0]))
print('Number of Emerging Occupations with O*NET data: {}'.format(
        df_final[(df_final.emerging == 1) & (df_final.onet_data == 1)].shape[0]))

Number of Bright Occupations with O*NET data: 388
Number of Green Occupations with O*NET data: 199
Number of STEM Occupations with O*NET data: 276
Number of Emerging Occupations with O*NET data: 154


## Number of Overlapping Occupations in Each Category

Overlaps are counted among occupations with O\*NET data (`onet_data == 1`).

In [17]:
in_onet = df_final.onet_data == 1  # occupations with O*NET data
print('Number of Bright and Green Occupations: {}'.format(
        df_final[(df_final.bright == 1) & (df_final.green == 1) & in_onet].shape[0]))
print('Number of Bright and STEM Occupations: {}'.format(
        df_final[(df_final.bright == 1) & (df_final.stem == 1) & in_onet].shape[0]))
print('Number of Bright and Emerging Occupations: {}'.format(
        df_final[(df_final.bright == 1) & (df_final.emerging == 1) & in_onet].shape[0]))
print('Number of Green and STEM Occupations: {}'.format(
        df_final[(df_final.green == 1) & (df_final.stem == 1) & in_onet].shape[0]))
print('Number of Green and Emerging Occupations: {}'.format(
        df_final[(df_final.green == 1) & (df_final.emerging == 1) & in_onet].shape[0]))
print('Number of STEM and Emerging Occupations: {}'.format(
        df_final[(df_final.stem == 1) & (df_final.emerging == 1) & in_onet].shape[0]))

Number of Bright and Green Occupations: 63
Number of Bright and STEM Occupations: 148
Number of Bright and Emerging Occupations: 74
Number of Green and STEM Occupations: 81
Number of Green and Emerging Occupations: 73
Number of STEM and Emerging Occupations: 98
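All pairwise overlaps can also be computed at once. For a 0/1 flag matrix `X`, the product `X.T @ X` counts the occupations flagged in both categories at each entry, and its diagonal holds the size of each category. The following sketch uses hypothetical rows shaped like `df_final`. Applied to `df_final[flags]`, it would give the pairwise counts for whatever rows that frame contains.

```python
import pandas as pd

flags = ['bright', 'green', 'stem', 'emerging']

# Hypothetical rows shaped like df_final (0/1 category flags)
df = pd.DataFrame({
    'onetsoccode': ['11-1011.03', '11-1021.00', '11-2011.01', '11-2021.00', '15-2011.00'],
    'bright':   [0, 1, 0, 1, 1],
    'green':    [1, 1, 1, 1, 0],
    'stem':     [0, 0, 0, 0, 1],
    'emerging': [1, 0, 1, 0, 0],
})

X = df[flags].astype(int)
# Entry (a, b) counts occupations flagged in both a and b;
# the diagonal is the number of occupations in each category.
overlap = X.T @ X
print(overlap)
```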
