# Identify the Most Important Items in O\*NET Data

* We will identify the most important items in O\*NET Data (Abilities, Knowledge, Skills, and Work Activities)
* The premise is that an important item should receive high ratings regardless of occupation or occupational category. We therefore compute the mean rating of each item across all occupations and sort the means in descending order to identify the most important items.
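
The ranking idea described above can be sketched on a toy table before touching the real data. The occupation codes, titles, item names, and ratings below are made up for illustration and are not taken from O\*NET.

```python
import pandas as pd

# Toy ratings: rows are occupations, columns are items (all values are made up).
toy = pd.DataFrame(
    {
        "onetsoccode": ["00-0000.01", "00-0000.02", "00-0000.03"],
        "title": ["Occupation A", "Occupation B", "Occupation C"],
        "item_1": [4.0, 5.0, 3.0],
        "item_2": [2.0, 1.0, 3.0],
        "item_3": [3.0, 4.0, 3.5],
    }
)

# Average each item over all occupations, then rank from highest to lowest.
item_means = toy.drop(columns=["onetsoccode", "title"]).mean()
ranking = item_means.sort_values(ascending=False)
print(ranking)
```

An item reaches the top of `ranking` only if it is rated highly across most rows, which is the property the analysis below relies on.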

In [1]:
from IPython.display import display, HTML
display(HTML('<style>.container { width:80% !important; }</style>'))
import pandas as pd
import numpy as np

# Read O\*NET Data

In [2]:
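df_onet_data = pd.read_csv('onet_data.csv')
display(df_onet_data.head())
# Skip the first two columns (onetsoccode, title); the rest are rating items
items = df_onet_data.columns[2:]
# Mean rating of each item across all occupations
means = [df_onet_data[item].mean() for item in items]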

Unnamed: 0,onetsoccode,title,ability_1,ability_2,ability_3,ability_4,ability_5,ability_6,ability_7,ability_8,...,skill_26,skill_27,skill_28,skill_29,skill_30,skill_31,skill_32,skill_33,skill_34,skill_35
0,11-1011.00,Chief Executives,1.0,2.12,3.5,1.75,4.12,1.75,1.0,1.0,...,1.88,3.12,4.25,4.38,4.12,4.12,1.75,4.0,1.0,4.0
1,11-1011.03,Chief Sustainability Officers,1.0,1.88,3.38,1.75,4.0,2.0,1.0,1.0,...,1.75,3.25,3.75,4.0,3.62,3.62,1.62,3.38,1.12,3.88
2,11-1021.00,General and Operations Managers,2.0,2.12,3.0,1.75,3.75,2.0,1.0,1.62,...,1.88,3.25,4.0,4.0,3.0,3.0,1.88,3.75,2.0,3.25
3,11-2011.00,Advertising and Promotions Managers,1.88,1.88,3.38,1.5,3.88,1.88,1.0,1.0,...,1.5,3.12,4.0,4.0,3.12,3.0,1.62,3.88,1.12,3.75
4,11-2021.00,Marketing Managers,1.12,1.88,3.25,1.0,3.88,1.75,1.0,1.25,...,1.75,3.12,3.88,3.88,3.25,3.5,1.75,3.5,1.0,3.25


# Read Elements

In [3]:
df_elements = pd.read_csv('elements.csv')
display(df_elements.head())
elements = df_elements.elements.values

Unnamed: 0,elements,encodings
0,Arm-Hand Steadiness,ability_1
1,Auditory Attention,ability_2
2,Category Flexibility,ability_3
3,Control Precision,ability_4
4,Deductive Reasoning,ability_5


# Top 10 Most Important Elements in O\*NET

In [4]:
# Pair each mean with its element name; assumes elements.csv rows follow the item column order
df_scores = pd.DataFrame({'scores': means, 'elements': elements})
df_scores.sort_values('scores', ascending=False, inplace=True)
df_scores.reset_index(drop=True, inplace=True)
df_scores.head(10)

Unnamed: 0,scores,elements
0,4.202453,Getting Information
1,3.961346,"Communicating with Supervisors, Peers, or Subo..."
2,3.930062,Making Decisions and Solving Problems
3,3.80588,"Identifying Objects, Actions, and Events"
4,3.734886,Updating and Using Relevant Knowledge
5,3.733592,Oral Comprehension
6,3.696822,Oral Expression
7,3.672588,Establishing and Maintaining Interpersonal Rel...
8,3.641439,"Organizing, Planning, and Prioritizing Work"
9,3.636625,English Language


# Read Occupation Categories

In [5]:
df_cat = pd.read_csv('onet_occupational_categories.csv')
df_cat = df_cat[['onetsoccode', 'bright', 'green', 'stem', 'emerging']]
df_cat.head()

Unnamed: 0,onetsoccode,bright,green,stem,emerging
0,11-1011.03,0.0,1.0,0.0,1
1,11-1021.00,1.0,1.0,0.0,0
2,11-2011.01,0.0,1.0,0.0,1
3,11-2021.00,1.0,1.0,0.0,0
4,11-2031.00,1.0,0.0,0.0,0


In [6]:
# Attach category flags; occupations missing from df_cat get NaN flags and are excluded below
temp = df_onet_data.merge(df_cat, on='onetsoccode', how='left')
df_bright = temp[temp.bright == 1]
df_green = temp[temp.green == 1]
df_stem = temp[temp.stem == 1]
df_emerging = temp[temp.emerging == 1]
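
Because the merge is a left join, an occupation with no row in the category file keeps its ratings but gets `NaN` for every flag, so it drops out of all four subsets. A minimal sketch of how this could be checked, using toy frames (the codes and values are made up) and the `indicator` option of `DataFrame.merge`:

```python
import pandas as pd

# Toy stand-ins for df_onet_data and df_cat (all values are made up).
toy_data = pd.DataFrame(
    {
        "onetsoccode": ["00-0000.01", "00-0000.02", "00-0000.03"],
        "item_1": [4.0, 3.0, 5.0],
    }
)
toy_cat = pd.DataFrame(
    {"onetsoccode": ["00-0000.01", "00-0000.03"], "stem": [1.0, 0.0]}
)

# indicator=True adds a `_merge` column telling where each row came from.
merged = toy_data.merge(toy_cat, on="onetsoccode", how="left", indicator=True)

# Rows with no match in the category table have NaN flags.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["onetsoccode"].tolist())

# NaN == 1 is False, so unmatched rows never enter a category subset.
stem_rows = merged[merged["stem"] == 1]
print(stem_rows["onetsoccode"].tolist())
```

If this check were applied to the real frames, the extra `_merge` column would need to be dropped again before slicing columns by position.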

In [7]:
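def get_top10(df):
    """Rank all elements by mean rating in df; callers take .head(10)."""
    # Drop the two leading id columns and the four trailing category flags
    items = df.columns[2:-4]
    scores = [df[item].mean() for item in items]
    # Relies on the global `elements` array being in the same order as `items`
    temp = pd.DataFrame({'scores': scores, 'elements': elements})
    temp.sort_values('scores', ascending=False, inplace=True)
    temp.reset_index(drop=True, inplace=True)
    return temp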
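
`get_top10` pairs scores with element names by position, so it depends on the rating columns appearing in the same order as the rows of `elements.csv`. A possible alternative, sketched below under the assumption that the `encodings` column holds the rating column names, looks names up by code instead. The helper `rank_elements` and the toy frames are hypothetical and not part of the notebook.

```python
import pandas as pd


def rank_elements(df_ratings, df_elements):
    """Rank elements by mean rating, matching names through `encodings`."""
    known_codes = set(df_elements["encodings"])
    # Keep only columns listed as encodings; id and flag columns are ignored.
    codes = [c for c in df_ratings.columns if c in known_codes]
    means = df_ratings[codes].mean()
    # Look up each element name by its code rather than by position.
    names = df_elements.set_index("encodings")["elements"].reindex(means.index)
    out = pd.DataFrame({"scores": means, "elements": names})
    return out.sort_values("scores", ascending=False).reset_index(drop=True)


# Toy data: rating columns are deliberately in a different order than the
# rows of the element table (all names and values are made up).
toy_ratings = pd.DataFrame(
    {
        "onetsoccode": ["00-0000.01", "00-0000.02"],
        "title": ["Occupation A", "Occupation B"],
        "item_b": [2.0, 3.0],
        "item_a": [4.0, 5.0],
        "bright": [1.0, 0.0],
    }
)
toy_elements = pd.DataFrame(
    {"elements": ["Element A", "Element B"], "encodings": ["item_a", "item_b"]}
)

ranked = rank_elements(toy_ratings, toy_elements)
print(ranked)
```

With positional pairing, the toy frames above would label the higher score as `Element B`; the lookup by code avoids that kind of silent mismatch.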

# Top 10 Elements for Bright Occupations

In [8]:
df_bright_important = get_top10(df_bright)
df_bright_important.head(10)

Unnamed: 0,scores,elements
0,4.259485,Getting Information
1,4.010103,"Communicating with Supervisors, Peers, or Subo..."
2,4.001418,Making Decisions and Solving Problems
3,3.864691,Updating and Using Relevant Knowledge
4,3.830077,Oral Comprehension
5,3.827242,"Identifying Objects, Actions, and Events"
6,3.809433,Establishing and Maintaining Interpersonal Rel...
7,3.804278,Oral Expression
8,3.783686,English Language
9,3.721959,"Organizing, Planning, and Prioritizing Work"


# Top 10 Elements for Green Occupations

In [9]:
df_green_important = get_top10(df_green)
df_green_important.head(10)

Unnamed: 0,scores,elements
0,4.246834,Getting Information
1,4.076533,"Communicating with Supervisors, Peers, or Subo..."
2,4.045528,Making Decisions and Solving Problems
3,3.871558,"Identifying Objects, Actions, and Events"
4,3.799749,Updating and Using Relevant Knowledge
5,3.732161,Problem Sensitivity
6,3.73196,Oral Comprehension
7,3.714422,"Monitor Processes, Materials, or Surroundings"
8,3.693166,Interacting With Computers
9,3.692161,"Organizing, Planning, and Prioritizing Work"


# Top 10 Elements for STEM Occupations

In [10]:
df_stem_important = get_top10(df_stem)
df_stem_important.head(10)

Unnamed: 0,scores,elements
0,4.405725,Getting Information
1,4.224529,Making Decisions and Solving Problems
2,4.202138,Interacting With Computers
3,4.177609,Updating and Using Relevant Knowledge
4,4.112754,"Communicating with Supervisors, Peers, or Subo..."
5,4.031014,Documenting/Recording Information
6,3.983043,"Identifying Objects, Actions, and Events"
7,3.974493,Analyzing Data or Information
8,3.969094,Oral Comprehension
9,3.959058,Processing Information


# Top 10 Elements for New & Emerging Occupations

In [11]:
df_emerging_important = get_top10(df_emerging)
df_emerging_important.head(10)

Unnamed: 0,scores,elements
0,4.388052,Getting Information
1,4.268442,Making Decisions and Solving Problems
2,4.180065,"Communicating with Supervisors, Peers, or Subo..."
3,4.164091,Interacting With Computers
4,4.110455,Updating and Using Relevant Knowledge
5,3.969805,Documenting/Recording Information
6,3.955,"Identifying Objects, Actions, and Events"
7,3.953052,Analyzing Data or Information
8,3.919286,Processing Information
9,3.917143,Oral Comprehension


# Summary

* To identify the most important elements, we took the mean of each element across occupations: sum the ratings of all occupations for that element, then divide the sum by the number of occupations. For example, `Getting Information` has the highest mean in the O\*NET dataset.
* After calculating the mean for every element, we sorted the means in descending order to find the most important elements.
* The lists of most important elements are also very similar across occupational categories. The main reason is that we rank by mean scores: an element can reach the top only if it is rated highly in almost all occupations. This is the case with `Getting Information`, which ranks first in every list. The rankings diverge further down the lists; for example, `Interacting With Computers` appears in the green, STEM, and emerging top 10 but not in the overall or bright top 10.
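
The similarity between lists can be made concrete by counting shared elements. The sketch below compares the overall and STEM top 10 lists exactly as printed in the outputs above (truncated names are kept as they appear); the helper `top_n_overlap` is hypothetical and not part of the notebook.

```python
def top_n_overlap(list_a, list_b):
    """Return the elements present in both lists, in list_a order."""
    in_b = set(list_b)
    return [name for name in list_a if name in in_b]


# Names copied from the printed outputs; "..." marks names truncated there.
overall_top10 = [
    "Getting Information",
    "Communicating with Supervisors, Peers, or Subo...",
    "Making Decisions and Solving Problems",
    "Identifying Objects, Actions, and Events",
    "Updating and Using Relevant Knowledge",
    "Oral Comprehension",
    "Oral Expression",
    "Establishing and Maintaining Interpersonal Rel...",
    "Organizing, Planning, and Prioritizing Work",
    "English Language",
]
stem_top10 = [
    "Getting Information",
    "Making Decisions and Solving Problems",
    "Interacting With Computers",
    "Updating and Using Relevant Knowledge",
    "Communicating with Supervisors, Peers, or Subo...",
    "Documenting/Recording Information",
    "Identifying Objects, Actions, and Events",
    "Analyzing Data or Information",
    "Oral Comprehension",
    "Processing Information",
]

shared = top_n_overlap(overall_top10, stem_top10)
print(len(shared), "of", len(overall_top10), "elements are shared")
for name in shared:
    print(" -", name)
```

The same helper could be applied to any pair of lists, for instance the `elements` column of `df_green_important.head(10)` against that of `df_emerging_important.head(10)`.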