# Skill Type Flags for a Title
***
We can classify top skills for a role into one of three skill types:
- _Defining Skills_ - The core skills for the role: the skills that make this role what it is (e.g. "Machine Learning" for a Data Scientist role).
- _Distinguishing Skills_ - Skills that are requested less frequently but are still predictive of the occupation. Distinguishing skills tend to be niche skills associated with a specialty area within an occupation (e.g. marketing analytics, required for a specialized subset of Data Scientist jobs).
- _Necessary Skills_ - Skills that are requested frequently within a role but are less specific to any one role. Necessary skills are often needed for job functions shared across several occupations. They are important, but do not define or distinguish a role from others (e.g. Java or SQL for Data Scientist roles).
***
The algorithm used to classify skill type for a role is as follows:
- _Defining Skill_ - Appears in at least 7% of the role's postings and is ranked in the top 75 of predictive skills for that role.
- _Distinguishing Skill_ - Appears in at least 3% but fewer than 7% of the role's postings and is ranked in the top 75 of predictive skills for that role.
- _Necessary Skill_ - Appears in at least 7% of the role's postings and is ranked outside the top 75 of predictive skills for that role.

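Top predictive skills are identified with Elasticsearch's [Significant Terms Aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html), which scores how strongly a skill is associated with the role's postings compared with postings overall.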
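
The three rules can be expressed as one small function. This is a minimal sketch, not the notebook's own code: the name `classify_skill` and its parameters are hypothetical, `rank` is assumed to be the zero-based position in the significance ranking, and the 7% boundary is assumed to belong to the defining and necessary categories so that no skill lands in two categories at once.

```python
def classify_skill(percent, rank, top_n=75, high=0.07, low=0.03):
    """Return the skill type for one skill, or None if it fits no category.

    percent: share of the role's postings requesting the skill (0.0 to 1.0).
    rank: zero-based position in the significance ranking (0 = most predictive).
    All names and defaults here are illustrative assumptions.
    """
    in_top = rank < top_n
    if percent >= high:
        # Frequent skills split on whether they are also highly predictive.
        return "defining" if in_top else "necessary"
    if in_top and low <= percent < high:
        # Less frequent, but still predictive of the role.
        return "distinguishing"
    return None


# Illustrative inputs only, not taken from any API response.
for percent, rank in [(0.50, 3), (0.05, 20), (0.12, 84), (0.01, 10)]:
    print(percent, rank, classify_skill(percent, rank))
```

A skill that is both infrequent and outside the top 75 gets `None`, since the rules above leave it unclassified.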

In [1]:
# import the necessary packages
import pandas as pd
from EmsiApiPy import UnitedStatesPostingsConnection

CONN = UnitedStatesPostingsConnection()

In [2]:
title = "Data Scientists"

payload = {
    "filter": {
        "when": {
            "start": "2021-01",
            "end": "2021-12"
        },
        "title_name": [title]
    },
    "rank": {
        "by": "significance",
        "limit": 100
    }
}

response = CONN.post_rankings(
    facet = "hard_skills_name",
    payload = payload
)

In [3]:
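# one row per skill; rows keep the API's ranking order (here, by significance)
df = pd.DataFrame(response["data"]["ranking"]["buckets"])
# share of the title's postings that request each skill
df["percent"] = df["unique_postings"] / response["data"]["totals"]["unique_postings"]
df.head()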

index,name,significance,unique_postings,percent
0,Data Science,180.332474,30729,0.9954
1,Machine Learning,74.715453,21814,0.706618
2,R (Programming Language),71.154048,17765,0.575459
3,Predictive Modeling,64.048414,8232,0.266658
4,Statistics,62.895397,15325,0.496421


In [4]:
# percent >= 7% and ranked in the top 75 (row index 0-74)
defining_skills_df = df.loc[(df["percent"] >= 0.07) & (df.index < 75)]
defining_skills_df.head()

index,name,significance,unique_postings,percent
0,Data Science,180.332474,30729,0.9954
1,Machine Learning,74.715453,21814,0.706618
2,R (Programming Language),71.154048,17765,0.575459
3,Predictive Modeling,64.048414,8232,0.266658
4,Statistics,62.895397,15325,0.496421


In [5]:
# 3% <= percent < 7% and ranked in the top 75 (row index 0-74)
distinguishing_skills = df.loc[(df["percent"] >= 0.03) & (df["percent"] < 0.07) & (df.index < 75)]
distinguishing_skills.head()

index,name,significance,unique_postings,percent
13,Random Forest Algorithm,30.825958,1924,0.062324
18,Exploratory Data Analysis,20.442253,1717,0.055619
20,NumPy,17.919639,2154,0.069774
23,Keras (Neural Network Library),16.536391,1875,0.060737
25,Support Vector Machine,14.928267,1095,0.03547


In [6]:
# percent >= 7% and ranked outside the top 75 (row index 75 and up)
necessary_skills = df.loc[(df["percent"] >= 0.07) & (df.index >= 75)]
necessary_skills.head()

index,name,significance,unique_postings,percent
84,Physics,3.944666,3770,0.122121
87,Economics,3.733272,5205,0.168605
95,Data Engineering,3.292457,2995,0.097017
96,MATLAB,3.252394,2431,0.078747
