# Skills to Title Mapping
***
This walkthrough demonstrates the relationship between skills and job titles: given a set of skills, you can find the job titles that align most closely with them.

In [1]:
import json
import pandas as pd

# first, connect to the APIs we want to use
from EmsiApiPy import SkillsClassificationConnection, UnitedStatesPostingsConnection
skills_connection = SkillsClassificationConnection()
postings_connection = UnitedStatesPostingsConnection()

def pprint(data):
    print(json.dumps(data, indent = 2))

In [2]:
# here is sample text from one of Emsi's job postings
text = """Full Stack Web Developer

If you’re ready to join a high-functioning team of full stack devs working closely with product managers, data engineers, and designers to create interfaces and visualizations that make nuanced data intelligible, we’d love to hear from you.
Candidates must have…

    Experience with the front-end basics: HTML5, CSS3, and JS
    Experience using a version control system
    Familiarity with MV* frameworks, e.g. React, Ember, Angular, Vue
    Familiarity with server-side languages like PHP, Python, or Node

Great candidates also have…

    Experience with a particular JS MV* framework (we happen to use React)
    Experience working with databases
    Experience with AWS
    Familiarity with microservice architecture
    Familiarity with modern CSS practices, e.g. LESS, SASS, CSS-in-JS

People who succeed in this position are…

    Team oriented and ready to work closely with other developers
    Determined to produce clean, well-tested code
    Comfortable with working in rapid development cycles
    Skilled oral and written communicators
    Enthusiastic for learning and pushing the envelope

Emsi is an equal opportunity employer."""

In [3]:
# extract the skills from the text and print out the names
response = skills_connection.post_extract(text)
skill_names = [record["skill"]["name"] for record in response["data"]]
pprint(skill_names)

[
  "Server-Side",
  "Enthusiasm",
  "React.js",
  "Cascading Style Sheets (CSS)",
  "PHP (Scripting Language)",
  "Python (Programming Language)",
  "Full Stack Software Engineering",
  "Angular (Web Framework)",
  "Version Control",
  "Vue.js",
  "Microservices",
  "Ember.Js",
  "Amazon Web Services",
  "HTML5",
  "Team Oriented",
  "Front End (Software Engineering)",
  "Node.Js",
  "JavaScript (Programming Language)"
]
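
Soft skills such as "Enthusiasm" sit in the extracted list alongside the technical ones. The following is a minimal sketch of trimming the list before querying postings. It assumes each record in the extraction response carries a numeric `confidence` field next to `skill`; that field, the `min_confidence` threshold, and the sample values are illustrative assumptions and do not come from the response shown above.

```python
def confident_skill_names(response, min_confidence=0.8):
    """Return skill names whose (assumed) confidence meets the threshold.

    Records without a "confidence" key are kept, so the function still
    works if the response does not include that field.
    """
    return [
        record["skill"]["name"]
        for record in response["data"]
        if record.get("confidence", 1.0) >= min_confidence
    ]


# Hand-written stand-in for an extraction response; the confidence values
# are made up for illustration only.
sample_response = {
    "data": [
        {"skill": {"name": "React.js"}, "confidence": 1.0},
        {"skill": {"name": "Enthusiasm"}, "confidence": 0.55},
        {"skill": {"name": "HTML5"}, "confidence": 0.95},
    ]
}

names = confident_skill_names(sample_response)
print(names)
```

In the notebook, the same function could be applied to `response` from the cell above, provided the real response includes a comparable field.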


In [4]:
# next, let's query the postings data to see whether any postings contain all of these skills together
payload = {
    "filter": {
        "when": {
            "start": "2019-10",
            "end": "2020-09"  # last 12 months or so, from when this was written
        },
        "skills_name": {
            "include": skill_names,
            "include_op": "and"
        }
    }
}
data = postings_connection.post_totals(payload)
pprint(data)

{
  "unique_postings": 0
}


In [5]:
# a simple way to get more results is to require any of the skills ("or") instead of all of them ("and")
payload = {
    "filter": {
        "when": {
            "start": "2019-10",
            "end": "2020-09"  # last 12 months or so, from when this was written
        },
        "skills_name": {
            "include": skill_names,
            "include_op": "or"  # this is what changed from above
        }
    }
}
data = postings_connection.post_totals(payload)
pprint(data)

{
  "unique_postings": 4017985
}
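
Requiring all skills matches nothing, while requiring any of them matches about four million postings. One possible middle ground, sketched below, is to keep the "and" requirement but shorten the skill list until something matches. This is only an illustration, not part of the original walkthrough: `relax_and_filter` and `fake_count` are hypothetical names, the skills are dropped from the end of the list in an arbitrary order, and the counting function is a stand-in so the example runs without API access.

```python
from typing import Callable, List, Tuple


def relax_and_filter(
    skills: List[str],
    count_postings: Callable[[List[str]], int],
) -> Tuple[List[str], int]:
    """Drop skills from the end of the list until the count is non-zero.

    Returns the skills that were kept and the matching count, or
    ([], 0) if no non-empty prefix of the list matches anything.
    """
    remaining = list(skills)
    while remaining:
        total = count_postings(remaining)
        if total > 0:
            return remaining, total
        remaining.pop()  # arbitrary choice: discard the last skill
    return [], 0


def fake_count(skills: List[str]) -> int:
    """Stand-in counter: pretends that more than three skills never co-occur."""
    return 0 if len(skills) > 3 else 100 * (4 - len(skills))


kept, total = relax_and_filter(
    ["React.js", "HTML5", "Vue.js", "Microservices", "Enthusiasm"],
    fake_count,
)
print(kept, total)
```

With the clients from this notebook, `count_postings` could wrap a call to `postings_connection.post_totals` with an "and" payload and return its `unique_postings` value. Each attempt is a separate request, so long skill lists would mean many calls.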


In [6]:
# now that we know the query returns results, let's see what the top titles are
payload = {
    "filter": {
        "when": {
            "start": "2019-10",
            "end": "2020-09"  # last 12 months or so, from when this was written
        },
        "skills_name": {
            "include": skill_names,
            "include_op": "or"
        }
    },
    "rank": {
        "by": "unique_postings",
        "limit": 10
    }
}
querystring = {"title_version": "emsi"}
df = postings_connection.post_rankings_df("title_name", payload = payload, querystring = querystring)
df.head(10)

| | title_name | unique_postings |
|---|---|---|
| 0 | Software Engineers | 145043 |
| 1 | Unclassified | 117081 |
| 2 | Java Developers | 46141 |
| 3 | Software Developers | 42488 |
| 4 | DevOps Engineers | 36388 |
| 5 | Sales Associates | 32681 |
| 6 | Full Stack Developers | 30462 |
| 7 | Delivery Drivers | 28054 |
| 8 | Data Scientists | 27369 |
| 9 | .NET Developers | 26527 |
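
The payloads in this walkthrough differ only in the `include_op` value and the optional `rank` section, so they could be produced by one small helper. The sketch below assumes nothing about the API beyond the payload shapes already shown; `build_payload` is a hypothetical name introduced here for illustration.

```python
import json


def build_payload(skill_names, include_op="or", rank_by=None, limit=10,
                  exclude=None, start="2019-10", end="2020-09"):
    """Build a postings payload shaped like the ones used in this walkthrough."""
    payload = {
        "filter": {
            "when": {"start": start, "end": end},
            "skills_name": {
                "include": list(skill_names),
                "include_op": include_op,
            },
        }
    }
    # Only ranking queries need a "rank" section; totals queries omit it.
    if rank_by is not None:
        rank = {"by": rank_by, "limit": limit}
        if exclude:
            rank["exclude"] = list(exclude)
        payload["rank"] = rank
    return payload


totals_payload = build_payload(["HTML5", "React.js"], include_op="and")
ranked_payload = build_payload(
    ["HTML5", "React.js"], rank_by="significance", exclude=["Unclassified"]
)
print(json.dumps(ranked_payload, indent=2))
```

The resulting dictionaries can be passed to `post_totals` or `post_rankings_df` in the same way as the hand-written payloads.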


In [7]:
# let's exclude the "Unclassified" title from the ranking, since it isn't a particularly helpful result
payload = {
    "filter": {
        "when": {
            "start": "2019-10",
            "end": "2020-09"  # last 12 months or so, from when this was written
        },
        "skills_name": {
            "include": skill_names,
            "include_op": "or"
        }
    },
    "rank": {
        "by": "unique_postings",
        "limit": 10,
        "exclude": ["Unclassified"]
    }
}
querystring = {"title_version": "emsi"}
df = postings_connection.post_rankings_df("title_name", payload = payload, querystring = querystring)
df.head(10)

| | title_name | unique_postings |
|---|---|---|
| 0 | Software Engineers | 145043 |
| 1 | Java Developers | 46141 |
| 2 | Software Developers | 42488 |
| 3 | DevOps Engineers | 36388 |
| 4 | Sales Associates | 32681 |
| 5 | Full Stack Developers | 30462 |
| 6 | Delivery Drivers | 28054 |
| 7 | Data Scientists | 27369 |
| 8 | .NET Developers | 26527 |
| 9 | Data Engineers | 25580 |


In [8]:
# another way to approach this is to rank by a factor called "significance"
# significance measures how unique the set of skills is to the title in question
# this query is more computationally expensive, so don't be surprised if it takes a little longer to run

payload = {
    "filter": {
        "when": {
            "start": "2019-10",
            "end": "2020-09"
        },
        "skills_name": {
            "include": skill_names,
            "include_op": "or"
        }
    },
    "rank": {
        "by": "significance",
        "limit": 10,
        "exclude": ["Unclassified"]
    }
}
querystring = {"title_version": "emsi"}
df = postings_connection.post_rankings_df("title_name", payload = payload, querystring = querystring)
df.head(10)

| | title_name | significance | unique_postings |
|---|---|---|---|
| 0 | Software Engineers | 0.243946 | 145043 |
| 1 | Full Stack Developers | 0.08529 | 30462 |
| 2 | Java Developers | 0.081293 | 46141 |
| 3 | DevOps Engineers | 0.079336 | 36388 |
| 4 | Software Developers | 0.069433 | 42488 |
| 5 | Data Scientists | 0.056893 | 27369 |
| 6 | Data Engineers | 0.051159 | 25580 |
| 7 | .NET Developers | 0.04848 | 26527 |
| 8 | Retail Staff | 0.046454 | 17686 |
| 9 | Front End Engineers | 0.046138 | 17008 |


***
As you can see, `Full Stack Developers`, the title closest to the one in the posting text above, has moved up from sixth place to second. Less relevant titles such as `Sales Associates` and `Delivery Drivers` have dropped out of the top ten. The ever-popular `Software Engineers` title is still at the top, which is not surprising given its number of postings.
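
The change in rank can also be checked programmatically. The sketch below copies the title order from the two ranked outputs above (by `unique_postings` with "Unclassified" excluded, and by `significance`) and reports where each title moved; `positions` and `rank_changes` are hypothetical helper names used only for this illustration.

```python
def positions(titles):
    """Map each title to its 1-based position in a ranked list."""
    return {title: rank for rank, title in enumerate(titles, start=1)}


def rank_changes(before, after):
    """Return {title: (rank_before, rank_after)}; rank_after is None if the title dropped out."""
    after_pos = positions(after)
    return {
        title: (rank, after_pos.get(title))
        for title, rank in positions(before).items()
    }


# Title order copied from the two ranked outputs in this walkthrough.
by_postings = [
    "Software Engineers", "Java Developers", "Software Developers",
    "DevOps Engineers", "Sales Associates", "Full Stack Developers",
    "Delivery Drivers", "Data Scientists", ".NET Developers", "Data Engineers",
]
by_significance = [
    "Software Engineers", "Full Stack Developers", "Java Developers",
    "DevOps Engineers", "Software Developers", "Data Scientists",
    "Data Engineers", ".NET Developers", "Retail Staff", "Front End Engineers",
]

changes = rank_changes(by_postings, by_significance)
for title, (old, new) in changes.items():
    print(f"{title}: {old} -> {new if new is not None else 'not in top 10'}")
```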