Lightcast is a company that provides data on occupations, skills, schools, and more. I created this notebook using my client credentials to access that data. Feel free to use it, or request access for your own use at this link: https://lightcast.io/open-skills/access

I am using the 'Skills' dataset. The other available datasets are listed at this link: https://api.lightcast.io/datasets

## Creating session token

_Note: The session token lasts for only one hour._

For more information, visit: https://api.lightcast.io/apis/skills#overview

In [1]:
import requests
import datetime
from time import time
import pandas as pd

In [2]:
CLIENT_ID = "zr73n04dvvfeugya"
CLIENT_SECRET = "T45GwOsv"
SCOPE = "emsi_open"
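
Hard-coding credentials in a shared notebook makes them easy to leak. As a minimal sketch of one alternative, the credentials could be read from environment variables instead. The variable names `LIGHTCAST_CLIENT_ID` and `LIGHTCAST_CLIENT_SECRET` and the helper `load_credentials` are hypothetical, not something Lightcast prescribes.

```python
import os


def load_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["LIGHTCAST_CLIENT_ID"], env["LIGHTCAST_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage (assumes both variables were exported before starting Jupyter):
# CLIENT_ID, CLIENT_SECRET = load_credentials()
```

Passing a plain dict as `env` makes the helper easy to try out without touching the real environment.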

In [3]:
url = "https://auth.emsicloud.com/connect/token"
# Pass the form fields as a dict so that requests URL-encodes them
payload = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "grant_type": "client_credentials",
    "scope": SCOPE,
}
response = requests.post(url, data=payload)
response.raise_for_status()  # Stop here if authentication failed
SESSION_TOKEN = response.json()['access_token']
SESSION_START_TIME = datetime.datetime.now()
print(f"Session started at {SESSION_START_TIME.strftime('%H:%M:%S')}")
print(f"Session will end at {(SESSION_START_TIME + datetime.timedelta(hours=1)).strftime('%H:%M:%S')}")

Session started at 22:00:59
Session will end at 23:00:59
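
Since the token lasts only an hour, a long notebook session can outlive it. The sketch below shows one way to check whether the token cell needs to be re-run. `token_expired` is a hypothetical helper, the one-hour lifetime is taken from the note above, and the 60-second safety margin is an assumption. If the token response also carries an `expires_in` field, as OAuth 2.0 token responses commonly do, that value could replace the hard-coded hour.

```python
import datetime

TOKEN_LIFETIME = datetime.timedelta(hours=1)    # from the note above
SAFETY_MARGIN = datetime.timedelta(seconds=60)  # assumption: refresh a little early


def token_expired(start_time, now=None):
    """Return True once the token is within SAFETY_MARGIN of its expiry."""
    now = datetime.datetime.now() if now is None else now
    return now >= start_time + TOKEN_LIFETIME - SAFETY_MARGIN


# Usage in the notebook: re-run the token cell when this prints True
# print(token_expired(SESSION_START_TIME))
```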


In [4]:
url = "https://emsiservices.com/skills/status"
headers = {'Authorization': f'Bearer {SESSION_TOKEN}'}
response = requests.request("GET", url, headers=headers)
print(response.json()['data']['message']) # Should print "Service is healthy"

Service is healthy


## Making requests to the Lightcast API

In [5]:
# INPUT YOUR QUERY HERE
queries = ['data', 'analysis', 'machine learning', 'ML', 'statistic']

In [6]:
skills = {}
# The endpoint and headers are the same for every query, so build them once
url = "https://emsiservices.com/skills/versions/latest/skills"
headers = {'Authorization': f'Bearer {SESSION_TOKEN}'}
start_time = time()
for query in queries:
    # typeIds filters by skill type; fields selects the returned attributes
    querystring = {"q": query, "typeIds": "ST1,ST2", "fields": "id,name,type,infoUrl"}  # optionally add "limit": "100"
    response = requests.get(url, headers=headers, params=querystring)
    response.raise_for_status()  # Fail loudly on an expired token or a bad request
    data = response.json()
    skills[query] = sorted(meta_data['name'] for meta_data in data['data'])
print(f"Time Elapsed: {time() - start_time} seconds")

Time Elapsed: 6.996306896209717 seconds


In [7]:
# Reformatting result into pandas dataframe
rows = [(query, skill) for query, skill_list in skills.items() for skill in skill_list]
skill_df = pd.DataFrame(rows, columns=['query', 'skill'])
skill_df

,query,skill
0,data,ADOdb Database Abstraction Library For PHP
1,data,ATLAS.ti (Qualitative Data Analysis Software)
2,data,Abstract Data Types
3,data,ActiveX Data Objects
4,data,Adobe LiveCycle Data Services (Software)
...,...,...
1085,statistic,Statistical Time Division Multiplexing
1086,statistic,Statistics
1087,statistic,Tax Statistics
1088,statistic,Test Statistics
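
The table holds one row per (query, skill) pair, so the same skill may appear under more than one query. A minimal sketch for spotting such overlaps inverts the `skills` dictionary into a skill-to-queries mapping. `queries_per_skill` is a hypothetical helper, and the input below is a toy example, not actual API output.

```python
from collections import defaultdict


def queries_per_skill(skills):
    """Invert {query: [skill, ...]} into {skill: [query, ...]}."""
    inverted = defaultdict(list)
    for query, skill_list in skills.items():
        for skill in skill_list:
            inverted[skill].append(query)
    return dict(inverted)


# Toy input for illustration only; not actual API output
example = {
    "data": ["Abstract Data Types", "Statistical Data Analysis"],
    "statistic": ["Statistical Data Analysis", "Statistics"],
}
# Keep only the skills that were returned for more than one query
shared = {s: q for s, q in queries_per_skill(example).items() if len(q) > 1}
print(shared)
```

In the notebook, calling `queries_per_skill(skills)` on the real results would show which skills could be de-duplicated before saving.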


In [8]:
# Save dataframe to project repo
repo_path = "C:/Users/ernes/Git/dsa3101-2220-12-ds/"
repo_data_path = repo_path + "Data/skills/"
file_name = "lightcast_skills_queries-" + "_".join(queries) + ".csv"
print(f"Saving to: {repo_data_path}\nFile: {file_name}\n\nRun next cell to save")

Saving to: C:/Users/ernes/Git/dsa3101-2220-12-ds/Data/skills/
File: lightcast_skills_queries-data_analysis_machine learning_ML_statistic.csv

Run next cell to save
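
The file name printed above contains a space (from the "machine learning" query), and other queries could contain characters that are awkward or invalid in file names. The sketch below shows one way to normalise the queries first. `slugify` and `build_file_name` are hypothetical helpers, and lower-casing and replacing each run of non-alphanumeric characters with a hyphen is an arbitrary choice.

```python
import re


def slugify(text):
    """Lower-case text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(queries, prefix="lightcast_skills_queries-"):
    """Join slugified queries with '_' and add the .csv extension."""
    return prefix + "_".join(slugify(q) for q in queries) + ".csv"


print(build_file_name(['data', 'analysis', 'machine learning', 'ML', 'statistic']))
# prints: lightcast_skills_queries-data_analysis_machine-learning_ml_statistic.csv
```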


In [9]:
skill_df.to_csv(repo_data_path + file_name, index = False)