Lightcast is a company that provides data on occupations, skills, schools, and more. I built this notebook with my own client credentials to access that data. Feel free to use it, or request your own access at this link: https://lightcast.io/open-skills/access

I am using the 'Skills' dataset. Other available datasets are listed here: https://api.lightcast.io/datasets

# How to use:
Enter a few search strings in the `queries` list below, then run all cells.

This notebook queries the Lightcast API for skills related to each of those strings. Every returned skill comes with a description. The resulting table is saved to `repo/Backend/Data/skills/`.

In [1]:
# INPUT YOUR QUERY HERE
queries = ['data', 'analysis', 'machine learning', 'ML', 'statistic']

## Creating session token

_Note: The session token only lasts for an hour_

For more information, visit: https://docs.lightcast.dev/apis/skills
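
Because the token expires after an hour, a long session can start failing partway through. Below is a minimal sketch of an expiry check, assuming the fixed one-hour lifetime from the note above. `token_expired` and its `margin_seconds` safety buffer are hypothetical names, not part of the Lightcast API, and the start time in the demo is arbitrary.

```python
import datetime

def token_expired(start_time, now=None, lifetime_seconds=3600, margin_seconds=60):
    """Return True if a token issued at `start_time` should be treated as expired.

    Assumes a fixed lifetime (one hour by default) and reports expiry a little
    early (`margin_seconds`) so a request does not race the real deadline.
    """
    if now is None:
        now = datetime.datetime.now()
    deadline = start_time + datetime.timedelta(seconds=lifetime_seconds - margin_seconds)
    return now >= deadline

# Arbitrary example start time
start = datetime.datetime(2024, 1, 1, 3, 43, 46)
print(token_expired(start, now=start + datetime.timedelta(minutes=30)))              # False
print(token_expired(start, now=start + datetime.timedelta(minutes=59, seconds=30)))  # True
```

In the notebook, calling `token_expired(SESSION_START_TIME)` before the query loop and rerunning the token cell when it returns `True` would avoid requests failing with an expired token.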

In [2]:
import requests
import datetime
from time import time
import pandas as pd
import numpy as np

In [3]:
CLIENT_ID = "zr73n04dvvfeugya"
CLIENT_SECRET = "T45GwOsv"
SCOPE = "emsi_open"

In [4]:
url = "https://auth.emsicloud.com/connect/token"
# Passing a dict lets requests form-encode (and escape) the fields
payload = {"client_id": CLIENT_ID, "client_secret": CLIENT_SECRET,
           "grant_type": "client_credentials", "scope": SCOPE}
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
response = requests.post(url, data=payload, headers=headers)
response.raise_for_status()  # stop here if the credentials are rejected
SESSION_TOKEN = response.json()['access_token']
SESSION_START_TIME = datetime.datetime.now()
print(f"Session started at {SESSION_START_TIME.strftime('%H:%M:%S')}")
print(f"Session will end at {(SESSION_START_TIME + datetime.timedelta(hours=1)).strftime('%H:%M:%S')}")

Session started at 03:43:46
Session will end at 04:43:46
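
A hand-built `key=value&...` string does not percent-encode characters such as `&`, `=` or `+`, so a secret containing one of them would corrupt the request body. Python's standard `urllib.parse.urlencode` builds the same `application/x-www-form-urlencoded` body with escaping, and `requests` applies equivalent encoding when `data=` is a dict. The sketch below uses placeholder credentials, and `token_request_body` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def token_request_body(client_id, client_secret, scope):
    """Build the form-encoded body for an OAuth2 client-credentials token request."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": scope,
    })

# Placeholder values; the '&' and '=' in the fake secret get escaped
print(token_request_body("my-id", "a&b=c", "emsi_open"))
# client_id=my-id&client_secret=a%26b%3Dc&grant_type=client_credentials&scope=emsi_open
```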


In [5]:
print(SESSION_TOKEN)

eyJhbGciOiJSUzI1NiIsImtpZCI6IjNDNjZCRjIzMjBGNkY4RDQ2QzJERDhCMjI0MEVGMTFENTZEQkY3MUYiLCJ0eXAiOiJKV1QiLCJ4NXQiOiJQR2FfSXlEMi1OUnNMZGl5SkE3eEhWYmI5eDgifQ.eyJuYmYiOjE2ODE0MTUwMjYsImV4cCI6MTY4MTQxODYyNiwiaXNzIjoiaHR0cHM6Ly9hdXRoLmVtc2ljbG91ZC5jb20iLCJhdWQiOlsiZW1zaV9vcGVuIiwiaHR0cHM6Ly9hdXRoLmVtc2ljbG91ZC5jb20vcmVzb3VyY2VzIl0sImNsaWVudF9pZCI6InpyNzNuMDRkdnZmZXVneWEiLCJlbWFpbCI6ImVybmVzdGxpdTY0QGdtYWlsLmNvbSIsImNvbXBhbnkiOiJlIiwibmFtZSI6ImUiLCJpYXQiOjE2ODE0MTUwMjYsInNjb3BlIjpbImVtc2lfb3BlbiJdfQ.CERvUgjDweRG5fBtfXbRCGPy0AWRNKNVG-VfuqEHDJO8nFq6pFkg7iIzMBd7j5-y3RHWOpSAAXpqTdkHbXOsK6Rg0ly2d8IecvBO8EmNxsI56wgLWR5q2QczF-RMUrwzOKDSwM3MO7ElFux5ijl3T59zTZa05ll7bPb5qgK8AkMfHnlyg3jsDoVngcys8aYE70n_fE6AHERVO1hJ6AvkxNZCTKxHl2qYpUo_ZA_w5IyOWm5nHzWWn5y5qV4mY0Mxf7Zaa7fuffrkK5DWzj79mlmwWb1B0lsN2wbVihKONFcn7hmP986Ea-Pa-cY3IUkHcD2umxsIewFPfYP6sxIJgg
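
The session token is a JSON Web Token (JWT): three base64url segments separated by dots, the middle one being a JSON payload. Its `exp` claim records the expiry as a Unix timestamp, so the lifetime can be read from the token instead of assumed (decoding the token printed above this way gives an `exp` exactly 3600 seconds after `nbf`, matching the one-hour note). Below is a minimal standard-library sketch that decodes the payload **without verifying the signature**, which is fine for inspection but not for trusting a token. `jwt_claims` is a hypothetical helper, and the demo token is hand-made.

```python
import base64
import json

def jwt_claims(token):
    """Decode the (unverified) payload segment of a JWT into a dict."""
    payload_b64 = token.split(".")[1]
    # base64url segments drop their '=' padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# Demo with a hand-made token (header and signature segments are dummies)
demo_payload = base64.urlsafe_b64encode(
    json.dumps({"nbf": 1000, "exp": 4600}).encode()
).decode().rstrip("=")
claims = jwt_claims(f"header.{demo_payload}.signature")
print(claims["exp"] - claims["nbf"])  # 3600
```

In the notebook, `datetime.datetime.fromtimestamp(jwt_claims(SESSION_TOKEN)["exp"])` would give the local expiry time.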


In [6]:
url = "https://emsiservices.com/skills/status"
headers = {'Authorization': f'Bearer {SESSION_TOKEN}'}
response = requests.request("GET", url, headers=headers)
print(response.json()['data']['message']) # Should print "Service is healthy"

Service is healthy


## Make Requests to the Lightcast API

In [7]:
skills = []
start_time = time()
url = "https://emsiservices.com/skills/versions/latest/skills"
headers = {'Authorization': f'Bearer {SESSION_TOKEN}'}
for query in queries:
    # Search skill types ST1 and ST2 and return only the listed fields.
    # Optional: add "limit": "100" to this dict (not used in this run).
    querystring = {"q": query, "typeIds": "ST1,ST2", "fields": "id,name,type,infoUrl,description"}
    response = requests.get(url, headers=headers, params=querystring)
    response.raise_for_status()  # raise on HTTP errors, e.g. after the token expires
    data = response.json()
    # Keep the query alongside each matched skill's name and description
    skills.extend([[query, meta_data['name'], meta_data['description']] for meta_data in data['data']])
skills = pd.DataFrame(skills, columns=['Query', 'Skill', 'Skill_Description'])
print(f"Time Elapsed: {time() - start_time} seconds")

Time Elapsed: 5.772021532058716 seconds
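
Each request returns JSON with the matching skills under a top-level `data` list, and the loop keeps only `name` and `description` from each entry. Moving that step into a pure function makes it testable without network access. The sketch below assumes the response shape used in the cell above. `skills_to_rows` is a hypothetical helper, and the sample payload is made up, not real API output.

```python
def skills_to_rows(query, payload):
    """Turn one skills-search response body into [query, name, description] rows.

    Uses .get() so an entry without a description yields None, which the
    cleaning step's dropna() would later remove.
    """
    return [[query, item["name"], item.get("description")] for item in payload.get("data", [])]

# Made-up response body for illustration only
sample = {"data": [
    {"id": "S1", "name": "Data Analysis", "description": "Placeholder description of data analysis."},
    {"id": "S2", "name": "Data Entry"},
]}
for row in skills_to_rows("data", sample):
    print(row)
```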


In [8]:
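# Cleaning dataset: drop rows with missing values, then keep only skills
# whose name appears (case-insensitively) in their own description
skills = skills.dropna()
skills = skills.loc[skills.apply(lambda x: x.Skill.lower() in x.Skill_Description.lower(), axis=1)].reset_index(drop=True)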
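
The cleaning step acts as a rough relevance filter: a skill survives only if its name appears inside its own description, ignoring case. The same row test in plain Python, run on made-up rows, shows what gets kept. `keep_row` is a hypothetical name.

```python
def keep_row(skill, description):
    """Mirror of the notebook filter: the skill name must occur in its description."""
    if skill is None or description is None:  # dropna() removes these rows first
        return False
    return skill.lower() in description.lower()

# Made-up rows for illustration
rows = [
    ("Statistics", "Statistics is the study of data."),
    ("ML Ops", "Practices for deploying machine learning models."),
    ("SQL", None),
]
kept = [r for r in rows if keep_row(*r)]
print([skill for skill, _ in kept])  # ['Statistics']
```

An abbreviated name such as the made-up "ML Ops" is dropped even though its description is relevant, so the filter trades some recall for precision.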

In [9]:
# Save dataframe to project repo
repo_data_path = "../../Data/skills/"
file_name = "lightcast_skills_queries-" + "_".join(queries) + ".csv"
print(f"Saving to: {repo_data_path}\nFile: {file_name}\n\nRun next cell to save")

Saving to: ../../Data/skills/
File: lightcast_skills_queries-data_analysis_machine learning_ML_statistic.csv

Run next cell to save
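
The file name is the prefix followed by the queries joined with underscores. The output above also shows that the space in "machine learning" is carried into the file name. The sketch below wraps the join in a hypothetical `build_file_name` helper, with an optional `replace_spaces` flag that swaps spaces for hyphens. That substitution is a suggestion; the notebook itself keeps the space.

```python
import os

def build_file_name(queries, prefix="lightcast_skills_queries-", replace_spaces=False):
    """Join the queries with '_' after `prefix` and add a .csv extension."""
    parts = [q.replace(" ", "-") if replace_spaces else q for q in queries]
    return prefix + "_".join(parts) + ".csv"

example_queries = ['data', 'analysis', 'machine learning', 'ML', 'statistic']
print(build_file_name(example_queries))
# lightcast_skills_queries-data_analysis_machine learning_ML_statistic.csv
print(os.path.join("../../Data/skills/", build_file_name(example_queries, replace_spaces=True)))
# ../../Data/skills/lightcast_skills_queries-data_analysis_machine-learning_ML_statistic.csv
```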


In [10]:
skills.to_csv(repo_data_path + file_name, index = False)
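
`to_csv` fails if `repo_data_path` does not exist, which can happen when the notebook is run from a different working directory. One guard is to create the folder first with `os.makedirs(..., exist_ok=True)`. The sketch below writes rows with the standard `csv` module so it runs without pandas. `save_rows` is a hypothetical helper, and the demo writes to a throwaway temporary directory. In the notebook, calling `os.makedirs(repo_data_path, exist_ok=True)` before `skills.to_csv(...)` would be enough.

```python
import csv
import os
import tempfile

def save_rows(rows, header, directory, file_name):
    """Create `directory` if needed, then write `header` and `rows` as CSV."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, file_name)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path

# Demo: the nested folder does not exist yet inside the temp directory
with tempfile.TemporaryDirectory() as tmp:
    out = save_rows([["data", "Statistics", "Made-up description."]],
                    ["Query", "Skill", "Skill_Description"],
                    os.path.join(tmp, "Data", "skills"), "demo.csv")
    with open(out, newline="", encoding="utf-8") as f:
        print(list(csv.reader(f)))
```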