# Lightcast Global Smart Dataset - Client

This notebook shows how to use the Lightcast Global Smart Dataset Python client.

The API allows you to quickly access data from Lightcast's database to obtain content, trends and projections regarding the labour market.

The Global Smart Dataset is available through:
- RESTful API at https://solutions-api.lightcast.io, for software developers; the data are real-time
- Snowflake, for data analysts and data scientists who build BI dashboards or work with STATA, R, SAS (and, of course, Python); the data are updated monthly
- Python client, for integrating Lightcast data into your data science code; the data are real-time

The documentation is available at:
https://solutions-api.lightcast.io/docs


<a href="https://githubtocolab.com/Lightcast-Global-Innovation/global-smart-dataset/blob/main/notebooks/Lightcast_Global_Smart_Dataset_Client.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/></a>



# Setup

To use the Smart Dataset API and the lightcast-smart-dataset client you need a username and a password. To obtain them, please contact our sales team at sales-europe@lightcast.io.



In [8]:
USERNAME = "****"
PSWD = "****"
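
Hard-coding credentials in a shared notebook is easy to leak. As a minimal sketch, the two values could instead be read from environment variables. The variable names `LIGHTCAST_USERNAME` and `LIGHTCAST_PASSWORD` and the helper `load_credentials` are hypothetical and are not defined by the client.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from a mapping such as os.environ.

    The key names below are illustrative assumptions, not part of the client.
    """
    username = env.get("LIGHTCAST_USERNAME")
    password = env.get("LIGHTCAST_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set LIGHTCAST_USERNAME and LIGHTCAST_PASSWORD first")
    return username, password


# Demonstration with an explicit mapping instead of the real environment.
USERNAME, PSWD = load_credentials(
    {"LIGHTCAST_USERNAME": "demo-user", "LIGHTCAST_PASSWORD": "demo-pass"}
)
```

In a real session you would call `load_credentials()` with no argument after exporting the two variables.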

To install the client, use pip:

In [1]:
!pip install lightcast-smart-dataset  

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting lightcast-smart-dataset
  Downloading lightcast_smart_dataset-0.1.31-py3-none-any.whl (10 kB)
Collecting xlsxwriter
  Downloading XlsxWriter-3.0.3-py3-none-any.whl (149 kB)
Installing collected packages: xlsxwriter, lightcast-smart-dataset
Successfully installed lightcast-smart-dataset-0.1.31 xlsxwriter-3.0.3


In [6]:
from lightcast_client.client import LightcastSmartDataset

In [9]:
client = LightcastSmartDataset(USERNAME, PSWD)

# Occupation Insight API - UK


To obtain the list of available occupations, use the taxonomy client:

In [11]:
client.taxonomy().getSocLevel4()[0]

b'{"data":[{"id":"1115","name":"Chief executives and senior officials","description":"This unit group includes those who head large enterprises and organisations. They plan, direct and co-ordinate, with directors and managers, the resources necessary for the various functions and specialist activities of these enterprises and organisations. The chief executives of hospitals will be classified in this unit group. Senior officials in national government direct the operations of government departments. Senior officials in local government participate in the implementation of local government policies and ensure that legal, statutory and other provisions concerning the running of a local authority are observed. Senior officials of special interest organisations ensure that legal, statutory and other regulations concerning the running of trade associations, employers\xe2\x80\x99 associations, learned societies, trades unions, charitable organisations and similar bodies are observed. Chief e

{'description': 'This unit group includes those who head large enterprises and organisations. They plan, direct and co-ordinate, with directors and managers, the resources necessary for the various functions and specialist activities of these enterprises and organisations. The chief executives of hospitals will be classified in this unit group. Senior officials in national government direct the operations of government departments. Senior officials in local government participate in the implementation of local government policies and ensure that legal, statutory and other provisions concerning the running of a local authority are observed. Senior officials of special interest organisations ensure that legal, statutory and other regulations concerning the running of trade associations, employers’ associations, learned societies, trades unions, charitable organisations and similar bodies are observed. Chief executives and senior officials also act as representatives of the organisations 

Let's convert the results into a pandas DataFrame:

In [14]:
import pandas as pd

soc4_occupation = pd.DataFrame.from_dict(client.taxonomy().getSocLevel4())

In [15]:
soc4_occupation

Unnamed: 0,id,name,description
0,1115,Chief executives and senior officials,This unit group includes those who head large ...
1,1116,Elected officers and representatives,Elected representatives in national government...
2,1121,Production managers and directors in manufactu...,Production managers and directors in manufactu...
3,1122,Production managers and directors in construction,Production managers and directors in construct...
4,1123,Production managers and directors in mining an...,"Production managers and directors in mining, e..."
...,...,...,...
364,9272,Kitchen and catering assistants,Workers in this unit group assist in the prepa...
365,9273,Waiters and waitresses,Waiters and waitresses serve food and beverage...
366,9274,Bar staff,"Bar staff prepare, mix and serve alcoholic and..."
367,9275,Leisure and theme park attendants,Leisure and theme park attendants monitor the ...


In [16]:
soc4_occupation[soc4_occupation["name"].str.contains("software")]

Unnamed: 0,id,name,description
51,2136,Programmers and software development professio...,Programmers and software development professio...
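
The filter above is case-sensitive, so a search term with different capitalisation would return nothing. A small sketch of a case-insensitive literal search follows. The three sample rows are copied from the output above (with the full name of SOC 2136 taken from later in this notebook), and `soc4_sample` is an illustrative stand-in for `soc4_occupation`.

```python
import pandas as pd

# Stand-in for the taxonomy DataFrame, built from rows shown above.
soc4_sample = pd.DataFrame(
    {
        "id": ["1115", "2136", "9274"],
        "name": [
            "Chief executives and senior officials",
            "Programmers and software development professionals",
            "Bar staff",
        ],
    }
)

# case=False ignores capitalisation; regex=False treats the term literally;
# na=False keeps rows with missing names out of the result.
mask = soc4_sample["name"].str.contains("SOFTWARE", case=False, regex=False, na=False)
matches = soc4_sample[mask]
print(matches)
```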


Let's check the areas available in the UK:

In [20]:
client.taxonomy().getUkNuts3()[0]

b'{"data":[{"id":"UKC11","name":"Hartlepool and Stockton-on-Tees"},{"id":"UKC12","name":"South Teesside"},{"id":"UKC13","name":"Darlington"},{"id":"UKC14","name":"Durham CC"},{"id":"UKC21","name":"Northumberland"},{"id":"UKC22","name":"Tyneside"},{"id":"UKC23","name":"Sunderland"},{"id":"UKD11","name":"West Cumbria"},{"id":"UKD12","name":"East Cumbria"},{"id":"UKD33","name":"Manchester"},{"id":"UKD34","name":"Greater Manchester South West"},{"id":"UKD35","name":"Greater Manchester South East"},{"id":"UKD36","name":"Greater Manchester North West"},{"id":"UKD37","name":"Greater Manchester North East"},{"id":"UKD41","name":"Blackburn with Darwen"},{"id":"UKD42","name":"Blackpool"},{"id":"UKD44","name":"Lancaster and Wyre"},{"id":"UKD45","name":"Mid Lancashire"},{"id":"UKD46","name":"East Lancashire"},{"id":"UKD47","name":"Chorley and West Lancashire"},{"id":"UKD61","name":"Warrington"},{"id":"UKD62","name":"Cheshire East"},{"id":"UKD63","name":"Cheshire West and Chester"},{"id":"UKD71","n

{'id': 'UKC11', 'name': 'Hartlepool and Stockton-on-Tees'}
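
The raw line printed above looks like UTF-8 encoded JSON with a top-level `data` list. The client already returns parsed records, so the following is only a sketch of what that decoding step amounts to, using two records copied from the output above.

```python
import json

# Two records copied from the raw payload printed by the client.
raw = (
    b'{"data":[{"id":"UKC11","name":"Hartlepool and Stockton-on-Tees"},'
    b'{"id":"UKC12","name":"South Teesside"}]}'
)

payload = json.loads(raw.decode("utf-8"))  # bytes -> str -> dict
records = payload["data"]                  # list of {"id", "name"} dicts
first = records[0]
print(first)
```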

In [21]:
nuts3_uk = pd.DataFrame.from_dict(client.taxonomy().getUkNuts3())

In [22]:
nuts3_uk

Unnamed: 0,id,name
0,UKC11,Hartlepool and Stockton-on-Tees
1,UKC12,South Teesside
2,UKC13,Darlington
3,UKC14,Durham CC
4,UKC21,Northumberland
...,...,...
176,UKN12,Causeway Coast and Glens
177,UKN13,Antrim and Newtownabbey
178,UKN14,Lisburn and Castlereagh
179,UKN15,Mid and East Antrim


In [23]:
nuts3_uk[nuts3_uk["name"].str.contains("London")]

Unnamed: 0,id,name
79,UKI31,Camden and City of London
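
Since the insight call below takes the area by name, it can help to pull the name out of the filtered frame rather than retyping it. The sketch below uses a hypothetical helper, `pick_single_name`, that returns the name only when the search term matches exactly one row. The sample rows are copied from the taxonomy output above.

```python
import pandas as pd

# Stand-in for nuts3_uk, built from rows shown above.
nuts3_sample = pd.DataFrame(
    {
        "id": ["UKC11", "UKD33", "UKI31"],
        "name": [
            "Hartlepool and Stockton-on-Tees",
            "Manchester",
            "Camden and City of London",
        ],
    }
)


def pick_single_name(frame, term):
    """Return the only name containing `term`; raise if the match is not unique."""
    matches = frame[frame["name"].str.contains(term, case=False, regex=False)]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one match for {term!r}, found {len(matches)}")
    return matches.iloc[0]["name"]


area = pick_single_name(nuts3_sample, "London")
print(area)
```

Raising on zero or multiple matches avoids silently querying the wrong area.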


Now it is time to query the Global Smart Dataset API for an Occupation Insight:

In [24]:
occupation = "Programmers and software development professionals"
area = "Camden and City of London"

In [27]:
r = client.ukDataset().getSocOccupationInsight(
    occupation=occupation,
    area=area
)

b'{"area":"Camden and City of London","occupation":"Programmers and software development professionals","date":"2022-07-30T17:33:48.406","area_classification":"nuts3_name","occupation_classification":"soc4_name","salary":{"min":20777,"max":247000,"median":80256,"unique_postings":5315},"current_year_active_postings":{"results":[{"month":"2021-06","unique_postings":616},{"month":"2021-07","unique_postings":630},{"month":"2021-08","unique_postings":604},{"month":"2021-09","unique_postings":584},{"month":"2021-10","unique_postings":519},{"month":"2021-11","unique_postings":563},{"month":"2021-12","unique_postings":578},{"month":"2022-01","unique_postings":821},{"month":"2022-02","unique_postings":1067},{"month":"2022-03","unique_postings":1389},{"month":"2022-04","unique_postings":1535},{"month":"2022-05","unique_postings":1835},{"month":"2022-06","unique_postings":1698}],"total_unique_postings":5315},"previous_year_active_postings":{"results":[{"month":"2020-06","unique_postings":751},{"m

Now we can use the returned information as a dict or as a pandas DataFrame.

In [29]:
refresh_date = r.refresh_date
print(f"Data refreshed at {refresh_date}")

Data refreshed at 2022-07-30T17:33:48.406
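
The refresh date is printed in ISO 8601 form. Assuming `r.refresh_date` is a plain string like the one shown above, it can be turned into a `datetime` to check how old the data are. The reference date in this sketch is an arbitrary, hypothetical value.

```python
from datetime import datetime

# Value copied from the output above; assumed to be a plain string.
refresh_date = "2022-07-30T17:33:48.406"

refreshed_at = datetime.fromisoformat(refresh_date)

# Hypothetical reference date used only for the illustration.
reference = datetime(2022, 8, 15)
age_days = (reference - refreshed_at).days
print(f"Data are {age_days} days old")
```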


Job postings over the last 12 months versus the previous 12 months:

In [31]:
current_year_active_postings = r.current_year_active_postings
previous_year_active_postings = r.previous_year_active_postings

ds_current_year_active_postings = pd.DataFrame.from_dict(current_year_active_postings)
ds_previous_year_active_postings = pd.DataFrame.from_dict(previous_year_active_postings)

In [32]:
ds_time_series = pd.concat([ds_current_year_active_postings, ds_previous_year_active_postings], axis=1)
ds_time_series

Unnamed: 0,month,unique_postings,month.1,unique_postings.1
0,2021-06,616,2020-06,751
1,2021-07,630,2020-07,767
2,2021-08,604,2020-08,757
3,2021-09,584,2020-09,773
4,2021-10,519,2020-10,927
5,2021-11,563,2020-11,805
6,2021-12,578,2020-12,891
7,2022-01,821,2021-01,815
8,2022-02,1067,2021-02,769
9,2022-03,1389,2021-03,763
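
The concatenated frame above has two columns named `month` and two named `unique_postings`, which makes later selection ambiguous. One way to handle this, sketched here with the first three rows of the table above, is to suffix the columns before concatenating and then derive a year-over-year change. The column name `yoy_change_pct` is an illustrative choice.

```python
import pandas as pd

# First three rows of each series, copied from the table above.
current = pd.DataFrame(
    {"month": ["2021-06", "2021-07", "2021-08"], "unique_postings": [616, 630, 604]}
)
previous = pd.DataFrame(
    {"month": ["2020-06", "2020-07", "2020-08"], "unique_postings": [751, 767, 757]}
)

# Suffix the columns so that every label in the combined frame is unique.
comparison = pd.concat(
    [current.add_suffix("_current"), previous.add_suffix("_previous")],
    axis=1,
)

# Percentage change of each month against the same month one year earlier.
comparison["yoy_change_pct"] = (
    (comparison["unique_postings_current"] - comparison["unique_postings_previous"])
    / comparison["unique_postings_previous"]
    * 100
).round(1)
print(comparison)
```

This relies on both series being aligned row by row, as they are in the output above.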


Now we can access the salary distribution:

In [33]:
salary_max = r.salary_max
salary_min = r.salary_min
salary_median = r.salary_median

print(f"Salary max {salary_max:,.2f}")
print(f"Salary min {salary_min:,.2f}")
print(f"Salary median {salary_median:,.2f}")

Salary max 247,000.00
Salary min 20,777.00
Salary median 80,256.00


The skills data are split into common skills and specialized skills:

In [34]:
top_10_common_skills = r.top_10_common_skills
top_10_specialized_skills = r.top_10_specialized_skills


ds_top_10_common_skills = pd.DataFrame.from_dict(top_10_common_skills)
ds_top_10_specialized_skills = pd.DataFrame.from_dict(top_10_specialized_skills)

In [35]:
ds_top_10_common_skills

Unnamed: 0,name,unique_postings
0,Communications,788
1,Management,432
2,Problem Solving,350
3,Innovation,244
4,Leadership,241
5,Mentorship,232
6,Planning,180
7,Operations,169
8,Research,168
9,Troubleshooting (Problem Solving),167


In [36]:
ds_top_10_specialized_skills

Unnamed: 0,name,unique_postings
0,Communications,788
1,Management,432
2,Problem Solving,350
3,Innovation,244
4,Leadership,241
5,Mentorship,232
6,Planning,180
7,Operations,169
8,Research,168
9,Troubleshooting (Problem Solving),167


Finally, we can extract the most common job titles (based on the last year of online job postings):

In [37]:
top_10_job_titles = r.top_10_job_titles

ds_top_10_job_titles = pd.DataFrame.from_dict(top_10_job_titles)
ds_top_10_job_titles

Unnamed: 0,name,unique_postings
0,Java Developers,403
1,DevOps Engineers,258
2,Software Engineers,197
3,C# .NET Developers,146
4,Full Stack Developers,137
5,Software Developers,119
6,.NET Developers,99
7,Python Developers,99
8,Java Engineers,86
9,Lead Python Developers,74


...and the most frequently mentioned employers (last year of data):

In [38]:
top_10_employers = r.top_10_employers

ds_top_10_employers = pd.DataFrame.from_dict(top_10_employers)
ds_top_10_employers

Unnamed: 0,name,unique_postings
0,Metro Bank,49
1,Digitech Resourcing,40
2,Uk Spring Cleaners Limited,38
3,Thomson Keene Associates,36
4,Tiro Partners,35
5,Noir,30
6,Intelligent Resource,29
7,Deerfoot,26
8,King's College London,26
9,83Zero Limited,22


# Occupation Insight API - Global


To obtain the list of available occupations, use the (global) taxonomy client:

In [40]:
client.taxonomy().getOccupation()[0]

b'{"data":[{"id":"1","name":"na"},{"id":"10","name":"Business Development / Sales Manager"},{"id":"100","name":"Financial Quantitative Analyst"},{"id":"101","name":"Risk Manager / Analyst"},{"id":"102","name":"Investment Underwriter"},{"id":"103","name":"Fraud Examiner / Analyst"},{"id":"104","name":"Computer Scientist"},{"id":"105","name":"Data Scientist"},{"id":"106","name":"Systems Analyst"},{"id":"107","name":"Cyber / Information Security Engineer / Analyst"},{"id":"108","name":"Software Developer / Engineer"},{"id":"109","name":"Computer Programmer"},{"id":"11","name":"Communications / Public Relations Manager"},{"id":"110","name":"Mobile Applications Developer"},{"id":"111","name":"Computer Systems Engineer / Architect"},{"id":"112","name":"Web Designer"},{"id":"113","name":"Web Developer"},{"id":"114","name":"UI / UX Designer / Developer"},{"id":"115","name":"Database Administrator"},{"id":"116","name":"Network / Systems Administrator"},{"id":"117","name":"Telecommunications Eng

{'id': '1', 'name': 'na'}

In [42]:
occupations = pd.DataFrame.from_dict(client.taxonomy().getOccupation())

In [43]:
occupations.head()

Unnamed: 0,id,name
0,1,na
1,10,Business Development / Sales Manager
2,100,Financial Quantitative Analyst
3,101,Risk Manager / Analyst
4,102,Investment Underwriter


In [45]:
occupations[occupations["name"].str.contains("Data")]

Unnamed: 0,id,name
7,105,Data Scientist
18,115,Database Administrator
29,125,Database Architect
30,126,Data Warehousing Specialist
36,131,Data / Data Mining Analyst
46,140,Clinical Data Systems Specialist / Manager
457,511,Data Entry Clerk
619,659,Data Engineer
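
A plain substring search for "Data" also returns "Database" occupations, as the output above shows. If only the standalone word is wanted, a regular expression with word boundaries is one option. The sketch uses a handful of names copied from the output above, and `occupations_sample` is an illustrative stand-in for `occupations`.

```python
import pandas as pd

# Stand-in for the occupations DataFrame, built from rows shown above.
occupations_sample = pd.DataFrame(
    {
        "id": ["105", "115", "125", "659", "113"],
        "name": [
            "Data Scientist",
            "Database Administrator",
            "Database Architect",
            "Data Engineer",
            "Web Developer",
        ],
    }
)

# \b marks a word boundary, so "Database" no longer matches.
whole_word = occupations_sample["name"].str.contains(r"\bData\b", regex=True, na=False)
data_roles = occupations_sample[whole_word]
print(data_roles)
```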


Now let's query the global dataset:

In [46]:
r = client.globalDataset().getOccupationInsight(
    occupation="Data Scientist",
    area="Milan (ITA)"
)

b'{"area":"Milan (ITA)","occupation":"Data Scientist","date":"2022-07-30T17:38:32.035","area_classification":"market_name","occupation_classification":"occupation_name","salary":{"min":0,"max":0,"median":0,"unique_postings":0},"current_year_active_postings":{"results":[{"month":"2021-06","unique_postings":0},{"month":"2021-07","unique_postings":0},{"month":"2021-08","unique_postings":0},{"month":"2021-09","unique_postings":0},{"month":"2021-10","unique_postings":0},{"month":"2021-11","unique_postings":0},{"month":"2021-12","unique_postings":0},{"month":"2022-01","unique_postings":0},{"month":"2022-02","unique_postings":0},{"month":"2022-03","unique_postings":0},{"month":"2022-04","unique_postings":0},{"month":"2022-05","unique_postings":0},{"month":"2022-06","unique_postings":0}],"total_unique_postings":0},"previous_year_active_postings":{"results":[{"month":"2020-06","unique_postings":0},{"month":"2020-07","unique_postings":0},{"month":"2020-08","unique_postings":0},{"month":"2020-09"
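
In the raw payload above every monthly count for this query is zero, so it is worth checking for an empty result before analysing it. The following sketch works on a plain dict shaped like that payload. The helper `has_postings` is hypothetical, and the two sample dicts are trimmed copies of the payloads printed in this notebook.

```python
def has_postings(insight):
    """Return True if any month in the current-year series has at least one posting."""
    months = insight.get("current_year_active_postings", {}).get("results", [])
    return any(month.get("unique_postings", 0) > 0 for month in months)


# Trimmed copy of the Milan payload shown above (all counts are zero).
milan = {
    "area": "Milan (ITA)",
    "occupation": "Data Scientist",
    "current_year_active_postings": {
        "results": [
            {"month": "2021-06", "unique_postings": 0},
            {"month": "2021-07", "unique_postings": 0},
        ],
        "total_unique_postings": 0,
    },
}

# Trimmed copy of the earlier UK payload, for contrast.
london = {
    "area": "Camden and City of London",
    "occupation": "Programmers and software development professionals",
    "current_year_active_postings": {
        "results": [
            {"month": "2021-06", "unique_postings": 616},
            {"month": "2021-07", "unique_postings": 630},
        ],
        "total_unique_postings": 5315,
    },
}

print(has_postings(milan), has_postings(london))
```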