# Gartner Wanted Analytics API

In this example notebook, we will use the `WantedQuery` class, which is a python wrapper around the Gartner TalentNeuron API. We will explore job market data and use some of the tools that are builted in `talentml` to accelerate analysis.

In [17]:
from talentml.gartne.corre import WantedQuery, WantedDB
import talentml.onet.core as onet
from talentml.utils import viz

import os # operating system
import pandas as pd # excel
import matplotlib.pyplot as plt # visualisation
import igraph as ig
import chart_studio.plotly as py

## Extracting data

Parameters (See Wanted Analytics documentation for more details)

In [277]:
passkey = os.getenv('gartner_API_key') # This is a 32 characters-long key
date='2016-01-01-2020-02-07'
query = '"data scientist"|"scientifique des données"|"Artificial intelligence"|"intelligence artificielle"|"big data"\
        |"machine learning"|"science des données"|"data science"|"deep learning"|"apprentissage profond"|"apprentissage automatique"\
        |"apprentissage machine"'
'
function = '10' # Information technology

> Several operations can be made with the `query` parameter. In this example, the vertical bar (« | ») means 'OR'. Hence, we will search jobs that are related to data science, big data, machine learning and artificial intelligence

Now, let's call an object of type `WantedQuery` with defined parameters
> We are using the same wording for the class parameters and the variable names to make the call look slicker

In [3]:
wq = WantedQuery(
    passkey=passkey, 
    function=function, 
    query=query,
    date=date
)
wq

Full URL : 
https://tnrp-api.gartner.com/wantedapi/v5.0/jobs?responsetype=json&descriptiontype=long&function=10&pagesize=100&query="data%22scientist"|"scientifique%22des%22données"|"Artificial%22intelligence"|"intelligence%22artificielle"|"big%22data"%22%22%22%22%22%22%22%22|"machine%22learning"|"deep%22learning"|"apprentissage%22profond"|"apprentissage%22automatique"&date=2016-01-01-2020-02-07&passkey=86a96c09ddc92d62bde25f8aded37ebf 


<talentml.gartner.core.WantedQuery at 0x254b1bb7670>

Note that some parameters have default values

- pagesize = '100' (can't be superior to 100)
- responsetype = 'json' (can also be 'xml')
- description = 'long' (can be 'short' or None)

Download data with the `get_data` method

In [173]:
data = wq.get_data()

## Exploring data

The `WantedDB` class creates more human-readable feature names, type-check some features and contains some hands-on function to accelerate data analysis

In [174]:
# Shape
data.shape

(309, 56)

--> *340 job postings found. Each observation has 56 features.*

In [175]:
data.columns

Index(['dates_first_seen', 'dates_refreshed', 'dates_posted', 'ids',
       'hash_number', 'ref_number', 'is_staffing', 'is_third_party',
       'is_inappropriate', 'is_bulk', 'is_aggregator', 'is_free',
       'is_classified_occupation', 'is_classified_industry', 'is_current',
       'title_name', 'title_id', 'semi_clean_title_id', 'clean_title_id',
       'description', 'occupation_code', 'occupation_label',
       'occupation_revision', 'industry_code', 'industry_label', 'function_id',
       'function_name', 'employer_id', 'employer_name',
       'employer_super_alias_id', 'city_code', 'city_name', 'state_code',
       'state_name', 'county_code', 'county_name', 'msa_code', 'msa_name',
       'wib_code', 'wib_name', 'latitude', 'longitude', 'salary_id',
       'salary_type', 'salary_value', 'jobtype_0_id', 'jobtype_0_name',
       'jobtype_1_id', 'tags', 'source_job_id', 'source_id', 'source_tags',
       'source_type', 'source_name', 'source_url', 'source_valid_link'],
      dtype

Let's plot the occurence over time with `hist_plot` function. 

In [176]:
viz.hist_plot(
    series = data.dates_first_seen,
    title = "Évolution du nombre d'emploi (2016-2020)"
)    

--> *We are approaching 30 new postings per week, which is quite a good momentum*

In [177]:
city_count = viz.city_postings(df=data)
city_count

Unnamed: 0,city_name,count_scaled,latitude,longitude
Montréal,246,0.9,45.527901,-73.651703
Québec,11,0.132653,46.8517,-71.330299
Saint-Laurent,4,0.109796,45.522598,-73.732903
Sherbrooke,4,0.109796,45.401798,-71.965797
Brossard,2,0.103265,45.446602,-73.4562
Dorval,2,0.103265,45.450901,-73.753304
Sainte-Julie,1,0.1,45.598801,-73.329498
LaSalle,1,0.1,45.446098,-73.631401
Boucherville,1,0.1,45.596401,-73.43
Anjou,1,0.1,45.619999,-73.588997


In [178]:
viz.map_city_count(city_value_counts_df=city_count)

--> *Circle map shows Montreal as the epicenter of jobs in AI, data science and Big Data* 

In [179]:
skill = onet.OnetDB()

In [180]:
hot_techs = skill.get('hot_technologies')[1:]['Hot Technologies'].values
descriptions = data.description.values

In [181]:
from collections import Counter



def word_count(text, word_list):
    counts = Counter()
    
    uniques = set(text.split())

    for word in word_list:
        if word in uniques:
            counts[word] += text.count(word)
    
    return(counts)

Counter({'Python': 1})

In [182]:
counters = data.description.apply(lambda x: word_count(x, hot_techs))

In [183]:
counters

0                     {'Python': 1}
1      {'MongoDB': 1, 'Node.js': 2}
2                                {}
3                     {'Python': 2}
4            {'R': 3, 'Tableau': 1}
                   ...             
304                    {'Linux': 1}
305                  {'Tableau': 2}
306                              {}
307                              {}
308                              {}
Name: description, Length: 309, dtype: object

In [184]:
df = pd.DataFrame(index = range(len(counters)))

links = []

for obs in range(len(counters)):
    words = [k for k,v in counters[obs].items()]
    occurences = [v for k,v in counters[obs].items()]
    
    combinations = list(itertools.combinations(words, 2))
    
    for idx, word in enumerate(words) :
        df.loc[obs, word] = occurences[idx]
        
    for combo in combinations:
        links.append([df.columns.get_loc(combo[0]), df.columns.get_loc(combo[1])])
    

df = df.fillna(0)


links

[[1, 2],
 [3, 4],
 [5, 6],
 [5, 7],
 [5, 0],
 [7, 0],
 [6, 0],
 [7, 11],
 [7, 10],
 [11, 10],
 [9, 7],
 [9, 6],
 [9, 13],
 [7, 6],
 [7, 13],
 [6, 13],
 [14, 15],
 [5, 6],
 [5, 0],
 [6, 0],
 [16, 7],
 [6, 17],
 [7, 0],
 [12, 18],
 [12, 3],
 [18, 3],
 [14, 19],
 [5, 6],
 [8, 0],
 [20, 11],
 [20, 21],
 [11, 21],
 [22, 11],
 [16, 6],
 [0, 15],
 [7, 0],
 [23, 3],
 [16, 0],
 [0, 15],
 [5, 6],
 [11, 2],
 [17, 10],
 [5, 2],
 [5, 10],
 [2, 10],
 [25, 6],
 [25, 0],
 [6, 0],
 [6, 0],
 [17, 3],
 [17, 15],
 [3, 15],
 [14, 6],
 [14, 0],
 [14, 15],
 [6, 0],
 [6, 15],
 [0, 15],
 [16, 5],
 [16, 0],
 [5, 0],
 [5, 11],
 [5, 17],
 [5, 0],
 [11, 17],
 [11, 0],
 [17, 0],
 [17, 15],
 [5, 6],
 [5, 2],
 [5, 26],
 [6, 2],
 [6, 26],
 [2, 26],
 [5, 6],
 [5, 2],
 [5, 26],
 [6, 2],
 [6, 26],
 [2, 26],
 [0, 3],
 [0, 27],
 [3, 27],
 [17, 10],
 [17, 10],
 [17, 10],
 [17, 10],
 [5, 11],
 [5, 6],
 [5, 0],
 [11, 6],
 [11, 0],
 [6, 0],
 [14, 6],
 [14, 0],
 [14, 15],
 [6, 0],
 [6, 15],
 [0, 15],
 [0, 15],
 [14, 16],
 [1, 1

In [252]:
[links.count(link) for link in links if links.count(link)>5]

[9,
 7,
 13,
 9,
 7,
 13,
 9,
 10,
 9,
 10,
 9,
 13,
 13,
 13,
 10,
 9,
 7,
 7,
 9,
 9,
 9,
 7,
 13,
 13,
 10,
 10,
 10,
 9,
 13,
 10,
 10,
 13,
 9,
 7,
 9,
 9,
 9,
 7,
 13,
 9,
 13,
 10,
 9,
 13,
 9,
 9,
 10,
 13]

In [276]:


from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

init_notebook_mode(connected=True)

#DEFAULT_CHART_CONFIG = { 'modeBarButtons': [ [ 'toImage', 'sendDataToCloud']]}

G=ig.Graph(links, directed=False)
layt=G.layout('kk', dim=3)

from sklearn.cluster import KMeans
import numpy as np

import seaborn as sns
palette = sns.color_palette('Blues', 15)


X = np.array(links)
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)

link_occurence = [links.count(link) for link in links]

Xn=[layt[k][0] for k in range(len(layt))]# x-coordinates of nodes
Yn=[layt[k][1] for k in range(len(layt))]# y-coordinates
Zn=[layt[k][2] for k in range(len(layt))]# z-coordinates
Xe=[]
Ye=[]
Ze=[]
for e in links:
    Xe+=[layt[e[0]][0],layt[e[1]][0], None]# x-coordinates of edge ends
    Ye+=[layt[e[0]][1],layt[e[1]][1], None]
    Ze+=[layt[e[0]][2],layt[e[1]][2], None]
    

trace1=go.Scatter3d(x=Xe,
               y=Ye,
               z=Ze,
               mode='lines',
               opacity = 0.1,
               line=dict(color=[palette[link] for link in link_occurence], colorscale= 'Viridis',width=2)
               )

trace2=go.Scatter3d(x=Xn,
               y=Yn,
               z=Zn,
               mode='markers+text',
               text = df.columns,
               name='skills',
               marker=dict(symbol='circle',
                           color = [np.ceil(sum(df[x])/3) for x in df.columns],
                           colorscale = 'Blues',
                             size=[np.ceil(sum(df[x])/3) for x in df.columns],
                             line=dict(color='rgb(50,50,50)', width=0.5),
                           opacity = 1
                             )
               )

axis=dict(showbackground=False,
          showline=False,
          zeroline=False,
          showgrid=False,
          showticklabels=False,
          showspikes=False,
          title=''
          )

layout = go.Layout(
         title="Network of coappearances of characters in Victor Hugo's novel<br> Les Miserables (3D visualization)",
         width=1000,
         height=1000,
         showlegend=False,
         scene=dict(
             xaxis=dict(axis),
             yaxis=dict(axis),
             zaxis=dict(axis),
        ),
     margin=dict(
        t=100
    ),
    hovermode='closest',
    annotations=[
           dict(
           showarrow=False,
            text="Data source: <a href='http://bost.ocks.org/mike/miserables/miserables.json'>[1] miserables.json</a>",
            xref='paper',
            yref='paper',
            x=0,
            y=0.1,
            xanchor='left',
            yanchor='bottom',
            font=dict(
            size=14
            )
            )
        ],    )


#idx

#skills_id=range(list(_df.columns))

data=[trace1, trace2]
fig=go.Figure(data=data, layout=layout)

iplot(fig)



In [237]:
link_occurence = [links.count(link) for link in links]

[palette[link] for link in link_occurence]



[(0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.44844290657439445, 0.44844290657439445, 0.44844290657439445),
 (0.8955478662053056, 0.8955478662053056, 0.8955478662053056),
 (0.586082276047674, 0.586082276047674, 0.586082276047674),
 (0.8501191849288735, 0.8501191849288735, 0.8501191849288735),
 (0.14111495578623606, 0.14111495578623606, 0.14111495578623606),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.8955478662053056, 0.8955478662053056, 0.8955478662053056),
 (0.9408227604767397, 0.9408227604767397, 0.9408227604767397),
 (0.9408227604767397, 0.9408227604767397, 0.94082276

In [36]:
# G=ig.Graph(links, directed=False)
# layt=G.layout('kk', dim=3)
# layt[92]

[-1.6899906964966604, -6.029840274824559, 6.3735930503096725]

In [43]:
# from sklearn.cluster import KMeans
# import numpy as np

# X = np.array(links)
# kmeans = KMeans(n_clusters=3, random_state=0).fit(X)

# kmeans.labels_


# #kmeans.cluster_centers_array

array([1, 1, 1, ..., 0, 2, 2])

In [37]:
# Xn=[layt[k][0] for k in range(92)]# x-coordinates of nodes
# Yn=[layt[k][1] for k in range(92)]# y-coordinates
# Zn=[layt[k][2] for k in range(92)]# z-coordinates
# Xe=[]
# Ye=[]
# Ze=[]
# for e in links:
#     Xe+=[layt[e[0]][0],layt[e[1]][0], None]# x-coordinates of edge ends
#     Ye+=[layt[e[0]][1],layt[e[1]][1], None]
#     Ze+=[layt[e[0]][2],layt[e[1]][2], None]

In [51]:
# print(Xn)


# trace1=go.Scatter3d(x=Xe,
#                y=Ye,
#                z=Ze,
#                mode='lines',
#                line=dict(color='rgb(125,125,125)', width=0.1)
#                )

# trace2=go.Scatter3d(x=Xn,
#                y=Yn,
#                z=Zn,
#                mode='markers',
#                name='actors',
#                marker=dict(symbol='circle',
#                              size=np.random.randint(8,20,92),
#                              line=dict(color='rgb(50,50,50)', width=0.5)
#                              )
#                )

# axis=dict(showbackground=False,
#           showline=False,
#           zeroline=False,
#           showgrid=False,
#           showticklabels=False,
#           title=''
#           )

# layout = go.Layout(
#          title="Network of coappearances of characters in Victor Hugo's novel<br> Les Miserables (3D visualization)",
#          width=1000,
#          height=1000,
#          showlegend=False,
#          scene=dict(
#              xaxis=dict(axis),
#              yaxis=dict(axis),
#              zaxis=dict(axis),
#         ),
#      margin=dict(
#         t=100
#     ),
#     hovermode='closest',
#     annotations=[
#            dict(
#            showarrow=False,
#             text="Data source: <a href='http://bost.ocks.org/mike/miserables/miserables.json'>[1] miserables.json</a>",
#             xref='paper',
#             yref='paper',
#             x=0,
#             y=0.1,
#             xanchor='left',
#             yanchor='bottom',
#             font=dict(
#             size=14
#             )
#             )
#         ],    )

[-0.14283486022019182, -0.29651973693728867, 3.9512793567612916, 3.0891346533981427, -2.6061613070442693, -1.8706842186521286, -3.7009418657436712, -1.191452315450057, 1.2812954081014407, -4.087659503035206, 6.897505721682005, 1.3729095214057012, 4.226003512155625, -3.8368771360413, 2.2414923078264057, 6.106085394145131, 1.5683174674898783, 3.349716648395007, 1.2647785016562212, -2.3071879020281036, -1.0886744421995038, 6.4542967162043094, -0.86106088895735, 2.3118205488899677, 4.257523051325946, 0.9105783369902071, -0.8052361972992307, 5.015551677052296, 1.4589141448067493, 0.9560964921723953, 0.34070398484240066, 1.7756158315318546, 0.1676175162994189, -2.0028948155578137, -2.843346533081098, 0.598102994920803, 4.6075281694645245, 7.194747719850974, 3.7919791127844973, 2.126643053077535, 1.6669163931729027, 5.013244686976771, 1.201678947180494, -2.6979697387568002, 2.033452916036112, 2.2340962562357207, -0.13257859846331443, -0.14986082342904358, -0.5551319620981918, 2.72832382152048

In [52]:
# data=[trace1, trace2]
# fig=go.Figure(data=data, layout=layout)

# plotly.offline.iplot(fig, filename='Les-Miserables')

In [90]:
onet_skills = pd.read_excel('https://www.onetcenter.org/dl_files/database/db_24_2_excel/Technology%20Skills.xlsx')
onet_skills['count'] = 0
skills = pd.DataFrame(onet_skills['Example'].unique(), index=range(len(onet_skills['Example'].unique())), columns=['name'])
skills['count'] = 0


onet_knowledge = pd.read_excel('https://www.onetcenter.org/dl_files/database/db_24_2_excel/Knowledge.xlsx')
onet_knowledge['count'] = 0
knowledge = pd.DataFrame(onet_knowledge['Element Name'].unique(), index=range(len(onet_knowledge['Element Name'].unique())), columns=['name'])
knowledge['count'] = 0


In [91]:
for job in range(len(data)):
    for x in range(len(skills)):
        if skills.loc[x, 'name'] in data.loc[job,'description_value']:
            skills.loc[x, 'count']+=data.loc[job,'description_value'].count(skills.loc[x, 'name'])
            

for job in range(len(data)):
    for y in range(len(knowledge)):
        if knowledge.loc[y, 'name'] in data.loc[job,'description_value']:
            knowledge.loc[y, 'count']+=data.loc[job,'description_value'].count(knowledge.loc[y, 'name'])

In [83]:
data.loc[0,'description_value']

'DÉVELOPPEUR BACK-END WEB – PRODUIT QUELQUES MOTS SUR NOUS Moment Factory est un studio multimédia, réunissant un large éventail d’expertises sous un même toit. Notre équipe combine des spécialités dans la vidéo, l’éclairage, l’architecture, le son et les effets spéciaux afin de créer des expériences mémorables. Basé à Montréal, le studio possède également des bureaux à Los Angeles, Londres, Paris, New York et Tokyo. Depuis ses débuts en 2001, Moment Factory a créé plus de 400 productions et destinations uniques dans le monde, pour des clients tels que l’aéroport de Los Angeles, Nine Inch Nails, Microsoft, la NFL, Sony, Toyota, la Sagrada Familia de Barcelone, Madonna et la Royal Caribbean. VOTRE ÉQUIPE Conçu par Moment Factory, le logiciel X-Agora simplifie la gestion des expériences immersives et permet l’opération de nos spectacles. Pour soutenir son évolution constante, l’équipe X-Agora met à profit ses esprits logiques et créatifs pour offrir un produit adapté aux projets. Rejoind

In [92]:
#onet_skills.columns
knowledge.sort_values(by='count',ascending=False)

Unnamed: 0,name,count
10,Design,65
13,Mathematics,9
32,Transportation,3
14,Physics,2
18,Sociology and Anthropology,0
20,Medicine and Dentistry,0
21,Therapy and Counseling,0
22,Education and Training,0
23,English Language,0
24,Foreign Language,0
