## Download Skills, Type, and Tags
***
This worksheet extracts the full list of skills from Emsi's skills library, along with each skill's type (hard skill, soft skill, or certification) and, where one exists, the URL and short extract of the linked Wikipedia article.

In [1]:
from EmsiApiPy import SkillsClassificationConnection

conn = SkillsClassificationConnection()

# download the data
all_skills = conn.get_list_all_skills(fields = "id,name,tags,type")
all_skills.keys()

dict_keys(['attributions', 'data'])

In [2]:
# Wikipedia attribution
all_skills["attributions"]

[{'name': 'Wikipedia',
  'text': 'Wikipedia extracts are distributed under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/3.0/)'}]

In [3]:
# we should have over 30k skills
len(all_skills["data"])

30792

In [4]:
# here's what one of them looks like with the Wikipedia info
all_skills["data"][1]

{'id': 'KS126XS6CQCFGC3NG79X',
 'name': '.NET Assemblies',
 'tags': [{'key': 'wikipediaExtract',
   'value': '\nDefined by Microsoft for use in recent versions of Windows, an assembly in the Common Language Infrastructure (CLI) is a compiled code library used for deployment, versioning, and security. There are two types: process assemblies (EXE) and library assemblies (DLL). A process assembly represents a process that will use classes defined in library assemblies. CLI assemblies contain code in CIL, which is usually generated from a CLI language, and then compiled into machine language at run time by the just-in-time compiler. In the .NET Framework implementation, this compiler is part of the Common Language Runtime (CLR).'},
  {'key': 'wikipediaUrl',
   'value': 'https://en.wikipedia.org/wiki/.NET_assemblies'}],
 'type': {'id': 'ST1', 'name': 'Hard Skill'}}
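
Because `tags` is a list of key/value pairs rather than a fixed set of fields, it can help to check which tag keys occur before flattening the records into a table. The following is a minimal sketch under stated assumptions: `sample_skills` is a hand-written, abbreviated stand-in for `all_skills["data"]`, `count_tag_keys` is a hypothetical helper (not part of EmsiApiPy), and the empty `tags` list on the first record is assumed from the blank Wikipedia columns for that skill.

```python
from collections import Counter

# Abbreviated, hand-written stand-ins for entries of all_skills["data"].
# Assumption: a skill without a Wikipedia article has an empty "tags" list.
sample_skills = [
    {
        "id": "KS120P86XDXZJT3B7KVJ",
        "name": "(American Society For Quality) ASQ Certified",
        "tags": [],
    },
    {
        "id": "KS126XS6CQCFGC3NG79X",
        "name": ".NET Assemblies",
        "tags": [
            {"key": "wikipediaExtract", "value": "\nDefined by Microsoft ..."},
            {"key": "wikipediaUrl",
             "value": "https://en.wikipedia.org/wiki/.NET_assemblies"},
        ],
    },
]


def count_tag_keys(skills):
    """Count how many skills carry each tag key (hypothetical helper)."""
    counts = Counter()
    for skill in skills:
        # .get() tolerates records that omit the "tags" field entirely
        for tag in skill.get("tags", []):
            counts[tag["key"]] += 1
    return counts


tag_key_counts = count_tag_keys(sample_skills)
print(tag_key_counts)
```

Run against the full download, the resulting keys are the extra columns to expect once the records are flattened, and the counts indicate how many skills have Wikipedia data attached.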

In [5]:
import pandas as pd

# Build one flat row per skill, then create the DataFrame in a single step.
rows = []
for skill in all_skills["data"]:
    row = {
        "id": skill["id"],
        "name": skill["name"],
        "type": skill["type"]["name"],
    }
    # Each tag becomes its own column (e.g. wikipediaExtract, wikipediaUrl).
    for tag in skill["tags"]:
        row[tag["key"]] = tag["value"]
    rows.append(row)

df = pd.DataFrame(rows)
df.head()

| | id | name | type | wikipediaExtract | wikipediaUrl |
|---|---|---|---|---|---|
| 0 | KS120P86XDXZJT3B7KVJ | (American Society For Quality) ASQ Certified | Certification | NaN | NaN |
| 1 | KS126XS6CQCFGC3NG79X | .NET Assemblies | Hard Skill | \nDefined by Microsoft for use in recent versi... | https://en.wikipedia.org/wiki/.NET_assemblies |
| 2 | KS1200B62W5ZF38RJ7TD | .NET Framework | Hard Skill | The .NET Framework is a software framework dev... | https://en.wikipedia.org/wiki/.NET_Framework |
| 3 | KS126XW78QJCF4TRV2X7 | .NET Framework 1 | Hard Skill | Microsoft started development on the .NET Fram... | https://en.wikipedia.org/wiki/.NET_Framework_1.0 |
| 4 | KS126XY68BNKXSBSLPYS | .NET Framework 3 | Hard Skill | The .NET Framework is a software framework dev... | https://en.wikipedia.org/wiki/.NET_Framework |
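
Before exporting, a quick sanity check on the flattened table can catch problems early. The sketch below is illustrative only: `df_sample` is a small hand-built frame that mirrors a few of the rows above, not the real download, and the three checks (skills per type, rows without a Wikipedia URL, duplicated IDs) are suggestions rather than part of the original workflow.

```python
import pandas as pd

# Hand-built stand-in for the flattened skills table (not the real data).
df_sample = pd.DataFrame(
    {
        "id": [
            "KS120P86XDXZJT3B7KVJ",
            "KS126XS6CQCFGC3NG79X",
            "KS1200B62W5ZF38RJ7TD",
        ],
        "name": [
            "(American Society For Quality) ASQ Certified",
            ".NET Assemblies",
            ".NET Framework",
        ],
        "type": ["Certification", "Hard Skill", "Hard Skill"],
        "wikipediaUrl": [
            None,
            "https://en.wikipedia.org/wiki/.NET_assemblies",
            "https://en.wikipedia.org/wiki/.NET_Framework",
        ],
    }
)

# Number of skills per type
type_counts = df_sample["type"].value_counts()

# Number of skills with no linked Wikipedia article
missing_wiki = int(df_sample["wikipediaUrl"].isna().sum())

# Skill IDs are expected to be unique; anything above zero needs a look
duplicate_ids = int(df_sample["id"].duplicated().sum())

print(type_counts)
print("missing Wikipedia URL:", missing_wiki)
print("duplicate ids:", duplicate_ids)
```

The same three expressions can be applied to `df` directly once the full table has been built.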


In [6]:
# write the table to a single "Data" sheet, without the index column
with pd.ExcelWriter("skills_info.xlsx") as writer:
    df.to_excel(writer, sheet_name="Data", index=False)