# Data Professions × Data Competencies

By Avi Alkalay, Data Scientist

This notebook generates a radar (spider web) chart that maps data-related professions to data-related competencies. The finished diagram appears at the end.

I reconstructed it from a slide shown by Julia Tessler, a data scientist at iFood.

It represents what iFood expects from data professionals.

In [1]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from math import pi
%matplotlib inline
%config InlineBackend.figure_formats = ['svg']

In [2]:
# competencies = pd.read_csv('data.csv')
# competencies.rename(columns={'Unnamed: 0': "competencies"},inplace=True)
# competencies.set_index('competencies', inplace=True)
# competencies.replace(to_replace=r'.00\%$', value='', regex=True, inplace=True)
# competencies=competencies.apply(pd.to_numeric)
# competencies/=10

Here is the data used to build the diagram. Each profession gets a score for each competency, and the highest score is 10.

In [3]:
dataCompetencies = {
    'Data Analyst': {
        'Domain Knowledge': 10,
        'Statistics': 5,
        'Advanced Math': 3,
        'Software Engineering': 3,
        'Data Infrastructure': 2,
        'Data Wrangling': 8,
        'Communication & Visualization': 10
    },
    'Data Engineer': {
        'Domain Knowledge': 7,
        'Statistics': 3,
        'Advanced Math': 2,
        'Software Engineering': 9,
        'Data Infrastructure': 10,
        'Data Wrangling': 10,
        'Communication & Visualization': 3
    },
    'Data Scientist': {
        'Domain Knowledge': 7,
        'Statistics': 10,
        'Advanced Math': 10,
        'Software Engineering': 7,
        'Data Infrastructure': 4,
        'Data Wrangling': 8,
        'Communication & Visualization': 7
    }
}

index=pd.Index([
    'Domain Knowledge',
    'Statistics',
    'Advanced Math',
    'Software Engineering',
    'Data Infrastructure',
    'Data Wrangling',
    'Communication & Visualization'
])

In [4]:
competencies=pd.DataFrame(dataCompetencies, index=index)
competencies

| | Data Analyst | Data Engineer | Data Scientist |
|---|---|---|---|
| Domain Knowledge | 10 | 7 | 7 |
| Statistics | 5 | 3 | 10 |
| Advanced Math | 3 | 2 | 10 |
| Software Engineering | 3 | 9 | 7 |
| Data Infrastructure | 2 | 10 | 4 |
| Data Wrangling | 8 | 10 | 8 |
| Communication & Visualization | 10 | 3 | 7 |


In [5]:
competencies.describe()

| | Data Analyst | Data Engineer | Data Scientist |
|---|---|---|---|
| count | 7.0 | 7.0 | 7.0 |
| mean | 5.857143 | 6.285714 | 7.571429 |
| std | 3.436499 | 3.545621 | 2.070197 |
| min | 2.0 | 2.0 | 4.0 |
| 25% | 3.0 | 3.0 | 7.0 |
| 50% | 5.0 | 7.0 | 7.0 |
| 75% | 9.0 | 9.5 | 9.0 |
| max | 10.0 | 10.0 | 10.0 |
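The `std` row above is the sample standard deviation (it divides by n − 1 rather than n), and the same figure is shown as σ in the chart legend further down. As a quick cross-check, here is a minimal sketch that recomputes the Data Analyst column with Python's standard `statistics` module; the variable names are only for illustration.

```python
import statistics

# Data Analyst scores, copied from the table above
analyst_scores = [10, 5, 3, 3, 2, 8, 10]

mean = statistics.mean(analyst_scores)
sample_std = statistics.stdev(analyst_scores)       # divides by n - 1, like describe()
population_std = statistics.pstdev(analyst_scores)  # divides by n, shown for contrast

print(f"mean = {mean:.6f}")
print(f"sample std = {sample_std:.6f}")
print(f"population std = {population_std:.6f}")
```

The sample figure matches the `std` row; the population figure is slightly smaller because it divides by 7 instead of 6.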


Here is a generic function to generate the desired diagram.

In [6]:
def spiderWeb(df):
    # ------- PART 1: Create background

    # number of variables
    categories=df.shape[0]

    # Angle of each axis: divide the full circle (2π) evenly by the number of variables
    angles = [ 2 * pi * n / float(categories) for n in range(categories)]
    # Repeat the first angle at the end so that each plotted polygon closes
    angles += angles[:1]

    # Initialise the spider plot
    ax = plt.axes(polar=True)

    # If you want the first axis to be on top:
    ax.set_theta_offset(pi / 2)
    ax.set_theta_direction(-1)

    # Draw one axis per variable and add its label
    plt.xticks(angles[:-1], df.index)

    # Draw ylabels
    ax.set_rlabel_position(0)
    plt.yticks(range(df.max().max()+1), range(df.max().max()+1), color="grey", size=7)
    plt.ylim(0,df.max().max())


    # ------- PART 2: Add plots

    # Plot each profession = each column of the data
    # Note: plotting more than 3 groups tends to make the chart unreadable

    for c in df.columns:
        values = df[c].tolist()
        values += values[:1]
        
        label = "{label} (μ={mean}, σ={deviation})".format(
            label=c,
            mean=round(df[c].mean(),2),
            deviation=round(df[c].std(),2)
        )
        
        ax.plot(angles, values, linewidth=1, linestyle='solid', label=label)
        ax.fill(angles, values, alpha=0.1)

    # Add legend
#     plt.legend(loc='lower right', bbox_to_anchor=(0.1, 0.1))
    plt.legend(bbox_to_anchor=(0.1, 0.1))
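To make the angle arithmetic inside `spiderWeb` concrete, here is a small standalone sketch that uses only the standard library. It spaces the seven competencies evenly around the circle and repeats the first point at the end, which is what closes each polygon in the chart. The names `labels`, `scores`, `closed_angles` and `closed_scores` are illustrative and not part of the function above.

```python
from math import pi

# Competency names, in the same order as the index defined earlier
labels = [
    'Domain Knowledge', 'Statistics', 'Advanced Math', 'Software Engineering',
    'Data Infrastructure', 'Data Wrangling', 'Communication & Visualization',
]
# Data Analyst scores, in the same order
scores = [10, 5, 3, 3, 2, 8, 10]

n = len(labels)

# One axis per competency, evenly spaced around the full circle
angles = [2 * pi * i / n for i in range(n)]

# Repeat the first point so the outline returns to where it started
closed_angles = angles + angles[:1]
closed_scores = scores + scores[:1]

for label, angle in zip(labels, angles):
    print(f"{label:<30} {angle:.3f} rad")
```

With seven competencies the axes sit 2π/7 radians apart, and the closed lists have eight entries each.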

In [None]:
spiderWeb(competencies)

# Visual manipulation

As you can see, the matplotlib chart does not have a very polished layout, so I took its SVG output and retouched it in Inkscape. Here is the result:

<img src="Data Profession Competencies.svg"/>