## Deep Learning Framework Power Scores 2018
## By Jeff Hale

### See the accompanying Medium article for a discussion of the state of Python deep learning frameworks in 2018, featuring these charts.

I'm going to use plotly and pandas to make interactive visuals for this project.

# Please upvote this Kaggle kernel if you find it helpful.

In [None]:
# import the usual libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import collections
import warnings

from IPython.display import display, HTML
from sklearn.preprocessing import MinMaxScaler

import os
print(os.listdir("../input"))
    
# import plotly 
import plotly.figure_factory as ff
import plotly.graph_objs as go
import plotly.offline as py
import plotly.tools as tls

# for color scales in plotly
import colorlover as cl 

# define color scale https://plot.ly/ipython-notebooks/color-scales/
cs = cl.scales['7']['qual']['Dark2']     

# configure things
warnings.filterwarnings('ignore')

pd.options.display.float_format = '{:,.2f}'.format  
pd.options.display.max_columns = 999

py.init_notebook_mode(connected=True)

%load_ext autoreload
%autoreload 2
%matplotlib inline

List package versions for reproducibility.

In [None]:
!pip list
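`!pip list` prints every installed package. If you only want the versions of the libraries this notebook imports, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) could look like this. The `PACKAGES` list is an assumption based on the imports above, and the helper name `package_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names for the libraries imported in this notebook (assumed list)
PACKAGES = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn", "plotly", "colorlover"]


def package_versions(names):
    """Return {distribution name: installed version, or None if it isn't installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in package_versions(PACKAGES).items():
    print(f"{name:>14}: {ver or 'not installed'}")
```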

Read in the data from the CSV. The Google sheet that holds the data is available [here](https://docs.google.com/spreadsheets/d/1mYfHMZfuXGpZ0ggBVDot3SJMU-VsCsEGceEL8xd1QBo/edit?usp=sharing).

In [None]:
new_col_names = ['framework','indeed', 'monster', 'simply', 'linkedin', 'angel', 
                 'usage', 'search', 'medium', 'books', 'arxiv', 'stars', 
                 'watchers', 'forks', 'contribs',
                ]

df = pd.read_csv('../input/dsframeworks.csv', 
                 skiprows=4,
                 header=None, 
                 nrows=7, 
                 thousands=',',
                 index_col=0,
                 names=new_col_names,
                 usecols=new_col_names,
                )
df

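Cool. The read_csv parameters skip the first four rows of the sheet, read only the seven framework rows, parse numbers written with comma thousands separators, use the framework names as the index, and assign our own column names.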
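To see what each of those read_csv parameters does in isolation, here is a minimal sketch on a tiny in-memory CSV. The rows, numbers, and column names are made up for illustration; they are not the notebook's data.

```python
import io

import pandas as pd

# Hypothetical CSV: two preamble rows, two framework rows, then a row we don't want
raw_csv = """Sheet title,,
Notes,,
TensorFlow,"1,234",56
Keras,789,12
Totals,2023,68
"""

df_demo = pd.read_csv(
    io.StringIO(raw_csv),
    skiprows=2,                              # skip the preamble rows
    header=None,                             # no header row is left after skipping
    nrows=2,                                 # read only the two framework rows
    thousands=",",                           # parse "1,234" as the integer 1234
    index_col=0,                             # first column (framework) becomes the index
    names=["framework", "indeed", "stars"],  # our own column names
)
print(df_demo)
```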

## Basic Data Exploration
Let's see what the data look like.

In [None]:
df.info()

In [None]:
df.describe()

It looks like pandas read the usage column as a string because of its percent sign. Let's strip the sign and convert the column to integer percentages.

In [None]:
df['usage'] = pd.to_numeric(df['usage'].str.strip('%'))
df['usage'] = df['usage'].astype(int)
df

In [None]:
df.info()

All ints! Great!
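Here is the same percent-string conversion on a few made-up values, kept as floats and also divided by 100 (the notebook does that division later, before plotting usage). Note that `astype(int)` truncates, so a value like `'6.5%'` would become 6; if the sheet only holds whole-number percentages, as the output above suggests, the cast is harmless. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical survey percentages stored as strings, as they come out of the CSV
usage_raw = pd.Series(["30%", "22%", "6.5%"], index=["A", "B", "C"])

usage_pct = pd.to_numeric(usage_raw.str.strip("%"))  # floats: 30.0, 22.0, 6.5
usage_frac = usage_pct / 100                          # fractions: 0.30, 0.22, 0.065

truncated = usage_pct.astype(int)                     # 30, 22, 6: the fractional part is dropped
print(pd.DataFrame({"pct": usage_pct, "fraction": usage_frac, "as_int": truncated}))
```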

# Plotly
Let's make interactive plots with plotly for each popularity category.

## Online Job Listings
I looked at how many times each framework appeared in searches on five job listing websites: Indeed, Monster, Simply Hired, LinkedIn, and Angel List. For more discussion, see the Medium article that accompanies this notebook.

In [None]:
# total listings across the five job sites
df['hiring'] = df['indeed'] + df['monster'] + df['linkedin'] + df['simply'] + df['angel']
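The cell above adds the five job-site columns one at a time. An equivalent pattern sums a list of columns row-wise, which is easier to keep in sync if a site is added or dropped. One difference: `sum(axis=1)` skips missing values by default, while chained `+` propagates NaN. A minimal sketch on made-up counts (the `JOB_SITES` constant and the numbers are illustrative only):

```python
import pandas as pd

JOB_SITES = ["indeed", "monster", "simply", "linkedin", "angel"]

# Hypothetical job-listing counts for two frameworks
jobs = pd.DataFrame(
    {"indeed": [100, 40], "monster": [50, 20], "simply": [80, 30],
     "linkedin": [200, 90], "angel": [10, 5]},
    index=["FrameworkA", "FrameworkB"],
)

# Row-wise total across the job-site columns, one value per framework
jobs["hiring"] = jobs[JOB_SITES].sum(axis=1)
print(jobs["hiring"])
```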

In [None]:
df

In [None]:
data = [go.Bar(
    x=df.index,
    y=df.hiring,
    marker=dict(color=cs),
    )
]

layout = {'title': 'Online Job Listings',
          'xaxis': {'title': 'Framework'},
          'yaxis': {'title': "Quantity"},
         }

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

That's just the aggregate listings. Let's plot the job listing mentions for each website in a stacked bar chart. This will take multiple traces.

In [None]:
y_indeed = df['indeed']
y_monster = df['monster']
y_simply = df['simply']
y_linkedin = df['linkedin']
y_angel = df['angel']

In [None]:
indeed = go.Bar(x=df.index, y=y_indeed, name = 'Indeed')
simply = go.Bar(x=df.index, y=y_simply, name='Simply Hired')
monster = go.Bar(x=df.index, y=y_monster, name='Monster')
linked = go.Bar(x=df.index, y=y_linkedin, name='LinkedIn')
angel = go.Bar(x=df.index, y=y_angel, name='Angel List')

data = [linked, indeed, simply, monster, angel]
layout = go.Layout(
    barmode='stack',
    title='Online Job Listings',
    xaxis={'title': 'Framework'},
    yaxis={'title': 'Mentions', 'separatethousands': True},
    colorway=cs,
)

fig = go.Figure(data = data, layout = layout)
py.iplot(fig)

Cool. Now let's see how this data looks with grouped bars instead of stacked bars by changing the barmode to "group".

In [None]:
indeed = go.Bar(x=df.index, y=y_indeed, name = "Indeed")
simply = go.Bar(x=df.index, y=y_simply, name="Simply Hired")
monster = go.Bar(x=df.index, y=y_monster, name="Monster")
linked = go.Bar(x=df.index, y=y_linkedin, name="LinkedIn")
angel = go.Bar(x=df.index, y=y_angel, name='Angel List')

data = [linked, indeed, simply, monster, angel]
layout = go.Layout(
    barmode='group',
    title="Online Job Listings",
    xaxis={'title': 'Framework'},
    yaxis={'title': "Listings", 'separatethousands': True},
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
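The stacked and grouped charts above build the same five traces twice and differ only in `barmode`. A minimal sketch of generating the traces from a single column-to-label mapping instead; it returns a plain figure dict, which plotly accepts in place of `go.Figure`/`go.Bar` objects. The helper name `job_listing_figure` and the sample counts are hypothetical.

```python
import pandas as pd

# Column name -> legend label, in the order the traces should be drawn
SITE_LABELS = {"linkedin": "LinkedIn", "indeed": "Indeed", "simply": "Simply Hired",
               "monster": "Monster", "angel": "Angel List"}


def job_listing_figure(df, barmode="stack"):
    """Return a plotly figure dict with one bar trace per job site."""
    traces = [
        {"type": "bar", "x": list(df.index), "y": df[col].tolist(), "name": label}
        for col, label in SITE_LABELS.items()
    ]
    layout = {"barmode": barmode, "title": "Online Job Listings",
              "xaxis": {"title": "Framework"},
              "yaxis": {"title": "Listings", "separatethousands": True}}
    return {"data": traces, "layout": layout}


# Hypothetical counts for two frameworks
demo = pd.DataFrame({"indeed": [100, 40], "monster": [50, 20], "simply": [80, 30],
                     "linkedin": [200, 90], "angel": [10, 5]},
                    index=["FrameworkA", "FrameworkB"])
spec = job_listing_figure(demo, barmode="group")
# In the notebook: py.iplot(job_listing_figure(df, barmode="stack"))
```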

## KDnuggets Usage Survey
Let's look at usage as reported in the KDnuggets 2018 survey.

In [None]:
# convert the percentage to a fraction so the '.0%' tick format displays it correctly
df['usage'] = df['usage'] / 100

In [None]:
data = [
    go.Bar(
        x=df.index,
        y=df['usage'],
        marker=dict(color=cs)
    )
]

layout = {
    'title': 'KDnuggets Usage Survey',
    'xaxis': {'title': 'Framework'},
    'yaxis': {'title': "% Respondents Used in Past Year", 'tickformat': '.0%'},
}

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

## Google Search Volume

In [None]:
data = [
    go.Bar(
        x = df.index, 
        y = df['search'],
        marker = dict(color=cs),  
    )
]
    
layout = {
    'title': 'Google Search Volume',
    'xaxis': {'title': 'Framework'},
    'yaxis': {'title': "Relative Search Volume"},
}

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

## Medium Articles

In [None]:
# Make sure you have colorlover imported as cl for color scale
# cs is defined in first cell

data = [
    go.Bar(
        x=df.index, 
        y=df['medium'],
        marker=dict(color=cs) ,
    )
]
    
layout = {
    'title': 'Medium Articles',
    'xaxis': {'title': 'Framework'},
    'yaxis': {'title': "Articles"},
}

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

## Amazon Books

In [None]:
data = [
    go.Bar(
        x=df.index, 
        y=df['books'],
        marker=dict(color=cs),           
    )
]
    
layout = {
    'title': 'Amazon Books',
    'xaxis': {'title': 'Framework'},
    'yaxis': {'title': "Books"},
}

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

## ArXiv Articles

In [None]:
data = [
    go.Bar(
        x=df.index, 
        y=df['arxiv'],
        marker=dict(color=cs),           
    )
]

layout = {
    'title': 'ArXiv Articles',
    'xaxis': {'title': 'Framework'},
    'yaxis': {'title': "Articles"},
}

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)
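The usage, search, Medium, Amazon, and arXiv charts differ only in the column plotted, the title, the y-axis label, and (for usage) the tick format. A minimal sketch of a helper that builds any of them as a plotly figure dict; the helper name `metric_bar` and the sample data are hypothetical.

```python
import pandas as pd


def metric_bar(df, column, title, y_title, colors=None, tickformat=None):
    """Return a plotly figure dict with one bar per framework for a single metric."""
    trace = {"type": "bar", "x": list(df.index), "y": df[column].tolist()}
    if colors is not None:
        trace["marker"] = {"color": colors}
    yaxis = {"title": y_title}
    if tickformat is not None:
        yaxis["tickformat"] = tickformat
    return {"data": [trace],
            "layout": {"title": title, "xaxis": {"title": "Framework"}, "yaxis": yaxis}}


# Hypothetical values for two frameworks
demo = pd.DataFrame({"books": [12, 3], "usage": [0.30, 0.06]},
                    index=["FrameworkA", "FrameworkB"])
books_fig = metric_bar(demo, "books", "Amazon Books", "Books")
usage_fig = metric_bar(demo, "usage", "KDnuggets Usage Survey",
                       "% Respondents Used in Past Year", tickformat=".0%")
# In the notebook: py.iplot(metric_bar(df, 'arxiv', 'ArXiv Articles', 'Articles', colors=cs))
```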

# GitHub Activity
Let's make another stacked bar chart of the four GitHub categories.

In [None]:
y_stars = df['stars']
y_watchers = df['watchers']
y_forks = df['forks']
y_contribs = df['contribs']

stars = go.Bar(x = df.index, y=y_stars, name="Stars")
watchers = go.Bar(x=df.index, y=y_watchers, name="Watchers")
forks = go.Bar(x=df.index, y=y_forks, name="Forks")
contribs = go.Bar(x=df.index, y=y_contribs, name="Contributors")


data = [stars, watchers, forks, contribs]
layout = go.Layout(barmode='stack', 
    title="GitHub Activity",
    xaxis={'title': 'Framework'},
    yaxis={
        'title': "Quantity",
        'separatethousands': True,
    }
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

This configuration doesn't make much sense: stars vastly outnumber contributors, so stacking the four counts isn't an apples-to-apples comparison. Let's try four subplots instead.

In [None]:
trace1 = go.Bar(
    x=df.index,
    y=df['stars'],
    name='Stars',
    marker=dict(color=cs),
)
trace2 = go.Bar(
    x=df.index,
    y=df['forks'],
    name ="Forks",
    marker=dict(color=cs)
)
trace3 = go.Bar(
    x=df.index,
    y=df['watchers'],
    name='Watchers',
    marker=dict(color=cs)
)
trace4 = go.Bar(
    x=df.index,
    y=df['contribs'],
    name='Contributors',
    marker=dict(color=cs),
)

fig = tls.make_subplots(
    rows=2, 
    cols=2, 
    subplot_titles=(
        'Stars', 
        'Forks',
        'Watchers',
        'Contributors',
    )
)

fig['layout']['yaxis3'].update(separatethousands=True)
fig['layout']['yaxis4'].update(separatethousands=True)
# SI-prefixed ticks (e.g. 100k) for the large star and fork counts
fig['layout']['yaxis2'].update(tickformat='.2s')
fig['layout']['yaxis1'].update(tickformat='.2s')

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)

fig['layout'].update(title = 'GitHub Activity', showlegend = False)
py.iplot(fig)

Giving each metric its own subplot and y-axis makes it easy to compare the frameworks within each metric.

# Scale and Aggregate for Power Scores
Scale each column to the range [0, 1]. MinMaxScaler subtracts each column's minimum and divides by that column's range (original max minus original min).

In [None]:
df.info()

In [None]:
scale = MinMaxScaler()
scaled_df = pd.DataFrame(
    scale.fit_transform(df), 
    columns = df.columns,
    index = df.index)    
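With its default `feature_range=(0, 1)`, MinMaxScaler computes, column by column, (x - min) / (max - min). A minimal pandas sketch of the same transform on hypothetical values makes the arithmetic visible. One difference: for a constant column this sketch divides by zero and pandas returns NaN, whereas MinMaxScaler maps such a column to 0. The helper name `min_max_scale` is hypothetical.

```python
import pandas as pd


def min_max_scale(frame):
    """Scale every column to [0, 1]: (x - column min) / (column max - column min)."""
    col_min = frame.min()
    col_range = frame.max() - col_min
    return (frame - col_min) / col_range


# Hypothetical raw values for three frameworks
raw = pd.DataFrame({"stars": [100_000, 40_000, 10_000], "books": [20, 5, 10]},
                   index=["A", "B", "C"])
scaled = min_max_scale(raw)
print(scaled)
```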

In [None]:
scaled_df

### Scaled Online Job Listings
Let's average the five scaled job-site columns into a single hiring score.

In [None]:
scaled_df['hiring_score'] = scaled_df[['indeed', 'monster', 'simply', 'linkedin', 'angel']].mean(axis=1)

In [None]:
scaled_df

Now we have a hiring score.
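Because each input column is already scaled to [0, 1], their row-wise mean also lands in [0, 1], so the hiring score is on the same footing as the single-column categories. A tiny check with made-up scaled values (the numbers are hypothetical):

```python
import pandas as pd

# Hypothetical scaled job-site columns, each already in [0, 1]
scaled_jobs = pd.DataFrame({"indeed": [1.0, 0.0], "monster": [1.0, 0.5], "simply": [0.8, 0.0],
                            "linkedin": [1.0, 0.25], "angel": [0.2, 0.0]},
                           index=["A", "B"])

# Average across the columns: one score per framework
hiring_score = scaled_jobs.mean(axis=1)
print(hiring_score)
```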

### Scaled GitHub Activity

Let's average the four scaled GitHub columns into a single GitHub score.

In [None]:
scaled_df['github_score'] = scaled_df[['stars', 'watchers', 'forks', 'contribs']].mean(axis=1)

In [None]:
scaled_df

Now we have all our aggregate columns and are ready to turn to the weights.

## Weights

Let's make a pie chart of weights by category.

In [None]:
weights = {'Online Job Listings': .3,
           'KDnuggets Usage Survey': .2,
           'GitHub Activity': .1,
           'Google Search Volume': .1,
           'Medium Articles': .1,
           'Amazon Books': .1,
           'ArXiv Articles': .1 }
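The weights are meant to add up to 1 (30% for job listings, 20% for usage, and 10% for each of the other five categories). A minimal sanity check worth keeping next to the dict; it compares with `math.isclose` rather than `==`, because adding 0.1 repeatedly in floating point is not guaranteed to give exactly 1.0.

```python
import math

weights = {'Online Job Listings': .3,
           'KDnuggets Usage Survey': .2,
           'GitHub Activity': .1,
           'Google Search Volume': .1,
           'Medium Articles': .1,
           'Amazon Books': .1,
           'ArXiv Articles': .1}

total = sum(weights.values())
# Tolerance-based comparison: repeated float additions of 0.1 are not exact
assert math.isclose(total, 1.0), f"weights sum to {total}, not 1"
```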

In [None]:
# use a different palette so the categories aren't mistaken for frameworks
weight_colors = cl.scales['7']['qual']['Set1'] 

common_props = dict(
    labels = list(weights.keys()),
    values = list(weights.values()),
    textfont=dict(size=16),
    marker=dict(colors=weight_colors),
    hoverinfo='none',
    showlegend=False,
)

trace1 = go.Pie(
    **common_props,
    textinfo='label',
    textposition='outside',
)

trace2 = go.Pie(
    **common_props,
    textinfo='percent',
    textposition='inside',
)

layout = go.Layout(title = 'Weights by Category')

fig = go.Figure([trace1, trace2], layout=layout)
py.iplot(fig)

## Weight the Categories

In [None]:
scaled_df['w_hiring'] = scaled_df['hiring_score'] * .3
scaled_df['w_usage'] = scaled_df['usage'] * .2
scaled_df['w_github'] = scaled_df['github_score'] * .1
scaled_df['w_search'] = scaled_df['search'] * .1
scaled_df['w_arxiv'] = scaled_df['arxiv'] * .1
scaled_df['w_books'] = scaled_df['books'] * .1
scaled_df['w_medium'] = scaled_df['medium'] * .1
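The cell above hard-codes each weight a second time, so the pie chart and the scores could drift apart if only one is edited. A minimal sketch that drives the weighting from a single mapping of score column to weight; the `SCORE_WEIGHTS` name is hypothetical, its numbers mirror the weights dict above, and the scaled scores are made up.

```python
import pandas as pd

# Score column -> weight (same numbers as the weights dict above)
SCORE_WEIGHTS = pd.Series({"hiring_score": .3, "usage": .2, "github_score": .1,
                           "search": .1, "arxiv": .1, "books": .1, "medium": .1})

# Hypothetical scaled scores for two frameworks (all in [0, 1])
scores = pd.DataFrame({"hiring_score": [1.0, 0.2], "usage": [1.0, 0.1],
                       "github_score": [0.9, 0.3], "search": [1.0, 0.0],
                       "arxiv": [0.5, 0.5], "books": [1.0, 0.2], "medium": [0.8, 0.4]},
                      index=["A", "B"])

# Multiply each column by its weight (aligned on column names), then sum per framework
weighted = scores[SCORE_WEIGHTS.index].mul(SCORE_WEIGHTS, axis=1)
power_score = weighted.sum(axis=1) * 100  # 0-100 scale, as in the publication table
print(power_score.round(2))
```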

In [None]:
weight_list = ['w_hiring', 'w_usage', 'w_github', 'w_search', 'w_arxiv', 'w_books', 'w_medium']
scaled_df = scaled_df[weight_list].copy()  # copy so adding the 'ps' column doesn't modify a slice
scaled_df

## Power Scores
Let's make the power score column by summing the seven weighted category scores.
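Why the published scores fall between 0 and 100: each category score $s_k$ is either a min-max scaled column or the average of min-max scaled columns, so $0 \le s_k \le 1$, and the seven weights $w_k$ are positive and sum to 1. After the final multiplication by 100, the power score is therefore bounded:

$$
\mathrm{PS} = 100 \sum_{k=1}^{7} w_k s_k, \qquad 0 \le s_k \le 1,\quad \sum_{k=1}^{7} w_k = 1 \;\Longrightarrow\; 0 \le \mathrm{PS} \le 100
$$

A framework would reach 100 only if it had the maximum value in every underlying column.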

In [None]:
scaled_df['ps'] = scaled_df[weight_list].sum(axis = 1)
scaled_df

Let's clean things up for publication.

In [None]:
p_s_df = scaled_df * 100
p_s_df = p_s_df.round(2)
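# map each weighted column to its display label by name, not by position
p_s_df = p_s_df.rename(columns={
    'w_hiring': 'Job Search Listings', 'w_usage': 'Usage Survey',
    'w_github': 'GitHub Activity', 'w_search': 'Search Volume',
    'w_arxiv': 'ArXiv Articles', 'w_books': 'Amazon Books',
    'w_medium': 'Medium Articles', 'ps': 'Power Score'})
p_s_df.rename_axis('Framework', inplace=True)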
p_s_df

Let's make a bar chart of the power scores.

In [None]:
data = [
    go.Bar(
        x=p_s_df.index,
        y=p_s_df['Power Score'],
        marker=dict(color=cs),
        text=p_s_df['Power Score'],
        textposition='outside',
        textfont=dict(size=10)
    )
]

layout = {
    'title': 'Deep Learning Framework Power Scores 2018',
    'xaxis': {'title': 'Framework'},
    'yaxis': {'title': "Score"}
}

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

## That's the end! 
## Please upvote if you found this interesting or informative!