This notebook uses Voilà to render the notebook as a dashboard. It is hosted for free in the cloud on mybinder.org.

### NBA Player Stats and Salary

For this article, we will use two separate datasets from Kaggle. The salary data can be found <a href="https://www.kaggle.com/koki25ando/salary">here</a>, and the player stats can be found <a href="https://www.kaggle.com/mcamli/nba17-18#nba.csv">here</a>. Both datasets cover the 2017-18 season.

In [3]:
%%capture
!pip install -U altair vega_datasets notebook vega

In [2]:
import numpy as np 
import pandas as pd 
import seaborn as sns
import warnings
import matplotlib.pyplot as plt
from ipywidgets import interact
import altair as alt

%matplotlib inline
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)  # None means no column limit
pd.set_option('display.max_rows',100)
sns.set(style='ticks')
alt.renderers.enable('kaggle')

RendererRegistry.enable('kaggle')

In [3]:
salary = pd.read_csv('NBA_season1718_salary.csv')
stats = pd.read_csv('nba_extra.csv')

First, we will transform the ```Player``` column in the stats data so that we can match it with the salary data. We'll also drop the redundant ```Unnamed: 0``` column from the salary data.



##### The first few rows of the Salary Data:


In [4]:
salary.head()

Unnamed: 0.1,Unnamed: 0,Player,Tm,season17_18
0,1,Stephen Curry,GSW,34682550.0
1,2,LeBron James,CLE,33285709.0
2,3,Paul Millsap,DEN,31269231.0
3,4,Gordon Hayward,BOS,29727900.0
4,5,Blake Griffin,DET,29512900.0


##### The first few rows of the Stats Data:

In [5]:
salary = salary.drop(columns='Unnamed: 0')  # remove the leftover row-number column
stats.head()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,1,Alex Abrines\abrinal01,SG,24,OKC,75,8,1134,115,291,0.395,84,221,0.38,31,70,0.443,0.54,39,46,0.848,26,88,114,28,38,8,25,124,353
1,2,Quincy Acy\acyqu01,PF,27,BRK,70,8,1359,130,365,0.356,102,292,0.349,28,73,0.384,0.496,49,60,0.817,40,217,257,57,33,29,60,149,411
2,3,Steven Adams\adamsst01,C,24,OKC,76,76,2487,448,712,0.629,0,2,0.0,448,710,0.631,0.629,160,286,0.559,384,301,685,88,92,78,128,215,1056
3,4,Bam Adebayo\adebaba01,C,20,MIA,69,19,1368,174,340,0.512,0,7,0.0,174,333,0.523,0.512,129,179,0.721,118,263,381,101,32,41,66,138,477
4,5,Arron Afflalo\afflaar01,SG,32,ORL,53,3,682,65,162,0.401,27,70,0.386,38,92,0.413,0.485,22,26,0.846,4,62,66,30,4,9,21,56,179


In [6]:
# Keep only the display name: raw values look like "Alex Abrines\abrinal01"
stats['Player'] = stats['Player'].str.split('\\').str[0]
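
To make the transformation concrete, here is what the split does to a single raw value from the stats table. This is only an illustration on one string copied from the output above; the part after the backslash looks like a player ID, but that is an assumption, since the dataset does not document it.

```python
# A raw value as it appears in the Player column of the stats data.
# In Python source, "\\" is a single backslash character.
raw = "Alex Abrines\\abrinal01"

# Splitting on the backslash separates the display name from the suffix.
name, player_id = raw.split("\\")

print(name)       # Alex Abrines
print(player_id)  # abrinal01
```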

In [7]:
data = pd.merge(salary,stats,on=['Player','Tm'])
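
`pd.merge` performs an inner join by default, so only `(Player, Tm)` pairs that appear in both tables end up in `data`. The sketch below uses two small made-up frames (the player and team names are hypothetical, not taken from the real datasets) to show which rows survive, and how an outer join with `indicator=True` can list the rows that did not find a match.

```python
import pandas as pd

# Toy frames with made-up rows (not from the real datasets).
salary_demo = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Tm": ["AAA", "BBB", "CCC"],
    "season17_18": [1_000_000.0, 2_000_000.0, 3_000_000.0],
})
stats_demo = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player B", "Player D"],
    "Tm": ["AAA", "BBB", "DDD", "EEE"],
    "PTS": [100, 50, 70, 10],
})

# Default inner join: only (Player, Tm) pairs present in both frames survive.
inner = pd.merge(salary_demo, stats_demo, on=["Player", "Tm"])

# Outer join with indicator=True labels each row by where it was found.
audit = pd.merge(salary_demo, stats_demo, on=["Player", "Tm"],
                 how="outer", indicator=True)
unmatched = audit[audit["_merge"] != "both"]

print(inner)
print(unmatched[["Player", "Tm", "_merge"]])
```

Running the same kind of audit on the real `salary` and `stats` frames would show how many players are lost in the merge, for example because a name is spelled differently in the two sources.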

##### First 10 rows of the merged data, sorted by salary

Note that a player can appear in more than one row of this dataset, because players can switch teams within a single season.

In [8]:
data.sort_values(by='season17_18',ascending=False).head(10)

Unnamed: 0,Player,Tm,season17_18,Rk,Pos,Age,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
0,Stephen Curry,GSW,34682550.0,120,PG,29,51,51,1631,428,864,0.495,212,501,0.423,216,363,0.595,0.618,278,302,0.921,36,225,261,310,80,8,153,114,1346
1,LeBron James,CLE,33285709.0,248,PF,33,82,82,3026,857,1580,0.542,149,406,0.367,708,1174,0.603,0.59,388,531,0.731,97,612,709,747,116,71,347,136,2251
2,Paul Millsap,DEN,31269231.0,341,PF,32,38,37,1143,202,435,0.464,39,113,0.345,163,322,0.506,0.509,112,161,0.696,65,180,245,105,39,44,73,99,555
3,Gordon Hayward,BOS,29727900.0,207,SF,27,1,1,5,1,2,0.5,0,1,0.0,1,1,1.0,0.5,0,0,,0,1,1,0,0,0,0,1,2
4,Blake Griffin,DET,29512900.0,191,PF,28,25,25,831,181,418,0.433,47,135,0.348,134,283,0.473,0.489,87,111,0.784,27,139,166,155,11,9,66,61,496
5,Kyle Lowry,TOR,28703704.0,305,PG,31,78,78,2510,403,944,0.427,238,596,0.399,165,348,0.474,0.553,223,261,0.854,66,368,434,537,85,19,183,192,1267
6,Russell Westbrook,OKC,28530608.0,508,PG,29,80,80,2914,757,1687,0.449,97,326,0.298,660,1361,0.485,0.477,417,566,0.737,152,652,804,820,147,20,381,200,2028
7,Mike Conley,MEM,28530608.0,106,PG,30,12,12,373,64,168,0.381,24,77,0.312,40,91,0.44,0.452,53,66,0.803,0,27,27,49,12,3,18,24,205
8,James Harden,HOU,28299399.0,194,SG,28,72,72,2551,651,1449,0.449,265,722,0.367,386,727,0.531,0.541,624,727,0.858,41,348,389,630,126,50,315,169,2191
9,DeMar DeRozan,TOR,27739975.0,130,SG,28,80,80,2711,645,1413,0.456,89,287,0.31,556,1126,0.494,0.488,461,559,0.825,59,256,315,417,85,22,175,151,1840


In [9]:
# Fraction of missing values per column (e.g. 0.09 means 9% of rows)
missing = data.isnull().mean().to_frame('Fraction missing')
missing[missing['Fraction missing'] > 0]

Unnamed: 0,Fraction missing
FG%,0.004107
3P%,0.090349
2P%,0.024641
eFG%,0.004107
FT%,0.073922
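
The merged table above hints at where some of these gaps come from: Gordon Hayward's row has FT = 0, FTA = 0 and an empty FT%. A shooting percentage is makes divided by attempts, so it is undefined when there are no attempts. The sketch below assumes this is the cause of the missing values (the datasets do not document it); `shooting_pct` is a hypothetical helper written for illustration, not part of pandas or of either dataset.

```python
import math

def shooting_pct(made, attempted):
    """Return made/attempted rounded to 3 places, or NaN when there were no attempts."""
    if attempted == 0:
        return math.nan  # 0/0 is undefined, so report a missing value
    return round(made / attempted, 3)

# FT and FTA values copied from the merged table above.
lebron_ft = shooting_pct(388, 531)   # LeBron James
hayward_ft = shooting_pct(0, 0)      # Gordon Hayward

print(lebron_ft, hayward_ft)
```

The first value matches the FT% of 0.731 shown for LeBron James, and the second is NaN, which is consistent with the blank FT% in Gordon Hayward's row.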


### Interactive Cross Filter of Player Salary, FG% and FT%.

- Brushing (clicking and dragging) on one histogram filters the other histograms.

In [10]:
# Interval selection restricted to the x axis
brush = alt.selection_interval(encodings=['x'])

# Define the base chart, with the common parts of the
# background and highlights
base = alt.Chart().mark_bar().encode(
    x=alt.X(alt.repeat('column'), type='quantitative', bin=alt.Bin(maxbins=30)),
    y='count()'
).properties(
    width=250,
    height=250
)

# blue background with selection
background = base.add_selection(brush)

# yellow highlights on the transformed data
highlight = base.encode(
    color=alt.value('goldenrod')
).transform_filter(brush)

# layer the two charts & repeat
alt.layer(
    background,
    highlight,
    data=data
).repeat(column=["season17_18", "FG%", "FT%"])

### Cross filter of any columns

In [11]:
def cross_filter(column1='FG%',column2='Age',column3='3P'):
    # Interval selection restricted to the x axis
    brush = alt.selection_interval(encodings=['x'])

    # Define the base chart, with the common parts of the
    # background and highlights
    base = alt.Chart().mark_bar().encode(
        x=alt.X(alt.repeat('column'), type='quantitative', bin=alt.Bin(maxbins=30)),
        y='count()'
    ).properties(
        width=250,
        height=250
    )

    # blue background with selection
    background = base.add_selection(brush)

    # yellow highlights on the transformed data
    highlight = base.encode(
        color=alt.value('goldenrod')
    ).transform_filter(brush)

    # layer the two charts & repeat
    plot = alt.layer(
        background,
        highlight,
        data=data
    ).repeat(column=[column1,column2,column3])
    return plot

In [12]:
# Offer every column after Player and Tm in each dropdown
columns = list(data.columns[2:])
interactive = interact(cross_filter, column1=columns, column2=columns, column3=columns)

interactive(children=(Dropdown(description='column1', index=9, options=('season17_18', 'Rk', 'Pos', 'Age', 'G'…
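
The dropdown options above include `Pos`, which holds text, while `cross_filter` encodes every chosen column as a binned quantitative axis. One possible refinement, sketched here on a small made-up frame (the rows are hypothetical), is to offer only numeric columns by using `select_dtypes`.

```python
import pandas as pd

# Toy frame with made-up rows that mimics the mix of column types in `data`.
demo = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "Tm": ["AAA", "BBB"],
    "season17_18": [1_000_000.0, 2_000_000.0],
    "Pos": ["PG", "C"],
    "Age": [25, 30],
    "FG%": [0.45, 0.60],
})

# Keep only integer and float columns; text columns such as Pos are skipped.
numeric_columns = list(demo.select_dtypes(include="number").columns)

print(numeric_columns)
```

Applied to the real `data` frame, a list built this way could be passed to `interact` in place of `list(data.columns[2:])`, so that the text column no longer appears as a choice.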