## Session Objectives
<ul>
    <li>Introduction to Plotly</li>
    <li>Scatter Plots</li>
    <li>Line Charts</li>
    <li>Bar Plot</li>
    <li>Bubble Plot</li>
    <li>Box Plot</li>
    <li>Histograms</li>
    <li>Distplots</li>
    <li>Heatmaps</li>
</ul>

## 0. Installing Plotly

<p>Use the Conda install plotly command. Visit the following link<Br>
    <a href="https://anaconda.org/plotly/plotly">https://anaconda.org/plotly/plotly</a></p>

## 1. Introduction to Plotly

<p>
    1. Until now we did visualisations using Matplotlib, Seaborn and Pandas. All of them produce
    static image files.<br><br>
    2. Plotly is company based out in Canada famous for it's products like Plotly and Dash<br><br>
    3. Plotly creates interactive visualisations in the form of HTML files<br><br>
    4. Drawback- can't work with a live data source<br><br>
    5. Dash is used to create live data based dashboards.
</p>

In [1]:
import numpy as np
import pandas as pd
import plotly.offline as pyo
import plotly.graph_objs as go

In [2]:
match=pd.read_csv('matches.csv')
delivery=pd.read_csv('deliveries.csv')

ipl=delivery.merge(match,left_on='match_id',right_on='id')
ipl.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,...,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,1,DA Warner,S Dhawan,TS Mills,0.0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,2,DA Warner,S Dhawan,TS Mills,0.0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
2,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,3,DA Warner,S Dhawan,TS Mills,0.0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
3,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,4,DA Warner,S Dhawan,TS Mills,0.0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
4,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,5,DA Warner,S Dhawan,TS Mills,0.0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,


## 1. Scatter Plots

<img src="https://www.mathsisfun.com/data/images/scatter-ice-cream1.svg"/>

In [3]:
# Scatter plots are drawn between to continous variables
# Problem :- We are going to draw a scatter plot between Batsman Avg(X axis) and
# Batsman Strike Rate(Y axis) of the top 50 batsman in IPL(All time)


In [4]:
# Avg vs SR graph of Top 50 batsman(in terms of total runs)

# Fetching a new dataframe with Top 50 batsman
top50=ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(50).index.tolist()
new_ipl=ipl[ipl['batsman'].isin(top50)]


In [26]:
# Calculating SR
# SR=[(number of runs scored)/(number of balls played)]*100
runs=new_ipl.groupby('batsman')['batsman_runs'].sum()
balls=new_ipl.groupby('batsman')['batsman_runs'].count()

sr=(runs/balls)*100

sr=sr.reset_index()
sr.head()

Unnamed: 0,batsman,batsman_runs
0,A Symonds,600.0
1,AB de Villiers,600.0
2,AC Gilchrist,600.0
3,AM Rahane,600.0
4,AT Rayudu,600.0


In [27]:
# Calculating Avg
# Avg=(Total number of Runs)/(Number of outs)

# Calculating number of outs for top 50 batsman
out=ipl[ipl['player_dismissed'].isin(top50)]

nouts=out['player_dismissed'].value_counts()

avg=runs/nouts

avg=avg.reset_index()
avg.rename(columns={'index':'batsman',0:'avg'},inplace=True)

avg=avg.merge(sr,on='batsman')
avg.head()

Unnamed: 0,batsman,avg,batsman_runs
0,A Symonds,9.111111,600.0
1,AB de Villiers,7.963636,600.0
2,AC Gilchrist,7.36,600.0
3,AM Rahane,3.735849,600.0
4,AT Rayudu,5.04,600.0


In [7]:
# Plot Scatter Plot here
trace=go.Scatter(x=avg['avg'],y=avg['batsman_runs'],mode='markers',text=avg['batsman'],marker={'color':'#00a65a','size':16})

data=[trace]

layout=go.Layout(title='Batsman Avg vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman_strikerate'})

fig=go.Figure(data=data,layout=layout)

pyo.plot(fig,filename='myfile.html')


plotly.graph_objs._figure.Figure

## 2. Line Chart

<p>It's an extension of Scatter plot. Usually used to show a time series data</p>
<img src='https://apexcharts.com/wp-content/uploads/2018/01/basic-line-chart.svg'/>

In [28]:
# Year by Year batsman performance

single=ipl[ipl['batsman']=='V Kohli']
performance=single.groupby('season')['batsman_runs'].sum().reset_index()
performance

single1=ipl[ipl['batsman']=='MS Dhoni']
performance1=single1.groupby('season')['batsman_runs'].sum().reset_index()
performance1.head()

Unnamed: 0,season,batsman_runs
0,2008,414.0
1,2009,332.0
2,2010,287.0
3,2011,392.0
4,2012,357.0


In [9]:
# Plot Line Chart here
trace=go.Scatter(x=performance['season'],y=performance['batsman_runs'],mode='lines+markers',marker={'color':'#00a65a','size':16},name='V kohli')

trace1=go.Scatter(x=performance1['season'],y=performance1['batsman_runs'],mode='lines+markers',marker={'size':16},name='MS DHONI')
data=[trace,trace1]
layout=go.Layout(title='Year by year performance',xaxis={'title':'season'},yaxis={'title':'total runs'})
fig=go.Figure(data=data,layout=layout)
pyo.plot(fig)


'temp-plot.html'

In [10]:
# Multiple Line Charts

def batsman_comp(*name):
    data=[]
    for i in name:
        single=ipl[ipl['batsman']==i]
        performance=single.groupby('season')['batsman_runs'].sum().reset_index()

        trace=go.Scatter(x=performance['season'],y=performance['batsman_runs']
                         ,mode='lines + markers',name=i)
        
        data.append(trace)
    
    layout=go.Layout(title='Batsman Record Comparator',
                xaxis={'title':'Season'},
                yaxis={'title':'Runs'})

    fig=go.Figure(data=data,layout=layout)

    pyo.plot(fig,filename='year_by_year')
        
        

In [11]:
batsman_comp('V Kohli', 'RG Sharma','DA Warner','MS Dhoni')


Your filename `year_by_year` didn't end with .html. Adding .html to the end of your file.



## 3. Bar Plot

<p>Used to show relation between one categorical and 1 numerical data</p>
<img src="https://images.ctfassets.net/fevtq3bap7tj/5FSJrJeDIIGAmGCsGcQ8S4/e2fc867a487614b47f72104a36fbcf7f/simple-column.png"/>

In [12]:
top10=ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(10).index.tolist()
top10_df=ipl[ipl['batsman'].isin(top10)]

In [13]:
top10_score=top10_df.groupby('batsman')['batsman_runs'].sum().reset_index()
top10_score

Unnamed: 0,batsman,batsman_runs
0,CH Gayle,2648.0
1,G Gambhir,2959.0
2,JH Kallis,2252.0
3,MS Dhoni,2388.0
4,RG Sharma,2795.0
5,RV Uthappa,2336.0
6,S Dhawan,2254.0
7,SK Raina,3163.0
8,SR Tendulkar,2334.0
9,V Kohli,2516.0


In [14]:
# Plot Bar Graph

trace=go.Bar(x=top10_score['batsman'],y=top10_score['batsman_runs'])
data=[trace]
layout=go.Layout(title='Top 10 IPL Batsman',xaxis={'title':'batsman'},yaxis={'title':'batsman runs'})
fig=go.Figure(data=data,layout=layout)

pyo.plot(fig)

'temp-plot.html'

### There are 2 types of Bar Graphs
<p>
    1. Nested Bar Graph<Br>
    2. Stacked Bar Graph<br>
    3. Overlayed Bar Graph
</p>

In [15]:
iw=top10_df.groupby(['batsman','inning'])['batsman_runs'].sum().reset_index()
mask=iw['inning']==1
mask2=iw['inning']==2
one=iw[mask]
two=iw[mask2]


one.rename(columns={'batsman_runs':'1st Innings'},inplace=True)
two.rename(columns={'batsman_runs':'2nd Innings'},inplace=True)

final=one.merge(two,on='batsman')[['batsman','1st Innings','2nd Innings']]

final



A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy



Unnamed: 0,batsman,1st Innings,2nd Innings
0,CH Gayle,1518.0,1105.0
1,G Gambhir,1314.0,1645.0
2,JH Kallis,795.0,1457.0
3,MS Dhoni,1607.0,781.0
4,RG Sharma,1592.0,1203.0
5,RV Uthappa,989.0,1347.0
6,S Dhawan,1498.0,756.0
7,SK Raina,1975.0,1180.0
8,SR Tendulkar,1237.0,1097.0
9,V Kohli,1294.0,1217.0


In [16]:
trace1=go.Bar(x=final['batsman'],y=final['1st Innings'], name='1st innings',
              marker={'color':'#00a65a'})

trace2=go.Bar(x=final['batsman'],y=final['2nd Innings'], name='2nd innings',
              marker={'color':'#a6a65a'})

data=[trace1,trace2]

layout=go.Layout(title='Inning wise Scores',
                xaxis={'title':'Batsman'},
                yaxis={'title':'Runs'})

fig=go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

## 4. Bubble Plot
<p>Again an extension of Scatter plot. with some additional informations</p>
<img src="https://www.data-to-viz.com/graph/bubble_files/figure-html/unnamed-chunk-1-1.png"/>

In [17]:
new_ipl=new_ipl[new_ipl['batsman_runs']==6]

six=new_ipl.groupby('batsman')['batsman_runs'].count().reset_index()

x=avg.merge(six,on='batsman')

trace=go.Scatter(x=x['avg'],y=x['batsman_runs_x'],mode='markers',marker={'size':x['batsman_runs_y']})

data=[trace]

layout=go.Layout(title='Bubble chart',
                xaxis={'title':'average'},
                yaxis={'title':'strike-rate'})

fig=go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

## 5. Box Plot

<p>A box and whisker plot—also called a box plot—displays the five-number summary of a set of data.</p>
<img src="https://miro.medium.com/max/18000/1*2c21SkzJMf3frPXPAR_gZA.png"/>

In [29]:
match_agg=delivery.groupby(['match_id'])['total_runs'].sum().reset_index()
season_wise=match_agg.merge(match,left_on='match_id',right_on='id')[['match_id','total_runs','season']]
season_wise.head()

Unnamed: 0,match_id,total_runs,season
0,1,379.0,2017
1,2,371.0,2017
2,3,367.0,2017
3,4,327.0,2017
4,5,299.0,2017


In [19]:
# Plot Box Plot
trace1=go.Box(x=season_wise[season_wise['season']==2017]['total_runs'],name='2017',marker={'color':'#00a65a'})
trace2=go.Box(x=season_wise[season_wise['season']==2008]['total_runs'],name='2008')

data=[trace1,trace2]
layout=go.Layout(title='Total Score Analysis',
                xaxis={'title':'Total score'})
fig=go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

### 6. Histograms

<p>A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data.</p>

<img src="https://www.math-only-math.com/images/histogram-problems.png"/>

In [30]:
x=delivery.groupby('batsman')['batsman_runs'].count()>150
x=x[x].index.tolist()

new=delivery[delivery['batsman'].isin(x)]


runs=new.groupby('batsman')['batsman_runs'].sum()
balls=new.groupby('batsman')['batsman_runs'].count()

sr=(runs/balls)*100

sr=sr.reset_index()
sr.head()

Unnamed: 0,batsman,batsman_runs
0,A Mishra,93.103448
1,A Symonds,124.711908
2,AA Jhunjhunwala,99.541284
3,AB Agarkar,111.875
4,AB de Villiers,131.343284


In [21]:
# Plot Histogram
trace=go.Histogram(x=sr['batsman_runs'],xbins={'size':2,'start':50,'end':100})
data=[trace]
layout=go.Layout(title='Strike rate analysis',
                xaxis={'title':'Strike Rates'})

fig=go.Figure(data=data,layout=layout)

pyo.plot(fig)


'temp-plot.html'

### 7. Distplots

<p></p>
<img src="https://plot.ly/~PythonPlotBot/10/customized-distplot.png"/>

In [22]:
# Plot Distplot
 
import plotly.figure_factory as ff

hist_data=[avg['avg'],avg['batsman_runs']]

group_labels=['Average', 'Strike Rate']

fig=ff.create_distplot(hist_data,group_labels,bin_size=[10,20])

pyo.plot(fig)


'temp-plot.html'

### 8. Heatmaps

<p>A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors.</p>

<img src="https://seaborn.pydata.org/_images/heatmap_annotation.png"/>

In [23]:
six=delivery[delivery['batsman_runs']==6]
six=six.groupby(['batting_team','over'])['batsman_runs'].count().reset_index()

dots=delivery[delivery['batsman_runs']==0]
dots=dots.groupby(['batting_team','over'])['batsman_runs'].count().reset_index()

In [24]:
# Plot Heatmap

trace=go.Heatmap(x=six['batting_team'],y=six['over'],z=six['batsman_runs'])

data=[trace]

layout=go.Layout(title='Six heatmap')

fig=go.Figure(data=data,layout=layout)

pyo.plot(fig)

'temp-plot.html'

In [25]:
# Side by Side Heatmap

from plotly import tools

trace1=go.Heatmap(x=six['batting_team'],y=six['over'],
                 z=six['batsman_runs'].values.tolist())

trace2=go.Heatmap(x=dots['batting_team'],y=dots['over'],
                 z=dots['batsman_runs'].values.tolist())


fig=tools.make_subplots(rows=1,cols=2,subplot_titles=["6's","0's"], shared_yaxes=True)

fig.append_trace(trace1,1,1)
fig.append_trace(trace2,1,2)

pyo.plot(fig)



plotly.tools.make_subplots is deprecated, please use plotly.subplots.make_subplots instead



'temp-plot.html'