# Installing Plotly

<p>Install Plotly with conda, for example <code>conda install -c plotly plotly</code>. See the package page for details:<br>
    <a href="https://anaconda.org/plotly/plotly">https://anaconda.org/plotly/plotly</a></p>

# Introduction to Plotly

<p>
    1. Until now we created visualisations using Matplotlib, Seaborn and Pandas. All of them produce
    static image files.<br><br>
    2. Plotly is a company based in Canada, known for its products Plotly and Dash.<br><br>
    3. Plotly creates interactive visualisations in the form of HTML files.<br><br>
    4. Drawback: Plotly on its own cannot work with a live data source.<br><br>
    5. Dash is used to create dashboards based on live data.
</p>

In [1]:
import numpy as np
import pandas as pd
import plotly.offline as pyo
import plotly.graph_objs as go

In [3]:
match=pd.read_csv('Dataset/matches.csv')
delivery=pd.read_csv('Dataset/deliveries.csv')

ipl=delivery.merge(match,left_on='match_id',right_on='id')
ipl.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,...,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,1,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,2,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
2,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,3,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
3,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,4,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
4,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,5,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,


# Scatter Plots

<img src="https://www.mathsisfun.com/data/images/scatter-ice-cream1.svg"/>

Scatter plots are drawn between two continuous variables.

### Problem:
We are going to draw a scatter plot of batting average (x-axis) against strike rate (y-axis) for the top 50 batsmen in the IPL (all time).

In [4]:
# Fetching a new dataframe with Top 50 batsman
top50=ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(50).index.tolist()
new_ipl=ipl[ipl['batsman'].isin(top50)]

In [5]:
# Calculating Strike Rate (SR)
# SR=[(number of runs scored)/(number of balls played)]*100
runs=new_ipl.groupby('batsman')['batsman_runs'].sum()
balls=new_ipl.groupby('batsman')['batsman_runs'].count()

sr=(runs/balls)*100

sr=sr.reset_index()
sr

Unnamed: 0,batsman,batsman_runs
0,AB de Villiers,145.129059
1,AC Gilchrist,133.054662
2,AJ Finch,126.299213
3,AM Rahane,117.486549
4,AT Rayudu,123.014257
5,BB McCullum,126.318203
6,BJ Hodge,121.422376
7,CH Gayle,144.194313
8,DA Miller,137.709251
9,DA Warner,138.318401


In [6]:
# Calculating Average
# Avg=(Total number of Runs)/(Number of outs)

# Calculating number of outs for the top 50 batsmen
out=ipl[ipl['player_dismissed'].isin(top50)]

nouts=out['player_dismissed'].value_counts()

avg=runs/nouts

avg=avg.reset_index()
avg.rename(columns={'index':'batsman',0:'avg'},inplace=True)

avg=avg.merge(sr,on='batsman')
avg

Unnamed: 0,batsman,avg,batsman_runs
0,AB de Villiers,38.307692,145.129059
1,AC Gilchrist,27.223684,133.054662
2,AJ Finch,27.186441,126.299213
3,AM Rahane,33.593407,117.486549
4,AT Rayudu,27.146067,123.014257
5,BB McCullum,28.112245,126.318203
6,BJ Hodge,33.333333,121.422376
7,CH Gayle,41.022472,144.194313
8,DA Miller,34.733333,137.709251
9,DA Warner,40.14,138.318401
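
The two formulas above can be checked by hand on a tiny example. The sketch below uses made-up deliveries for two hypothetical batsmen, `A` and `B` (these rows are not from the IPL dataset), and repeats the same groupby steps as the notebook. Like the notebook, it counts every row as one ball faced.

```python
import pandas as pd

# Hypothetical ball-by-ball records (illustration only, not IPL data)
toy = pd.DataFrame({
    'batsman':          ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'batsman_runs':     [4, 0, 6, 2, 1, 1, 0, 4],
    'player_dismissed': [None, None, None, 'A', None, 'B', None, 'B'],
})

# Runs scored and balls faced per batsman
runs = toy.groupby('batsman')['batsman_runs'].sum()
balls = toy.groupby('batsman')['batsman_runs'].count()

# Number of dismissals per batsman (missing values are ignored by value_counts)
outs = toy['player_dismissed'].value_counts()

# SR = (runs / balls) * 100, Avg = runs / outs
strike_rate = runs / balls * 100
average = runs / outs

summary = pd.DataFrame({'avg': average, 'strike_rate': strike_rate})
print(summary)
```

Batsman `A` scores 12 runs from 4 balls and is out once, so the strike rate is 300 and the average is 12. Batsman `B` scores 6 runs from 4 balls and is out twice, so the strike rate is 150 and the average is 3.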


## To plot using Plotly

We pass a <b>figure object</b> to a plot function.

To create a figure object we need <b>data</b> and <b>layout</b>.

<b>data</b> is a <b>list</b> that holds one or more traces.

A <b>trace</b> is a single graph drawn on the figure. To show two things in one graph, we need two traces.

<b>layout</b> defines the other properties of the plot, such as the title, x-axis label, y-axis label, and background.

In [17]:
# Plot Scatter Plot here

trace = go.Scatter(x=avg['avg'], 
                   y=avg['batsman_runs'], 
                   mode='markers')

data = [trace]

layout = go.Layout(title='Batsman Average vs Strike Rate',
                  xaxis = {'title': 'Average'},
                  yaxis = {'title': 'Strike Rate'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

<b>To show each batsman's name when hovering over a point in the graph above</b>

In [19]:
# Plot Scatter Plot here

trace = go.Scatter(x=avg['avg'],
                   y=avg['batsman_runs'], 
                   mode='markers',
                   text = avg['batsman'])

data = [trace]

layout = go.Layout(title='Batsman Average vs Strike Rate',
                  xaxis = {'title': 'Average'},
                  yaxis = {'title': 'Strike Rate'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

<b>To plot green points instead of blue
(pass a hexadecimal color code as the marker color)</b>

In [21]:
# Plot Scatter Plot here

trace = go.Scatter(x=avg['avg'], 
                   y=avg['batsman_runs'], 
                   mode='markers',
                   text = avg['batsman'],
                   marker = {'color': '#00a65a'})

data = [trace]

layout = go.Layout(title='Batsman Average vs Strike Rate',
                  xaxis = {'title': 'Average'},
                  yaxis = {'title': 'Strike Rate'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

<b>We can also increase or decrease size of marker</b>

In [23]:
# Plot Scatter Plot here

trace = go.Scatter(x=avg['avg'], 
                   y=avg['batsman_runs'], 
                   mode='markers',
                   text = avg['batsman'],
                   marker = {'color': '#00a65a',
                             'size': 16})

data = [trace]

layout = go.Layout(title='Batsman Average vs Strike Rate',
                  xaxis = {'title': 'Average'},
                  yaxis = {'title': 'Strike Rate'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

<b>To save a plot with a given name.</b>

In [25]:
# Plot Scatter Plot here

trace = go.Scatter(x=avg['avg'], 
                   y=avg['batsman_runs'], 
                   mode='markers',
                   text = avg['batsman'],
                   marker = {'color': '#00a65a',
                             'size': 16})

data = [trace]

layout = go.Layout(title='Batsman Average vs Strike Rate',
                  xaxis = {'title': 'Average'},
                  yaxis = {'title': 'Strike Rate'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig,
         filename='plot1.html')

'plot1.html'

# Line Chart

<p>A line chart is an extension of the scatter plot, usually used to show time series data.</p>
<img src='https://apexcharts.com/wp-content/uploads/2018/01/basic-line-chart.svg'/>

### Problem:
Year by Year batsman performance.

In [47]:
single1=ipl[ipl['batsman']=='V Kohli']
performance1=single1.groupby('season')['batsman_runs'].sum().reset_index()

single2=ipl[ipl['batsman']=='MS Dhoni']
performance2=single2.groupby('season')['batsman_runs'].sum().reset_index()

In [39]:
# Plot Of Line Chart

trace = go.Scatter(x=performance1['season'],
                   y=performance1['batsman_runs'],
                   mode='lines',
                   marker={'color': '#a0065a'})

data=[trace]

layout = go.Layout(title='Year by Year Performance',
                   xaxis={'title': 'Year'},
                   yaxis={'title': 'Runs'})

fig=go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Year by Year Performance.html')

'Year by Year Performance.html'

<b>To plot markers along with the line.</b>

In [40]:
# Plot Of Line Chart

trace = go.Scatter(x=performance1['season'],
                   y=performance1['batsman_runs'],
                   mode='lines+markers',
                   marker={'color': '#a0065a'})

data=[trace]

layout = go.Layout(title='Year by Year Performance',
                   xaxis={'title': 'Year'},
                   yaxis={'title': 'Runs'})

fig=go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Year by Year Performance.html')

'Year by Year Performance.html'

### Showing two plots using two traces:

In [49]:
# Plot Of Line Chart

trace1 = go.Scatter(x=performance1['season'],
                   y=performance1['batsman_runs'],
                   mode='lines+markers',
                   marker={'color': '#a0065a'},
                   name='Virat Kohli')

trace2 = go.Scatter(x=performance2['season'],
                   y=performance2['batsman_runs'],
                   mode='lines+markers',
                   marker={'color': 'red'},
                   name='MS Dhoni')

data=[trace1, trace2]

layout = go.Layout(title='Year by Year Performance',
                   xaxis={'title': 'Year'},
                   yaxis={'title': 'Runs'})

fig=go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Year by Year Performance.html')

'Year by Year Performance.html'

### Problem:
Use a function to plot the performance of multiple batsmen.

In [58]:
# Multiple Line Charts

def batsman_comp(*name):
    data=[]
    for i in name:
        single=ipl[ipl['batsman']==i]
        performance=single.groupby('season')['batsman_runs'].sum().reset_index()

        trace=go.Scatter(x=performance['season'],y=performance['batsman_runs'],
                         mode='lines+markers',name=i)
        
        data.append(trace)
    
    layout=go.Layout(title='Batsman Record Comparator',
                xaxis={'title':'Season'},
                yaxis={'title':'Runs'})

    fig=go.Figure(data=data,layout=layout)

    pyo.plot(fig,filename='year_by_year.html')
        
        

In [59]:
batsman_comp('V Kohli', 'RG Sharma','DA Warner','MS Dhoni')

# Bar Plot

<p>Used to show the relationship between one categorical variable and one numerical variable.</p>
<img src="https://images.ctfassets.net/fevtq3bap7tj/5FSJrJeDIIGAmGCsGcQ8S4/e2fc867a487614b47f72104a36fbcf7f/simple-column.png"/>

### Problem:
Bar plot of the total runs scored by the top 10 batsmen.

In [52]:
top10=ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(10).index.tolist()
top10_df=ipl[ipl['batsman'].isin(top10)]

In [53]:
top10_score=top10_df.groupby('batsman')['batsman_runs'].sum().reset_index()
top10_score

Unnamed: 0,batsman,batsman_runs
0,AB de Villiers,3486
1,CH Gayle,3651
2,DA Warner,4014
3,G Gambhir,4132
4,MS Dhoni,3560
5,RG Sharma,4207
6,RV Uthappa,3778
7,S Dhawan,3561
8,SK Raina,4548
9,V Kohli,4423


In [57]:
# Plot Bar Graph

trace = go.Bar(x=top10_score['batsman'],
               y=top10_score['batsman_runs'])

data = [trace]

layout = go.Layout(title='Top 10 Batsman',
                   xaxis={'title':'Batsman'},
                   yaxis={'title':'Total Runs'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename="Top 10 Batsman.html")

'Top 10 Batsman.html'

## There are 3 types of Bar Graphs
<p>
    1. Nested (grouped) Bar Graph<br>
    2. Stacked Bar Graph<br>
    3. Overlaid Bar Graph
</p>

In [61]:
iw=top10_df.groupby(['batsman','inning'])['batsman_runs'].sum().reset_index()
mask=iw['inning']==1
mask2=iw['inning']==2
# Rename on the result of each selection (not in place) to avoid SettingWithCopyWarning
one=iw[mask].rename(columns={'batsman_runs':'1st Innings'})
two=iw[mask2].rename(columns={'batsman_runs':'2nd Innings'})

final=one.merge(two,on='batsman')[['batsman','1st Innings','2nd Innings']]

final


Unnamed: 0,batsman,1st Innings,2nd Innings
0,AB de Villiers,2128,1345
1,CH Gayle,2003,1623
2,DA Warner,2118,1896
3,G Gambhir,1699,2433
4,MS Dhoni,2232,1328
5,RG Sharma,2344,1863
6,RV Uthappa,1516,2262
7,S Dhawan,2262,1299
8,SK Raina,2647,1893
9,V Kohli,2391,2027


## Overlaid Bargraph

In [67]:
trace1 = go.Bar(x=final['batsman'],
                y=final['1st Innings'],
                name='1st Innings')

trace2 = go.Bar(x=final['batsman'],
                y=final['2nd Innings'],
                name='2nd Innings')

data = [trace1, trace2]

layout = go.Layout(title = 'Innings wise score of top 10 batsman',
                   xaxis = {'title': 'Batsman'},
                   yaxis = {'title': 'Runs'},
                   barmode='overlay')

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Innings wise score of top 10 batsman (Overlayed).html')

'Innings wise score of top 10 batsman (Overlayed).html'

## Stacked Bargraph
Here the 2nd innings score is stacked on top of the 1st innings score, so the full height of each bar is (1st innings + 2nd innings score).

In [66]:
trace1 = go.Bar(x=final['batsman'],
                y=final['1st Innings'],
                name='1st Innings')

trace2 = go.Bar(x=final['batsman'],
                y=final['2nd Innings'],
                name='2nd Innings')

data = [trace1, trace2]

layout = go.Layout(title = 'Innings wise score of top 10 batsman',
                   xaxis = {'title': 'Batsman'},
                   yaxis = {'title': 'Runs'},
                   barmode='stack')

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Innings wise score of top 10 batsman (Stacked).html')

'Innings wise score of top 10 batsman (Stacked).html'

## Nested Bargraph
We do not need to pass barmode here: the default is 'group', which draws the bars for each batsman side by side.

In [68]:
trace1 = go.Bar(x=final['batsman'],
                y=final['1st Innings'],
                name='1st Innings')

trace2 = go.Bar(x=final['batsman'],
                y=final['2nd Innings'],
                name='2nd Innings')

data = [trace1, trace2]

layout = go.Layout(title = 'Innings wise score of top 10 batsman',
                   xaxis = {'title': 'Batsman'},
                   yaxis = {'title': 'Runs'},)

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Innings wise score of top 10 batsman (Nested).html')

'Innings wise score of top 10 batsman (Nested).html'

# Bubble Plot
<p>Again an extension of the scatter plot, with some additional information:<br>
    1. A third dimension shown through the diameter of each point (a numerical column).<br>
    2. A fourth dimension shown through color (a categorical column).
</p>
<img src="https://www.data-to-viz.com/graph/bubble_files/figure-html/unnamed-chunk-1-1.png"/>

In [73]:
new_ipl=new_ipl[new_ipl['batsman_runs']==6]

six=new_ipl.groupby('batsman')['batsman_runs'].count().reset_index()

x=avg.merge(six,on='batsman')

x.head()

Unnamed: 0,batsman,avg,batsman_runs_x,batsman_runs_y
0,AB de Villiers,38.307692,145.129059,158
1,AC Gilchrist,27.223684,133.054662,92
2,AJ Finch,27.186441,126.299213,59
3,AM Rahane,33.593407,117.486549,60
4,AT Rayudu,27.146067,123.014257,79


In [79]:
# Plot Bubble chart here
trace = go.Scatter(x=x['avg'], 
                   y=x['batsman_runs_x'],
                   mode='markers',
                   marker={'size': x['batsman_runs_y']})

data = [trace]

layout = go.Layout(title='Bubble Chart',
                   xaxis={'title': 'Average'},
                   yaxis={'title': 'Strike Rate'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Bubble Chart.html')

'Bubble Chart.html'

# Box Plot

<p>A box and whisker plot—also called a box plot—displays the five-number summary of a set of data.</p>
<img src="https://miro.medium.com/max/18000/1*2c21SkzJMf3frPXPAR_gZA.png"/>
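
The five-number summary consists of the minimum, first quartile, median, third quartile, and maximum. The sketch below computes it with NumPy for a handful of hypothetical match totals (not taken from the dataset). The quartiles that Plotly draws may differ slightly, because quartile conventions vary between libraries.

```python
import numpy as np

# Hypothetical match totals, for illustration only
scores = np.array([250, 280, 300, 320, 350, 310, 290])

# Minimum, Q1, median, Q3 and maximum using NumPy's default linear interpolation
five_numbers = np.percentile(scores, [0, 25, 50, 75, 100])

labels = ['min', 'Q1', 'median', 'Q3', 'max']
for label, value in zip(labels, five_numbers):
    print(label, value)
```

For these seven values the summary is 250, 285, 300, 315 and 350.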

### Problem:
Draw a box plot of the match-wise total score.

In [82]:
match_agg=delivery.groupby(['match_id'])['total_runs'].sum().reset_index()
season_wise=match_agg.merge(match,left_on='match_id',right_on='id')[['match_id','total_runs','season']]
season_wise

Unnamed: 0,match_id,total_runs,season
0,1,379,2017
1,2,371,2017
2,3,367,2017
3,4,327,2017
4,5,299,2017
...,...,...,...
631,632,277,2016
632,633,317,2016
633,634,302,2016
634,635,325,2016


In [84]:
# Plot Box Plot here
trace = go.Box(x=season_wise['total_runs'], 
               name='All Seasons')

data = [trace]

layout = go.Layout(title = 'Total Score Analysis',
                   xaxis = {'title': 'Total Score'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Boxplot.html')

'Boxplot.html'

<b>How to change color of box plot.</b>

In [87]:
# Plot Box Plot here
trace = go.Box(x=season_wise['total_runs'], 
               name='All Seasons',
               marker={'color': '#00a65a'})

data = [trace]

layout = go.Layout(title = 'Total Score Analysis',
                   xaxis = {'title': 'Total Score'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Boxplot.html')

'Boxplot.html'

### Two box plots side by side:
1st: 2017 Season

2nd: 2008 Season

In [91]:
trace1 = go.Box(x=season_wise[season_wise['season'] == 2017]['total_runs'], 
               name='2017',
               marker={'color': '#00a65a'})

trace2 = go.Box(x=season_wise[season_wise['season'] == 2008]['total_runs'], 
               name='2008')

data = [trace1, trace2]

layout = go.Layout(title = 'Total Score Analysis',
                   xaxis = {'title': 'Total Score'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig, filename='Boxplot.html')

'Boxplot.html'

# Histograms

<p>A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data.</p>

<img src="https://www.math-only-math.com/images/histogram-problems.png"/>

In [103]:
x=delivery.groupby('batsman')['batsman_runs'].count()>150
x=x[x].index.tolist()

new=delivery[delivery['batsman'].isin(x)]


runs=new.groupby('batsman')['batsman_runs'].sum()
balls=new.groupby('batsman')['batsman_runs'].count()

sr=(runs/balls)*100

sr=sr.reset_index()
sr.head()

Unnamed: 0,batsman,batsman_runs
0,A Ashish Reddy,142.857143
1,A Mishra,89.756098
2,A Symonds,124.711908
3,AA Jhunjhunwala,99.541284
4,AB Agarkar,111.875


In [106]:
# Plot Histogram
trace = go.Histogram(x=sr['batsman_runs'])

data = [trace]

layout = go.Layout(title='Strike Rate Analysis',
                   xaxis={'title': 'Strike Rates'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

### How to change binsize?

In [108]:
# Plot Histogram
trace = go.Histogram(x=sr['batsman_runs'], 
                     xbins={'size': 2})

data = [trace]

layout = go.Layout(title='Strike Rate Analysis',
                   xaxis={'title': 'Strike Rates'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

### We can also define range of data to be used
For example, we only want batsmen whose strike rate is between 50 and 150.

In [112]:
# Plot Histogram
trace = go.Histogram(x=sr['batsman_runs'], 
                     xbins={'size': 5,
                            'start': 50,
                            'end': 150})

data = [trace]

layout = go.Layout(title='Strike Rate Analysis',
                   xaxis={'title': 'Strike Rates'})

fig = go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'
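
To see roughly what `start`, `end` and `size` do, the sketch below bins a few hypothetical strike rates with NumPy, using a bin width of 25 between 50 and 150. This is only an approximation of the idea: values outside the range are left out, and Plotly's handling of the bin edges may differ in detail.

```python
import numpy as np

# Hypothetical strike rates, for illustration only
rates = np.array([45.0, 62.5, 88.0, 99.5, 101.0, 118.0, 124.9, 149.0, 151.0, 160.0])

# Bin edges from 50 to 150 in steps of 25: 50, 75, 100, 125, 150
edges = np.arange(50, 151, 25)

# Count how many values fall into each bin; values outside 50..150 are ignored
counts, _ = np.histogram(rates, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left}-{right}: {n}')
```

Here 45.0, 151.0 and 160.0 fall outside the range, and the four bins contain 1, 2, 3 and 1 values.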


# Distplots

<p>It is a combination of 3 plots:<br>
    1. Histogram<br>
    2. KDE (kernel density estimate)<br>
    3. Rug plot
</p>
<img src="https://plot.ly/~PythonPlotBot/10/customized-distplot.png"/>

In [109]:
import plotly.figure_factory as ff
avg.head()

Unnamed: 0,batsman,avg,batsman_runs
0,AB de Villiers,38.307692,145.129059
1,AC Gilchrist,27.223684,133.054662
2,AJ Finch,27.186441,126.299213
3,AM Rahane,33.593407,117.486549
4,AT Rayudu,27.146067,123.014257


### For 1 Column

In [99]:
hist_data=[avg['avg']]

group_labels=['Average']

fig=ff.create_distplot(hist_data,
                       group_labels)

pyo.plot(fig)

'temp-plot.html'

### For 2 Column

In [100]:
hist_data=[avg['avg'], 
           avg['batsman_runs']]

group_labels=['Average', 
              'Strike Rate']

fig=ff.create_distplot(hist_data,
                       group_labels)

pyo.plot(fig)

'temp-plot.html'

### How to modify binsize

In [115]:
hist_data=[avg['avg'], 
           avg['batsman_runs']]

group_labels=['Average', 
              'Strike Rate']

fig=ff.create_distplot(hist_data,
                       group_labels,
                       bin_size=[10,20])

pyo.plot(fig)

'temp-plot.html'

# Heatmaps

<p>A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors.</p>

<img src="https://seaborn.pydata.org/_images/heatmap_annotation.png"/>

In [113]:
six=delivery[delivery['batsman_runs']==6]
six=six.groupby(['batting_team','over'])['batsman_runs'].count().reset_index()

six.head()

Unnamed: 0,batting_team,over,batsman_runs
0,Chennai Super Kings,1,9
1,Chennai Super Kings,2,21
2,Chennai Super Kings,3,49
3,Chennai Super Kings,4,45
4,Chennai Super Kings,5,53


In [117]:
# Plot Heatmap

trace=go.Heatmap(x=six['batting_team'],
                 y=six['over'],
                 z=six['batsman_runs'])

data=[trace]

layout=go.Layout(title='Six Heatmap')

fig=go.Figure(data=data, layout=layout)

pyo.plot(fig)

'temp-plot.html'

### Side By Side Heatmap

In [119]:
dots=delivery[delivery['batsman_runs']==0]
dots=dots.groupby(['batting_team','over'])['batsman_runs'].count().reset_index()

# plotly.tools.make_subplots is deprecated, so import it from plotly.subplots
from plotly.subplots import make_subplots

trace1=go.Heatmap(x=six['batting_team'],
                  y=six['over'],
                  z=six['batsman_runs'].values.tolist())

trace2=go.Heatmap(x=dots['batting_team'],
                  y=dots['over'],
                  z=dots['batsman_runs'].values.tolist())


fig=make_subplots(rows=1,
                  cols=2,
                  subplot_titles=["6's","0's"], 
                  shared_yaxes=True)

# Place each heatmap in its own column of the 1x2 grid
fig.add_trace(trace1,row=1,col=1)
fig.add_trace(trace2,row=1,col=2)

pyo.plot(fig)


'temp-plot.html'