<a href="https://www.kaggle.com/code/itsabhijith/ipl-analysis-plotly-eda?scriptVersionId=103204409" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<h2 style="font-family:verdana;text-align:center"> IPL Dataset Analysis</h2>
<hr>
<p style="font-family:verdana;font-size:18px;"> 
    Hey there!! I love cricket and wanted to share my insights on this dataset. This is an EDA notebook, as I've been meaning to experiment with Plotly.<br>
    Before proceeding further, let me walk you through my approach. I'm looking at the data from two angles.<br>
    <b> 1. Identify trends pertaining to each team.</b> E.g. team combinations, performance in the powerplay, etc.<br>
    <b> 2. Identify trends that occurred during the tournament.</b> E.g. the decision after winning the toss, average first-innings scores, etc.<br><br> 
    Also, for each topic I cover, I'll have an inference, marked with a '📌', and a conclusion followed by the next action, marked with a '📢'. I hope to uncover something cool. Let's go!!
</p>
    
   
        

<h3> Importing the required libraries </h3>

In [None]:
import pandas as pd
import numpy as np
import plotly
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import cufflinks as cf
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
init_notebook_mode(connected=True)
cf.go_offline(True)

<h3> Starting off with the 2022 season </h3>

In [None]:
df_season_details_2022 = pd.read_csv("../input/indian-premier-league-ipl-all-seasons/2022/season_details.csv")
df_season_summary_2022 = pd.read_csv("../input/indian-premier-league-ipl-all-seasons/2022/season_summary.csv")

In [None]:
df_season_summary_2022.head()

In [None]:
df_season_summary_2022.columns

In [None]:
df_season_summary_2022.iloc[:10,[3,5,6,7,8,9,10,13]]

In [None]:
df_season_summary_2022.info()

<h3> Analysing the impact of the toss on the outcome of the game </h3>

In [None]:
df_toss_bowling = df_season_summary_2022[df_season_summary_2022.decision=="BOWL FIRST"] 
df_toss_batting = df_season_summary_2022[df_season_summary_2022.decision=="BAT FIRST"]

In [None]:
teams_winning_chasing = df_toss_bowling[df_toss_bowling.toss_won == df_toss_bowling.winner]
teams_losing_chasing = df_toss_bowling[df_toss_bowling.toss_won != df_toss_bowling.winner]

len(df_toss_bowling),np.shape(teams_winning_chasing)[0],np.shape(teams_losing_chasing)[0]

In [None]:
teams_winning_batting = df_toss_batting[df_toss_batting.toss_won == df_toss_batting.winner]
teams_losing_batting = df_toss_batting[df_toss_batting.toss_won != df_toss_batting.winner]

len(df_toss_batting),np.shape(teams_winning_batting)[0],np.shape(teams_losing_batting)[0]

In [None]:
toss_decision_labels = list(df_season_summary_2022.decision.value_counts().index)
toss_decision_values = df_season_summary_2022.decision.value_counts()

teams_chasing_labels = ["Games won chasing","Games lost chasing"]
teams_winning_chasing_values = [np.shape(teams_winning_chasing)[0],np.shape(teams_losing_chasing)[0]]

teams_batting_labels = ['Games won batting first','Games lost batting first']
teams_winning_batting_values = [np.shape(teams_winning_batting)[0],np.shape(teams_losing_batting)[0]]

toss_chart_colors = ["#00539CFF","#FFF748"]
fielding_first_chart_colors = ["#1167B1","#D0EFFF"]
batting_first_chart_colors = ["#FFA701","#FFDA00"]

fig = make_subplots(rows=1,cols=3,specs=[[{"type":"pie"},{"type":"pie"},{"type":"pie"}]],subplot_titles=['<b>% of games where team chose to field first</b>',
                                                                                                         '<b>% split after fielding first</b>',
                                                                                                         '<b>% split after batting first</b>'])

fig.add_trace(go.Pie(labels=toss_decision_labels,values=toss_decision_values,pull=[0.2,0],
                     marker=dict(colors=toss_chart_colors)),row=1,col=1)

fig.add_trace(go.Pie(labels=teams_chasing_labels,values=teams_winning_chasing_values,pull=[0.2,0],
                     marker=dict(colors=fielding_first_chart_colors)),row=1,col=2)

fig.add_trace(go.Pie(labels=teams_batting_labels,values=teams_winning_batting_values,pull=[0.2,0],
                     marker=dict(colors=batting_first_chart_colors)),row=1,col=3)

fig.update_traces(hoverinfo='label+value',textfont_size=20,marker=dict(line=dict(color='#000000', width=2)))
fig.show()

<div style="font-family:verdana; font-size:18px"> 
    <p style="background-color:#FFF748;color:#000000"> 📌 Inference: In ~80% of the games (59/74), the team that won the toss chose to field first. However, in those 59 matches, the toss-winning side went on to lose more games than it won. Although circumstances vary with each game (different squads, skills, planning, etc.), a similar observation can be made for the other scenario, where the toss winner chose to bat first.
    </p><br>
    <p>So, the question that arises is: why is there such a stark difference in the toss decision?<br>
       - One possible explanation is the start time of the game. Given that a significant proportion of these matches begin in the evening, most sides would be expected to account for the dew factor. This can be easily verified.
    </p>
</div>
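<p style="font-family:verdana;font-size:18px;">
As a side note, the toss splits computed above can also be tabulated in one step. The sketch below is only an illustration: the rows in <code>toy</code> are made up (not IPL data), and it assumes the same <code>toss_won</code>, <code>decision</code> and <code>winner</code> columns used in this notebook.
</p>

```python
import pandas as pd

# Toy rows for illustration only; the notebook itself uses df_season_summary_2022.
toy = pd.DataFrame({
    "toss_won": ["GT", "RR", "CSK", "MI", "DC"],
    "decision": ["BOWL FIRST", "BOWL FIRST", "BAT FIRST", "BOWL FIRST", "BAT FIRST"],
    "winner":   ["GT", "LSG", "CSK", "KKR", "RCB"],
})

# Label each match by whether the toss winner also won the game.
toy["toss_winner_result"] = (toy["toss_won"] == toy["winner"]).map({True: "won", False: "lost"})

# Rows: toss decision; columns: result for the side that won the toss.
toss_outcome = pd.crosstab(toy["decision"], toy["toss_winner_result"])
print(toss_outcome)
```

<p style="font-family:verdana;font-size:18px;">
The resulting table holds the same counts that feed the second and third pie charts, just in a single frame.
</p>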

In [None]:
df_season_summary_2022['dn_matches'] = df_season_summary_2022.description.apply(lambda x: 1 if x.find("D/N")!=-1 else 0)
df_season_summary_2022.dn_matches.value_counts()

In [None]:
time_of_match_labels = ["Evening Game - Bowl first","Evening Game - Bat first","D/N - Bowl First","D/N - Bat First"]
time_of_match_values = df_season_summary_2022.groupby(['dn_matches'])['decision'].value_counts().reset_index(drop=True)

In [None]:
fig = go.Figure(data=[go.Bar(x=time_of_match_labels,y=time_of_match_values)])
fig.update_traces(textposition="inside",textfont=dict(family="verdana",size=10),marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(title_text='<b>Time of start vs Toss Decision</b>',title_x=0.5)
fig.show()

<div style="font-family:verdana; font-size:18px"> 
    <p style="background-color:#FFF748;color:#000000"> 📌: As indicated above, there is a palpable difference between batting and fielding first for the 8:00 PM kickoff games. Let us dig deeper to check whether some teams are better chasers than the rest.
    </p>
</div>
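<p style="font-family:verdana;font-size:18px;">
One caveat on the bar chart: its labels are typed by hand, so they rely on <code>value_counts</code> returning the groups in a particular order. A hedged alternative is to build the labels from the grouped index itself. The frame below is toy data, and <code>start_names</code> is a hypothetical mapping that simply reuses this notebook's wording for the <code>dn_matches</code> flag.
</p>

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "dn_matches": [0, 0, 0, 1, 1, 0],
    "decision": ["BOWL FIRST", "BOWL FIRST", "BAT FIRST",
                 "BAT FIRST", "BOWL FIRST", "BOWL FIRST"],
})

# One count per (flag, decision) pair; the index carries both keys.
counts = toy.groupby(["dn_matches", "decision"]).size()

# Hypothetical mapping, following the labels used in this notebook.
start_names = {0: "Evening Game", 1: "D/N"}

# Labels are derived from the index, so they always line up with the values.
labels = [f"{start_names[dn]} - {decision.title()}" for dn, decision in counts.index]
values = counts.tolist()
print(list(zip(labels, values)))
```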

In [None]:
team_toss_split = {}
team_win_split = {}

In [None]:
from collections import Counter

In [None]:
def calculate_wins(x,team):
    if x.winner==team:
        if (x.toss_won==team and x.decision=='BOWL FIRST') or (x.toss_won!=team and x.decision=='BAT FIRST'):
            return 1
        elif (x.toss_won==team and x.decision=='BAT FIRST') or (x.toss_won!=team and x.decision=='BOWL FIRST'):
            return 0
    else:
        return 'lost'

In [None]:
for team in df_season_summary_2022.home_team.unique():
    df_2022_team = df_season_summary_2022[(df_season_summary_2022.home_team==team)|(df_season_summary_2022.away_team==team)]
    bowl_first = dict(Counter(df_2022_team.apply(lambda x: 1 if ((x.toss_won==team and x.decision=='BOWL FIRST') or (x.toss_won!=team and x.decision=='BAT FIRST')) else 0,axis=1)))
    matches_won = dict(Counter(df_2022_team.apply(lambda x: calculate_wins(x,team),axis=1)))

    # Rebuild both dicts with readable keys; .get avoids a KeyError when a team has no games in a category
    bowl_first = {'BOWL FIRST':bowl_first.get(1,0),'BAT FIRST':bowl_first.get(0,0)}
    matches_won = {'BOWL FIRST':matches_won.get(1,0),'BAT FIRST':matches_won.get(0,0)}
    team_toss_split[team] = bowl_first
    team_win_split[team] = matches_won
    

In [None]:
team_toss_split

In [None]:
team_win_split

In [None]:
team_colors = {'CSK':'#F9CD05','DC':'#EF1B23','GT':'#00008B','RCB':'#32CD32','PBKS':'#ED1B24','KKR':'#800080','LSG':'#7DF9FF','MI':'#1434A4','SRH':'#FF822A','RR':'#FF69B4'}

In [None]:
def plot_subplots(dictionary:dict,title_text:str,plot:str)->None:
    fig = make_subplots(rows=2,cols=5)
    row,col = 1,1
    
    for key,value in dictionary.items():
        if plot=='bar':
            labels = list(value.keys())
            values_toss_decision = list(value.values())
            fig.add_trace(go.Bar(x=labels,y=values_toss_decision,marker=dict(color=team_colors[key]),name=key),row=row,col=col)
         
        elif plot=='box':
            fig.add_trace(go.Box(y=value,name=key,marker=dict(color=team_colors[key])),row=row,col=col)
            
        col+=1
        if col>5:
            col = 1
            row = 2
    fig.update_layout(title_text=f'<b>{title_text}</b>',title_x=0.5)
    fig.show()

In [None]:
plot_subplots(team_toss_split,'Number of games batted and bowled first','bar')

In [None]:
plot_subplots(team_win_split,'Games won for each decision per team','bar')

<div style="background-color:#80d8ff;font-family:verdana; font-size:18px">
    <p>📢 Conclusion: While teams chose to field first far more often than to bat first, the toss by itself did not have a significant impact on the outcome: certain teams fared better than others because they made the most of their opportunities. For example, GT chased significantly better than the other sides, whereas teams like RR, LSG, and RCB won more games batting first.</p>
</div>

<h3> 
    <p> Before looking at specific teams, it is vital to answer some questions pertaining to the common themes across the tournament.<br><br> For instance:<br>  
        1. What was the median target when games were won and lost while batting first and second?<br>
        2. How did teams perform during the powerplay in successful chases?<br>
        3. How many games were dragged to the final overs, and what was the average number of runs needed in the last 4 overs?<br><br>
        Obtaining the answers to these questions will help set the context for identifying which players stood out and guided their team to victory.</p>
</h3>
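<p style="font-family:verdana;font-size:18px;">
For the first question, the median target comes down to pulling the runs out of a score string and taking the median. Here is a minimal sketch using only the standard library. The helper <code>runs_from_score</code> and the score lists are hypothetical, and the scores are made up for illustration.
</p>

```python
from statistics import median

def runs_from_score(score: str) -> int:
    """Return the runs part of a score string such as '180/5' or '152'."""
    return int(score.split("/")[0])

# Hypothetical first-innings scores, for illustration only.
scores_defended = ["189/5", "210/6", "177/8"]
scores_chased_down = ["152/7", "165/9", "131"]

median_defended = median(runs_from_score(s) for s in scores_defended)
median_chased_down = median(runs_from_score(s) for s in scores_chased_down)
print(median_defended, median_chased_down)
```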

In [None]:
first_inning_score_winning_chasing = np.array(teams_winning_chasing['1st_inning_score'].apply(lambda x: "".join(x.split("/")[0])).astype(str).astype(int))
first_inning_score_losing_chasing = np.array(teams_losing_chasing['1st_inning_score'].apply(lambda x: "".join(x.split("/")[0])).astype(str).astype(int))
first_inning_score_winning_batting = np.array(teams_winning_batting['1st_inning_score'].apply(lambda x: "".join(x.split("/")[0])).astype(str).astype(int))
first_inning_score_losing_batting = np.array(teams_losing_batting['1st_inning_score'].apply(lambda x: "".join(x.split("/")[0])).astype(str).astype(int))

In [None]:
first_inning_score_summary = make_subplots(rows=1,cols=4,specs=[[{"type":"box"},{"type":"box"},{"type":"box"},{"type":"box"}]],shared_yaxes=True)

In [None]:
first_inning_score_summary.add_trace(go.Box(y=first_inning_score_winning_chasing,name='<b>Chase and win</b>',marker=dict(color='#000000')),row=1,col=1)
first_inning_score_summary.add_trace(go.Box(y=first_inning_score_losing_chasing,name='<b>Chase and lose</b>',marker=dict(color='#000000')),row=1,col=2)
first_inning_score_summary.add_trace(go.Box(y=first_inning_score_winning_batting,name='<b>Bat and win</b>',marker=dict(color='#000000')),row=1,col=3)
first_inning_score_summary.add_trace(go.Box(y=first_inning_score_losing_batting,name='<b>Bat and lose</b>',marker=dict(color='#000000')),row=1,col=4)

first_inning_score_summary.update_layout(title_text = '<b>1st inning score summary</b>',title_x = 0.5,
                                         hoverlabel=dict(bgcolor="white",font_size=12,font_family='verdana',font_color='black'))
first_inning_score_summary.show()

In [None]:
df_season_details_2022.head()

In [None]:
df_season_details_2022.columns

In [None]:
df_season_details_2022.info()

In [None]:
df_season_details_2022.iloc[:,[2,3,6,7,8,9,10,11,12,13,14,20,23,24,33,34]].head(40)

In [None]:
ppscore_bat = {}
ppscore_chase = {}
ppwickets_bat = {}
ppwickets_chase = {}

In [None]:
for team in df_season_details_2022.home_team.unique():
    ppscore_bat[team] = []
    ppscore_chase[team] = []
    ppwickets_bat[team] = []
    ppwickets_chase[team] = []

In [None]:
df_season_summary_2022['Bat_First'] = 'Empty'
df_season_summary_2022['Bat_First_RunsInPP'] = 0
df_season_summary_2022['Bowl_First'] = 'Empty'
df_season_summary_2022['Bowl_First_RunsInPP'] = 0
df_season_summary_2022['Wickets_LostInPP_Batting'] = 0
df_season_summary_2022['Wickets_LostInPP_Chasing'] = 0

In [None]:
for id in df_season_details_2022.match_id.unique():
    match_df = df_season_details_2022[df_season_details_2022.match_id==id]
    team_batting_first = match_df.current_innings[match_df.innings_id==1].unique()[0]
    team_batting_second = match_df.current_innings[match_df.innings_id==2].unique()[0]
    
    runs_batting = list(match_df.apply(lambda row: row.runs if (1<=row.over<=6 and row.innings_id==1) else "Not powerplay",axis=1))
    runs_batting = np.sum([x for x in runs_batting if x!='Not powerplay'])
    ppscore_bat[team_batting_first].append(runs_batting)
    
    runs_chasing = list(match_df.apply(lambda row: row.runs if (1<=row.over<=6 and row.innings_id==2) else "Not powerplay",axis=1)) 
    runs_chasing = np.sum([x for x in runs_chasing if x!='Not powerplay'])
    ppscore_chase[team_batting_second].append(runs_chasing)   
    
    wickets_lost_batting = list(match_df.apply(lambda row: row.wkt_text if (1<=row.over<=6 and row.innings_id==1) else 'No wicket fell',axis=1).dropna())
    wickets_lost_batting = len([x for x in wickets_lost_batting if x!='No wicket fell'])
    ppwickets_bat[team_batting_first].append(wickets_lost_batting)
    
    wickets_lost_chasing = list(match_df.apply(lambda row: row.wkt_text if (1<=row.over<=6 and row.innings_id==2) else 'No wicket fell',axis=1).dropna())
    wickets_lost_chasing = len([x for x in wickets_lost_chasing if x!='No wicket fell'])
    ppwickets_chase[team_batting_second].append(wickets_lost_chasing)
    
    df_season_summary_2022.loc[df_season_summary_2022.id == id,'Bat_First'] = team_batting_first
    df_season_summary_2022.loc[df_season_summary_2022.id == id,'Bat_First_RunsInPP'] = runs_batting 
    df_season_summary_2022.loc[df_season_summary_2022.id == id,'Bowl_First'] = team_batting_second
    df_season_summary_2022.loc[df_season_summary_2022.id == id,'Bowl_First_RunsInPP'] = runs_chasing
    df_season_summary_2022.loc[df_season_summary_2022.id == id,'Wickets_LostInPP_Batting'] = wickets_lost_batting
    df_season_summary_2022.loc[df_season_summary_2022.id == id,'Wickets_LostInPP_Chasing'] = wickets_lost_chasing

In [None]:
plot_subplots(ppscore_bat,"Powerplay scores in first innings for each team",'box')

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: Three teams, namely CSK, RR, and PBKS, batted first in more games than the others, whereas the split between batting and fielding first is nearly equal for some sides such as GT, RCB, and LSG. Here are some observations: <br>
        <li style="margin-left:10px"> Of the 3 franchises, PBKS were the most aggressive from the get-go and averaged 8 runs per over. Meanwhile, RR got off to decent starts with an average run rate slightly over 7.<br> 
        <li style="margin-left:10px"> Chennai, on the other hand, were unable to get good starts consistently. This could be due to wickets lost early on, which is analysed further below.<br>
        <li style="margin-left:10px"> All the other sides managed around 7-8 RPO. 
    </p>
</div>
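<p style="font-family:verdana;font-size:18px;">
The runs-per-over figures quoted above are just powerplay totals divided by the six powerplay overs. A small sketch of that arithmetic, with made-up totals and placeholder team names (<code>powerplay_run_rate</code> is a hypothetical helper, not part of the notebook):
</p>

```python
from statistics import mean

POWERPLAY_OVERS = 6  # overs 1-6, as used throughout this notebook

def powerplay_run_rate(pp_scores):
    """Average runs per over across a list of powerplay totals."""
    return mean(pp_scores) / POWERPLAY_OVERS

# Made-up powerplay totals, shaped like the ppscore_bat dictionary.
toy_ppscore_bat = {"TEAM_A": [48, 54, 42], "TEAM_B": [36, 45, 39]}

run_rates = {team: round(powerplay_run_rate(scores), 2)
             for team, scores in toy_ppscore_bat.items()}
print(run_rates)
```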

In [None]:
plot_subplots(ppwickets_bat,"Wickets lost in the powerplay while batting first",'box')

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: The franchises can be grouped into the following categories: <br>
        1. Those that went berserk right from the beginning, also known as Bazball or the English brand of cricket.<br>
        2. Those that didn't lose too many wickets and also maintained a healthy run rate.<br>
        3. Those whose scoring rate could have been hampered by the loss of wickets. These could be the top-heavy sides.<br>
        <li style="margin-left:10px"> Teams like DC and PBKS fall into the first category: even though they lost a couple of wickets upfront, it didn't dent their scoring rate. Another side that can be included is SRH. Despite a small sample size of 3 games, they were able to capitalize on the field restrictions and score at nearly 9 an over.<br> 
        <li style="margin-left:10px"> Teams like GT and RR meet the second criterion: the average run rate was 7.5-8 an over and they didn't lose too many wickets.<br>
        <li style="margin-left:10px"> Teams like CSK, RCB, LSG, MI and KKR fall under the last category. There were a few instances where they had blistering starts, but early wickets checked their momentum in the other games.<br> 
    </p>
</div>

In [None]:
plot_subplots(ppscore_chase,"Powerplay scores in second innings for each team",'box')

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: In continuation, all the other teams except CSK, RR, and PBKS either bowled first in more games [DC,GT,SRH,MI,KKR] or had a nearly 50-50 split [LSG,RCB]. Here are the takeaways:<br>
        <li style="margin-left:10px"> Among these sides, DC got off to blistering starts with an average run rate of 9.1, closely followed by GT at 8.8.<br>
        <li style="margin-left:10px"> At the other end of the spectrum lies SRH, who didn't always start well and also registered the lowest powerplay score.<br>               
        <li style="margin-left:10px"> Sandwiched between these sides are RCB, LSG, KKR and MI, averaging around 7 RPO.    
    </p>
</div>

In [None]:
plot_subplots(ppwickets_chase,"Wickets lost in the powerplay while chasing",'box')

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: Applying the same classification as before: <br>
        <li style="margin-left:10px"> Three sides, DC, PBKS and GT, belong to the category where losing a couple of wickets upfront didn't affect their scoring rate. All of them scored at about 9 an over.<br> 
        <li style="margin-left:10px"> For franchises like RCB, MI, LSG and KKR, it appears that once the side lost one wicket, a couple more fell quickly. Even so, their run rate is slightly better than that of CSK and SRH.<br>
    </p>
</div>

<div style="background-color:#80d8ff;font-family:verdana; font-size:18px">
    <p>📢: From the last four boxplots, it is evident that DC and PBKS started more aggressively than the rest, closely followed by GT and RR. It also appears that sides like CSK, MI, RCB and SRH are heavily dependent on their top order: when the openers failed, the middle order wasn't able to consistently steady the ship and maintain a good run rate.</p>
</div>

<p style="font-family:verdana;font-size:18px;">  
Having understood the performance of all teams in the powerplay, it is now time to discover which players stood out during the campaign. Starting with the bowling department, I'll check how the bowlers performed in three phases, i.e. the powerplay, the middle overs and the slog overs.
</p>
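<p style="font-family:verdana;font-size:18px;">
To make the three phases concrete, here is a minimal sketch of how an over number maps to a phase and how an economy rate per phase falls out of ball-by-ball records. The over boundaries (1-6, 7-16, 17-20) follow the filters used in this notebook. The function name <code>phase_of_over</code> and the toy deliveries are assumptions for illustration. As in the notebook's <code>overs_bowled</code> helper, every distinct (match, over) pair counts as one full over.
</p>

```python
from collections import defaultdict

def phase_of_over(over: int) -> str:
    """Map a 1-based over number to its phase of a T20 innings."""
    if 1 <= over <= 6:
        return "Powerplay"
    if 7 <= over <= 16:
        return "Middle"
    if 17 <= over <= 20:
        return "Death"
    raise ValueError(f"over out of range: {over}")

# Toy deliveries for one bowler: (match_id, over, runs conceded). Not real data.
toy_balls = [(1, 2, 4), (1, 2, 0), (1, 18, 6), (2, 2, 1), (2, 10, 2), (2, 10, 1)]

runs_by_phase = defaultdict(int)
overs_by_phase = defaultdict(set)
for match_id, over, runs in toy_balls:
    phase = phase_of_over(over)
    runs_by_phase[phase] += runs
    overs_by_phase[phase].add((match_id, over))  # distinct overs per match

# Economy = runs conceded per over bowled in that phase.
economy = {phase: runs_by_phase[phase] / len(overs_by_phase[phase])
           for phase in runs_by_phase}
print(economy)
```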

In [None]:
bowlers_df = pd.DataFrame()

In [None]:
def overs_bowled(df:pd.DataFrame)->int:
    total_overs = 0
    for match in df.match_id.unique():
        match_df = df[df.match_id==match] 
        total_overs += len(set(match_df.over))
    return total_overs

In [None]:
def measure_performance(wickets_picked:int,economy:float,wickets_threshold:int,economy_threshold:float)->int:
    if wickets_picked>=wickets_threshold:
        if economy<=economy_threshold:
            return 40
        else:
            return 25
    else:
        if economy<=economy_threshold:
            return 30
        else:
            return 10

In [None]:
for bowler in df_season_details_2022.bowler1_name.unique():
    bowlers_info = {}
    bowlers_info['Bowler'] = bowler
    bowler_df = df_season_details_2022.loc[df_season_details_2022.bowler1_name==bowler,['match_id','over','runs','isWide','isNoball','bowler1_name','wkt_bowler_name','wkt_text']]

    wickets_picked_pp = 0
    wickets_picked_middle = 0
    wickets_picked_death = 0

    overs_bowled_pp = 0
    overs_bowled_middle = 0
    overs_bowled_death = 0

    economy_pp = 0
    economy_middle = 0
    economy_death = 0

    # Powerplay Analysis
    bowler_df_pp =  bowler_df[(bowler_df.over>=1) & (bowler_df.over<=6)]
    if not np.shape(bowler_df_pp)[0]==0:
        runs_in_pp = bowler_df_pp['runs'].sum()
        wickets_picked_pp += np.shape(bowler_df_pp[(bowler_df_pp.wkt_bowler_name == bowler)&(bowler_df_pp.wkt_text.str.contains("run out")==False)])[0]
        overs_bowled_pp = overs_bowled(bowler_df_pp)
        economy_pp = runs_in_pp/overs_bowled_pp       
        performance_rating_pp = measure_performance(wickets_picked_pp,economy_pp,5,8.0)
    else:
        wickets_picked_pp = 'NA'
        economy_pp = 'NA'
        performance_rating_pp = 'NA'
              
    # Middle overs Analysis
    bowler_df_middle =  bowler_df[(bowler_df.over>6) & (bowler_df.over<17)]
    if not np.shape(bowler_df_middle)[0]==0:
        runs_in_middle = bowler_df_middle['runs'].sum()
        wickets_picked_middle += np.shape(bowler_df_middle[(bowler_df_middle.wkt_bowler_name == bowler)&(bowler_df_middle.wkt_text.str.contains("run out")==False)])[0]
        overs_bowled_middle = overs_bowled(bowler_df_middle)
        economy_middle = runs_in_middle/overs_bowled_middle 
        performance_rating_middle = measure_performance(wickets_picked_middle,economy_middle,6,7.7)
    else:
        wickets_picked_middle = 'NA'
        economy_middle = 'NA'
        performance_rating_middle = 'NA'
        
    # Death Overs Analysis
    bowler_df_death =  bowler_df[(bowler_df.over>=17)&(bowler_df.over<=20)]
    if not np.shape(bowler_df_death)[0]==0:
        runs_in_death = bowler_df_death['runs'].sum()
        wickets_picked_death += np.shape(bowler_df_death[(bowler_df_death.wkt_bowler_name == bowler)&(bowler_df_death.wkt_text.str.contains("run out")==False)])[0]
        overs_bowled_death = overs_bowled(bowler_df_death)
        economy_death = runs_in_death/overs_bowled_death
        performance_rating_death = measure_performance(wickets_picked_death,economy_death,6,9.0)
    else:
        wickets_picked_death = 'NA'
        economy_death = 'NA'
        performance_rating_death = 'NA'
                
    bowlers_info['Overs_in_PP'] = overs_bowled_pp
    bowlers_info['Wickets_in_PP'] = wickets_picked_pp
    bowlers_info['Economy_in_PP'] = economy_pp
    bowlers_info['Performance_in_PP'] = performance_rating_pp
    
    bowlers_info['Overs_in_Middle'] = overs_bowled_middle
    bowlers_info['Wickets_in_Middle'] = wickets_picked_middle
    bowlers_info['Economy_in_Middle'] = economy_middle
    bowlers_info['Performance_in_Middle'] = performance_rating_middle
    
    bowlers_info['Overs_in_Death'] = overs_bowled_death
    bowlers_info['Wickets_in_Death'] = wickets_picked_death
    bowlers_info['Economy_in_Death'] = economy_death
    bowlers_info['Performance_in_Death'] = performance_rating_death
    
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    bowlers_df = pd.concat([bowlers_df,pd.DataFrame([bowlers_info])],ignore_index=True)

In [None]:
def plot_bowlers_performance(df:pd.DataFrame,xaxis_feature:str,yaxis_feature:str,marker_size_feature:str,marker_color_feature:str,title_text:str,wickets_threshold:int,economy_threshold:float)->None: 
    fig = go.Figure(data=[go.Scatter(x=df[xaxis_feature],y=df[yaxis_feature],text=df.Bowler,mode='markers',name='Base Plot: Bowler Info',marker=dict(size=list(df[marker_size_feature]),color=df[marker_color_feature],colorscale='rainbow',showscale=True)),
                          go.Scatter(x=[x for x in range(25)],y=[wickets_threshold]*(25),mode='lines',line=dict(color='black',dash='dot',width=0.4),name='Wickets Threshold'),
                          go.Scatter(x=[economy_threshold]*(25),y=[y for y in range(25)],mode='lines',line=dict(color='black',dash='dot',width=0.4),name='Economy Threshold')])
    
    fig.update_layout(title_text=f'<b>Economy vs Wickets taken in {title_text}</b>',title_x = 0.5,xaxis=dict(tick0=min(df[xaxis_feature])),xaxis_title=f'<b>{xaxis_feature}</b>',
                      yaxis_title=f'<b>{yaxis_feature}</b>',legend=dict(yanchor="top",y=0.99,xanchor="left",x=0.01))
    fig.show()

In [None]:
bowler_analysis_pp = bowlers_df[bowlers_df.Overs_in_PP>4]
bowler_analysis_middle_overs = bowlers_df[bowlers_df.Overs_in_Middle>8]
bowler_analysis_death = bowlers_df[bowlers_df.Overs_in_Death>4]

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: Before heading to the interpretation, here are some features that will be common to the three scatter plots: 
        <li style="margin-left:10px"> The thresholds for the wickets taken and the economy are the two dotted lines parallel to the X and Y axes respectively. Both thresholds vary with each phase.<br>
         <li style="margin-left:10px"> For instance: In the powerplay: Wickets - 5; Economy - 8<br> Middle overs: Wickets - 6; Economy - 7.7<br> At the death: Wickets - 6; Economy - 9<br>
        <li style="margin-left:10px"> The rainbow colorscale on the right indicates the overs bowled, while the marker size reflects the performance rating.<br>
        <li style="margin-left:10px"> The data appearing on hovering over each point is of the form (economy, wickets) and applies to all three phases.<br> 
    </p>
</div>
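<p style="font-family:verdana;font-size:18px;">
The two dotted lines split each scatter plot into four regions. The sketch below spells that out with the thresholds listed above and the same comparisons as <code>measure_performance</code> (wickets at or above the threshold, economy at or below it). The names <code>PHASE_THRESHOLDS</code> and <code>quadrant</code>, the region descriptions and the sample figures are all made up for illustration.
</p>

```python
# Thresholds quoted above, as (wickets, economy) per phase.
PHASE_THRESHOLDS = {
    "Powerplay": (5, 8.0),
    "Middle overs": (6, 7.7),
    "Death": (6, 9.0),
}

def quadrant(phase: str, wickets: int, economy: float) -> str:
    """Describe which side of both dotted lines a bowler falls on."""
    wickets_threshold, economy_threshold = PHASE_THRESHOLDS[phase]
    takes_wickets = wickets >= wickets_threshold   # on or above the wickets line
    is_economical = economy <= economy_threshold   # on or left of the economy line
    if takes_wickets and is_economical:
        return "strikes and contains"
    if takes_wickets:
        return "strikes but leaks runs"
    if is_economical:
        return "contains without striking"
    return "neither strikes nor contains"

# Made-up figures, for illustration only.
print(quadrant("Powerplay", 7, 6.9))
print(quadrant("Death", 8, 10.4))
```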

In [None]:
plot_bowlers_performance(bowler_analysis_pp,'Economy_in_PP','Wickets_in_PP','Performance_in_PP','Overs_in_PP','Powerplay',5,8)

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: The observations from the powerplay are as follows: 
        <li style="margin-left:10px">Five bowlers, namely Prasidh Krishna, Trent Boult, Mohammed Shami, Mukesh Choudhary and Kagiso Rabada, were highly effective with the new ball, although the first three were more economical than the last two.<br> 
        <li style="margin-left:10px"> Others like Bhuvneshwar Kumar, Jasprit Bumrah, Marco Jansen, Josh Hazlewood, and Umesh Yadav did a commendable job of keeping things tight upfront.<br>
        <li style="margin-left:10px"> The right end of the graph contains the bowlers who leaked more runs and were not able to strike effectively with the ball. Mohammed Siraj had a forgettable tournament. On the bright side, first-time IPL participants like Maheesh Theekshana and Dushmantha Chameera showed how handy they were for their franchises.   
    </p>
</div>

In [None]:
plot_bowlers_performance(bowler_analysis_middle_overs,'Economy_in_Middle','Wickets_in_Middle','Performance_in_Middle','Overs_in_Middle','Middle Overs',6,7.7)

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: The observations from the middle overs are as follows: 
        <li style="margin-left:10px">It was expected that the spinners would dominate this phase of the game, and that hypothesis held true.<br> 
        <li style="margin-left:10px">Six quality spinners came out on top, from Kul-Cha to veterans like Narine, Rashid Khan and Ravi Ashwin. Hasaranga fully repaid the trust the RCB management showed during the auction. Wrist spinners Ravi Bishnoi and Rahul Chahar also had decent tournaments.<br>
        <li style="margin-left:10px"> Amongst the pacers, fiery Umran Malik had a great season and Purple Patel continued his good form during the tournament.
    </p>
</div>

In [None]:
plot_bowlers_performance(bowler_analysis_death,'Economy_in_Death','Wickets_in_Death','Performance_in_Death','Overs_in_Death','Death Overs',6,9)

<div style="font-family:verdana;background-color:#FFF748;font-size:18px"> 
    <p style="color:#000000"> 📌: The observations from the slog overs are as follows: 
        <li style="margin-left:10px">The return of Bhuvneshwar Kumar this season is great for the Indian T20 setup. His ability to strike at key moments halts the opposition's momentum, but attention should also be paid to Arshdeep Singh and Jasprit Bumrah. To go for 7 an over when batsmen look to accelerate is incredible.<br> 
        <li style="margin-left:10px">The same bowlers who contained in the powerplay were on the expensive side but still managed to get wickets.<br>
        <li style="margin-left:10px">Bowlers like Shardul Thakur and Natarajan, known for their ability to pick up wickets, were smashed despite managing to get 6-8 wickets. 
    </p>
</div>

<div style="background-color:#80d8ff;font-family:verdana; font-size:18px">
    <p>📢: In a mixed season filled with comebacks and disappointments, some bowlers were more effective than the rest.</p>
     <li style="margin-left:10px"> On the bowling front, the return of Kuldeep Yadav and Bhuvneshwar Kumar along with the breakthrough of Arshdeep were the positives of the season.
     <li style="margin-left:10px"> As a summary of the shape of the three graphs: during the middle overs, unsurprisingly, the number of bowlers being experimented with [the points marked in purple] is higher than in the powerplay or at the death. This is because most teams use their spinners along with the fifth or sixth bowling option and go back to their main bowlers at the end.
     <li style="margin-left:10px"> This concludes the bowling analysis.
</div>

<div style="background-color:#80d8ff;font-family:verdana; font-size:18px">
    <p>📢: The exercise is now repeated to identify the standout performers with the bat. Once again, the analysis will be done over the three phases.</p>
     <li style="margin-left:10px"> The first task will be to collect the strike rate of batsmen in the powerplay.
     <li style="margin-left:10px"> In continuation, other details such as the boundaries and sixers hit will also be highlighted.
</div>
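<p style="font-family:verdana;font-size:18px;">
For reference, strike rate is runs scored per 100 balls faced. Below is a minimal sketch of that formula on toy deliveries. The <code>strike_rate</code> helper and the tuple layout are hypothetical. Following this notebook's balls-played filter, wides and no-balls are left out of the balls-faced count, and an innings with no countable balls returns 0 instead of dividing by zero.
</p>

```python
def strike_rate(deliveries):
    """Runs per 100 balls faced.

    deliveries: list of (batsman_runs, is_wide, is_noball) tuples.
    Wides and no-balls are excluded from the balls-faced count.
    """
    runs = sum(batsman_runs for batsman_runs, _, _ in deliveries)
    balls_faced = sum(1 for _, is_wide, is_noball in deliveries
                      if not (is_wide or is_noball))
    if balls_faced == 0:
        return 0.0  # nothing countable faced, so avoid dividing by zero
    return runs / balls_faced * 100

# Toy powerplay for one batsman: four counted balls and one wide. Not real data.
toy_deliveries = [(4, False, False), (0, False, False), (0, True, False),
                  (6, False, False), (1, False, False)]
toy_strike_rate = strike_rate(toy_deliveries)
print(toy_strike_rate)
```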

In [None]:
batsmen_df = pd.DataFrame()

In [None]:
def extract_batsmen_info(df:pd.DataFrame)->list:
    runs_scored = []
    times_out = 0
    balls_played = 0
    boundaries_hit = 0
    for id in df.match_id.unique():
        match = df[df.match_id==id]
        balls_played+= np.shape(match[(match.isWide==False)&(match.isNoball==False)])[0]
        # Last recorded batsman1_runs value for this match (treated here as a running total)
        runs_scored.append(match['batsman1_runs'].iloc[-1])
        times_out+=len(match['wkt_batsman_name'].dropna())
        # Count boundary balls; the string cast copes with bool and 'True'/'False' text columns alike
        boundaries_hit += int((match['isBoundary'].astype(str)=='True').sum())
    return [balls_played,np.sum(runs_scored),times_out,boundaries_hit]

In [None]:
for batsman in df_season_details_2022.batsman1_name.unique():
    batsman_info = {}
    batsman_info['Batsman'] = batsman
    print(f"Processing batsman: {batsman}")
    batter_df = df_season_details_2022.loc[df_season_details_2022.batsman1_name == batsman,['match_id','over','runs','isWide','isNoball','isBoundary','batsman1_runs','wkt_batsman_name','wkt_batsman_runs']]
    
    runs_in_pp = 0
    boundaries_in_pp = 0
    balls_played_in_pp = 0 
    times_out_pp = 0
    strike_rate = 0
    
    batter_df_pp = batter_df[(batter_df.over>=1) & (batter_df.over<=6)]
    if not batter_df_pp.shape[0]==0:
        batsman_data = extract_batsmen_info(batter_df_pp)
        balls_faced_in_pp,runs_in_pp,times_out_pp,boundaries_in_pp = batsman_data[0],batsman_data[1],batsman_data[2],batsman_data[3]
        strike_rate = (runs_in_pp/balls_faced_in_pp)*100
        print(f"Strike rate in powerplay: {strike_rate}\nTimes out: {times_out_pp}\nBoundaries scored:{boundaries_in_pp}\n")
    else:
        print(f"{batsman} hasn't batted in the powerplay\n")