
# T20 World Cup 2024 Match Analysis


In [1]:
import pandas as pd

cup_data=pd.read_csv("/content/india-usa_innings_data.csv")

cup_data.head()


Unnamed: 0,batter,bowler,non_striker,runs_batter,runs_extras,runs_total,wickets_0_player_out,wickets_0_kind,team,over,...,wickets_0_fielders_0_name,review_by,review_umpire,review_batter,review_decision,review_type,extras_legbyes,wickets_0_fielders_1_name,extras_noballs,extras_penalty
0,Shayan Jahangir,Arshdeep Singh,SR Taylor,0,0,0,Shayan Jahangir,lbw,United States of America,0,...,,,,,,,,,,
1,AGS Gous,Arshdeep Singh,SR Taylor,0,0,0,,,United States of America,0,...,,,,,,,,,,
2,AGS Gous,Arshdeep Singh,SR Taylor,0,0,0,,,United States of America,0,...,,,,,,,,,,
3,AGS Gous,Arshdeep Singh,SR Taylor,0,1,1,,,United States of America,0,...,,,,,,,,,,
4,AGS Gous,Arshdeep Singh,SR Taylor,2,0,2,,,United States of America,0,...,,,,,,,,,,


The preview shows that several columns contain missing values, so the next step is to count the missing values and check the data type of each column.

In [2]:
# checking for missing values :
missing_values=cup_data.isnull().sum()

In [3]:
missing_values

Unnamed: 0,0
batter,0
bowler,0
non_striker,0
runs_batter,0
runs_extras,0
runs_total,0
wickets_0_player_out,225
wickets_0_kind,225
team,0
over,0


In [4]:
# checking the data type of the columns :
data_types=cup_data.dtypes

In [5]:
data_types

Unnamed: 0,0
batter,object
bowler,object
non_striker,object
runs_batter,int64
runs_extras,int64
runs_total,int64
wickets_0_player_out,object
wickets_0_kind,object
team,object
over,int64


The data has null values in several columns, but in a ball-by-ball dataset like this one even null values carry meaning (for example, no wicket fell on that ball), so we will leave them as they are.
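To make the point about meaningful nulls concrete, here is a minimal sketch on a small made-up frame (the rows are hypothetical, not taken from the match file; only the column names mirror the dataset). A missing `wickets_0_player_out` is read as "no wicket on this ball", so `notna()` turns the column into a wicket flag without any imputation.

```python
import pandas as pd

# Hypothetical ball-by-ball rows using the same column names as the dataset.
sample = pd.DataFrame({
    "bowler": ["A", "A", "B", "B"],
    "runs_total": [0, 4, 1, 0],
    "wickets_0_player_out": ["X", None, None, "Y"],
})

# A null here means "no wicket on this ball", so notna() acts as a wicket flag.
sample["is_wicket"] = sample["wickets_0_player_out"].notna()

# Summing the boolean flag counts dismissals per bowler.
wickets_per_bowler = sample.groupby("bowler")["is_wicket"].sum()
print(wickets_per_bowler)
```

Filling these nulls with a placeholder string would break this pattern, which is one reason to leave them untouched.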

In [6]:
# group the data for analysis :

# total runs scored by each team :
total_runs=cup_data.groupby('team')['runs_total'].sum()

# wickets lost by each batting team ('team' is the batting side) :
total_wickets =cup_data['wickets_0_player_out'].notna().groupby(cup_data['team']).sum()

# extras received by each batting team :
total_extras=cup_data[['team','runs_extras','extras_wides','extras_noballs','extras_legbyes','extras_penalty']].groupby('team').sum()

# runs scored by each batsman :
batter_runs=cup_data.groupby('batter')['runs_batter'].sum()

# balls faced by each batsman (row count, so wides are included) :
balls_faced=cup_data.groupby('batter').size()

# strike-rate of each batsman :
strike_rate=(batter_runs/balls_faced)*100

# boundaries hit by each batsman :
boundaries=cup_data[(cup_data['runs_batter']==4)| (cup_data['runs_batter']==6)].groupby(['batter','runs_batter']).size().unstack(fill_value=0)

# wickets taken by each bowler :
wickets_taken =cup_data['wickets_0_player_out'].notna().groupby(cup_data['bowler']).sum()

# runs conceded by each bowler :
runs_conceded = cup_data.groupby('bowler')['runs_total'].sum()

# balls bowled by each bowler (row count, so wides and no-balls are included) :
balls_bowled = cup_data.groupby('bowler').size()

# economy rate of each bowler (runs conceded per six balls bowled) :
economy_rate=runs_conceded/(balls_bowled/6)

# dot balls bowled by each bowler :
dot_balls = cup_data[cup_data['runs_total']==0].groupby('bowler').size()

# combine all these stats into dataframes for batsman and bowlers
batter_stats =pd.DataFrame({
    'Runs':batter_runs,
    'Balls Faced':balls_faced,
    'Strike Rate':strike_rate,
}).join(boundaries)

bowler_stats=pd.DataFrame({
    'Wickets':wickets_taken,
    'Runs Conceded':runs_conceded,
    'Balls Bowled':balls_bowled,
    'Economy Rate':economy_rate,
    'Dot Balls':dot_balls,

})
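The row counts above treat every delivery alike. As a hedged refinement, the sketch below applies the usual cricket conventions: a wide is not a ball faced, wides and no-balls are not legal deliveries for the bowler, and leg byes are not charged to the bowler. The data is made up for illustration, and it assumes that a null in an `extras_*` column means that extra did not occur on that ball.

```python
import numpy as np
import pandas as pd

# Hypothetical deliveries; NaN in an extras column means that extra did not occur.
balls = pd.DataFrame({
    "batter":         ["P", "P", "P", "Q", "Q", "Q", "Q"],
    "bowler":         ["A"] * 7,
    "runs_batter":    [4, 0, 1, 0, 6, 1, 0],
    "runs_total":     [4, 1, 1, 1, 6, 2, 0],
    "extras_wides":   [np.nan, 1.0, np.nan, np.nan, np.nan, np.nan, np.nan],
    "extras_noballs": [np.nan, np.nan, np.nan, np.nan, np.nan, 1.0, np.nan],
    "extras_legbyes": [np.nan, np.nan, np.nan, 1.0, np.nan, np.nan, np.nan],
})

is_wide = balls["extras_wides"].notna()
is_noball = balls["extras_noballs"].notna()

# Strike rate: wides do not count as balls faced.
faced = balls[~is_wide].groupby("batter").size()
runs = balls.groupby("batter")["runs_batter"].sum()
strike_rate = runs / faced * 100

# Economy rate: only legal deliveries count, and leg byes are not charged to the bowler.
legal = balls[~is_wide & ~is_noball].groupby("bowler").size()
charged = (balls["runs_total"] - balls["extras_legbyes"].fillna(0)).groupby(balls["bowler"]).sum()
economy = charged / (legal / 6)

print(strike_rate)
print(economy)
```

On the real data the same masks could replace the plain `size()` counts, provided the extras columns follow the null convention assumed here.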


The grouped statistics needed for the analysis are now in place. Each one can be inspected individually; for example:

In [7]:
total_runs

team,runs_total
India,111
United States of America,110


In [8]:
total_wickets

team,wickets_0_player_out
India,3
United States of America,8


In [9]:
total_extras

team,runs_extras,extras_wides,extras_noballs,extras_legbyes,extras_penalty
India,9,2.0,1.0,1.0,5.0
United States of America,8,7.0,0.0,1.0,0.0


Progression of runs over overs:

In [10]:
import plotly.graph_objects as go

india_runs=cup_data[cup_data['team']=='India'].groupby('over')['runs_total'].sum().cumsum()
usa_runs=cup_data[cup_data['team']=='United States of America'].groupby('over')['runs_total'].sum().cumsum()

fig=go.Figure()

fig.add_trace(go.Scatter(
    x=india_runs.index,
    y=india_runs.values,
    mode='lines+markers',
    name='India'
))

fig.add_trace(go.Scatter(
    x=usa_runs.index,
    y=usa_runs.values,
    mode='lines+markers',
    name='USA'
))

fig.update_layout(
    title='Runs Progression Over Overs',
    xaxis_title='Overs',
    yaxis_title='Cumulative Runs',
    legend_title='Teams',
    template='plotly_white'
)

fig.show()

The graph shows the progression of cumulative runs over the overs for both India and the USA in their T20 World Cup match. Initially, both teams scored at a steady rate, with India slightly ahead in the early overs. As the innings progressed, the USA gained momentum and briefly took the lead around the middle overs. However, India accelerated in the later overs, overtaking the USA and holding the lead until the end. The key takeaway is India's strong finish: a rising run rate in the final overs secured the win.
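The lead changes described above can also be read off numerically instead of from the chart. The following is a minimal sketch on invented per-over figures (the team labels `T1` and `T2` and all run values are hypothetical): it aligns the two innings by over number, takes cumulative sums, and reports who is ahead after each over.

```python
import pandas as pd

# Hypothetical runs per over for two innings (not the real match figures).
overs = pd.DataFrame({
    "team": ["T1"] * 4 + ["T2"] * 4,
    "over": [0, 1, 2, 3] * 2,
    "runs_total": [5, 2, 10, 3, 3, 8, 4, 9],
})

# One column per team, indexed by over; an over a team never batted becomes 0,
# which keeps that team's cumulative total flat from then on.
per_over = overs.groupby(["team", "over"])["runs_total"].sum().unstack("team")
cumulative = per_over.fillna(0).cumsum()

# Positive lead means T1 is ahead at the same stage of the innings.
lead = cumulative["T1"] - cumulative["T2"]
leader = lead.map(lambda d: "T1" if d > 0 else ("T2" if d < 0 else "level"))
print(pd.DataFrame({"lead": lead, "leader": leader}))
```

This compares the two innings at the same over number, which is how the chart reads, not at the same moment in the match.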

Timeline showing fall of wickets:

In [14]:
india_wickets = cup_data[(cup_data['team']=='India') & cup_data['wickets_0_player_out'].notna()].groupby('over').size()
usa_wickets = cup_data[(cup_data['team']=='United States of America') & cup_data['wickets_0_player_out'].notna()].groupby('over').size()


fig=go.Figure()

fig.add_trace(go.Bar(
    x=india_wickets.index,
    y=india_wickets.values,
    name='India',
    marker_color='blue',
    opacity=0.7
))

fig.add_trace(go.Bar(
    x=usa_wickets.index,
    y=usa_wickets.values,
    name='USA',
    marker_color='red',
    opacity=0.7
))

fig.update_layout(
    title='Fall of Wickets',
    xaxis_title='Overs',
    yaxis_title='Number of Wickets',
    barmode='group',
    template='plotly_white',
    legend_title='Teams'
)

fig.show()
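A bar chart of wickets per over does not show the score at which each wicket fell. As a possible complement, the sketch below builds a conventional fall-of-wickets table from running totals. The rows are hypothetical, and the approach assumes the frame holds a single innings with deliveries in the order they were bowled.

```python
import pandas as pd

# Hypothetical single-innings deliveries, already in bowling order.
innings = pd.DataFrame({
    "over": [0, 0, 1, 1, 2],
    "runs_total": [0, 4, 1, 0, 6],
    "wickets_0_player_out": ["X", None, None, "Y", None],
})

is_wicket = innings["wickets_0_player_out"].notna()

# Running team score and running wicket count after each delivery.
innings["score"] = innings["runs_total"].cumsum()
innings["wickets_down"] = is_wicket.cumsum()

# Keep only the deliveries on which a wicket fell.
fow = innings.loc[is_wicket, ["over", "wickets_0_player_out", "score", "wickets_down"]]
print(fow)
```

Each row then reads as "wicket number at score", for example the second wicket falling with the score on 5.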

Run distribution by batters:

In [18]:
import plotly.express as px

fig=px.bar(
    batter_stats,
    x=batter_stats.index,
    y='Runs',
    title='Run distribution by batters',
    labels={
        'x':'Batter','Runs':'Runs Scored'
    },
    template='plotly_white'
)

fig.update_layout(
    xaxis_title='Batter',
    yaxis_title='Runs Scored',
    xaxis=dict(tickangle=90)
)

fig.show()



Notably, SA Yadav emerged as the highest scorer with a significant contribution, followed by NR Kumar and S Dube.

These three players were pivotal in their respective teams' innings, providing the bulk of the runs.

Bowling Performance:

In [20]:
fig=go.Figure()

fig.add_trace(go.Scatter(
    x=bowler_stats['Economy Rate'],
    y=bowler_stats['Wickets'],
    mode='markers+text',
    text=bowler_stats.index,
    textposition='top center',
    textfont=dict(
        family="sans serif",
        size=12,
        color="black"
    ),
    marker=dict(color='red',size=10),
    name='Bowlers'
))

fig.update_layout(
    title='Bowling Performance',
    xaxis_title='Economy Rate',
    yaxis_title='Wickets',
    template='plotly_white',
    autosize=False,
    width=800,
    height=600
)

fig.show()

Arshdeep Singh stands out as the most effective bowler, taking the highest number of wickets (4) with a commendable economy rate.
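The wicket counts behind this chart include every dismissal that happened on a bowler's delivery. If only dismissals credited to the bowler should count, one option is to filter on `wickets_0_kind`. The sketch below uses made-up rows, and the set of excluded labels is an assumption about how the dataset names those dismissals; only `lbw` is confirmed by the preview above.

```python
import pandas as pd

# Hypothetical deliveries; a null kind means no wicket fell.
balls = pd.DataFrame({
    "bowler": ["A", "A", "B", "B"],
    "wickets_0_kind": ["lbw", "run out", None, "caught"],
})

# Assumed labels for dismissals that are not credited to the bowler.
NOT_CREDITED = {"run out", "retired hurt", "retired out", "obstructing the field"}

# A wicket counts for the bowler when a kind is present and not in the excluded set.
credited = balls["wickets_0_kind"].notna() & ~balls["wickets_0_kind"].isin(NOT_CREDITED)
bowler_wickets = credited.groupby(balls["bowler"]).sum()
print(bowler_wickets)
```

Checking `cup_data['wickets_0_kind'].unique()` first would confirm which labels actually occur before relying on the excluded set.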

Partnership contributions in India’s innings:

In [22]:
# separate data for India and USA
india_partnership_data = cup_data[cup_data['team'] == 'India'].groupby(['over', 'batter', 'non_striker'])['runs_total'].sum().reset_index()
usa_partnership_data = cup_data[cup_data['team'] == 'United States of America'].groupby(['over', 'batter', 'non_striker'])['runs_total'].sum().reset_index()

# create pivot tables for better visualization
india_partnership_pivot = india_partnership_data.pivot(index='over', columns=['batter', 'non_striker'], values='runs_total').fillna(0)
usa_partnership_pivot = usa_partnership_data.pivot(index='over', columns=['batter', 'non_striker'], values='runs_total').fillna(0)


# convert the pivot table to a long format
india_partnership_long = india_partnership_pivot.reset_index().melt(id_vars='over', var_name=['batter', 'non_striker'], value_name='runs_total')

# create a stacked bar chart
fig = go.Figure()

# add bars for each partnership
for (batter, non_striker) in india_partnership_pivot.columns:
    partnership_data = india_partnership_long[(india_partnership_long['batter'] == batter) & (india_partnership_long['non_striker'] == non_striker)]
    fig.add_trace(go.Bar(
        x=partnership_data['over'],
        y=partnership_data['runs_total'],
        name=f'{batter} & {non_striker}'
    ))

fig.update_layout(
    title='Partnership Contributions - India',
    xaxis_title='Over',
    yaxis_title='Runs',
    barmode='stack',
    template='plotly_white',
    legend_title='Partnership',
    legend=dict(
        x=1.05,
        y=1,
        traceorder='normal',
        font=dict(size=10)
    ),
    autosize=False,
    width=900,
    height=600
)

fig.show()

Notably, the partnerships of RG Sharma & RR Pant and SA Yadav & S Dube were particularly productive, especially in the middle and death overs, contributing significantly to the team’s total.
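The chart keys each bar on the ordered pair (batter, non-striker), so the same two players appear twice, once with each on strike. To total a partnership regardless of who faced the ball, the pair can be normalised first. This is a minimal sketch on hypothetical rows; the player labels and runs are invented.

```python
import pandas as pd

# Hypothetical deliveries: P and Q swap strike, then R replaces P.
balls = pd.DataFrame({
    "batter":      ["P", "Q", "P", "R", "Q"],
    "non_striker": ["Q", "P", "Q", "Q", "R"],
    "runs_total":  [4, 1, 2, 6, 3],
})

# Sort the two names so (P, Q) and (Q, P) map to the same partnership label.
pair = balls.apply(
    lambda r: " & ".join(sorted([r["batter"], r["non_striker"]])), axis=1
)

# Total runs added while each pair was at the crease, largest first.
partnership_runs = balls.groupby(pair)["runs_total"].sum().sort_values(ascending=False)
print(partnership_runs)
```

If the same two players bat together in separate spells, this groups the spells into one total.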

Partnership contributions in the USA’s innings:

In [23]:
usa_partnership_long = usa_partnership_pivot.reset_index().melt(id_vars='over', var_name=['batter', 'non_striker'], value_name='runs_total')

# create a stacked bar chart
fig = go.Figure()

# add bars for each partnership
for (batter, non_striker) in usa_partnership_pivot.columns:
    partnership_data = usa_partnership_long[(usa_partnership_long['batter'] == batter) & (usa_partnership_long['non_striker'] == non_striker)]
    fig.add_trace(go.Bar(
        x=partnership_data['over'],
        y=partnership_data['runs_total'],
        name=f'{batter} & {non_striker}'
    ))

fig.update_layout(
    title='Partnership Contributions - USA',
    xaxis_title='Over',
    yaxis_title='Runs',
    barmode='stack',
    template='plotly_white',
    legend_title='Partnership',
    legend=dict(
        x=1.05,
        y=1,
        traceorder='normal',
        font=dict(size=10)
    ),
    autosize=False,
    width=900,
    height=600
)

fig.show()

Key partnerships such as SR Taylor & Aaron Jones and NR Kumar & SR Taylor significantly boosted the scoring, particularly in the middle and late overs. However, the contributions are more sporadic compared to India, with several partnerships contributing only marginally.

Key moments for India:

In [24]:
india_cumulative_runs = cup_data[cup_data['team'] == 'India'].groupby('over')['runs_total'].sum().cumsum()
india_wickets_fall = cup_data[(cup_data['team'] == 'India') & cup_data['wickets_0_player_out'].notna()].groupby('over').size().cumsum()
india_key_moments = cup_data[(cup_data['team'] == 'India') & cup_data['wickets_0_player_out'].notna()].reset_index()

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=india_cumulative_runs.index,
    y=india_cumulative_runs.values,
    mode='lines+markers',
    name='India Cumulative Runs',
    line=dict(color='green')
))

fig.add_trace(go.Scatter(
    x=india_wickets_fall.index,
    y=india_cumulative_runs.loc[india_wickets_fall.index],
    mode='markers',
    name='India Wickets',
    marker=dict(color='red', size=10)
))

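for _, row in india_key_moments.iterrows():
    fig.add_annotation(
        x=row['over'],
        y=india_cumulative_runs.loc[row['over']],
        # label with the dismissed player rather than the striker
        text=f"{row['wickets_0_player_out']} ({row['over']})",
        showarrow=True,
        arrowhead=2,
        # axref/ayref make ax/ay data coordinates instead of pixel offsets
        axref='x',
        ayref='y',
        ax=row['over'],
        ay=india_cumulative_runs.loc[row['over']] + 5,
        arrowcolor='black'
    )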

fig.update_layout(
    title='India Key Moments in Innings',
    xaxis_title='Overs',
    yaxis_title='Cumulative Runs',
    template='plotly_white',
    legend_title='India Innings',
    autosize=False,
    width=900,
    height=600
)

fig.show()

Compare the average run rate for both teams:

In [25]:
india_run_rate = cup_data[cup_data['team'] == 'India'].groupby('over')['runs_total'].sum().mean()
usa_run_rate = cup_data[cup_data['team'] == 'United States of America'].groupby('over')['runs_total'].sum().mean()

fig = go.Figure()

fig.add_trace(go.Bar(
    x=['India', 'USA'],
    y=[india_run_rate, usa_run_rate],
    marker_color=['green', 'blue']
))

fig.add_annotation(
    x='India',
    y=india_run_rate,
    text=f"{india_run_rate:.2f}",
    showarrow=False,
    yshift=10
)

fig.add_annotation(
    x='USA',
    y=usa_run_rate,
    text=f"{usa_run_rate:.2f}",
    showarrow=False,
    yshift=10
)

fig.update_layout(
    title='Comparison of Average Run Rate per Over',
    xaxis_title='Team',
    yaxis_title='Average Run Rate per Over',
    template='plotly_white'
)

fig.show()
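Averaging the per-over totals treats a partly bowled final over as a full one, which can understate the rate for a side that finishes its chase mid-over. A hedged alternative is to divide total runs by legal deliveries. The sketch below uses invented figures and assumes that a null in `extras_wides` or `extras_noballs` means the ball was not a wide or a no-ball.

```python
import numpy as np
import pandas as pd

# Hypothetical innings: one full over containing a wide, then a two-ball over.
balls = pd.DataFrame({
    "over":           [0, 0, 0, 0, 0, 0, 0, 1, 1],
    "runs_total":     [1, 1, 1, 1, 1, 1, 1, 3, 3],
    "extras_wides":   [np.nan, np.nan, 1.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    "extras_noballs": [np.nan] * 9,
})

# Approach used in the cell above: mean of the per-over totals.
mean_per_over = balls.groupby("over")["runs_total"].sum().mean()

# Alternative: total runs divided by overs actually bowled (legal balls / 6).
legal_balls = (balls["extras_wides"].isna() & balls["extras_noballs"].isna()).sum()
run_rate = balls["runs_total"].sum() / (legal_balls / 6)

print(mean_per_over, run_rate)
```

The two figures agree only when every over in the innings is complete.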

Comparison of the run rate per over:

In [27]:
india_run_rate_per_over = cup_data[cup_data['team'] == 'India'].groupby('over')['runs_total'].sum()
usa_run_rate_per_over = cup_data[cup_data['team'] == 'United States of America'].groupby('over')['runs_total'].sum()

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=india_run_rate_per_over.index,
    y=india_run_rate_per_over.values,
    mode='lines+markers',
    name='India Run Rate',
    line=dict(color='green')
))

fig.add_trace(go.Scatter(
    x=usa_run_rate_per_over.index,
    y=usa_run_rate_per_over.values,
    mode='lines+markers',
    name='USA Run Rate',
    line=dict(color='blue')
))

fig.update_layout(
    title='Comparison of Run Rate per Over',
    xaxis_title='Overs',
    yaxis_title='Runs',
    template='plotly_white',
    legend_title='Run Rate',
    autosize=False,
    width=1000,
    height=600
)

fig.show()

In summary, India's cricket strategy, characterized by consistent scoring, effective partnerships, and a well-balanced bowling attack, demonstrated superiority over the USA's inconsistent batting performance and less impactful bowling, ultimately leading to their success.