# T20 World Cup 2024 Match Analysis

The dataset used here is a ball-by-ball record of both innings of the India vs USA match. It covers player performance, team statistics, and specific events such as dismissals and reviews.

In [24]:
import pandas as pd
import numpy as np 
import seaborn as sns 
import matplotlib.pyplot as plt
import plotly.graph_objects as go

In [3]:
data=pd.read_csv("india-usa_innings_data.csv")

In [4]:
data.head()

,batter,bowler,non_striker,runs_batter,runs_extras,runs_total,wickets_0_player_out,wickets_0_kind,team,over,...,wickets_0_fielders_0_name,review_by,review_umpire,review_batter,review_decision,review_type,extras_legbyes,wickets_0_fielders_1_name,extras_noballs,extras_penalty
0,Shayan Jahangir,Arshdeep Singh,SR Taylor,0,0,0,Shayan Jahangir,lbw,United States of America,0,...,,,,,,,,,,
1,AGS Gous,Arshdeep Singh,SR Taylor,0,0,0,,,United States of America,0,...,,,,,,,,,,
2,AGS Gous,Arshdeep Singh,SR Taylor,0,0,0,,,United States of America,0,...,,,,,,,,,,
3,AGS Gous,Arshdeep Singh,SR Taylor,0,1,1,,,United States of America,0,...,,,,,,,,,,
4,AGS Gous,Arshdeep Singh,SR Taylor,2,0,2,,,United States of America,0,...,,,,,,,,,,


In [7]:
data.tail()

,batter,bowler,non_striker,runs_batter,runs_extras,runs_total,wickets_0_player_out,wickets_0_kind,team,over,...,wickets_0_fielders_0_name,review_by,review_umpire,review_batter,review_decision,review_type,extras_legbyes,wickets_0_fielders_1_name,extras_noballs,extras_penalty
231,SA Yadav,SN Netravalkar,S Dube,0,0,0,,,India,17,...,,,,,,,,,,
232,SA Yadav,SN Netravalkar,S Dube,1,0,1,,,India,17,...,,,,,,,,,,
233,SA Yadav,Ali Khan,S Dube,1,0,1,,,India,18,...,,,,,,,,,,
234,S Dube,Ali Khan,SA Yadav,0,1,1,,,India,18,...,,,,,,,,,,
235,S Dube,Ali Khan,SA Yadav,2,0,2,,,India,18,...,,,,,,,,,,


In [9]:
## checking missing values 
missing_values=data.isnull().sum()
missing_values

batter                         0
bowler                         0
non_striker                    0
runs_batter                    0
runs_extras                    0
runs_total                     0
wickets_0_player_out         225
wickets_0_kind               225
team                           0
over                           0
extras_wides                 231
wickets_0_fielders_0_name    228
review_by                    235
review_umpire                235
review_batter                235
review_decision              235
review_type                  235
extras_legbyes               234
wickets_0_fielders_1_name    235
extras_noballs               235
extras_penalty               235
dtype: int64

In [12]:
## checking data types
data_type=data.dtypes
data_type

batter                        object
bowler                        object
non_striker                   object
runs_batter                    int64
runs_extras                    int64
runs_total                     int64
wickets_0_player_out          object
wickets_0_kind                object
team                          object
over                           int64
extras_wides                 float64
wickets_0_fielders_0_name     object
review_by                     object
review_umpire                 object
review_batter                 object
review_decision               object
review_type                   object
extras_legbyes               float64
wickets_0_fielders_1_name     object
extras_noballs               float64
extras_penalty               float64
dtype: object

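The data has null values in several columns. In ball-by-ball data like this, a null usually carries meaning. A missing wickets_0_player_out means no wicket fell on that delivery, and a missing extras_wides means the ball was not a wide. We therefore leave the nulls as they are and move on.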
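If you prefer to make that meaning explicit before aggregating, you can turn the nulls into boolean flags and zero counts. The minimal sketch below uses a small hand-made frame with the same column names as the match file. Its rows are illustrative and are not taken from the match. As a cross-check against the match file itself, 236 rows minus 225 nulls in wickets_0_player_out gives 11 dismissals.

```python
import pandas as pd

# Illustrative deliveries only; column names follow the dataset above
balls = pd.DataFrame({
    "runs_batter": [0, 0, 4, 0, 1],
    "runs_extras": [0, 1, 0, 5, 0],
    "wickets_0_player_out": ["Batter A", None, None, None, None],
    "extras_wides": [None, 1.0, None, None, None],
    "extras_penalty": [None, None, None, 5.0, None],
})

# A null in the dismissal column means no wicket fell on that ball
balls["is_wicket"] = balls["wickets_0_player_out"].notna()

# A null in an extras column means zero extras of that type; these columns
# are float64 only because NaN cannot be stored in an int64 column
extra_cols = ["extras_wides", "extras_penalty"]
balls[extra_cols] = balls[extra_cols].fillna(0).astype(int)
```

After this step, sums such as balls["is_wicket"].sum() count events directly. The extras columns then hold whole numbers rather than floats.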

In [13]:
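# total runs scored by each batting team ('team' is the side batting on that delivery)
total_runs = data.groupby('team')['runs_total'].sum()

# total wickets lost by each batting team
total_wickets = data['wickets_0_player_out'].notna().groupby(data['team']).sum()

# extras received by each batting team (i.e. conceded by the fielding side)
total_extras = data[['team', 'runs_extras', 'extras_wides', 'extras_noballs', 'extras_legbyes', 'extras_penalty']].groupby('team').sum()

# illegal deliveries: a null in these columns means the ball was not a wide / no-ball
is_wide = data['extras_wides'].notna()
is_noball = data['extras_noballs'].notna()

# runs scored by each batter
batter_runs = data.groupby('batter')['runs_batter'].sum()

# balls faced by each batter (wides do not count as balls faced; no-balls do)
balls_faced = data[~is_wide].groupby('batter').size()

# strike rate of each batter (runs per 100 balls faced)
strike_rate = (batter_runs / balls_faced) * 100

# boundaries (4s and 6s) hit by each batter
boundaries = data[data['runs_batter'].isin([4, 6])].groupby(['batter', 'runs_batter']).size().unstack(fill_value=0)

# wickets credited to each bowler; run outs and similar dismissals are not
# credited to the bowler (kind labels assumed to follow Cricsheet naming)
not_bowler_credited = ['run out', 'retired hurt', 'retired out', 'obstructing the field']
bowler_wicket = data['wickets_0_player_out'].notna() & ~data['wickets_0_kind'].isin(not_bowler_credited)
wickets_taken = bowler_wicket.groupby(data['bowler']).sum()

# runs conceded by each bowler: runs off the bat plus wides and no-balls
# (leg-byes and penalty runs are not charged to the bowler)
bowler_runs = data['runs_batter'] + data['extras_wides'].fillna(0) + data['extras_noballs'].fillna(0)
runs_conceded = bowler_runs.groupby(data['bowler']).sum()

# legal balls bowled by each bowler (wides and no-balls are not legal deliveries)
balls_bowled = data[~is_wide & ~is_noball].groupby('bowler').size()

# economy rate of each bowler (runs conceded per over of six legal balls)
economy_rate = runs_conceded / (balls_bowled / 6)

# dot balls bowled by each bowler
dot_balls = data[data['runs_total'] == 0].groupby('bowler').size()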



In [14]:
# combine all these statistics into dataframes for batters and bowlers
batter_stats = pd.DataFrame({
    'Runs': batter_runs,
    'Balls Faced': balls_faced,
    'Strike Rate': strike_rate,
}).join(boundaries)

bowler_stats = pd.DataFrame({
    'Wickets': wickets_taken,
    'Runs Conceded': runs_conceded,
    'Balls Bowled': balls_bowled,
    'Economy Rate': economy_rate,
    'Dot Balls': dot_balls,
})
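Scorecards usually show a bowler's workload in overs-and-balls notation rather than as a raw ball count. For example, 22 legal balls is written as 3.4 overs. The small helper below converts a ball count into that notation. It is a hypothetical addition for display purposes and is not part of the original notebook.

```python
def balls_to_overs(balls: int) -> str:
    """Format a count of legal deliveries as cricket overs notation, e.g. 22 -> '3.4'."""
    # Six legal balls make one over; the remainder is the balls into the next over
    completed_overs, remaining_balls = divmod(int(balls), 6)
    return f"{completed_overs}.{remaining_balls}"

print(balls_to_overs(24))  # 4.0
print(balls_to_overs(22))  # 3.4
```

To use it, run bowler_stats['Balls Bowled'].apply(balls_to_overs). The helper expects a count of legal deliveries only, excluding wides and no-balls. It also assumes the column has no missing values.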


In [15]:
total_runs

team
India                       111
United States of America    110
Name: runs_total, dtype: int64

In [16]:
total_wickets

team
India                       3
United States of America    8
Name: wickets_0_player_out, dtype: int64
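These two totals are enough to state the result. Under the usual convention, a side that successfully chases wins by the number of wickets it still had in hand. For an eleven-player side, that is 10 minus wickets lost. A side that defends its total wins by the run difference.

The sketch below uses a hypothetical match_result helper. It treats India as the chasing side, since the last rows of the data belong to India's innings.

```python
def match_result(first_team, first_runs, second_team, second_runs, second_wickets_lost):
    """Describe a completed limited-overs result.

    Ignores Super Overs, rain-adjusted (DLS) targets and no-results.
    """
    if second_runs > first_runs:
        # Chasing side reached the target: margin is wickets still in hand
        winner, margin, unit = second_team, 10 - second_wickets_lost, "wicket"
    elif first_runs > second_runs:
        # Side batting first defended its total: margin is the run difference
        winner, margin, unit = first_team, first_runs - second_runs, "run"
    else:
        return "Match tied"
    plural = "" if margin == 1 else "s"
    return f"{winner} won by {margin} {unit}{plural}"

print(match_result("United States of America", 110, "India", 111, 3))
# India won by 7 wickets
```

The numbers passed in are copied from the total_runs and total_wickets outputs. In the notebook, you could pass total_runs['India'] and the other entries directly instead.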

In [17]:
total_extras


team,runs_extras,extras_wides,extras_noballs,extras_legbyes,extras_penalty
India,9,2.0,1.0,1.0,5.0
United States of America,8,7.0,0.0,1.0,0.0
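Because team is the batting side, these rows show extras conceded by the opposition. The component columns should add up to runs_extras:

- India: 2 wides + 1 no-ball + 1 leg-bye + 5 penalty runs = 9.
- USA: 7 + 0 + 1 + 0 = 8.

The check below uses totals copied from the table above. The file has no byes column, so byes are assumed to be zero.

```python
import pandas as pd

# Totals copied from the total_extras output above
extras = pd.DataFrame(
    {
        "runs_extras": [9, 8],
        "extras_wides": [2.0, 7.0],
        "extras_noballs": [1.0, 0.0],
        "extras_legbyes": [1.0, 1.0],
        "extras_penalty": [5.0, 0.0],
    },
    index=pd.Index(["India", "United States of America"], name="team"),
)

components = ["extras_wides", "extras_noballs", "extras_legbyes", "extras_penalty"]
# Sum the component columns row-wise and compare with the recorded total
matches = extras[components].sum(axis=1) == extras["runs_extras"]
print(matches.all())  # True
```

You can run the same comparison on total_extras directly by using it in place of extras.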


Next, let's look at how the runs progressed over the overs.

In [20]:
%pip install plotly

Note: you may need to restart the kernel to use updated packages.


In [23]:
india_runs_progression = data[data['team'] == 'India'].groupby('over')['runs_total'].sum().cumsum()
usa_runs_progression = data[data['team'] == 'United States of America'].groupby('over')['runs_total'].sum().cumsum()

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=india_runs_progression.index,
    y=india_runs_progression.values,
    mode='lines+markers',
    name='India'
))

fig.add_trace(go.Scatter(
    x=usa_runs_progression.index,
    y=usa_runs_progression.values,
    mode='lines+markers',
    name='USA'
))

fig.update_layout(
    title='Runs Progression Over Overs',
    xaxis_title='Overs',
    yaxis_title='Cumulative Runs',
    legend_title='Teams',
    template='plotly_white'
)

fig.show()

 - The graph shows the progression of cumulative runs over the overs for India and the USA in their T20 World Cup match. Both teams scored at a steady rate at first, with India slightly ahead in the early overs. As the innings progressed, the USA gained momentum and briefly took the lead around the middle overs.
 - India then accelerated in the later overs, overtook the USA, and held the lead until the end. The key takeaway is India’s strong finish: steadily raising their run rate in the final overs is what secured the win.
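You can check the "steady start, late acceleration" reading numerically by averaging runs per over within each phase of the innings. This sketch assumes the common T20 split into three phases:

- powerplay: overs 1-6 (0-5 in the zero-based over column)
- middle: overs 7-15 (6-14)
- death: overs 16-20 (15-19)

phase_run_rates is a hypothetical helper, and the example series is made up for illustration.

```python
import pandas as pd

def phase_run_rates(runs_per_over: pd.Series) -> pd.Series:
    """Average runs per over in each T20 phase.

    runs_per_over: runs scored in each over, indexed by the zero-based over number
    (as produced by df.groupby('over')['runs_total'].sum()).
    A partially completed final over is averaged as if it were a full over.
    """
    # Bins (-1, 5], (5, 14], (14, 19] map zero-based overs to the three phases
    phases = pd.cut(
        runs_per_over.index,
        bins=[-1, 5, 14, 19],
        labels=["Powerplay (1-6)", "Middle (7-15)", "Death (16-20)"],
    )
    return runs_per_over.groupby(phases, observed=False).mean()

# Illustrative innings: 6 runs/over, then 7, then 10 at the death
example = pd.Series([6] * 6 + [7] * 9 + [10] * 5, index=range(20))
print(phase_run_rates(example).tolist())  # [6.0, 7.0, 10.0]
```

In the notebook, the input for each team is its per-over totals before the cumulative sum. For India, that is data[data['team'] == 'India'].groupby('over')['runs_total'].sum().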

In [25]:
india_wickets = data[(data['team'] == 'India') & data['wickets_0_player_out'].notna()].groupby('over').size()
usa_wickets = data[(data['team'] == 'United States of America') & data['wickets_0_player_out'].notna()].groupby('over').size()


In [26]:
fig = go.Figure()

fig.add_trace(go.Bar(
    x=india_wickets.index,
    y=india_wickets.values,
    name='India',
    marker_color='blue',
    opacity=0.7
))

fig.add_trace(go.Bar(
    x=usa_wickets.index,
    y=usa_wickets.values,
    name='USA',
    marker_color='red',
    opacity=0.7
))

fig.update_layout(
    title='Wickets Timeline',
    xaxis_title='Overs',
    yaxis_title='Number of Wickets',
    barmode='group',
    template='plotly_white',
    legend_title='Teams'
)

fig.show()

The wickets timeline graph shows how many wickets fell in each over of each team's innings. The team column is the batting side, so these are wickets lost.
 - The USA lost wickets more often (8 in total, against India's 3), especially early on: two wickets fell in the first over, and further wickets followed throughout their innings. India's wickets were more spread out. They lost a couple early but built longer partnerships in the middle overs. The USA's frequent wickets disrupted their momentum. India avoided clusters of wickets, which let them keep a steady scoring rate and ultimately secure the win.
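You can make the idea of "clusters" concrete with two measures: the overs in which a team lost two or more wickets, and the longest run of overs without a wicket. The sketch below assumes a per-over wicket count like india_wickets or usa_wickets. These are indexed by zero-based over number and include only overs where at least one wicket fell. wicket_clusters is a hypothetical helper, and the example values are illustrative rather than the match's figures.

```python
import pandas as pd

def wicket_clusters(wickets_per_over: pd.Series, overs_batted: int) -> dict:
    """Summarise how wickets were bunched within an innings.

    wickets_per_over: wickets lost per over, indexed by zero-based over number;
    overs without a wicket may be missing (as with groupby(...).size()).
    overs_batted: number of overs in the innings, counting a partial last over.
    """
    # Fill in the overs where no wicket fell so every over is represented
    full = wickets_per_over.reindex(range(overs_batted), fill_value=0)

    # Overs in which two or more wickets fell
    multi_wicket_overs = full[full >= 2].index.tolist()

    # Longest stretch of consecutive overs without a wicket
    longest_gap = current = 0
    for count in full:
        current = current + 1 if count == 0 else 0
        longest_gap = max(longest_gap, current)

    return {"multi_wicket_overs": multi_wicket_overs, "longest_wicketless_stretch": longest_gap}

example = pd.Series([2, 1, 1], index=[0, 3, 9])
print(wicket_clusters(example, overs_batted=20))
# {'multi_wicket_overs': [0], 'longest_wicketless_stretch': 10}
```

You can compute the innings length from the data rather than assuming it. For example, use data.loc[data['team'] == 'India', 'over'].max() + 1 as overs_batted for India.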