<a href="https://www.kaggle.com/code/dattapadal/eda-t20-world-cup-summary-analysis?scriptVersionId=115574295" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Data Analysis of the T20 World Cup 2022 Summary

In [1]:
#import the necessary libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

pio.templates.default = 'plotly_white'

In [2]:
data = pd.read_csv("/kaggle/input/world-t20-2022-analysis/t20-world-cup-22.csv")
data.head()

| | venue | team1 | team2 | stage | toss winner | toss decision | first innings score | first innings wickets | second innings score | second innings wickets | winner | won by | player of the match | top scorer | highest score | best bowler | best bowling figure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SCG | New Zealand | Australia | Super 12 | Australia | Field | 200.0 | 3.0 | 111.0 | 10.0 | New Zealand | Runs | Devon Conway | Devon Conway | 92.0 | Tim Southee | 3-6 |
| 1 | Optus Stadium | Afghanistan | England | Super 12 | England | Field | 112.0 | 10.0 | 113.0 | 5.0 | England | Wickets | Sam Curran | Ibrahim Zadran | 32.0 | Sam Curran | 5-10 |
| 2 | Blundstone Arena | Ireland | Sri lanka | Super 12 | Ireland | Bat | 128.0 | 8.0 | 133.0 | 1.0 | Sri lanka | Wickets | Kusal Mendis | Kusal Mendis | 68.0 | Maheesh Theekshana | 2-19 |
| 3 | MCG | Pakistan | India | Super 12 | India | Field | 159.0 | 8.0 | 160.0 | 6.0 | India | Wickets | Virat Kohli | Virat Kohli | 82.0 | Hardik Pandya | 3-30 |
| 4 | Blundstone Arena | Bangladesh | Netherlands | Super 12 | Netherlands | Field | 144.0 | 8.0 | 135.0 | 10.0 | Bangladesh | Runs | Taskin Ahmed | Colin Ackermann | 62.0 | Taskin Ahmed | 4-25 |


In [3]:
#check for NaN values
data.isna().sum()

venue                     0
team1                     0
team2                     0
stage                     0
toss winner               3
toss decision             3
first innings score       3
first innings wickets     3
second innings score      3
second innings wickets    3
winner                    4
won by                    4
player of the match       4
top scorer                3
highest score             3
best bowler               3
best bowling figure       3
dtype: int64

In [4]:
filter_ = data['toss winner'].isna()
data[filter_]

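| | venue | team1 | team2 | stage | toss winner | toss decision | first innings score | first innings wickets | second innings score | second innings wickets | winner | won by | player of the match | top scorer | highest score | best bowler | best bowling figure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | MCG | New Zealand | Afghanistan | Super 12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 12 | MCG | Afghanistan | Ireland | Super 12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 13 | MCG | Australia | England | Super 12 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |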


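A few matches were abandoned, so their values are `NaN`. The three rows above have no toss recorded, so every column after `stage` is empty. A fourth match has its toss and both innings recorded but no `winner`, `won by`, or `player of the match`, which is why those columns have 4 missing values instead of 3: that match ended without a result.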
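
To keep abandoned and no-result matches out of the win/loss counts, one option is to split the rows by which columns are missing. The sketch below is a minimal illustration, assuming a small hand-built frame with a few rows copied from the outputs above rather than the full `data` loaded from Kaggle:

```python
import numpy as np
import pandas as pd

# A few rows copied from the outputs above (including the three abandoned
# MCG matches); in the notebook you would use the full `data` DataFrame.
data = pd.DataFrame({
    "venue": ["SCG", "Optus Stadium", "MCG", "MCG", "MCG"],
    "toss winner": ["Australia", "England", np.nan, np.nan, np.nan],
    "winner": ["New Zealand", "England", np.nan, np.nan, np.nan],
})

# No toss recorded: the match was abandoned before it started.
abandoned = data[data["toss winner"].isna()]

# Toss recorded but no winner: the match started but ended without a result.
no_result = data[data["toss winner"].notna() & data["winner"].isna()]

# Only matches with a winner, suitable for the win/loss charts.
completed = data.dropna(subset=["winner"])

print(len(abandoned), len(no_result), len(completed))
```

On the full dataset the same filters should give 3 abandoned rows and 1 no-result row, matching the `isna()` counts above.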

# Number of matches won by each team

In [5]:
#px.histogram counts how many rows list each team as the winner
figure = px.histogram(data, x='winner', title='Number of matches won by each team in T20 World Cup 2022')
figure.show()

* England won the World Cup and also won the most matches.
* India and Pakistan won 4 matches each.
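
The same per-team tallies can be read off as numbers with `value_counts()`. A minimal sketch, assuming a hand-built Series with the winners of the five matches in `data.head()` plus three abandoned matches (`None`), instead of the full `data['winner']` column:

```python
import pandas as pd

# Winners from data.head() plus three abandoned matches (None).
winners = pd.Series(
    ["New Zealand", "England", "Sri lanka", "India", "Bangladesh", None, None, None],
    name="winner",
)

# value_counts() drops missing values by default (dropna=True), so the
# abandoned matches do not appear in the tally.
wins = winners.value_counts()
print(wins)
print(wins.sum())  # 5 decided matches
```

Passing `dropna=False` would add a separate count for the missing winners.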

Let's look at how many matches were won by the team batting first versus the team batting second in the T20 World Cup 2022.

In [6]:
won_by = data['won by'].value_counts()
won_by

Runs       16
Wickets    13
Name: won by, dtype: int64

In [7]:
label = won_by.index
print(label)
counts = won_by.values
print(counts)

Index(['Runs', 'Wickets'], dtype='object')
[16 13]


In [8]:
colors = ['gold', 'lightgreen']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Number of matches won by Runs or Wickets')
fig.update_traces(hoverinfo='label+percent',
                  textinfo='value',
                  textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3))
                 )
fig.show()

So, in the T20 World Cup 2022, 16 matches were won by the team batting first (a winning margin in runs) and 13 by the team chasing (a winning margin in wickets).
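
The runs/wickets rule can be made explicit with a mapping, and checked against the team columns. The sketch below assumes that `team1` is the side that batted first; that holds for all five rows in `data.head()` (checked against the toss columns), but it is an assumption about the dataset, so the code flags any row where it fails:

```python
import pandas as pd

# The five completed matches from data.head() above.
data = pd.DataFrame({
    "team1": ["New Zealand", "Afghanistan", "Ireland", "Pakistan", "Bangladesh"],
    "team2": ["Australia", "England", "Sri lanka", "India", "Netherlands"],
    "winner": ["New Zealand", "England", "Sri lanka", "India", "Bangladesh"],
    "won by": ["Runs", "Wickets", "Wickets", "Wickets", "Runs"],
})

# A margin in runs means the side batting first won;
# a margin in wickets means the chasing side won.
data["result"] = data["won by"].map({"Runs": "Batting first", "Wickets": "Chasing"})

# Assumption: team1 batted first. If so, team1 wins exactly when the margin is in runs.
consistent = (data["winner"] == data["team1"]) == (data["won by"] == "Runs")

print(data["result"].value_counts())
print(consistent.all())  # True for these five rows
```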

Let's look at the toss decisions.

In [9]:
#rename_axis + reset_index(name=...) gives the same column names in pandas 1.x and 2.x
toss = data['toss decision'].value_counts().rename_axis('Toss Decision').reset_index(name='Count')
toss

| | Toss Decision | Count |
|---|---|---|
| 0 | Bat | 17 |
| 1 | Field | 13 |


In [10]:
fig = px.pie(toss, values='Count', names='Toss Decision', title='Toss decisions in T20 World Cup 2022')
fig.update_traces(textposition='inside', textinfo='value+label')
fig.show()

So, in 17 matches the toss winner chose to bat first, and in 13 matches chose to field first. These 30 decisions cover every match except the three abandoned matches with no toss recorded.
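
The pie chart shows raw counts; to read the split as proportions, `value_counts(normalize=True)` divides each count by the total. A minimal sketch that rebuilds the decisions from the counts reported above (17 Bat, 13 Field) rather than reading the CSV:

```python
import pandas as pd

# Rebuild the toss decisions from the counts reported above.
toss_decisions = pd.Series(["Bat"] * 17 + ["Field"] * 13, name="toss decision")

# normalize=True returns the share of each decision instead of the raw count.
shares = toss_decisions.value_counts(normalize=True)
print(shares.round(3))
```

Here 17 of 30 decisions (about 57%) were to bat first.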

Let's look at the top scorers.

In [11]:
figure = px.bar(data,
                x=data['top scorer'],
                y=data['highest score'],
                color=data['highest score'],
                title='Top scorers in t20 world cup 2022'
               )
figure.show()

So, Virat Kohli was the top scorer in 3 matches, more than any other player.
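
In the chart above each player's scores are stacked, so the number of matches has to be read from the bar segments. A `groupby` gives it directly. A minimal sketch, assuming a hand-built frame with the five top scorers from `data.head()` instead of the full dataset:

```python
import pandas as pd

# Top scorer and score for the five matches shown in data.head().
data = pd.DataFrame({
    "top scorer": ["Devon Conway", "Ibrahim Zadran", "Kusal Mendis",
                   "Virat Kohli", "Colin Ackermann"],
    "highest score": [92.0, 32.0, 68.0, 82.0, 62.0],
})

# Per player: number of matches as top scorer, and their best score.
summary = (
    data.groupby("top scorer")["highest score"]
    .agg(matches="count", best="max")
    .sort_values(["matches", "best"], ascending=False)
)
print(summary)
```

On the full dataset, the first row of `summary` should be the player with the most top-scoring matches.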

Now let’s have a look at the number of player of the match awards in the world cup:

In [12]:
figure = px.histogram(data,
                      x='player of the match',
                      title='Player of the Match Awards in T20 World Cup 2022'
                     )
figure.show()

Virat Kohli, Sam Curran, Taskin Ahmed, Suryakumar Yadav, and Shadab Khan each won the Player of the Match award in 2 matches.
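
Because several players share the top count, picking only the first entry of `value_counts()` would hide the tie. A minimal sketch of a helper that returns every tied value; `most_frequent` is a hypothetical name, and the toy Series below is illustrative, not the tournament data:

```python
import pandas as pd


def most_frequent(values: pd.Series) -> pd.Series:
    """Return every value tied for the highest count, not just the first one."""
    counts = values.value_counts()
    return counts[counts == counts.max()]


# Toy example (not the tournament data): "A" and "B" tie with two awards each.
awards = pd.Series(["A", "B", "A", "C", "B"])
print(most_frequent(awards))
```

In the notebook this would be called as `most_frequent(data['player of the match'])`.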

Now let’s see which bowlers returned the best bowling figures in each match.

In [13]:
figure = px.histogram(data,
                      x='best bowler',
                      title='Best bowlers in T20 World Cup 2022'
                     )
figure.show()

Sam Curran was the best bowler in 3 matches, the only player to do so.
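
The `best bowling figure` column stores figures as text in the form "wickets-runs", so it cannot be sorted numerically as-is. A minimal sketch that splits it into two numeric columns and ranks by more wickets, then fewer runs conceded (the usual way bowling figures are compared). It assumes a hand-built frame with the five rows from `data.head()`:

```python
import pandas as pd

# Best bowler and figures for the five matches shown in data.head().
data = pd.DataFrame({
    "best bowler": ["Tim Southee", "Sam Curran", "Maheesh Theekshana",
                    "Hardik Pandya", "Taskin Ahmed"],
    "best bowling figure": ["3-6", "5-10", "2-19", "3-30", "4-25"],
})

# Abandoned matches have no figures; astype(int) would fail on NaN.
data = data.dropna(subset=["best bowling figure"]).copy()

# Split "wickets-runs" into two integer columns.
parts = data["best bowling figure"].str.split("-", expand=True).astype(int)
data["wickets"] = parts[0]
data["runs conceded"] = parts[1]

# More wickets first; among equal wickets, fewer runs conceded first.
ranked = data.sort_values(["wickets", "runs conceded"], ascending=[False, True])
print(ranked[["best bowler", "best bowling figure"]])
```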

Now let’s compare the runs scored in the first and second innings at each venue of the T20 World Cup 2022:

In [14]:
fig = go.Figure()
fig.add_trace(go.Bar(x=data['venue'],
                     y=data['first innings score'],
                     name='First innings runs',
                     marker_color='blue'
                    ))
fig.add_trace(go.Bar(x=data['venue'],
                     y=data['second innings score'],
                     name='Second innings runs',
                     marker_color='red'
                    ))
fig.update_layout(barmode='group',
                  xaxis_tickangle=-45,
                  title="Best Stadiums to Bat First or Chase"
                 )
fig.show()

So SCG was the only venue in the World Cup where batting first was clearly better. At the other venues, batting first or chasing made little difference.
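
Most venues hosted several matches, so in the chart above several bars share each venue's position on the x-axis. Averaging each innings per venue gives one comparable pair of numbers per ground. A minimal sketch, assuming a hand-built frame with the five rows from `data.head()` plus one abandoned MCG match:

```python
import numpy as np
import pandas as pd

# Scores from data.head() plus one abandoned MCG match (NaN).
data = pd.DataFrame({
    "venue": ["SCG", "Optus Stadium", "Blundstone Arena", "MCG",
              "Blundstone Arena", "MCG"],
    "first innings score": [200.0, 112.0, 128.0, 159.0, 144.0, np.nan],
    "second innings score": [111.0, 113.0, 133.0, 160.0, 135.0, np.nan],
})

# Average each innings per venue; mean() skips NaN, so abandoned matches are ignored.
innings = ["first innings score", "second innings score"]
by_venue = data.groupby("venue")[innings].mean()

# Positive difference: the side batting first scored more on average.
by_venue["difference"] = by_venue["first innings score"] - by_venue["second innings score"]
print(by_venue.sort_values("difference", ascending=False))
```

The averaged table can be plotted like the original chart, for example with `px.bar(by_venue.reset_index(), x='venue', y=innings, barmode='group')`.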

Let's compare the number of wickets lost in the first and second innings at each venue of the T20 World Cup 2022.

In [15]:
fig = go.Figure()
fig.add_trace(go.Bar(x=data['venue'],
                     y=data['first innings wickets'],
                     name='First innings wickets',
                     marker_color='blue'
                    ))
fig.add_trace(go.Bar(x=data['venue'],
                     y=data['second innings wickets'],
                     name='Second innings wickets',
                     marker_color='red'
                    ))
fig.update_layout(barmode='group',
                  xaxis_tickangle=-45,
                  title="Best Stadiums to Bowl First or Defend"
                 )
fig.show()

SCG was the best stadium for bowling while defending a target, while Optus Stadium was the best stadium for bowling first.
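
The two-trace `go.Figure` above can also be built from a long-form table, where each row is one (venue, innings) pair; plotly express then draws the grouped bars from a single `color` column. A minimal sketch, assuming a hand-built frame with the wickets from the five rows in `data.head()`, averaged per venue first:

```python
import pandas as pd

# Wickets from the five matches in data.head().
data = pd.DataFrame({
    "venue": ["SCG", "Optus Stadium", "Blundstone Arena", "MCG", "Blundstone Arena"],
    "first innings wickets": [3.0, 10.0, 8.0, 8.0, 8.0],
    "second innings wickets": [10.0, 5.0, 1.0, 6.0, 10.0],
})

wicket_cols = ["first innings wickets", "second innings wickets"]

# Average wickets per venue, then reshape to one row per (venue, innings).
long_form = (
    data.groupby("venue")[wicket_cols]
    .mean()
    .reset_index()
    .melt(id_vars="venue", value_vars=wicket_cols,
          var_name="innings", value_name="wickets")
)
print(long_form)
```

`long_form` could then be passed to `px.bar(long_form, x='venue', y='wickets', color='innings', barmode='group')`.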

# Summary

* England won the most matches.
* Virat Kohli was the top scorer in the most matches.
* Sam Curran had the best bowling figures in the most matches.
* More matches were won by the team batting first (16) than by the team chasing (13).
* More toss winners chose to bat first (17) than to field first (13).
* SCG was the best stadium to bat first.
* SCG was the best stadium to defend a target.
* Optus Stadium was the best stadium to bowl first.

Special thanks to Aman Kharwal for his data analytics tutorials.