# Cricket Analytics Masterclass: IPL Edition (2023)

Explore the exciting world of the Indian Premier League (IPL) through this data analysis project! Discover which teams shine the brightest, who the star players are, and how much winning the toss really matters. Dive into venue-specific strengths, explore margins of victory, and see which umpires have officiated the most matches, including finals, in this cricket extravaganza. It's a fascinating journey through IPL's rich history from 2008 to 2022!

## About the Author

**HASHMI MOHSIN BHATT**

**B.TECH CSE_AI**

- GitHub: https://github.com/HASHMI2503/IPL-Performance-Analysis (Same Project)

Feel free to connect and reach out! Feedback is welcome.


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
ipl = pd.read_csv('/kaggle/input/ipl-2008-to-2021-all-match-dataset/IPL_Matches_2008_2022.csv')

### Data Check-in: 7 Quick Questions

1. **Size of the Squad?**
   - How many matches (rows) and attributes (columns) are we dealing with?

2. **First Impressions?**
   - Sneak peek at the starting lineup (first few rows).

3. **What's in the Mix?**
   - Quick rundown on the types of data we've got.

4. **Any Missing Players?**
   - Spotted any gaps in the team? Check for missing values.

5. **Math in Action?**
   - What do the numbers tell us? Quick summary statistics on the numeric columns.

6. **Seeing Double?**
   - Any matches recorded twice? Look out for duplicated rows.

7. **Team Chemistry?**
   - Do certain stats play well together? Check the correlation.

Just 7 chill questions to vibe with our dataset! 🚀
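The seven questions can also be bundled into one reusable helper, handy when the same check-in is repeated on another season's file. This is a minimal sketch: `data_check_in` is a hypothetical name, the toy frame below is made up (not the IPL data), and `numeric_only` in `DataFrame.corr` needs pandas 1.5 or newer.

```python
import pandas as pd


def data_check_in(df: pd.DataFrame) -> dict:
    """Answer the seven check-in questions for any DataFrame in one pass."""
    return {
        "shape": df.shape,                          # 1. rows and columns
        "head": df.head(),                          # 2. first few rows
        "dtypes": df.dtypes,                        # 3. column types
        "missing": df.isnull().sum(),               # 4. nulls per column
        "summary": df.describe(),                   # 5. summary statistics (numeric columns)
        "duplicates": int(df.duplicated().sum()),   # 6. fully duplicated rows
        "correlation": df.corr(numeric_only=True),  # 7. numeric columns only
    }


# Usage on a tiny invented frame; rows 0 and 3 are identical on purpose
toy = pd.DataFrame({
    "Team": ["A", "B", "A", "A"],
    "Margin": [10, 5, None, 10],
    "Season": [2008, 2008, 2009, 2008],
})
report = data_check_in(toy)
print(report["shape"])               # (4, 3)
print(report["missing"]["Margin"])   # 1
print(report["duplicates"])          # 1
```

On the notebook's data this would simply be `data_check_in(ipl)`.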


In [None]:
ipl.shape

In [None]:
ipl.head()

In [None]:
ipl.info()

In [None]:
ipl.isnull().sum()

In [None]:
ipl.describe()

In [None]:
ipl.duplicated().sum()

In [None]:
# Most columns are categorical, so only the numeric columns enter the correlation matrix
# (the numeric_only argument requires pandas >= 1.5)
ipl.corr(numeric_only=True)

### Key Observations & Next Steps:

**Data Size:**
- Medium-sized dataset with 950 rows and 19 columns.

**Data Types:**
- Opportunity to convert 'TossDecision' and 'SuperOver' (categorical labels) and 'Margin' (stored as float because of its nulls) to integers for clarity.

**Null Values:**
- Presence of null values in some columns.
- **Handling Null Values:**
  - → Focus on filling null values, especially in critical columns.
  - → Explore strategies for imputing missing values.

**Dropping Column:**
- → Consider dropping 'method' due to its high null count.
- → Assess the impact on overall data quality.

**Duplicates:**
- No duplicated rows found in the dataset.
- → Keep an eye out for potential duplicated entries.
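Deciding what counts as a "high null count" is easier with the share of missing values per column in front of you. A minimal sketch: the 50% cut-off and both helper names are assumptions for illustration, not a rule taken from this notebook, and the toy values are invented.

```python
import pandas as pd


def null_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, highest first."""
    return df.isnull().mean().sort_values(ascending=False)


def columns_to_drop(df: pd.DataFrame, threshold: float = 0.5) -> list:
    """Columns whose share of nulls exceeds `threshold` (an illustrative cut-off)."""
    shares = null_share(df)
    return shares[shares > threshold].index.tolist()


# Toy frame loosely shaped like the IPL columns; the values are made up
toy = pd.DataFrame({
    "City": ["Mumbai", None, "Chennai", "Delhi"],
    "method": [None, None, "D/L", None],
    "Margin": [10, 5, 3, None],
})
print(null_share(toy))
print(columns_to_drop(toy))  # ['method']
```

Running `null_share(ipl)` before the cleaning cells shows how far 'method' stands apart from the other columns.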


### Convert Columns to Integers:

- 'TossDecision'
- 'Margin'
- 'SuperOver'
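`Series.map` silently turns any label missing from the mapping into NaN, and `astype(int)` then fails with a generic cast error. A small guard makes such surprises explicit. `map_to_int` is a hypothetical helper; passing `fill_label='N'` mirrors the notebook's choice of treating a missing SuperOver value as 0.

```python
import pandas as pd


def map_to_int(series: pd.Series, mapping: dict, fill_label=None) -> pd.Series:
    """Map labels to integers, raising if any value is not covered by `mapping`."""
    if fill_label is not None:
        # Treat missing entries as this label (e.g. 'N' for SuperOver)
        series = series.fillna(fill_label)
    mapped = series.map(mapping)
    # Anything still NaN was either missing or an unexpected label
    unknown = series[mapped.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped values: {list(unknown)}")
    return mapped.astype(int)


toss = pd.Series(["bat", "field", "field"])
print(map_to_int(toss, {"bat": 1, "field": 0}).tolist())  # [1, 0, 0]

super_over = pd.Series(["N", None, "Y"])
print(map_to_int(super_over, {"N": 0, "Y": 1}, fill_label="N").tolist())  # [0, 0, 1]
```

A typo such as `'Field'` in the data would now raise a `ValueError` naming the offending label instead of failing inside `astype`.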


In [None]:
# Convert 'TossDecision' column to integers (map 'bat' to 1 and 'field' to 0)
ipl['TossDecision'] = ipl['TossDecision'].map({'bat': 1, 'field': 0}).astype(int)

In [None]:
# Convert 'SuperOver' column to integers (map 'N' to 0 and 'Y' to 1), fill null values with 0
ipl['SuperOver'] = ipl['SuperOver'].map({'N': 0, 'Y': 1}).fillna(0).astype(int)

In [None]:
# Convert 'Margin' column to integers, fill null values with 0
ipl['Margin'] = ipl['Margin'].fillna(0).astype(int)

In [None]:
ipl.sample(10)

In [None]:
ipl.info()

### Fill Nulls in 'City' Column:

Null values in the 'City' column are filled with 'Dubai'. This is a blanket choice rather than a per-row lookup, so the cell after the fill lists every 'Dubai' row to check that the venues match.
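A more targeted alternative, sketched here as an assumption rather than the method used in this notebook, is to fill each missing City with the City most often recorded for the same Venue. `fill_city_from_venue` is a hypothetical helper, and the toy rows (venue names included) are invented for illustration.

```python
import pandas as pd


def fill_city_from_venue(df: pd.DataFrame) -> pd.Series:
    """Fill missing City values with the most common City seen for the same Venue.

    Rows whose Venue never appears with a known City stay NaN, so they can be
    inspected or filled with a constant (such as 'Dubai') afterwards.
    """
    known = df.dropna(subset=["City"])
    # One City per Venue: the most frequent one among rows where City is known
    venue_to_city = known.groupby("Venue")["City"].agg(lambda s: s.mode().iloc[0])
    return df["City"].fillna(df["Venue"].map(venue_to_city))


toy = pd.DataFrame({
    "Venue": ["Sharjah Cricket Stadium", "Sharjah Cricket Stadium",
              "Dubai International Cricket Stadium"],
    "City": ["Sharjah", None, None],
})
print(fill_city_from_venue(toy).tolist())  # ['Sharjah', 'Sharjah', nan]
```

Anything left NaN after this step has no City anywhere in the data for its venue, which is exactly the set of rows a constant fill should cover.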


In [None]:
# Fill null values in 'City' column with 'Dubai'
ipl['City'] = ipl['City'].fillna('Dubai')  # assign back; inplace fillna on a column selection is unreliable in pandas 2.x

In [None]:
ipl[ipl['City'] == 'Dubai']

In [None]:
# Fill null values in 'WinningTeam' column with 'None'
ipl['WinningTeam'] = ipl['WinningTeam'].fillna('None')

In [None]:
# Drop the 'method' column
ipl.drop(columns=['method'], inplace=True)

In [None]:
# Fill null values in 'Player_of_Match' column with 'None'
ipl['Player_of_Match'] = ipl['Player_of_Match'].fillna('None')

In [None]:
ipl.isnull().sum()

### Handling Null Values:

All null values in the dataset have been addressed:

- 'City' column: Filled with 'Dubai' for consistency.
- 'SuperOver' column: Mapped 'N' to 0 and 'Y' to 1; nulls filled with 0.
- 'Margin' column: Filled with 0.
- 'Season' column: No null values; left unchanged.
- 'Player_of_Match' column: Filled with 'None'.
- 'WinningTeam' column: Filled with 'None'.
- 'method' column: Dropped because of its high null count.

### The dataset is now free of null values, enhancing its completeness and suitability for analysis.


---


# Let the Analysis and Visualization Begin



In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


### 1. Team Performance Analysis:



In [None]:
# Unique team names
pd.unique(pd.concat([ipl['Team1'], ipl['Team2']]))

In [None]:
# Team-wise wins, total matches played, and losses
all_teams = pd.concat([ipl['Team1'], ipl['Team2']])  # every appearance, as Team1 or Team2

team_records = pd.DataFrame(index=all_teams.unique())
team_records['Total Played'] = all_teams.value_counts()
team_records['Wins'] = ipl['WinningTeam'].value_counts()
# 'Losses' also counts no-result matches, where WinningTeam was filled with 'None'
team_records['Losses'] = team_records['Total Played'] - team_records['Wins']
team_records
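Because 'Losses' in the table above also absorbs no-result matches, a win percentage over decided matches gives a fairer comparison, especially for teams that played only a few seasons. A minimal sketch, assuming no-results carry the 'None' label from the cleaning step; `team_win_table` and the toy teams are illustrative only.

```python
import pandas as pd


def team_win_table(df: pd.DataFrame, no_result_label: str = "None") -> pd.DataFrame:
    """Played / Wins / No Result / Losses / Win % per team."""
    appearances = pd.concat([df["Team1"], df["Team2"]])
    no_result = df[df["WinningTeam"] == no_result_label]

    table = pd.DataFrame({"Played": appearances.value_counts()})
    # reindex drops the 'None' pseudo-team and gives winless teams a 0
    table["Wins"] = df["WinningTeam"].value_counts().reindex(table.index, fill_value=0)
    table["No Result"] = (
        pd.concat([no_result["Team1"], no_result["Team2"]])
        .value_counts()
        .reindex(table.index, fill_value=0)
    )
    table["Losses"] = table["Played"] - table["Wins"] - table["No Result"]
    # Win % over matches that actually produced a result
    decided = table["Played"] - table["No Result"]
    table["Win %"] = (100 * table["Wins"] / decided).round(1)
    return table.sort_values("Win %", ascending=False)


toy = pd.DataFrame({
    "Team1": ["MI", "CSK", "MI", "GT"],
    "Team2": ["CSK", "GT", "GT", "MI"],
    "WinningTeam": ["MI", "GT", "None", "GT"],
})
print(team_win_table(toy))
```

On the real data, `team_win_table(ipl)` ranks teams by how often they win rather than by how long they have been in the league.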


In [None]:
# Sorting the teams by total matches played
sorted_teams_by_matches_played = team_records.sort_values(by='Total Played', ascending=False)

# Bar plot for total matches played by each team with numbers
plt.figure(figsize=(16, 8))
bar_plot = sns.barplot(x=sorted_teams_by_matches_played.index, y=sorted_teams_by_matches_played['Total Played'])
plt.xticks(rotation=90)
plt.title('Total Matches Played by Each Team')

# Add annotations with numbers
for index, value in enumerate(sorted_teams_by_matches_played['Total Played']):
    bar_plot.text(index, value, str(value), ha='center', va='bottom')

plt.show()


In [None]:
# Stacked bar plot for wins and losses; figsize goes to DataFrame.plot, which creates its own figure
stacked_bar_plot = team_records[['Wins', 'Losses']].sort_values(by='Wins', ascending=False).plot(kind='bar', stacked=True, figsize=(12, 6))

# Add annotations with numbers
for container in stacked_bar_plot.containers:
    stacked_bar_plot.bar_label(container, fmt='%d', label_type='edge', fontsize=10, color='black')

plt.xticks(rotation=90)
plt.title('Team-wise Wins and Losses')
plt.show()


In [None]:
# Total toss Wins
toss_wins = ipl['TossWinner'].value_counts().sort_values(ascending=False)
toss_wins

In [None]:
# Plotting
plt.figure(figsize=(12, 6))
toss_wins.plot(kind='bar', color='purple')

# Adding numbers on top of each bar
for index, value in enumerate(toss_wins):
    plt.text(index, value + 0.1, str(value), ha='center', va='bottom')
    
    
plt.title('Total Toss Wins by Teams')
plt.xlabel('Teams')
plt.ylabel('Number of Toss Wins')
plt.xticks(rotation=45, ha='right')
plt.show()


In [None]:
# Winners of the finals for each season
ipl[ipl['MatchNumber'] == 'Final'][['Season', 'WinningTeam']].sort_values('Season').reset_index(drop=True)


In [None]:
# Number of times each team won the final
final_wins = ipl[ipl['MatchNumber'] == "Final"]['WinningTeam'].value_counts()
final_wins

In [None]:
# Plotting
plt.figure(figsize=(12, 6))
bar_plot = final_wins.plot(kind='bar', color='red')

# Adding numbers on top of each bar
for index, value in enumerate(final_wins):
    plt.text(index, value + 0.1, str(value), ha='center', va='bottom')

plt.title('Number of Times Each Team Won the Final')
plt.xlabel('Teams')
plt.ylabel('Number of Final Wins')
plt.xticks(rotation=45, ha='right')
plt.show()


### Ee Sala Cup Namde ("This year, the cup is ours")

### 2. Player Performance Analysis:

In [None]:
# Top 10 players with the most "Player of the Match" awards
top_players = ipl['Player_of_Match'].value_counts().head(10)
top_players

In [None]:
# Plotting
plt.figure(figsize=(12, 6))
bar_plot = top_players.plot(kind='bar', color='royalblue')

# Adding numbers on top of each bar
for index, value in enumerate(top_players):
    plt.text(index, value + 0.1, str(value), ha='center', va='bottom')

plt.title('Top 10 Players with the Most "Player of the Match" Awards')
plt.xlabel('Players')
plt.ylabel('Number of Awards')
plt.xticks(rotation=45, ha='right')
plt.show()


### 3. Venue Analysis:

In [None]:
# Display team-wise win count at each venue
team_venue_wins = ipl.groupby(['Venue', 'WinningTeam']).size().unstack().fillna(0)
team_venue_wins


In [None]:
# Plotting the heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(team_venue_wins, cmap='coolwarm', annot=True, fmt='g', cbar_kws={'label': 'Number of Wins'})
plt.title('Team-wise Wins at Each Venue')
plt.show()

### 4. Team Wins Analysis:

In [None]:
# Grouping data by season and team to get the count of wins
season_team_wins = ipl.groupby(['Season', 'WinningTeam']).size().unstack(fill_value=0)

In [None]:
# Plotting
plt.figure(figsize=(15, 8))
heatmap = sns.heatmap(season_team_wins, cmap='YlGnBu', annot=True, fmt='g', cbar_kws={'label': 'Number of Wins'})
plt.title('Team-wise Wins in Each Season')
plt.xlabel('Teams')
plt.ylabel('Seasons')
plt.show()


In [None]:
# Team that won the most matches in each season (on a tie, idxmax keeps the first team alphabetically)
season_most_wins = ipl.groupby(['Season', 'WinningTeam']).size().groupby('Season').idxmax().reset_index(name='Most Wins')

# Team that won the final in each season
final_winners = ipl[ipl['MatchNumber'] == 'Final'][['Season', 'WinningTeam']].reset_index(drop=True)

# Merge the two DataFrames on 'Season'
season_wise_results = pd.merge(season_most_wins, final_winners, on='Season', how='left')

# Extract the team name from the 'Most Wins' column
season_wise_results['Most Wins'] = season_wise_results['Most Wins'].apply(lambda x: x[1])

season_wise_results


In [None]:
# Instances where the team with most wins and final winning team are the same
consistent_winners_df = season_wise_results[season_wise_results['Most Wins'] == season_wise_results['WinningTeam']]
consistent_winners_df


In [None]:
# Percentage of seasons in which the team with the most wins also won the final
total_seasons = season_wise_results.shape[0]
consistent_winners_count = consistent_winners_df.shape[0]
consistent_winners_ratio = consistent_winners_count / total_seasons * 100

total_seasons, consistent_winners_count, consistent_winners_ratio


In [None]:
# Plotting
labels = ['Consistent Winners', 'Non-Consistent Winners']
sizes = [consistent_winners_count, total_seasons - consistent_winners_count]
colors = ['gold', 'lime']

plt.figure(figsize=(8, 8))
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90)
plt.title('Ratio of Consistent Winners to Total Seasons')
plt.show()


### 5. Toss Impact:

In [None]:
# Number of tosses won by each team
toss_counts = ipl['TossWinner'].value_counts()

# Matches where the toss winner also won the match, per team
toss_and_match_winner = ipl[ipl['TossWinner'] == ipl['WinningTeam']]['TossWinner'].value_counts()

# Combine both into a DataFrame
toss_impact_analysis = pd.DataFrame({'Tosses_Won': toss_counts, 'Matches_Won_After_Winning_Toss': toss_and_match_winner})

# Ratio
toss_impact_analysis['Win_Ratio'] = toss_impact_analysis['Matches_Won_After_Winning_Toss'] / toss_impact_analysis['Tosses_Won']

toss_impact_analysis
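The per-team ratios can be complemented by one league-wide number: how often the toss winner goes on to win, split by whether they chose to bat or field. A sketch assuming the 1 = bat / 0 = field encoding applied earlier and the 'None' label for no-results; `toss_win_rate` and the toy rows are hypothetical.

```python
import pandas as pd


def toss_win_rate(df: pd.DataFrame, no_result_label: str = "None") -> pd.Series:
    """Share of decided matches won by the toss winner, overall and by toss decision."""
    decided = df[df["WinningTeam"] != no_result_label]
    toss_won_match = decided["TossWinner"] == decided["WinningTeam"]
    # Group the True/False outcomes by the (decoded) toss decision
    decision = decided["TossDecision"].map({1: "bat", 0: "field"})
    by_decision = toss_won_match.groupby(decision).mean()
    return pd.concat([pd.Series({"overall": toss_won_match.mean()}), by_decision])


toy = pd.DataFrame({
    "TossWinner":   ["MI", "CSK", "GT", "MI", "RR"],
    "TossDecision": [1, 0, 0, 0, 1],
    "WinningTeam":  ["MI", "GT", "GT", "None", "RR"],
})
print(toss_win_rate(toy))
```

A value close to 0.5 on `toss_win_rate(ipl)` would suggest that winning the toss gives little edge on its own.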

In [None]:
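# Plotting: side-by-side bars, because matches won after winning the toss are a subset of tosses won
# (stacking would add the two counts together). figsize is passed to DataFrame.plot, which
# otherwise opens its own figure and leaves a separate plt.figure() empty.
ax = toss_impact_analysis[['Tosses_Won', 'Matches_Won_After_Winning_Toss']].sort_values(by='Tosses_Won', ascending=False).plot(kind='bar', stacked=False, color=['gold', 'fuchsia'], figsize=(12, 6))

# Add annotations with numbers
for container in ax.containers:
    ax.bar_label(container, fmt='%d', label_type='edge', fontsize=10, color='black')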

plt.title('Toss Impact Analysis')
plt.xlabel('Teams')
plt.ylabel('Count')
plt.show()


### 6. Margin Analysis:

In [None]:
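# Top 10 matches by raw 'Margin'. Margin is in runs for some wins and in wickets (at most 10) for others,
# so this list effectively shows the biggest wins by runs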
margin_analysis = ipl[['Team1', 'Team2', 'WinningTeam', 'Margin']]
margin_analysis.nlargest(10, 'Margin')


In [None]:
# Average winning margin for each team (runs and wickets are averaged together, so treat this as a rough indicator)
average_margin = ipl.groupby('WinningTeam')['Margin'].mean().sort_values(ascending=False)
average_margin 

In [None]:
# Plotting
plt.figure(figsize=(10, 8))
average_margin.plot(kind='barh', color='darkolivegreen')
plt.title('Average Winning Margin for Each Team')
plt.xlabel('Average Winning Margin')
plt.ylabel('Teams')
plt.grid(axis='x')
plt.show()
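Because 'Margin' mixes runs and wickets, keeping the two units apart gives more meaningful averages. The sketch below assumes the dataset has a 'WonBy' column labelling each result as 'Runs' or 'Wickets'; that column is not used anywhere above, so check `ipl.columns` first. The helper name and toy data are hypothetical.

```python
import pandas as pd


def average_margin_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Average winning margin per team, kept separate for runs and wickets.

    Assumes a 'WonBy' column whose values include 'Runs' and 'Wickets';
    rows with any other value (e.g. super-over results) are left out.
    """
    decided = df[df["WonBy"].isin(["Runs", "Wickets"])]
    return (
        decided.groupby(["WinningTeam", "WonBy"])["Margin"]
        .mean()
        .unstack("WonBy")   # one column per unit
        .round(2)
    )


toy = pd.DataFrame({
    "WinningTeam": ["RCB", "RCB", "CSK", "CSK", "MI"],
    "WonBy": ["Runs", "Runs", "Wickets", "Runs", "SuperOver"],
    "Margin": [140, 20, 7, 30, 0],
})
print(average_margin_by_type(toy))
```

Teams that never won in one of the two ways simply get NaN in that column.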


### 7. Umpire Analysis:

In [None]:
# Top 5 umpires by number of matches officiated (as Umpire1 or Umpire2)
pd.concat([ipl['Umpire1'], ipl['Umpire2']]).value_counts().head(5)


In [None]:
final_matches = ipl[ipl['MatchNumber'] == 'Final']

# Umpires who officiated the most finals (top 5)
pd.concat([final_matches['Umpire1'], final_matches['Umpire2']]).value_counts().head(5)


In [None]:
# For umpires who officiated more than 20 matches, find the team that won most often in their games
umpires_combined = pd.concat([ipl['Umpire1'], ipl['Umpire2']])
umpire_counts = umpires_combined.value_counts()
umpires_over_20_matches = umpire_counts[umpire_counts > 20].index

umpires_data = []

for umpire in umpires_over_20_matches:
    matches_with_umpire = ipl[(ipl['Umpire1'] == umpire) | (ipl['Umpire2'] == umpire)]
    win_counts = matches_with_umpire['WinningTeam'].value_counts()
    team_with_most_wins = win_counts.idxmax()  # never empty: every umpire here has > 20 matches
    umpires_data.append([umpire, team_with_most_wins, len(matches_with_umpire), win_counts.max()])

# List to DataFrame
pd.DataFrame(umpires_data, columns=['Umpire', 'Team_with_Most_Wins', 'Matches_Officiated', 'Wins'])
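A team topping an umpire's list may simply be the team that wins most often overall. One hedged way to check is to compare each team's win rate in that umpire's matches with its overall win rate. `umpire_team_win_rates` is a hypothetical helper, and the umpire names in the toy frame are placeholders, not real officials.

```python
import pandas as pd


def umpire_team_win_rates(df: pd.DataFrame, umpire: str) -> pd.DataFrame:
    """Each team's win rate in matches officiated by `umpire` next to its overall win rate."""

    def win_rates(matches: pd.DataFrame) -> pd.Series:
        played = pd.concat([matches["Team1"], matches["Team2"]]).value_counts()
        # reindex drops the 'None' no-result label and fills winless teams with 0
        wins = matches["WinningTeam"].value_counts().reindex(played.index, fill_value=0)
        return wins / played

    officiated = df[(df["Umpire1"] == umpire) | (df["Umpire2"] == umpire)]
    return pd.DataFrame({
        "With Umpire": win_rates(officiated),
        "Overall": win_rates(df),
    }).dropna().round(2)  # dropna removes teams that never played under this umpire


toy = pd.DataFrame({
    "Team1": ["MI", "MI", "CSK", "MI"],
    "Team2": ["CSK", "RCB", "RCB", "CSK"],
    "WinningTeam": ["MI", "MI", "CSK", "CSK"],
    "Umpire1": ["Umpire A", "Umpire A", "Umpire B", "Umpire B"],
    "Umpire2": ["Umpire C", "Umpire C", "Umpire A", "Umpire C"],
})
print(umpire_team_win_rates(toy, "Umpire A"))
```

If the two columns are close for a team, its "most wins" under an umpire is more likely a volume effect than anything about the officiating.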


---


# Conclusion

## Team Performance:

- **Mumbai Indians** stand out as the most successful team with the highest number of overall wins and final victories.
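- **Kolkata Knight Riders** and **Chennai Super Kings** also have strong records, consistently securing wins.
- **Gujarat Titans**, **Lucknow Super Giants**, and **Pune Warriors** have relatively few wins. For Gujarat Titans and Lucknow Super Giants, both of which debuted in 2022, this mostly reflects the small number of matches played (Gujarat Titans won the 2022 final), so raw win counts understate them.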
- Total unique teams played in IPL: **19**

## Player Performance:

- **AB de Villiers** (15), **Chris Gayle** (14), and **Rohit Sharma** (13) emerge as the top players, frequently earning the "Player of the Match" awards.

## Toss Wins:

- **Mumbai Indians** have won the most tosses. Since the toss is a coin flip, this reflects luck and the number of matches played rather than a strategic edge.
- Highest number of toss wins: **Mumbai Indians (98)**

## Venue Analysis:

- Teams exhibit venue-specific strengths, with **Mumbai Indians** excelling at Wankhede Stadium.

## Margin Analysis:

- Among the 5 largest winning margins, **Royal Challengers Bangalore** has three instances, underlining their dominance in high-margin victories.
- Highest average winning margins (runs and wickets combined in the single 'Margin' column):
  - **Chennai Super Kings (CSK)**: 20.37
  - **Lucknow Super Giants (LSG)**: 20.11
  - **Mumbai Indians (MI)**: 19.62

## Umpire Performance:

- **S Ravi** has officiated the most IPL matches in the dataset, with 131 appearances.
- **SJA Taufel** (5), **Nitin Menon** (4), and **HDPK Dharmasena** (4) have officiated the most finals.
- **S Ravi**, **AK Chaudhary**, and **HDPK Dharmasena** have officiated in over 20 matches each, and **Mumbai Indians** are the team with the most wins in the matches each of them officiated. This likely mirrors Mumbai Indians' high overall win count rather than anything about the umpires.
