# 🏏 IPL Player Performance Analysis (2008–2024)


## 1. Introduction

The Indian Premier League (IPL) is one of the most exciting cricket tournaments in the world, bringing together top players from around the globe. In this project, I performed Exploratory Data Analysis (EDA) on a dataset containing season-wise statistics of IPL players.

The goal is to extract meaningful insights about:
- Top-performing batsmen and bowlers
- Year-wise player trends
- Strike rates, averages, economies
- All-rounders and fielding performance

Tools used: Python, Pandas, Matplotlib



In [None]:
import pandas as pd
df=pd.read_csv("data/ipl_data.csv")
print(df)

## 2. Dataset Overview


In [None]:
print("Shape:", df.shape)
print(df.head())
df.info()
df.describe()

## 3. Data Cleaning


#### 1. dtype: object to float


In [None]:
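float_cols=["Batting_Strike_Rate","Batting_Average","Bowling_Average","Economy_Rate","Bowling_Strike_Rate"]
for col in float_cols:  # avoid naming a variable "list", which shadows the Python built-in
    # errors='coerce' turns unparseable entries into NaN
    df[col]=pd.to_numeric(df[col],errors='coerce').astype("float")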

#### 2. dtype: object to int

In [None]:
list2=["Year","Matches_Batted","Not_Outs","Runs_Scored","Balls_Faced","Centuries","Half_Centuries","Fours","Sixes","Catches_Taken","Stumpings","Matches_Bowled","Balls_Bowled","Runs_Conceded","Wickets_Taken","Four_Wicket_Hauls","Five_Wicket_Hauls"]
for col in list2:
    df[col]=pd.to_numeric(df[col],errors='coerce').astype("Int64")
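
As a small illustration of what the two conversions above do to messy values, here is a sketch on a toy Series. The raw strings are made up for the example and are only an assumption about what unparseable entries in the file might look like.

```python
import pandas as pd

# Toy values standing in for a raw column; the real file's contents may differ.
raw = pd.Series(["45", "12", "-", "No stats", None])

# errors="coerce" replaces anything unparseable with NaN instead of raising.
as_float = pd.to_numeric(raw, errors="coerce")

# "Int64" (capital I) is the pandas nullable integer dtype, so gaps stay as <NA>.
as_int = as_float.astype("Int64")

print(as_int)
print(int(as_int.isna().sum()), "values could not be parsed")
```

Counting the coerced values this way gives a quick sense of how much data each column loses during cleaning.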

#### 3. Cleaning ["Highest_Score"]

In [None]:
df["Highest_Score"] = pd.to_numeric(
    df["Highest_Score"]
    .astype(str)                              # ensure it's string
    .str.extract(r'(\d+)', expand=False),     # extract only the number part (as a Series)
    errors='coerce'                           # rows without digits become NaN
).astype("Int64")                             # convert to integer (nullable)
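
The digit extraction can be checked on a few sample strings. The sketch below uses hypothetical raw values and assumes that a trailing `*` marks a not-out innings, which is the usual cricket notation but is not confirmed for this dataset. The `not_out` flag is an extra column introduced only for illustration.

```python
import pandas as pd

# Hypothetical raw values; "*" is assumed to mark a not-out innings.
scores = pd.Series(["113*", "75", "No stats", None])
as_text = scores.astype(str)

# expand=False returns a Series rather than a one-column DataFrame.
digits = as_text.str.extract(r"(\d+)", expand=False)
highest = pd.to_numeric(digits, errors="coerce").astype("Int64")

# Keep the not-out information that the digit extraction discards.
not_out = as_text.str.endswith("*")

print(pd.DataFrame({"raw": scores, "highest": highest, "not_out": not_out}))
```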

#### 4. Check total null values

In [None]:
df.isnull().sum()

## 4. Exploratory Data Analysis


### 4.1 Batting Analysis

1. Who are the top 10 run-scorers across all years?

In [None]:
top_scorer=(
    df.groupby("Player_Name")["Runs_Scored"]
    .sum()
    .sort_values(ascending=False)
    .head(10)
    .reset_index()
)
print(top_scorer)

2. Who had the best single-season strike rate (min 100 balls faced)?


In [None]:
filtered_df=df[(df["Batting_Strike_Rate"]>100) & (df["Balls_Faced"]>=100)]
best_rate=filtered_df.sort_values(by="Batting_Strike_Rate",ascending=False).head(1)
print(best_rate[["Player_Name", "Runs_Scored", "Balls_Faced", "Batting_Strike_Rate"]])

3. Top 5 players with the most 50s and 100s?


In [None]:
half_centuries=df.groupby("Player_Name")["Half_Centuries"].sum().sort_values(ascending=False).head(5)
print("Most 50s by player")
print(half_centuries)
centuries=df.groupby("Player_Name")["Centuries"].sum().sort_values(ascending=False).head(5)
print("\nMost 100s by player\n")
print(centuries)




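4. Top players with best batting average (min 200 runs in a season)?

In [None]:
filtered_df=df[df["Runs_Scored"]>=200]  # ">=" so that exactly 200 runs also qualifies
# Note: this is the mean of season-level averages, not a true career average
best_avg=filtered_df.groupby("Player_Name")["Batting_Average"].mean().sort_values(ascending=False).head(5)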
print(best_avg)
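
Averaging season-level averages weights a short season the same as a long one. A career-style figure can instead be built from totals: runs divided by dismissals. The sketch below uses made-up players and treats `Matches_Batted` minus `Not_Outs` as the number of dismissals, which is an assumption, since a match batted is not always a completed innings.

```python
import pandas as pd

# Toy season rows for two made-up players (not taken from the dataset).
seasons = pd.DataFrame({
    "Player_Name": ["A", "A", "B", "B"],
    "Runs_Scored": [600, 100, 300, 300],
    "Matches_Batted": [14, 6, 10, 10],
    "Not_Outs": [2, 1, 0, 0],
})

# Assumption: dismissals = matches batted - not outs.
seasons["Dismissals"] = seasons["Matches_Batted"] - seasons["Not_Outs"]
seasons["Batting_Average"] = seasons["Runs_Scored"] / seasons["Dismissals"]

# Career-style average from totals versus the mean of season averages.
totals = seasons.groupby("Player_Name")[["Runs_Scored", "Dismissals"]].sum()
career_avg = totals["Runs_Scored"] / totals["Dismissals"]
mean_of_seasons = seasons.groupby("Player_Name")["Batting_Average"].mean()

print(pd.DataFrame({"career_avg": career_avg, "mean_of_season_avgs": mean_of_seasons}))
```

For player A the two numbers differ (about 41.2 against 35.0) because the weaker season was also the shorter one. Players with zero dismissals would need separate handling to avoid division by zero.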

### 4.2 Bowling Analysis

1. Top 10 wicket-takers of all time?


In [None]:
top_wickets=df.groupby("Player_Name")["Wickets_Taken"].sum().sort_values(ascending=False).head(10)
print(top_wickets)

2. Who had the best economy (more than 30 balls bowled in a season)?

In [None]:
filtered_df=df[df["Balls_Bowled"]>30]
economy= filtered_df.groupby("Player_Name")["Economy_Rate"].mean()
best_economy=economy.min()
top_economy=economy[economy==best_economy]
print(top_economy)
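
An alternative to averaging season economy rates is to recompute economy from totals, as runs conceded per over with six balls to an over. The sketch below uses made-up bowlers and assumes `Balls_Bowled` and `Runs_Conceded` hold season totals.

```python
import pandas as pd

# Toy season rows for two made-up bowlers (not taken from the dataset).
bowling = pd.DataFrame({
    "Player_Name": ["X", "X", "Y"],
    "Balls_Bowled": [300, 60, 240],
    "Runs_Conceded": [350, 100, 280],
})

totals = bowling.groupby("Player_Name")[["Balls_Bowled", "Runs_Conceded"]].sum()
totals["Overs"] = totals["Balls_Bowled"] / 6                    # 6 balls per over
totals["Economy"] = totals["Runs_Conceded"] / totals["Overs"]   # runs per over

print(totals.sort_values("Economy"))
```

Bowler X has season economies of 7.0 and 10.0, whose plain mean is 8.5, while the aggregate figure is 7.5 because the expensive season covered far fewer balls.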


3. Most consistent bowlers across years (ranked by lowest mean economy)?

In [None]:
filtered_df=df[df["Balls_Bowled"]>30]
consistent_bowler=filtered_df.groupby("Player_Name")["Economy_Rate"].mean().sort_values(ascending=True).head(10)
print(consistent_bowler)
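
Mean economy measures how cheap a bowler is, not how steady. One way to look at consistency directly is the spread of economy across seasons. The sketch below is one possible approach on made-up data. The minimum of 3 seasons is an arbitrary threshold chosen for illustration.

```python
import pandas as pd

# Toy data: X is steady, Y fluctuates, Z has only one season.
eco = pd.DataFrame({
    "Player_Name": ["X", "X", "X", "Y", "Y", "Y", "Z"],
    "Year": [2021, 2022, 2023, 2021, 2022, 2023, 2023],
    "Economy_Rate": [7.0, 7.2, 6.8, 6.0, 9.0, 7.5, 5.0],
})

summary = eco.groupby("Player_Name")["Economy_Rate"].agg(["mean", "std", "count"])

# A standard deviation needs several seasons to be meaningful.
consistent = summary[summary["count"] >= 3].sort_values("std")
print(consistent)
```

A low standard deviation combined with a low mean would point to a bowler who is both economical and reliable.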

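4. Best bowling figures in a match?

In [None]:
# Use new column names so the season-level "Runs_Conceded" column is not overwritten
figures = df['Best_Bowling_Match'].astype(str).str.extract(r'(\d+)/(\d+)')
df[['Best_Wickets', 'Best_Runs_Conceded']] = figures.apply(pd.to_numeric, errors='coerce').astype('Int64')
best = df.sort_values(by=["Best_Wickets", "Best_Runs_Conceded"], ascending=[False, True]).head(5)
print(best[["Player_Name", "Best_Bowling_Match", "Best_Wickets", "Best_Runs_Conceded"]])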


### 4.3 Fielding Insights

1. Most catches

In [None]:
most_catches=df.groupby("Player_Name")["Catches_Taken"].sum().sort_values(ascending=False).head(10)
print(most_catches)

2. Most stumpings

In [None]:
most_stumpings=df.groupby("Player_Name")["Stumpings"].sum().sort_values(ascending=False).head(10)
print(most_stumpings)

## 5. Data Visualisation of EDA

1. Top batsmen and bowlers across all years


In [None]:
import matplotlib.pyplot as plt

top_batters = df.groupby("Player_Name")["Runs_Scored"].sum().sort_values(ascending=False).head(10)
top_bowlers = df.groupby("Player_Name")["Wickets_Taken"].sum().sort_values(ascending=False).head(10)

fig,axes=plt.subplots(1,2,figsize=(10,5))

axes[0].bar(top_batters.index, top_batters.values, color=["red","orange","orange","blue","yellow","red","violet","cornflowerblue","red","pink"])
axes[0].set_ylabel("Total Runs Scored")
axes[0].set_xlabel("Player Name")
axes[0].set_title("🏏 Top 10 Run Scorers")
axes[0].tick_params(rotation=90, axis="x")  # Tilt names for readability
axes[0].grid(True)

axes[1].bar(top_bowlers.index, top_bowlers.values, color=["red","blue","darkblue","pink","orange","violet","yellow","blue","#1B2133","violet"])
axes[1].set_ylabel("Total Wickets Taken")
axes[1].set_xlabel("Player Name")
axes[1].set_title("🏏 Top 10 Wicket Takers")
axes[1].tick_params(rotation=90, axis='x')  # Tilt names for readability
axes[1].grid(True)

plt.tight_layout()  # apply the layout before saving so the file matches the display
plt.savefig("Images/top_players.png",dpi=300, bbox_inches='tight')
plt.show()


📌 Observation:
 1. Virat Kohli is the top run scorer across all IPL seasons, with more than 6000 runs.
 2. Yuzvendra Chahal is the top wicket taker across all IPL seasons.


2. Line charts (Year-wise trends)


In [None]:
import matplotlib.pyplot as plt

players=["MS Dhoni","Virat Kohli","Rohit Sharma"]

plt.figure(figsize=(10,5))
for player in players:
    player_data=df[df["Player_Name"]==player]
    yearly_data=player_data.groupby("Year")["Runs_Scored"].sum()
    plt.plot(yearly_data.index,yearly_data.values,marker="o",label=player)
plt.title("📈 Year-wise Runs Scored by Key Players")
plt.xlabel("Year")
plt.ylabel("Runs Scored")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig("Images/year_wise_trends.png",dpi=300,bbox_inches='tight')
plt.show()
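
The same year-by-player layout can also be built in one step with a pivot table, which makes seasons without data visible as gaps. This is a sketch on made-up players, not the notebook's method.

```python
import pandas as pd

# Toy rows: player B has no 2021 season.
toy = pd.DataFrame({
    "Player_Name": ["A", "A", "A", "B", "B"],
    "Year": [2020, 2021, 2022, 2020, 2022],
    "Runs_Scored": [400, 350, 500, 200, 450],
})

# One row per year, one column per player; missing seasons become NaN.
wide = toy.pivot_table(index="Year", columns="Player_Name",
                       values="Runs_Scored", aggfunc="sum")
print(wide)
```

Calling `wide.plot(marker="o")` on such a table draws one line per player, with a break wherever a season is missing.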

3. Scatter plots (e.g. SR vs Average)

In [None]:
import matplotlib.pyplot as plt

# Filter out rows with NaN in either column
scatter_df = df[(df["Batting_Average"].notna()) & (df["Batting_Strike_Rate"].notna())]
scatter_df = scatter_df[scatter_df["Runs_Scored"] > 100]  # Keep only seasons with more than 100 runs

plt.figure(figsize=(10, 6))
plt.scatter(scatter_df["Batting_Strike_Rate"], scatter_df["Batting_Average"], alpha=0.7, color='teal')

plt.title("🎯 Strike Rate vs Batting Average")
plt.xlabel("Strike Rate")
plt.ylabel("Batting Average")
plt.grid(True)
plt.tight_layout()  # apply the layout before saving
plt.savefig("Images/sr_vs_batting_avg.png",dpi=300,bbox_inches='tight')
plt.show()
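
A scatter plot shows the shape of the relationship. A correlation coefficient puts a single number on it. The sketch below computes the Pearson correlation on made-up values, so the result says nothing about the real dataset.

```python
import pandas as pd

# Toy values for illustration only.
toy = pd.DataFrame({
    "Batting_Strike_Rate": [110.0, 125.0, 140.0, 155.0, 170.0],
    "Batting_Average": [22.0, 30.0, 28.0, 35.0, 41.0],
})

# Series.corr uses the Pearson coefficient by default.
r = toy["Batting_Strike_Rate"].corr(toy["Batting_Average"])
print(f"Pearson r = {r:.3f}")
```

Applied to `scatter_df`, the same call would indicate whether higher strike rates tend to go with higher averages among qualifying seasons.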


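4. Box plots (distribution of scores)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
# Cast the nullable Int64 column to float so matplotlib receives a plain numeric array
plt.boxplot(df["Runs_Scored"].dropna().astype(float))
plt.title("Distribution of Runs Scored per Season")
plt.ylabel("Runs Scored")
plt.show()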

5. Box plots (distribution of economy)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
plt.boxplot(df["Economy_Rate"].dropna())
plt.title("Distribution of Economy Rate per Season")
plt.ylabel("Economy Rate")
plt.show()
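
The points drawn beyond the whiskers of a box plot follow a simple rule: by default matplotlib places the whiskers at 1.5 times the interquartile range (IQR) from the box. The sketch below applies that rule by hand to made-up season totals to list the outliers explicitly.

```python
import pandas as pd

# Toy season totals for illustration only.
runs = pd.Series([5, 40, 120, 180, 250, 310, 400, 900], name="Runs_Scored")

q1 = runs.quantile(0.25)
q3 = runs.quantile(0.75)
iqr = q3 - q1

# Same 1.5 * IQR fences that the default box plot whiskers use.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = runs[(runs < lower) | (runs > upper)]

print(f"Q1={q1}, Q3={q3}, fences=({lower}, {upper})")
print(outliers)
```

Running the same filter on the real columns would name the seasons behind each outlier point instead of only showing them as dots.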

## 6. Conclusion

- Virat Kohli is the top run scorer across all IPL seasons.
- Andre Russell has the best batting strike rate among all batsmen.
- David Warner has scored the most 50s.
- Virat Kohli has scored the most 100s.
- Rinku Singh has the best batting average.
- Yuzvendra Chahal is the top wicket taker.
- MS Dhoni has taken the most catches and made the most stumpings.
- Kohli, Dhoni, and Rohit consistently lead in batting metrics.
- Rashid Khan and Bhuvneshwar Kumar stand out in bowling economy.

This EDA project strengthened my skills in pandas, groupby, and matplotlib. It also taught me how to extract business-level insights from sports data.
