# 🏆 Comprehensive IPL Data Analysis (2008–2024)

## Project Overview
This repository contains a data-driven analysis of the Indian Premier League (IPL) from 2008 to 2024. It uses Python libraries (Pandas, Plotly, Seaborn, Scikit-learn) to process ball-by-ball and match-level data, generate interactive visualizations, and derive strategic insights.

**Key Focus Areas:**
* Team performance evaluation and win/loss ratio analysis.
* In-depth player archetype identification using K-Means clustering.
* Match dynamics, including toss impact and powerplay/death over comparison.
* Strategic recommendations for team management.

## Setup and Execution
1.  Ensure you have the following data files in the root directory: `matches.csv` and `deliveries (1).csv`.
2.  Install required Python libraries: `pip install pandas numpy plotly matplotlib seaborn scikit-learn`
3.  The analysis follows this logical flow:
    * `data_cleaning.py` (Must be run first)
    * `team_analysis.py`
    * `player_analysis.py`
    * `match_venue_analysis.py`
    * `strategic_insights.py`

## 📋 Executive Summary: IPL Performance Insights

| Key Finding | Data Insight | Strategic Implication |
| :--- | :--- | :--- |
| **Dominance Shift** | MI and CSK maintain the highest historical win percentages (~58-60%), yet teams like KKR and GT are rapidly closing the gap. | Focus on sustained core team management over high turnover. Stability drives long-term success. |
| **Toss Bias** | Toss winners go on to win **~52%** of matches overall, and recent seasons show a clear trend toward **fielding first**. | The toss edge is small, so teams should prioritize adaptability over fixed strategies; the data marginally supports chasing in most scenarios. |
| **Batting Strategy** | **Explosive hitters** (high SR, low Balls Faced) and **Anchor players** (low SR, high Balls Faced) form distinct, essential clusters. | Optimal batting orders must balance these two archetypes to maximize both innings longevity and run-rate acceleration. |
| **Death Overs** | Death Overs (16-20) consistently yield the highest run rates but are also the most prone to wickets. | Investment in specialist Death Bowlers with high accuracy and variation is critical for controlling game momentum. |
| **Venue Advantage** | Wankhede, Mumbai, is the highest-scoring venue (Avg RR > 8.5), while Chepauk favors spin and lower scoring. | Tailor team composition (pace/spin ratio) and match-day strategy based on specific venue characteristics. |
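
The toss figure in the table can be checked directly from match-level data. The following is a minimal sketch, assuming `matches.csv` exposes `toss_winner`, `toss_decision` and `winner` columns (these names are an assumption; the match file's schema is not shown here). The rows are made-up illustration data, not IPL results.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumed.
matches = pd.DataFrame({
    "toss_winner":   ["A", "B", "A", "C", "B", "C"],
    "toss_decision": ["field", "bat", "field", "field", "bat", "field"],
    "winner":        ["A", "A", "B", "C", "B", None],  # None = no result
})

# Ignore matches without a result before computing percentages.
decided = matches.dropna(subset=["winner"]).copy()
decided["toss_winner_won"] = decided["toss_winner"] == decided["winner"]

# Overall share of decided matches won by the toss winner.
toss_win_pct = decided["toss_winner_won"].mean() * 100

# Same share, split by the decision taken at the toss.
by_decision = decided.groupby("toss_decision")["toss_winner_won"].mean() * 100

print(f"Toss winner also won the match: {toss_win_pct:.1f}%")
print(by_decision.round(1))
```

Dropping no-result matches first keeps them from being counted as losses for the toss winner.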

***

## 💡 Strategic Recommendations

1.  **Optimal Batting Order (from Clustering):**
    * **Anchor Archetype:** (Low SR, High Balls Faced) - Ideal for **Opening and No. 3**. Role is to stabilize the innings and bat deep.
    * **High-Impact Finisher Archetype:** (Very High SR, Low Balls Faced) - Crucial for **Death Overs (No. 5-7)**. Maximizing strike rate in limited time.
    * **Balanced Elite Archetype:** (High Runs, High SR, High Balls) - The **No. 4** engine room. Capable of both anchoring and accelerating.

2.  **Death Bowling Investment:** The high volatility in Death Overs necessitates investment in two distinct death-over specialists: one with extreme pace/yorker accuracy, and one with a deceptive slower-ball repertoire.

3.  **Venue-Specific Tactics:** Before away games, training should mimic the expected pitch behavior (e.g., extra practice against spin ahead of matches on slow, turning pitches like Chennai).
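
The batting archetypes in recommendation 1 come from clustering batters on strike rate and balls faced. The sketch below shows the general shape of such a step with scikit-learn's `KMeans`; it is not the project's actual pipeline. The batter rows are made up, and the choice of two clusters and the two features is an assumption for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up per-batter aggregates for illustration only.
batters = pd.DataFrame({
    "batter":      ["P1", "P2", "P3", "P4", "P5", "P6"],
    "balls_faced": [3200, 3000, 2900, 400, 450, 500],
    "runs":        [3900, 3700, 3600, 700, 760, 880],
})
batters["strike_rate"] = batters["runs"] / batters["balls_faced"] * 100

# Scale features so balls faced (thousands) does not swamp strike rate.
features = StandardScaler().fit_transform(batters[["strike_rate", "balls_faced"]])

# k=2 separates the two archetypes in this toy data; k is an assumption.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
batters["cluster"] = kmeans.fit_predict(features)

# Cluster means show which label is the anchor group and which the finishers.
print(batters.groupby("cluster")[["strike_rate", "balls_faced"]].mean().round(1))
```

Cluster labels are arbitrary integers, so the archetype names have to be assigned afterwards by reading the cluster means.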

## 🔮 Conclusion and Future Scope

The IPL data indicates that **consistency and stability** (e.g., MI/CSK success), coupled with **strategic resource allocation** (specialized players for key phases), are the cornerstones of championship success. Future analysis should focus on building a robust Win Probability Added (WPA) model to quantify player impact in high-pressure situations.

In [1]:
import os
print(os.getcwd())


c:\Users\Vishal\OneDrive\Desktop\IPL_DATA_ANALYSIS


In [2]:
import os
print(os.listdir(os.getcwd()))


['deliveries (1).csv', 'ipl_unique_analyses (1).ipynb', 'matches.csv']


In [3]:
import os

cwd = os.getcwd()
print("📂 Current working directory:", cwd)

print("CSV files in current folder:")
for file in os.listdir(cwd):
    if file.endswith(".csv"):
        print(file)



📂 Current working directory: c:\Users\Vishal\OneDrive\Desktop\IPL_DATA_ANALYSIS
CSV files in current folder:
deliveries (1).csv
matches.csv


In [4]:
import os

cwd = os.getcwd()
print("📂 Current working directory:", cwd)

print("Searching for matches.csv and deliveries*.csv files...\n")
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith(".csv") and ("matches" in file.lower() or "deliveries" in file.lower()):
            print(f"Found CSV: {os.path.join(root, file)}")



📂 Current working directory: c:\Users\Vishal\OneDrive\Desktop\IPL_DATA_ANALYSIS
Searching for matches.csv and deliveries*.csv files...

Found CSV: c:\Users\Vishal\OneDrive\Desktop\IPL_DATA_ANALYSIS\deliveries (1).csv
Found CSV: c:\Users\Vishal\OneDrive\Desktop\IPL_DATA_ANALYSIS\matches.csv


In [5]:
import pandas as pd
from pathlib import Path

# Exact file paths
candidates = [
    r"c:\Users\Vishal\OneDrive\Desktop\mini project\deliveries (1).csv",
    r"c:\Users\Vishal\OneDrive\Desktop\mini project\matches.csv"
]

data_path = None
for c in candidates:
    p = Path(c)
    if p.exists() and p.is_file() and p.suffix.lower() == '.csv':
        data_path = str(p)
        break

if not data_path:
    raise FileNotFoundError("⚠️ CSV file not found! Check your filename and path.")

print('Using data file:', data_path)

df = pd.read_csv(data_path)
print('Loaded rows:', len(df))
df.head()
df.tail()  # call the method; a bare `df.tail` only shows the bound-method repr


Using data file: c:\Users\Vishal\OneDrive\Desktop\mini project\deliveries (1).csv
Loaded rows: 260920


        match_id  inning           batting_team         bowling_team  ...
260915   1426312       2  Kolkata Knight Riders  Sunrisers Hyderabad  ...
260916   1426312       2  Kolkata Knight Riders  Sunrisers Hyderabad  ...
260917   1426312       2  Kolkata Knight Riders  Sunrisers Hyderabad  ...
260918   1426312       2  Kolkata Knight Riders  Sunrisers Hyderabad  ...
260919   1426312       2  Kolkata Knight Riders  Sunrisers Hyderabad  ...

In [6]:
# Data setup — adjust `data_path` if your CSV sits elsewhere.
import os
import pandas as pd
from pathlib import Path

# Try to auto-detect the data file in the notebook folder.
# The first existing candidate wins, so the deliveries file takes priority.
candidates = [
    "deliveries (1).csv",
    "matches.csv",
]
data_path = None
for c in candidates:
    p = Path(c)
    if p.exists() and p.is_file() and p.suffix.lower() == '.csv':
        data_path = str(p)
        break

# Fallback: expect the user to set data_path manually
if not data_path:
    # If your file is in the same folder as the notebook, use 'deliveries.csv'
    data_path = 'deliveries.csv'  # <<-- change this if needed

print('Using data file:', data_path)
df = pd.read_csv(data_path)
print('Loaded rows:', len(df))
df.head()

Using data file: deliveries (1).csv
Loaded rows: 260920


Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batter,bowler,non_striker,batsman_runs,extra_runs,total_runs,extras_type,is_wicket,player_dismissed,dismissal_kind,fielder
0,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,1,SC Ganguly,P Kumar,BB McCullum,0,1,1,legbyes,0,,,
1,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,2,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
2,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,3,BB McCullum,P Kumar,SC Ganguly,0,1,1,wides,0,,,
3,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,4,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
4,335982,1,Kolkata Knight Riders,Royal Challengers Bangalore,0,5,BB McCullum,P Kumar,SC Ganguly,0,0,0,,0,,,
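
The loader cells above rely on hard-coded candidate paths. As an alternative, here is a small sketch that searches a folder for a CSV by keyword, so the notebook keeps working if the download is renamed (for example `deliveries (1).csv`). The helper name `find_csv` is hypothetical and not part of the original notebook.

```python
from pathlib import Path
from typing import Optional

def find_csv(folder: Path, keyword: str) -> Optional[Path]:
    """Return the first CSV in `folder` whose name contains `keyword`, else None."""
    matches = sorted(
        p for p in folder.glob("*.csv") if keyword.lower() in p.name.lower()
    )
    return matches[0] if matches else None

# Look next to the notebook and report a miss instead of guessing a filename.
data_dir = Path.cwd()
deliveries_path = find_csv(data_dir, "deliveries")
if deliveries_path is None:
    print(f"No deliveries CSV found in {data_dir}")
else:
    print("Using data file:", deliveries_path)
```

Sorting the matches makes the choice deterministic when several files share the keyword.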


In [7]:
# Imports & plotting style (run once)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid')
plt.rcParams['figure.dpi'] = 120
%matplotlib inline

## 1) Batter Boundary Dependency

**Goal:** Show what percentage of each batter's runs come from boundaries (4s & 6s) to identify boundary-dependent players vs strike-rotators.


In [8]:
import pandas as pd
import plotly.express as px

# -------------------------
# Calculate Boundary Dependency
# -------------------------
batter_runs = df.groupby('batter').agg(batsman_runs=('batsman_runs','sum')).reset_index()
boundary_runs = df[df['batsman_runs'].isin([4,6])].groupby('batter').agg(boundary_runs=('batsman_runs','sum')).reset_index()
batter = batter_runs.merge(boundary_runs, on='batter', how='left').fillna(0)
batter['boundary_ratio'] = batter['boundary_runs'] / batter['batsman_runs'] * 100

# Top 20 batters by total runs
top = batter.sort_values('batsman_runs', ascending=False).head(20)

# -------------------------
# Plotly Express Bar Chart
# -------------------------
fig = px.bar(
    top,
    x='batter',
    y='boundary_ratio',
    text=top['boundary_ratio'].apply(lambda x: f"{x:.1f}%"),
    color='boundary_ratio',
    color_continuous_scale='Viridis',
    title='Top 20 Batters — % Runs from Boundaries',
    labels={'boundary_ratio': '% Runs from Boundaries', 'batter': 'Batter'},
    template='plotly_dark',
    height=500
)

# Rotate x-axis labels
fig.update_layout(
    xaxis_tickangle=-45,
    xaxis_tickfont_size=12,
    yaxis=dict(title='% Runs from Boundaries'),
    coloraxis_showscale=False
)

# Show chart
fig.show()


In [10]:
! pip show ipywidgets

import sys
!{sys.executable} -m pip install ipywidgets



Name: ipywidgets
Version: 8.1.7
Summary: Jupyter interactive widgets
Home-page: http://jupyter.org
Author: Jupyter Development Team
Author-email: jupyter@googlegroups.com
License: BSD 3-Clause License
Location: C:\Users\Vishal\AppData\Roaming\Python\Python313\site-packages
Requires: comm, ipython, jupyterlab_widgets, traitlets, widgetsnbextension
Required-by: 





In [11]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact, Dropdown

# Load dataset
df = pd.read_csv(r"c:\Users\Vishal\OneDrive\Desktop\mini project\deliveries (1).csv")

# Create run type category including Dot balls
df['run_type'] = df['batsman_runs'].apply(
    lambda x: 'Dot' if x == 0 else
              'Single' if x == 1 else
              'Double' if x == 2 else
              'Triple' if x == 3 else
              'Four' if x == 4 else
              'Six' if x == 6 else 'Other'
)

# Unique batsmen list
batsmen = sorted(df['batter'].dropna().unique().tolist())

def plot_run_type_distribution(batsman_name):
    df_batsman = df[df['batter'] == batsman_name]
    
    if df_batsman.empty:
        print(f"❌ No data found for {batsman_name}")
        return

    run_counts = df_batsman.groupby('run_type').size().reset_index(name='count')
    run_counts = run_counts.set_index('run_type').reindex(['Dot', 'Single', 'Double', 'Triple', 'Four', 'Six'], fill_value=0)

    plt.figure(figsize=(10, 6))
    sns.barplot(x=run_counts.index, y=run_counts['count'], palette='plasma')

    plt.title(f"Run Type Distribution — {batsman_name}", fontsize=16)
    plt.xlabel("Run Type", fontsize=12)
    plt.ylabel("Count", fontsize=12)
    plt.grid(axis='y')
    plt.show()

# Interactive dropdown widget
interact(plot_run_type_distribution, batsman_name=Dropdown(options=batsmen, description="Batsman"))


interactive(children=(Dropdown(description='Batsman', options=('A Ashish Reddy', 'A Badoni', 'A Chandila', 'A …

<function __main__.plot_run_type_distribution(batsman_name)>

**Analysis:** Players with high boundary percentages (≥60%) are clear power hitters; those with low percentages rely on ones and twos. Use this to identify finishers vs accumulators.
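
Boundary percentages are unstable for batters with very few runs, so a minimum-runs filter helps before labelling anyone. The sketch below uses made-up deliveries; the helper `boundary_profile` and its `min_runs` parameter are hypothetical names, and the 60% cut-off is the one quoted in the analysis.

```python
import pandas as pd

# Made-up ball-by-ball rows; only the two columns used here are included.
balls = pd.DataFrame({
    "batter":       ["X"] * 6 + ["Y"] * 6,
    "batsman_runs": [4, 6, 0, 4, 6, 0,   1, 1, 2, 1, 4, 1],
})

def boundary_profile(deliveries: pd.DataFrame, min_runs: int = 1) -> pd.DataFrame:
    """Per-batter total runs, boundary runs and boundary percentage."""
    is_boundary = deliveries["batsman_runs"].isin([4, 6])
    out = deliveries.assign(
        boundary_runs=deliveries["batsman_runs"].where(is_boundary, 0)
    ).groupby("batter")[["batsman_runs", "boundary_runs"]].sum()
    out = out[out["batsman_runs"] >= min_runs]  # also avoids dividing by zero
    out["boundary_pct"] = out["boundary_runs"] / out["batsman_runs"] * 100
    # 60% is the cut-off used in the analysis text.
    out["profile"] = out["boundary_pct"].ge(60).map(
        {True: "boundary-dependent", False: "strike-rotator"}
    )
    return out

profile = boundary_profile(balls)
print(profile)
```

On the real data a larger `min_runs` (a few hundred runs, say) would be a more sensible threshold; the right value is a judgment call.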


## 2) Bowler Pressure Index

**Goal:** Plot dot-ball percentage against wicket percentage to find bowlers who build pressure (high dot %) and also convert it to wickets.


In [13]:
import pandas as pd
import plotly.express as px

# -------------------------
# Bowler Pressure Index
# -------------------------
df['is_dot'] = df['total_runs'] == 0

bowler_stats = df.groupby('bowler').agg(
    balls=('ball','count'),
    dots=('is_dot','sum'),
    wickets=('is_wicket','sum')
).reset_index()

bowler_stats['dot_pct'] = bowler_stats['dots'] / bowler_stats['balls'] * 100
bowler_stats['wicket_pct'] = bowler_stats['wickets'] / bowler_stats['balls'] * 100

# Filter regular bowlers
regular = bowler_stats[bowler_stats['balls'] >= 300].sort_values('dot_pct', ascending=False).head(30)

# -------------------------
# Interactive Scatter Plot
# -------------------------
fig = px.scatter(
    regular,
    x='dot_pct',
    y='wicket_pct',
    size='balls',                     # size of bubble ~ balls bowled
    color='wicket_pct',               # color = wicket percentage
    hover_name='bowler',              # shows bowler name on hover
    size_max=60,
    color_continuous_scale='Viridis',
    title='Bowler Pressure Index — Dot % vs Wicket % (bubble size ~ balls bowled)',
    labels={'dot_pct':'Dot Ball %', 'wicket_pct':'Wicket %'}
)

fig.update_layout(
    xaxis=dict(title='Dot Ball Percentage'),
    yaxis=dict(title='Wicket Percentage'),
    template='plotly_dark'
)

fig.show()


**Analysis:** Bowlers in the top-right quadrant (high dot % & high wicket %) are the most effective — they choke scoring and take breakthroughs.
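
The cell above counts every delivery as a ball and every wicket as the bowler's, which includes wides, no-balls and run outs. A stricter variant is sketched below on made-up rows. The `noballs` label and the list of dismissal kinds not credited to the bowler are assumptions about how the CSV spells these values.

```python
import pandas as pd

# Made-up deliveries; column names follow the deliveries CSV shown earlier.
balls = pd.DataFrame({
    "bowler":         ["B1"] * 7,
    "total_runs":     [0, 1, 0, 1, 0, 4, 0],
    "extras_type":    [None, "wides", None, None, None, None, None],
    "is_wicket":      [0, 0, 1, 0, 1, 0, 0],
    "dismissal_kind": [None, None, "bowled", None, "run out", None, None],
})

# Wides and no-balls do not count as balls bowled.
legal = ~balls["extras_type"].isin(["wides", "noballs"])
# Dismissals that are not credited to the bowler (assumed label spellings).
NOT_BOWLER = ["run out", "retired hurt", "retired out", "obstructing the field"]
credited = (balls["is_wicket"] == 1) & ~balls["dismissal_kind"].isin(NOT_BOWLER)

stats = (
    balls.assign(legal=legal, dot=legal & (balls["total_runs"] == 0), wicket=credited)
         .groupby("bowler")[["legal", "dot", "wicket"]].sum()
)
stats["dot_pct"] = stats["dot"] / stats["legal"] * 100
stats["wicket_pct"] = stats["wicket"] / stats["legal"] * 100
print(stats.round(1))
```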


## 3) Batting Momentum by Overs (Heatmap)

**Goal:** Heatmap of average runs per over (1–20) for each team to reveal when teams score most.


In [14]:
import pandas as pd
import plotly.express as px

# -------------------------
# Batting Momentum Heatmap
# -------------------------
df['over'] = df['over'].astype(int)

# Calculate runs per over per team
team_over = df.groupby(['batting_team','over']).agg(
    runs=('total_runs','sum'),
    balls=('ball','count')
).reset_index()

# Normalize to 6-ball over
team_over['runs_per_over'] = team_over['runs'] / team_over['balls'] * 6

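# Pivot for heatmap
pivot = team_over.pivot(index='batting_team', columns='over', values='runs_per_over').fillna(0)
# The dataset numbers overs 0-19; relabel as 1-20 to match the axis title
pivot.columns = pivot.columns + 1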

# -------------------------
# Interactive Heatmap with Plotly
# -------------------------
fig = px.imshow(
    pivot,
    labels=dict(x="Over", y="Team", color="Runs per Over"),
    x=pivot.columns,
    y=pivot.index,
    color_continuous_scale='YlOrRd',
    aspect="auto",
    title="Batting Momentum Heatmap — Average Runs per Over by Team"
)

fig.update_layout(
    xaxis=dict(title='Over (1-20)'),
    yaxis=dict(title='Team'),
    template='plotly_dark'
)

fig.show()


**Analysis:** Look for teams with bright bands in powerplay (overs 1–6) or death overs (16–20). This helps understand team strategies.
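
To put numbers on the powerplay and death-over bands, overs can be binned into phases and a run rate computed per phase. This is a sketch on made-up rows; it assumes the `over` column is 0-indexed (the first delivery in the data preview has `over` = 0), and the middle-overs boundaries are a conventional choice rather than something defined in this notebook.

```python
import pandas as pd

# Made-up rows: one delivery per row, overs 0-indexed as in the deliveries CSV.
balls = pd.DataFrame({
    "batting_team": ["T1"] * 6,
    "over":         [0, 5, 6, 14, 15, 19],
    "total_runs":   [1, 4, 0, 2, 6, 6],
})

# Bin 0-indexed overs into phases: 0-5 powerplay, 6-14 middle, 15-19 death.
balls["phase"] = pd.cut(
    balls["over"],
    bins=[-1, 5, 14, 19],
    labels=["Powerplay (1-6)", "Middle (7-15)", "Death (16-20)"],
)

phase_rr = (
    balls.groupby(["batting_team", "phase"], observed=True)["total_runs"]
         .agg(runs="sum", balls="count")
)
# Same normalisation as the heatmap cell: runs per six deliveries.
phase_rr["run_rate"] = phase_rr["runs"] / phase_rr["balls"] * 6
print(phase_rr)
```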


## 4) Dismissal Pattern of Star Players

**Goal:** For selected star players, show how they are dismissed (caught, bowled, lbw, etc.).


In [15]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from ipywidgets import interact, SelectMultiple

# -------------------------
# Unique players in dataset
# -------------------------
players = df['player_dismissed'].dropna().unique().tolist()

# Precompute dismissal counts
dismissals = df.groupby(['player_dismissed','dismissal_kind']).size().reset_index(name='count')

# -------------------------
# Function to plot multiple players
# -------------------------
def dismissal_pattern_multi(selected_players):
    if not selected_players:
        print("Select at least one player")
        return
    
    n = len(selected_players)
    fig = make_subplots(
        rows=1, cols=n,
        specs=[[{'type':'domain'}]*n],
        subplot_titles=selected_players
    )
    
    for i, player in enumerate(selected_players):
        sub = dismissals[dismissals['player_dismissed'] == player]
        if sub.empty:
            continue
        fig.add_trace(
            go.Pie(
                labels=sub['dismissal_kind'],
                values=sub['count'],
                name=player,
                hole=0.3,
                textinfo='percent+label'
            ),
            row=1, col=i+1
        )
    
    fig.update_layout(
        title_text="Dismissal Patterns — Multiple Players",
        template='plotly_dark',
        showlegend=False
    )
    
    fig.show()

# -------------------------
# Interactive Widget: Multiple Selection
# -------------------------
interact(
    dismissal_pattern_multi,
    selected_players=SelectMultiple(
        options=players,
        value=[players[0]],
        description='Players',
        rows=10
    )
)



interactive(children=(SelectMultiple(description='Players', index=(0,), options=('SC Ganguly', 'RT Ponting', '…

<function __main__.dismissal_pattern_multi(selected_players)>

**Analysis:** This reveals common weaknesses (e.g., more caught vs bowled) which can be used for targeted bowling strategies.
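
The same comparison can be read as a table instead of pie charts. The sketch below builds a row-normalised crosstab of dismissal kinds per player; the rows are made up and the player names are placeholders.

```python
import pandas as pd

# Made-up dismissal records for two hypothetical players.
outs = pd.DataFrame({
    "player_dismissed": ["P1", "P1", "P1", "P1", "P2", "P2"],
    "dismissal_kind":   ["caught", "caught", "bowled", "lbw", "caught", "run out"],
})

# Row-normalised crosstab: each row sums to 100%.
share = pd.crosstab(
    outs["player_dismissed"], outs["dismissal_kind"], normalize="index"
) * 100
print(share.round(1))

# Most frequent mode of dismissal per player.
most_common = share.idxmax(axis=1)
print(most_common)
```

Percentages make players with very different career lengths comparable, which raw counts do not.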


## 5) Match-Turning Overs

**Goal:** Identify overs across matches where runs >= 15 or wickets >=2 — potential turning overs. Plot frequency and examples.


In [16]:
import pandas as pd
import plotly.express as px

# -------------------------
# Match-turning Overs
# -------------------------
# Aggregate runs and wickets per over, per innings, per match.
# Without 'inning', the same over number from both innings would be summed together.
over_match = df.groupby(['match_id','inning','over']).agg(
    runs=('total_runs','sum'),
    wickets=('is_wicket','sum')
).reset_index()

# Define turning overs: high impact
turning = over_match[(over_match['runs'] >= 15) | (over_match['wickets'] >= 2)]

# Count frequency of turning overs (overs are stored 0-19; display as 1-20)
freq = turning.groupby('over').size().reset_index(name='count')
freq['over'] += 1

# -------------------------
# Interactive Bar Chart
# -------------------------
fig = px.bar(
    freq,
    x='over',
    y='count',
    text='count',
    labels={'over':'Over', 'count':'Number of Turning Overs'},
    title='Which Overs Turn Matches Most Often (runs>=15 or wickets>=2)',
    color='count',
    color_continuous_scale='Viridis',
    height=450
)

# Add labels on top of bars
fig.update_traces(textposition='outside')

# Style layout
fig.update_layout(
    template='plotly_dark',
    xaxis=dict(dtick=1),
    yaxis=dict(title='Count of Turning Overs'),
)

fig.show()


**Analysis:** Peaks show overs that change momentum — teams and captains can plan bowling/batting changes around these.
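
The goal also asks for example overs, not only frequencies. The sketch below lists the qualifying overs on made-up rows. It groups by `inning` as well as `match_id` and `over` so that the same over number in both innings is not merged, and it reports 1-based over numbers on the assumption that the CSV stores overs as 0-19.

```python
import pandas as pd

# Made-up deliveries for one match; overs are 0-indexed as in the CSV.
balls = pd.DataFrame({
    "match_id":   [1] * 8,
    "inning":     [1, 1, 1, 1, 2, 2, 2, 2],
    "over":       [18, 18, 18, 19, 18, 18, 19, 19],
    "total_runs": [6, 6, 4, 1, 0, 1, 0, 0],
    "is_wicket":  [0, 0, 0, 0, 0, 0, 1, 1],
})

# One row per match, innings and over.
per_over = (
    balls.groupby(["match_id", "inning", "over"])
         .agg(runs=("total_runs", "sum"), wickets=("is_wicket", "sum"))
         .reset_index()
)
turning = per_over[(per_over["runs"] >= 15) | (per_over["wickets"] >= 2)]

# Example overs, biggest first, reported with 1-based over numbers.
examples = turning.assign(over=turning["over"] + 1).sort_values(
    ["runs", "wickets"], ascending=False
)
print(examples)
```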


## 6) Bowler vs Batter Rivalries

**Goal:** Head-to-head runs and dismissals between top batters and top bowlers.


In [18]:
import pandas as pd
import plotly.express as px

# -------------------------
# Load dataset
# -------------------------
df = pd.read_csv(r"c:\Users\Vishal\OneDrive\Desktop\mini project\deliveries (1).csv")

# -------------------------
# Top 5 batsmen by total runs
# -------------------------
top_bats = (
    df.groupby('batter')
      .agg(runs=('batsman_runs', 'sum'))
      .reset_index()
      .sort_values('runs', ascending=False)
      .head(5)['batter']
      .tolist()
)

# Top 5 bowlers by balls bowled
top_bows = (
    df.groupby('bowler')
      .agg(balls=('ball', 'count'))
      .reset_index()
      .sort_values('balls', ascending=False)
      .head(5)['bowler']
      .tolist()
)

# Filter dataset for top pairs
pairs = df[df['batter'].isin(top_bats) & df['bowler'].isin(top_bows)]

# Aggregate runs per batter-bowler pair
pair_stats = (
    pairs.groupby(['batter','bowler'])
         .agg(runs=('batsman_runs','sum'))
         .reset_index()
)

# -------------------------
# Interactive Grouped Bar Chart
# -------------------------
fig = px.bar(
    pair_stats,
    x='batter',
    y='runs',
    color='bowler',
    text='runs',
    barmode='group',
    labels={'batter':'Batter', 'runs':'Runs Scored', 'bowler':'Bowler'},
    title='Batter vs Bowler Rivalries — Runs Scored',
    color_discrete_sequence=px.colors.qualitative.Vivid,
    height=500
)

# Add text on bars
fig.update_traces(textposition='outside')

# Style layout
fig.update_layout(
    template='plotly_dark',
    xaxis_tickangle=-30,
    xaxis=dict(title='Batter'),
    yaxis=dict(title='Runs Scored'),
    legend_title='Bowler',
)

fig.show()



**Analysis:** Taller bars mean the batter has scored heavily off that bowler. The chart shows runs only, so pair it with head-to-head dismissal counts before calling a match-up favorable.
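
Head-to-head dismissals are part of the stated goal for this section. A possible way to compute them is sketched below on made-up rows with placeholder player names. It counts a dismissal only when the striker is the player out and the mode is not a run out; treating `run out` as the only excluded kind is a simplifying assumption.

```python
import pandas as pd

# Made-up deliveries between hypothetical players.
balls = pd.DataFrame({
    "batter":           ["Bat1"] * 5 + ["Bat2"] * 3,
    "bowler":           ["Bowl1"] * 8,
    "batsman_runs":     [4, 0, 1, 6, 0, 1, 0, 0],
    "player_dismissed": [None, None, None, None, "Bat1", None, "Bat9", "Bat2"],
    "dismissal_kind":   [None, None, None, None, "caught", None, "run out", "bowled"],
})

# A dismissal counts for the match-up only if the striker is the player out
# and the mode is credited to the bowler (run outs are excluded here).
balls["out_to_bowler"] = (
    (balls["player_dismissed"] == balls["batter"])
    & (balls["dismissal_kind"] != "run out")
)

head_to_head = (
    balls.groupby(["batter", "bowler"])
         .agg(runs=("batsman_runs", "sum"),
              balls=("batsman_runs", "size"),
              dismissals=("out_to_bowler", "sum"))
         .reset_index()
)
head_to_head["strike_rate"] = head_to_head["runs"] / head_to_head["balls"] * 100
print(head_to_head)
```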


## 7) Extras Contribution

**Goal:** Show how many runs come from extras (wides, no-balls, legbyes, etc.) per team.


In [19]:
import pandas as pd
import plotly.express as px

# -------------------------
# Extras contribution
# -------------------------
extras = df[df['extra_runs'] > 0].groupby(['batting_team', 'extras_type']).agg(
    extras=('extra_runs', 'sum')
).reset_index()

# Pivot table
pivot = extras.pivot(index='batting_team', columns='extras_type', values='extras').fillna(0)
pivot['total_extras'] = pivot.sum(axis=1)
pivot = pivot.sort_values('total_extras', ascending=False)
pivot_plot = pivot.drop(columns='total_extras')

# Reset index for plotting
pivot_plot_reset = pivot_plot.reset_index()

# -------------------------
# Interactive Stacked Bar Chart
# -------------------------
fig = px.bar(
    pivot_plot_reset,
    x='batting_team',
    y=pivot_plot.columns,
    title='Extras by Team (Stacked by Type)',
    labels={'value':'Runs from Extras', 'batting_team':'Team'},
    text_auto=True,
    color_discrete_sequence=px.colors.qualitative.Pastel,
    height=500
)

# Style layout
fig.update_layout(
    template='plotly_dark',
    xaxis_tickangle=-30,
    yaxis_title='Runs from Extras',
    legend_title='Extras Type',
)

fig.show()


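**Analysis:** Wides and no-balls give away free runs and extra deliveries, which is a discipline issue to fix. Note that this chart groups by `batting_team`, so it shows the extras each team received; group by `bowling_team` to see which teams concede them.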
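
To look at extras from the conceding side, the grouping key can be switched to `bowling_team` and restricted to wides and no-balls. The rows below are made up, the team names are placeholders, and the `noballs` spelling is an assumption about the CSV's labels.

```python
import pandas as pd

# Made-up deliveries; team names are placeholders.
balls = pd.DataFrame({
    "bowling_team": ["T1", "T1", "T1", "T2", "T2"],
    "extras_type":  ["wides", "noballs", "legbyes", "wides", None],
    "extra_runs":   [1, 1, 4, 5, 0],
})

# Wides and no-balls are bowling errors and also cost an extra delivery.
bowler_extras = balls[balls["extras_type"].isin(["wides", "noballs"])]

conceded = bowler_extras.pivot_table(
    index="bowling_team", columns="extras_type",
    values="extra_runs", aggfunc="sum", fill_value=0,
)
conceded["total"] = conceded.sum(axis=1)
print(conceded.sort_values("total", ascending=False))
```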


## 8) Wicket Timing Distribution

**Goal:** Histogram of overs when wickets most frequently fall (powerplay, middle, death).


In [20]:
import pandas as pd
import plotly.graph_objects as go

# -------------------------
# Wickets Timing
# -------------------------
wickets = df[df['is_wicket']==1]

# Count wickets per over (overs are stored 0-19; shift to 1-20 for display)
wicket_counts = wickets.groupby('over').size().reset_index(name='wickets')
wicket_counts['over'] += 1

# Compute cumulative wickets
wicket_counts['cumulative'] = wicket_counts['wickets'].cumsum()

# -------------------------
# Combined Bar + Line Plot
# -------------------------
fig = go.Figure()

# Bar: Wickets per over
fig.add_trace(go.Bar(
    x=wicket_counts['over'],
    y=wicket_counts['wickets'],
    name='Wickets per Over',
    marker_color='crimson',
    text=wicket_counts['wickets'],
    textposition='outside'
))

# Line: Cumulative wickets
fig.add_trace(go.Scatter(
    x=wicket_counts['over'],
    y=wicket_counts['cumulative'],
    name='Cumulative Wickets',
    mode='lines+markers+text',
    line=dict(color='yellow', width=3),
    marker=dict(size=8, symbol='diamond'),
    text=wicket_counts['cumulative'],
    textposition='top center'
))

# Layout
fig.update_layout(
    title='Wicket Timing — Distribution & Cumulative Trend',
    xaxis=dict(title='Over', dtick=1),
    yaxis=dict(title='Wickets'),
    template='plotly_dark',
    legend=dict(x=0.8, y=1.15),
    height=450
)

fig.show()


In [21]:
import numpy as np
import plotly.express as px

wickets = df[df['is_wicket']==1]
# Count wickets per over (overs are stored 0-19; shift to 1-20 for display)
wicket_counts = wickets.groupby('over').size().reset_index(name='wickets')
wicket_counts['over'] += 1

# Add random jitter for y-axis for “firework” effect
np.random.seed(42)
wicket_counts['y'] = np.random.uniform(0, 1, len(wicket_counts)) * 0.5 + np.arange(len(wicket_counts))

fig = px.scatter(
    wicket_counts, x='over', y='y',
    size='wickets', size_max=40,
    color='wickets', color_continuous_scale='Viridis',
    hover_data={'over':True, 'wickets':True, 'y':False},
    title='🔥 Wicket Fireworks — Distribution by Over'
)

fig.update_layout(
    yaxis=dict(showticklabels=False),
    template='plotly_dark',
    height=500
)
fig.show()
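
The goal for this section groups wickets into powerplay, middle and death phases. A sketch of that grouping follows, using made-up over numbers and assuming 0-indexed overs. Because the phases differ in length, it also reports wickets per over within each phase.

```python
import pandas as pd

# Made-up wicket-ball overs (0-indexed, as stored in the deliveries CSV).
wicket_overs = pd.Series([0, 3, 7, 12, 15, 17, 18, 19, 19, 19], name="over")

# 0-5 powerplay, 6-14 middle, 15-19 death.
phase = pd.cut(
    wicket_overs,
    bins=[-1, 5, 14, 19],
    labels=["Powerplay", "Middle", "Death"],
)
order = ["Powerplay", "Middle", "Death"]
counts = phase.astype(str).value_counts().reindex(order)

# Phases differ in length (6, 9 and 5 overs), so compare wickets per over.
phase_length = pd.Series({"Powerplay": 6, "Middle": 9, "Death": 5})
summary = pd.DataFrame({
    "wickets": counts,
    "wickets_per_over": counts / phase_length,
})
print(summary)
```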


## 9) Clutch Batsmen — Death Overs (16–20)

**Goal:** Rank batsmen by runs scored in overs 16–20 to identify true finishers.


In [22]:
import plotly.express as px

# -------------------------
# Death Overs (16-20) Data
# -------------------------
death = df[df['over'].between(15, 19)]  # overs are stored 0-19, so 15-19 are overs 16-20
death_bat = (
    death.groupby('batter')
         .agg(runs=('batsman_runs','sum'), balls=('ball','count'))
         .reset_index()
)
# Filter batters who faced at least 20 balls
death_bat = death_bat[death_bat['balls']>=20].sort_values('runs', ascending=False).head(20)

# -------------------------
# Interactive Bar Chart
# -------------------------
fig = px.bar(
    death_bat,
    x='batter',
    y='runs',
    text='runs',
    color='runs',
    color_continuous_scale='Viridis',
    title='Top 20 Batsmen in Death Overs (16-20)',
    labels={'runs':'Runs', 'batter':'Batsman'}
)

# Style layout
fig.update_traces(textposition='outside')
fig.update_layout(
    template='plotly_dark',
    xaxis_tickangle=-45,
    yaxis_title='Runs Scored (16-20 overs)',
    height=500,
    showlegend=False
)

fig.show()


**Analysis:** These players are candidates for finishing roles; teams should protect them and give them clear hitting roles.
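
Total runs favour batters with long careers, so strike rate in the death overs is a useful second ranking. The sketch below uses made-up rows and assumes 0-indexed overs; `MIN_BALLS` is a toy threshold chosen for this example.

```python
import pandas as pd

# Made-up deliveries; overs are 0-indexed, so 15-19 are the 16th-20th overs.
balls = pd.DataFrame({
    "batter":       ["F1"] * 4 + ["F2"] * 8,
    "over":         [16, 17, 18, 19] + [15, 15, 16, 16, 17, 18, 19, 10],
    "batsman_runs": [6, 4, 6, 4] + [1, 2, 1, 4, 1, 1, 6, 4],
    "extras_type":  [None] * 12,
})

death = balls[balls["over"].between(15, 19)]
# Wides are not balls faced by the batter.
faced = death[death["extras_type"] != "wides"]

finishers = faced.groupby("batter").agg(
    runs=("batsman_runs", "sum"), balls=("batsman_runs", "size")
)
finishers["strike_rate"] = finishers["runs"] / finishers["balls"] * 100

MIN_BALLS = 4  # toy threshold; the cell above uses 20 on the real data
ranked = finishers[finishers["balls"] >= MIN_BALLS].sort_values(
    "strike_rate", ascending=False
)
print(ranked)
```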


## 10) Bowler Economy in Death Overs (16–20)

**Goal:** Show economy rate distribution for bowlers in overs 16–20 (boxplot).


In [23]:
import plotly.graph_objects as go
import plotly.express as px
from ipywidgets import interact, Dropdown
import pandas as pd

# -------------------------
# Death Overs (16-20) Economy
# -------------------------
death_balls = df[df['over'].between(15, 19)].copy()  # overs are stored 0-19, so 15-19 are overs 16-20
death_stats = death_balls.groupby('bowler').agg(
    runs=('total_runs','sum'),
    balls=('ball','count')
).reset_index()
death_stats = death_stats[death_stats['balls'] >= 20]
death_stats['economy'] = death_stats['runs'] / (death_stats['balls'] / 6)

# -------------------------
# Function to plot selected bowler
# -------------------------
def death_economy(bowler_name):
    fig = go.Figure()

    # Box plot: Distribution of economy
    fig.add_trace(go.Box(
        y=death_stats['economy'],
        name='Economy Distribution',
        boxpoints='suspectedoutliers',
        marker_color='crimson',
        line=dict(width=2)
    ))

    # Scatter: All bowlers
    fig.add_trace(go.Scatter(
        y=death_stats['economy'],
        x=[0]*len(death_stats),
        mode='markers',
        marker=dict(color='yellow', size=8, opacity=0.8),
        name='Bowlers',
        hovertext=death_stats['bowler']
    ))

    # Highlight selected bowler
    selected = death_stats[death_stats['bowler']==bowler_name]
    fig.add_trace(go.Scatter(
        y=selected['economy'],
        x=[0]*len(selected),
        mode='markers+text',
        marker=dict(color='lime', size=12, symbol='diamond'),
        text=selected['bowler'],
        textposition='top center',
        name='Selected Bowler'
    ))

    fig.update_layout(
        title=f'Death Overs Economy — Highlighting {bowler_name}',
        yaxis_title='Economy Rate',
        xaxis=dict(showticklabels=False),
        template='plotly_dark',
        height=500
    )
    fig.show()

    # Show stats of selected bowler
    display(selected[['bowler','economy','balls','runs']])

# -------------------------
# Dropdown to select bowler
# -------------------------
interact(death_economy, bowler_name=Dropdown(options=death_stats['bowler'].sort_values(), description="Bowler"))


interactive(children=(Dropdown(description='Bowler', options=('A Ashish Reddy', 'A Choudhary', 'A Kumble', 'A …

<function __main__.death_economy(bowler_name)>

**Analysis:** Bowlers with lower economy in death overs are highly valuable for closing matches.
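
The economy figure above divides all runs by all deliveries. Conventional bowling economy charges the bowler for wides and no-balls but not for byes or leg-byes, and counts only legal deliveries. A sketch of that variant on made-up rows follows; the `byes` and `noballs` label spellings are assumptions, and penalty runs are ignored.

```python
import pandas as pd

# Made-up death-over deliveries for one hypothetical bowler.
balls = pd.DataFrame({
    "bowler":      ["D1"] * 7,
    "total_runs":  [1, 1, 4, 0, 2, 6, 4],
    "extra_runs":  [0, 1, 4, 0, 0, 0, 0],
    "extras_type": [None, "wides", "legbyes", None, None, None, None],
})

# Byes and leg-byes are not charged to the bowler; wides and no-balls are.
not_charged = balls["extras_type"].isin(["byes", "legbyes"])
balls["bowler_runs"] = balls["total_runs"] - balls["extra_runs"].where(not_charged, 0)

# Wides and no-balls must be re-bowled, so they are not legal deliveries.
balls["legal"] = ~balls["extras_type"].isin(["wides", "noballs"])

economy = balls.groupby("bowler").agg(
    runs=("bowler_runs", "sum"), legal_balls=("legal", "sum")
)
economy["economy"] = economy["runs"] / (economy["legal_balls"] / 6)
print(economy)
```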


## 11) 📊 Batsman vs Bowler Rivalry Dashboard

**Goal:** This interactive dashboard visualizes the head-to-head performance of a selected batsman against a selected bowler.

In [24]:
import pandas as pd
import plotly.express as px
from ipywidgets import interact, Dropdown

# Load dataset
df = pd.read_csv(r"c:\Users\Vishal\OneDrive\Desktop\mini project\deliveries (1).csv")

# ---- AUTO DETECT COLUMN NAMES ----
cols = df.columns.tolist()

# Detect batsman/batter column
batsman_col = None
for candidate in ["batter", "batsman", "striker"]:
    if candidate in cols:
        batsman_col = candidate
        break

if batsman_col is None:
    raise ValueError("❌ No batsman/batter column found! Check CSV file.")

# Rename for consistency
if batsman_col != "batter":
    df.rename(columns={batsman_col: "batter"}, inplace=True)

# ✅ Ensure required columns exist
required_cols = ["bowler", "match_id", "ball", "batsman_runs", "is_wicket", "dismissal_kind"]
for col in required_cols:
    if col not in df.columns:
        raise ValueError(f"❌ Required column '{col}' not found in dataset.")

# Unique batsmen and bowlers list
batsmen = sorted(df['batter'].dropna().unique().tolist())
bowlers = sorted(df['bowler'].dropna().unique().tolist())

def plot_batsman_bowler_rivalry(batsman_name, bowler_name):
    rivalry_df = df[(df['batter'] == batsman_name) & (df['bowler'] == bowler_name)]
    if rivalry_df.empty:
        print(f"❌ No data for {batsman_name} vs {bowler_name}")
        return

    # ---- SAFELY AGGREGATE ----
    summary = rivalry_df.groupby('match_id').agg(
        balls_faced=('ball', 'count'),
        runs_scored=('batsman_runs', 'sum'),
        wickets=('is_wicket', 'sum'),
        dismissal_kind=('dismissal_kind', lambda x: ', '.join(x.dropna().unique()))
    ).reset_index()

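    # ---- SCATTER PLOT ----
    # Offset the marker size by 1 so matches without a dismissal (0 wickets) remain visible
    summary["marker_size"] = summary["wickets"] + 1
    fig = px.scatter(
        summary,
        x="balls_faced",
        y="runs_scored",
        size="marker_size",
        color="runs_scored",
        hover_data={"match_id": True, "dismissal_kind": True, "wickets": True, "marker_size": False},
        title=f"Rivalry: {batsman_name} vs {bowler_name}",
        labels={"balls_faced": "Balls Faced", "runs_scored": "Runs Scored"}
    )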

    fig.update_traces(marker=dict(line=dict(width=1, color='DarkSlateGrey')))
    fig.update_layout(
        xaxis_title="Balls Faced",
        yaxis_title="Runs Scored",
        template="plotly_dark",
        title_x=0.5
    )

    fig.show()

# Interactive dropdowns
interact(
    plot_batsman_bowler_rivalry,
    batsman_name=Dropdown(options=batsmen, description="Batsman"),
    bowler_name=Dropdown(options=bowlers, description="Bowler")
)





**Analysis:** The dashboard reports the balls faced, runs scored, and wickets taken in each match between the pair, and plots every match as a point on a scatter chart.

* Head-to-head metrics help identify the strengths and weaknesses of a batsman against a specific bowler.
* Dismissal types (for example caught, bowled, or LBW) show how the bowler tends to get the batsman out.
* Comparing runs across matches indicates consistency or improvement in performance.
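The head-to-head metrics mentioned here can also be condensed into a single summary per pair. The following is a sketch only: `head_to_head` is a hypothetical helper, the column names follow the notebook cell above, and excluding run-outs from the bowler's tally is an assumption about how dismissals should be credited.

```python
import pandas as pd

def head_to_head(deliveries, batter, bowler):
    """Summarise one batter-vs-bowler matchup (hypothetical helper)."""
    duel = deliveries[(deliveries["batter"] == batter) & (deliveries["bowler"] == bowler)]
    balls = len(duel)
    runs = int(duel["batsman_runs"].sum())
    # Assumption: run-outs are not credited to the bowler, so they are excluded.
    credited = duel[(duel["is_wicket"] == 1) & ~duel["dismissal_kind"].isin(["run out"])]
    dismissals = len(credited)
    return {
        "balls": balls,
        "runs": runs,
        "dismissals": dismissals,
        "strike_rate": runs / balls * 100 if balls else 0.0,
        # Runs per dismissal; None when the bowler has never dismissed the batter
        "average": runs / dismissals if dismissals else None,
    }

# Toy data: batter X faces four balls from bowler P and is caught on the last one
toy = pd.DataFrame({
    "batter": ["X", "X", "X", "X", "Y"],
    "bowler": ["P", "P", "P", "P", "P"],
    "batsman_runs": [4, 0, 1, 0, 6],
    "is_wicket": [0, 0, 0, 1, 0],
    "dismissal_kind": [None, None, None, "caught", None],
})
duel_stats = head_to_head(toy, "X", "P")
print(duel_stats)
```

A high strike rate combined with a low runs-per-dismissal average suggests a high-risk matchup for the batsman, which a per-match scatter alone does not show.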

## 11) Batsman Run Type Distribution
**Goal:** This dashboard displays the distribution of run types for a selected batsman:

* Dot balls
* Singles, doubles, and triples
* Boundaries (fours and sixes)

This helps analyze a batsman’s scoring pattern and shot selection.

In [25]:
import pandas as pd
import plotly.express as px
from ipywidgets import interact, Dropdown

# Load dataset
df = pd.read_csv("deliveries (1).csv")  # expects the CSV in the project root, as described in Setup

# Create run type category including Dot balls
df['run_type'] = df['batsman_runs'].apply(
    lambda x: 'Dot' if x == 0 else
              'Single' if x == 1 else
              'Double' if x == 2 else
              'Triple' if x == 3 else
              'Four' if x == 4 else
              'Six' if x == 6 else 'Other'
)

# Unique batsmen list
batsmen = sorted(df['batter'].dropna().unique().tolist())

def plot_batsman_run_analysis(batsman_name):
    df_batsman = df[df['batter'] == batsman_name]
    
    if df_batsman.empty:
        print(f"❌ No data for {batsman_name}")
        return
    
    # Aggregate by run type
    run_stats = df_batsman.groupby('run_type').agg(
        balls=('batsman_runs', 'count'),
        total_runs=('batsman_runs', 'sum')
    ).reset_index()
    
    # Ensure all categories exist
    run_stats = run_stats.set_index('run_type').reindex(['Dot','Single','Double','Triple','Four','Six','Other'], fill_value=0).reset_index()
    
    # Calculate percentage contribution
    total_balls = run_stats['balls'].sum()
    total_runs = run_stats['total_runs'].sum()
    run_stats['percentage'] = run_stats['balls'] / total_balls * 100 if total_balls>0 else 0
    
    # Strike rate
    strike_rate = (total_runs / total_balls * 100) if total_balls>0 else 0
    
    # Interactive bar chart
    fig = px.bar(
        run_stats,
        x='run_type',
        y='balls',
        text=run_stats['percentage'].apply(lambda x: f"{x:.1f}%"),
        color='balls',
        color_continuous_scale='Plasma',
        hover_data={'total_runs': True, 'percentage': ':.2f'},
        title=f"{batsman_name} — Run Type Distribution & Strike Rate: {strike_rate:.2f}",
        labels={'run_type': 'Run Type', 'balls': 'Balls Faced'}
    )

    fig.update_traces(textposition='outside')
    fig.update_layout(
        template='plotly_dark',
        xaxis=dict(categoryorder='array', categoryarray=['Dot','Single','Double','Triple','Four','Six','Other']),
        yaxis_title='Balls Faced',
        height=500
    )

    fig.show()

# Interactive dropdown
interact(plot_batsman_run_analysis, batsman_name=Dropdown(options=batsmen, description="Batsman"))




**Analysis:**

* Dot ball percentage indicates how well a batsman rotates the strike; a high share of dot balls suggests the bowlers are maintaining pressure.
* Singles and doubles show running ability and strike rotation skills.
* Boundaries reflect an aggressive play style and power-hitting capability.

Together, these measures help in player profiling and understanding scoring tendencies.
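For profiling, the same run-type breakdown can be reduced to a few headline percentages. The sketch below is illustrative: `scoring_profile` and `RUN_LABELS` are hypothetical names, and it treats every row as a ball faced, as the cell above does, so wides are not filtered out.

```python
import pandas as pd

# Map runs off the bat to a run-type label; anything else (e.g. 5) becomes "Other"
RUN_LABELS = {0: "Dot", 1: "Single", 2: "Double", 3: "Triple", 4: "Four", 6: "Six"}

def scoring_profile(deliveries, batter):
    """Headline scoring percentages for one batter (hypothetical helper)."""
    balls = deliveries[deliveries["batter"] == batter]
    run_type = balls["batsman_runs"].map(RUN_LABELS).fillna("Other")
    # Share of balls faced by run type, in percent
    share = run_type.value_counts(normalize=True) * 100
    total_runs = balls["batsman_runs"].sum()
    boundary_runs = balls.loc[balls["batsman_runs"].isin([4, 6]), "batsman_runs"].sum()
    return {
        "dot_pct": float(share.get("Dot", 0.0)),
        "boundary_ball_pct": float(share.get("Four", 0.0) + share.get("Six", 0.0)),
        "boundary_run_pct": float(boundary_runs / total_runs * 100) if total_runs else 0.0,
    }

# Toy data: ten balls faced by batter X
toy = pd.DataFrame({
    "batter": ["X"] * 10,
    "batsman_runs": [0, 0, 0, 0, 1, 1, 2, 4, 4, 6],
})
profile = scoring_profile(toy, "X")
print(profile)
```

Comparing `boundary_ball_pct` with `boundary_run_pct` separates batsmen who rely on boundaries for most of their runs from those who accumulate through running.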

## 12) Bowler Performance Analysis
**Goal:** To analyze a bowler’s performance in detail, including:

* Run type distribution
* Dismissal patterns
* Key statistics such as dot balls, maiden overs, hat-tricks, wide balls, and no balls

This allows coaches, analysts, and fans to evaluate how effective a bowler is in different match situations.

In [26]:
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from ipywidgets import interact, Dropdown

# Load dataset (this cell reuses the `df` loaded in the previous cell; uncomment the line below to reload it)
# df = pd.read_csv(r"c:\\Users\\Vishal\\OneDrive\\Desktop\\mini project\\main project\\deliveries (1).csv")

# ---- AUTO DETECT COLUMN NAMES ----
cols = df.columns.tolist()
batsman_col = None

# Detect batsman column automatically
for candidate in ["batsman", "striker", "batter"]:
    if candidate in cols:
        batsman_col = candidate
        break

if batsman_col is None:
    raise ValueError("❌ Could not find batsman column! Please check your CSV file columns.")

# Rename for consistency
if batsman_col != "batsman":
    df.rename(columns={batsman_col: "batsman"}, inplace=True)

# ✅ Now we have a clean dataframe with "batsman" column
# Create run type category
df['run_type'] = df['batsman_runs'].apply(
    lambda x: 'Dot' if x == 0 else
              'Single' if x == 1 else
              'Double' if x == 2 else
              'Triple' if x == 3 else
              'Four' if x == 4 else
              'Six' if x == 6 else 'Other'
)

# Combine all players (batsman + bowler)
all_players = sorted(set(df['bowler'].dropna()) | set(df['batsman'].dropna()))

def plot_player_analysis(player_name):
    df_bowler = df[df['bowler'] == player_name]
    df_batsman = df[df['batsman'] == player_name]

    is_bowler = not df_bowler.empty
    is_batsman = not df_batsman.empty

    if not is_bowler and not is_batsman:
        print(f"❌ No data found for {player_name}")
        return

    # ---- BOWLER ANALYSIS ----
    if is_bowler:
        dot_count = (df_bowler['batsman_runs'] == 0).sum()

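        # A maiden needs a full over with no runs conceded; requiring 6+ recorded balls
        # stops scoreless partial overs from being counted as maidens
        maiden_overs_df = df_bowler.groupby(['match_id', 'over']).agg(
            total_runs=('total_runs', 'sum'), balls=('ball', 'count')).reset_index()
        maiden_count = ((maiden_overs_df['total_runs'] == 0) & (maiden_overs_df['balls'] >= 6)).sum()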

        # Hat-trick proxy: counts matches with 3+ wickets, which overstates true
        # hat-tricks (three wickets off consecutive deliveries)
        wickets = df_bowler[df_bowler['is_wicket'] == 1].sort_values(['match_id', 'over', 'ball'])
        hat_trick_count = 0
        if not wickets.empty:
            hat_trick_count = sum(wickets.groupby('match_id').size() >= 3)

        # Count deliveries rather than summing runs, since the printed labels are ball counts
        wide_count = (df_bowler['wide_runs'] > 0).sum() if 'wide_runs' in df_bowler else 0
        noball_count = (df_bowler['noball_runs'] > 0).sum() if 'noball_runs' in df_bowler else 0

        dismissal_counts = (
            df_bowler[df_bowler['player_dismissed'].notnull()]
            .groupby('dismissal_kind').size().reset_index(name='count')
        )

        run_types = ['Dot', 'Single', 'Double', 'Triple', 'Four', 'Six', 'Other']
        run_counts = (
            df_bowler.groupby('run_type').size()
            .reindex(run_types, fill_value=0)
            .reset_index(name='count')
        )
    else:
        dot_count = maiden_count = hat_trick_count = wide_count = noball_count = 0
        dismissal_counts = pd.DataFrame(columns=['dismissal_kind', 'count'])
        run_counts = pd.DataFrame(columns=['run_type', 'count'])

    # ---- BATSMAN ANALYSIS ----
    if is_batsman:
        batsman_runs = df_batsman['batsman_runs'].sum()
        balls_faced = df_batsman.shape[0]
        fours = (df_batsman['batsman_runs'] == 4).sum()
        sixes = (df_batsman['batsman_runs'] == 6).sum()
        times_out = (df_batsman['player_dismissed'] == player_name).sum()
        strike_rate = (batsman_runs / balls_faced) * 100 if balls_faced > 0 else 0

    # --------- PLOTLY FIGURE ---------
    if is_bowler and not run_counts.empty:
        fig = make_subplots(rows=2, cols=1, subplot_titles=[
            f"Run Type Distribution (Bowling) — {player_name}",
            f"Dismissal Types (Bowling) — {player_name}"
        ])

        fig.add_trace(
            go.Bar(x=run_counts['run_type'], y=run_counts['count'],
                   marker_color='mediumseagreen', name='Runs Given'), row=1, col=1)

        if not dismissal_counts.empty:
            fig.add_trace(
                go.Bar(x=dismissal_counts['dismissal_kind'], y=dismissal_counts['count'],
                       marker_color='salmon', name='Dismissals'), row=2, col=1)

        fig.update_layout(height=700, template='plotly_dark', showlegend=False)
        fig.update_yaxes(title_text="Count", row=1, col=1)
        fig.update_yaxes(title_text="Count", row=2, col=1)
        fig.update_xaxes(title_text="Run Type", row=1, col=1)
        fig.update_xaxes(title_text="Dismissal Kind", row=2, col=1)

        fig.show()

    # ---- PRINT STATS ----
    print(f"📊 Stats for {player_name}:")
    if is_bowler:
        print(f"🎯 Bowler Stats:")
        print(f"Dot Balls: {dot_count}")
        print(f"Maiden Overs: {maiden_count}")
        print(f"Hat-tricks (approx): {hat_trick_count}")
        print(f"Wide Balls: {wide_count}")
        print(f"No Balls: {noball_count}")

    if is_batsman:
        print(f"🏏 Batsman Stats:")
        print(f"Runs: {batsman_runs}, Balls Faced: {balls_faced}")
        print(f"4s: {fours}, 6s: {sixes}")
        print(f"Strike Rate: {strike_rate:.2f}")
        print(f"Times Out: {times_out}")

# Interactive dropdown
interact(plot_player_analysis, player_name=Dropdown(options=all_players, description="Player"))




## 13) 🏏 IPL Player Winning Percentage Dashboard (3+ Wickets or 30+ Runs)
**Batsman Analysis**

* Identify matches where a batsman scored at least a specified number of runs (default 30).
* Calculate win/loss percentage for those matches.
* Visualize performance using pie charts and top-match bar charts.

**Bowler Analysis**

* Identify matches where a bowler took at least a specified number of wickets (default 3).
* Calculate win/loss percentage for those matches.
* Visualize performance using pie charts and top-match bar charts.
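The win-percentage step described above can be expressed without a per-match loop. The following is a minimal sketch under stated assumptions: `batter_win_rate` is a hypothetical helper, the column names (`match_id`, `batting_team`, `id`, `winner`) follow the notebook cell below, and matches without a recorded winner are left out of the denominator.

```python
import pandas as pd

def batter_win_rate(deliveries, matches, batter, min_runs=30):
    """Win percentage in matches where `batter` scored at least `min_runs` (hypothetical helper)."""
    innings = (
        deliveries[deliveries["batter"] == batter]
        .groupby(["match_id", "batting_team"], as_index=False)["batsman_runs"]
        .sum()
    )
    qualifying = innings[innings["batsman_runs"] >= min_runs]
    # An inner merge silently skips match ids that are missing from `matches`
    merged = qualifying.merge(matches[["id", "winner"]], left_on="match_id", right_on="id", how="inner")
    decided = merged.dropna(subset=["winner"])  # ignore matches without a result
    wins = int((decided["winner"] == decided["batting_team"]).sum())
    total = len(decided)
    return {"matches": total, "wins": wins, "win_pct": wins / total * 100 if total else 0.0}

# Toy data: X scores 34, 5, 36 and 30 in matches 1 to 4; match 4 has no result
toy_deliveries = pd.DataFrame({
    "match_id": [1] * 6 + [2] * 2 + [3] * 6 + [4] * 5,
    "batter": ["X"] * 19,
    "batting_team": ["T1"] * 19,
    "batsman_runs": [6, 6, 6, 6, 6, 4] + [4, 1] + [6] * 6 + [6] * 5,
})
toy_matches = pd.DataFrame({"id": [1, 2, 3, 4], "winner": ["T1", "T2", "T2", None]})
win_stats = batter_win_rate(toy_deliveries, toy_matches, "X")
print(win_stats)
```

The bowler version would follow the same shape, grouping by `bowling_team` and summing wickets instead of runs.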

In [27]:
import pandas as pd
import plotly.express as px
from ipywidgets import interact, Dropdown, IntSlider

# -------------------------
# Load datasets
# -------------------------
# Both CSV files are expected in the project root, as described in Setup
df = pd.read_csv("deliveries (1).csv")
matches = pd.read_csv("matches.csv")

# Unique players
batsmen = sorted(df['batter'].dropna().unique().tolist())
bowlers = sorted(df['bowler'].dropna().unique().tolist())

# -------------------------
# Batsman Dashboard
# -------------------------
def batsman_dashboard(batsman_name, min_runs=30):
    runs_df = df[df['batter'] == batsman_name].groupby('match_id')['batsman_runs'].sum().reset_index()
    runs_matches = runs_df[runs_df['batsman_runs'] >= min_runs]['match_id']

    wins = 0
    total_matches = 0

    for match_id in runs_matches:
        match = matches[matches['id'] == match_id].iloc[0]
        player_team = df[(df['match_id']==match_id) & (df['batter']==batsman_name)]['batting_team'].iloc[0]
        if pd.notna(match['winner']):
            total_matches += 1
            if match['winner'] == player_team:
                wins += 1

    losses = total_matches - wins
    win_percentage = (wins / total_matches * 100) if total_matches > 0 else 0

    print(f"\n🏏 {batsman_name} (Min Runs: {min_runs})")
    print(f"Matches with {min_runs}+ runs and a result: {total_matches}")
    print(f"Wins: {wins}, Losses: {losses}")
    print(f"Winning %: {win_percentage:.2f}%, Losing %: {100 - win_percentage:.2f}%")

    # Pie chart
    pie = px.pie(
        names=["Wins", "Losses"],
        values=[wins, losses],
        title=f"{batsman_name} — Win/Loss Percentage",
        color_discrete_sequence=["green", "red"]
    )
    pie.show()

    # Top 10 matches bar chart
    top_matches = runs_df.sort_values('batsman_runs', ascending=False).head(10)
    top_matches = top_matches.merge(matches[['id', 'date', 'team1', 'team2']], 
                                    left_on='match_id', right_on='id', how='left')
    top_matches['match_label'] = top_matches['team1'] + " vs " + top_matches['team2'] + " (" + top_matches['date'].astype(str) + ")"

    bar = px.bar(
        top_matches,
        x='match_label',
        y='batsman_runs',
        text='batsman_runs',
        title=f"Top 10 Matches by {batsman_name} Runs",
        color='batsman_runs',
        color_continuous_scale='Viridis'
    )
    bar.update_traces(textposition='outside')
    bar.update_layout(xaxis_tickangle=-45, xaxis_title="Match", yaxis_title="Runs")
    bar.show()

# -------------------------
# Bowler Dashboard
# -------------------------
def bowler_dashboard(bowler_name, min_wickets=3):
    wickets_df = df[df['bowler'] == bowler_name].groupby('match_id')['is_wicket'].sum().reset_index()
    wickets_matches = wickets_df[wickets_df['is_wicket'] >= min_wickets]['match_id']

    wins = 0
    total_matches = 0

    for match_id in wickets_matches:
        match = matches[matches['id'] == match_id].iloc[0]
        player_team = df[(df['match_id']==match_id) & (df['bowler']==bowler_name)]['bowling_team'].iloc[0]
        if pd.notna(match['winner']):
            total_matches += 1
            if match['winner'] == player_team:
                wins += 1

    losses = total_matches - wins
    win_percentage = (wins / total_matches * 100) if total_matches > 0 else 0

    print(f"\n🎯 {bowler_name} (Min Wickets: {min_wickets})")
    print(f"Matches with {min_wickets}+ wickets and a result: {total_matches}")
    print(f"Wins: {wins}, Losses: {losses}")
    print(f"Winning %: {win_percentage:.2f}%, Losing %: {100 - win_percentage:.2f}%")

    # Pie chart
    pie = px.pie(
        names=["Wins", "Losses"],
        values=[wins, losses],
        title=f"{bowler_name} — Win/Loss Percentage",
        color_discrete_sequence=["green", "red"]
    )
    pie.show()

    # Top 10 matches bar chart
    top_matches = wickets_df.sort_values('is_wicket', ascending=False).head(10)
    top_matches = top_matches.merge(matches[['id', 'date', 'team1', 'team2']], 
                                    left_on='match_id', right_on='id', how='left')
    top_matches['match_label'] = top_matches['team1'] + " vs " + top_matches['team2'] + " (" + top_matches['date'].astype(str) + ")"

    bar = px.bar(
        top_matches,
        x='match_label',
        y='is_wicket',
        text='is_wicket',
        title=f"Top 10 Matches by {bowler_name} Wickets",
        color='is_wicket',
        color_continuous_scale='Cividis'
    )
    bar.update_traces(textposition='outside')
    bar.update_layout(xaxis_tickangle=-45, xaxis_title="Match", yaxis_title="Wickets")
    bar.show()

# -------------------------
# Interactive Widgets
# -------------------------
print("🏏 Batsman Winning Percentage Dashboard")
interact(
    batsman_dashboard,
    batsman_name=Dropdown(options=batsmen, description="Batsman"),
    min_runs=IntSlider(value=30, min=0, max=150, step=5, description="Min Runs")
)

print("\n🎯 Bowler Winning Percentage Dashboard")
interact(
    bowler_dashboard,
    bowler_name=Dropdown(options=bowlers, description="Bowler"),
    min_wickets=IntSlider(value=3, min=0, max=10, step=1, description="Min Wickets")
)



**Analysis** (interpreting the bowler metrics reported in Section 12):

* Run type distribution gives insight into a bowler’s economy and consistency.
* Dismissal patterns reveal strengths in wicket-taking strategies.
* Dot balls and maiden overs show the bowler’s ability to build pressure.
* Hat-tricks indicate match-changing potential; the code approximates them as matches with three or more wickets.
* Wide and no balls track discipline and control under pressure.
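Because a match with three or more wickets is only a proxy for a hat-trick, a stricter check may be useful. The sketch below counts three wickets on consecutive deliveries by the same bowler within a match. `count_hat_tricks` is a hypothetical helper, and it makes simplifying assumptions: run-outs are not credited to the bowler, and any intervening delivery (including a wide or no-ball) breaks the streak.

```python
import pandas as pd

def count_hat_tricks(deliveries, bowler):
    """Count three-wicket streaks on consecutive deliveries (hypothetical helper)."""
    balls = deliveries[deliveries["bowler"] == bowler].sort_values(["match_id", "over", "ball"])
    # 1 where the bowler is credited with a wicket, else 0
    credited = ((balls["is_wicket"] == 1) & ~balls["dismissal_kind"].isin(["run out"])).astype(int)
    hat_tricks = 0
    for _, flags in credited.groupby(balls["match_id"]):
        streak = 0
        for flag in flags:
            streak = streak + 1 if flag else 0
            if streak == 3:
                hat_tricks += 1
                streak = 0  # start counting afresh after a completed hat-trick
    return hat_tricks

# Toy data: match 1 has three wickets in a row (spanning two overs);
# match 2 has three wickets that are not consecutive
toy = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "bowler": ["A"] * 9,
    "over": [3, 3, 5, 5, 0, 0, 0, 0, 0],
    "ball": [5, 6, 1, 2, 1, 2, 3, 4, 5],
    "is_wicket": [1, 1, 1, 0, 1, 0, 1, 0, 1],
    "dismissal_kind": ["caught", "bowled", "lbw", None, "caught", None, "bowled", None, "lbw"],
})
hat_tricks = count_hat_tricks(toy, "A")
three_wicket_matches = int(toy.groupby("match_id")["is_wicket"].sum().ge(3).sum())
print(hat_tricks, three_wicket_matches)
```

On this toy data the proxy reports two qualifying matches while the strict check finds one hat-trick, which illustrates how far the two measures can diverge.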

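## 14) Score Conversion: 30+ to 50+
**Goal:** To measure how often a batsman who reaches 30 runs in a match goes on to score 50 or more. The conversion rate is the number of 50+ scores divided by the number of 30+ scores, expressed as a percentage.

This helps understand how often a player turns a good start into a substantial innings.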
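The conversion arithmetic can be shown on a small example. This is a sketch only: `conversion_rate` is a hypothetical helper, the thresholds are parameters, and the toy innings are invented for illustration.

```python
import pandas as pd

def conversion_rate(deliveries, batter, start=30, target=50):
    """Share of `start`+ scores that reach `target`+ (hypothetical helper)."""
    per_match = (
        deliveries[deliveries["batter"] == batter]
        .groupby("match_id")["batsman_runs"]
        .sum()
    )
    starts = int((per_match >= start).sum())
    converted = int((per_match >= target).sum())
    rate = converted / starts * 100 if starts else 0.0
    return starts, converted, rate

# Toy data: five innings worth 12, 34, 55, 48 and 71 runs, built from singles
scores = {1: 12, 2: 34, 3: 55, 4: 48, 5: 71}
toy = pd.DataFrame(
    [(match_id, "X", 1) for match_id, total in scores.items() for _ in range(total)],
    columns=["match_id", "batter", "batsman_runs"],
)
starts, converted, rate = conversion_rate(toy, "X")
print(starts, converted, rate)
```

Four of the five innings reach 30 and two of those reach 50, so the conversion rate is 2 / 4 = 50%.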

In [28]:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, Dropdown
import numpy as np

# --- Load datasets ---
# Both CSV files are expected in the project root, as described in Setup
df = pd.read_csv("deliveries (1).csv")
matches = pd.read_csv("matches.csv")

# --- Helper function to find first available column ---
def find_col(cols, candidates):
    for c in candidates:
        if c in cols:
            return c
    return None

# --- Standardize batter column ---
batter_col = find_col(df.columns, ["batter", "batsman", "striker"])
if batter_col is None:
    raise ValueError("Could not find a batter/batsman column in deliveries CSV.")
if batter_col != "batter":
    df.rename(columns={batter_col: "batter"}, inplace=True)

# --- Standardize runs column ---
runs_col = find_col(df.columns, ["batsman_runs", "batter_runs", "runs_off_bat", "runs"])
if runs_col is None:
    raise ValueError("Could not find a runs column (batsman_runs) in deliveries CSV.")
if runs_col != "batsman_runs":
    df.rename(columns={runs_col: "batsman_runs"}, inplace=True)

# --- Detect & standardize match id ---
left_match_col = find_col(df.columns, ["match_id", "matchId", "id", "match"])
right_match_col = find_col(matches.columns, ["match_id", "matchId", "id", "match"])

if left_match_col is None:
    raise ValueError("Could not find a match identifier column in deliveries CSV.")

# create standard 'match_id' in both dataframes
df["match_id"] = df[left_match_col] if left_match_col != "match_id" else df["match_id"]

if right_match_col is not None:
    matches["match_id"] = matches[right_match_col] if right_match_col != "match_id" else matches["match_id"]
else:
    matches["match_id"] = np.nan  # fallback if no column found

# --- Detect & standardize season column ---
season_col = find_col(matches.columns, ["season", "year"])
if season_col is not None:
    if season_col != "season":
        matches.rename(columns={season_col: "season"}, inplace=True)
else:
    matches["season"] = np.nan  # fill with NaN if season missing

# --- Merge deliveries with matches to get season info ---
df = df.merge(matches[["match_id", "season"]].drop_duplicates(), on="match_id", how="left")

# --- Ensure numeric type for runs ---
df["batsman_runs"] = pd.to_numeric(df["batsman_runs"], errors="coerce").fillna(0)

# --- Get unique batsmen list ---
batsmen = sorted(df["batter"].dropna().unique().tolist())

def conversion_30_to_50(batsman_name):
    player_df = df[df["batter"] == batsman_name]
    if player_df.empty:
        print(f"❌ No data for {batsman_name}")
        return

    # Group runs per match
    if player_df["season"].notna().any():
        runs_df = player_df.groupby(["match_id", "season"], dropna=False)["batsman_runs"].sum().reset_index()
    else:
        runs_df = player_df.groupby(["match_id"], dropna=False)["batsman_runs"].sum().reset_index()
        runs_df["season"] = np.nan

    matches_30 = runs_df[runs_df["batsman_runs"] >= 30]
    if matches_30.empty:
        print(f"❌ No matches found where {batsman_name} scored 30+ runs.")
        return

    matches_50 = matches_30[matches_30["batsman_runs"] >= 50]

    total_30 = matches_30.shape[0]
    total_50 = matches_50.shape[0]
    conversion_rate = (total_50 / total_30) * 100 if total_30 > 0 else 0

    print(f"\n🎯 {batsman_name} — 30+ to 50+ Conversion")
    print(f"✅ Matches with 30+ runs: {total_30}")
    print(f"🏏 Converted to 50+: {total_50}")
    print(f"📊 Conversion Rate: {conversion_rate:.2f}%")

    fig = go.Figure(go.Sunburst(
        labels=["30+ Runs", "Converted to 50+", "Not Converted (30-49)"],
        parents=["", "30+ Runs", "30+ Runs"],
        values=[total_30, total_50, total_30 - total_50],
        branchvalues="total",
        marker=dict(colors=["#FFD700", "#28a745", "#FF4C4C"], line=dict(color="white", width=2)),
        hovertemplate="<b>%{label}</b><br>Matches: %{value}<br>Percent: %{percentParent:.1%}<extra></extra>"
    ))

    fig.update_layout(
        title=dict(text=f"🌟 {batsman_name} — 30+ to 50+ Conversion", font=dict(size=22), x=0.5),
        margin=dict(t=60, l=20, r=20, b=20),
        paper_bgcolor="#f8f9fa"
    )

    fig.show()

# --- Interactive Dropdown ---
interact(conversion_30_to_50, batsman_name=Dropdown(options=batsmen, description="Batsman"))


