A data-driven analysis of Indian Premier League match and player data using Python, Pandas, Scikit-learn, and Matplotlib.
Built by Archishman Mittal | CS + Economics, University of Delhi
This project analyses 9 seasons of IPL-style data (2015–2023), produced by generate_data.py, covering 252 matches and 60,000+ ball-by-ball deliveries. It extracts strategic insights on team performance, player form, toss impact, and match outcome prediction.
This kind of analysis mirrors the work that franchise strategy teams (RCB, MI, CSK, etc.) and sports analytics companies such as FanCode, Sportz Interactive, and Dream Sports use to inform real decisions.
- Win Rate by Team — which franchises dominate across seasons
- Season-wise Wins Trend — tracking performance consistency of top 4 teams
- Toss Impact Analysis — does winning the toss actually matter?
- Phase-wise Scoring — Powerplay vs Middle vs Death over run rates
- Top 10 Run Scorers — all-time leading batters by total runs
- Top 10 by Strike Rate — most explosive batters (min. 200 balls)
- Top 10 Wicket Takers — all-time leading wicket takers
- Best Economy Rates — most efficient bowlers (min. 300 balls)
- Logistic Regression model trained to predict match winner
- Features: Team matchup, Venue, Toss outcome, 1st innings score
- Win Rate vs. 1st Innings Score — at what score does batting first become an advantage?
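
The win probability items above can be illustrated with a minimal sketch. It is not the code in ipl_analysis.py: the data is randomly generated inside the snippet, and every column name (`batting_first`, `chasing`, `venue`, `toss_winner_batted_first`, `first_innings_score`, `batting_first_won`) is a hypothetical stand-in for whatever ipl_matches.csv actually contains. It first computes the batting-first win rate per score band, then fits a logistic regression on the four feature groups listed above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for match-level data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
team_names = ["Team A", "Team B", "Team C", "Team D"]
matches = pd.DataFrame({
    "batting_first": rng.choice(team_names, n),
    "chasing": rng.choice(team_names, n),
    "venue": rng.choice(["Venue X", "Venue Y"], n),
    "toss_winner_batted_first": rng.choice(["yes", "no"], n),
    "first_innings_score": rng.normal(165, 25, n).round().clip(60, 280),
})

# Made-up outcome rule for the demo: higher first-innings scores win more often.
score = matches["first_innings_score"].to_numpy()
win_prob = 1 / (1 + np.exp(-(score - 165) / 20))
matches["batting_first_won"] = (rng.random(n) < win_prob).astype(int)

# Win rate vs. 1st innings score: share of batting-first wins per score band.
matches["score_band"] = pd.cut(
    matches["first_innings_score"],
    bins=[0, 150, 180, 400],
    labels=["<=150", "151-180", ">180"],
)
band_win_rate = (
    matches.groupby("score_band", observed=True)["batting_first_won"].mean()
)

# Logistic regression on matchup, venue, toss outcome and 1st innings score.
categorical = ["batting_first", "chasing", "venue", "toss_winner_batted_first"]
numeric = ["first_innings_score"]
features = categorical + numeric

model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(matches[features], matches["batting_first_won"])

# Same matchup, venue and toss; only the first-innings score differs.
scenarios = pd.DataFrame({
    "batting_first": ["Team A", "Team A"],
    "chasing": ["Team B", "Team B"],
    "venue": ["Venue X", "Venue X"],
    "toss_winner_batted_first": ["yes", "yes"],
    "first_innings_score": [140.0, 190.0],
})
probs = model.predict_proba(scenarios[features])[:, 1]

print(band_win_rate)
print(dict(zip(scenarios["first_innings_score"], probs.round(3))))
```

One-hot encoding the team and venue columns keeps the model from reading an ordering into arbitrary labels, and scaling the score helps the solver converge. A real evaluation would also hold out a test split, which this sketch omits.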
| Tool | Usage |
|---|---|
| Python 3.x | Core language |
| Pandas | Data wrangling & aggregation |
| NumPy | Numerical operations |
| Matplotlib | Custom dark-theme dashboards |
| Seaborn | Statistical visualizations |
| Scikit-learn | Logistic Regression win probability model |
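
As a rough illustration of the Pandas aggregation work, the sketch below computes a phase-wise run rate and a strike-rate table with a minimum-balls filter. The tiny DataFrame and its column names (`over`, `batter`, `runs`) are assumptions made for this example, not the schema of ipl_deliveries.csv. It also assumes overs are numbered 1 to 20 and that every row is a legal delivery with runs credited to the batter, which real data with wides and no-balls would not satisfy.

```python
import pandas as pd

# Hand-made ball-by-ball sample; column names are hypothetical.
deliveries = pd.DataFrame({
    "over":   [1, 3, 6, 7, 10, 15, 16, 18, 20],
    "batter": ["P", "P", "Q", "Q", "P", "Q", "P", "P", "Q"],
    "runs":   [1, 4, 0, 1, 2, 0, 6, 4, 2],
})

# Map overs 1-6, 7-15 and 16-20 to Powerplay, Middle and Death.
deliveries["phase"] = pd.cut(
    deliveries["over"],
    bins=[0, 6, 15, 20],
    labels=["Powerplay", "Middle", "Death"],
)

# Run rate = average runs per delivery, scaled to a six-ball over.
run_rate = deliveries.groupby("phase", observed=True)["runs"].mean() * 6

# Strike rate = runs per 100 balls, keeping only batters with enough balls faced.
MIN_BALLS = 5  # stand-in for the 200-ball cutoff used on the full dataset
batting = deliveries.groupby("batter").agg(
    runs=("runs", "sum"),
    balls=("runs", "count"),
)
batting["strike_rate"] = 100 * batting["runs"] / batting["balls"]
qualified = (
    batting[batting["balls"] >= MIN_BALLS]
    .sort_values("strike_rate", ascending=False)
)

print(run_rate)
print(qualified)
```

The minimum-balls filter matters because strike rate is a ratio: without it, a batter who faced a handful of deliveries could top the table on a couple of boundaries.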
ipl-analytics/
│
├── generate_data.py # Generates realistic IPL match + delivery dataset
├── ipl_analysis.py # Main analysis script (run this)
├── ipl_matches.csv # Match-level data (252 matches)
├── ipl_deliveries.csv # Ball-by-ball delivery data (60k+ rows)
│
├── ipl_team_dashboard.png # Team & match insights chart
├── ipl_player_dashboard.png # Player performance chart
├── ipl_win_probability.png # Win prediction model chart
│
└── README.md
# 1. Clone the repo
git clone https://github.com/archishman-mittal/ipl-analytics.git
cd ipl-analytics
# 2. Install dependencies
pip install pandas numpy matplotlib seaborn scikit-learn
# 3. Generate the dataset
python generate_data.py
# 4. Run the full analysis
python ipl_analysis.py

- Toss has minimal impact: the toss winner wins only ~50.4% of matches, suggesting that in-game performance matters far more than the coin flip
- Batting first is advantageous above 180: teams scoring 180+ win ~57% of games; below 150, that win rate drops sharply
- Death overs are the highest-scoring phase: average runs per delivery peaks in overs 16–20, making death bowling the most critical strategic phase
- Strike rate vs. consistency: the top run-scorers and top strike-rate batters are largely different players, highlighting the power vs. anchor dichotomy in T20 batting
- Add sentiment analysis on IPL social media data (Twitter/Instagram)
- Player auction value prediction model
- Head-to-head matchup analysis (batter vs. bowler)
- Fan engagement metrics correlated with team performance
- Real-time win probability tracker using live match data
IPL franchises, sports analytics firms, and consumer lifestyle brands (Red Bull, Nike, etc.) increasingly rely on data to drive decisions — from squad selection and auction strategy to fan engagement and content planning. This project demonstrates the ability to extract, model, and communicate those insights clearly.
Archishman Mittal
📧 archishmanmittal@gmail.com
🔗 LinkedIn
📍 New Delhi, India


