IPL Data Pipeline & Analysis (Pandas Project)

Overview

This project builds a complete data analysis pipeline on IPL (Indian Premier League) datasets using Python (Pandas).

It simulates a real-world workflow:

Ingest → Clean → Transform → Analyze → Report → Export

Dataset

We use two datasets:

deliveries.csv → Ball-by-ball data
matches.csv → Match-level metadata

📎 Source: Kaggle IPL Dataset (2008–2020)

Tech Stack

Python
Pandas
NumPy
Matplotlib (optional)
Jupyter Notebook (VS Code)

Project Structure

IPL_Pipeline/
│
├── data/
│   ├── deliveries.csv
│   └── matches.csv
│
├── output/
│   ├── runs_per_match.csv
│   ├── top_batters.csv
│   ├── strike_rate.csv
│   ├── economy.csv
│   ├── team_scores.csv
│   ├── death_overs.csv
│   └── ipl_analysis.xlsx
│
├── analysis.ipynb
└── README.md

Pipeline Stages

Stage 1: Data Ingestion

Loaded datasets using Pandas
Inspected shape, columns, and data types
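
The ingestion step could look roughly like the sketch below. The helper name `load_csv` is hypothetical, and the `data/` paths are taken from the project structure above; the notebook's actual code may differ.

```python
from pathlib import Path

import pandas as pd


def load_csv(path):
    """Read a CSV into a DataFrame and print a quick structural summary.

    Hypothetical helper, not taken from the notebook.
    """
    df = pd.read_csv(path)
    print(f"{path}: {df.shape[0]} rows x {df.shape[1]} columns")
    print(df.dtypes)
    return df


# Guarded so the sketch does not fail when the data files are absent.
data_dir = Path("data")
if (data_dir / "deliveries.csv").exists() and (data_dir / "matches.csv").exists():
    deliveries = load_csv(data_dir / "deliveries.csv")
    matches = load_csv(data_dir / "matches.csv")
```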

Stage 2: Data Cleaning & Validation

Handled missing values
Fixed data types
Validated match IDs between datasets
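
A minimal sketch of these three checks, run on tiny made-up frames. The column names (`id`, `date`, `city`, `batsman_runs`) are assumptions about the Kaggle files, and filling missing cities with "Unknown" is only one possible policy.

```python
import pandas as pd

# Made-up stand-ins for matches.csv and deliveries.csv (assumed column names).
matches = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["2008-04-18", "2008-04-19", "2008-04-20"],
    "city": ["Bangalore", None, "Delhi"],
})
deliveries = pd.DataFrame({
    "id": [1, 1, 2, 4],
    "batsman_runs": [4, 0, 1, 6],
})

# Handle missing values: label unknown cities explicitly.
matches["city"] = matches["city"].fillna("Unknown")

# Fix data types: parse date strings into datetimes.
matches["date"] = pd.to_datetime(matches["date"])

# Validate match IDs: every delivery should refer to a known match.
orphan_ids = set(deliveries["id"]) - set(matches["id"])
print("Match IDs missing from matches:", sorted(orphan_ids))
```

In this toy data, match ID 4 appears in the deliveries but not in the matches, so it is reported as an orphan.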

Stage 3: Data Transformation

Created derived columns
Standardized column names
Merged datasets into a unified DataFrame
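
The transformation steps might be sketched as follows. The sample data, the `standardize_columns` helper, the `is_boundary` column, and the `match_id` key name are all illustrative assumptions rather than the notebook's actual choices.

```python
import pandas as pd

# Made-up sample rows with deliberately inconsistent column names.
deliveries = pd.DataFrame({
    "id": [1, 1, 1],
    "Over": [1, 6, 18],
    "Batsman Runs": [4, 1, 6],
})
matches = pd.DataFrame({"id": [1], "Venue": ["Eden Gardens"], "Season": [2008]})


def standardize_columns(df):
    """Return a copy with lower-case, underscore-separated column names."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    return out


# Standardize names and give both frames the same key name.
deliveries = standardize_columns(deliveries).rename(columns={"id": "match_id"})
matches = standardize_columns(matches).rename(columns={"id": "match_id"})

# Derived column: flag deliveries that the batter hit for four or six.
deliveries["is_boundary"] = deliveries["batsman_runs"].isin([4, 6])

# Merge: each delivery picks up its match metadata. validate= raises an
# error if a match_id appears more than once in matches.
ipl = deliveries.merge(matches, on="match_id", how="left", validate="many_to_one")
print(ipl)
```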

Stage 4: Core Analysis

Key Use Cases:

Total Runs per Match
Runs per Team per Match
Top 10 Batters
Strike Rate of Batters
Top Bowlers by Economy
Most Consistent Batters
Highest Individual Score
Boundary Analysis
Boundary Percentage
Dot Ball Analysis
Runs per Over Analysis
Powerplay Performance
Death Overs Performance
Run Distribution (1st vs 2nd Innings)
Toss Impact Analysis
Player of Match Contribution
Venue-wise Analysis
City-wise Analysis
Season-wise Trends
Winning Team Analysis
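
Three of these use cases (top batters, strike rate, and bowling economy) can be sketched with grouped aggregation. The data below is made up, and the column names (`batsman`, `bowler`, `batsman_runs`, `total_runs`, `extras_type`) are assumptions about the dataset. As a simplification, economy here charges the bowler with all runs on the delivery, including byes and leg byes.

```python
import pandas as pd

# Made-up ball-by-ball rows; the second delivery is a wide.
balls = pd.DataFrame({
    "batsman": ["A", "A", "A", "B", "B", "B"],
    "bowler": ["X", "X", "Y", "Y", "Y", "X"],
    "batsman_runs": [4, 0, 6, 1, 0, 2],
    "total_runs": [4, 1, 6, 1, 0, 2],
    "extras_type": [None, "wides", None, None, None, None],
})

# Strike rate = 100 * runs scored / balls faced (wides are not balls faced).
is_wide = balls["extras_type"].eq("wides")
batting = (
    balls.assign(ball_faced=~is_wide)
    .groupby("batsman")
    .agg(runs=("batsman_runs", "sum"), balls=("ball_faced", "sum"))
)
batting["strike_rate"] = 100 * batting["runs"] / batting["balls"]

# Top 10 batters by total runs.
top_batters = batting.sort_values("runs", ascending=False).head(10)

# Economy = runs conceded per over of six legal balls
# (wides and no-balls do not count as legal balls).
is_legal = ~balls["extras_type"].isin(["wides", "noballs"])
bowling = (
    balls.assign(legal_ball=is_legal)
    .groupby("bowler")
    .agg(runs_conceded=("total_runs", "sum"), legal_balls=("legal_ball", "sum"))
)
bowling["economy"] = 6 * bowling["runs_conceded"] / bowling["legal_balls"]

print(top_batters)
print(bowling.sort_values("economy"))
```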

Stage 5: Derived Insights

Identified key performers
Compared match phases (powerplay, death overs)
Analyzed venue and season trends
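
A phase comparison could be sketched as below. It assumes a 1-based `over` column (1 to 20), with overs 1 to 6 as the powerplay and 16 to 20 as the death overs; if the dataset numbers overs from 0, add 1 first. The helper `phase_of` and the sample data are hypothetical, and the run rate counts every delivery as a ball, which slightly understates the rate when extras are present.

```python
import pandas as pd


def phase_of(over):
    """Map a 1-based over number to a match phase (hypothetical helper)."""
    if over <= 6:
        return "powerplay"
    if over <= 15:
        return "middle"
    return "death"


# Made-up deliveries spread across the three phases.
balls = pd.DataFrame({
    "over": [1, 1, 6, 10, 16, 20],
    "total_runs": [1, 0, 4, 2, 6, 6],
})
balls["phase"] = balls["over"].map(phase_of)

# Runs per six balls within each phase.
phase_summary = balls.groupby("phase").agg(
    runs=("total_runs", "sum"),
    balls=("total_runs", "count"),
)
phase_summary["run_rate"] = 6 * phase_summary["runs"] / phase_summary["balls"]
print(phase_summary)
```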

Stage 6: Reporting

Cleaned and formatted DataFrames
Structured outputs for readability
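
As an illustration of this kind of formatting, the sketch below sorts, rounds, and relabels a small made-up result table. The display names are hypothetical.

```python
import pandas as pd

# Made-up aggregated result, indexed by batter.
raw = pd.DataFrame(
    {"runs": [120, 95], "balls": [80, 70]},
    index=pd.Index(["A", "B"], name="batsman"),
)
raw["strike_rate"] = 100 * raw["runs"] / raw["balls"]

# Sort, round, flatten the index, and use reader-friendly column names.
report = (
    raw.sort_values("strike_rate", ascending=False)
    .round({"strike_rate": 2})
    .reset_index()
    .rename(columns={
        "batsman": "Batter",
        "runs": "Runs",
        "balls": "Balls",
        "strike_rate": "Strike Rate",
    })
)
print(report.to_string(index=False))
```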

Stage 7: Data Export

Exported results to CSV files
Generated Excel summary file
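
The export step might be sketched with two small helpers. `export_csvs` and `export_excel` are hypothetical names; the Excel writer relies on `openpyxl`, which is in the install command under How to Run.

```python
from pathlib import Path

import pandas as pd


def export_csvs(results, out_dir):
    """Write each DataFrame in `results` to <out_dir>/<name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in results.items():
        path = out_dir / f"{name}.csv"
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


def export_excel(results, path):
    """Write one sheet per DataFrame; requires openpyxl."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for name, df in results.items():
            # Excel limits sheet names to 31 characters.
            df.to_excel(writer, sheet_name=name[:31], index=False)
```

With a dictionary such as `results = {"top_batters": top_batters_df}`, calling `export_csvs(results, "output")` followed by `export_excel(results, "output/ipl_analysis.xlsx")` would produce files matching the layout shown in Project Structure.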

Key Insights

Death overs (16–20) have the highest scoring rates
Certain venues with batting-friendly pitches consistently produce high scores
Toss impact is moderate: the toss winner goes on to win roughly 50–55% of matches
Some players dominate via boundaries, while others rely on consistency
Strike rate is crucial in death overs, while average matters for consistency
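
One way to measure toss impact is the share of decided matches in which the toss winner also won. The sketch below assumes `toss_winner` and `winner` columns in the match data and uses made-up rows, so its result says nothing about the real dataset.

```python
import pandas as pd

# Made-up match rows; the last match has no result.
matches = pd.DataFrame({
    "toss_winner": ["MI", "CSK", "RCB", "KKR"],
    "winner": ["MI", "MI", "RCB", None],
})

# Drop matches without a winner, then compare toss winner to match winner.
decided = matches.dropna(subset=["winner"])
toss_win_rate = (decided["toss_winner"] == decided["winner"]).mean()
print(f"Toss winner also won {toss_win_rate:.1%} of decided matches")
```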

How to Run

1. Clone the repository

git clone <your-repo-link>
cd IPL_Pipeline

2. Install dependencies

pip install pandas numpy matplotlib openpyxl

3. Run the notebook

Open analysis.ipynb in VS Code or Jupyter and run all cells.

Outputs

CSV files in the output/ folder
Excel file: ipl_analysis.xlsx

Project Outcome

This project demonstrates:

End-to-end data pipeline design
Data cleaning and transformation
Multi-source data merging
Efficient aggregation using Pandas
Insight generation from real-world data

Author

Akshat Walvekar
