## About Dataset
**Context**
- **The Pakistan Super League (PSL) is a professional Twenty20 cricket league in Asia, primarily held in Pakistan and the UAE. Established in 2015 by the Pakistan Cricket Board (PCB), PSL features six franchise teams representing major cities. The league follows a round-robin format, where teams compete in group-stage matches before advancing to playoffs and the grand final. Known for its thrilling contests, international star players, and passionate fanbase, PSL has grown into one of the most competitive T20 leagues in the world.**
- **This dataset captures the entire history of PSL matches, enabling cricket analysts, data scientists, and fans to explore match trends, player performances, and strategic insights.**

**Content**
- Geography: Pakistan, UAE (Asia)
- Time Period: February 4, 2016 – March 18, 2024
- Unit of Analysis: Pakistan Super League (PSL) Matches

**Variables**

The dataset consists of ball-by-ball records and match summaries, making it ideal for detailed performance analysis. Below is a breakdown of the dataset's key columns:

| Column Name | Description |
| --- | --- |
| `id` | Unique identifier for each delivery |
| `match_id` | Unique identifier for each match |
| `date` | Date of the match |
| `season` | PSL season in which the match was played |
| `venue` | Stadium where the match was played |
| `inning` | Inning number |
| `batting_team` | Team currently batting |
| `bowling_team` | Team currently bowling |
| `over` | Over number in the innings (0 to 19) |
| `ball` | Ball number in the over (1 to 6) |
| `batter` | Name of the batsman on strike |
| `bowler` | Name of the bowler delivering the ball |
| `non_striker` | Name of the non-striking batsman |
| `batsman_runs` | Runs scored by the batsman on that delivery |
| `extra_runs` | Runs awarded as extras (wides, no-balls, etc.) |
| `total_runs` | Sum of batsman and extra runs for the delivery |
| `extras_type` | Type of extra run (wide, no-ball, bye, etc.) |
| `is_wicket` | Indicates if a wicket fell on that delivery (1 = Yes, 0 = No) |
| `player_dismissed` | Name of the dismissed player (if any) |
| `dismissal_kind` | Method of dismissal (bowled, caught, run out, etc.) |
| `fielder` | Name of the fielder involved in the dismissal (if applicable) |
| `winner` | Team that won the match |
| `win_by` | Margin of victory (runs or wickets) |
| `match_type` | Type of match (league, playoff, final) |
| `player_of_match` | Name of the best-performing player of the match |
| `umpire_1` | Name of the first on-field umpire |
| `umpire_2` | Name of the second on-field umpire |
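
Because every row is a single delivery, most analyses start by aggregating to a coarser level. The sketch below shows one way to collapse deliveries into innings totals with pandas. The sample rows are invented for illustration; only the column names come from the table above.

```python
import pandas as pd

# Tiny hand-made sample using the documented column names.
# The values are illustrative only, not real PSL data.
deliveries = pd.DataFrame({
    "match_id":     [1, 1, 1, 1, 1, 1],
    "inning":       [1, 1, 1, 2, 2, 2],
    "batting_team": ["Team A"] * 3 + ["Team B"] * 3,
    "over":         [0, 0, 0, 0, 0, 0],
    "ball":         [1, 2, 3, 1, 2, 3],
    "total_runs":   [4, 1, 0, 6, 0, 2],
    "is_wicket":    [0, 0, 1, 0, 1, 0],
})

# Collapse ball-by-ball rows into one row per innings.
innings = (
    deliveries
    .groupby(["match_id", "inning", "batting_team"], as_index=False)
    .agg(runs=("total_runs", "sum"), wickets=("is_wicket", "sum"))
)
print(innings)
```

With the real data, `deliveries` would come from `pd.read_csv(...)` on the downloaded file, whose name depends on the download.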

**Acknowledgements**
- Data Source: Cricsheet

--------

Take a look at:

- Match-level data (teams, result, toss, venue, etc.)
- Player-level stats (runs, wickets, strike rate, economy)
- Team performance summaries (net run rate, points table if available)

**🔍 2. EDA Ideas**

Here's what you can explore:

- Win % of each team across seasons
- Toss win vs match win trends
- Best batting/bowling sides
- Team form in recent seasons (especially 2023-2024)
- Venue-wise performance
- Head-to-head results

Use Seaborn, Matplotlib, or Plotly for beautiful visualizations!
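
As a starting point for the first idea, win percentage per team and season, here is a minimal pandas sketch. It assumes the match-level columns (`season`, `winner`) repeat on every delivery row and that the two teams can be read off the first-innings rows. The sample data is invented.

```python
import pandas as pd

# Illustrative rows only; match-level columns are assumed to repeat on every delivery.
deliveries = pd.DataFrame({
    "match_id":     [1, 1, 2, 2, 3, 3],
    "season":       [2023, 2023, 2023, 2023, 2024, 2024],
    "inning":       [1, 2, 1, 2, 1, 2],
    "batting_team": ["A", "B", "B", "C", "A", "C"],
    "bowling_team": ["B", "A", "C", "B", "C", "A"],
    "winner":       ["A", "A", "C", "C", "A", "A"],
})

# One row per match: the first-innings row identifies both teams.
matches = (
    deliveries[deliveries["inning"] == 1]
    .drop_duplicates("match_id")
    .rename(columns={"batting_team": "team1", "bowling_team": "team2"})
    [["match_id", "season", "team1", "team2", "winner"]]
)

# One row per (match, team) so each team's appearances can be counted.
appearances = matches.melt(
    id_vars=["match_id", "season", "winner"],
    value_vars=["team1", "team2"],
    value_name="team",
)
# A match with no recorded winner counts as played but not won.
appearances["won"] = (appearances["team"] == appearances["winner"]).astype(int)

win_pct = (
    appearances.groupby(["season", "team"])["won"]
    .agg(played="count", wins="sum")
    .assign(win_pct=lambda d: 100 * d["wins"] / d["played"])
    .reset_index()
)
print(win_pct)
```

The resulting `win_pct` table is in long format, so it can be passed directly to a Seaborn or Plotly bar or line chart with `season` on the x-axis and `team` as the hue.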

**🧠 3. Feature Engineering Ideas**

You can build features like:

- Recent form: last 3 or 5 matches won
- Team batting avg or strike rate
- Top bowler avg wickets per match
- Home ground advantage
- Toss win + batting/fielding decision
- Powerplay & death overs performance
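
The powerplay and death-overs idea can be sketched from the `over` column alone. The example below assumes the 0-indexed over numbering described in the column table, treats overs 0-5 as the powerplay, and treats overs 16-19 as the death overs. The death-overs boundary is a common convention rather than something the dataset defines, and the sample rows are invented.

```python
import pandas as pd

# Illustrative deliveries for a single innings.
deliveries = pd.DataFrame({
    "match_id":     [1] * 6,
    "inning":       [1] * 6,
    "batting_team": ["A"] * 6,
    "over":         [0, 5, 6, 15, 16, 19],
    "total_runs":   [4, 2, 1, 3, 6, 4],
})

def phase_of(over: int) -> str:
    """Map a 0-indexed over number to a T20 phase (assumed boundaries)."""
    if over <= 5:
        return "powerplay"   # overs 0-5
    if over >= 16:
        return "death"       # overs 16-19
    return "middle"          # overs 6-15

deliveries["phase"] = deliveries["over"].map(phase_of)

# Runs scored in each phase, one row per innings and one column per phase.
phase_runs = (
    deliveries
    .groupby(["match_id", "inning", "batting_team", "phase"])["total_runs"]
    .sum()
    .unstack("phase", fill_value=0)
    .reset_index()
)
print(phase_runs)
```

Averaging these per-innings totals by `batting_team` (or by `bowling_team`, if that column is carried through the grouping) gives a team-level batting or bowling feature for each phase.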

For match-wise prediction:

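```python
# Candidate inputs for a match-level model. toss_winner is not among the
# columns listed above, so it may have to come from another source.
features = ['team1', 'team2', 'venue', 'toss_winner', 'bat_first',
            'team1_avg_runs', 'team2_avg_runs', 'team1_win_rate',
            'team2_win_rate', ...]
```
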
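
Features such as `team1_win_rate` or recent form should only use matches played before the one being predicted, otherwise the result leaks into the input. The sketch below shows one way to do that with a shifted rolling mean. The `log` table (one row per team per match, with a `won` flag) is a hypothetical intermediate, and its rows are invented.

```python
import pandas as pd

# Hypothetical per-team match log: one row per (team, match). Values are made up.
log = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "2024-02-17", "2024-02-20", "2024-02-24", "2024-02-28",
        "2024-02-18", "2024-02-21", "2024-02-25",
    ]),
    "won":  [1, 0, 1, 1, 0, 0, 1],
})

log = log.sort_values(["team", "date"])

# Win rate over the previous 3 matches. shift(1) drops the current match,
# so the feature only uses information available before the match starts.
log["form_last3"] = (
    log.groupby("team")["won"]
    .transform(lambda s: s.shift(1).rolling(3, min_periods=1).mean())
)
print(log)
```

A team's first match has no history, so its `form_last3` is `NaN`. How to fill it (for example with 0.5 or with the previous season's win rate) is a modeling choice.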
**⚙️ 4. Modeling Approach**

Label: Who won the match (team1_win = 1 or 0)

Use models like:

- Logistic Regression
- Random Forest
- XGBoost / LightGBM

Train on matches through the 2023 season, then test on the 2024 season to validate.
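
The season-based split described here could look like the sketch below. It assumes a match-level table with an integer `season` column and the `team1_win` label; both the table and its values are illustrative. A majority-class baseline is included as a reference point that any fitted model should beat.

```python
import pandas as pd

# Illustrative match-level table; team1_win is the label described above.
matches = pd.DataFrame({
    "match_id":  [1, 2, 3, 4, 5],
    "season":    [2022, 2023, 2023, 2024, 2024],
    "team1_win": [1, 0, 1, 1, 0],
})

def split_by_season(df: pd.DataFrame, test_season: int):
    """Train on every season before test_season, test on test_season only."""
    train = df[df["season"] < test_season]
    test = df[df["season"] == test_season]
    return train, test

train, test = split_by_season(matches, test_season=2024)

# Majority-class baseline learned from the training seasons only.
majority = int(train["team1_win"].mean() >= 0.5)
baseline_acc = (test["team1_win"] == majority).mean()
print(f"train={len(train)} test={len(test)} baseline accuracy={baseline_acc:.2f}")
```

Splitting by season rather than at random keeps future matches out of the training set, which matters when features such as recent form depend on match order.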

**🔮 5. Predict PSL 2025 Winner**

- Option A: Use the model to predict each 2025 match (simulate entire season, calculate points)
- Option B: Aggregate 2024 season + squad stats → make a probabilistic prediction of who is most likely to win
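
Option A can be sketched as a Monte Carlo simulation using only the standard library. Everything here is an assumption for illustration: the team names are placeholders, `win_prob` is a hypothetical stand-in for a trained model, the league stage is taken to be a double round-robin with 2 points per win, ties at the top are broken at random, and the playoffs are not simulated.

```python
import random
from collections import Counter
from itertools import combinations

TEAMS = ["A", "B", "C", "D", "E", "F"]  # placeholders for the six franchises

def win_prob(team1: str, team2: str) -> float:
    """Hypothetical stand-in for a trained model's P(team1 beats team2)."""
    return 0.5

def simulate_league(rng: random.Random) -> str:
    """Play the league stage once and return the table-topping team."""
    points = Counter({t: 0 for t in TEAMS})
    for t1, t2 in combinations(TEAMS, 2):
        for _ in range(2):  # assumed double round-robin: each pair meets twice
            winner = t1 if rng.random() < win_prob(t1, t2) else t2
            points[winner] += 2  # assumed 2 points per win
    top = max(points.values())
    # Ties are broken at random here; a real table would use net run rate.
    return rng.choice([t for t in TEAMS if points[t] == top])

def estimate_top_finish(n_sims: int = 2000, seed: int = 0) -> dict:
    """Share of simulated seasons in which each team finishes first."""
    rng = random.Random(seed)
    counts = Counter(simulate_league(rng) for _ in range(n_sims))
    return {t: counts[t] / n_sims for t in TEAMS}

probs = estimate_top_finish()
print(probs)
```

Replacing `win_prob` with the fitted model's predicted probability for each fixture turns this into a season forecast. Extending `simulate_league` with the playoff bracket would be needed to estimate the title winner rather than the league-stage leader.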