# Data collection

## About the data

A zip archive contains data files from Cricsheet in JSON format. This
archive contains 2887 ODI matches. A further 146 matches have been withheld
because they either feature the Afghanistan men's team or were played in the
Afghanistan Premier League; Cricsheet's policy is to no longer publish such
matches (see https://cricsheet.org/withheld-matches for more information).


The JSON data files contained in this zip file are version 1.0.0 and 1.1.0
files. You can learn about the structure of these files at
https://cricsheet.org/format/json/


You can find the available downloads at https://cricsheet.org/downloads/, and
you can find the most up-to-date version of this zip file at
https://cricsheet.org/downloads/odis_json.zip


The matches contained in this zip archive are listed below. The first field is
the start date of the match (for Test matches or other multi-day matches), or
the actual date (for all other types of match). The second is the type of
teams involved, either 'club' or 'international'. The third is the type of
match: Test, ODI, ODM, T20, IT20, MDM, or a club competition code (such
as IPL). The fourth field is the gender of the players involved in the match.
The fifth field is the id of the match, and the remainder of the line shows the
teams involved in the match.
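
As a rough illustration of the field layout described above, the sketch below parses one listing line. It assumes the fields are separated by ` - ` and the two teams by ` vs `; both separators, the `MatchEntry` type, the `parse_match_line` helper, and the sample line are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class MatchEntry:
    """One line of the match listing (hypothetical container type)."""
    start_date: date
    team_type: str
    match_type: str
    gender: str
    match_id: str
    teams: List[str]


def parse_match_line(line: str) -> MatchEntry:
    # Split into at most six parts so that a " - " inside a team name
    # would stay in the final (teams) field. The separator is an assumption.
    start, team_type, match_type, gender, match_id, teams = line.strip().split(" - ", 5)
    return MatchEntry(
        start_date=date.fromisoformat(start),
        team_type=team_type,
        match_type=match_type,
        gender=gender,
        match_id=match_id,
        teams=teams.split(" vs "),
    )


# Made-up sample line, not taken from the archive
entry = parse_match_line("2020-01-01 - international - ODI - male - 123456 - Team A vs Team B")
print(entry.match_id, entry.teams)
```

If the real listing uses a different separator, only the two `split` calls would need to change.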

# Match Data

In [1]:
# !pip3 install beautifulsoup4 requests pandas lxml -q

In [2]:
import os
import json
import pandas as pd
import numpy as np

In [3]:
json_folder = './odis_json/'
csv_folder = './odi_csv/'
output_folder = './output/'

match_summary_list = []
player_stats_dict = {}
match_timeseries_data = []

if not os.path.exists(csv_folder):
    os.makedirs(csv_folder)

if not os.path.exists(output_folder):
    os.makedirs(output_folder)

In [4]:
print("Starting the process of reading JSON files from the folder:", json_folder)
print(f"Found {len(os.listdir(json_folder))} files")

print("\n\nProcessing data...")
for filename in os.listdir(json_folder):
    if filename.endswith('.json'):
        file_path = os.path.join(json_folder, filename)

        try:
            with open(file_path, 'r', encoding='utf-8') as file:
                data = json.load(file)
                total_fours = 0
                total_sixes = 0

                match_info = {
                    "match_id": filename.split('.')[0],  # Use filename as match ID
                    "created": data.get("meta", {}).get("created"),
                    "city": data.get("info", {}).get("city"),
                    "dates": data.get("info", {}).get("dates", [])[0] if data.get("info", {}).get("dates") else None,
                    "event_name": data.get("info", {}).get("event", {}).get("name"),
                    "match_number": data.get("info", {}).get("event", {}).get("match_number"),
                    "gender": data.get("info", {}).get("gender"),
                    "match_type": data.get("info", {}).get("match_type"),
                    "match_type_number": data.get("info", {}).get("match_type_number"),
                    "toss_winner": data.get("info", {}).get("toss", {}).get("winner"),
                    "toss_decision": data.get("info", {}).get("toss", {}).get("decision"),
                    "outcome_winner": data.get("info", {}).get("outcome", {}).get("winner"),
                    "outcome_by_runs": data.get("info", {}).get("outcome", {}).get("by", {}).get("runs"),
                    "outcome_by_wickets": data.get("info", {}).get("outcome", {}).get("by", {}).get("wickets"),
                    "overs": data.get("info", {}).get("overs"),
                    "player_of_match": data.get("info", {}).get("player_of_match", [])[0] if data.get("info", {}).get("player_of_match") else None,
                    "season": data.get("info", {}).get("season"),
                    "teams": ", ".join(data.get("info", {}).get("teams", [])),
                    "venue": data.get("info", {}).get("venue")
                }

                innings_data = data.get("innings", [])
                match_timeseries_data = []

                # Track match outcome
                match_winner = match_info["outcome_winner"]
                match_loser = None
                if match_winner:
                    match_loser = [team for team in data.get("info", {}).get("teams", []) if team != match_winner][0]

                # Iterate over innings to capture time-series data
                for inning in innings_data:
                    team = inning.get("team")
                    for over_data in inning.get("overs", []):
                        over_number = over_data.get("over")
                        for delivery in over_data.get("deliveries", []):
                            runs_scored = delivery.get("runs", {}).get("batter", 0)
                            if runs_scored == 4:
                                total_fours += 1
                            elif runs_scored == 6:
                                total_sixes += 1
                            delivery_info = {
                                "match_id": match_info["match_id"],
                                "team": team,
                                "over": over_number,
                                "ball": delivery.get("ball"),
                                "batter": delivery.get("batter"),
                                "bowler": delivery.get("bowler"),
                                "non_striker": delivery.get("non_striker"),
                                "runs_batter": delivery.get("runs", {}).get("batter", 0),
                                "runs_extras": delivery.get("runs", {}).get("extras", 0),
                                "runs_total": delivery.get("runs", {}).get("total", 0),
                                "wicket_player_out": delivery.get("wickets", [{}])[0].get("player_out", None),
                                "wicket_kind": delivery.get("wickets", [{}])[0].get("kind", None)
                            }
                            match_timeseries_data.append(delivery_info)

                            # Player stats aggregation for batting
                            batter = delivery.get("batter")
                            if batter:
                                if batter not in player_stats_dict:
                                    player_stats_dict[batter] = {
                                        "player_name": batter,
                                        "role": "Batter",
                                        "total_runs": 0,
                                        "strike_rate": 0,
                                        "total_balls_faced": 0,
                                        "total_wickets_taken": 0,
                                        "total_runs_conceded": 0,
                                        "total_overs_bowled": 0,
                                        "total_matches_played": 0,
                                        "matches_played_as_batter": 0,
                                        "matches_played_as_bowler": 0,
                                        "matches_won": 0,
                                        "matches_lost": 0,
                                        "player_of_match_awards": 0,
                                        "team": team
                                    }
                                player_stats_dict[batter]["total_runs"] += delivery.get("runs", {}).get("batter", 0)
                                player_stats_dict[batter]["total_balls_faced"] += 1
                                player_stats_dict[batter]["strike_rate"] = (player_stats_dict[batter]["total_runs"] / player_stats_dict[batter]["total_balls_faced"]) * 100

                            # Player stats aggregation for bowling
                            bowler = delivery.get("bowler")
                            if bowler:
                                if bowler not in player_stats_dict:
                                    player_stats_dict[bowler] = {
                                        "player_name": bowler,
                                        "role": "Bowler",
                                        "total_runs": 0,
                                        "strike_rate": 0,
                                        "total_balls_faced": 0,
                                        "total_wickets_taken": 0,
                                        "total_runs_conceded": 0,
                                        "total_overs_bowled": 0,
                                        "total_matches_played": 0,
                                        "matches_played_as_batter": 0,
                                        "matches_played_as_bowler": 0,
                                        "matches_won": 0,
                                        "matches_lost": 0,
                                        "player_of_match_awards": 0,
                                        # The bowler belongs to the fielding side, not the batting team
                                        "team": next((t for t in data.get("info", {}).get("teams", []) if t != team), None)
                                    }
                                player_stats_dict[bowler]["total_runs_conceded"] += delivery.get("runs", {}).get("total", 0)
                                # Each delivery adds one sixth of an over (wides and no-balls are counted too)
                                player_stats_dict[bowler]["total_overs_bowled"] += 1 / 6

                                # Wickets taken by bowler
                                if delivery.get("wickets"):
                                    player_stats_dict[bowler]["total_wickets_taken"] += 1
                match_info["total_fours"] = total_fours
                match_info["total_sixes"] = total_sixes
                
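                # Update match win/loss, but only for players who batted or bowled in this match
                if match_winner:
                    match_players = {
                        d[key] for d in match_timeseries_data
                        for key in ("batter", "bowler") if d[key]
                    }
                    for name in match_players:
                        player = player_stats_dict[name]
                        if player["team"] == match_winner:
                            player["matches_won"] += 1
                        elif player["team"] == match_loser:
                            player["matches_lost"] += 1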

                # Update player_of_match for award tracking
                player_of_match = match_info["player_of_match"]
                if player_of_match and player_of_match in player_stats_dict:
                    player_stats_dict[player_of_match]["player_of_match_awards"] += 1

                if match_timeseries_data:
                    df_timeseries = pd.DataFrame(match_timeseries_data)
                    output_csv_path = os.path.join(csv_folder, f"{filename.split('.')[0]}.csv")
                    df_timeseries.to_csv(output_csv_path, index=False)

                match_summary_list.append(match_info)
                
        except json.JSONDecodeError as e:
            print(f"Error: Failed to decode JSON in {filename}. Error: {e}")
        except Exception as e:
            print(f"Error: An unexpected error occurred while processing {filename}. Error: {e}")

print("Extraction completed.")

Starting the process of reading JSON files from the folder: ./odis_json/
Found 2888 files


Processing data...
Extraction completed.
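
The `match_info` dictionary above repeats long `.get(...)` chains to reach nested fields. One possible way to shorten them is a small lookup helper; `dig` is a hypothetical name, not part of the notebook or of any library, and the sample dictionary is made up.

```python
from typing import Any


def dig(mapping: Any, *keys: str, default: Any = None) -> Any:
    """Follow keys through nested dicts, returning default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up sample with the same nesting as the fields read above
sample = {"info": {"toss": {"winner": "Team A", "decision": "bat"}}}

toss_winner = dig(sample, "info", "toss", "winner")
margin_runs = dig(sample, "info", "outcome", "by", "runs")  # "outcome" is absent here
print(toss_winner, margin_runs)
```

With such a helper, a line like the `outcome_by_runs` lookup would become `dig(data, "info", "outcome", "by", "runs")`.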


In [5]:
# Create match summary DataFrame
if match_summary_list:
    try:
        df_summary = pd.DataFrame(match_summary_list)
        df_summary = df_summary[df_summary['gender'] == 'male']
        print("Successfully created a DataFrame with the match summary data.")
        print("Summary DataFrame shape:", df_summary.shape)
        # Save the match summary CSV
        df_summary.to_csv(os.path.join(output_folder, 'match_summary.csv'), index=False)
    except Exception as e:
        print(f"Error: Failed to create match summary DataFrame. Error: {e}")

Successfully created a DataFrame with the match summary data.
Summary DataFrame shape: (2400, 21)


In [6]:
# Separate player stats for batters and bowlers
batters_data = []
bowlers_data = []
if player_stats_dict:
    try:
        for player, stats in player_stats_dict.items():
            if stats["role"] == "Batter":
                # Recompute the strike rate for batters
                if stats["total_balls_faced"] > 0:
                    stats["strike_rate"] = (stats["total_runs"] / stats["total_balls_faced"]) * 100
                # Note: this "average" divides by wickets taken, not dismissals (which are
                # not tracked here), so it is not a conventional batting average
                if stats["total_wickets_taken"] > 0:
                    stats["average"] = stats["total_runs"] / stats["total_wickets_taken"]
                batters_data.append(stats)
            elif stats["role"] == "Bowler":
                # Calculate bowling economy rate and average for bowlers
                if stats["total_overs_bowled"] > 0:
                    stats["economy_rate"] = stats["total_runs_conceded"] / stats["total_overs_bowled"]
                if stats["total_wickets_taken"] > 0:
                    stats["average"] = stats["total_runs_conceded"] / stats["total_wickets_taken"]
                bowlers_data.append(stats)

        # Create DataFrames for batters and bowlers
        batter_df = pd.DataFrame(batters_data)
        bowler_df = pd.DataFrame(bowlers_data)

        batter_df = batter_df.sort_values(by=["total_runs", "strike_rate"], ascending=[False, False])
        bowler_df = bowler_df.sort_values(by=["economy_rate", "total_wickets_taken"], ascending=[True, False])

        # Additional statistics for batters and bowlers
        batter_df["total_matches_played"] = batter_df["matches_won"] + batter_df["matches_lost"]
        batter_df["matches_played_as_batter"] = batter_df["total_matches_played"]

        bowler_df["total_matches_played"] = bowler_df["matches_won"] + bowler_df["matches_lost"]
        bowler_df["matches_played_as_bowler"] = bowler_df["total_matches_played"]

        print("Successfully created a DataFrame with player statistics.")
        print("Batters Stats DataFrame shape:", batter_df.shape)
        print("Bowlers Stats DataFrame shape:", bowler_df.shape)

        # Save the separate CSVs for batters and bowlers
        batter_df.to_csv(os.path.join(output_folder, 'batter_player_stats.csv'), index=False)
        bowler_df.to_csv(os.path.join(output_folder, 'bowler_player_stats.csv'), index=False)

    except Exception as e:
        print(f"Error: Failed to create player statistics DataFrame. Error: {e}")

Successfully created a DataFrame with player statistics.
Batters Stats DataFrame shape: (1540, 16)
Bowlers Stats DataFrame shape: (929, 17)
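
For reference, the bowling figures used above follow from three counts: balls bowled, runs conceded, and wickets taken. The sketch below shows the usual arithmetic, assuming six-ball overs and that `balls` counts only the deliveries you want to include; the function names are illustrative and do not appear in the notebook.

```python
from typing import Optional


def overs_notation(balls: int) -> str:
    """Format a ball count as overs.balls, so 50 balls becomes '8.2'."""
    return f"{balls // 6}.{balls % 6}"


def economy_rate(runs_conceded: int, balls: int) -> float:
    """Runs conceded per six-ball over; 0.0 when nothing was bowled."""
    if balls == 0:
        return 0.0
    return runs_conceded * 6 / balls


def bowling_average(runs_conceded: int, wickets: int) -> Optional[float]:
    """Runs conceded per wicket; None when no wicket was taken."""
    if wickets == 0:
        return None
    return runs_conceded / wickets


# 45 runs from 60 balls with 3 wickets: 10.0 overs, economy 4.5, average 15.0
print(overs_notation(60), economy_rate(45, 60), bowling_average(45, 3))
```

Dividing runs by a raw ball count instead of by overs would give runs per ball, which is six times smaller than the economy rate.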


In [7]:
unique_teams = df_summary["teams"].unique()
all_countries = [country.strip() for pair in unique_teams for country in pair.split(',')]
unique_countries = pd.Series(all_countries).unique()
print(unique_countries)

['India' 'South Africa' 'New Zealand' 'Australia' 'Nepal'
 'United States of America' 'United Arab Emirates' 'West Indies'
 'Sri Lanka' 'Papua New Guinea' 'Zimbabwe' 'Namibia' 'Kenya' 'Ireland'
 'England' 'Pakistan' 'Bangladesh' 'Netherlands' 'Bermuda' 'Scotland'
 'Oman' 'Hong Kong' 'Canada' 'Africa XI' 'Asia XI' 'Jersey' 'ICC World XI']



In [8]:
allowed_countries = {"India", "Pakistan", "Bangladesh", "New Zealand", 
                     "England", "Australia", "Afghanistan", "South Africa"}

def is_valid_team(pair):
    teams = {country.strip() for country in pair.split(',')}
    return teams.issubset(allowed_countries)

len(df_summary[df_summary["teams"].apply(is_valid_team)])

928

# Get detailed player stats

| **Feature**                            | **Description**                                                  |
| -------------------------------------- | ---------------------------------------------------------------- |
| **Player Name**                        | Name of the player                                               |
| **Match ID**                           | Unique match identifier                                          |
| **Team**                               | Team of the player                                               |
| **Runs Scored (`runs`)**               | Total runs scored by the player                                  |
| **Balls Faced (`balls_faced`)**        | Number of balls faced by the player                              |
| **Fours (`fours`)**                    | Number of fours hit by the player                                |
| **Sixes (`sixes`)**                    | Number of sixes hit by the player                                |
| **Strike Rate (`strike_rate`)**        | Batting strike rate (calculated)                                 |
| **Wickets Taken (`wickets`)**          | Number of wickets taken by the bowler                            |
| **Maiden Over (`maiden`)**             | Number of overs in which no runs are scored                      |
| **Overs Bowled (`overs_bowled`)**      | Number of overs bowled by the player                             |
| **Balls Bowled (`balls_bowled`)**      | Total balls bowled by the player                                 |
| **Runs Conceded (`runs_conceded`)**    | Runs conceded by the bowler                                      |
| **Economy Rate (`economy`)**           | Bowling economy rate (calculated)                                |
| **Catches (`catches`)**                | Number of catches taken by the player                            |
| **Run Outs (`run_outs`)**              | Number of run outs contributed by the player (if available)      |
| **Match Outcome (`match_outcome`)**    | Whether the player’s team won or lost (win/loss)                 |
| **Player Role (Captain/Vice-Captain)** | Whether the player is a captain, vice-captain, or regular player |


In [9]:
def calculate_fantasy_points(stats):
    fantasy_points = 0

    # Batting Points
    fantasy_points += stats["runs"]  # 1 point per run
    fantasy_points += stats["fours"] * 4  # 4 points per boundary
    fantasy_points += stats["sixes"] * 6  # 6 points per six

    if stats["runs"] >= 25:
        fantasy_points += 4
    if stats["runs"] >= 50:
        fantasy_points += 8
    if stats["runs"] >= 75:
        fantasy_points += 12
    if stats["runs"] >= 100:
        fantasy_points += 16
    if stats["runs"] >= 125:
        fantasy_points += 20
    if stats["runs"] >= 150:
        fantasy_points += 24

    if stats["balls_faced"] > 0 and stats["runs"] == 0:  # Duck dismissal
        fantasy_points -= 3

    # Strike Rate Points (min 20 balls faced)
    if stats["balls_faced"] >= 20:
        if stats["strike_rate"] > 140:
            fantasy_points += 6
        elif 120.01 <= stats["strike_rate"] <= 140:
            fantasy_points += 4
        elif 100 <= stats["strike_rate"] <= 120:
            fantasy_points += 2
        elif 40 <= stats["strike_rate"] <= 50:
            fantasy_points -= 2
        elif 30 <= stats["strike_rate"] <= 39.99:
            fantasy_points -= 4
        elif stats["strike_rate"] < 30:
            fantasy_points -= 6

    # Bowling Points
    fantasy_points += stats["wickets"] * 25  # 25 points per wicket
    fantasy_points += stats["maiden"] * 4  # 4 points per maiden over
    fantasy_points += (stats["balls_bowled"] // 3) * 1  # 1 point per 3 balls bowled (a proxy: dot balls are not tracked)

    if stats["wickets"] >= 4:
        fantasy_points += 4
    if stats["wickets"] >= 5:
        fantasy_points += 8
    if stats["wickets"] >= 6:
        fantasy_points += 12

    # Bonus for LBW or Bowled (adds 0 unless the caller supplies an "lbw_bowled" count)
    fantasy_points += stats.get("lbw_bowled", 0) * 8

    # Economy Rate Points (min 5 overs bowled)
    if stats["overs_bowled"] >= 5:
        if stats["economy"] < 2.5:
            fantasy_points += 6
        elif 2.5 <= stats["economy"] <= 3.49:
            fantasy_points += 4
        elif 3.5 <= stats["economy"] <= 4.5:
            fantasy_points += 2
        elif 7 <= stats["economy"] <= 8:
            fantasy_points -= 2
        elif 8.01 <= stats["economy"] <= 9:
            fantasy_points -= 4
        elif stats["economy"] > 9:
            fantasy_points -= 6

    # Fielding Points
    fantasy_points += stats["catches"] * 8
    if stats["catches"] >= 3:
        fantasy_points += 4  # 3 Catch Bonus

    fantasy_points += stats["stumps"] * 12
    fantasy_points += stats["run_outs"] * 6
    
    return fantasy_points
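
Because the run milestones in `calculate_fantasy_points` are separate `if` statements rather than an `if`/`elif` chain, the bonuses stack: a batter who reaches 100 collects the 25, 50, 75, and 100 bonuses together. The sketch below isolates that arithmetic with the same thresholds; `MILESTONES` and `milestone_bonus` are illustrative names, not part of the notebook.

```python
# (threshold, bonus) pairs copied from the run checks in calculate_fantasy_points
MILESTONES = [(25, 4), (50, 8), (75, 12), (100, 16), (125, 20), (150, 24)]


def milestone_bonus(runs: int) -> int:
    """Sum the bonus of every milestone the score has reached."""
    return sum(bonus for threshold, bonus in MILESTONES if runs >= threshold)


for runs in (24, 60, 100):
    print(runs, milestone_bonus(runs))
# 24  -> 0
# 60  -> 4 + 8 = 12
# 100 -> 4 + 8 + 12 + 16 = 40
```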

In [10]:
data = []

print("Starting the process of reading JSON files from the folder:", json_folder)
print(f"Found {len(os.listdir(json_folder))} files")

print("\n\nProcessing data...")
for file in os.listdir(json_folder):
    if file.endswith(".json"):
        file_path = os.path.join(json_folder, file)
        
        with open(file_path, "r", encoding="utf-8") as f:
            match_data = json.load(f)
            match_id = file.replace(".json", "")
            venue  = match_data.get("info", {}).get("venue")

            # Skip women's matches; only men's matches are analysed here
            if match_data.get("info", {}).get("gender") == "female":
                continue
            
            # Get match outcome (winner team)
            outcome = match_data.get("info", {}).get("outcome", {})
            winner_team = outcome.get("winner", None)
            
            team_players = match_data.get("info", {}).get("players", {})
            player_team_map = {player: team for team, players in team_players.items() for player in players}
            teams = list(team_players.keys())
            
            player_stats = {}
            
            for inning in match_data.get("innings", []):
                batting_team = inning.get("team")
                bowling_team = [team for team in teams if team != batting_team][0] if len(teams) > 1 else None
                overs = inning.get("overs", [])
                
                for over in overs:
                    runsinover = 0
                    for delivery in over.get("deliveries", []):
                        batter = delivery["batter"]
                        bowler = delivery["bowler"]
                        runs = delivery["runs"]["batter"]
                        
                        # Initialize player stats if not already present
                        for player in [batter, bowler]:
                            if player not in player_stats:
                                player_stats[player] = {
                                    "match_id": match_id,
                                    "player": player,
                                    "team": player_team_map.get(player, batting_team if player in team_players.get(batting_team, []) else bowling_team),
                                    "runs": 0, "balls_faced": 0, "fours": 0, "sixes": 0,
                                    "wickets": 0, "overs_bowled": 0, "balls_bowled": 0, "runs_conceded": 0,
                                    "catches": 0, "run_outs": 0, "maiden": 0, "stumps": 0, "match_outcome": "loss"  # Default to loss, will update later
                                }
                        
                        # Update batting stats
                        player_stats[batter]["runs"] += runs
                        runsinover += runs
                        
                        player_stats[batter]["balls_faced"] += 1
                        if runs == 4:
                            player_stats[batter]["fours"] += 1
                        if runs == 6:
                            player_stats[batter]["sixes"] += 1

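                        # Update bowling stats: wides and no-balls are charged to the bowler
                        # but are not legal deliveries, so they do not add to balls bowled
                        extras = delivery.get("extras", {})
                        if "wides" not in extras and "noballs" not in extras:
                            player_stats[bowler]["balls_bowled"] += 1
                        player_stats[bowler]["runs_conceded"] += runs + extras.get("wides", 0) + extras.get("noballs", 0)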

                        # Update wickets & catches
                        if "wickets" in delivery:
                            for wicket in delivery["wickets"]:
                                player_out = wicket.get("player_out")
                                kind = wicket.get("kind")
                                fielders = wicket.get("fielders", [])
                        
                                if not player_out or not kind:
                                    continue

                                if bowler not in player_stats:
                                    player_stats[bowler] = {
                                        "match_id": match_id,
                                        "player": bowler,
                                        "team": player_team_map.get(bowler, bowling_team),
                                        "runs": 0, "balls_faced": 0, "fours": 0, "sixes": 0,
                                        "wickets": 0, "overs_bowled": 0, "balls_bowled": 0, "runs_conceded": 0,
                                        "catches": 0, "run_outs": 0, "maiden": 0, "stumps": 0, "match_outcome": "loss"
                                    }
                        
                                if kind in ["bowled", "lbw", "caught", "caught and bowled", "stumped", "hit wicket"]:
                                    player_stats[bowler]["wickets"] += 1  # Bowler gets the wicket
                        
                                if kind == "stumped" and fielders:
                                    for fielder in fielders:
                                        fielder_name = fielder.get("name")
                                        if not fielder_name:
                                            continue  # Skip if name is missing
                        
                                        if fielder_name not in player_stats:
                                            player_stats[fielder_name] = {
                                                "match_id": match_id,
                                                "player": fielder_name,
                                                "team": player_team_map.get(fielder_name, bowling_team),
                                                "runs": 0, "balls_faced": 0, "fours": 0, "sixes": 0,
                                                "wickets": 0, "overs_bowled": 0, "balls_bowled": 0, "runs_conceded": 0,
                                                "catches": 0, "run_outs": 0, "maiden": 0, "stumps": 0, "match_outcome": "loss"
                                            }
                                        player_stats[fielder_name]["stumps"] += 1  # Wicketkeeper gets the stumping
                        
                                if kind == "caught" and fielders:
                                    for fielder in fielders:
                                        fielder_name = fielder.get("name")
                                        if not fielder_name:
                                            continue  # Skip if name is missing
                        
                                        if fielder_name not in player_stats:
                                            player_stats[fielder_name] = {
                                                "match_id": match_id,
                                                "player": fielder_name,
                                                "team": player_team_map.get(fielder_name, bowling_team),
                                                "runs": 0, "balls_faced": 0, "fours": 0, "sixes": 0,
                                                "wickets": 0, "overs_bowled": 0, "balls_bowled": 0, "runs_conceded": 0,
                                                "catches": 0, "run_outs": 0, "maiden": 0, "stumps": 0, "match_outcome": "loss"
                                            }
                                        player_stats[fielder_name]["catches"] += 1  # Fielder gets the catch
                        
                                if kind == "run out" and fielders:
                                    for fielder in fielders:
                                        fielder_name = fielder.get("name")
                                        if not fielder_name:
                                            continue

                                        if fielder_name not in player_stats:
                                            player_stats[fielder_name] = {
                                                "match_id": match_id,
                                                "player": fielder_name,
                                                "team": player_team_map.get(fielder_name, bowling_team),
                                                "runs": 0, "balls_faced": 0, "fours": 0, "sixes": 0,
                                                "wickets": 0, "overs_bowled": 0, "balls_bowled": 0, "runs_conceded": 0,
                                                "catches": 0, "run_outs": 0, "maiden": 0, "stumps": 0, "match_outcome": "loss"
                                            }
                                        player_stats[fielder_name]["run_outs"] += 1  # Fielder gets the run out
                        
                    # Credit a maiden to the over's bowler when no runs came off the bat
                    if over.get("deliveries") and runsinover == 0:
                        player_stats[bowler]["maiden"] += 1

            # Update match outcome for each player
            for player, stats in player_stats.items():
                if stats["team"] == winner_team:
                    stats["match_outcome"] = "win"

            # Convert player stats to final format
            for player, stats in player_stats.items():
                # Strike Rate Calculation
                stats["strike_rate"] = round((stats["runs"] / stats["balls_faced"]) * 100, 2) if stats["balls_faced"] > 0 else 0
                stats["overs_bowled"] = stats["balls_bowled"] // 6  # Fix over calculation
                stats["economy"] = round(stats["runs_conceded"] / (stats["balls_bowled"] / 6), 2) if stats["balls_bowled"] > 0 else 0
            
                # Initialize strike rate point categories
                stats["strike_rate_above_140"] = 0
                stats["strike_rate_120_140"] = 0
                stats["strike_rate_100_120"] = 0
                stats["strike_rate_40_50"] = 0
                stats["strike_rate_30_39"] = 0
                stats["strike_rate_below_30"] = 0
            
                # Apply Strike Rate Points only if the player has played at least 20 balls
                if stats["balls_faced"] >= 20:
                    if stats["strike_rate"] > 140:
                        stats["strike_rate_above_140"] += 1
                    elif 120.01 <= stats["strike_rate"] <= 140:
                        stats["strike_rate_120_140"] += 1
                    elif 100 <= stats["strike_rate"] <= 120:
                        stats["strike_rate_100_120"] += 1
                    elif 40 <= stats["strike_rate"] <= 50:
                        stats["strike_rate_40_50"] += 1
                    elif 30 <= stats["strike_rate"] <= 39.99:
                        stats["strike_rate_30_39"] += 1
                    elif stats["strike_rate"] < 30:
                        stats["strike_rate_below_30"] += 1
            
                # Initialize economy rate point categories
                stats["economy_below_2.5"] = 0
                stats["economy_2.5_3.49"] = 0
                stats["economy_3.5_4.5"] = 0
                stats["economy_7_8"] = 0
                stats["economy_8.01_9"] = 0
                stats["economy_above_9"] = 0
            
                # Apply Economy Rate Points only if the player has bowled at least 5 overs
                if stats["overs_bowled"] >= 5:
                    if stats["economy"] < 2.5:
                        stats["economy_below_2.5"] += 6
                    elif 2.5 <= stats["economy"] <= 3.49:
                        stats["economy_2.5_3.49"] += 4
                    elif 3.5 <= stats["economy"] <= 4.5:
                        stats["economy_3.5_4.5"] += 2
                    elif 7 <= stats["economy"] <= 8:
                        stats["economy_7_8"] -= 2
                    elif 8.01 <= stats["economy"] <= 9:
                        stats["economy_8.01_9"] -= 4
                    elif stats["economy"] > 9:
                        stats["economy_above_9"] -= 6

                stats["fantasy_points"] = calculate_fantasy_points(stats)
                stats["venue"] = venue
                
                data.append(stats)


print("Finished data extraction")

Starting the process of reading JSON files from the folder: ./odis_json/
Found 2888 files


Processing data...
Finished data extraction
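
The extraction loop above builds the same per-player dictionary in several places (for the batter and bowler, and again for each fielder). One way to keep those copies in sync is a single factory function; `new_player_record` is a hypothetical helper, and the player and team names below are made up.

```python
from typing import Dict, Optional


def new_player_record(match_id: str, player: str, team: Optional[str]) -> Dict:
    """Return a fresh per-match stats record with every counter at zero."""
    return {
        "match_id": match_id,
        "player": player,
        "team": team,
        "runs": 0, "balls_faced": 0, "fours": 0, "sixes": 0,
        "wickets": 0, "overs_bowled": 0, "balls_bowled": 0, "runs_conceded": 0,
        "catches": 0, "run_outs": 0, "maiden": 0, "stumps": 0,
        "match_outcome": "loss",  # default, updated once the winner is known
    }


player_stats: Dict[str, Dict] = {}

# setdefault only stores the new record when the player is not present yet
record = player_stats.setdefault("Player A", new_player_record("m1", "Player A", "Team A"))
record["runs"] += 4
other = player_stats.setdefault("Player B", new_player_record("m1", "Player B", "Team B"))
print(player_stats["Player A"]["runs"], other["runs"])
```

Each call returns a new dictionary, so updating one player's record cannot affect another's.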


In [11]:
players_detailed_stats = pd.DataFrame(data)

In [12]:
players_detailed_stats = players_detailed_stats.sort_values(by='fantasy_points', ascending=False)
players_detailed_stats.to_csv(os.path.join(output_folder, 'detailed_player_data.csv'), index=False)

## Conclusion

In this notebook, I successfully processed cricket match data stored in JSON files to extract detailed match and player statistics. Here's a summary of what was accomplished:

1. **Data Collection and Processing**:
   - The notebook reads JSON files from the specified folder (`./odis_json/`), each representing a cricket match.
   - It extracts key match information such as match ID, venue, teams, toss details, match outcome, and player performances.
   - The data is processed to calculate various player statistics, including runs scored, balls faced, wickets taken, economy rates, and strike rates.

2. **Data Transformation**:
   - The extracted data is transformed into a structured format suitable for analysis.
   - Time-series data for each match is stored in separate CSV files, capturing ball-by-ball details.
   - Player statistics are aggregated and enriched with additional metrics like fantasy points, strike rate categories, and economy rate categories.

3. **Output**:
   - The processed data is saved into CSV files for further analysis or visualization.
   - Match summaries and player statistics are stored in the `./output/` folder; the per-match ball-by-ball CSV files are stored in `./odi_csv/`.

4. **Final Output**:
   - A total of **4 CSV files** are generated and stored in the `./output/` folder: `match_summary.csv`, `batter_player_stats.csv`, `bowler_player_stats.csv`, and `detailed_player_data.csv`.
   - The primary output file, `detailed_player_data.csv`, contains per-match player statistics sorted by fantasy points.