# Before we do anything:

First, we need to download the rest of the data and split it into training and testing batches.

### Plan of action:
- Download the play_by_play data for each year in CSV format
- Download the injury data for each year in CSV format
- Put all of it in the data directory
- Create a Python script (`format_data.py`) that applies all of the preprocessing steps (cleaning, constructing, formatting) from the Data Prep section
- Run the Python script on each year for the play_by_play and injury data
- Split the processed data by year, within the range of years the data covers
    - Training data: 2009 - 2017
    - Testing data: 2018 - 2024
- Make a training directory and a testing directory to hold each split
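
Before running the script, it can help to confirm that every expected CSV is actually in the data directory. The following is a minimal sketch, assuming the files follow the `play_by_play_{year}.csv` and `injuries_{year}.csv` naming used in the cells below. The helper names `split_for_year` and `missing_files` are hypothetical and not part of the project.

```python
import os

# Year ranges taken from the plan above (range end is exclusive).
TRAIN_YEARS = range(2009, 2018)  # 2009 - 2017
TEST_YEARS = range(2018, 2025)   # 2018 - 2024


def split_for_year(year):
    """Return "training" or "testing" for a given year (hypothetical helper)."""
    if year in TRAIN_YEARS:
        return "training"
    if year in TEST_YEARS:
        return "testing"
    raise ValueError(f"No split defined for year {year}")


def missing_files(data_dir, years):
    """Return the expected CSV paths under data_dir that do not exist yet."""
    missing = []
    for year in years:
        for prefix in ("play_by_play", "injuries"):
            path = os.path.join(data_dir, f"{prefix}_{year}.csv")
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

Once all downloads are complete, `missing_files(os.path.join(os.getcwd(), "data"), range(2009, 2025))` should return an empty list.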

## Imports

In [29]:
import os
import subprocess

## Split into training/testing batches

In [33]:
# Loop through each year for training dataset (2009 - 2017)
for year in range(2009, 2018):
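    # Build the input paths for this year's play-by-play and injury CSVs
    play_by_play_file = os.path.join(os.getcwd(), "data", f"play_by_play_{year}.csv")
    injuries_file = os.path.join(os.getcwd(), "data", f"injuries_{year}.csv")

    # format_data.py must be executable; it receives both inputs, the split name, and the year
    command = ["./format_data.py", play_by_play_file, injuries_file, "training", str(year)]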
    try:
        subprocess.run(command, check=True)
        print(f"Processed files for year {year}")
    except subprocess.CalledProcessError as e:
        print(f"Error processing files for year {year}: {e}")

Processed files for year 2009
Processed files for year 2010
Processed files for year 2011
Processed files for year 2012
Processed files for year 2013
Processed files for year 2014
Processed files for year 2015
Processed files for year 2016
Processed files for year 2017


In [34]:
# Loop through each year for testing dataset (2018 - 2024)
for year in range(2018, 2025):
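    # Build the input paths for this year's play-by-play and injury CSVs
    play_by_play_file = os.path.join(os.getcwd(), "data", f"play_by_play_{year}.csv")
    injuries_file = os.path.join(os.getcwd(), "data", f"injuries_{year}.csv")

    # format_data.py must be executable; it receives both inputs, the split name, and the year
    command = ["./format_data.py", play_by_play_file, injuries_file, "testing", str(year)]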
    try:
        subprocess.run(command, check=True)
        print(f"Processed files for year {year}")
    except subprocess.CalledProcessError as e:
        print(f"Error processing files for year {year}: {e}")

Processed files for year 2018
Processed files for year 2019
Processed files for year 2020
Processed files for year 2021
Processed files for year 2022
Processed files for year 2023
Processed files for year 2024
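
The two loops above differ only in the year range and the split label, so they could be driven from a single table. Here is a minimal sketch, assuming `format_data.py` takes the same four arguments as in the cells above. The names `SPLITS`, `build_command`, and `run_all` are illustrative only.

```python
import os
import subprocess

# Split name -> years, matching the ranges used in the two loops above.
SPLITS = {
    "training": range(2009, 2018),  # 2009 - 2017
    "testing": range(2018, 2025),   # 2018 - 2024
}


def build_command(data_dir, split, year):
    """Build the argument list for one format_data.py invocation."""
    return [
        "./format_data.py",
        os.path.join(data_dir, f"play_by_play_{year}.csv"),
        os.path.join(data_dir, f"injuries_{year}.csv"),
        split,
        str(year),
    ]


def run_all(data_dir):
    """Run the script for every split and year; return the years that failed."""
    failed = []
    for split, years in SPLITS.items():
        for year in years:
            try:
                subprocess.run(build_command(data_dir, split, year), check=True)
                print(f"Processed files for year {year}")
            except (subprocess.CalledProcessError, OSError) as e:
                # OSError also covers a missing or non-executable script
                print(f"Error processing files for year {year}: {e}")
                failed.append(year)
    return failed
```

Calling `run_all(os.path.join(os.getcwd(), "data"))` would then process all years in one pass and return any years that need a rerun.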
