# Data Processing with Python

In this notebook, we will load, clean, and save NBA historical data using `pandas` for data manipulation and `logging` for tracking our progress.


In [14]:
# Import necessary libraries
import pandas as pd  # For data manipulation and analysis
import logging  # For logging information, warnings, and errors
import os  # For finding and setting the correct file paths

# Configure logging
logging.basicConfig(level=logging.INFO)
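
As a small illustration of what the `INFO` level and `%s` placeholders do, the sketch below logs to an in-memory stream so the formatted text can be inspected. The logger name `demo` and the in-memory handler are assumptions made for this example and are not part of the notebook; the format string mirrors the default layout that `logging.basicConfig` uses.

```python
import io
import logging

# Send records to an in-memory stream so the formatted output can be inspected
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

demo_logger = logging.getLogger('demo')  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo output out of the root logger

# Values passed after the message are substituted into the %s placeholders
demo_logger.info('Loading historical data from %s', 'stats.csv')
# DEBUG is below the INFO threshold, so this record is discarded
demo_logger.debug('This message is filtered out')

captured = stream.getvalue()
print(captured)
```

Passing the value as a separate argument, rather than formatting the string beforehand, lets `logging` skip the formatting work for records that are filtered out.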


## Load Data

First, we will load the historical NBA data from a CSV file.


In [15]:
def load_data(historical_path):
    """
    Load the historical data from a CSV file.
    
    Args:
        historical_path (str): The file path to the historical data CSV.
    
    Returns:
        DataFrame: The loaded data as a pandas DataFrame.
    """
    logging.info('Loading historical data from %s', historical_path)
    return pd.read_csv(historical_path)


In [25]:
# Build the CSV path relative to the parent of the current working directory
BASE_DIR = os.path.dirname(os.getcwd())
historical_path = os.path.join(BASE_DIR, 'scraper', 'historical_nba_stats.csv')

In [26]:
df = load_data(historical_path)
df.head()

INFO:root:Loading historical data from c:\Users\bartl\Desktop\General-Projects\NBA_Pred\scraper\historical_nba_stats.csv


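| Unnamed: 0 | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Curly Armstrong | G-F | 31 | FTW | 63 | | | 144 | 516 | 0.279 | ... | | | | 176 | | | | 217 | 458 | 1950 |
| 1 | Cliff Barker | SG | 29 | INO | 49 | | | 102 | 274 | 0.372 | ... | | | | 109 | | | | 99 | 279 | 1950 |
| 2 | Leo Barnhorst | SF | 25 | CHS | 67 | | | 174 | 499 | 0.349 | ... | | | | 140 | | | | 192 | 438 | 1950 |
| 3 | Ed Bartels | F | 24 | TOT | 15 | | | 22 | 86 | 0.256 | ... | | | | 20 | | | | 29 | 63 | 1950 |
| 4 | Ed Bartels | F | 24 | DNN | 13 | | | 21 | 82 | 0.256 | ... | | | | 20 | | | | 27 | 59 | 1950 |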


## Clean Data

Next, we will clean the data by dropping rows with missing values and converting the stat columns to numeric types. We will also add a new column, `PER`, as an example feature: the sum of points, assists, and rebounds divided by games played.
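
To make the numeric conversion concrete, here is a minimal sketch on a tiny made-up DataFrame (the values are invented for illustration and are not taken from the dataset). With `errors='coerce'`, `pd.to_numeric` turns entries it cannot parse into `NaN` instead of raising an error.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({'PTS': ['458', '279', 'n/a'], 'G': ['63', '49', '67']})

toy['PTS'] = pd.to_numeric(toy['PTS'], errors='coerce')
toy['G'] = pd.to_numeric(toy['G'], errors='coerce')

# 'n/a' cannot be parsed, so it becomes NaN and the PTS column is float64
n_unparsed = int(toy['PTS'].isna().sum())
print(toy.dtypes)
print('Unparsed PTS entries:', n_unparsed)
```

Note that if missing values are dropped before the conversion, any `NaN` introduced by coercion remains in the result; converting first and dropping afterwards would remove those rows as well.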


In [None]:
def clean_data(df):
    """
    Clean the data by dropping missing values and converting columns to numeric types.
    
    Args:
        df (DataFrame): The data to clean.
    
    Returns:
        DataFrame: The cleaned data.
    """
    logging.info('Cleaning data')
    # Work on an explicit copy so the assignments below modify an independent DataFrame
    df = df.dropna().copy()
    # The dataset stores total rebounds as 'TRB'; the preview shows no 'REB' column
    numeric_cols = ['PTS', 'AST', 'TRB', 'BLK', 'STL', 'G']
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors='coerce')
    # Example feature: points + assists + rebounds per game
    df['PER'] = (df['PTS'] + df['AST'] + df['TRB']) / df['G']
    return df

# Example usage
cleaned_df = clean_data(df)
cleaned_df.head()
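
One thing to watch for: `dropna()` with no arguments removes every row that has a missing value in any column. In the preview of the loaded data, the 1950 rows have empty `GS`, `MP`, and other columns, so they would all be dropped. The sketch below, on invented toy data, contrasts this with `dropna(subset=...)`, which only checks the listed columns. Whether that is the right choice for this dataset is a judgment call, and the column list here is only an assumption.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: GS is missing for the first two rows
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'G': [63, 49, 67],
    'GS': [np.nan, np.nan, 60],
    'PTS': [458, 279, 438],
})

dropped_any = toy.dropna()                        # drops rows with any NaN
dropped_subset = toy.dropna(subset=['G', 'PTS'])  # only requires G and PTS

print(len(dropped_any), len(dropped_subset))
```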


## Save Data

Finally, we will save the cleaned data to a new CSV file.


In [None]:
def save_data(df, filename='cleaned_nba_stats.csv'):
    """
    Save the cleaned data to a CSV file.
    
    Args:
        df (DataFrame): The data to save.
        filename (str): The name of the file to save the data to. Defaults to 'cleaned_nba_stats.csv'.
    
    Returns:
        None
    """
    logging.info('Saving cleaned data to %s', filename)
    df.to_csv(filename, index=False)

# Example usage
save_data(cleaned_df)


## Complete Data Processing Workflow

With all three functions defined, we can now run the complete data processing workflow: load, clean, and save data.


In [None]:
# Load data
df = load_data(historical_path)

# Clean data
cleaned_df = clean_data(df)

# Save cleaned data
save_data(cleaned_df)
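
The `index=False` argument in `save_data` matters when the file is read back. The sketch below, using invented toy data and temporary files, writes the same DataFrame with and without the index and reloads both. A CSV written with its index comes back with an extra `Unnamed: 0` column, which may be how that column ended up in the historical data loaded earlier (an assumption; the notebook does not say how that file was produced).

```python
import os
import tempfile

import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({'Player': ['A', 'B'], 'PTS': [458, 279]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, 'with_index.csv')
    without_index = os.path.join(tmp, 'without_index.csv')

    toy.to_csv(with_index)                  # index written as an unnamed first column
    toy.to_csv(without_index, index=False)  # same setting as save_data

    cols_with_index = pd.read_csv(with_index).columns.tolist()
    reloaded = pd.read_csv(without_index)

print(cols_with_index)
print(reloaded.columns.tolist())
```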


## Running the DataProcessor

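Now, we will create an instance of `DataProcessor` and run the data processing workflow. `DataProcessor` is not defined in the cells above, so it must be defined or imported before this cell is run.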


In [7]:
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    processor = DataProcessor('scraper/historical_nba_stats.csv')
    processor.run()


INFO:root:Loading historical data from scraper/historical_nba_stats.csv


FileNotFoundError: [Errno 2] No such file or directory: 'scraper/historical_nba_stats.csv'
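
The `FileNotFoundError` is consistent with the relative path `'scraper/historical_nba_stats.csv'` being resolved against the current working directory, which (judging from the earlier `BASE_DIR` cell) appears to be a subfolder of the project rather than its root. A minimal sketch of one way to guard against this follows. The helper `resolve_data_path` is hypothetical and not part of the notebook; it walks up from a starting directory until the file is found, and the demo builds a throwaway folder layout to exercise it.

```python
import tempfile
from pathlib import Path


def resolve_data_path(relative_path, start=None):
    """Search `start` and its parent directories for `relative_path` (hypothetical helper)."""
    start = Path(start) if start is not None else Path.cwd()
    for base in [start, *start.parents]:
        candidate = base / relative_path
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"{relative_path} not found in {start} or any parent directory"
    )


# Demo on a temporary layout: project/scraper/<csv> and project/notebooks/
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    (project / "scraper").mkdir(parents=True)
    (project / "notebooks").mkdir()
    csv_file = project / "scraper" / "historical_nba_stats.csv"
    csv_file.write_text("Player,PTS\n")

    # Start the search from the notebooks folder, one level below the project root
    found = resolve_data_path(
        "scraper/historical_nba_stats.csv", start=project / "notebooks"
    )
    found_matches = found == csv_file

print(found_matches)
```

The resolved path could then be passed to `DataProcessor` in place of the relative string, assuming its constructor accepts a path the same way `load_data` does.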