# Data Preprocessing

This notebook demonstrates the steps for cleaning and preprocessing the raw transaction data.

We will:
- Handle missing values
- Encode categorical variables
- Scale numerical features

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Load raw data
data = pd.read_csv('../data/raw_data.csv')

# Display the first few rows
data.head()

## Handle Missing Values

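Let's fill missing values with forward fill, which copies the last valid observation in each column forward. Values missing at the very start of a column have nothing to copy from and stay missing, so the check below can still report non-zero counts.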

In [2]:
# Forward-fill missing values
# (fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the replacement)
data = data.ffill()

# Count the missing values that remain in each column
data.isnull().sum()
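
To make the forward-fill behaviour concrete, here is a minimal sketch on a made-up frame. The values are hypothetical and only the column names echo the ones used later in this notebook. It shows that a gap at the top of a column survives the fill.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from raw_data.csv
toy = pd.DataFrame({
    "amount": [np.nan, 10.0, np.nan, 30.0],
    "balance": [100.0, np.nan, np.nan, 400.0],
})

# Copy the last valid value in each column forward
filled = toy.ffill()

# 'amount' starts with NaN, so there is no earlier value to copy into row 0
remaining = filled.isnull().sum()
print(filled)
print(remaining)
```

If leading gaps matter for the real data, one option is to follow up with a backward fill or to drop those rows. Which is appropriate depends on the dataset.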

## Encode Categorical Variables

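Convert the `category` column to integer codes using label encoding. The codes follow the sorted order of the labels and carry no real ranking, which matters for models that treat the feature as numeric.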

In [3]:
# Encode categorical variables
le = LabelEncoder()
data['category'] = le.fit_transform(data['category'])

# Display the first few rows after encoding
data.head()
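
As a small illustration of what label encoding produces, the sketch below encodes a made-up list of category names (hypothetical, not the values in the raw data) and recovers the label-to-code mapping from the fitted encoder.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical category values for illustration only
categories = ["grocery", "travel", "grocery", "utilities"]

le = LabelEncoder()
codes = le.fit_transform(categories)

# classes_ holds the labels in sorted order; the position is the code
mapping = {str(label): code for code, label in enumerate(le.classes_)}

# inverse_transform turns the codes back into the original labels
decoded = le.inverse_transform(codes)

print(codes)
print(mapping)
```

Keeping the fitted encoder around makes it possible to decode the column later or to apply the same mapping to new data.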

## Scale Numerical Features

Standardize the `amount` and `balance` columns so that each has zero mean and unit variance.

In [4]:
# Standardize numerical features (zero mean, unit variance)
scaler = StandardScaler()
data[['amount', 'balance']] = scaler.fit_transform(data[['amount', 'balance']])

# Display the first few rows after scaling
data.head()
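
The following sketch, on made-up numbers, checks what standardization does: each column ends up with mean 0 and standard deviation 1, and the result matches the formula (x - mean) / std computed by hand. Note that `StandardScaler` uses the population standard deviation (`ddof=0`).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values for illustration only
toy = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0],
    "balance": [100.0, 300.0, 500.0],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(toy[["amount", "balance"]])

# Same computation by hand, using the population standard deviation
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
print(np.allclose(scaled, manual.to_numpy()))
```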

## Save Processed Data

Finally, let's save the processed data to a CSV file for further analysis and model training.

In [5]:
# Save processed data
data.to_csv('../data/processed_data.csv', index=False)
print("Processed data saved to '../data/processed_data.csv'")
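
As a quick sanity check on the save step, the sketch below writes a made-up frame to a temporary directory (a hypothetical path, not `../data/`) and reads it back. With `index=False` the reloaded file has the same columns as the original, with no extra index column.

```python
import os
import tempfile

import pandas as pd

# Hypothetical processed data for illustration only
toy = pd.DataFrame({"category": [0, 1], "amount": [-1.0, 1.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "processed_data.csv")
    toy.to_csv(path, index=False)   # do not write the row index
    reloaded = pd.read_csv(path)

same_columns = list(reloaded.columns) == list(toy.columns)
print(same_columns)
print(reloaded.shape)
```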