# Model Training

This notebook trains and evaluates the fraud detection model on the processed data.

We will:
- Load the processed data
- Split the data into training and test sets
- Train a Random Forest classifier
- Evaluate the model's performance
- Save the trained model

In [1]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load processed data
data = pd.read_csv('../data/processed_data.csv')

# Display the first few rows
data.head()
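
Before splitting, it can help to confirm that the target column is present and to see how imbalanced it is. The sketch below uses a small synthetic frame in place of `processed_data.csv`; the helper name `summarize_target` and the `amount` column are illustrative assumptions, not part of the project.

```python
import pandas as pd


def summarize_target(df: pd.DataFrame, target: str = "fraudulent") -> dict:
    """Return basic sanity statistics for a labelled dataset (hypothetical helper)."""
    if target not in df.columns:
        raise KeyError(f"Expected a '{target}' column, found: {list(df.columns)}")
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "positive_rate": float(df[target].mean()),
    }


# Synthetic stand-in for the processed data (assumed columns).
demo = pd.DataFrame({
    "amount": [10.0, 250.0, 32.5, 999.0, 12.0],
    "fraudulent": [0, 0, 0, 1, 0],
})
summary = summarize_target(demo)
print(summary)
```

On the real data, a very low `positive_rate` would suggest that accuracy alone will not be an informative metic for this problem.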

## Split the Data

We'll hold out 20% of the data as a test set so the model is evaluated on transactions it has not seen during training.

In [2]:
# Split data into features (X) and target (y)
X = data.drop('fraudulent', axis=1)
y = data['fraudulent']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Verify the split
print("Training set size:", X_train.shape[0])
print("Test set size:", X_test.shape[0])
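
Fraud datasets are usually imbalanced, and a purely random split can leave the test set with a different fraud rate than the training set. One option is the `stratify` argument of `train_test_split`, sketched here on synthetic labels with an assumed 5% fraud rate (the `X_demo` and `y_demo` arrays are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 50 positives out of 1000 rows (illustrative only).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 4))
y_demo = np.zeros(1000, dtype=int)
y_demo[:50] = 1

# stratify=y_demo keeps the class ratio the same in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Train fraud rate:", y_tr.mean())
print("Test fraud rate:", y_te.mean())
```

Whether stratification is appropriate for this project depends on how the processed data was built, so treat it as a suggestion to try, not a required change.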

## Train the Model

We'll train a Random Forest classifier with 100 trees on the training data, then compute test-set accuracy as a first sanity check.

In [3]:
# Initialize and train the Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy:.2f}')
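
A high accuracy figure needs context when one class dominates. As a minimal illustration on made-up labels (95 legitimate transactions and 5 fraudulent ones, not taken from the project data), a "model" that never flags fraud still reaches 95% accuracy while catching no fraud at all:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Illustrative labels: 95 legitimate transactions, 5 fraudulent.
y_true = np.array([0] * 95 + [1] * 5)

# A trivial baseline that predicts "not fraudulent" for every row.
y_always_legit = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_always_legit)
baseline_recall = recall_score(y_true, y_always_legit, zero_division=0)

print(f"Baseline accuracy: {baseline_accuracy:.2f}")    # Baseline accuracy: 0.95
print(f"Baseline fraud recall: {baseline_recall:.2f}")  # Baseline fraud recall: 0.00
```

Comparing the trained model against a baseline like this shows how much of its accuracy comes from actually detecting fraud.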

## Evaluate the Model

Accuracy alone says little about how well fraud is caught, so let's look at per-class precision and recall in a classification report and at the breakdown of errors in a confusion matrix.

In [4]:
# Classification report
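# Concatenate instead of passing two arguments so print() does not
# insert a space that misaligns the report's header row
print("Classification Report:\n" + classification_report(y_test, y_pred))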

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
class_labels = ['Non-Fraudulent', 'Fraudulent']
sns.heatmap(
    conf_matrix, annot=True, fmt='d', cmap='Blues',
    xticklabels=class_labels, yticklabels=class_labels,
)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
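
The report and confusion matrix above depend on the default 0.5 decision threshold. Threshold-independent scores can complement them. The sketch below computes ROC AUC and average precision from `predict_proba` on a synthetic imbalanced dataset generated with `make_classification`; the data, the 5% positive rate, and the variable names are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data: roughly 5% positives.
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)

# Probability of the positive (fraud) class for each test row.
fraud_scores = clf.predict_proba(X_te)[:, 1]

roc_auc = roc_auc_score(y_te, fraud_scores)
pr_auc = average_precision_score(y_te, fraud_scores)
print(f"ROC AUC: {roc_auc:.3f}")
print(f"Average precision: {pr_auc:.3f}")
```

In this notebook the same two calls could be applied to `model.predict_proba(X_test)[:, 1]` and `y_test`.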

## Save the Model

Finally, we'll serialize the trained model with joblib so it can be reloaded later without retraining.

In [5]:
import joblib

# Save the trained model
model_path = '../src/fraud_detection_model.pkl'
joblib.dump(model, model_path)
print(f"Model saved to '{model_path}'")
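
To check that a saved model can be restored, it can be loaded back with `joblib.load` and compared against the original. This sketch runs the round trip on a small throwaway model in a temporary directory, so it does not touch the project's files; the file name `demo_model.pkl` and the synthetic data are placeholders.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model used only to demonstrate the save/load round trip.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)
clf = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "demo_model.pkl"
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
predictions_match = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print("Predictions match after reload:", predictions_match)
```

Pickled models are best loaded with the same scikit-learn version that produced them, and only from sources you trust, since unpickling can execute arbitrary code.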