# Neural Network for Credit Classification

This notebook demonstrates a simple feedforward neural network for credit default prediction using the UCI German Credit dataset.

## Model Overview

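**Neural networks** are composed of layers of interconnected nodes (neurons). Their weights are fitted by gradient descent, with the gradients computed by backpropagation, so that successive layers learn increasingly abstract representations of the data.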

### Pros
- Can learn complex, non-linear patterns
- Flexible architecture (depth, width, activations)
- Universal function approximators (in theory, given enough hidden units)
- Automatically learns feature representations
- Performance tends to keep improving as more training data becomes available

### Cons
- Often overkill for small tabular datasets, where tree-based models frequently perform as well or better
- Requires more hyperparameter tuning
- Less interpretable than simpler models
- Prone to overfitting without regularisation
- Computationally expensive to train

### When to Use
- When you have large amounts of data
- For complex pattern recognition tasks
- When other methods have been exhausted
- To demonstrate deep learning capability in a portfolio

## Setup

In [None]:
import sys
sys.path.insert(0, '../src')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch

from creditclass.preprocessing import prepare_data
from creditclass.training import get_model, train_model, save_model, NeuralNetworkClassifier
from creditclass.evaluation import (
    evaluate_model,
    compute_shap_values,
)
from creditclass.plots import (
    set_plot_style,
    plot_confusion_matrix,
    plot_roc_curve,
    plot_precision_recall,
    plot_calibration,
    plot_shap_summary,
)

set_plot_style()
RANDOM_STATE = 42

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## Load Data

In [None]:
data = prepare_data(
    target_type='default',
    encoding_method='onehot',
    test_size=0.2,
    random_state=RANDOM_STATE,
    scale=True,  # Neural networks benefit from scaling
)

X_train = data['X_train_scaled']
X_test = data['X_test_scaled']
y_train = data['y_train'].values
y_test = data['y_test'].values
feature_names = data['feature_names']

print(f"Training set: {X_train.shape[0]} samples, {X_train.shape[1]} features")
print(f"Test set: {X_test.shape[0]} samples")
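
As a rough illustration of why `scale=True` matters, the sketch below standardises two synthetic features that live on very different scales. How `prepare_data` scales internally is not shown in this notebook, so the use of scikit-learn's `StandardScaler`, the synthetic columns, and all variable names here are assumptions made for illustration only. The point is that the scaler is fitted on the training rows alone and then reused on the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-ins for two numeric features on very different scales
# (hypothetical data, not the German Credit columns).
X_demo = np.column_stack([
    rng.normal(3000, 2500, size=200),  # something like a credit amount
    rng.normal(35, 10, size=200),      # something like an age in years
])
X_demo_train, X_demo_test = X_demo[:160], X_demo[160:]

# Fit on the training rows only, then reuse the same mean and std on the test rows.
scaler = StandardScaler().fit(X_demo_train)
X_demo_train_scaled = scaler.transform(X_demo_train)
X_demo_test_scaled = scaler.transform(X_demo_test)

print("Train means:", X_demo_train_scaled.mean(axis=0).round(3))
print("Train stds: ", X_demo_train_scaled.std(axis=0).round(3))
```

After scaling, both training columns have mean 0 and standard deviation 1, so neither dominates the gradient updates simply because of its units.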

## Training

In [None]:
model = get_model('neural_network', params={
    'hidden_sizes': [64, 32],
    'dropout': 0.2,
    'learning_rate': 0.001,
    'epochs': 100,
    'batch_size': 32,
    'random_state': RANDOM_STATE,
})

model = train_model(model, X_train, y_train)

print("Model trained successfully!")
print(f"Architecture: Input({X_train.shape[1]}) -> 64 -> 32 -> Output(2)")
print(f"Dropout: {model.dropout}")
print(f"Epochs: {model.epochs}")
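
For readers who want to see what an `Input -> 64 -> 32 -> Output(2)` network might look like in plain PyTorch, here is a minimal sketch. It is not the implementation behind `NeuralNetworkClassifier`, which is not shown in this notebook; the `build_mlp` helper, the ReLU activations, and the placeholder input width of 20 are all assumptions.

```python
import torch


def build_mlp(n_inputs, hidden_sizes, n_outputs=2, dropout=0.2):
    """Stack Linear -> ReLU -> Dropout blocks, then a final Linear layer."""
    layers = []
    prev = n_inputs
    for size in hidden_sizes:
        layers += [
            torch.nn.Linear(prev, size),
            torch.nn.ReLU(),
            torch.nn.Dropout(dropout),
        ]
        prev = size
    # Final layer returns raw logits, one per class.
    layers.append(torch.nn.Linear(prev, n_outputs))
    return torch.nn.Sequential(*layers)


torch.manual_seed(42)
net = build_mlp(n_inputs=20, hidden_sizes=[64, 32])  # 20 inputs is a placeholder
logits = net(torch.randn(8, 20))                     # batch of 8 random rows
n_params = sum(p.numel() for p in net.parameters())

print(logits.shape)
print(f"Trainable parameters: {n_params}")
```

In the notebook itself, `X_train.shape[1]` would take the place of the placeholder input width.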

## Evaluation

In [None]:
metrics = evaluate_model(model, X_test, y_test)

print("Performance Metrics:")
print("-" * 30)
for name, value in metrics.items():
    if value is not None:
        print(f"{name.capitalize():12} {value:.4f}")
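
The exact contents of the dictionary returned by `evaluate_model` are not shown here. Assuming it includes the `accuracy`, `f1`, and `auc` keys used later in this notebook, the following sketch shows how such values could be computed with scikit-learn on a small hand-made example. The labels, probabilities, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy labels (1 = bad credit) and predicted probabilities of the positive class.
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
p_toy = np.array([0.1, 0.4, 0.2, 0.6, 0.8, 0.3, 0.9, 0.2, 0.7, 0.1])

# Accuracy and F1 need hard labels, so threshold the probabilities at 0.5.
y_toy_pred = (p_toy >= 0.5).astype(int)

toy_metrics = {
    "accuracy": accuracy_score(y_toy, y_toy_pred),
    "f1": f1_score(y_toy, y_toy_pred),
    # AUC is threshold-free, so it takes the probabilities directly.
    "auc": roc_auc_score(y_toy, p_toy),
}

for name, value in toy_metrics.items():
    print(f"{name.capitalize():12} {value:.4f}")
```

Note that accuracy and F1 depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds.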

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

plot_confusion_matrix(
    model, X_test, y_test,
    class_names=['Good Credit', 'Bad Credit'],
    ax=axes[0],
    title='Neural Network - Confusion Matrix'
)

plot_roc_curve(model, X_test, y_test, ax=axes[1], label='Neural Network')

plt.tight_layout()
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(7, 6))
plot_precision_recall(model, X_test, y_test, ax=ax, label='Neural Network')
plt.tight_layout()
plt.show()

## Architecture Comparison

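The cell below trains four architectures that differ in depth and width and evaluates each on the test set. Only `hidden_sizes` varies; the number of epochs and the random seed are fixed, and all other hyperparameters fall back to the `NeuralNetworkClassifier` defaults.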

In [None]:
architectures = [
    {'name': 'Shallow (32)', 'hidden_sizes': [32]},
    {'name': 'Medium (64-32)', 'hidden_sizes': [64, 32]},
    {'name': 'Deep (128-64-32)', 'hidden_sizes': [128, 64, 32]},
    {'name': 'Wide (128)', 'hidden_sizes': [128]},
]

arch_results = []

for arch in architectures:
    clf = NeuralNetworkClassifier(
        hidden_sizes=arch['hidden_sizes'],
        epochs=100,
        random_state=RANDOM_STATE,
    )
    clf.fit(X_train, y_train)
    # Use a separate name so the main model's `metrics` dict is not overwritten
    arch_metrics = evaluate_model(clf, X_test, y_test)
    arch_results.append({'Architecture': arch['name'], **arch_metrics})

arch_df = pd.DataFrame(arch_results)
print(arch_df.to_string(index=False))
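
To put the four architectures into perspective, it can help to count their trainable parameters. The sketch below does this for fully connected layers with biases. The `count_parameters` helper, the two-unit output layer, and the placeholder input width of 20 are assumptions; the real input width is `X_train.shape[1]`.

```python
def count_parameters(n_inputs, hidden_sizes, n_outputs=2):
    """Weights plus biases for a stack of fully connected layers."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return sum(
        fan_in * fan_out + fan_out  # weight matrix plus bias vector
        for fan_in, fan_out in zip(sizes, sizes[1:])
    )


n_inputs_demo = 20  # placeholder input width
demo_architectures = {
    "Shallow (32)": [32],
    "Medium (64-32)": [64, 32],
    "Deep (128-64-32)": [128, 64, 32],
    "Wide (128)": [128],
}

param_counts = {
    name: count_parameters(n_inputs_demo, hidden)
    for name, hidden in demo_architectures.items()
}

for name, count in param_counts.items():
    print(f"{name:18} {count:>6} parameters")
```

With only a few hundred training rows, the larger networks have many more parameters than samples, which is one reason regularisation matters on this dataset.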

## Interpretability

Neural networks are often called "black boxes", but SHAP can help explain their predictions.

In [None]:
# SHAP values (using KernelExplainer - may be slow)
print("Computing SHAP values (this may take a moment)...")
shap_data = compute_shap_values(model, X_test, feature_names=feature_names, max_samples=50)

fig, ax = plt.subplots(figsize=(10, 8))
plot_shap_summary(shap_data, plot_type='bar', max_display=15)
plt.title('Neural Network - SHAP Feature Importance')
plt.tight_layout()
plt.show()

## Effect of Dropout

In [None]:
dropout_values = [0.0, 0.1, 0.2, 0.3, 0.5]
dropout_results = []

for dropout in dropout_values:
    clf = NeuralNetworkClassifier(
        hidden_sizes=[64, 32],
        dropout=dropout,
        epochs=100,
        random_state=RANDOM_STATE,
    )
    clf.fit(X_train, y_train)
    # Use a separate name so the main model's `metrics` dict is not overwritten
    dropout_metrics = evaluate_model(clf, X_test, y_test)
    dropout_results.append({'Dropout': dropout, **dropout_metrics})

dropout_df = pd.DataFrame(dropout_results)
print(dropout_df.to_string(index=False))

In [None]:
fig, ax = plt.subplots(figsize=(8, 6))

ax.plot(dropout_df['Dropout'], dropout_df['accuracy'], 'o-', label='Accuracy')
ax.plot(dropout_df['Dropout'], dropout_df['f1'], 's-', label='F1')
ax.plot(dropout_df['Dropout'], dropout_df['auc'], '^-', label='AUC')

ax.set_xlabel('Dropout Rate')
ax.set_ylabel('Score')
ax.set_title('Neural Network Performance vs. Dropout')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
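
To make the dropout rate more concrete, here is a small NumPy sketch of the "inverted dropout" mechanism that frameworks such as PyTorch use. During training each unit is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; at evaluation time the layer is a no-op. The `dropout_forward` helper is a hypothetical illustration, not part of `creditclass`.

```python
import numpy as np


def dropout_forward(x, p, training, rng):
    """Inverted dropout: zero each unit with probability p, rescale the rest.

    Assumes 0 <= p < 1.
    """
    if not training or p == 0.0:
        return x  # evaluation mode: activations pass through unchanged
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)


rng = np.random.default_rng(42)
activations = np.ones((10_000, 32))

train_out = dropout_forward(activations, p=0.2, training=True, rng=rng)
eval_out = dropout_forward(activations, p=0.2, training=False, rng=rng)

print(f"Fraction zeroed in training: {(train_out == 0).mean():.3f}")
print(f"Mean activation in training: {train_out.mean():.3f}")
print(f"Unchanged in evaluation:     {np.array_equal(eval_out, activations)}")
```

The rescaling keeps the expected activation the same in both modes, which is why no adjustment is needed at prediction time.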

## Calibration

In [None]:
fig, ax = plt.subplots(figsize=(7, 6))
plot_calibration(model, X_test, y_test, ax=ax, label='Neural Network')
plt.tight_layout()
plt.show()
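
A calibration plot compares predicted probabilities with observed event frequencies. The sketch below shows one way the underlying numbers could be computed with scikit-learn, using simulated predictions that are well calibrated by construction. Whether `plot_calibration` works this way internally is an assumption, and the simulated data and variable names are illustrative.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(42)

# Simulated predictions: each label is drawn with exactly the predicted
# probability, so the "model" is well calibrated by construction.
p_sim = rng.uniform(0, 1, size=5000)
y_sim = (rng.uniform(0, 1, size=5000) < p_sim).astype(int)

# Observed fraction of positives versus mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_sim, p_sim, n_bins=10)
brier = brier_score_loss(y_sim, p_sim)

for observed, predicted in zip(frac_pos, mean_pred):
    print(f"predicted {predicted:.2f} -> observed {observed:.2f}")
print(f"Brier score: {brier:.3f}")
```

For a well-calibrated model the two columns track each other closely; systematic gaps between them indicate over- or under-confident probabilities.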

## Save Model

In [None]:
save_path = save_model(model, 'neural_network')
print(f"Model saved to: {save_path}")

## Summary

### Key Takeaways

1. **Performance**: On this small tabular dataset, neural networks may not outperform tree-based methods
2. **Architecture**: Deeper isn't always better - match complexity to data size
3. **Regularisation**: Dropout helps prevent overfitting, which is especially important for small datasets
4. **Interpretability**: SHAP provides valuable insights into feature importance

### Recommendations

- For tabular data with < 10,000 samples, consider tree-based models first
- Use dropout to reduce overfitting; batch normalisation mainly stabilises training, though it can also have a mild regularising effect
- Start with a simple architecture and increase complexity gradually
- Monitor training/validation loss to detect overfitting early
- Neural networks shine with larger datasets and complex patterns
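
One common way to act on the "monitor training/validation loss" recommendation is early stopping: halt training once the validation loss has not improved for a set number of epochs. The sketch below is a minimal, framework-independent version. The `EarlyStopping` class and the loss values are hypothetical, and whether `NeuralNetworkClassifier` exposes a hook for this is not shown in this notebook.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improving at first, then drifting upwards.
val_losses = [0.70, 0.62, 0.58, 0.57, 0.59, 0.60, 0.61, 0.63]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"Stopped at epoch {stopped_at}; best epoch was {stopper.best_epoch} "
      f"with loss {stopper.best_loss:.2f}")
```

In a real training loop, the weights from the best epoch would typically be checkpointed and restored after stopping.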