# Demo 3: Deep Learning with TensorFlow/Keras

## Learning Objectives
- Build neural networks using TensorFlow/Keras
- Understand the Sequential API
- Train models and monitor progress
- Evaluate model performance
- Visualize training history
- Compare deep learning with traditional ML

## Setup

**Important:** This demo requires Python 3.13 or earlier. When creating your virtual environment with `uv`, use: `uv venv --python python3.13`

Pinning the Python version ensures TensorFlow can be installed; TensorFlow is not yet available for Python 3.14.

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import altair as alt
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

TensorFlow version: 2.20.0
Keras version: 3.12.0


## Part 1: Generate Classification Dataset

For deep learning, we'll create a more complex classification problem that benefits from neural networks' ability to learn non-linear decision boundaries.

In [2]:
# Generate a complex 20-dimensional classification dataset
n_samples = 10000
n_features = 20  # Higher dimensional for neural networks

np.random.seed(42)

# Create features with some structure
X = np.random.randn(n_samples, n_features)

# Create complex non-linear target
# Mix of linear and non-linear relationships
y_linear = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)
y_nonlinear = ((X[:, 3]**2 + X[:, 4]**2) < 2).astype(int)
y_interaction = ((X[:, 5] * X[:, 6]) > 0.5).astype(int)

# Combine by majority vote: positive when at least 2 of the 3 rules fire
y = ((y_linear + y_nonlinear + y_interaction) >= 2).astype(int)

# Flip 10% of the labels at random to simulate label noise
flip_indices = np.random.choice(n_samples, size=int(0.1 * n_samples), replace=False)
y[flip_indices] = 1 - y[flip_indices]

# Convert to DataFrame for easier handling
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(n_features)])
df['target'] = y

print("Dataset shape:", df.shape)
print(f"\nTarget distribution:")
print(df['target'].value_counts())
print(f"\nClass balance: {df['target'].mean():.2%} positive class")

Dataset shape: (10000, 21)

Target distribution:
target
0    5567
1    4433
Name: count, dtype: int64

Class balance: 44.33% positive class


## Part 2: Data Preprocessing

Neural networks are sensitive to the scale of their input features. Unlike tree-based models (Random Forest, XGBoost), which are largely unaffected by feature scale, neural networks are trained with gradient descent, which converges much better when all features are on a similar scale. Let's prepare our data.

**Why scaling matters:**
- Features with larger values can dominate the learning process
- Gradient descent converges faster with scaled features
- Activation functions work better when inputs are in a reasonable range
- Without scaling, some features might be ignored or cause training instability

In [3]:
# Split into features and target
X = df.drop('target', axis=1).values
y = df['target'].values

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features (important for neural networks!)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set: {X_train_scaled.shape}")
print(f"Test set: {X_test_scaled.shape}")
print(f"\nFeature statistics (after scaling):")
print(f"Mean: {X_train_scaled.mean(axis=0)[:5]}")  # Should be ~0
print(f"Std: {X_train_scaled.std(axis=0)[:5]}")    # Should be ~1

Training set: (8000, 20)
Test set: (2000, 20)

Feature statistics (after scaling):
Mean: [ 5.15143483e-17 -2.08305595e-17  6.89726054e-18 -2.61596300e-18
 -1.00960906e-17]
Std: [1. 1. 1. 1. 1.]


**StandardScaler** transforms each feature to have mean 0 and standard deviation 1. Notice that we fit the scaler on the training data only, then use it to transform both the training and test data. This prevents data leakage: test set statistics must not influence the scaling.
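To make the fit-on-train-only rule concrete, here is a minimal pure-Python sketch of standardization for a single feature. The toy values and the helper name `standardize` are made up for illustration; `StandardScaler` performs the equivalent computation per column, using the population standard deviation.

```python
import statistics

# Toy single-feature data (hypothetical values, for illustration only)
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# "fit": learn the mean and population std from the training data only
mu = statistics.fmean(train)
sigma = statistics.pstdev(train)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in values]

# "transform": the test data reuses the training statistics
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)

print(statistics.fmean(train_scaled))   # approximately 0
print(statistics.pstdev(train_scaled))  # approximately 1
print(test_scaled)                      # not forced to mean 0 / std 1
```

The scaled test values are generally not centered at exactly 0, and that is expected: they are measured against the training distribution.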

## Part 3: Build Your First Neural Network

Let's create a simple neural network using the Keras Sequential API.

In [4]:
# Build a simple neural network
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(n_features,), name='hidden1'),
    keras.layers.Dense(32, activation='relu', name='hidden2'),
    keras.layers.Dense(1, activation='sigmoid', name='output')  # Binary classification
])

# Display model architecture
print("=== Model Architecture ===")
model.summary()

# Visualize model (optional, requires graphviz)
# keras.utils.plot_model(model, show_shapes=True, show_layer_names=True)

=== Model Architecture ===


**Understanding the architecture:**
- **Input layer**: 20 features (automatically created)
- **Hidden layer 1**: 64 neurons with ReLU activation
- **Hidden layer 2**: 32 neurons with ReLU activation
- **Output layer**: 1 neuron with sigmoid activation (for binary classification)
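As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes standard fully connected layers with one bias per neuron; the helper name `dense_params` is illustrative and not part of Keras.

```python
# Layer widths from the model above: 20 inputs -> 64 -> 32 -> 1
layer_sizes = [20, 64, 32, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output neuron."""
    return n_in * n_out + n_out

# Pair each layer with the next one: (20, 64), (64, 32), (32, 1)
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [1344, 2080, 33]
print(total)   # 3457
```

Under those assumptions, these per-layer counts should match the `Param #` column that `model.summary()` reports.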

## Part 4: Compile the Model

Before training, we need to configure three key components:

1. **Optimizer**: How the model updates its weights during training (Adam is a popular choice)
2. **Loss function**: What the model tries to minimize (binary_crossentropy for binary classification)
3. **Metrics**: What we track during training (accuracy tells us how often predictions are correct)
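To see why the loss and the metric are tracked separately, the following pure-Python sketch computes both by hand. It is an illustration of the standard formulas under a 0.5 decision threshold, not Keras's internal code; the function names and toy probabilities are made up.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p > threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]    # correct and sure
hesitant = [0.6, 0.4, 0.55, 0.45]   # correct but barely

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_true, probs), round(binary_crossentropy(y_true, probs), 4))

# A model that always outputs 0.5 scores ln(2), roughly 0.693
print(binary_crossentropy([1, 0], [0.5, 0.5]))
```

Both sets of predictions reach the same accuracy, but the loss is lower for the confident one. The loss rewards well-calibrated probabilities, which gives the optimizer a smooth signal that accuracy alone cannot provide.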

In [5]:
# Compile the model
model.compile(
    optimizer='adam',  # Adaptive learning rate optimizer
    loss='binary_crossentropy',  # For binary classification
    metrics=['accuracy']  # Track accuracy during training
)

print("Model compiled successfully!")
print(f"Optimizer: {model.optimizer.get_config()['name']}")
print(f"Loss function: {model.loss}")
print(f"Metrics: {[m.name for m in model.metrics]}")

Model compiled successfully!
Optimizer: adam
Loss function: binary_crossentropy
Metrics: ['loss', 'compile_metrics']


**Understanding these choices:**
- **Adam optimizer**: Adapts the learning rate for each parameter, making training more efficient
- **Binary crossentropy**: Appropriate for binary classification (two classes)
- **Accuracy**: A simple metric, the percentage of correct predictions. For imbalanced classes, you might also track precision and recall.
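The phrase "adapts the learning rate for each parameter" can be illustrated with a single textbook Adam update for one scalar weight. This is a sketch of the published update rule, not Keras's implementation; the function name `adam_step` is hypothetical, and the default hyperparameter values are assumed to mirror the commonly documented Adam defaults.

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single weight w with gradient g at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g        # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Two weights whose gradients differ by four orders of magnitude
w_big, _, _ = adam_step(w=0.0, g=100.0, m=0.0, v=0.0, t=1)
w_small, _, _ = adam_step(w=0.0, g=0.01, m=0.0, v=0.0, t=1)

print(w_big, w_small)  # both move by roughly -0.001
```

Because each update is divided by the running magnitude of that weight's own gradient, both weights take a step of about the same size. Plain gradient descent would move the first weight 10,000 times further than the second.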

## Part 5: Train the Model

Now let's train the model and watch it learn!
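Keras reports training progress as batches ("steps") per epoch. The arithmetic below shows how that count follows from the settings used in this part: 8,000 training rows, `validation_split=0.2`, and `batch_size=32`. It is a back-of-the-envelope sketch rather than Keras code, and the variable names are illustrative.

```python
import math

n_train = 8000          # rows in X_train_scaled
validation_split = 0.2  # fraction held out by fit()
batch_size = 32

# Keras takes the validation rows from the end of the data, before shuffling
n_val = round(n_train * validation_split)
n_fit = n_train - n_val

# One step is one gradient update on one batch
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)  # 6400 1600 200
```

So each epoch performs 200 weight updates on 6,400 rows, and the `val_` metrics are computed on the remaining 1,600 rows.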

In [6]:
# Train the model
history = model.fit(
    X_train_scaled, y_train,
    epochs=50,  # Number of full passes over the training data
    batch_size=32,  # Number of samples per gradient update
    validation_split=0.2,  # Use 20% of training data for validation
    verbose=1  # Show progress
)

Epoch 1/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1:04[0m 324ms/step - accuracy: 0.5312 - loss: 0.6827

[1m140/200[0m [32m━━━━━━━━━━━━━━[0m[37m━━━━━━[0m [1m0s[0m 361us/step - accuracy: 0.5786 - loss: 0.6734  

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 742us/step - accuracy: 0.6356 - loss: 0.6415 - val_accuracy: 0.7038 - val_loss: 0.5979


Epoch 2/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 5ms/step - accuracy: 0.7188 - loss: 0.5005

[1m199/200[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 253us/step - accuracy: 0.7076 - loss: 0.5921

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 379us/step - accuracy: 0.7123 - loss: 0.5882 - val_accuracy: 0.7325 - val_loss: 0.5710


Epoch 3/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.7812 - loss: 0.4821

[1m192/200[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 263us/step - accuracy: 0.7425 - loss: 0.5551

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 381us/step - accuracy: 0.7452 - loss: 0.5484 - val_accuracy: 0.7550 - val_loss: 0.5401


Epoch 4/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.7500 - loss: 0.4595

[1m187/200[0m [32m━━━━━━━━━━━━━━━━━━[0m[37m━━[0m [1m0s[0m 270us/step - accuracy: 0.7665 - loss: 0.5151

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 415us/step - accuracy: 0.7670 - loss: 0.5122 - val_accuracy: 0.7663 - val_loss: 0.5250


Epoch 5/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m0s[0m 5ms/step - accuracy: 0.8125 - loss: 0.4389

[1m182/200[0m [32m━━━━━━━━━━━━━━━━━━[0m[37m━━[0m [1m0s[0m 278us/step - accuracy: 0.7763 - loss: 0.4922

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 400us/step - accuracy: 0.7748 - loss: 0.4929 - val_accuracy: 0.7656 - val_loss: 0.5199


Epoch 6/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 5ms/step - accuracy: 0.8125 - loss: 0.4238

[1m186/200[0m [32m━━━━━━━━━━━━━━━━━━[0m[37m━━[0m [1m0s[0m 271us/step - accuracy: 0.7862 - loss: 0.4802

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 388us/step - accuracy: 0.7847 - loss: 0.4817 - val_accuracy: 0.7694 - val_loss: 0.5177


Epoch 7/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m0s[0m 5ms/step - accuracy: 0.8438 - loss: 0.4117

[1m190/200[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 265us/step - accuracy: 0.7924 - loss: 0.4713

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 380us/step - accuracy: 0.7916 - loss: 0.4731 - val_accuracy: 0.7656 - val_loss: 0.5161


Epoch 8/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m0s[0m 5ms/step - accuracy: 0.8438 - loss: 0.4022

[1m190/200[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 265us/step - accuracy: 0.7997 - loss: 0.4638

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 388us/step - accuracy: 0.7973 - loss: 0.4656 - val_accuracy: 0.7663 - val_loss: 0.5159


Epoch 9/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m0s[0m 5ms/step - accuracy: 0.8438 - loss: 0.3946

[1m186/200[0m [32m━━━━━━━━━━━━━━━━━━[0m[37m━━[0m [1m0s[0m 271us/step - accuracy: 0.8036 - loss: 0.4566

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 391us/step - accuracy: 0.8016 - loss: 0.4588 - val_accuracy: 0.7650 - val_loss: 0.5162


Epoch 10/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m0s[0m 5ms/step - accuracy: 0.8438 - loss: 0.3856

[1m190/200[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m0s[0m 266us/step - accuracy: 0.8069 - loss: 0.4501

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 407us/step - accuracy: 0.8037 - loss: 0.4523 - val_accuracy: 0.7706 - val_loss: 0.5150


Epoch 11/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8438 - loss: 0.3778

[1m165/200[0m [32m━━━━━━━━━━━━━━━━[0m[37m━━━━[0m [1m0s[0m 306us/step - accuracy: 0.8110 - loss: 0.4432

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 438us/step - accuracy: 0.8083 - loss: 0.4461 - val_accuracy: 0.7713 - val_loss: 0.5144


Epoch 12/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8438 - loss: 0.3734

[1m177/200[0m [32m━━━━━━━━━━━━━━━━━[0m[37m━━━[0m [1m0s[0m 285us/step - accuracy: 0.8144 - loss: 0.4369

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 447us/step - accuracy: 0.8114 - loss: 0.4400 - val_accuracy: 0.7731 - val_loss: 0.5139


Epoch 13/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8438 - loss: 0.3677

[1m148/200[0m [32m━━━━━━━━━━━━━━[0m[37m━━━━━━[0m [1m0s[0m 341us/step - accuracy: 0.8204 - loss: 0.4304

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 549us/step - accuracy: 0.8153 - loss: 0.4339 - val_accuracy: 0.7763 - val_loss: 0.5146


Epoch 14/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 10ms/step - accuracy: 0.8438 - loss: 0.3698

[1m 86/200[0m [32m━━━━━━━━[0m[37m━━━━━━━━━━━━[0m [1m0s[0m 589us/step - accuracy: 0.8250 - loss: 0.4195

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 649us/step - accuracy: 0.8183 - loss: 0.4282 - val_accuracy: 0.7738 - val_loss: 0.5147


Epoch 15/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8125 - loss: 0.3682

[1m140/200[0m [32m━━━━━━━━━━━━━━[0m[37m━━━━━━[0m [1m0s[0m 360us/step - accuracy: 0.8243 - loss: 0.4193

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 530us/step - accuracy: 0.8217 - loss: 0.4227 - val_accuracy: 0.7769 - val_loss: 0.5147


Epoch 16/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8125 - loss: 0.3635

[1m138/200[0m [32m━━━━━━━━━━━━━[0m[37m━━━━━━━[0m [1m0s[0m 365us/step - accuracy: 0.8237 - loss: 0.4135

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 513us/step - accuracy: 0.8247 - loss: 0.4170 - val_accuracy: 0.7775 - val_loss: 0.5149


Epoch 17/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8125 - loss: 0.3626

[1m136/200[0m [32m━━━━━━━━━━━━━[0m[37m━━━━━━━[0m [1m0s[0m 371us/step - accuracy: 0.8271 - loss: 0.4086

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 515us/step - accuracy: 0.8292 - loss: 0.4118 - val_accuracy: 0.7794 - val_loss: 0.5150


Epoch 18/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8125 - loss: 0.3624

[1m154/200[0m [32m━━━━━━━━━━━━━━━[0m[37m━━━━━[0m [1m0s[0m 327us/step - accuracy: 0.8317 - loss: 0.4041

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 484us/step - accuracy: 0.8330 - loss: 0.4069 - val_accuracy: 0.7825 - val_loss: 0.5159


Epoch 19/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8125 - loss: 0.3598

[1m136/200[0m [32m━━━━━━━━━━━━━[0m[37m━━━━━━━[0m [1m0s[0m 373us/step - accuracy: 0.8336 - loss: 0.3991

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 526us/step - accuracy: 0.8353 - loss: 0.4021 - val_accuracy: 0.7862 - val_loss: 0.5165


Epoch 20/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8125 - loss: 0.3566

[1m140/200[0m [32m━━━━━━━━━━━━━━[0m[37m━━━━━━[0m [1m0s[0m 360us/step - accuracy: 0.8372 - loss: 0.3945

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 504us/step - accuracy: 0.8373 - loss: 0.3975 - val_accuracy: 0.7887 - val_loss: 0.5179


Epoch 21/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8125 - loss: 0.3535

[1m154/200[0m [32m━━━━━━━━━━━━━━━[0m[37m━━━━━[0m [1m0s[0m 327us/step - accuracy: 0.8413 - loss: 0.3900

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 467us/step - accuracy: 0.8398 - loss: 0.3931 - val_accuracy: 0.7856 - val_loss: 0.5195


Epoch 22/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 6ms/step - accuracy: 0.8125 - loss: 0.3475

[1m146/200[0m [32m━━━━━━━━━━━━━━[0m[37m━━━━━━[0m [1m0s[0m 345us/step - accuracy: 0.8436 - loss: 0.3857

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 491us/step - accuracy: 0.8428 - loss: 0.3889 - val_accuracy: 0.7881 - val_loss: 0.5206


Epoch 23/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8125 - loss: 0.3432

[1m130/200[0m [32m━━━━━━━━━━━━━[0m[37m━━━━━━━[0m [1m0s[0m 388us/step - accuracy: 0.8466 - loss: 0.3806

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 545us/step - accuracy: 0.8450 - loss: 0.3846 - val_accuracy: 0.7844 - val_loss: 0.5217


Epoch 24/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 8ms/step - accuracy: 0.8125 - loss: 0.3388

[1m104/200[0m [32m━━━━━━━━━━[0m[37m━━━━━━━━━━[0m [1m0s[0m 486us/step - accuracy: 0.8476 - loss: 0.3742

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 732us/step - accuracy: 0.8455 - loss: 0.3805 - val_accuracy: 0.7862 - val_loss: 0.5233


Epoch 25/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 12ms/step - accuracy: 0.8750 - loss: 0.3356

[1m104/200[0m [32m━━━━━━━━━━[0m[37m━━━━━━━━━━[0m [1m0s[0m 489us/step - accuracy: 0.8510 - loss: 0.3694

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 713us/step - accuracy: 0.8477 - loss: 0.3762 - val_accuracy: 0.7869 - val_loss: 0.5252


Epoch 26/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 10ms/step - accuracy: 0.9062 - loss: 0.3313

[1m110/200[0m [32m━━━━━━━━━━━[0m[37m━━━━━━━━━[0m [1m0s[0m 460us/step - accuracy: 0.8536 - loss: 0.3656

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 663us/step - accuracy: 0.8489 - loss: 0.3722 - val_accuracy: 0.7887 - val_loss: 0.5269


Epoch 27/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 10ms/step - accuracy: 0.9062 - loss: 0.3277

[1m115/200[0m [32m━━━━━━━━━━━[0m[37m━━━━━━━━━[0m [1m0s[0m 439us/step - accuracy: 0.8534 - loss: 0.3620

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 610us/step - accuracy: 0.8512 - loss: 0.3682 - val_accuracy: 0.7856 - val_loss: 0.5289


Epoch 28/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.9062 - loss: 0.3237

[1m125/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 405us/step - accuracy: 0.8551 - loss: 0.3586

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 600us/step - accuracy: 0.8531 - loss: 0.3642 - val_accuracy: 0.7837 - val_loss: 0.5316


Epoch 29/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 9ms/step - accuracy: 0.9062 - loss: 0.3266

[1m117/200[0m [32m━━━━━━━━━━━[0m[37m━━━━━━━━━[0m [1m0s[0m 431us/step - accuracy: 0.8578 - loss: 0.3541

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 631us/step - accuracy: 0.8555 - loss: 0.3602 - val_accuracy: 0.7844 - val_loss: 0.5346


Epoch 30/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 8ms/step - accuracy: 0.9062 - loss: 0.3291

[1m116/200[0m [32m━━━━━━━━━━━[0m[37m━━━━━━━━━[0m [1m0s[0m 438us/step - accuracy: 0.8604 - loss: 0.3504

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 615us/step - accuracy: 0.8578 - loss: 0.3565 - val_accuracy: 0.7837 - val_loss: 0.5377


Epoch 31/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.9062 - loss: 0.3269

[1m118/200[0m [32m━━━━━━━━━━━[0m[37m━━━━━━━━━[0m [1m0s[0m 430us/step - accuracy: 0.8619 - loss: 0.3465

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 637us/step - accuracy: 0.8605 - loss: 0.3527 - val_accuracy: 0.7844 - val_loss: 0.5414


Epoch 32/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 9ms/step - accuracy: 0.8750 - loss: 0.3280

[1m113/200[0m [32m━━━━━━━━━━━[0m[37m━━━━━━━━━[0m [1m0s[0m 449us/step - accuracy: 0.8615 - loss: 0.3432

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 854us/step - accuracy: 0.8612 - loss: 0.3492 - val_accuracy: 0.7825 - val_loss: 0.5448


Epoch 33/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 12ms/step - accuracy: 0.8750 - loss: 0.3284

[1m 57/200[0m [32m━━━━━[0m[37m━━━━━━━━━━━━━━━[0m [1m0s[0m 905us/step - accuracy: 0.8662 - loss: 0.3326

[1m126/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 808us/step - accuracy: 0.8641 - loss: 0.3408

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 760us/step - accuracy: 0.8640 - loss: 0.3423

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - accuracy: 0.8637 - loss: 0.3459 - val_accuracy: 0.7831 - val_loss: 0.5478


Epoch 34/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 8ms/step - accuracy: 0.8750 - loss: 0.3252

[1m109/200[0m [32m━━━━━━━━━━[0m[37m━━━━━━━━━━[0m [1m0s[0m 465us/step - accuracy: 0.8669 - loss: 0.3365

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 661us/step - accuracy: 0.8658 - loss: 0.3424 - val_accuracy: 0.7806 - val_loss: 0.5513


Epoch 35/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8750 - loss: 0.3239

[1m126/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 402us/step - accuracy: 0.8661 - loss: 0.3341

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 581us/step - accuracy: 0.8666 - loss: 0.3391 - val_accuracy: 0.7788 - val_loss: 0.5566


Epoch 36/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8750 - loss: 0.3210

[1m126/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 403us/step - accuracy: 0.8681 - loss: 0.3313

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 580us/step - accuracy: 0.8681 - loss: 0.3361 - val_accuracy: 0.7781 - val_loss: 0.5605


Epoch 37/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 8ms/step - accuracy: 0.8750 - loss: 0.3191

[1m125/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 404us/step - accuracy: 0.8694 - loss: 0.3279

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 578us/step - accuracy: 0.8697 - loss: 0.3328 - val_accuracy: 0.7775 - val_loss: 0.5641


Epoch 38/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8750 - loss: 0.3132

[1m127/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 399us/step - accuracy: 0.8719 - loss: 0.3245

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 576us/step - accuracy: 0.8709 - loss: 0.3295 - val_accuracy: 0.7750 - val_loss: 0.5689


Epoch 39/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8750 - loss: 0.3160

[1m134/200[0m [32m━━━━━━━━━━━━━[0m[37m━━━━━━━[0m [1m0s[0m 376us/step - accuracy: 0.8733 - loss: 0.3224

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 547us/step - accuracy: 0.8734 - loss: 0.3266 - val_accuracy: 0.7738 - val_loss: 0.5731


Epoch 40/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 8ms/step - accuracy: 0.8750 - loss: 0.3138

[1m120/200[0m [32m━━━━━━━━━━━━[0m[37m━━━━━━━━[0m [1m0s[0m 421us/step - accuracy: 0.8729 - loss: 0.3190

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 590us/step - accuracy: 0.8734 - loss: 0.3237 - val_accuracy: 0.7725 - val_loss: 0.5769


Epoch 41/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 7ms/step - accuracy: 0.8750 - loss: 0.3091

[1m138/200[0m [32m━━━━━━━━━━━━━[0m[37m━━━━━━━[0m [1m0s[0m 366us/step - accuracy: 0.8750 - loss: 0.3166

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 547us/step - accuracy: 0.8753 - loss: 0.3206 - val_accuracy: 0.7744 - val_loss: 0.5810


Epoch 42/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 8ms/step - accuracy: 0.8750 - loss: 0.3105

[1m 91/200[0m [32m━━━━━━━━━[0m[37m━━━━━━━━━━━[0m [1m0s[0m 561us/step - accuracy: 0.8758 - loss: 0.3115

[1m182/200[0m [32m━━━━━━━━━━━━━━━━━━[0m[37m━━[0m [1m0s[0m 559us/step - accuracy: 0.8760 - loss: 0.3143

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 896us/step - accuracy: 0.8758 - loss: 0.3177 - val_accuracy: 0.7763 - val_loss: 0.5860


Epoch 43/50


[1m  1/200[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m2s[0m 10ms/step - accuracy: 0.8750 - loss: 0.3047

[1m104/200[0m [32m━━━━━━━━━━[0m[37m━━━━━━━━━━[0m [1m0s[0m 486us/step - accuracy: 0.8764 - loss: 0.3094

[1m200/200[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 685us/step - accuracy: 0.8773 - loss: 0.3148 - val_accuracy: 0.7756 - val_loss: 0.5899


Epoch 44/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 601us/step - accuracy: 0.8784 - loss: 0.3122 - val_accuracy: 0.7750 - val_loss: 0.5944
Epoch 45/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 569us/step - accuracy: 0.8792 - loss: 0.3089 - val_accuracy: 0.7719 - val_loss: 0.5985
Epoch 46/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 526us/step - accuracy: 0.8803 - loss: 0.3061 - val_accuracy: 0.7700 - val_loss: 0.6032
Epoch 47/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 546us/step - accuracy: 0.8823 - loss: 0.3028 - val_accuracy: 0.7731 - val_loss: 0.6066
Epoch 48/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 574us/step - accuracy: 0.8844 - loss: 0.2999 - val_accuracy: 0.7719 - val_loss: 0.6109
Epoch 49/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 515us/step - accuracy: 0.8873 - loss: 0.2969 - val_accuracy: 0.7688 - val_loss: 0.6156
Epoch 50/50
200/200 ━━━━━━━━━━━━━━━━━━━━ 0s 623us/step - accuracy: 0.8873 - loss: 0.2940 - val_accuracy: 0.7681 - val_loss: 0.6202


**Understanding training:**
- **Epoch**: One pass through the entire training dataset
- **Batch size**: Number of samples processed before updating weights
- **Validation split**: Hold out some training data to monitor overfitting
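
These three quantities explain the `200/200` step counter in the training log. The sketch below works through the arithmetic. It assumes the first model was trained with the same `batch_size=32` and `validation_split=0.2` used for the later models in this demo, and that 8,000 of the 10,000 samples were used for training (the test report lists 2,000 test samples). `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math

def steps_per_epoch(n_train, validation_split, batch_size):
    """Return (weight updates per epoch, samples fit on, samples held out)."""
    # Keep the first (1 - validation_split) fraction for fitting
    n_fit = math.floor(n_train * (1.0 - validation_split))
    n_val = n_train - n_fit
    # One weight update per batch; a final partial batch still counts as a step
    return math.ceil(n_fit / batch_size), n_fit, n_val

steps, n_fit, n_val = steps_per_epoch(n_train=8000, validation_split=0.2, batch_size=32)
print(f"{n_fit} samples to fit on, {n_val} held out for validation, {steps} steps per epoch")
```

Under these assumptions each epoch performs 200 weight updates, which matches the counter in the log.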

## Part 6: Evaluate Model Performance

Let's see how well our model performs on the test set.

In [7]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print("=== Test Set Performance ===")
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

# Make predictions
y_pred_proba = model.predict(X_test_scaled, verbose=0)
y_pred = (y_pred_proba > 0.5).astype(int).flatten()

# Classification report
print("\n=== Classification Report ===")
print(classification_report(y_test, y_pred))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("\n=== Confusion Matrix ===")
print("                Predicted")
print("              Negative  Positive")
print(f"Actual Negative    {cm[0,0]:4d}     {cm[0,1]:4d}")
print(f"        Positive    {cm[1,0]:4d}     {cm[1,1]:4d}")

=== Test Set Performance ===
Test Loss: 0.6540
Test Accuracy: 0.7410 (74.10%)

=== Classification Report ===
              precision    recall  f1-score   support

           0       0.75      0.81      0.78      1113
           1       0.73      0.65      0.69       887

    accuracy                           0.74      2000
   macro avg       0.74      0.73      0.73      2000
weighted avg       0.74      0.74      0.74      2000


=== Confusion Matrix ===
                Predicted
              Negative  Positive
Actual Negative     904      209
        Positive     309      578
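
The figures in the classification report follow directly from the four confusion-matrix counts. As a minimal sketch, the block below recomputes the class-1 metrics by hand from the counts printed above. The variable names are illustrative, and scikit-learn's own functions remain the authoritative source.

```python
# Counts copied from the confusion matrix printed above
tn, fp = 904, 209  # actual negatives: predicted negative / predicted positive
fn, tp = 309, 578  # actual positives: predicted negative / predicted positive

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, the share that is correct
recall = tp / (tp + fn)     # of the actual positives, the share that is found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The low recall for class 1 comes from the 309 false negatives: the model misses about a third of the actual positives.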


## Part 7: Visualize Training History

Let's plot how the model learned over time.

In [8]:
# Extract training history
history_df = pd.DataFrame(history.history)
history_df['epoch'] = range(1, len(history_df) + 1)

print("=== Training History ===")
print(history_df.tail())

# Plot training curves
history_long = history_df.melt(
    id_vars='epoch',
    value_vars=['loss', 'val_loss', 'accuracy', 'val_accuracy'],
    var_name='metric',
    value_name='value'
)

# Separate loss and accuracy
loss_data = history_long[history_long['metric'].isin(['loss', 'val_loss'])]
acc_data = history_long[history_long['metric'].isin(['accuracy', 'val_accuracy'])]

# Loss plot
loss_chart = alt.Chart(loss_data).mark_line(point=True).encode(
    x=alt.X('epoch:Q', title='Epoch'),
    y=alt.Y('value:Q', title='Loss'),
    color='metric:N',
    strokeDash=alt.condition(alt.datum.metric == 'val_loss', alt.value([5, 5]), alt.value([0]))
).properties(
    width=400,
    height=250,
    title='Training and Validation Loss'
)

# Accuracy plot
acc_chart = alt.Chart(acc_data).mark_line(point=True).encode(
    x=alt.X('epoch:Q', title='Epoch'),
    y=alt.Y('value:Q', title='Accuracy', scale=alt.Scale(domain=[0, 1])),
    color='metric:N',
    strokeDash=alt.condition(alt.datum.metric == 'val_accuracy', alt.value([5, 5]), alt.value([0]))
).properties(
    width=400,
    height=250,
    title='Training and Validation Accuracy'
)

# Combine charts
alt.vconcat(loss_chart, acc_chart)

=== Training History ===
    accuracy      loss  val_accuracy  val_loss  epoch
45  0.880313  0.306063      0.770000  0.603204     46
46  0.882344  0.302800      0.773125  0.606598     47
47  0.884375  0.299872      0.771875  0.610895     48
48  0.887344  0.296949      0.768750  0.615599     49
49  0.887344  0.294002      0.768125  0.620240     50


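**What to look for:**
- **Loss decreasing**: The model is learning
- **Validation loss tracking training loss**: The model is generalizing, with little sign of overfitting
- **Gap between train/val**: If validation loss rises while training loss keeps falling, the model is overfitting. That is the case in this run: over the last five epochs training loss falls from 0.306 to 0.294 while validation loss climbs from 0.603 to 0.620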
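
One way to act on rising validation loss is to stop training once it stops improving, which is the idea behind Keras's `EarlyStopping` callback. The sketch below illustrates that patience rule in plain Python. It is not the callback's implementation, `early_stop_epoch` is a hypothetical helper, and the loss values are invented for illustration rather than taken from this run.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops at the first epoch where validation loss has failed to
    improve on its best value for `patience` consecutive epochs. If that
    never happens, stop_epoch is the final epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses: improve for four epochs, then drift upward
illustrative_val_losses = [0.69, 0.62, 0.58, 0.57, 0.58, 0.59, 0.60, 0.61]
stop_epoch, best_epoch = early_stop_epoch(illustrative_val_losses, patience=3)
print(f"stop after epoch {stop_epoch}, best validation loss at epoch {best_epoch}")
```

In Keras, the corresponding step would be to pass `keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)` in the `callbacks` list of `model.fit`.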

## Part 8: Compare with Traditional ML

Let's see how deep learning compares to traditional ML methods on this dataset.

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb

# Logistic Regression
lr = LogisticRegression(max_iter=1000, random_state=42)
lr.fit(X_train_scaled, y_train)
lr_pred = lr.predict(X_test_scaled)
lr_acc = accuracy_score(y_test, lr_pred)

# Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf.fit(X_train_scaled, y_train)
rf_pred = rf.predict(X_test_scaled)
rf_acc = accuracy_score(y_test, rf_pred)

# XGBoost
xgb_clf = xgb.XGBClassifier(n_estimators=100, random_state=42, n_jobs=-1)
xgb_clf.fit(X_train_scaled, y_train)
xgb_pred = xgb_clf.predict(X_test_scaled)
xgb_acc = accuracy_score(y_test, xgb_pred)

# Compare
comparison = pd.DataFrame({
    'Model': ['Logistic Regression', 'Random Forest', 'XGBoost', 'Neural Network'],
    'Accuracy': [lr_acc, rf_acc, xgb_acc, test_accuracy]
})

print("=== Model Comparison ===")
print(comparison.to_string(index=False))

# Visualize
alt.Chart(comparison).mark_bar().encode(
    x=alt.X('Model:N', title='Model', sort='-y'),
    y=alt.Y('Accuracy:Q', title='Test Accuracy', scale=alt.Scale(domain=[0, 1]))
).properties(
    width=400,
    height=300
)

=== Model Comparison ===
              Model  Accuracy
Logistic Regression    0.6795
      Random Forest    0.7805
            XGBoost    0.7840
     Neural Network    0.7410


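**Key insight**: On tabular data, traditional ML (especially tree ensembles such as XGBoost) often performs as well as or better than deep learning, with less complexity and faster training. Here both XGBoost (0.7840) and Random Forest (0.7805) beat the neural network (0.7410).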

## Part 9: Experiment with Architecture

Let's try different architectures to see how they affect performance.

In [10]:
# Build a deeper network
model_deep = keras.Sequential([
    keras.Input(shape=(n_features,)),  # Keras 3 prefers an explicit Input over input_shape=
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model_deep.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train deeper model
history_deep = model_deep.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
deep_test_loss, deep_test_acc = model_deep.evaluate(X_test_scaled, y_test, verbose=0)

# Build a wider network
model_wide = keras.Sequential([
    keras.Input(shape=(n_features,)),  # Keras 3 prefers an explicit Input over input_shape=
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model_wide.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train wider model
history_wide = model_wide.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
wide_test_loss, wide_test_acc = model_wide.evaluate(X_test_scaled, y_test, verbose=0)

# Compare architectures
arch_comparison = pd.DataFrame({
    'Architecture': ['Original (64-32)', 'Deep (128-64-32-16)', 'Wide (256-128)'],
    'Test Accuracy': [test_accuracy, deep_test_acc, wide_test_acc],
    'Parameters': [model.count_params(), model_deep.count_params(), model_wide.count_params()]
})

print("=== Architecture Comparison ===")
print(arch_comparison.to_string(index=False))

=== Architecture Comparison ===
       Architecture  Test Accuracy  Parameters
   Original (64-32)         0.7410        3457
Deep (128-64-32-16)         0.6845       13569
     Wide (256-128)         0.7405       38401


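**Insights:**
- More layers (depth) don't always mean better performance: the deep network scored 0.6845 here, below the original's 0.7410
- More neurons (width) increase model capacity but also the risk of overfitting: the wide network has about 11x the parameters of the original (38,401 vs 3,457) for essentially the same accuracy (0.7405)
- Each architecture was trained only once, so small differences may be noise; find the right balance for your specific problem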
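
The `Parameters` column can be reproduced by hand: each Dense layer has one weight per input-output pair plus one bias per unit. The block below is a minimal sketch of that count. `dense_param_count` is a hypothetical helper, and the layer sizes are read off the architecture names in the table, with the single sigmoid output unit appended.

```python
def dense_param_count(n_inputs, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        total += fan_in * units + units  # weight matrix plus one bias per unit
        fan_in = units  # this layer's outputs feed the next layer
    return total

n_features = 20  # matches the dataset in this demo
architectures = [
    ("Original (64-32)", [64, 32, 1]),
    ("Deep (128-64-32-16)", [128, 64, 32, 16, 1]),
    ("Wide (256-128)", [256, 128, 1]),
]
for name, sizes in architectures:
    print(f"{name}: {dense_param_count(n_features, sizes)} parameters")
```

Most of the wide network's parameters sit in the single 256-to-128 connection (32,896 of them), which is why width inflates the count so quickly.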

## Part 10: Regularization Techniques

Let's add dropout and L2 regularization to prevent overfitting.

In [11]:
# Model with regularization
model_regularized = keras.Sequential([
    keras.Input(shape=(n_features,)),  # Keras 3 prefers an explicit Input over input_shape=
    keras.layers.Dense(64, activation='relu',
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.3),  # Zero 30% of this layer's outputs at random (training only)
    keras.layers.Dense(32, activation='relu',
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='sigmoid')
])

model_regularized.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train with regularization
history_reg = model_regularized.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
reg_test_loss, reg_test_acc = model_regularized.evaluate(X_test_scaled, y_test, verbose=0)

print("=== Regularization Comparison ===")
print(f"Original model - Test Accuracy: {test_accuracy:.4f}")
print(f"Regularized model - Test Accuracy: {reg_test_acc:.4f}")

# Compare training curves
history_reg_df = pd.DataFrame(history_reg.history)
history_reg_df['epoch'] = range(1, len(history_reg_df) + 1)

# Plot validation loss comparison
val_loss_comparison = pd.DataFrame({
    'epoch': history_df['epoch'],
    'original': history_df['val_loss'],
    'regularized': history_reg_df['val_loss']
}).melt(
    id_vars='epoch',
    value_vars=['original', 'regularized'],
    var_name='model',
    value_name='val_loss'
)

alt.Chart(val_loss_comparison).mark_line(point=True).encode(
    x='epoch:Q',
    y='val_loss:Q',
    color='model:N'
).properties(
    width=400,
    height=250,
    title='Validation Loss: Original vs Regularized'
)

=== Regularization Comparison ===
Original model - Test Accuracy: 0.7410
Regularized model - Test Accuracy: 0.7795


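**Regularization techniques:**
- **L2 regularization**: Adds a penalty proportional to the sum of squared weights to the loss, which discourages large weights
- **Dropout**: Randomly zeroes a fraction of a layer's outputs during training (prevents co-adaptation); it is inactive at inference
- Both help prevent overfitting: here the regularized model reaches 0.7795 test accuracy versus 0.7410 for the original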
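
To make the two mechanisms concrete, here is a plain-Python sketch of what each one computes. It illustrates the ideas under simplifying assumptions (flat lists instead of tensors) and is not Keras's implementation. `l2_penalty` and `dropout` are hypothetical helper names.

```python
import random

def l2_penalty(weights, l2=0.01):
    """Extra loss term for one layer: l2 times the sum of squared weights."""
    return l2 * sum(w * w for w in weights)

def dropout(activations, rate=0.3, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` and scale
    the survivors by 1 / (1 - rate); pass values through unchanged at inference."""
    if not training:
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(42)  # seeded so the dropout mask is repeatable
print("L2 penalty:", l2_penalty([1.0, -2.0, 3.0]))
print("training:  ", dropout([1.0, 1.0, 1.0, 1.0], rate=0.3, rng=rng))
print("inference: ", dropout([1.0, 1.0, 1.0, 1.0], training=False))
```

Scaling the surviving values keeps the expected activation the same in training and inference, so no rescaling is needed at prediction time.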

## Key Takeaways

1. **Sequential API**: Simple way to build linear stacks of layers
2. **Data scaling**: Always scale features for neural networks
3. **Compile step**: Specify optimizer, loss, and metrics
4. **Training**: Monitor both training and validation metrics
5. **Architecture matters**: Experiment with depth and width
6. **Regularization**: Use dropout and L2 to prevent overfitting
7. **Deep learning isn't always better**: For tabular data, traditional ML often wins
8. **Use deep learning when**: You have images, text, sequences, or massive datasets

## When to Use Deep Learning

- ✅ **Images**: Computer vision (CNNs)
- ✅ **Text**: Natural language processing (RNNs, Transformers)
- ✅ **Sequences**: Time series, audio (RNNs, LSTMs)
- ✅ **Massive datasets**: Millions of examples
- ❌ **Tabular data**: Often better with XGBoost
- ❌ **Small datasets**: Deep learning needs lots of data
- ❌ **Need interpretability**: Neural networks are hard to interpret compared with linear models or single decision trees

## Next Steps

- Explore different activation functions (tanh, LeakyReLU)
- Try different optimizers (RMSprop, SGD with momentum)
- Learn about callbacks (EarlyStopping, ModelCheckpoint)
- Experiment with different architectures
- Explore PyTorch for more flexibility
