# Demo 3: Deep Learning with TensorFlow/Keras

## Learning Objectives
- Build neural networks using TensorFlow/Keras
- Understand the Sequential API
- Train models and monitor progress
- Evaluate model performance
- Visualize training history
- Compare deep learning with traditional ML

## Setup

**Important:** This demo requires Python 3.13 or earlier, because TensorFlow was not yet available for Python 3.14 when this demo was written. When creating your virtual environment with `uv`, use: `uv venv --python python3.13`

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import altair as alt
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

TensorFlow version: 2.20.0
Keras version: 3.12.0


## Part 1: Load Real Classification Dataset

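For deep learning, we'll use the Wine dataset bundled with scikit-learn (`load_wine`): a small real-world dataset with 13 chemical measurements for 178 wines from three cultivars. The target is the wine class (0, 1, or 2), not a quality rating. We'll convert this to a binary classification problem: class 0 versus the other two.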

In [2]:
# Load the Wine recognition dataset bundled with scikit-learn
from sklearn.datasets import load_wine

# Fetch the dataset
wine_data = load_wine(as_frame=True)
df = wine_data.frame

# The dataset contains 13 features describing wine chemical properties:
# - Alcohol, Malic acid, Ash, Alkalinity of ash, Magnesium
# - Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins
# - Color intensity, Hue, OD280/OD315, Proline
# - Target: wine class (0, 1, or 2 - three types of wine)

# Convert to binary classification: class 0 vs others
df['target'] = (wine_data.target == 0).astype(int)

print("Dataset shape:", df.shape)
print("\nFeature names:", wine_data.feature_names)
print("\nFirst few rows:")
print(df.head())
print(f"\nTarget distribution:")
print(df['target'].value_counts())
print(f"\nClass balance: {df['target'].mean():.2%} positive class (wine type 0)")
print("\nSummary statistics:")
print(df.describe())

Dataset shape: (178, 14)

Feature names: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

First few rows:
   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0

## Part 2: Data Preprocessing

Neural networks are sensitive to the scale of their input features, so we standardize the data before training. Tree-based models (Random Forest, XGBoost) split on one feature at a time and are largely unaffected by scale. Neural networks are trained with gradient descent, which works much better when all features are on a similar scale.

**Why scaling matters:**
- Features with larger values can dominate the learning process
- Gradient descent converges faster with scaled features
- Activation functions work better when inputs are in a reasonable range
- Without scaling, some features might be ignored or cause training instability
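
To make the first bullet concrete, here is a minimal sketch of a single gradient computation for a linear model with squared-error loss. The two feature values are made up for illustration (one near 1, one near 1000) and are not taken from the wine data.

```python
import numpy as np

# Illustrative (made-up) sample with two features on very different scales
x = np.array([1.0, 1000.0])
y = 1.0

w = np.zeros(2)          # start from zero weights
pred = w @ x             # linear model prediction
grad = (pred - y) * x    # gradient of 0.5 * (pred - y)**2 with respect to w

# The gradient component for each weight is proportional to its feature value,
# so the large-scale feature dominates the update.
ratio = abs(grad[1] / grad[0])
print(grad, ratio)
```

After standardization both features would contribute gradients of comparable size, which is one reason a single learning rate then works for all weights.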

In [3]:
# Split into features and target
# Use all wine chemical properties as features
feature_cols = wine_data.feature_names
X = df[feature_cols].values
y = df['target'].values

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features (important for neural networks!)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set: {X_train_scaled.shape}")
print(f"Test set: {X_test_scaled.shape}")
print(f"\nFeature statistics (after scaling):")
print(f"Mean: {X_train_scaled.mean(axis=0)[:5]}")  # Should be ~0
print(f"Std: {X_train_scaled.std(axis=0)[:5]}")    # Should be ~1

Training set: (142, 13)
Test set: (36, 13)

Feature statistics (after scaling):
Mean: [ 4.35957999e-15  1.10475009e-15  2.02576610e-15  2.13287916e-15
 -3.12738880e-18]
Std: [1. 1. 1. 1. 1.]


**StandardScaler** transforms features to have mean=0 and standard deviation=1. Notice we fit the scaler on training data only, then transform both training and test data. This prevents data leakage - the test set statistics shouldn't influence the scaling.
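
As a rough sketch of what the scaler computes, the same transformation can be written by hand in NumPy. The arrays below are synthetic stand-ins (hypothetical data, not the wine features); the point is that the mean and standard deviation come from the training rows only and are then reused for the test rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: two features on different scales
train = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(100, 2))
test = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(20, 2))

# "fit": statistics come from the training rows only
mu = train.mean(axis=0)
sigma = train.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# "transform": the same mu and sigma are applied to both sets
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled.mean(axis=0), test_scaled.std(axis=0))
```

The scaled test set will generally not have mean exactly 0 or standard deviation exactly 1, and that is expected: it is scaled with the training statistics.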

## Part 3: Build Your First Neural Network

Let's create a simple neural network using the Keras Sequential API.

In [4]:
# Build a simple neural network
n_features = X_train.shape[1]  # Number of input features (13 for wine dataset)

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(n_features,), name='hidden1'),
    keras.layers.Dense(32, activation='relu', name='hidden2'),
    keras.layers.Dense(1, activation='sigmoid', name='output')  # Binary classification
])

# Display model architecture
print("=== Model Architecture ===")
model.summary()

# Visualize model (optional, requires graphviz)
# keras.utils.plot_model(model, show_shapes=True, show_layer_names=True)

=== Model Architecture ===


**Understanding the architecture:**
- **Input layer**: 13 features (created automatically from `input_shape`)
- **Hidden layer 1**: 64 neurons with ReLU activation
- **Hidden layer 2**: 32 neurons with ReLU activation
- **Output layer**: 1 neuron with sigmoid activation (for binary classification)
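
To show what these layers compute, here is a minimal NumPy sketch of one forward pass through the same 13 → 64 → 32 → 1 stack. The weights are random placeholders and the input batch is fake, so the probabilities are meaningless; only the shapes and the order of operations are the point.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n_features = 13

# Random placeholder weights; a trained model would hold learned values.
W1, b1 = rng.normal(scale=0.1, size=(n_features, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 32)), np.zeros(32)
W3, b3 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

X_batch = rng.normal(size=(5, n_features))   # 5 fake, already-scaled samples

h1 = relu(X_batch @ W1 + b1)      # hidden layer 1: shape (5, 64)
h2 = relu(h1 @ W2 + b2)           # hidden layer 2: shape (5, 32)
proba = sigmoid(h2 @ W3 + b3)     # output: shape (5, 1), values in (0, 1)

print(h1.shape, h2.shape, proba.shape)
```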

## Part 4: Compile the Model

Before training, we need to configure three key components:

1. **Optimizer**: How the model updates its weights during training (Adam is a popular choice)
2. **Loss function**: What the model tries to minimize (`binary_crossentropy` for binary classification)
3. **Metrics**: What we track during training (accuracy tells us how often predictions are correct)
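
For intuition about the loss, here is a small pure-Python sketch of binary cross-entropy averaged over a batch. The function name and the `eps` clipping value are choices made for this illustration, not a copy of the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; eps clips probabilities away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Two samples with labels 1 and 0
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(f"confident and right: {confident_right:.4f}")
print(f"confident and wrong: {confident_wrong:.4f}")
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is why the loss can keep falling after accuracy has reached 100%.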

In [5]:
# Compile the model
model.compile(
    optimizer='adam',  # Adaptive learning rate optimizer
    loss='binary_crossentropy',  # For binary classification
    metrics=['accuracy']  # Track accuracy during training
)

print("Model compiled successfully!")
print(f"Optimizer: {model.optimizer.get_config()['name']}")
print(f"Loss function: {model.loss}")
print(f"Metrics: {[m.name for m in model.metrics]}")

Model compiled successfully!
Optimizer: adam
Loss function: binary_crossentropy
Metrics: ['loss', 'compile_metrics']


**Understanding these choices:**
- **Adam optimizer**: Adapts the learning rate for each parameter, making training more efficient
- **Binary crossentropy**: Appropriate for binary classification (two classes)
- **Accuracy**: Simple metric - percentage of correct predictions. For imbalanced classes, you might also track precision/recall.
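
To illustrate what "adapts the learning rate for each parameter" means, here is a sketch of one Adam update in its textbook form. This is an illustration under stated assumptions, not the Keras source; the default values mirror the commonly documented Keras defaults (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`), and the gradient values are made up.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, 1.0])
grad = np.array([0.5, -200.0])                  # made-up gradients, 400x apart in size

w_new, m, v = adam_step(w, grad, m=np.zeros(2), v=np.zeros(2), t=1)
step = w_new - w
print(step)
```

Although the two gradients differ by a factor of 400, both weights move by about the same amount (roughly the learning rate) on the first step, because each gradient is divided by its own running magnitude.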

## Part 5: Train the Model

Now let's train the model and watch it learn!

In [6]:
# Train the model
history = model.fit(
    X_train_scaled, y_train,
    epochs=50,  # Number of full passes over the training data
    batch_size=32,  # Number of samples per gradient update
    validation_split=0.2,  # Use 20% of training data for validation
    verbose=1  # Show progress
)

Epoch 1/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - accuracy: 0.3274 - loss: 0.8059 - val_accuracy: 0.5862 - val_loss: 0.6793
Epoch 2/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.5310 - loss: 0.6964 - val_accuracy: 0.6897 - val_loss: 0.6058
Epoch 3/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7080 - loss: 0.6063 - val_accuracy: 0.7931 - val_loss: 0.5379
Epoch 4/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8584 - loss: 0.5292 - val_accuracy: 0.9310 - val_loss: 0.4772
Epoch 5/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9292 - loss: 0.4604 - val_accuracy: 1.0000 - val_loss: 0.4223
Epoch 6/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9646 - loss: 0.3984 - val_accuracy: 1.0000 - val_loss: 0.3718
Epoch 7/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9735 - loss: 0.3421 - val_accuracy: 1.0000 - val_loss: 0.3253
Epoch 8/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9735 - loss: 0.2919 - val_accuracy: 1.0000 - val_loss: 0.2839
Epoch 9/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9735 - loss: 0.2482 - val_accuracy: 1.0000 - val_loss: 0.2484
Epoch 10/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9735 - loss: 0.2106 - val_accuracy: 1.0000 - val_loss: 0.2184
Epoch 11/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9735 - loss: 0.1789 - val_accuracy: 1.0000 - val_loss: 0.1931
Epoch 12/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9735 - loss: 0.1525 - val_accuracy: 1.0000 - val_loss: 0.1719
Epoch 13/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9823 - loss: 0.1308 - val_accuracy: 1.0000 - val_loss: 0.1539
Epoch 14/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9912 - loss: 0.1128 - val_accuracy: 1.0000 - val_loss: 0.1387
Epoch 15/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0979 - val_accuracy: 1.0000 - val_loss: 0.1256
Epoch 16/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0854 - val_accuracy: 1.0000 - val_loss: 0.1146
Epoch 17/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0749 - val_accuracy: 1.0000 - val_loss: 0.1051
Epoch 18/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0661 - val_accuracy: 1.0000 - val_loss: 0.0970
Epoch 19/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0586 - val_accuracy: 1.0000 - val_loss: 0.0900
Epoch 20/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0522 - val_accuracy: 1.0000 - val_loss: 0.0840
Epoch 21/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0466 - val_accuracy: 1.0000 - val_loss: 0.0788
Epoch 22/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0418 - val_accuracy: 1.0000 - val_loss: 0.0744
Epoch 23/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0377 - val_accuracy: 1.0000 - val_loss: 0.0705
Epoch 24/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0341 - val_accuracy: 1.0000 - val_loss: 0.0672
Epoch 25/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0309 - val_accuracy: 1.0000 - val_loss: 0.0642
Epoch 26/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0281 - val_accuracy: 1.0000 - val_loss: 0.0616
Epoch 27/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0257 - val_accuracy: 1.0000 - val_loss: 0.0591
Epoch 28/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0236 - val_accuracy: 1.0000 - val_loss: 0.0570
Epoch 29/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0217 - val_accuracy: 1.0000 - val_loss: 0.0549
Epoch 30/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0200 - val_accuracy: 1.0000 - val_loss: 0.0531
Epoch 31/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0185 - val_accuracy: 1.0000 - val_loss: 0.0514
Epoch 32/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0172 - val_accuracy: 1.0000 - val_loss: 0.0499
Epoch 33/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0160 - val_accuracy: 1.0000 - val_loss: 0.0485
Epoch 34/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 1.0000 - loss: 0.0149 - val_accuracy: 1.0000 - val_loss: 0.0473
Epoch 35/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 1.0000 - loss: 0.0139 - val_accuracy: 1.0000 - val_loss: 0.0462
Epoch 36/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0131 - val_accuracy: 1.0000 - val_loss: 0.0451
Epoch 37/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0123 - val_accuracy: 1.0000 - val_loss: 0.0442
Epoch 38/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0115 - val_accuracy: 1.0000 - val_loss: 0.0433
Epoch 39/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0109 - val_accuracy: 1.0000 - val_loss: 0.0425
Epoch 40/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 1.0000 - loss: 0.0103 - val_accuracy: 0.9655 - val_loss: 0.0417
Epoch 41/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0097 - val_accuracy: 0.9655 - val_loss: 0.0410
Epoch 42/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0092 - val_accuracy: 0.9655 - val_loss: 0.0403
Epoch 43/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0087 - val_accuracy: 0.9655 - val_loss: 0.0397
Epoch 44/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0083 - val_accuracy: 0.9655 - val_loss: 0.0391
Epoch 45/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0078 - val_accuracy: 0.9655 - val_loss: 0.0385
Epoch 46/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0075 - val_accuracy: 0.9655 - val_loss: 0.0380
Epoch 47/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0071 - val_accuracy: 0.9655 - val_loss: 0.0376
Epoch 48/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0068 - val_accuracy: 0.9655 - val_loss: 0.0371
Epoch 49/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0065 - val_accuracy: 0.9655 - val_loss: 0.0367
Epoch 50/50
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 1.0000 - loss: 0.0062 - val_accuracy: 0.9655 - val_loss: 0.0363


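**Understanding training:**
- **Epoch**: One pass through the entire training dataset
- **Batch size**: Number of samples processed before each weight update
- **Validation split**: Keras holds out the last 20% of the rows passed to `fit()` (taken before shuffling, so the split is not stratified) and reports `val_loss` and `val_accuracy` on them to monitor overfitting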
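
The `4/4` step counter and the `0.9655` validation accuracy in the log can be reproduced with a little arithmetic. The sketch below assumes the split index is `floor(n * (1 - validation_split))`; that assumption is consistent with the numbers in the log, but it is not taken from the Keras source here.

```python
import math

n_train_rows = 142        # rows passed to fit(), from the shapes printed in Part 2
validation_split = 0.2
batch_size = 32

# Assumption: the first floor(n * (1 - split)) rows are used for fitting,
# and the remaining rows are held out for validation.
n_fit = math.floor(n_train_rows * (1 - validation_split))
n_val = n_train_rows - n_fit

steps_per_epoch = math.ceil(n_fit / batch_size)   # batches per epoch
one_val_error = (n_val - 1) / n_val               # accuracy with one validation mistake

print(n_fit, n_val, steps_per_epoch, round(one_val_error, 4))
```

Under this assumption the model fits on 113 rows in 4 batches and validates on 29 rows, so a `val_accuracy` of 0.9655 corresponds to exactly one misclassified validation sample.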

## Part 6: Evaluate Model Performance

Let's see how well our model performs on the test set.

In [7]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"=== Test Set Performance ===")
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")

# Make predictions
y_pred_proba = model.predict(X_test_scaled, verbose=0)
y_pred = (y_pred_proba > 0.5).astype(int).flatten()

# Classification report
print("\n=== Classification Report ===")
print(classification_report(y_test, y_pred))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("\n=== Confusion Matrix ===")
print("                Predicted")
print("              Negative  Positive")
print(f"Actual Negative    {cm[0,0]:4d}     {cm[0,1]:4d}")
print(f"        Positive    {cm[1,0]:4d}     {cm[1,1]:4d}")

=== Test Set Performance ===
Test Loss: 0.0649
Test Accuracy: 0.9722 (97.22%)



=== Classification Report ===


              precision    recall  f1-score   support

           0       1.00      0.96      0.98        24
           1       0.92      1.00      0.96        12

    accuracy                           0.97        36
   macro avg       0.96      0.98      0.97        36
weighted avg       0.97      0.97      0.97        36


=== Confusion Matrix ===
                Predicted
              Negative  Positive
Actual Negative      23        1
        Positive       0       12
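
The numbers in the classification report follow directly from the confusion matrix. A short sketch that recomputes the positive-class metrics from the four counts printed above:

```python
# Confusion matrix from the output above: rows = actual, columns = predicted
cm = [[23, 1],
      [0, 12]]

tn, fp = cm[0]
fn, tp = cm[1]

precision = tp / (tp + fp)                       # 12 / 13
recall = tp / (tp + fn)                          # 12 / 12
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tn + fp + fn + tp)       # 35 / 36

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.4f}")
```

The single error is a false positive: one wine from the "other" classes was predicted as class 0.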


## Part 7: Visualize Training History

Let's plot how the model learned over time.

In [8]:
# Extract training history
history_df = pd.DataFrame(history.history)
history_df['epoch'] = range(1, len(history_df) + 1)

print("=== Training History ===")
print(history_df.tail())

# Plot training curves
history_long = history_df.melt(
    id_vars='epoch',
    value_vars=['loss', 'val_loss', 'accuracy', 'val_accuracy'],
    var_name='metric',
    value_name='value'
)

# Separate loss and accuracy
loss_data = history_long[history_long['metric'].isin(['loss', 'val_loss'])]
acc_data = history_long[history_long['metric'].isin(['accuracy', 'val_accuracy'])]

# Loss plot
loss_chart = alt.Chart(loss_data).mark_line(point=True).encode(
    x=alt.X('epoch:Q', title='Epoch'),
    y=alt.Y('value:Q', title='Loss'),
    color='metric:N',
    strokeDash=alt.condition(alt.datum.metric == 'val_loss', alt.value([5, 5]), alt.value([0]))
).properties(
    width=400,
    height=250,
    title='Training and Validation Loss'
)

# Accuracy plot
acc_chart = alt.Chart(acc_data).mark_line(point=True).encode(
    x=alt.X('epoch:Q', title='Epoch'),
    y=alt.Y('value:Q', title='Accuracy', scale=alt.Scale(domain=[0, 1])),
    color='metric:N',
    strokeDash=alt.condition(alt.datum.metric == 'val_accuracy', alt.value([5, 5]), alt.value([0]))
).properties(
    width=400,
    height=250,
    title='Training and Validation Accuracy'
)

# Combine charts
alt.vconcat(loss_chart, acc_chart)

=== Training History ===
    accuracy      loss  val_accuracy  val_loss  epoch
45       1.0  0.007458      0.965517  0.038047     46
46       1.0  0.007103      0.965517  0.037583     47
47       1.0  0.006771      0.965517  0.037135     48
48       1.0  0.006460      0.965517  0.036713     49
49       1.0  0.006169      0.965517  0.036322     50


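**What to look for:**
- **Loss decreasing**: The model is learning
- **Validation loss tracking training loss**: No sign of overfitting
- **Validation loss rising while training loss keeps falling**: The model is overfitting

In this run both losses fall for all 50 epochs. The drop in `val_accuracy` from 1.0000 to 0.9655 at epoch 40 is a single validation sample (28 of 29 correct), not a trend.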
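
One way to turn the overfitting pattern into a rule is to track the best validation loss and stop once it has not improved for a few epochs. The sketch below is a plain-Python illustration of that idea; the function name, the `patience` parameter, and the loss values are hypothetical, and this is not the Keras `EarlyStopping` implementation.

```python
def best_epoch_with_patience(val_losses, patience=3):
    """Return (best_epoch, stop_epoch); stop_epoch is None if patience never runs out."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, None

# Made-up validation losses that start rising after epoch 4
val_losses = [0.60, 0.45, 0.38, 0.35, 0.36, 0.39, 0.43, 0.50]
best, stopped = best_epoch_with_patience(val_losses, patience=3)
print(best, stopped)
```

On a curve that keeps improving, as in the run above, the function never triggers and returns `None` for the stop epoch.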

## Part 8: Compare with Traditional ML

Let's see how deep learning compares to traditional ML methods on this dataset.

In [9]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb

# Logistic Regression
lr = LogisticRegression(max_iter=1000, random_state=42)
lr.fit(X_train_scaled, y_train)
lr_pred = lr.predict(X_test_scaled)
lr_acc = accuracy_score(y_test, lr_pred)

# Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf.fit(X_train_scaled, y_train)
rf_pred = rf.predict(X_test_scaled)
rf_acc = accuracy_score(y_test, rf_pred)

# XGBoost
xgb_clf = xgb.XGBClassifier(n_estimators=100, random_state=42, n_jobs=-1)
xgb_clf.fit(X_train_scaled, y_train)
xgb_pred = xgb_clf.predict(X_test_scaled)
xgb_acc = accuracy_score(y_test, xgb_pred)

# Compare
comparison = pd.DataFrame({
    'Model': ['Logistic Regression', 'Random Forest', 'XGBoost', 'Neural Network'],
    'Accuracy': [lr_acc, rf_acc, xgb_acc, test_accuracy]
})

print("=== Model Comparison ===")
print(comparison.to_string(index=False))

# Visualize
alt.Chart(comparison).mark_bar().encode(
    x=alt.X('Model:N', title='Model', sort='-y'),
    y=alt.Y('Accuracy:Q', title='Test Accuracy', scale=alt.Scale(domain=[0, 1]))
).properties(
    width=400,
    height=300
)

=== Model Comparison ===
              Model  Accuracy
Logistic Regression  0.972222
      Random Forest  0.944444
            XGBoost  0.944444
     Neural Network  0.972222


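**Key insight**: On tabular data, traditional ML often performs as well as or better than deep learning, with less complexity and faster training. Here logistic regression matches the neural network at 97.2%, and the two tree ensembles are one test sample behind at 94.4%. With only 36 test samples, these differences are too small to rank the models.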
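
To see how coarse these accuracy figures are, it helps to convert them back into error counts on the 36-sample test set. A small sketch using the accuracies printed in the comparison table:

```python
n_test = 36   # test set size from Part 2

# Accuracies copied from the comparison table above
accuracies = {
    "Logistic Regression": 0.972222,
    "Random Forest": 0.944444,
    "XGBoost": 0.944444,
    "Neural Network": 0.972222,
}

# Number of misclassified test samples per model
errors = {name: n_test - round(acc * n_test) for name, acc in accuracies.items()}
one_sample = 1 / n_test   # accuracy change caused by a single test sample

print(errors)
print(f"one sample = {one_sample:.2%} of accuracy")
```

Every model makes one or two mistakes, and a single sample moves accuracy by almost three percentage points, so a different random split could easily reorder the table.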

## Part 9: Experiment with Architecture

Let's try different architectures to see how they affect performance.

In [10]:
# Build a deeper network
model_deep = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(n_features,)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model_deep.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train deeper model
history_deep = model_deep.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
deep_test_loss, deep_test_acc = model_deep.evaluate(X_test_scaled, y_test, verbose=0)

# Build a wider network
model_wide = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(n_features,)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')
])

model_wide.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train wider model
history_wide = model_wide.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
wide_test_loss, wide_test_acc = model_wide.evaluate(X_test_scaled, y_test, verbose=0)

# Compare architectures
arch_comparison = pd.DataFrame({
    'Architecture': ['Original (64-32)', 'Deep (128-64-32-16)', 'Wide (256-128)'],
    'Test Accuracy': [test_accuracy, deep_test_acc, wide_test_acc],
    'Parameters': [model.count_params(), model_deep.count_params(), model_wide.count_params()]
})

print("=== Architecture Comparison ===")
print(arch_comparison.to_string(index=False))

=== Architecture Comparison ===
       Architecture  Test Accuracy  Parameters
   Original (64-32)       0.972222        3009
Deep (128-64-32-16)       0.944444       12673
     Wide (256-128)       0.972222       36609


**Insights:**
- More layers (depth) don't always mean better performance
- More neurons (width) increase model capacity but also the risk of overfitting
- Find the right balance for your specific problem
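
The `Parameters` column can be checked by hand: each `Dense` layer has one weight per input-output pair plus one bias per unit. A minimal sketch (the helper name `dense_param_count` is made up for this illustration):

```python
def dense_param_count(n_inputs, layer_sizes):
    """Total weights + biases for a stack of fully connected layers."""
    total = 0
    for units in layer_sizes:
        total += n_inputs * units + units   # weights + biases for this layer
        n_inputs = units                    # this layer's output feeds the next
    return total

n_features = 13
architectures = {
    "Original (64-32)": [64, 32, 1],
    "Deep (128-64-32-16)": [128, 64, 32, 16, 1],
    "Wide (256-128)": [256, 128, 1],
}

counts = {name: dense_param_count(n_features, sizes)
          for name, sizes in architectures.items()}
print(counts)
```

The wide model has roughly twelve times as many parameters as the original, yet there are only 142 training rows to fit them.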

## Part 10: Regularization Techniques

Let's add dropout and L2 regularization to prevent overfitting.

In [11]:
# Model with regularization
model_regularized = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(n_features,),
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.3),  # Randomly zero 30% of the previous layer's outputs during training
    keras.layers.Dense(32, activation='relu',
                       kernel_regularizer=keras.regularizers.l2(0.01)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='sigmoid')
])

model_regularized.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Train with regularization
history_reg = model_regularized.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Evaluate
reg_test_loss, reg_test_acc = model_regularized.evaluate(X_test_scaled, y_test, verbose=0)

print("=== Regularization Comparison ===")
print(f"Original model - Test Accuracy: {test_accuracy:.4f}")
print(f"Regularized model - Test Accuracy: {reg_test_acc:.4f}")

# Compare training curves
history_reg_df = pd.DataFrame(history_reg.history)
history_reg_df['epoch'] = range(1, len(history_reg_df) + 1)

# Plot validation loss comparison
val_loss_comparison = pd.DataFrame({
    'epoch': history_df['epoch'],
    'original': history_df['val_loss'],
    'regularized': history_reg_df['val_loss']
}).melt(
    id_vars='epoch',
    value_vars=['original', 'regularized'],
    var_name='model',
    value_name='val_loss'
)

alt.Chart(val_loss_comparison).mark_line(point=True).encode(
    x='epoch:Q',
    y='val_loss:Q',
    color='model:N'
).properties(
    width=400,
    height=250,
    title='Validation Loss: Original vs Regularized'
)

=== Regularization Comparison ===
Original model - Test Accuracy: 0.9722
Regularized model - Test Accuracy: 0.9722


**Regularization techniques:**
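- **L2 regularization**: Penalizes large weights by adding a term proportional to the sum of squared weights to the loss
- **Dropout**: Randomly disables neurons during training (prevents co-adaptation); it is turned off at prediction time
- Both help reduce overfitting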
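
Both techniques are simple to write out. The sketch below shows the common "inverted dropout" formulation and an L2 penalty term in NumPy; the function names are made up for this illustration, the activations are dummy values, and this is an approximation of the idea rather than the Keras layer code.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of values, rescale the rest."""
    if not training:
        return x                              # no-op at prediction time
    keep = rng.random(x.shape) >= rate        # True for units that survive
    return x * keep / (1.0 - rate)            # rescale so the expected value is unchanged

def l2_penalty(weights, factor=0.01):
    """Extra loss term: factor * sum of squared weights."""
    return factor * np.sum(weights ** 2)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))             # dummy activations

dropped = dropout(activations, rate=0.3, rng=rng)
penalty = l2_penalty(np.array([1.0, -2.0]))   # 0.01 * (1 + 4)

print(f"fraction zeroed: {(dropped == 0).mean():.3f}")
print(f"mean after dropout: {dropped.mean():.3f}")
print(f"L2 penalty: {penalty:.2f}")
```

The rescaling by `1 / (1 - rate)` keeps the average activation roughly the same during training and prediction, which is why no adjustment is needed when dropout is switched off.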

## Key Takeaways

1. **Sequential API**: Simple way to build linear stacks of layers
2. **Data scaling**: Always scale features for neural networks
3. **Compile step**: Specify optimizer, loss, and metrics
4. **Training**: Monitor both training and validation metrics
5. **Architecture matters**: Experiment with depth and width
6. **Regularization**: Use dropout and L2 to prevent overfitting
7. **Deep learning isn't always better**: For tabular data, traditional ML often wins
8. **Use deep learning when**: You have images, text, sequences, or massive datasets

## When to Use Deep Learning

- ✅ **Images**: Computer vision (CNNs)
- ✅ **Text**: Natural language processing (RNNs, Transformers)
- ✅ **Sequences**: Time series, audio (RNNs, LSTMs)
- ✅ **Massive datasets**: Millions of examples
- ❌ **Tabular data**: Often better served by XGBoost or even simple linear models
- ❌ **Small datasets**: Deep learning usually needs a lot of data
- ❌ **Need interpretability**: Neural networks are hard to interpret

## Next Steps

- Explore different activation functions (tanh, LeakyReLU)
- Try different optimizers (RMSprop, SGD with momentum)
- Learn about callbacks (EarlyStopping, ModelCheckpoint)
- Experiment with different architectures
- Explore PyTorch for more flexibility
