# Module 9: Neural Networks and Deep Learning Introduction

---

Neural networks are the foundation of deep learning. This module builds your understanding from the ground up — starting with a single perceptron, progressing to multi-layer networks, and concluding with practical implementations using Keras/TensorFlow.

---

## Table of Contents

1. [The Perceptron](#1.-The-Perceptron)
2. [Activation Functions](#2.-Activation-Functions)
3. [Multi-Layer Perceptron (MLP) from Scratch](#3.-Multi-Layer-Perceptron-(MLP)-from-Scratch)
4. [Neural Networks with Scikit-learn](#4.-Neural-Networks-with-Scikit-learn)
5. [Introduction to Keras/TensorFlow](#5.-Introduction-to-Keras/TensorFlow)
6. [Convolutional Neural Networks — Overview](#6.-Convolutional-Neural-Networks-—-Overview)
7. [Exercises](#7.-Exercises)
8. [Summary and Further Reading](#8.-Summary-and-Further-Reading)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_breast_cancer, load_digits, make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report

plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

---

## 1. The Perceptron

A perceptron is the simplest neural network — a single neuron. It takes weighted inputs, sums them, adds a bias, and passes the result through an activation function.

$$z = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n + b$$
$$\hat{y} = \text{activation}(z)$$

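To make the two formulas concrete, the sketch below evaluates a single perceptron by hand. The weights, bias, and input are made-up illustrative values (not taken from any dataset in this module), and the step activation uses a `z >= 0` threshold as an assumption.

```python
import numpy as np

# Hypothetical values chosen only for illustration
w = np.array([0.5, -1.0, 2.0])   # weights w_1..w_3
x = np.array([2.0, 1.0, 0.5])    # inputs  x_1..x_3
b = -0.5                         # bias

z = np.dot(w, x) + b             # weighted sum plus bias: 1.0 - 1.0 + 1.0 - 0.5
y_hat = 1 if z >= 0 else 0       # step activation

print(f"z = {z:.2f}, y_hat = {y_hat}")  # z = 0.50, y_hat = 1
```
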
In [None]:
# Implement a perceptron from scratch
class Perceptron:
    """A simple perceptron classifier."""
    
    def __init__(self, learning_rate=0.01, n_iterations=100):
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.weights = None
        self.bias = None
        self.errors = []
    
    def fit(self, X, y):
        n_features = X.shape[1]
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        
        for _ in range(self.n_iter):
            errors = 0
            for xi, yi in zip(X, y):
                prediction = self.predict_single(xi)
                update = self.lr * (yi - prediction)
                self.weights += update * xi
                self.bias += update
                errors += int(update != 0.0)
            self.errors.append(errors)
        return self
    
    def predict_single(self, x):
        return 1 if np.dot(x, self.weights) + self.bias >= 0 else 0
    
    def predict(self, X):
        return np.array([self.predict_single(xi) for xi in X])

# Test on two well-separated clusters (make_classification does not guarantee exact linear separability)
from sklearn.datasets import make_classification
X_lin, y_lin = make_classification(n_samples=200, n_features=2, n_informative=2,
                                    n_redundant=0, n_clusters_per_class=1, random_state=42)

ppn = Perceptron(learning_rate=0.1, n_iterations=50)
ppn.fit(X_lin, y_lin)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Convergence
axes[0].plot(ppn.errors, 'o-', color='#FF5722', linewidth=2, markersize=4)
axes[0].set_xlabel('Epoch', fontsize=13)
axes[0].set_ylabel('Number of Misclassifications', fontsize=13)
axes[0].set_title('Perceptron Convergence', fontsize=14, fontweight='bold')

# Decision boundary
from matplotlib.colors import ListedColormap
h = 0.02
x_min, x_max = X_lin[:, 0].min() - 0.5, X_lin[:, 0].max() + 0.5
y_min, y_max = X_lin[:, 1].min() - 0.5, X_lin[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = ppn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
axes[1].contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(['#BBDEFB', '#FFCCBC']))
axes[1].scatter(X_lin[:, 0], X_lin[:, 1], c=y_lin, cmap=ListedColormap(['#1565C0', '#E64A19']),
                s=30, edgecolors='white', linewidth=0.5)
axes[1].set_title('Perceptron Decision Boundary', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print(f"Perceptron accuracy: {accuracy_score(y_lin, ppn.predict(X_lin)):.4f}")
print("The perceptron is guaranteed to converge only if the data is linearly separable.")

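The convergence claim can be checked on a case that is certainly linearly separable: the logical AND gate. The following is a minimal sketch, not the class above; `train_perceptron` is a hypothetical helper that applies the same update rule, and `lr=1.0` is chosen here only so that every weight stays a whole number and the run is easy to trace by hand.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, n_epochs=10):
    """Perceptron rule with a step activation; returns weights, bias, errors per epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    errors_per_epoch = []
    for _ in range(n_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(xi, w) + b >= 0 else 0
            update = lr * (yi - pred)      # zero when the prediction is correct
            w += update * xi
            b += update
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
    return w, b, errors_per_epoch

# AND gate: only the input [1, 1] belongs to class 1
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w_and, b_and, errors_and = train_perceptron(X_and, y_and)
preds_and = (X_and @ w_and + b_and >= 0).astype(int)

print("Errors per epoch:", errors_and)   # reaches 0 and stays there
print("Predictions:     ", preds_and)
```
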
---

## 2. Activation Functions

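Activation functions introduce non-linearity, allowing neural networks to learn complex patterns. Without them, any stack of layers collapses into a single linear transformation, no matter how many layers it has.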

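Why non-linearity matters can be seen with plain matrix algebra. The sketch below uses random matrices as stand-ins for layer weights (biases are omitted for brevity, and all names are illustrative) and shows that two linear layers without an activation are equivalent to one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # 4 samples, 3 features
W1 = rng.normal(size=(3, 5))       # weights of a first "layer"
W2 = rng.normal(size=(5, 2))       # weights of a second "layer"

# Two linear layers with no activation in between ...
two_layers = (x @ W1) @ W2
# ... give the same result as one linear layer with the combined weights
one_layer = x @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))   # True

# Inserting a ReLU between the layers breaks this equivalence
with_relu = np.maximum(0, x @ W1) @ W2
print(np.allclose(with_relu, one_layer))
```
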
In [None]:
# Common activation functions
z = np.linspace(-5, 5, 200)

activations = {
    'Sigmoid': (1 / (1 + np.exp(-z)), 'Output range: (0, 1). Used in the output layer for binary classification.'),
    'Tanh': (np.tanh(z), 'Output range: (-1, 1). Zero-centered, often better than sigmoid for hidden layers.'),
    'ReLU': (np.maximum(0, z), 'Output range: [0, inf). Most popular for hidden layers. Cheap to compute and mitigates vanishing gradients for positive inputs.'),
    'Leaky ReLU': (np.where(z > 0, z, 0.01 * z), 'Fixes the "dying ReLU" problem by allowing small gradients for negative inputs.'),
}

fig, axes = plt.subplots(1, 4, figsize=(20, 4))
colors = ['#2196F3', '#FF5722', '#4CAF50', '#9C27B0']

for idx, (name, (values, desc)) in enumerate(activations.items()):
    ax = axes[idx]
    ax.plot(z, values, linewidth=2.5, color=colors[idx])
    ax.axhline(y=0, color='gray', linewidth=0.5, linestyle='--')
    ax.axvline(x=0, color='gray', linewidth=0.5, linestyle='--')
    ax.set_title(name, fontsize=13, fontweight='bold')
    ax.set_xlabel('z')
    ax.set_ylabel('f(z)')

plt.suptitle('Common Activation Functions', fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

for name, (_, desc) in activations.items():
    print(f"{name}: {desc}")

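The vanishing-gradient remark is easier to judge with numbers. The sketch below compares the derivatives of sigmoid and ReLU at a few points; the helper names are illustrative, and the ReLU derivative at exactly `z = 0` is set to 0 here by convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # never larger than 0.25

def relu_grad(z):
    # Derivative of max(0, z); the value at z = 0 is taken as 0 here
    return (z > 0).astype(float)

z_points = np.array([-5.0, 0.0, 5.0])
print("sigmoid'(z):", np.round(sigmoid_grad(z_points), 4))
print("relu'(z):   ", relu_grad(z_points))

# Backpropagating through 10 sigmoid layers multiplies at most 0.25 ten times
print("0.25 ** 10 =", 0.25 ** 10)
```
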
---

## 3. Multi-Layer Perceptron (MLP) from Scratch

Here we implement a simple 2-layer neural network (one hidden layer plus an output layer) from scratch, using only NumPy, to make forward propagation, loss computation, and backpropagation concrete.

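As a reference for this section, the forward pass, loss, and gradients of such a network can be written in matrix form. The notation is an assumption of this sketch: $X$ holds one sample per row, $m$ is the number of samples, $\sigma$ is the sigmoid, $\odot$ is element-wise multiplication, and $\delta_1$, $\delta_2$ are per-sample error signals. The compact form of $\delta_2$ follows because the sigmoid derivative cancels against the denominator of the binary cross-entropy gradient.

$$Z_1 = X W_1 + b_1, \qquad A_1 = \sigma(Z_1)$$
$$Z_2 = A_1 W_2 + b_2, \qquad \hat{Y} = A_2 = \sigma(Z_2)$$
$$\mathcal{L} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
$$\delta_2 = A_2 - Y, \qquad \frac{\partial \mathcal{L}}{\partial W_2} = \frac{1}{m} A_1^\top \delta_2, \qquad \frac{\partial \mathcal{L}}{\partial b_2} = \frac{1}{m} \sum_{i=1}^{m} \delta_2^{(i)}$$
$$\delta_1 = (\delta_2 W_2^\top) \odot A_1 \odot (1 - A_1), \qquad \frac{\partial \mathcal{L}}{\partial W_1} = \frac{1}{m} X^\top \delta_1, \qquad \frac{\partial \mathcal{L}}{\partial b_1} = \frac{1}{m} \sum_{i=1}^{m} \delta_1^{(i)}$$
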
In [None]:
# Simple 2-layer neural network from scratch
class SimpleNeuralNetwork:
    """A 2-layer neural network: input -> hidden (sigmoid) -> output (sigmoid)."""
    
    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.1):
        self.lr = learning_rate
        # Initialize weights with small random values
        self.W1 = np.random.randn(input_size, hidden_size) * 0.5
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.5
        self.b2 = np.zeros((1, output_size))
        self.losses = []
    
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))
    
    def sigmoid_derivative(self, a):
        return a * (1 - a)
    
    def forward(self, X):
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self.sigmoid(self.z1)  # hidden layer output
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self.sigmoid(self.z2)  # final output
        return self.a2
    
    def backward(self, X, y):
        m = X.shape[0]
        
        # Output layer error
        dz2 = self.a2 - y.reshape(-1, 1)  # gradient of BCE loss w.r.t. z2 (sigmoid derivative cancels)
        dW2 = (self.a1.T @ dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m
        
        # Hidden layer error
        dz1 = (dz2 @ self.W2.T) * self.sigmoid_derivative(self.a1)
        dW1 = (X.T @ dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m
        
        # Update weights
        self.W2 -= self.lr * dW2
        self.b2 -= self.lr * db2
        self.W1 -= self.lr * dW1
        self.b1 -= self.lr * db1
    
    def compute_loss(self, y):
        y = y.reshape(-1, 1)
        # Small epsilon keeps log() finite when predictions saturate at 0 or 1
        return -np.mean(y * np.log(self.a2 + 1e-8) + (1 - y) * np.log(1 - self.a2 + 1e-8))
    
    def fit(self, X, y, epochs=500):
        for epoch in range(epochs):
            self.forward(X)
            loss = self.compute_loss(y)
            self.losses.append(loss)
            self.backward(X, y)
        return self
    
    def predict(self, X):
        return (self.forward(X) >= 0.5).astype(int).ravel()

# Train on Moons data
X_moons, y_moons = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(X_moons, y_moons, test_size=0.2, random_state=42)

nn = SimpleNeuralNetwork(input_size=2, hidden_size=16, output_size=1, learning_rate=1.0)
nn.fit(X_train_m, y_train_m, epochs=1000)

print(f"Custom Neural Network — Moons Dataset")
print(f"  Train accuracy: {accuracy_score(y_train_m, nn.predict(X_train_m)):.4f}")
print(f"  Test accuracy:  {accuracy_score(y_test_m, nn.predict(X_test_m)):.4f}")

In [None]:
# Visualize loss curve and decision boundary
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss curve
axes[0].plot(nn.losses, linewidth=2, color='#FF5722')
axes[0].set_xlabel('Epoch', fontsize=13)
axes[0].set_ylabel('Binary Cross-Entropy Loss', fontsize=13)
axes[0].set_title('Training Loss Convergence', fontsize=14, fontweight='bold')

# Decision boundary
h = 0.02
x_min, x_max = X_moons[:, 0].min() - 0.5, X_moons[:, 0].max() + 0.5
y_min, y_max = X_moons[:, 1].min() - 0.5, X_moons[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = nn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
axes[1].contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(['#BBDEFB', '#FFCCBC']))
axes[1].scatter(X_moons[:, 0], X_moons[:, 1], c=y_moons,
                cmap=ListedColormap(['#1565C0', '#E64A19']), s=30, edgecolors='white', linewidth=0.5)
axes[1].set_title('Neural Network Decision Boundary', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("Unlike the perceptron, the MLP can learn non-linear decision boundaries.")

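A common way to sanity-check hand-written backpropagation is a numerical gradient check: compare the analytic gradients with finite-difference estimates of the loss. The sketch below is a standalone illustration on a tiny random network with the same sigmoid/sigmoid structure. It does not reuse the class above, all function names are hypothetical, and the loss omits the epsilon term because these small weights do not saturate the outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(params, X, y):
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    return -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

def analytic_grads(params, X, y):
    """Backpropagation formulas for the weight matrices only."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)
    dz2 = a2 - y
    dW2 = a1.T @ dz2 / m
    dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
    dW1 = X.T @ dz1 / m
    return dW1, dW2

def numeric_grad(params, idx, X, y, h=1e-5):
    """Central finite differences for params[idx], one entry at a time."""
    W = params[idx]
    grad = np.zeros_like(W)
    for i in np.ndindex(W.shape):
        old = W[i]
        W[i] = old + h
        loss_plus = bce_loss(params, X, y)
        W[i] = old - h
        loss_minus = bce_loss(params, X, y)
        W[i] = old                      # restore the original value
        grad[i] = (loss_plus - loss_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(5, 2))
y_chk = rng.integers(0, 2, size=(5, 1)).astype(float)
params = [rng.normal(size=(2, 3)), np.zeros((1, 3)),
          rng.normal(size=(3, 1)), np.zeros((1, 1))]

dW1, dW2 = analytic_grads(params, X_chk, y_chk)
max_diff = max(np.abs(dW1 - numeric_grad(params, 0, X_chk, y_chk)).max(),
               np.abs(dW2 - numeric_grad(params, 2, X_chk, y_chk)).max())
print(f"Largest analytic vs. numeric difference: {max_diff:.2e}")
```
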
---

## 4. Neural Networks with Scikit-learn

Scikit-learn provides `MLPClassifier` and `MLPRegressor` for quick neural network experiments without the need for deep learning frameworks.

In [None]:
from sklearn.neural_network import MLPClassifier

# Compare different architectures on Breast Cancer dataset
cancer = load_breast_cancer()
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=42, stratify=cancer.target
)
scaler = StandardScaler()
X_train_cs = scaler.fit_transform(X_train_c)
X_test_cs = scaler.transform(X_test_c)

architectures = {
    '1 layer (50)': (50,),
    '2 layers (50, 25)': (50, 25),
    '3 layers (100, 50, 25)': (100, 50, 25),
}

print("Scikit-learn MLPClassifier — Architecture Comparison")
print("=" * 60)

mlp_results = []
for name, arch in architectures.items():
    mlp = MLPClassifier(hidden_layer_sizes=arch, max_iter=1000,
                        random_state=42, early_stopping=True, validation_fraction=0.1)
    mlp.fit(X_train_cs, y_train_c)
    train_acc = mlp.score(X_train_cs, y_train_c)
    test_acc = mlp.score(X_test_cs, y_test_c)
    mlp_results.append({'Architecture': name, 'Train': train_acc, 'Test': test_acc,
                        'Iterations': mlp.n_iter_})
    print(f"  {name:>25s}: Train={train_acc:.4f}  Test={test_acc:.4f}  Epochs={mlp.n_iter_}")

# Plot the training loss curve for the largest architecture (trained here without early stopping)
mlp_best = MLPClassifier(hidden_layer_sizes=(100, 50, 25), max_iter=1000,
                          random_state=42)
mlp_best.fit(X_train_cs, y_train_c)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(mlp_best.loss_curve_, linewidth=2, color='#2196F3')
ax.set_xlabel('Iteration', fontsize=13)
ax.set_ylabel('Loss', fontsize=13)
ax.set_title('MLP Training Loss Curve', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

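`MLPRegressor` follows the same interface for regression. The sketch below fits a noisy sine curve; the synthetic data, layer sizes, and the choice to wrap the scaler and model in a pipeline are illustrative assumptions, not settings taken from this module.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 1-D regression problem: y = sin(x) + noise
rng = np.random.default_rng(42)
X_reg = rng.uniform(-3, 3, size=(400, 1))
y_reg = np.sin(X_reg).ravel() + rng.normal(scale=0.1, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X_reg, y_reg, test_size=0.25, random_state=42)

# The pipeline fits the scaler on the training split only, then the network
reg = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=42),
)
reg.fit(X_tr, y_tr)

r2_test = reg.score(X_te, y_te)   # score() returns R^2 for regressors
print(f"MLPRegressor test R^2: {r2_test:.3f}")
```
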
---

## 5. Introduction to Keras/TensorFlow

For production-grade deep learning, we use frameworks like **TensorFlow** with the **Keras** API. Keras provides a clean, intuitive interface for building neural networks.

Note: This section requires TensorFlow (`pip install tensorflow`). If it is not installed, the cell below prints the equivalent code for reference instead of running it.

In [None]:
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    
    print(f"TensorFlow version: {tf.__version__}")
    
    # Build a neural network with Keras Sequential API
    model = keras.Sequential([
        keras.Input(shape=(X_train_cs.shape[1],)),  # explicit input; newer Keras versions warn about input_shape on layers
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(1, activation='sigmoid')
    ])
    
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.summary()
    
    # Train the model
    history = model.fit(
        X_train_cs, y_train_c,
        epochs=100,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    # Evaluate
    test_loss, test_acc = model.evaluate(X_test_cs, y_test_c, verbose=0)
    print(f"\nKeras Model Test Accuracy: {test_acc:.4f}")
    
    # Plot training history
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].plot(history.history['loss'], label='Training Loss', color='#2196F3', linewidth=2)
    axes[0].plot(history.history['val_loss'], label='Validation Loss', color='#FF5722', linewidth=2)
    axes[0].set_xlabel('Epoch', fontsize=13)
    axes[0].set_ylabel('Loss', fontsize=13)
    axes[0].set_title('Loss Over Training', fontsize=14, fontweight='bold')
    axes[0].legend(fontsize=11)
    
    axes[1].plot(history.history['accuracy'], label='Training Accuracy', color='#2196F3', linewidth=2)
    axes[1].plot(history.history['val_accuracy'], label='Validation Accuracy', color='#FF5722', linewidth=2)
    axes[1].set_xlabel('Epoch', fontsize=13)
    axes[1].set_ylabel('Accuracy', fontsize=13)
    axes[1].set_title('Accuracy Over Training', fontsize=14, fontweight='bold')
    axes[1].legend(fontsize=11)
    
    plt.tight_layout()
    plt.show()

except ImportError:
    print("TensorFlow is not installed. To install, run: pip install tensorflow")
    print("\nBelow is the code that would be used (for reference):")
    print("""
    import tensorflow as tf
    from tensorflow.keras import layers
    
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(32, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(1, activation='sigmoid')
    ])
    
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
    """)

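The `Dropout` layers in the model above randomly zero a fraction of activations during training and do nothing at inference time. The NumPy sketch below illustrates the idea with "inverted" dropout, where surviving units are rescaled by `1 / (1 - rate)` so the expected activation is unchanged. `dropout_forward` is a hypothetical helper written for this illustration, not a Keras function.

```python
import numpy as np

def dropout_forward(a, rate, rng, training=True):
    """Inverted dropout: zero roughly a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return a                          # inference: pass activations through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((1000, 64))              # stand-in for a batch of hidden activations

dropped = dropout_forward(hidden, rate=0.3, rng=rng)
print(f"Fraction zeroed: {(dropped == 0).mean():.3f}")   # close to 0.3
print(f"Mean activation: {dropped.mean():.3f}")          # close to 1.0

at_inference = dropout_forward(hidden, rate=0.3, rng=rng, training=False)
print("Unchanged at inference:", np.array_equal(at_inference, hidden))
```
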
---

## 6. Convolutional Neural Networks — Overview

**CNNs** are specialized neural networks designed for image data. They use convolutional layers that apply learned filters to detect features like edges, textures, and patterns.

Key concepts:
- **Convolutional Layer**: Applies learned filters that slide across the input to detect local features
- **Pooling Layer**: Reduces spatial dimensions by summarizing each local window (e.g., max pooling keeps the largest value)
- **Flatten**: Converts the stacked feature maps (height × width × channels) into a 1D vector for the dense layers

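The three concepts above can be sketched in a few lines of NumPy. This is a simplified illustration (single channel, no padding, stride 1, a hand-picked rather than learned filter), and `conv2d_valid` and `max_pool2d` are hypothetical helper names, not library functions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# 6x6 image with a vertical edge: left half dark (0), right half bright (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked 3x3 filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1, 0, 1]] * 3)

fmap = conv2d_valid(image, kernel)    # 4x4 feature map
pooled = max_pool2d(fmap)             # 2x2 after pooling
flat = pooled.ravel()                 # 1D vector for a dense layer

print(fmap[0])                        # [0. 3. 3. 0.]: strongest response at the edge
print(pooled.shape, flat.shape)       # (2, 2) (4,)
```
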
In [None]:
# Visualize the digits dataset as images
digits = load_digits()

fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='gray_r')
    ax.set_title(f'Label: {digits.target[i]}', fontsize=11)
    ax.axis('off')

plt.suptitle('Handwritten Digits (8x8 pixels)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"Each digit image is {digits.images[0].shape} pixels.")
print(f"When flattened, each image becomes a vector of {digits.data.shape[1]} features.")

In [None]:
# CNN with Keras on Digits (if TensorFlow is available)
try:
    import tensorflow as tf
    from tensorflow.keras import layers
    
    # Prepare data (reshape for CNN: samples, height, width, channels)
    X_digits = digits.images.reshape(-1, 8, 8, 1) / 16.0  # scale pixel values from [0, 16] to [0, 1]
    y_digits = digits.target
    
    X_train_d, X_test_d, y_train_d, y_test_d = train_test_split(
        X_digits, y_digits, test_size=0.2, random_state=42, stratify=y_digits
    )
    
    cnn = tf.keras.Sequential([
        layers.Input(shape=(8, 8, 1)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dropout(0.3),
        layers.Dense(10, activation='softmax')  # 10 classes
    ])
    
    cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    
    history_cnn = cnn.fit(X_train_d, y_train_d, epochs=30, batch_size=32,
                          validation_split=0.15, verbose=0)
    
    test_loss_cnn, test_acc_cnn = cnn.evaluate(X_test_d, y_test_d, verbose=0)
    print(f"CNN Test Accuracy on Digits: {test_acc_cnn:.4f}")
    
    # Plot training history
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(history_cnn.history['accuracy'], label='Train', color='#2196F3', linewidth=2)
    ax.plot(history_cnn.history['val_accuracy'], label='Validation', color='#FF5722', linewidth=2)
    ax.set_xlabel('Epoch', fontsize=13)
    ax.set_ylabel('Accuracy', fontsize=13)
    ax.set_title('CNN Training History (Digits)', fontsize=14, fontweight='bold')
    ax.legend(fontsize=11)
    plt.tight_layout()
    plt.show()

except ImportError:
    print("TensorFlow is not installed. The CNN example requires TensorFlow.")
    print("Install with: pip install tensorflow")

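It can help to trace how tensor shapes and parameter counts evolve through the CNN defined above. The arithmetic below is a sketch that assumes the standard counting rules (one bias per filter or unit, `'same'` padding preserving height and width, 2x2 pooling halving them). The helper functions are illustrative, and `cnn.summary()` is the way to confirm the numbers when TensorFlow is installed.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per (input channel, filter) pair, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

conv1 = conv2d_params(3, 3, 1, 32)      # output: 8 x 8 x 32 ('same' padding)
# 2x2 max pooling: 4 x 4 x 32, no trainable parameters
conv2 = conv2d_params(3, 3, 32, 64)     # output: 4 x 4 x 64
flat_units = 4 * 4 * 64                 # Flatten: 1024 values
dense1 = dense_params(flat_units, 64)
dense2 = dense_params(64, 10)

total = conv1 + conv2 + dense1 + dense2
print(f"Conv1: {conv1}, Conv2: {conv2}, Dense1: {dense1}, Dense2: {dense2}")
print(f"Total trainable parameters: {total}")   # 85066
```
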
---

## 7. Exercises

### Exercise 1: Hidden Layer Size Experiment

In [None]:
# Exercise 1: Using MLPClassifier on the Breast Cancer dataset:
# 1. Try hidden_layer_sizes: (10,), (50,), (100,), (50, 25), (100, 50, 25)
# 2. Record training and test accuracy for each
# 3. Plot a bar chart comparing architectures
# 4. Does adding more layers always help? Why or why not?

# Your code here:


### Exercise 2: XOR Problem

In [None]:
# Exercise 2: The XOR problem cannot be solved by a single perceptron.
# Show that a neural network can solve it:
# 
# XOR truth table:
# [0,0] -> 0, [0,1] -> 1, [1,0] -> 1, [1,1] -> 0
#
# 1. Create the XOR dataset
# 2. Try to solve it with a perceptron (show it fails)
# 3. Solve it with an MLPClassifier with 1 hidden layer
# 4. Plot the decision boundary

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

# Your code here:


### Exercise 3: Digit Classification

In [None]:
# Exercise 3: Using the Digits dataset:
# 1. Scale the data
# 2. Train an MLPClassifier with architecture (128, 64, 32)
# 3. Print accuracy and classification report
# 4. Plot some misclassified digits with their true and predicted labels

# Your code here:


---

## 8. Summary and Further Reading

### What We Covered

- **Perceptron**: The simplest neuron — linear classifier with step activation.
- **Activation Functions**: Sigmoid, Tanh, ReLU, Leaky ReLU — what they do and when to use them.
- **MLP from Scratch**: Forward propagation, backpropagation, and gradient descent.
- **Scikit-learn MLPClassifier**: Quick neural network prototyping.
- **Keras/TensorFlow**: Building, training, and evaluating deep learning models.
- **CNNs**: Convolutional layers for image data.

### Recommended Reading

- [Keras Documentation](https://keras.io/)
- [TensorFlow Tutorials](https://www.tensorflow.org/tutorials)
- Chapters 10-14 of Aurélien Géron, *Hands-On Machine Learning* (Neural Networks)
- Michael Nielsen, *Neural Networks and Deep Learning* (free online book)

### Next Module

In **Module 10: Advanced Topics and Best Practices**, we will cover practical deployment considerations, handling imbalanced data, and working with real-world ML pipelines.

---