# Module 6: Neural Networks from Scratch (Interactive)
## Brain-Inspired Intuition to Backpropagation

**Goal:** Build intuition for neural networks from first principles using visual, interactive demos.

This notebook is concept-first and intentionally light on math. You will control inputs with sliders and watch how predictions and learning change in real time.

### Learning objectives
1. Explain how neural network ideas were inspired by biological neurons.
2. Understand perceptron inputs, weights, bias, threshold, and output.
3. Compare common activation functions and when they help.
4. Build intuition for loss functions as "what the model is trying to minimize."
5. See backpropagation as a practical error-correction mechanism.
6. Understand local minima vs global minima using a visual loss landscape.

## Section 0: Clinical framing (Why this matters)
Imagine a triage model that estimates patient risk from vitals and labs.
- Inputs: heart rate, blood pressure, oxygen saturation, age, etc.
- Output: probability of high-risk deterioration.

A neural network is one way to map those inputs to a clinically useful prediction.
Before using deep models in real medicine, you should understand these building blocks.

## Helper Functions
Run this once at the start. It handles local/Colab setup and dependency checks.

In [1]:
import os
import re
import sys
import subprocess
from pathlib import Path
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def setup_repo_for_colab(
    repo_url='https://github.com/aaekay/Medical-AI-101.git',
    repo_dir='/content/Medical-AI-101',
    notebook_dir='chapters',
):
    if 'google.colab' not in sys.modules:
        print(f'Local runtime detected. Working directory: {Path.cwd()}')
        return

    repo_path = Path(repo_dir)
    if not repo_path.exists():
        print('Cloning Medical-AI-101 into /content ...')
        subprocess.check_call(['git', 'clone', repo_url, str(repo_path)])

    target = repo_path / notebook_dir
    os.chdir(target)
    print(f'Colab ready. Working directory: {Path.cwd()}')


def _parse_version(version_str):
    parts = [int(p) for p in re.findall(r'\d+', version_str)]
    return tuple((parts + [0, 0, 0])[:3])


def ensure_dependency(package_name, import_name=None, min_version=None):
    import_name = import_name or package_name

    needs_install = False
    install_reason = ''

    try:
        import_module(import_name)
    except ImportError:
        needs_install = True
        install_reason = 'not installed'

    installed_version = None
    if not needs_install:
        try:
            installed_version = version(package_name)
        except PackageNotFoundError:
            needs_install = True
            install_reason = 'distribution metadata missing'

    if not needs_install and min_version and installed_version:
        if _parse_version(installed_version) < _parse_version(min_version):
            needs_install = True
            install_reason = f'version {installed_version} < required {min_version}'

    requirement = f'{package_name}>={min_version}' if min_version else package_name

    if needs_install:
        print(f'Installing {requirement} ({install_reason}) ...')
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', requirement])
        import_module(import_name)
        installed_version = version(package_name)

    print(f'{package_name} ready: {installed_version}')


def configure_plotly_renderer():
    import plotly.io as pio

    renderers_obj = pio.renderers
    available = set()

    names = getattr(renderers_obj, 'names', None)
    if names:
        available = set(names)
    elif hasattr(renderers_obj, 'keys'):
        try:
            available = set(renderers_obj.keys())
        except Exception:
            available = set()

    if not available:
        try:
            available = set(list(renderers_obj))
        except Exception:
            available = set()

    candidates = []
    if 'google.colab' in sys.modules:
        candidates.append('colab')
    candidates.extend(['plotly_mimetype', 'notebook_connected', 'notebook', 'browser'])

    renderer = next((name for name in candidates if name in available), None)
    if renderer is None:
        renderer = renderers_obj.default or 'browser'

    try:
        renderers_obj.default = renderer
        print(f'Plotly renderer set to: {renderer}')
    except Exception:
        fallback = 'browser' if 'browser' in available else renderers_obj.default
        renderers_obj.default = fallback
        print(f'Plotly renderer set to fallback: {fallback}')


setup_repo_for_colab()
ensure_dependency('numpy')
ensure_dependency('matplotlib')
ensure_dependency('ipywidgets')
ensure_dependency('plotly')
ensure_dependency('nbformat', min_version='4.2.0')
ensure_dependency('ipython', import_name='IPython')
configure_plotly_renderer()

Local runtime detected. Working directory: /Users/aaekay/Documents/projects/Medical-AI-101/chapters
numpy ready: 2.3.0
matplotlib ready: 3.10.3
ipywidgets ready: 8.1.7
plotly ready: 6.5.2
nbformat ready: 5.10.4
ipython ready: 9.3.0
Plotly renderer set to: plotly_mimetype


In [2]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from IPython.display import display, Markdown

try:
    import ipywidgets as widgets
except ImportError as exc:
    raise ImportError('ipywidgets is required for this notebook. Install with `pip install ipywidgets`.') from exc

plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

COLOR_BG = '#f8fafc'
COLOR_MAIN = '#2563eb'
COLOR_ACCENT = '#f59e0b'
COLOR_GOOD = '#059669'
COLOR_BAD = '#dc2626'

## Section 1: From Brain Neuron to AI Neuron
Biological intuition:
- **Dendrites** receive signals.
- **Cell body (soma)** integrates signals.
- If the combined signal is strong enough, the neuron **fires** along the axon.

AI simplification:
- Inputs are numbers.
- We combine them into one score.
- If the score reaches a threshold, the output activates.
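
The AI simplification above fits in a few lines. This is a minimal sketch: the function name `neuron_fires` is hypothetical (it is not defined elsewhere in this notebook), and the numbers are the slider defaults used in this section's demo.

```python
def neuron_fires(excitatory, inhibitory, threshold):
    """Return (total_signal, fires) for a toy threshold neuron."""
    # Excitatory inputs add to the score; inhibitory inputs subtract from it.
    total_signal = sum(excitatory) - sum(inhibitory)
    # The neuron activates when the score reaches the threshold.
    return total_signal, total_signal >= threshold


total, fires = neuron_fires(excitatory=[4, 3], inhibitory=[2], threshold=5)
print(total, fires)  # 5 True
```

Raising the inhibitory input from 2 to 3 drops the total to 4, which is below the threshold, so the neuron no longer fires.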

In [3]:
def biological_to_ai_neuron_demo(excitatory_a=4, excitatory_b=3, inhibitory=2, threshold=5):
    total_signal = excitatory_a + excitatory_b - inhibitory
    fires = total_signal >= threshold

    fig, axes = plt.subplots(1, 2, figsize=(13, 4.5))
    fig.patch.set_facecolor(COLOR_BG)

    ax = axes[0]
    ax.set_xlim(0, 10)
    ax.set_ylim(0, 6)
    ax.axis('off')

    soma = patches.Circle((5, 3), 1.1, facecolor='#fde68a', edgecolor='#92400e', linewidth=2)
    ax.add_patch(soma)

    ax.annotate('', xy=(4.1, 3.6), xytext=(1.2, 5.1), arrowprops=dict(arrowstyle='->', lw=3, color=COLOR_MAIN))
    ax.annotate('', xy=(4.0, 2.7), xytext=(1.0, 3.0), arrowprops=dict(arrowstyle='->', lw=3, color=COLOR_MAIN))
    ax.annotate('', xy=(4.1, 2.0), xytext=(1.3, 1.0), arrowprops=dict(arrowstyle='->', lw=3, color=COLOR_BAD))

    ax.text(0.4, 5.2, f'+{excitatory_a}', fontsize=12, color=COLOR_MAIN, weight='bold')
    ax.text(0.2, 3.1, f'+{excitatory_b}', fontsize=12, color=COLOR_MAIN, weight='bold')
    ax.text(0.5, 0.8, f'-{inhibitory}', fontsize=12, color=COLOR_BAD, weight='bold')

    axon_color = COLOR_GOOD if fires else '#6b7280'
    ax.annotate('', xy=(9.2, 3), xytext=(6.1, 3), arrowprops=dict(arrowstyle='->', lw=5, color=axon_color))
    ax.text(9.25, 3.05, 'Spike' if fires else 'No spike', color=axon_color, fontsize=11, va='center')

    ax.text(4.3, 3.0, 'Soma', fontsize=11, color='#78350f')
    ax.set_title('Biological neuron (intuition)', fontsize=13, weight='bold')

    ax2 = axes[1]
    labels = ['Excitatory A', 'Excitatory B', 'Inhibitory']
    values = [excitatory_a, excitatory_b, -inhibitory]
    colors = [COLOR_MAIN, COLOR_MAIN, COLOR_BAD]
    ax2.bar(labels, values, color=colors)
    ax2.axhline(0, color='black', linewidth=1)
    ax2.axhline(threshold, color=COLOR_ACCENT, linestyle='--', linewidth=2, label=f'Threshold = {threshold}')
    ax2.scatter(['Total'], [total_signal], color=COLOR_GOOD if fires else COLOR_BAD, s=120, zorder=3)
    ax2.text(2.8, total_signal + 0.2, f'Total = {total_signal:.1f}', fontsize=11)
    ax2.set_ylabel('Signal strength')
    ax2.set_title('Signal integration', fontsize=13, weight='bold')
    ax2.legend(loc='upper right')

    plt.tight_layout()
    plt.show()

    state = 'FIRES' if fires else 'DOES NOT FIRE'
    display(Markdown(f"**Integrated signal:** `{total_signal:.2f}`  |  **Threshold:** `{threshold:.2f}`  |  **Result:** `{state}`"))


widgets.interact(
    biological_to_ai_neuron_demo,
    excitatory_a=widgets.IntSlider(min=0, max=10, step=1, value=4, description='Exc A'),
    excitatory_b=widgets.IntSlider(min=0, max=10, step=1, value=3, description='Exc B'),
    inhibitory=widgets.IntSlider(min=0, max=10, step=1, value=2, description='Inhibitory'),
    threshold=widgets.IntSlider(min=0, max=12, step=1, value=5, description='Threshold'),
);


### Try this (Section 1)
1. Increase inhibitory signal until the neuron no longer fires.
2. Keep inputs fixed and move threshold up/down.
3. Explain in one sentence: what does threshold represent clinically?

## Section 2: Perceptron Basics (Inputs -> Weights -> Bias -> Threshold -> Output)
A perceptron computes:
`z = w1*x1 + w2*x2 + bias`
It then applies a step rule:
`output = 1 if z >= threshold else 0`
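
As a worked example of the two formulas above, here is a small sketch using the playground's default slider values. The helper name `perceptron` is hypothetical. The second half illustrates one assumption worth checking yourself: raising the threshold by some amount has the same effect as lowering the bias by that amount.

```python
def perceptron(x1, x2, w1, w2, bias, threshold=0.0):
    """Step-rule perceptron: returns (z, output)."""
    z = w1 * x1 + w2 * x2 + bias
    return z, int(z >= threshold)


# z = 1.5*0.5 + (-1.0)*(-0.2) + 0.1 = 0.75 + 0.2 + 0.1
z, out = perceptron(x1=0.5, x2=-0.2, w1=1.5, w2=-1.0, bias=0.1)
print(round(z, 3), out)  # 1.05 1

# Same inputs, threshold raised to 1.2 versus bias lowered by 1.2.
_, out_a = perceptron(0.5, -0.2, 1.5, -1.0, bias=0.1, threshold=1.2)
_, out_b = perceptron(0.5, -0.2, 1.5, -1.0, bias=0.1 - 1.2, threshold=0.0)
print(out_a, out_b)  # 0 0
```

This is why many textbooks drop the separate threshold and keep only the bias; the playground keeps both so you can move each one independently.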

In [None]:
def perceptron_playground(x1=0.5, x2=-0.2, w1=1.5, w2=-1.0, bias=0.1, threshold=0.0):
    z = w1 * x1 + w2 * x2 + bias
    y = 1 if z >= threshold else 0

    fig, axes = plt.subplots(1, 2, figsize=(13, 5))
    fig.patch.set_facecolor(COLOR_BG)

    ax = axes[0]
    ax.axis('off')
    ax.set_xlim(0, 10)
    ax.set_ylim(0, 8)

    for pos, label, val in [((1.5, 6), 'x1', x1), ((1.5, 2), 'x2', x2)]:
        circ = patches.Circle(pos, 0.65, facecolor='#dbeafe', edgecolor='#1d4ed8', linewidth=2)
        ax.add_patch(circ)
        ax.text(pos[0], pos[1], f'{label}\n{val:.2f}', ha='center', va='center', fontsize=10)

    neuron = patches.Circle((6.2, 4), 1.25, facecolor='#fee2e2', edgecolor='#b91c1c', linewidth=2)
    ax.add_patch(neuron)
    ax.text(6.2, 4, 'Perceptron', ha='center', va='center', fontsize=11, weight='bold')

    ax.annotate('', xy=(5.0, 4.9), xytext=(2.2, 6), arrowprops=dict(arrowstyle='->', lw=2.8, color=COLOR_MAIN))
    ax.annotate('', xy=(5.0, 3.1), xytext=(2.2, 2), arrowprops=dict(arrowstyle='->', lw=2.8, color=COLOR_MAIN))
    ax.text(3.2, 5.7, f'w1={w1:.2f}', color=COLOR_MAIN)
    ax.text(3.2, 2.2, f'w2={w2:.2f}', color=COLOR_MAIN)

    out_color = COLOR_GOOD if y == 1 else COLOR_BAD
    ax.annotate('', xy=(9.1, 4), xytext=(7.45, 4), arrowprops=dict(arrowstyle='->', lw=4.0, color=out_color))
    ax.text(9.15, 4.0, f'Output={y}', color=out_color, va='center', fontsize=11, weight='bold')

    ax.text(4.6, 7.2, f'z = w1*x1 + w2*x2 + b = {z:.3f}', fontsize=11)
    ax.text(4.6, 6.4, f'Threshold = {threshold:.3f}', fontsize=11, color=COLOR_ACCENT)
    ax.set_title('Perceptron mechanics', fontsize=13, weight='bold')

    ax2 = axes[1]
    xx, yy = np.meshgrid(np.linspace(-1, 1, 220), np.linspace(-1, 1, 220))
    zz = w1 * xx + w2 * yy + bias - threshold
    ax2.contourf(xx, yy, zz >= 0, levels=[-1, 0, 1], colors=['#fee2e2', '#dcfce7'], alpha=0.75)

    if abs(w2) > 1e-8:
        x_line = np.linspace(-1, 1, 200)
        y_line = (threshold - bias - w1 * x_line) / w2
        ax2.plot(x_line, y_line, color='black', linewidth=2, label='Decision boundary')
    elif abs(w1) > 1e-8:
        x_const = (threshold - bias) / w1
        ax2.axvline(x_const, color='black', linewidth=2, label='Decision boundary')

    ax2.scatter([x1], [x2], color=out_color, s=110, edgecolor='black', linewidth=1.2, zorder=3)
    ax2.set_xlim(-1, 1)
    ax2.set_ylim(-1, 1)
    ax2.set_xlabel('x1')
    ax2.set_ylabel('x2')
    ax2.set_title('Classification region (step output)', fontsize=13, weight='bold')
    ax2.legend(loc='upper right')

    plt.tight_layout()
    plt.show()


widgets.interact(
    perceptron_playground,
    x1=widgets.FloatSlider(min=-1, max=1, step=0.05, value=0.5, description='x1'),
    x2=widgets.FloatSlider(min=-1, max=1, step=0.05, value=-0.2, description='x2'),
    w1=widgets.FloatSlider(min=-3, max=3, step=0.1, value=1.5, description='w1'),
    w2=widgets.FloatSlider(min=-3, max=3, step=0.1, value=-1.0, description='w2'),
    bias=widgets.FloatSlider(min=-2, max=2, step=0.1, value=0.1, description='bias'),
    threshold=widgets.FloatSlider(min=-2, max=2, step=0.1, value=0.0, description='threshold'),
);


### Try this (Section 2)
1. Set `w1` close to zero and observe how `x1` loses influence.
2. Increase bias while keeping inputs fixed. What happens to output?
3. Move threshold higher and explain the clinical meaning (more conservative trigger).

## Section 3: Activation Functions
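A step function jumps from 0 to 1 and is flat everywhere else, so it gives gradient-based learning no signal about which way to adjust the weights. Modern neural networks use activations with a usable slope so learning is easier.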

Common activations:
- Sigmoid: output between 0 and 1
- Tanh: output between -1 and 1
- ReLU: zero for negative inputs, linear for positive inputs
- Leaky ReLU: like ReLU, but with a small slope (`alpha`) for negative inputs, so neurons are less likely to go "dead" (stuck at zero output)
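
To make the list above concrete, the sketch below evaluates each activation and its slope (derivative) at a few sample inputs. The sample points and `alpha = 0.1` are illustrative choices, and the variable names are hypothetical.

```python
import numpy as np

z = np.array([-4.0, -0.5, 0.5, 4.0])
alpha = 0.1  # Leaky ReLU slope for negative inputs (illustrative value)

sig = 1 / (1 + np.exp(-z))
outputs = {
    'sigmoid': sig,
    'tanh': np.tanh(z),
    'relu': np.maximum(0, z),
    'leaky_relu': np.where(z >= 0, z, alpha * z),
}
# Slopes at the same points: this is the signal that learning relies on.
slopes = {
    'sigmoid': sig * (1 - sig),
    'tanh': 1 - np.tanh(z) ** 2,
    'relu': (z > 0).astype(float),
    'leaky_relu': np.where(z >= 0, 1.0, alpha),
}
for name in outputs:
    print(f'{name:>10} out={np.round(outputs[name], 3)} slope={np.round(slopes[name], 3)}')
```

Notice that the sigmoid and tanh slopes shrink toward zero at `z = -4` and `z = 4` (saturation), the ReLU slope is exactly zero for negative inputs, and Leaky ReLU keeps a slope of `alpha` there.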

In [None]:
def activation_values(name, x, alpha=0.1):
    if name == 'Step':
        return (x >= 0).astype(float), 'f(z)=1 if z>=0 else 0'
    if name == 'Sigmoid':
        return 1 / (1 + np.exp(-x)), 'f(z)=1/(1+exp(-z))'
    if name == 'Tanh':
        return np.tanh(x), 'f(z)=tanh(z)'
    if name == 'ReLU':
        return np.maximum(0, x), 'f(z)=max(0,z)'
    if name == 'Leaky ReLU':
        return np.where(x >= 0, x, alpha * x), f'f(z)=z if z>=0 else {alpha:.2f}*z'
    raise ValueError('Unknown activation name')


def activation_explorer(activation='Sigmoid', z_point=0.5, alpha=0.1):
    x = np.linspace(-6, 6, 500)
    y, formula = activation_values(activation, x, alpha=alpha)
    y_point, _ = activation_values(activation, np.array([z_point]), alpha=alpha)
    y_point = float(y_point[0])

    fig, ax = plt.subplots(figsize=(9, 4.5))
    fig.patch.set_facecolor(COLOR_BG)
    ax.plot(x, y, color=COLOR_MAIN, linewidth=3)
    ax.scatter([z_point], [y_point], color=COLOR_ACCENT, s=100, zorder=5)
    ax.axvline(0, color='gray', linestyle='--', linewidth=1)
    ax.axhline(0, color='gray', linestyle='--', linewidth=1)
    ax.set_title(f'{activation} activation', fontsize=14, weight='bold')
    ax.set_xlabel('Input z')
    ax.set_ylabel('Output f(z)')
    ax.text(0.02, 0.95, formula, transform=ax.transAxes, va='top', fontsize=11,
            bbox=dict(facecolor='white', edgecolor='#cbd5e1', boxstyle='round,pad=0.3'))
    ax.text(0.02, 0.84, f'At z={z_point:.2f}, output={y_point:.4f}', transform=ax.transAxes, va='top', fontsize=11)
    plt.tight_layout()
    plt.show()


widgets.interact(
    activation_explorer,
    activation=widgets.Dropdown(options=['Step', 'Sigmoid', 'Tanh', 'ReLU', 'Leaky ReLU'], value='Sigmoid', description='Function'),
    z_point=widgets.FloatSlider(min=-6, max=6, step=0.1, value=0.5, description='z'),
    alpha=widgets.FloatSlider(min=0.01, max=0.4, step=0.01, value=0.1, description='alpha'),
);


### Try this (Section 3)
1. Compare sigmoid vs ReLU at large positive and negative `z`.
2. Use Leaky ReLU and change `alpha`; observe negative-side behavior.
3. Why might a hard step function be harder to train than sigmoid/ReLU?

## Section 4: Loss Functions (How wrong are we?)
Loss is a number that measures prediction error. Training tries to reduce this number.

- Regression examples: **mean absolute error (MAE)**, **mean squared error (MSE)**
- Classification example: **binary cross-entropy (BCE)**
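
A quick numeric sketch of these three losses, using the default slider values from the demos in this section. The helper names `mae`, `mse`, and `bce` are hypothetical, and clipping the probability with `eps` is an assumption made here to keep `log` finite.

```python
import numpy as np


def mae(y_true, y_pred):
    # Mean absolute error: average size of the miss.
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))


def mse(y_true, y_pred):
    # Mean squared error: squaring penalizes large misses more heavily.
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))


def bce(y_true, p, eps=1e-9):
    # Binary cross-entropy for one example; clip p away from 0 and 1.
    p = np.clip(p, eps, 1 - eps)
    return float(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


print(round(mae(0.7, 0.4), 4), round(mse(0.7, 0.4), 4))  # 0.3 0.09
print(round(bce(1, 0.8), 4))   # 0.2231
print(round(bce(1, 0.01), 4))  # 4.6052
```

The last line shows the key property of BCE: a confident wrong prediction (probability 0.01 for a true label of 1) costs about twenty times more than a reasonably good one.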

In [6]:
def regression_loss_demo(true_value=0.7, pred_value=0.4):
    x = np.linspace(0, 1, 400)
    mse_curve = (x - true_value) ** 2
    mae_curve = np.abs(x - true_value)

    mse = (pred_value - true_value) ** 2
    mae = abs(pred_value - true_value)

    fig, ax = plt.subplots(figsize=(9, 4.5))
    fig.patch.set_facecolor(COLOR_BG)
    ax.plot(x, mse_curve, color='#7c3aed', linewidth=2.5, label='MSE')
    ax.plot(x, mae_curve, color='#16a34a', linewidth=2.5, label='MAE')
    ax.scatter([pred_value], [mse], color='#7c3aed', s=90)
    ax.scatter([pred_value], [mae], color='#16a34a', s=90)
    ax.axvline(true_value, color=COLOR_ACCENT, linestyle='--', linewidth=2, label=f'True value={true_value:.2f}')
    ax.set_title('Regression losses vs prediction', fontsize=13, weight='bold')
    ax.set_xlabel('Prediction')
    ax.set_ylabel('Loss')
    ax.legend(loc='upper left')
    plt.tight_layout()
    plt.show()

    display(Markdown(f"**MAE:** `{mae:.4f}` | **MSE:** `{mse:.4f}`"))


widgets.interact(
    regression_loss_demo,
    true_value=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.7, description='True y'),
    pred_value=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.4, description='Pred y'),
);


In [7]:
def classification_loss_demo(true_label=1, pred_prob=0.8):
    eps = 1e-9
    p = np.linspace(0.001, 0.999, 500)
    bce_y1 = -np.log(p)
    bce_y0 = -np.log(1 - p)

    y = int(true_label)
    bce = -(y * np.log(pred_prob + eps) + (1 - y) * np.log(1 - pred_prob + eps))

    fig, ax = plt.subplots(figsize=(9, 4.5))
    fig.patch.set_facecolor(COLOR_BG)
    ax.plot(p, bce_y1, color='#2563eb', linewidth=2.5, label='BCE when true label=1')
    ax.plot(p, bce_y0, color='#dc2626', linewidth=2.5, label='BCE when true label=0')
    ax.scatter([pred_prob], [bce], color=COLOR_ACCENT, s=120, zorder=5)
    ax.set_ylim(0, 6)
    ax.set_xlabel('Predicted probability for class 1')
    ax.set_ylabel('BCE loss')
    ax.set_title('Binary cross-entropy intuition', fontsize=13, weight='bold')
    ax.legend(loc='upper center')
    plt.tight_layout()
    plt.show()

    display(Markdown(f"**True label:** `{y}` | **Pred prob:** `{pred_prob:.3f}` | **BCE:** `{bce:.4f}`"))


widgets.interact(
    classification_loss_demo,
    true_label=widgets.ToggleButtons(options=[0, 1], value=1, description='True class'),
    pred_prob=widgets.FloatSlider(min=0.001, max=0.999, step=0.001, value=0.8, description='Pred prob'),
);


### Try this (Section 4)
1. In regression, move prediction away from true value and compare MAE vs MSE sensitivity.
2. In BCE, set true label to 1 and push predicted probability toward 0. What happens?
3. Why might wrong-but-confident predictions be dangerous in medicine?

## Section 5: Backpropagation (Error correction)
High-level idea:
1. Make a prediction
2. Compute the loss
3. Compute gradients (how each parameter affected the error)
4. Update weights/bias in the opposite direction of the gradient

In this demo, we train a **single sigmoid neuron** using binary cross-entropy.
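
The four steps above can be traced by hand for a single update. The sketch below uses the demo's default slider values and adds one thing the demo does not: a finite-difference check that the analytic gradient is right. The helper name `bce_loss` and the step size `h` are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def bce_loss(params, x, y_true):
    w1, w2, b = params
    y_hat = sigmoid(w1 * x[0] + w2 * x[1] + b)
    return -(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))


x, y_true = (0.7, 0.2), 1
params = np.array([-0.5, 0.6, 0.0])  # w1, w2, b

# Steps 1-2: predict and compute the loss.
y_hat = sigmoid(params[0] * x[0] + params[1] * x[1] + params[2])
loss_before = bce_loss(params, x, y_true)

# Step 3: analytic gradient. For sigmoid + BCE, dLoss/dz = y_hat - y_true.
dz = y_hat - y_true
grad = np.array([dz * x[0], dz * x[1], dz])

# Sanity check: central finite differences, one parameter at a time.
h = 1e-6
numeric = np.array([
    (bce_loss(params + h * e, x, y_true) - bce_loss(params - h * e, x, y_true)) / (2 * h)
    for e in np.eye(3)
])

# Step 4: move against the gradient.
lr = 0.4
params_after = params - lr * grad
loss_after = bce_loss(params_after, x, y_true)

print(np.round(grad, 4), np.round(numeric, 4))
print(round(float(loss_before), 4), round(float(loss_after), 4))
```

The two gradient vectors should agree to several decimal places, and the loss after the update should be lower than before.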

In [8]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def run_single_neuron_training(x1, x2, y_true, w1, w2, b, lr, steps):
    eps = 1e-9
    history = []

    for step in range(steps + 1):
        z = w1 * x1 + w2 * x2 + b
        y_hat = sigmoid(z)
        loss = -(y_true * np.log(y_hat + eps) + (1 - y_true) * np.log(1 - y_hat + eps))

        history.append({
            'step': step,
            'w1': w1,
            'w2': w2,
            'b': b,
            'y_hat': y_hat,
            'loss': loss,
        })

        if step == steps:
            break

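        # For a sigmoid output with BCE loss, dLoss/dz simplifies to (y_hat - y_true).
        dz = y_hat - y_true
        # Chain rule: z depends on each weight through its input, and on b directly.
        dw1 = dz * x1
        dw2 = dz * x2
        db = dz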

        w1 = w1 - lr * dw1
        w2 = w2 - lr * dw2
        b = b - lr * db

    return history


def backprop_demo(x1=0.7, x2=0.2, y_true=1, w1=-0.5, w2=0.6, b=0.0, lr=0.4, steps=20):
    history = run_single_neuron_training(x1, x2, y_true, w1, w2, b, lr, steps)

    losses = [h['loss'] for h in history]
    step_idx = [h['step'] for h in history]

    start = history[0]
    end = history[-1]

    fig, axes = plt.subplots(1, 2, figsize=(13, 4.8))
    fig.patch.set_facecolor(COLOR_BG)

    axes[0].plot(step_idx, losses, color=COLOR_MAIN, linewidth=2.8)
    axes[0].scatter([0, steps], [start['loss'], end['loss']], color=[COLOR_BAD, COLOR_GOOD], s=90)
    axes[0].set_title('Loss decreases as we update parameters', fontsize=13, weight='bold')
    axes[0].set_xlabel('Training step')
    axes[0].set_ylabel('BCE loss')

    labels = ['w1', 'w2', 'bias']
    start_params = [start['w1'], start['w2'], start['b']]
    end_params = [end['w1'], end['w2'], end['b']]
    x = np.arange(len(labels))
    width = 0.33

    axes[1].bar(x - width / 2, start_params, width=width, label='Start', color='#93c5fd')
    axes[1].bar(x + width / 2, end_params, width=width, label='After training', color='#34d399')
    axes[1].set_xticks(x)
    axes[1].set_xticklabels(labels)
    axes[1].set_title('Parameter updates from backpropagation', fontsize=13, weight='bold')
    axes[1].legend()

    plt.tight_layout()
    plt.show()

    summary = (
        f"**Before training**: prediction=`{start['y_hat']:.4f}`, loss=`{start['loss']:.4f}`  \n"
        f"**After {steps} steps**: prediction=`{end['y_hat']:.4f}`, loss=`{end['loss']:.4f}`  \n"
        "Update rule intuition: `parameter_new = parameter_old - learning_rate * gradient`"
    )
    display(Markdown(summary))


widgets.interact(
    backprop_demo,
    x1=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.7, description='x1'),
    x2=widgets.FloatSlider(min=0, max=1, step=0.01, value=0.2, description='x2'),
    y_true=widgets.ToggleButtons(options=[0, 1], value=1, description='True y'),
    w1=widgets.FloatSlider(min=-2, max=2, step=0.05, value=-0.5, description='w1'),
    w2=widgets.FloatSlider(min=-2, max=2, step=0.05, value=0.6, description='w2'),
    b=widgets.FloatSlider(min=-2, max=2, step=0.05, value=0.0, description='bias'),
    lr=widgets.FloatSlider(min=0.01, max=1.0, step=0.01, value=0.4, description='learn rate'),
    steps=widgets.IntSlider(min=1, max=100, step=1, value=20, description='steps'),
);


### Try this (Section 5)
1. Start with the highest learning rate (`~1.0`) and observe how quickly the loss drops. Does it ever become unstable in this one-example setup?
2. Use a very low learning rate (`~0.01`) and observe slow learning.
3. Change true label from 1 to 0 and explain how gradient direction changes.

## Section 6: Local Minima vs Global Minima
Neural network optimization searches for low-loss regions.
- **Global minimum:** the lowest loss anywhere on the surface.
- **Local minimum:** lower than every nearby point, but not the lowest overall.

Real deep networks are high-dimensional; this 2D surface is a visual intuition tool.
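
To see the difference in numbers, the sketch below runs plain gradient descent from two starting points on the same loss surface used in this section's demo. The function names `surface_loss`, `surface_grad`, and `descend` are hypothetical, and the step count of 200 is an illustrative choice.

```python
def surface_loss(w, b):
    return 0.25 * (w ** 4 + b ** 4) - 1.2 * (w ** 2 + b ** 2) + 0.5 * w * b + 2.0


def surface_grad(w, b):
    # Partial derivatives of surface_loss with respect to w and b.
    return w ** 3 - 2.4 * w + 0.5 * b, b ** 3 - 2.4 * b + 0.5 * w


def descend(w, b, lr=0.05, steps=200):
    # Plain gradient descent: repeatedly step against the gradient.
    for _ in range(steps):
        d_w, d_b = surface_grad(w, b)
        w, b = w - lr * d_w, b - lr * d_b
    return w, b, surface_loss(w, b)


results = {}
for start in [(-1.8, 1.5), (1.5, 1.5)]:
    w, b, final = descend(*start)
    results[start] = (w, b, final)
    print(start, '->', (round(w, 3), round(b, 3)), 'loss', round(final, 3))
```

The first start should settle near `(-1.703, 1.703)` with a loss of about `-2.205`, while the second settles near `(1.378, 1.378)` with a loss of about `0.195`: same algorithm, same learning rate, but a worse valley.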

In [9]:
def landscape_loss(w, b):
    return 0.25 * (w ** 4 + b ** 4) - 1.2 * (w ** 2 + b ** 2) + 0.5 * w * b + 2.0


def landscape_grad(w, b):
    d_w = w ** 3 - 2.4 * w + 0.5 * b
    d_b = b ** 3 - 2.4 * b + 0.5 * w
    return d_w, d_b


def descent_path(start_w, start_b, lr, steps):
    w, b = start_w, start_b
    path = [(w, b, landscape_loss(w, b))]
    for _ in range(steps):
        d_w, d_b = landscape_grad(w, b)
        w = w - lr * d_w
        b = b - lr * d_b
        path.append((w, b, landscape_loss(w, b)))
    return np.array(path)


def minima_demo(start_w=-1.8, start_b=1.5, lr=0.05, steps=30):
    w_axis = np.linspace(-2.4, 2.4, 220)
    b_axis = np.linspace(-2.4, 2.4, 220)
    W, B = np.meshgrid(w_axis, b_axis)
    Z = landscape_loss(W, B)

    path = descent_path(start_w, start_b, lr, steps)

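    # Grid search for the lowest loss. The surface is unchanged when w and b are
    # swapped, so it has two equally deep global minima; argmin returns the first.
    g_idx = np.argmin(Z)
    g_w = W.ravel()[g_idx]
    g_b = B.ravel()[g_idx]
    g_loss = Z.ravel()[g_idx]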

    fig, ax = plt.subplots(figsize=(9, 7))
    fig.patch.set_facecolor(COLOR_BG)

    contour = ax.contourf(W, B, Z, levels=40, cmap='viridis')
    plt.colorbar(contour, ax=ax, label='Loss')
    ax.contour(W, B, Z, levels=12, colors='white', linewidths=0.5, alpha=0.5)

    ax.plot(path[:, 0], path[:, 1], color='#f97316', linewidth=2.5, marker='o', markersize=3, label='Gradient descent path')
    ax.scatter(path[0, 0], path[0, 1], color='#ef4444', s=90, edgecolor='black', linewidth=1, label='Start')
    ax.scatter(path[-1, 0], path[-1, 1], color='#22c55e', s=90, edgecolor='black', linewidth=1, label='End')
    ax.scatter(g_w, g_b, color='gold', marker='*', s=220, edgecolor='black', linewidth=1, label='Approx global minimum')

    ax.set_xlabel('Parameter w')
    ax.set_ylabel('Parameter b')
    ax.set_title('Loss landscape: local vs global minima intuition', fontsize=13, weight='bold')
    ax.legend(loc='upper right')
    plt.tight_layout()
    plt.show()

    end_loss = path[-1, 2]
    display(Markdown(f"Start loss: `{path[0,2]:.4f}` | End loss: `{end_loss:.4f}` | Approx global minimum loss (grid): `{g_loss:.4f}`"))


widgets.interact(
    minima_demo,
    start_w=widgets.FloatSlider(min=-2.2, max=2.2, step=0.1, value=-1.8, description='start w'),
    start_b=widgets.FloatSlider(min=-2.2, max=2.2, step=0.1, value=1.5, description='start b'),
    lr=widgets.FloatSlider(min=0.005, max=0.2, step=0.005, value=0.05, description='learn rate'),
    steps=widgets.IntSlider(min=1, max=80, step=1, value=30, description='steps'),
);


### Try this (Section 6)
1. Keep learning rate fixed and move only the starting point. Do you always end at the same place?
2. Increase learning rate and observe overshooting/divergence patterns.
3. In one sentence: why can initialization matter in neural networks?

## Wrap-up
You now have the core mental model:
1. Neuron inspiration -> perceptron mechanics
2. Activation functions make networks expressive and trainable
3. Loss measures error
4. Backpropagation adjusts parameters to reduce loss
5. Optimization can get trapped in local minima

Next step in the curriculum: stack many neurons into hidden layers and train a small network end-to-end on a medical dataset.