# Day 18 — "Fully Connected Layers vs Convolution Layers: Geometry & Gradient Differences"

Fully connected (FC) layers connect every input to every output through its own weight and ignore how the inputs are arranged. Convolution layers exploit local patterns and reuse the same filter at every spatial position.

## 1. Core Intuition

- FC: global mixing, dense parameter matrix.
- Conv: local receptive fields with shared filters (translation equivariance).
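The following is a minimal NumPy sketch of translation equivariance. It assumes a stride-1, unpadded cross-correlation, which is what deep-learning libraries call convolution. `conv2d_valid` is a hypothetical helper and is not part of the repo. The sketch shifts a small object that sits away from the border, and the output moves by the same offset:

```python
import numpy as np


def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stride-1, no-padding 2-D cross-correlation."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for u in range(out_h):
        for v in range(out_w):
            out[u, v] = np.sum(img[u:u + kh, v:v + kw] * kernel)
    return out


rng = np.random.default_rng(0)
img = np.zeros((8, 8))
img[2:4, 2:4] = rng.standard_normal((2, 2))  # small "object" away from the border
kernel = rng.standard_normal((3, 3))

# Move the object down 1 row and right 2 columns.
shifted = np.roll(img, shift=(1, 2), axis=(0, 1))

out = conv2d_valid(img, kernel)              # 6x6 output
out_shifted = conv2d_valid(shifted, kernel)

# Equivariance: shifting the input shifts the output by the same amount.
print(np.allclose(np.roll(out, shift=(1, 2), axis=(0, 1)), out_shifted))  # True
```

The object sits away from the border on purpose. With no padding, an object near the edge would be partly cut off, and the shifted outputs would no longer line up exactly.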

## 2. Geometry

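- In an FC layer `y = Wx + b`, every output neuron sees the entire (flattened) input.
- A convolution preserves the spatial layout. Each output depends only on a local window, and sliding the same filter across the input makes the layer translation equivariant.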
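One way to see the locality difference directly is to nudge a single input pixel and count how many outputs change. The layers below are hypothetical: a 6×6 input, an FC layer with 16 outputs, and one 3×3 filter with stride 1 and no padding. The conv filter also produces 16 outputs (4×4), so the comparison is like for like:

```python
import numpy as np

rng = np.random.default_rng(1)
side = 6
x = rng.standard_normal((side, side))

# Hypothetical FC layer: flattened 6x6 input -> 16 outputs.
W_fc = rng.standard_normal((16, side * side))
# Hypothetical conv layer: one 3x3 filter, stride 1, no padding -> 4x4 = 16 outputs.
k = rng.standard_normal((3, 3))


def fc_forward(inp):
    return W_fc @ inp.ravel()


def conv_forward(inp):
    n = side - 2
    return np.array([[np.sum(inp[u:u + 3, v:v + 3] * k) for v in range(n)]
                     for u in range(n)])


x_pert = x.copy()
x_pert[2, 2] += 1.0  # nudge a single interior pixel

fc_changed = np.count_nonzero(fc_forward(x) != fc_forward(x_pert))
conv_changed = np.count_nonzero(conv_forward(x) != conv_forward(x_pert))
print(fc_changed, conv_changed)  # 16 9
```

Every FC output moves. Only the 3×3 block of conv outputs whose windows contain the pixel changes.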

## 3. Python — FC vs Conv Gradients

`days/day18/code/fc_vs_conv.py` implements the two helpers used below, `fc_backward` and `conv_weight_grad`.
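For reference, these are the standard gradients such helpers compute. The formulas assume the convolution is a stride-1, unpadded cross-correlation. That convention matches the shapes in the cell below (6×6 input, 3×3 kernel, 4×4 upstream gradient), but the repo's exact convention may differ. Here $\delta = \partial L / \partial y$ is the upstream gradient:

$$
\frac{\partial L}{\partial W} = \delta\, x^{\top}, \qquad
\frac{\partial L}{\partial x} = W^{\top} \delta, \qquad
\frac{\partial L}{\partial b} = \delta
$$

$$
y_{u,v} = \sum_{i,j} K_{i,j}\, X_{u+i,\,v+j}
\quad\Longrightarrow\quad
\frac{\partial L}{\partial K_{i,j}} = \sum_{u,v} \delta_{u,v}\, X_{u+i,\,v+j}
$$

The FC weight gradient is a single outer product. Each conv kernel weight instead sums contributions from all 16 output positions.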

```python
from __future__ import annotations

import sys
from pathlib import Path

import numpy as np


def find_repo_root(marker: str = "days") -> Path:
    """Walk up from the working directory to the first folder that contains `marker`."""
    path = Path.cwd()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise RuntimeError("Run this notebook from inside the repository tree.")


# Make `days.day18...` importable when the notebook is run from a subfolder.
REPO_ROOT = find_repo_root()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

from days.day18.code.fc_vs_conv import fc_backward, conv_weight_grad

# FC example: 5 inputs -> 3 outputs, with a random upstream gradient dy
x = np.random.randn(5)
W = np.random.randn(3, 5)
dy_fc = np.random.randn(3)
dW_fc, dx_fc = fc_backward(x, W, dy_fc)
print('FC dW shape:', dW_fc.shape, 'dx shape:', dx_fc.shape)

# Conv example: 6x6 image, 3x3 kernel, 4x4 upstream gradient (stride 1, no padding)
img = np.random.randn(6, 6)
kernel = np.random.randn(3, 3)
dy_conv = np.random.randn(4, 4)
dW_conv = conv_weight_grad(img, kernel, dy_conv)
print('Conv dW:\n', dW_conv)
```


Example output. The values change between runs because no random seed is set:

```text
FC dW shape: (3, 5) dx shape: (5,)
Conv dW:
 [[ 1.41792147  2.11409303  4.65769077]
 [ 2.34234268 -0.17878304  1.78907954]
 [-7.00217241  1.38537899 -4.29486399]]
```
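The repo's helpers are not shown in this notebook. The sketch below defines hypothetical reference versions, `fc_backward_ref` and `conv_weight_grad_ref`. These are not the repo's code. They compute the same quantities, assuming a stride-1, unpadded cross-correlation. The sketch then checks the conv weight gradient against central finite differences of the linear loss `L = sum(dy * y)`:

```python
import numpy as np


def fc_backward_ref(x, W, dy):
    """Hypothetical reference: gradients of y = W @ x + b given dy = dL/dy."""
    dW = np.outer(dy, x)  # (m, n): a single outer product
    dx = W.T @ dy         # (n,): every input receives gradient from every output
    return dW, dx


def conv_forward_ref(X, K):
    """Stride-1, unpadded cross-correlation (assumed convention)."""
    kh, kw = K.shape
    oh, ow = X.shape[0] - kh + 1, X.shape[1] - kw + 1
    return np.array([[np.sum(X[u:u + kh, v:v + kw] * K) for v in range(ow)]
                     for u in range(oh)])


def conv_weight_grad_ref(X, kernel_shape, dy):
    """Hypothetical reference: dL/dK, adding one weighted patch per output position."""
    kh, kw = kernel_shape
    dK = np.zeros((kh, kw))
    for u in range(dy.shape[0]):
        for v in range(dy.shape[1]):
            dK += dy[u, v] * X[u:u + kh, v:v + kw]  # accumulate over positions
    return dK


rng = np.random.default_rng(0)

# FC: 5 inputs -> 3 outputs (same shapes as the cell above)
x_chk, W_chk, dy_fc_chk = rng.standard_normal(5), rng.standard_normal((3, 5)), rng.standard_normal(3)
dW_chk, dx_chk = fc_backward_ref(x_chk, W_chk, dy_fc_chk)
print("FC dW shape:", dW_chk.shape, "dx shape:", dx_chk.shape)

# Conv: 6x6 input, 3x3 kernel -> 4x4 output
X_chk = rng.standard_normal((6, 6))
K_chk = rng.standard_normal((3, 3))
dy_conv_chk = rng.standard_normal((4, 4))
dK_chk = conv_weight_grad_ref(X_chk, K_chk.shape, dy_conv_chk)

# Central finite differences; L is linear in K, so this is exact up to rounding.
eps = 1e-6
dK_num = np.zeros_like(K_chk)
for i in range(3):
    for j in range(3):
        Kp, Km = K_chk.copy(), K_chk.copy()
        Kp[i, j] += eps
        Km[i, j] -= eps
        Lp = np.sum(dy_conv_chk * conv_forward_ref(X_chk, Kp))
        Lm = np.sum(dy_conv_chk * conv_forward_ref(X_chk, Km))
        dK_num[i, j] = (Lp - Lm) / (2 * eps)
print("conv dK matches finite differences:", np.allclose(dK_chk, dK_num, atol=1e-6))
```

You can compare this sketch with the repo's helper. If `conv_weight_grad` uses the same convention, `np.allclose(conv_weight_grad_ref(img, kernel.shape, dy_conv), dW_conv)` should hold in the notebook. A true (flipped-kernel) convolution would instead give that result rotated by 180°.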


## 4. Visualization — Gradient Coverage

`days/day18/code/visualizations.py` animates the global influence of an FC layer against the local, sliding influence of a convolution.

```python
from days.day18.code.visualizations import anim_fc_vs_conv

# Flip to True to re-render the animation and print where it was saved.
RUN_ANIMATIONS = False

if RUN_ANIMATIONS:
    gif = anim_fc_vs_conv()
    print('Saved animation →', gif)
else:
    print('Set RUN_ANIMATIONS = True to regenerate Day 18 figures in days/day18/outputs/.')
```


```text
Set RUN_ANIMATIONS = True to regenerate Day 18 figures in days/day18/outputs/.
```
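A static counterpart to the animation counts how many output windows contain each input pixel. The count assumes stride 1 and no padding, and `conv_coverage` is a hypothetical helper. It equals the number of upstream-gradient terms that reach that pixel's entry of `dx`, and the number of times the pixel appears in the weight-gradient sum. In an FC layer with `m` outputs, every pixel would instead show `m`:

```python
import numpy as np


def conv_coverage(in_shape, k):
    """Hypothetical helper: how many stride-1, unpadded k x k windows contain each pixel."""
    H, W = in_shape
    cov = np.zeros(in_shape, dtype=int)
    for u in range(H - k + 1):
        for v in range(W - k + 1):
            cov[u:u + k, v:v + k] += 1  # this output position touches these pixels
    return cov


conv_cov = conv_coverage((6, 6), 3)
print(conv_cov)
```

For a 6×6 input and a 3×3 filter, the corner pixels are covered once and the central 2×2 block nine times. Without padding, the border pixels receive less gradient than the centre.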


## 5. Optimization & Parameters

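- FC: `m×n` weights plus `m` biases for `n` inputs and `m` outputs. Every weight gets its own dense gradient, so the layer easily overfits on images.
- Conv: `k×k×C_in×C_out` weights plus `C_out` biases, independent of image size. Each weight's gradient is summed over every position where the filter is applied, which tends to make updates more stable.
- CNNs keep the spatial geometry. An FC layer needs a flattened input and so discards the 2-D neighborhood structure.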
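The parameter formulas above can be checked on a concrete case. The sizes below are hypothetical: a 32×32×3 input mapped to a 32×32×16 output, for example by a 3×3 convolution with stride 1 and padding 1. They are chosen only so that both layers produce the same output shape. The second half of the sketch shows a related fact: a 1×1 convolution is an FC layer over the channel dimension, applied with shared weights at every pixel.

```python
import numpy as np


def fc_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """m*n weights plus one bias per output."""
    return n_in * n_out + (n_out if bias else 0)


def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """k*k*C_in*C_out weights plus one bias per output channel (no H or W term)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


# Hypothetical sizes: 32x32x3 input -> 32x32x16 output for both layers.
height, width, c_in, c_out = 32, 32, 3, 16
n_fc = fc_params(height * width * c_in, height * width * c_out)
n_conv = conv_params(3, c_in, c_out)  # 3x3 conv; stride 1 + padding 1 keeps 32x32
print(f"FC:   {n_fc:,}")    # FC:   50,348,032
print(f"Conv: {n_conv:,}")  # Conv: 448

# A 1x1 convolution is an FC layer over channels, shared by every pixel.
rng = np.random.default_rng(0)
x_map = rng.standard_normal((c_in, height, width))
W_1x1 = rng.standard_normal((c_out, c_in))  # 1x1 kernel viewed as a (C_out, C_in) matrix
b_1x1 = rng.standard_normal(c_out)

y_conv = np.einsum("oc,chw->ohw", W_1x1, x_map) + b_1x1[:, None, None]
y_fc = np.stack([W_1x1 @ x_map[:, h, w] + b_1x1
                 for h in range(height) for w in range(width)], axis=1)
y_fc = y_fc.reshape(c_out, height, width)
print(np.allclose(y_conv, y_fc))  # True
```

The 1×1 layer in this sketch has only `conv_params(1, 3, 16) = 64` parameters, however large the feature map is.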

## 6. Mini Exercises

1. Compute parameter counts for FC vs Conv on 224×224 images.
2. Visualize gradient norms for FC vs Conv.
3. Replace the first FC layer with a 1×1 convolution and compare accuracy.
4. Change the kernel size and observe how the weight-gradient patterns change.
5. Compare flattened FC outputs with CNN outputs for the same image.

## 7. Key Takeaways

| Property | Consequence |
| --- | --- |
| FC layers mix all inputs globally | Parameter count grows with input size; no built-in spatial structure. |
| Conv layers learn local patterns and share them across positions | Few parameters; translation equivariance. |
| FC weight gradients are outer products (`dW = dy xᵀ`) | Dense, unshared updates; prone to overfitting on images. |
| Conv weight gradients are summed over all output positions | Each filter weight pools gradient from the whole input; spatial geometry is preserved. |

> Fully connected layers learn anything; convolution layers learn everything that matters.