# VECTORIZATION

*We will simulate a small batch of low-resolution images to demonstrate Vectorization.*

* The goal is to transform 3D image data (Height $\times$ Width $\times$ Channels) into the matrix formats required for processing.

1.  *Key transformation: Reshaping 3D arrays into 1D vectors.*

2.  *Key conflict: You will see firsthand how the "Theory" notation from your transcript ($n_x, m$) conflicts with the "Practice" notation of Scikit-Learn/PyTorch ($m, n_x$).*

3.  *Technical Deep Dive: The Code Scenario Setup. Let's pretend we have 10 images, each 64x64 pixels with 3 color channels (RGB), so $m = 10$ and $n_x = 64 \times 64 \times 3 = 12{,}288$.*

# NUMPY EXAMPLE 

NumPy is used here to manually construct the matrix exactly as described in your course materials: one example per column, giving shape $(n_x, m)$.

In [27]:
import numpy as np

# 1. Simulate the data: 10 images, 64x64, 3 channels
# Random pixel values between 0-255
images_raw = np.random.randint(0, 256, size=(10, 64, 64, 3)) 

print(f"Original Batch Shape: {images_raw.shape}") 
# Output: (10, 64, 64, 3) -> (m, height, width, channels)

# 2. Unroll ONE image (The "Sweater" analogy)
single_image = images_raw[0]
# reshape(-1) tells numpy to figure out the dimension length automatically
# reshape(-1, 1) forces it into a column vector (nx, 1)
single_vector = single_image.reshape(-1, 1) 

print(f"Single Vector Shape: {single_vector.shape}") 
# Output: (12288, 1) -> Matches course notation x vector

# 3. Vectorize the ENTIRE batch into Matrix X
# We want shape (nx, m) -> (12288, 10)
# Step A: Flatten the image dimensions (64*64*3)
# Step B: Transpose (.T) to turn rows (examples) into columns
X_theory = images_raw.reshape(images_raw.shape[0], -1).T

print(f"Matrix X Shape (Course Notation): {X_theory.shape}")
# Output: (12288, 10) -> (Features, Examples)

Original Batch Shape: (10, 64, 64, 3)
Single Vector Shape: (12288, 1)
Matrix X Shape (Course Notation): (12288, 10)
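
A common pitfall in step 3 is reshaping straight to `(-1, m)`. That also yields an array of shape $(n_x, m)$, but the columns no longer correspond to individual images. The sketch below checks both variants on a tiny stand-in batch (4 images of 2x2x3, filled with `np.arange` so the result is deterministic). The names `batch`, `X_good`, and `X_bad` are illustrative and not part of the notebook above.

```python
import numpy as np

# Tiny stand-in batch: 4 images, 2x2 pixels, 3 channels (illustrative sizes)
batch = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)
m = batch.shape[0]

# Intended approach: flatten each image into a row, then transpose
X_good = batch.reshape(m, -1).T      # shape (12, 4), one image per column

# Pitfall: same shape, but values from different images are interleaved
X_bad = batch.reshape(-1, m)         # shape (12, 4)

def columns_are_images(X):
    """True if column j of X equals image j unrolled into a vector."""
    return all(np.array_equal(X[:, j], batch[j].ravel()) for j in range(m))

cols_match_good = columns_are_images(X_good)
cols_match_bad = columns_are_images(X_bad)
print(X_good.shape, X_bad.shape)
print(cols_match_good, cols_match_bad)
```

Checking the shape alone would not catch this mistake, which is why the comparison against `batch[j].ravel()` is worth running once.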


In [3]:
images_raw

array([[[[ 58, 207,  42],
         [244, 210, 172],
         [122,  52,  31],
         ...,
         [ 38,  99,  31],
         [232,  67,  23],
         [137, 104, 140]],

        [[190, 171, 156],
         [ 17,  42,  57],
         [158,  77, 134],
         ...,
         [184,  50, 150],
         [219, 101, 146],
         [ 84,  78,  72]],

        [[203, 183, 100],
         [150,   4,  98],
         [ 62,  11,   3],
         ...,
         [230, 169, 251],
         [210,  27, 186],
         [ 45, 113,  84]],

        ...,

        [[145,  93, 241],
         [217,   9, 107],
         [245,  33,  40],
         ...,
         [134,  95, 143],
         [206, 242,  43],
         [153, 200,  61]],

        [[142, 166, 254],
         [140, 225, 159],
         [163, 110,  73],
         ...,
         [ 45, 206,   5],
         [ 64, 240,  68],
         [225,  21, 177]],

        [[194, 109, 114],
         [ 13, 238, 240],
         [150, 174,  13],
         ...,
         [ 26, 172, 234],
        

# TORCH

PyTorch runs on both CPUs and GPUs. By convention, it keeps the Batch Dimension ($m$) as the first dimension.

In [22]:
import torch

# 1. Convert NumPy data to PyTorch Tensor
# PyTorch images are usually (Batch, Channels, Height, Width), but we'll stick to our raw input
tensor_images = torch.tensor(images_raw, dtype=torch.float32)

# 2. The "Flatten" Operation
# start_dim=1 means: Keep dim 0 (the batch of 10) intact, flatten everything else.
X_pytorch = torch.flatten(tensor_images, start_dim=1)

print(f"PyTorch Input Shape: {X_pytorch.shape}")
# Output: (10, 12288) -> (Batch, Features)

# NOTE: This is the OPPOSITE of your transcript.
# To force it to match your transcript (for math operations):
X_pytorch_transposed = X_pytorch.T
print(f"PyTorch Shape (Transposed to match theory): {X_pytorch_transposed.shape}")
# Output: (12288, 10)

PyTorch Input Shape: torch.Size([10, 12288])
PyTorch Shape (Transposed to match theory): torch.Size([12288, 10])


In [23]:
X_pytorch_transposed

tensor([[ 58.,  92., 127.,  ...,  73., 183., 129.],
        [207., 228., 107.,  ..., 219., 105.,  77.],
        [ 42., 184., 227.,  ..., 123., 175.,  82.],
        ...,
        [198., 121.,  43.,  ...,  78.,  44., 166.],
        [250., 161., 104.,  ...,   6., 116., 139.],
        [ 68.,  16.,  46.,  ..., 197.,  78.,  18.]])
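
The two layouts describe the same computation. As a minimal sketch, assuming a single `torch.nn.Linear` layer and small illustrative sizes (not the 12288 features used above), the batch-first output is the transpose of the course's $Z = WX + b$. The names `X_rows`, `X_cols`, and `layer` are hypothetical.

```python
import torch

torch.manual_seed(0)
m, n_x, n_h = 10, 12, 4                 # small illustrative sizes

X_rows = torch.randn(m, n_x)            # PyTorch layout: (m, n_x)
layer = torch.nn.Linear(n_x, n_h)       # weight: (n_h, n_x), bias: (n_h,)

# Batch-first forward pass, output shape (m, n_h)
Z_pytorch = layer(X_rows)

# Course notation: X is (n_x, m) and Z = W X + b, with b as a column vector
X_cols = X_rows.T
Z_theory = layer.weight @ X_cols + layer.bias.reshape(-1, 1)   # (n_h, m)

print(Z_pytorch.shape, Z_theory.shape)
print(torch.allclose(Z_pytorch, Z_theory.T, atol=1e-5))
```

In other words, transposing the input is only needed when you want the intermediate matrices to line up with the equations on paper.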

# Scikit-Learn (The Classic ML Standard)

Scikit-Learn estimators expect $(Samples, Features)$. If you provide $(Features, Samples)$, `fit` raises a `ValueError` whenever the number of rows no longer matches the number of labels.

In [26]:
from sklearn.linear_model import LogisticRegression

# Scikit-Learn expects X to be (n_samples, n_features)
X_sklearn = images_raw.reshape(10, -1) # Shape: (10, 12288)
Y_labels = np.random.randint(0, 2, size=(10,)) # Binary labels

# Initialize model
clf = LogisticRegression()

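# If we tried passing X_theory (12288, 10), fit() would raise a ValueError:
# scikit-learn would read it as 12288 samples, but Y_labels has only 10 entries.
# We must pass the (samples, features) layout: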
clf.fit(X_sklearn, Y_labels)

print("Scikit-Learn model trained successfully on (Batch, Features) format.")

Scikit-Learn model trained successfully on (Batch, Features) format.
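
To see the failure mode directly, the sketch below fits the same kind of model twice on a small stand-in dataset (10 samples, 20 features), once in each layout, and captures the exception. The variable names and sizes are illustrative assumptions, not the notebook's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_rows = rng.random((10, 20))     # (n_samples, n_features): the expected layout
y = np.array([0, 1] * 5)          # 10 labels, both classes present

clf = LogisticRegression()
clf.fit(X_rows, y)                # works: 10 rows, 10 labels

try:
    # (20, 10) is read as 20 samples, which does not match the 10 labels
    clf.fit(X_rows.T, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Note that the check is only about counts: if the number of features happened to equal the number of samples, the transposed matrix would be accepted without complaint and the model would silently train on the wrong layout.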


# EXPERT NOTES

## Vectorization Framework Comparison: NumPy vs. Scikit-Learn vs. PyTorch

### 1. Executive Summary
"Vectorization" refers to processing an entire batch of data (Matrix $X$) simultaneously rather than looping through examples one by one.

* **NumPy** is the **Fundamental Choice**: It is the best tool for *understanding* the linear algebra (like in your current course), but it is limited to the CPU.
* **Scikit-Learn** is the **Classic Choice**: It abstracts vectorization away completely. You don't write the matrix math; you just call `.fit()`. It is excellent for traditional ML (SVMs, Random Forests) but insufficient for custom Deep Learning.
* **PyTorch** is the **Optimal Choice for Deep Learning**: It mimics NumPy's syntax but adds two superpowers: **GPU Acceleration** (parallel processing) and **Autograd** (calculating gradients automatically).

**Verdict:** For learning the *mechanics* (now), use **NumPy**. For building *production Neural Networks* (later), **PyTorch** is the better fit.
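
As a minimal sketch of what "vectorization" means in the course layout, the code below computes $z = w^T x + b$ for every example twice: once with an explicit loop and once with a single matrix product. Sizes and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 5, 8                          # illustrative sizes
X = rng.standard_normal((n_x, m))      # course layout: one example per column
w = rng.standard_normal((n_x, 1))
b = 0.5

# Loop version: one dot product per example
z_loop = np.zeros((1, m))
for i in range(m):
    z_loop[0, i] = w[:, 0] @ X[:, i] + b

# Vectorized version: one matrix product for the whole batch
z_vec = w.T @ X + b                    # shape (1, m)

print(np.allclose(z_loop, z_vec))
```

Both versions produce the same numbers; the vectorized form simply hands the whole batch to optimized library code in one call.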

---

### 2. Technical Deep Dive: The Comparison Matrix

| Feature | **NumPy** | **Scikit-Learn** | **PyTorch** |
| :--- | :--- | :--- | :--- |
| **Primary Role** | Numerical Computing & Linear Algebra | Traditional Machine Learning Algorithms | Deep Learning & Tensor Computation |
| **Hardware** | **CPU Only** (SIMD, multithreaded BLAS) | **CPU Only** (mostly) | **CPU & GPU** (Massive Parallelism; TPU via PyTorch/XLA) |
| **Vectorization** | Manual (You write `np.dot`) | Hidden (Internally optimized) | Manual (You write `torch.matmul`) |
| **Gradients** | Manual (You write the derivatives) | N/A (Handled internally) | **Automatic** (Autograd engine) |
| **Data Shape** | Agnostic ($(n_x, m)$ or $(m, n_x)$) | Strict $(m, n_x)$ | Agnostic (Batch-First $(m, n_x)$ by convention) |
| **Best For** | Prototyping, Data Cleaning, Math Theory | Quick Baselines (Logistic Reg, SVM) | **Training Neural Networks** |

---

### 3. Analysis: When to Use Which?

### A. NumPy: "The Whiteboard Simulator"
**Use this when:** You are learning the math (like in this module) or preprocessing data.
* **Why:** NumPy forces you to define the dimensions exactly as you write them in LaTeX equations ($Z = WX + b$). It is lightweight and installed everywhere.
* **Limit:** It cannot run on a Graphics Card (GPU). If you try to vectorize 1 million images in NumPy, the computation becomes very slow and may exhaust your machine's memory.

### B. Scikit-Learn: "The Black Box"
**Use this when:** You need a quick baseline result or are doing non-neural ML (e.g., Random Forest).
* **Why:** It handles the vectorization internally. You don't need to initialize weights or define forward propagation.
* **Limit:** It is too rigid for Deep Learning. You cannot easily change the architecture (e.g., adding a specific "Skip Connection" or changing activation functions layer-by-layer). It also expects the $(m, n_x)$ layout, which conflicts with your current course notation.

### C. PyTorch: "The Heavy Lifter"
**Use this when:** You are training Deep Learning models.
* **Why:** It is essentially "NumPy for GPUs."
    1.  **Speed:** For large matrices, matrix multiplication on a GPU can be on the order of $100\times$ faster than on a CPU; the exact factor depends on the hardware and the matrix sizes.
    2.  **Autograd:** You write the Forward Pass, and PyTorch *automatically* calculates the Backward Pass (gradients). In NumPy, you have to hand-derive the calculus for backprop.
* **Limit:** It has a somewhat steeper learning curve and stricter type checking (e.g., `float32` vs `float64`).
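
The Autograd point can be checked on a small case. The sketch below, assuming logistic regression in the course layout ($X$ of shape $(n_x, m)$) with illustrative sizes and names, compares PyTorch's automatic gradients with the hand-derived ones, $dw = \frac{1}{m} X (A - Y)^T$ and $db = \frac{1}{m} \sum (A - Y)$.

```python
import torch

torch.manual_seed(0)
n_x, m = 5, 8                                         # illustrative sizes
X = torch.randn(n_x, m, dtype=torch.float64)          # one example per column
Y = torch.randint(0, 2, (1, m)).to(torch.float64)     # binary labels
w = torch.randn(n_x, 1, dtype=torch.float64, requires_grad=True)
b = torch.zeros(1, dtype=torch.float64, requires_grad=True)

# Forward pass: sigmoid activation and mean binary cross-entropy
A = torch.sigmoid(w.T @ X + b)                        # shape (1, m)
loss = -(Y * torch.log(A) + (1 - Y) * torch.log(1 - A)).mean()

# Backward pass computed by autograd
loss.backward()

# Hand-derived gradients, as you would write them in NumPy
with torch.no_grad():
    dZ = A - Y
    dw_manual = X @ dZ.T / m                          # shape (n_x, 1)
    db_manual = dZ.mean().reshape(1)                  # shape (1,)

print(torch.allclose(w.grad, dw_manual), torch.allclose(b.grad, db_manual))
```

`float64` is used here only so the comparison is tight; PyTorch defaults to `float32`.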

---

### 4. Performance Benchmark (The "Why")

Imagine vectorizing the matrix product $Z = WX$ for a massive layer.

* **Matrix Sizes:** $W$ is $(1000 \times 1000)$, $X$ is $(1000 \times 10000)$.
* **Operation:** Matrix Multiplication.

1.  **NumPy (CPU):**
    * The CPU processes this with limited parallelism (SIMD and a handful of cores).
    * *Time:* ~0.5 seconds (illustrative; the actual figure depends on the CPU and BLAS library).
2.  **PyTorch (GPU):**
    * The GPU has thousands of cores. It breaks the matrix into small tiles and computes many of them in parallel.
    * *Time:* ~0.005 seconds (illustrative; the actual figure depends on the GPU).

**Conclusion:** For the large matrix products that dominate Deep Learning, PyTorch on a GPU can be **orders of magnitude faster** than NumPy on a CPU.
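
Rather than relying on quoted timings, you can measure the comparison on your own machine. The helper below is a rough sketch, not a rigorous benchmark: `time_matmul` is a hypothetical function, the sizes are scaled down so it runs quickly, and the GPU branch only executes if CUDA is available. No expected timings are shown because they vary by hardware.

```python
import time
import torch

def time_matmul(device, n_h, n_x, m, repeats=3):
    """Return (best wall-clock seconds, output shape) for W @ X on `device`."""
    W = torch.randn(n_h, n_x, device=device)
    X = torch.randn(n_x, m, device=device)
    torch.matmul(W, X)                     # warm-up run, excluded from timing
    best = float("inf")
    for _ in range(repeats):
        if device == "cuda":
            torch.cuda.synchronize()       # GPU kernels launch asynchronously
        start = time.perf_counter()
        Z = torch.matmul(W, X)
        if device == "cuda":
            torch.cuda.synchronize()       # wait until the result is ready
        best = min(best, time.perf_counter() - start)
    return best, tuple(Z.shape)

# Scaled-down sizes; use n_h=1000, n_x=1000, m=10000 to match the text
sizes = dict(n_h=200, n_x=200, m=500)

cpu_time, shape = time_matmul("cpu", **sizes)
print(f"CPU: {cpu_time:.6f} s, output shape {shape}")

if torch.cuda.is_available():
    gpu_time, _ = time_matmul("cuda", **sizes)
    print(f"GPU: {gpu_time:.6f} s")
```

At small sizes the GPU may show little or no advantage, since launch and transfer overhead dominate; the gap opens up as the matrices grow.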

---

### 5. Expert Nuance: The "Ecosystem" Workflow

Expert Data Scientists rarely choose *just* one. We use them in a specific pipeline:

1.  **NumPy:** Used to load raw data, reshape it, and normalize it (Data Preprocessing).
2.  **PyTorch:** The NumPy array is converted to a PyTorch Tensor (`torch.from_numpy()`). The Tensor is moved to the GPU (`.to('cuda')`) for the heavy vectorization and training.
3.  **Scikit-Learn:** Used afterwards to calculate metrics like Confusion Matrices or F1-Scores on the predictions.
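
The three steps above can be sketched end to end. Everything here is a stand-in: the images are simulated, the labels are arbitrary, and the model is an untrained single `torch.nn.Linear` layer used only to produce predictions, so the printed metrics carry no meaning beyond showing the hand-off between libraries.

```python
import numpy as np
import torch
from sklearn.metrics import confusion_matrix, f1_score

rng = np.random.default_rng(0)

# 1. NumPy: simulate raw data, reshape to (m, n_x), normalize to [0, 1]
images = rng.integers(0, 256, size=(10, 8, 8, 3))
X_np = (images.reshape(images.shape[0], -1) / 255.0).astype(np.float32)
y_true = np.array([0, 1] * 5)          # arbitrary placeholder labels

# 2. PyTorch: convert to a tensor and move it to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
X_t = torch.from_numpy(X_np).to(device)

torch.manual_seed(0)
model = torch.nn.Linear(X_t.shape[1], 1).to(device)   # untrained placeholder
with torch.no_grad():
    probs = torch.sigmoid(model(X_t)).squeeze(1)
y_pred = (probs > 0.5).long().cpu().numpy()           # back to NumPy on the CPU

# 3. Scikit-Learn: compute metrics on the predictions
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
f1 = f1_score(y_true, y_pred, zero_division=0)
print(cm)
print(f1)
```

The explicit cast to `float32` matters: `torch.from_numpy` keeps the NumPy dtype, and PyTorch layers default to `float32` weights.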

**Recommendation:** Stick to **NumPy** for this specific week of your course to master the algebra. Switch to **PyTorch** as soon as you start building networks deeper than 2 layers.