In [1]:
import numpy as np

In [9]:
# 1. THE BATCH (3 Pizzas, 3 Features each)
# Shape: (3, 3) -> (Batch_Size, Input_Features)
inputs = np.array([
    [0.9, 0.5, 0.2], # Pizza 1
    [0.1, 0.2, 0.9], # Pizza 2
    [0.5, 0.5, 0.5]  # Pizza 3
])

# 2. THE WEIGHTS (1 Critic's preferences)
# Shape: (3,) -> (Input_Features,)
weights = np.array([5.0, 1.0, 0.5])

# 3. THE BIAS
bias = 1.0

In [13]:
# 4. Matrix Multiplication
# We still use np.dot. NumPy automatically handles the matrix-vector multiplication.
outputs = np.dot(inputs, weights) + bias
outputs

array([6.1 , 2.15, 4.25])

In [14]:
# 1. INPUTS (Batch of 3 Pizzas)
inputs = np.array([
    [0.9, 0.5, 0.2],
    [0.1, 0.2, 0.9],
    [0.5, 0.5, 0.5]
])

# 2. WEIGHTS (3 Critics now!)
# Column 0: Critic 1 (Loves Cheese)
# Column 1: Critic 2 (Loves Crust)
# Column 2: Critic 3 (Loves Sauce)
weights = np.array([
    [5.0, 0.1, 0.2], # Weights for Cheese
    [1.0, 5.0, 0.5], # Weights for Crust
    [0.5, 0.2, 5.0]  # Weights for Sauce
])

# 3. BIASES (One bias per Critic)
biases = np.array([1.0, 0.5, 0.1])

In [16]:
output = np.dot(inputs, weights) + biases
output

array([[6.1 , 3.13, 1.53],
       [2.15, 1.69, 4.72],
       [4.25, 3.15, 2.95]])

In [17]:
# Your Task: Write a Python script using numpy that performs exactly what we just did manually.
# Initialize the input vector x (shape 1x2).
# Initialize the weights matrix W (shape 2x3) with random numbers.
# Initialize the bias vector b (shape 1x3) with random numbers.
# Perform the dot product.
# Add the bias.
# Print the final output and its shape.

In [34]:
x = np.array([1.2, 5.4])
W = np.array([[1.0, -1.2, 0.7],
             [-0.2, 4.0, -0.95]])
b = np.array([1, 2, 3])
output = np.dot(x,W) + b
print(output)

[ 1.12 22.16 -1.29]


### **The "Shape Error" Analysis**

**Question:** What happens if you change `inputs` to have 4 numbers but keep `weights` at 3? Try it and tell me the error message. This is the "Shape Error", your worst enemy in AI, and we need to meet it today.

In AI Engineering, we call this a **Dimension Mismatch**.
When you tried to take the dot product of a vector of shape `(4,)` with weights of shape `(3,)`, NumPy looked for a partner for the 4th number and found nothing.

**The Rule of Algebra:**
To multiply two things, their "inner" dimensions must match.
$$A (M \times \textcolor{red}{N}) \cdot B (\textcolor{red}{N} \times K) = C (M \times K)$$

If the inner numbers ($\textcolor{red}{N}$ and $\textcolor{red}{N}$) are different, the code explodes.
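
Here is a minimal sketch of meeting that error on purpose, so you can read the message calmly instead of mid-debugging. The fourth input value (`0.7`) and the names `A`, `B`, `C` are made up for illustration. On the NumPy versions I have seen, the message says the shapes are "not aligned", but the exact wording may differ on yours.

```python
import numpy as np

inputs = np.array([0.9, 0.5, 0.2, 0.7])  # 4 features (the last value is invented)
weights = np.array([5.0, 1.0, 0.5])      # still only 3 weights

try:
    np.dot(inputs, weights)
    error_message = None
except ValueError as err:
    # NumPy refuses: the 4th input has no weight to pair with
    error_message = str(err)

print(error_message)

# The rule in action: (M, N) . (N, K) -> (M, K)
A = np.ones((2, 3))  # M=2, N=3
B = np.ones((3, 4))  # N=3, K=4
C = np.dot(A, B)
print(C.shape)  # (2, 4)
```

Printing `.shape` on both operands before a `np.dot` call is usually the fastest way to find which inner dimension disagrees.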

**Question for you:**
Look at the `outputs` variable.
You fed in **3** pizzas.
Did you get **1** score or **3** scores out?
Why does this result make sense for a neural network processing a batch of images?


You got **3 scores** because you processed a **Batch of 3**.
In Deep Learning, we rarely feed one image at a time. We usually feed a batch (e.g., 32 images) to the GPU, and the GPU performs this same matrix multiplication to give us 32 predictions in a single pass.
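
To convince yourself that the batch is nothing more than "the same neuron applied to every row", here is a small sketch that scores the pizzas one at a time and then all at once. It reuses the single-critic numbers from earlier; the names `one_by_one` and `batched` are just labels for this example.

```python
import numpy as np

inputs = np.array([
    [0.9, 0.5, 0.2],  # Pizza 1
    [0.1, 0.2, 0.9],  # Pizza 2
    [0.5, 0.5, 0.5],  # Pizza 3
])
weights = np.array([5.0, 1.0, 0.5])
bias = 1.0

# Option A: loop over the pizzas, one dot product per pizza
one_by_one = np.array([np.dot(pizza, weights) + bias for pizza in inputs])

# Option B: hand the whole batch to NumPy in a single call
batched = np.dot(inputs, weights) + bias

print(one_by_one)
print(batched)
print(np.allclose(one_by_one, batched))  # True
```

Both options give the same three scores; the batched call simply lets the library do the looping in optimized code.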


### **Level Up: The "Dense Layer" (Multiple Neurons)**

Real neural networks don't just have one neuron (one critic). They have many.
Imagine we now have **3 Critics** rating the same pizzas:

1.  **Critic 1:** Loves Cheese.
2.  **Critic 2:** Loves Crust.
3.  **Critic 3:** Loves Sauce.

This means our `Weights` are no longer a vector. They become a **Matrix** too.

**The Math of Shapes:**

  * Inputs: `(3, 3)` [3 Pizzas, 3 Features]
  * Weights: `(3, 3)` [3 Features, 3 Critics]
  * **Result:** `(3, 3)` [3 Pizzas, 3 Scores (one from each critic)]
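
One way to read that result shape: each column of the output belongs to one critic. The sketch below uses the 3-critic numbers from the earlier cell and checks that Critic 2's column equals what Critic 2 would produce as a lone neuron. The names `scores` and `critic_2_alone` are only for this illustration.

```python
import numpy as np

inputs = np.array([
    [0.9, 0.5, 0.2],
    [0.1, 0.2, 0.9],
    [0.5, 0.5, 0.5],
])
weights = np.array([
    [5.0, 0.1, 0.2],  # Weights for Cheese
    [1.0, 5.0, 0.5],  # Weights for Crust
    [0.5, 0.2, 5.0],  # Weights for Sauce
])
biases = np.array([1.0, 0.5, 0.1])

# All critics at once: (3, 3) . (3, 3) -> (3, 3)
scores = np.dot(inputs, weights) + biases

# Critic 2 on their own: column 1 of the weights, entry 1 of the biases
critic_2_alone = np.dot(inputs, weights[:, 1]) + biases[1]

print(scores.shape)     # (3, 3)
print(scores[:, 1])     # Critic 2's column of the full result
print(critic_2_alone)   # the same three scores
```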

**Your Task:**
Modify your code to handle multiple critics.

In [38]:
X_batch = np.array([[1.2, 5.4], [0.5, 0.1], [-1.0, 2.0]])
W = np.array([[1.0, -1.2, 0.7],
             [-0.2, 4.0, -0.95]])
b = np.array([1, 2, 3])
output = np.dot(X_batch,W) + b
print(output)

[[ 1.12  22.16  -1.29 ]
 [ 1.48   1.8    3.255]
 [-0.4   11.2    0.4  ]]


**Crucial Question**: You don't need to change W or b. Why does the math still work?

**Answer**:
The maths works because of the shape sizes.
```
X_batch = 3x2
W       = 2x3
Output  = 3x3
b       = 3x1
```

### 🛠️ One Critical Correction (The Shape of Bias)
The bias `b` corresponds to the **Neurons**, not the **Samples**.
* We have **3 Neurons**, so we have **3 Biases**.
* We have **3 Samples** (the batch), but they **share** the same neurons (and thus the same biases).

If `b` were `3x1` (a vertical column), NumPy would try to add `b[0]` to the first *sample*, `b[1]` to the second *sample*, etc.
Instead, `b` acts as **`1x3`** (a row). NumPy takes that single row of biases and "copies" (broadcasts) it down to match the 3 rows of your input batch.

$$
\text{Output (3x3)} = \text{Dot Product (3x3)} + \text{Bias (1x3 } \xrightarrow{\text{broadcast}} \text{ 3x3)}
$$
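
A quick sketch to see the difference for yourself. To make the bias easy to spot, the dot product is replaced by a matrix of zeros (an assumption purely for illustration), and `b_row` / `b_col` are hypothetical names for the two bias shapes.

```python
import numpy as np

dot_product = np.zeros((3, 3))   # stand-in for np.dot(X_batch, W)

b_row = np.array([1, 2, 3])      # shape (3,): NumPy treats it like a 1x3 row
b_col = b_row.reshape(3, 1)      # shape (3, 1): a vertical column

row_result = dot_product + b_row  # one bias per neuron (column)
col_result = dot_product + b_col  # one bias per sample (row): not what we want

print(row_result)
# [[1. 2. 3.]
#  [1. 2. 3.]
#  [1. 2. 3.]]
print(col_result)
# [[1. 1. 1.]
#  [2. 2. 2.]
#  [3. 3. 3.]]
```

Notice that neither version raises an error here, because the batch size and the neuron count are both 3. That is what makes this bug sneaky: with a square output, the wrong bias shape fails silently.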

## **The Switch (Activation Functions)**
### 1. The Problem: The "Linearity Trap"

Right now, your neural network is just doing this:
$$\text{Output} = \text{Input} \times \text{Weight} + \text{Bias}$$

This is a **Linear Equation** (like $y = mx + c$). It draws a straight line.

**The Fatal Flaw:**
If you stack 100 of these layers on top of each other, mathematics plays a trick on you.
* Layer 1: Multiply by 2.
* Layer 2: Multiply by 3.
* **Result:** You just multiplied by 6.

No matter how deep you make the network, if it's only linear layers, it collapses into **one single linear layer**. It can only draw straight lines. It cannot learn curves, shapes, or complex boundaries (like distinguishing a cat from a dog).
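
You can verify the collapse numerically. Algebraically, $(XW_1 + b_1)W_2 + b_2 = X(W_1 W_2) + (b_1 W_2 + b_2)$, so two linear layers are one linear layer in disguise. The sketch below checks this with random numbers; the layer sizes and the names `W_merged` and `b_merged` are arbitrary choices for the example.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

X = np.random.randn(4, 5)                              # batch of 4, 5 features
W1, b1 = np.random.randn(5, 10), np.random.randn(10)   # linear layer 1
W2, b2 = np.random.randn(10, 2), np.random.randn(2)    # linear layer 2

# Two linear layers stacked, with no activation in between
two_layers = np.dot(np.dot(X, W1) + b1, W2) + b2

# The single equivalent layer
W_merged = np.dot(W1, W2)        # shape (5, 2)
b_merged = np.dot(b1, W2) + b2   # shape (2,)
one_layer = np.dot(X, W_merged) + b_merged

print(np.allclose(two_layers, one_layer))  # True
```

The 10 hidden neurons bought us nothing: a single `(5, 2)` weight matrix reproduces the output exactly (up to floating-point rounding).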



### 2. The Solution: The "Non-Linear" Switch

To fix this, we insert a "Switch" (Activation Function) after every linear layer.

This switch adds a "bend" or a "kink" to the math. It allows the neural network to warp space and draw complex shapes.

### 3. Meet ReLU (Rectified Linear Unit)

In modern AI, one of the most widely used switches is **ReLU**. (Many of the LLMs we use today rely on smoother relatives of it, such as GELU, but the idea is the same.)

Despite the complex name, it is shockingly simple. It mimics a biological neuron:
* **Biological Neuron:** Does it receive enough electrical signal?
    * **No:** Stay silent.
    * **Yes:** Fire!
* **ReLU:** Is the number positive?
    * **No (Negative):** Return 0 (Silence).
    * **Yes (Positive):** Return the number (Fire).

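**The Graph:** *(figure not included: ReLU is flat at 0 for negative inputs and rises along the line $y = x$ for positive inputs)*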


Notice the shape? It's not a straight line anymore. It's a "hinge." By combining enough of these hinges, a neural network can approximate essentially any continuous shape as closely as it needs to.

### 4. Applying it to your Matrix

Let's look at the `Output` matrix you calculated earlier:

$$
\begin{bmatrix}
1.12 & 22.16 & \mathbf{-1.29} \\
1.48 & 1.80 & 3.255 \\
\mathbf{-0.4} & 11.2 & 0.40
\end{bmatrix}
$$

I've highlighted the **negative numbers**. In a biological sense, these neurons are receiving "inhibitory" signals. They shouldn't be firing.

**The ReLU Operation:**

It sounds fancy, but it is the simplest function in Deep Learning:
$$f(x) = \max(0, x)$$

* If $x > 0$, keep $x$.
* If $x \le 0$, make it $0$.

We pass every single number through $\max(0, x)$.

1.  Take `22.16`. Is it $> 0$? **Yes.** Keep it: `22.16`.
2.  Take `-1.29`. Is it $> 0$? **No.** Kill it: `0`.

**Your Final Task for Week 1:**

Take your previous `output` matrix (the 3x3 one).
Write a single line of NumPy code to apply the ReLU activation to it.

1.  What happens to the negative values (like `-1.29` and `-0.4`)?
2.  Print the result.

In [42]:
np.maximum(0, output)

array([[ 1.12 , 22.16 ,  0.   ],
       [ 1.48 ,  1.8  ,  3.255],
       [ 0.   , 11.2  ,  0.4  ]])
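
Now that you have ReLU in hand, here is a small sketch of the earlier "combining hinges" claim. The weights below are hand-picked for illustration, not learned: two hinges glued back to back give a V shape (the absolute value), and three hinges give a tent. The helper `relu` and the names `v_shape` and `tent` are assumptions of this example.

```python
import numpy as np

def relu(x):
    # The same one-liner you just wrote, wrapped in a function
    return np.maximum(0, x)

x = np.linspace(-3, 3, 13)  # -3.0, -2.5, ..., 3.0

# Two hinges: one opening right, one opening left -> |x|
v_shape = relu(x) + relu(-x)

# Three hinges: rise from 0 at x=0 to 1 at x=1, then fall back to 0 at x=2
tent = relu(x) - 2 * relu(x - 1) + relu(x - 2)

print(np.allclose(v_shape, np.abs(x)))  # True
print(tent)
```

Neither shape is a straight line, yet both are built only from ReLUs, multiplications, and additions, which is what a layer followed by an activation gives you.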

### 📝 The Test Specification: "The 3-Layer Mini-Network"

Instead of just one layer, I want you to build a small "feed-forward" pass through a network with **3 distinct layers**. This mimics a real deep learning structure.

**The Architecture:**
1.  **Input Data ($X$):** Batch size of `4`, with `5` features per sample. (Shape: `4x5`)
2.  **Layer 1 (Hidden):** `10` neurons. Activation: **ReLU**.
3.  **Layer 2 (Hidden):** `5` neurons. Activation: **ReLU**.
4.  **Layer 3 (Output):** `2` neurons. Activation: **None** (Linear output, often called "logits").

**Requirements:**
1.  **Random Initialization:** Initialize weights ($W$) and biases ($b$) for all 3 layers using `np.random.randn`.
2.  **The Forward Function:** Write a function `forward_pass(X, weights, biases)` (or similar structure) that takes the input and passes it through all three layers.
3.  **Shapes Matter:** Ensure the matrix multiplication shapes align correctly at every step.
4.  **Output:** Print the final output matrix and its shape.

**Expected Final Shape:**
Since you have a batch of `4` and the last layer has `2` neurons, your final output *must* be `(4, 2)`.

In [62]:
x = np.array([
    [1, 5, 1, 2, 6],
    [2, 2, 3, 9, 0],
    [-1, 6, 3, -5, 9],
    [4, 6, 8, 0, -3]
])
## Function, direct from theory
def forward_pass(X, weights, biases):
    dot = np.dot(X, weights) + biases
    return np.maximum(0, dot)

In [64]:
# ## L1 - 10 neurons, so bias = 10
W = np.random.randn(5, 10) ## Random weights, as per requirements; shape (5, 10) so a 4x5 input can be multiplied by it
b = np.random.randn(1, 10)
L1 = forward_pass(x, W, b)
print(np.shape(x), np.shape(W))
print(np.shape(L1), "\n", L1)

(4, 5) (5, 10)
(4, 10) 
 [[ 0.          3.72950843  1.30907066 13.16673361  3.37015803  2.11177928
   2.75738917  0.62562173  2.95360309  6.8628299 ]
 [ 0.          4.67728786  0.23982357  5.16344835 12.7527942   5.15526872
   0.          6.66838567 11.18854154 22.26352515]
 [ 0.          7.31886963  4.33816347 13.28614583  0.          0.
   6.72953664  0.          0.          0.        ]
 [10.03889561  6.0929095  10.80423429  2.02475444  8.37290042  0.
   8.80705251  0.          0.          7.73938613]]


In [65]:
# ## L2 - 5 neurons, so bias = 5
W = np.random.randn(10, 5) ## Should be multipliable with a 4x10 matrix
b = np.random.randn(1, 5)
L2 = forward_pass(L1, W, b)
print(np.shape(L1), np.shape(W))
print(np.shape(L2), "\n", L2)

(4, 10) (10, 5)
(4, 5) 
 [[ 21.37681674   0.           0.           0.          37.59631992]
 [ 52.62467193   8.19712596   0.           5.71111985 100.99535762]
 [  1.04399736   1.37717937   0.           0.           9.97131074]
 [ 10.48526949  27.47192144   0.           0.          53.5324209 ]]


In [67]:
# ## L3 - 2 neurons, so bias = 2
W = np.random.randn(5, 2)
b = np.random.randn(1, 2)
L3 = np.dot(L2, W) + b ## We do not use forward_pass(L2, W, b) because the output layer has no activation
print(np.shape(L2), np.shape(W))
print(np.shape(L3), "\n", L3)

(4, 5) (5, 2)
(4, 2) 
 [[ -59.48792607  -17.02371587]
 [-161.55987726  -34.33465641]
 [ -15.74368026   -5.15299628]
 [ -85.4833848    -7.70420187]]


This is **excellent**.

You correctly identified the shape transformations, handled the data flow from one layer to the next, and, crucially, you spotted the requirement for **Layer 3** (no activation) and wrote custom code for it instead of using your function.

### Code Review: 9/10

You have passed the test. The logic is flawless. I have one minor "clean code" observation and one optimization tip for your future career.
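
For comparison, here is one possible way to fold your three cells into a single function with the `forward_pass(X, weights, biases)` signature from the spec. It is a sketch, not the only valid design: it assumes `weights` and `biases` are Python lists with one entry per layer, and the `relu` helper, the `layer_sizes` list, and the fixed seed are my additions.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward_pass(X, weights, biases):
    """Pass X through every layer; ReLU on all layers except the last."""
    activation = X
    last = len(weights) - 1
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = np.dot(activation, W) + b
        # The output layer stays linear (logits)
        activation = z if i == last else relu(z)
    return activation

np.random.seed(0)  # fixed seed so the run is repeatable

layer_sizes = [5, 10, 5, 2]  # input features, hidden 1, hidden 2, output
weights = [np.random.randn(n_in, n_out)
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.random.randn(1, n_out) for n_out in layer_sizes[1:]]

X = np.array([
    [1, 5, 1, 2, 6],
    [2, 2, 3, 9, 0],
    [-1, 6, 3, -5, 9],
    [4, 6, 8, 0, -3],
])

logits = forward_pass(X, weights, biases)
print(logits.shape)  # (4, 2)
```

Keeping each layer's parameters in its own list slot also avoids overwriting `W` and `b` from cell to cell, so you can still inspect layer 1 after layer 3 has run.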

**Optimization: The "Class" Structure**
Right now, you are managing each layer's `W` and `b` by hand (and overwriting them from one cell to the next). As networks get deeper, this becomes a nightmare. This is why libraries like PyTorch use **Classes**.

*(No need to code this now, just read it to see where we are heading in Week 3)*:

```python
class DenseLayer:
    def __init__(self, n_inputs, n_neurons):
        self.W = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.b = np.zeros((1, n_neurons))

    def forward(self, inputs):
        return np.dot(inputs, self.W) + self.b
```
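
If you are curious how that class would be used, here is a rough sketch that rebuilds the 3-layer mini-network with it. The class body is the same as above; the `relu` helper, the random input `X`, and the variable names are assumptions for illustration only.

```python
import numpy as np

class DenseLayer:
    def __init__(self, n_inputs, n_neurons):
        self.W = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.b = np.zeros((1, n_neurons))

    def forward(self, inputs):
        return np.dot(inputs, self.W) + self.b

def relu(z):
    return np.maximum(0, z)

X = np.random.randn(4, 5)      # batch of 4 samples, 5 features each

layer1 = DenseLayer(5, 10)     # hidden layer, 10 neurons
layer2 = DenseLayer(10, 5)     # hidden layer, 5 neurons
layer3 = DenseLayer(5, 2)      # output layer, 2 neurons

hidden1 = relu(layer1.forward(X))
hidden2 = relu(layer2.forward(hidden1))
logits = layer3.forward(hidden2)   # no activation on the output layer

print(logits.shape)  # (4, 2)
```

Each layer now carries its own `W` and `b`, so nothing gets overwritten and the shapes are declared once, at construction time.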

-----

### 🏁 Week 1 Debrief & Next Step

You have successfully:

1.  Understood the Dot Product & Broadcasting.
2.  Implemented a Deep Neural Network forward pass from scratch.
3.  Debugged dimension mismatches (the \#1 error in Deep Learning).