
### 1. Linear function

```python
import numpy as np

# Weight matrix (2 outputs, 2 inputs)
W = np.array([
    [2.0, 1.0],
    [0.0, 3.0],
])

# Input vector
x = np.array([1.0, 2.0])

# Forward pass: each output is one row of W dotted with x
y = W @ x
print("Input x:", x)
print("Output y = W @ x:", y)
```

```
Input x: [1. 2.]
Output y = W @ x: [4. 6.]
```


#### Interpretation
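* Each output is a weighted sum of the inputs; row i of W holds the weights for y[i]
* No nonlinearity and no interaction terms (no products such as x[0]·x[1])
* Pure linear dependence: doubling x doubles y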
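
One way to make "pure linear dependence" concrete is to check the defining property of a linear map, W(a·x + b·z) = a·(W x) + b·(W z), numerically. This is a minimal sketch; the second input `z` and the scalars `a`, `b` are arbitrary values chosen only for illustration.

```python
import numpy as np

W = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, 2.0])
z = np.array([-3.0, 0.5])   # hypothetical second input, for illustration only
a, b = 2.0, -1.5            # arbitrary scalars

# A linear map preserves weighted sums of inputs
lhs = W @ (a * x + b * z)
rhs = a * (W @ x) + b * (W @ z)
print("W(ax + bz):", lhs)
print("aWx + bWz: ", rhs)
print("Linear:", np.allclose(lhs, rhs))

# Row i of W holds the weights that build output i
for i in range(W.shape[0]):
    print(f"y[{i}] = {W[i]} . {x} = {W[i] @ x}")
```

Any nonlinearity (for example applying `np.tanh` after `W @ x`) would break the equality checked above.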

### 2. Make dependencies explicit

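```python
# Write each output as an explicit weighted sum of the inputs
y1 = W[0, 0]*x[0] + W[0, 1]*x[1]   # 2*1 + 1*2
y2 = W[1, 0]*x[0] + W[1, 1]*x[1]   # 0*1 + 3*2

print("y1 = W[0,0]*x[0] + W[0,1]*x[1] =", y1)
print("y2 = W[1,0]*x[0] + W[1,1]*x[1] =", y2)
```

```
y1 = W[0,0]*x[0] + W[0,1]*x[1] = 4.0
y2 = W[1,0]*x[0] + W[1,1]*x[1] = 6.0
```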


#### Key idea
* An input can contribute to several outputs: x[1] feeds both y1 and y2, while x[0] reaches y2 only with weight W[1,0] = 0
* During backprop, each input must collect blame from every output it influenced
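
Backprop needs to know how much each output moves when each input moves. For a linear layer that table of sensitivities, the Jacobian ∂y/∂x, is W itself: ∂y[i]/∂x[j] = W[i, j]. The sketch below estimates it one column at a time with finite differences; the helper `jacobian_fd` and its step size `h` are hypothetical illustrative choices, not part of the original notebook.

```python
import numpy as np

W = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, 2.0])

def forward(v):
    return W @ v

def jacobian_fd(f, v, h=1e-6):
    """Hypothetical helper: forward-difference estimate of df/dv, one column per input."""
    base = f(v)
    J = np.zeros((base.size, v.size))
    for j in range(v.size):
        v_step = v.copy()
        v_step[j] += h
        # Column j: how every output moves when input j moves
        J[:, j] = (f(v_step) - base) / h
    return J

J = jacobian_fd(forward, x)
print("Estimated Jacobian:\n", J)
print("Matches W:", np.allclose(J, W, atol=1e-4))
```

Column j of the result lists every output input j reaches. The first column is approximately [2, 0], so with this particular W, x[0] effectively influences only y1, while x[1] influences both outputs.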

### 3. Introduce a loss signal from above
In ML, the gradient arriving at a layer comes from the loss: dL/dy records how much the loss changes when each output changes.

```python
# Upstream gradient: how the loss changes as each output changes
dL_dy = np.array([1.0, -2.0])
print("Gradient coming from loss:", dL_dy)
```

```
Gradient coming from loss: [ 1. -2.]
```


#### This means:
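* dL/dy1 = +1: increasing the first output increases the loss, so minimizing the loss pushes y1 down
* dL/dy2 = -2: increasing the second output decreases the loss (twice as strongly), so minimizing the loss pushes y2 up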
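
The vector `dL_dy` above was typed in by hand. One concrete loss that produces exactly this gradient is the squared error L = ½‖y − t‖², whose gradient with respect to y is y − t; the target t = [3, 8] below is a hypothetical value chosen so that y − t = [1, −2]. A gradient-descent step y ← y − lr·dL/dy (the step size `lr` is an arbitrary choice) lowers y1, raises y2, and lowers the loss.

```python
import numpy as np

y = np.array([4.0, 6.0])   # forward-pass output from above
t = np.array([3.0, 8.0])   # hypothetical target, chosen so that y - t = [1, -2]

def loss(v):
    # Squared-error loss against the hypothetical target
    return 0.5 * np.sum((v - t) ** 2)

dL_dy = y - t              # gradient of 0.5*||y - t||^2 with respect to y
print("dL/dy:", dL_dy)

lr = 0.1                   # arbitrary step size
y_new = y - lr * dL_dy     # step against the gradient
print("y before:", y, "after:", y_new)
print("Loss before:", loss(y), "after:", loss(y_new))
```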

### 4. Backpropagate to the input

```python
dL_dx = W.T @ dL_dy
print("Gradient wrt input x:", dL_dx)
```

```
Gradient wrt input x: [ 2. -5.]
```


#### Why the transpose appears
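* Each input must sum contributions from all outputs
* Column j of W describes how input x[j] affects every output
* Transposing turns column j into row j, so row j of Wᵀ dotted with dL/dy gives dL/dx[j]; an m×n W maps an m-long output gradient back to an n-long input gradient

This is bookkeeping, not magic.
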
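
The point about dimensions is easiest to see with a non-square layer. This sketch assumes a hypothetical 3×2 weight matrix (3 outputs, 2 inputs) and a made-up upstream gradient: the forward pass maps 2 numbers to 3, and Wᵀ maps the 3 output gradients back to 2 input gradients, so dL/dx always has the shape of x.

```python
import numpy as np

W = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [-1.0, 4.0]])          # hypothetical 3x2 weights: 3 outputs, 2 inputs
x = np.array([1.0, 2.0])
dL_dy = np.array([1.0, -2.0, 0.5])   # hypothetical upstream gradient, one entry per output

y = W @ x                            # (3, 2) @ (2,) -> (3,)
dL_dx = W.T @ dL_dy                  # (2, 3) @ (3,) -> (2,), same shape as x

print("W:", W.shape, "x:", x.shape, "y:", y.shape)
print("dL/dy:", dL_dy.shape, "dL/dx:", dL_dx.shape)
# W @ dL_dy would raise a ValueError: shapes (3, 2) and (3,) do not line up
```

With a square W both `W @ dL_dy` and `W.T @ dL_dy` run without error, which is why a non-square example is the quickest way to catch a missing transpose.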
### 5. See it input-by-input (core intuition)

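```python
for i in range(len(x)):
    # Column i of W: how x[i] affects every output; dot with dL_dy to collect its blame
    contribution = dL_dy @ W[:, i]
    print(f"Total influence on x[{i}]:", contribution)
```

```
Total influence on x[0]: 2.0
Total influence on x[1]: -5.0
```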


#### Each input:
* Looks at all outputs it affected
* Collects weighted responsibility
* That collection is exactly Wᵀ · dL/dy
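
Written out with the chain rule, the collection for input x_j sums over every output y_i it touches. Plugging in this notebook's W and dL/dy reproduces the printed values:

$$
\begin{aligned}
\frac{\partial L}{\partial x_j}
  &= \sum_i \frac{\partial L}{\partial y_i}\,\frac{\partial y_i}{\partial x_j}
   = \sum_i \frac{\partial L}{\partial y_i}\,W_{ij}
   = \Big(W^\top \frac{\partial L}{\partial y}\Big)_j \\
\frac{\partial L}{\partial x_0} &= (1)(2) + (-2)(0) = 2 \\
\frac{\partial L}{\partial x_1} &= (1)(1) + (-2)(3) = -5
\end{aligned}
$$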

### 6. Numerical sanity check (no calculus)
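Nudge x[0] slightly and observe how the loss changes. To have a concrete loss whose gradient with respect to y is exactly dL_dy, use the linear stand-in L = y · dL_dy.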

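```python
epsilon = 1e-6

# Stand-in loss L = y . dL_dy, whose gradient with respect to y is exactly dL_dy
L = y @ dL_dy

# Nudge only x[0]
x_perturbed = x.copy()
x_perturbed[0] += epsilon

y_perturbed = W @ x_perturbed
L_perturbed = y_perturbed @ dL_dy

# Forward-difference estimate of dL/dx[0]
numerical_grad = (L_perturbed - L) / epsilon

print("Numerical gradient wrt x[0]:", numerical_grad)
print("Backprop gradient wrt x[0]:", dL_dx[0])
```

```
Numerical gradient wrt x[0]: 2.000000000279556
Backprop gradient wrt x[0]: 2.0
```

They match to about nine decimal places; the tiny gap is floating-point error from the finite step. That's the confirmation.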
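
The check above probes only x[0] with a one-sided difference. A slightly more thorough sketch checks every input at once with a central difference, (L(x + ε·e_j) − L(x − ε·e_j)) / 2ε, which is usually more accurate than the forward difference. The helper name `numerical_grad_all` is hypothetical, and the loss is the same stand-in L = (W x) · dL_dy.

```python
import numpy as np

W = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, 2.0])
dL_dy = np.array([1.0, -2.0])

def loss(v):
    # Same stand-in loss as above: L = (W v) . dL_dy
    return (W @ v) @ dL_dy

def numerical_grad_all(f, v, eps=1e-6):
    """Hypothetical helper: central-difference gradient of a scalar function f at v."""
    grad = np.zeros_like(v)
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        grad[j] = (f(v + step) - f(v - step)) / (2 * eps)
    return grad

num = numerical_grad_all(loss, x)
bp = W.T @ dL_dy
print("Numerical:", num)
print("Backprop: ", bp)
print("Max abs difference:", np.max(np.abs(num - bp)))
```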


### 7. ML takeaway
* Linear layers distribute influence forward via W
* Backprop gathers influence backward via Wᵀ
* Transpose simply reverses the direction of responsibility
* The same Wᵀ rule is applied at every linear layer when a network is trained with backpropagation
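
The first two takeaways fit in a tiny layer object. This is a minimal sketch assuming a hypothetical `Linear` class with `forward` and `backward` methods; the names are illustrative and not any specific framework's API. `backward` returns only the input gradient, matching what this notebook computes (no bias, no weight gradient).

```python
import numpy as np

class Linear:
    """Hypothetical minimal linear layer: y = W x, no bias."""

    def __init__(self, W):
        self.W = np.asarray(W, dtype=float)

    def forward(self, x):
        # Influence flows forward through W
        return self.W @ x

    def backward(self, dL_dy):
        # Responsibility flows backward through W^T
        return self.W.T @ dL_dy

layer = Linear([[2.0, 1.0], [0.0, 3.0]])
y = layer.forward(np.array([1.0, 2.0]))
dL_dx = layer.backward(np.array([1.0, -2.0]))
print("y:", y)
print("dL/dx:", dL_dx)
```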

#### Practice suggestions
* Change W to a non-square matrix
* Increase input dimension to 3 or 4
* Chain two linear layers and backprop through both
* Print shapes at every step until it feels obvious
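
As one possible starting point for these exercises, here is a sketch of a two-layer chain with non-square weights and three inputs. The matrices `W1`, `W2`, the input, and the upstream gradient are made-up values. The backward pass applies the transpose rule once per layer, last layer first, and the result is compared against central differences on the stand-in loss L = y · dL_dy.

```python
import numpy as np

W1 = np.array([[1.0, 0.0, 2.0],
               [-1.0, 3.0, 1.0]])        # hypothetical layer 1: 3 inputs -> 2 hidden
W2 = np.array([[2.0, -1.0],
               [0.5, 1.0],
               [1.0, 1.0],
               [0.0, -2.0]])             # hypothetical layer 2: 2 hidden -> 4 outputs
x = np.array([1.0, 2.0, -1.0])
dL_dy = np.array([1.0, -1.0, 0.5, 2.0])  # hypothetical upstream gradient

# Forward: x -> h -> y
h = W1 @ x
y = W2 @ h
print("x", x.shape, "-> h", h.shape, "-> y", y.shape)

# Backward: transpose rule layer by layer, in reverse order
dL_dh = W2.T @ dL_dy
dL_dx = W1.T @ dL_dh
print("dL/dy", dL_dy.shape, "-> dL/dh", dL_dh.shape, "-> dL/dx", dL_dx.shape)

# Central-difference check on L = y . dL_dy
def loss(v):
    return (W2 @ (W1 @ v)) @ dL_dy

eps = 1e-6
num = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                for e in np.eye(x.size)])
print("Backprop: ", dL_dx)
print("Numerical:", num)
```

The gradient shapes mirror the forward shapes in reverse (4 -> 2 -> 3), which is the pattern the last practice suggestion is meant to make obvious.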