# CSE473s Project

## Part 1 (XOR)
Implement our custom library, test it on the [XOR] problem, validate it, then implement the exact same problem in [TensorFlow] or [Keras] and compare the results.

### Section-1 (Gradient Checking)

#### Data

In [40]:
import numpy as np

X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=np.float32)
y_true = np.array([[0],[1],[1],[0]], dtype=np.float32)

#### Build network with the custom library

In [41]:
import sys, os
sys.path.insert(0, os.path.abspath('..'))
from lib import Sequential, Dense, Tanh, Sigmoid, MSELoss, SGDOptimizer as SGD

model = Sequential([
    Dense(2, 4, 1.0), Tanh(),
    Dense(4, 1, 1.0), Sigmoid()
])

opt = SGD(learning_rate=1.0)
loss_fn = MSELoss()
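
For reference, the computation this 2-4-1 architecture performs can be sketched in plain NumPy. This is only an illustration, not the library's implementation: the parameter shapes are taken from the shapes printed later in this notebook, while the `uniform(-1, 1)` initialisation, the zero biases, and the helper names `sigmoid` and `forward` are assumptions. The third argument passed to `Dense` is not explained in the source and is not modelled here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Shapes follow the notebook's printed output: W0 (2, 4), b0 (1, 4), W1 (4, 1), b1 (1, 1).
# The initial values are an assumption, not the library's scheme.
W0, b0 = rng.uniform(-1, 1, (2, 4)), np.zeros((1, 4))
W1, b1 = rng.uniform(-1, 1, (4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    hidden = np.tanh(X @ W0 + b0)      # Dense(2, 4) followed by Tanh
    return sigmoid(hidden @ W1 + b1)   # Dense(4, 1) followed by Sigmoid

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_hat = forward(X)
print(y_hat.shape)  # (4, 1)
```

The sigmoid on the output keeps every prediction strictly between 0 and 1, which is what makes rounding at 0.5 a sensible decision rule for XOR.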

#### Training loop

In [69]:
n_epochs = 10000

model.fit(X, y_true, loss_fn, opt, epochs=n_epochs)

Epoch 0, Loss: 0.12433801468652511
Epoch 100, Loss: 0.010525351475778776
Epoch 200, Loss: 0.004504717570086526
Epoch 300, Loss: 0.002771605387911751
Epoch 400, Loss: 0.0019767820050343235
Epoch 500, Loss: 0.0015266936821711889
Epoch 600, Loss: 0.0012390314484556457
Epoch 700, Loss: 0.0010401382532087796
Epoch 800, Loss: 0.0008948132132660921
Epoch 900, Loss: 0.0007841943890321982
Epoch 1000, Loss: 0.0006972967921973738
Epoch 1100, Loss: 0.000627303476862028
Epoch 1200, Loss: 0.0005697668834102934
Epoch 1300, Loss: 0.0005216653400751481
Epoch 1400, Loss: 0.0004808760505330671
Epoch 1500, Loss: 0.0004458651615112134
Epoch 1600, Loss: 0.00041549748069386245
Epoch 1700, Loss: 0.00038891531028111817
Epoch 1800, Loss: 0.0003654588322447541
Epoch 1900, Loss: 0.00034461234984691516
Epoch 2000, Loss: 0.000325967109300304
Epoch 2100, Loss: 0.00030919503803854915
Epoch 2200, Loss: 0.0002940298410829069
Epoch 2300, Loss: 0.00028025316186416893
Epoch 2400, Loss: 0.00026768429500523597
Epoch 2500, L

#### Assert correctness

In [39]:
y_pred = model.forward(X)

print("Raw preds:\n", y_pred)
print("Rounded:\n", np.round(y_pred))

Raw preds:
 [[0.49855366]
 [0.53412775]
 [0.55094336]
 [0.61886783]]
Rounded:
 [[0.]
 [1.]
 [1.]
 [1.]]
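
The rounded output above does not reproduce the XOR targets (the last row rounds to 1), which is consistent with this cell's execution count (39) preceding that of the training cell (69). A minimal sketch of a programmatic check, using the prediction values printed in this notebook, might look like the following. The helper name `solves_xor` and the explicit 0.5 threshold are assumptions introduced for illustration.

```python
import numpy as np

y_true = np.array([[0], [1], [1], [0]], dtype=np.float32)

def solves_xor(y_pred, y_true):
    """Return True when every prediction, thresholded at 0.5, equals its target."""
    return bool(np.array_equal((y_pred >= 0.5).astype(y_true.dtype), y_true))

# Predictions copied from the output of the cell above (execution count 39).
preds_cell_39 = np.array([[0.49855366], [0.53412775], [0.55094336], [0.61886783]])
# Predictions copied from the later cell with execution count 74.
preds_cell_74 = np.array([[0.00338321], [0.99211866], [0.99116568], [0.00919011]])

print(solves_xor(preds_cell_39, y_true))  # False
print(solves_xor(preds_cell_74, y_true))  # True
```

Re-running the prediction cell after `model.fit` should give output that passes this check.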


#### Gradient-check block

In [None]:
def grad_check(model: Sequential, X, y_true, eps=1e-5):
    """Return the max abs difference between the analytic and numeric dW of the first layer."""
    loss_fn = MSELoss()
    layer = model.layers[0]

    # analytic: one forward/backward pass fills layer.dW without updating the weights
    loss_fn(y_true, model.forward(X))
    model.backward(loss_fn.backward())
    analytic = layer.dW.copy()

    # numeric: central differences, perturbing one weight at a time
    numeric = np.zeros_like(analytic)
    for idx in np.ndindex(analytic.shape):
        layer.W[idx] += eps
        plus = loss_fn(y_true, model.forward(X))
        layer.W[idx] -= 2*eps
        minus = loss_fn(y_true, model.forward(X))
        layer.W[idx] += eps  # restore the original weight

        numeric[idx] = (plus - minus)/(2*eps)

    return np.abs(analytic - numeric).max()

print('Max diff:', grad_check(model, X, y_true))  # <1e-7 → pass

Max diff: 1.831335037501096e-14
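
The same central-difference check can be reproduced without the custom library. The sketch below is a standalone NumPy illustration under stated assumptions: the network is a hand-written 2-4-1 tanh/sigmoid model with MSE averaged over all entries, the weights are random rather than trained, and the names `loss_and_grad` and `numeric_grad` are hypothetical. It reports a relative error instead of a max absolute difference, which stays meaningful even when the gradients themselves are very small, as they are near convergence.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(W0, b0, W1, b1, X, y):
    """MSE loss of a tanh/sigmoid 2-4-1 network and its gradient w.r.t. W0."""
    h = np.tanh(X @ W0 + b0)
    p = sigmoid(h @ W1 + b1)
    loss = np.mean((p - y) ** 2)
    dp = 2.0 * (p - y) / p.size   # dL/dp for the mean squared error
    dz1 = dp * p * (1.0 - p)      # back through the sigmoid
    dh = dz1 @ W1.T               # back through the second dense layer
    dz0 = dh * (1.0 - h ** 2)     # back through tanh
    return loss, X.T @ dz0        # gradient of the first weight matrix

def numeric_grad(f, W, eps=1e-5):
    """Central-difference gradient of the scalar f() w.r.t. W; W is restored afterwards."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        orig = W[idx]
        W[idx] = orig + eps
        plus = f()
        W[idx] = orig - eps
        minus = f()
        W[idx] = orig
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

# float64 throughout: float32 round-off would swamp a step of eps = 1e-5
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
y = np.array([[0], [1], [1], [0]], dtype=np.float64)
W0, b0 = rng.normal(size=(2, 4)), rng.normal(size=(1, 4))
W1, b1 = rng.normal(size=(4, 1)), rng.normal(size=(1, 1))

_, analytic = loss_and_grad(W0, b0, W1, b1, X, y)
numeric = numeric_grad(lambda: loss_and_grad(W0, b0, W1, b1, X, y)[0], W0)

rel_err = np.linalg.norm(analytic - numeric) / (
    np.linalg.norm(analytic) + np.linalg.norm(numeric)
)
print("relative error:", rel_err)
```

A relative error well below 1e-6 is a common rule of thumb for a passing check with a central difference in double precision.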


In [74]:
# Forward
y_pred = model.forward(X)
print("Initial predictions:\n", y_pred)
loss = loss_fn(y_true, y_pred)
print("Initial loss:", loss)

# Backprop once
grad = loss_fn.backward()
model.backward(grad)

# Show gradients stored in Dense layers
for i, layer in enumerate(model.layers):
    if layer.trainable:
        print(f"\nLayer {i} params shapes:", [p.shape for p in [layer.W, layer.b]])
        print("Layer grads shapes:", [g.shape for g in [layer.dW, layer.db]])
        print("Gradients dW:\n", layer.dW[:2,:2])
        print("Gradients db:\n", layer.db[:,:])
    else:
        print(f"\nLayer {i} not trainable")

# Copy params
old_params = [p.copy() for layer in model.layers if layer.trainable for p in [layer.W, layer.b]]

# Step optimizer
opt.step(model.layers)

# Compare params changed
new_params = [p for layer in model.layers if layer.trainable for p in [layer.W, layer.b]]
for old, new in zip(old_params, new_params):
    print("Param change norm:", np.linalg.norm(new - old))


Initial predictions:
 [[0.00338321]
 [0.99211866]
 [0.99116568]
 [0.00919011]]
Initial loss: 5.901622527026913e-05

Layer 0 params shapes: [(2, 4), (1, 4)]
Layer grads shapes: [(2, 4), (1, 4)]
Gradients dW:
 [[ 9.26883038e-06 -1.49909905e-05]
 [ 9.32883393e-06  1.00438532e-05]]
Gradients db:
 [[-2.50199832e-06 -2.47573612e-06  3.19412689e-06 -3.19480289e-06]]

Layer 1 not trainable

Layer 2 params shapes: [(4, 1), (1, 1)]
Layer grads shapes: [(4, 1), (1, 1)]
Gradients dW:
 [[2.65686885e-05]
 [3.29954470e-05]]
Gradients db:
 [[-2.19461969e-05]]

Layer 3 not trainable
Param change norm: 3.4864928962574566e-05
Param change norm: 5.726995463300427e-06
Param change norm: 6.658812622837672e-05
Param change norm: 2.1946196918420924e-05
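
The printed numbers can be cross-checked against each other. The sketch below assumes, without having seen the library source, that `MSELoss` averages the squared error over all entries and that the optimizer applies the plain update `p <- p - lr * grad`, in which case each parameter-change norm equals the learning rate times the norm of the corresponding gradient. All numeric values are copied from the output above.

```python
import numpy as np

lr = 1.0  # learning rate passed to SGD earlier in the notebook

# Values copied from the printed output above.
y_true = np.array([[0.0], [1.0], [1.0], [0.0]])
y_pred = np.array([[0.00338321], [0.99211866], [0.99116568], [0.00919011]])
db0 = np.array([[-2.50199832e-06, -2.47573612e-06, 3.19412689e-06, -3.19480289e-06]])
db2 = np.array([[-2.19461969e-05]])

# Assumption: MSELoss is the mean of the squared errors.
mse = np.mean((y_pred - y_true) ** 2)

# Assumption: plain SGD, so ||delta p|| = lr * ||grad||.
step_b0 = lr * np.linalg.norm(db0)
step_b2 = lr * np.linalg.norm(db2)

# atol=0 so the comparison is purely relative (the values are of order 1e-5).
print(np.isclose(mse, 5.901622527026913e-05, rtol=1e-4, atol=0))       # True
print(np.isclose(step_b0, 5.726995463300427e-06, rtol=1e-5, atol=0))   # True
print(np.isclose(step_b2, 2.1946196918420924e-05, rtol=1e-5, atol=0))  # True
```

The second and fourth "Param change norm" lines correspond to the biases of layers 0 and 2, because `old_params` is built in the order `W`, `b` for each trainable layer.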
