# CSE473s Project

## Part 1 (XOR)
Implement our custom library, test it on the [XOR] problem, validate it, then implement the exact same problem using [TensorFlow] or [Keras] and compare the results.

### Section-1 (Gradient Checking)

#### Data

In [87]:
import numpy as np

X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=np.float32)
y_true = np.array([[0],[1],[1],[0]], dtype=np.float32)

#### Build the network with the custom library

In [88]:
import sys, os
sys.path.insert(0, os.path.abspath('..'))
from lib import Sequential, Dense, Tanh, Sigmoid, MSELoss, SGDOptimizer as SGD

model = Sequential([
    Dense(2, 4, 1.0), Tanh(),
    Dense(4, 1, 1.0), Sigmoid()
])

opt = SGD(learning_rate=1.0)
loss_fn = MSELoss()

#### Training loop

In [89]:
n_epochs = 10000

model.fit(X, y_true, loss_fn, opt, epochs=n_epochs)

Epoch 0, Loss: 0.2469483054359347
Epoch 100, Loss: 0.020644006458808983
Epoch 200, Loss: 0.006000519651884105
Epoch 300, Loss: 0.003312250491557
Epoch 400, Loss: 0.0022506851995748233
Epoch 500, Loss: 0.0016924667931390344
Epoch 600, Loss: 0.0013510226477977515
Epoch 700, Loss: 0.0011216587472929263
Epoch 800, Loss: 0.0009574344056137714
Epoch 900, Loss: 0.0008342844070182186
Epoch 1000, Loss: 0.0007386406327711276
Epoch 1100, Loss: 0.0006622893073454459
Epoch 1200, Loss: 0.0005999752704485083
Epoch 1300, Loss: 0.0005481840802363755
Epoch 1400, Loss: 0.0005044786339616938
Epoch 1500, Loss: 0.00046711700835226557
Epoch 1600, Loss: 0.0004348218417794829
Epoch 1700, Loss: 0.00040663561088157636
Epoch 1800, Loss: 0.0003818267200022188
Epoch 1900, Loss: 0.0003598267736524325
Epoch 2000, Loss: 0.00034018760471100787
Epoch 2100, Loss: 0.000322551172906021
Epoch 2200, Loss: 0.00030662805726915435
Epoch 2300, Loss: 0.00029218181451757933
Epoch 2400, Loss: 0.00027901742087783394
Epoch 2500, Loss

#### Assert correctness

In [90]:
y_pred = model.forward(X)

print("Raw preds:\n", y_pred)
print("Rounded:\n", np.round(y_pred))

Raw preds:
 [[0.00321298]
 [0.99170442]
 [0.99166056]
 [0.00996279]]
Rounded:
 [[0.]
 [1.]
 [1.]
 [0.]]
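
The cell above prints the predictions but does not assert anything. A minimal sketch of an explicit check follows. The predictions are copied from the printed output so the snippet runs on its own (in the notebook one would use `model.forward(X)` instead), and the 0.05 tolerance is an assumption chosen for illustration, not a project requirement.

```python
import numpy as np

# XOR targets, as defined in the Data cell.
y_true = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Predictions copied from the printed output above.
y_pred = np.array([[0.00321298], [0.99170442], [0.99166056], [0.00996279]])

# Hard check: thresholding at 0.5 must reproduce the XOR truth table.
labels = (y_pred > 0.5).astype(np.float32)
assert np.array_equal(labels, y_true)

# Soft check: every prediction lies within the assumed tolerance of its target.
max_err = np.abs(y_pred - y_true).max()
assert max_err < 0.05
print("max abs error:", max_err)
```

Unlike `np.round`, the threshold comparison states the decision rule explicitly, and the tolerance check fails loudly if training stalls near 0.5.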


#### Gradient-check block

In [91]:
def grad_check(model: Sequential, X, y_true, eps=1e-5):
    """Max abs difference between backprop and central-difference gradients of the first layer's W."""
    loss_fn = MSELoss()
    W = model.layers[0].W
    numeric = np.zeros_like(W)

    # Numeric gradient: perturb one weight at a time by +/- eps.
    for idx in np.ndindex(W.shape):
        W[idx] += eps
        plus = loss_fn(y_true, model.forward(X))
        W[idx] -= 2 * eps
        minus = loss_fn(y_true, model.forward(X))
        W[idx] += eps  # restore the original weight

        numeric[idx] = (plus - minus) / (2 * eps)

    # Analytic gradient: train_step refreshes dW. Note that it relies on the
    # global `opt` and also appears to apply one optimizer update to the model.
    model.train_step(X, y_true, loss_fn, opt)
    analytic = model.layers[0].dW

    return np.abs(analytic - numeric).max()

print('Max diff:', grad_check(model, X, y_true))  # <1e-7 → pass

Max diff: 3.445695470486246e-14
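
The check above compares the backprop gradient with the central difference `(L(w + eps) - L(w - eps)) / (2 * eps)`. The sketch below repeats the idea without `lib`, on a 2-4-1 tanh/sigmoid network written in plain NumPy. Everything in it is an assumption made for illustration: the helper names (`forward`, `mse`, `analytic_grad_W1`, `numeric_grad_W1`), the random stand-in weights, and the loss defined as the mean of squared errors (the definition used by `MSELoss` is not shown here). It reports a relative error rather than an absolute one.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
y = np.array([[0], [1], [1], [0]], dtype=np.float64)

# Hypothetical stand-in parameters for a 2-4-1 network (not the lib's weights).
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def forward(W1):
    """Return hidden activations and network output for the given W1."""
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return h, out

def mse(out):
    """Assumed loss: mean of squared errors over all outputs."""
    return np.mean((out - y) ** 2)

def analytic_grad_W1(W1):
    """Backprop by hand: dL/dW1 via the chain rule."""
    h, out = forward(W1)
    d_out = 2.0 * (out - y) / y.size      # dL/d(out)
    d_z2 = d_out * out * (1.0 - out)      # through the sigmoid
    d_h = d_z2 @ W2.T                     # through the second Dense layer
    d_z1 = d_h * (1.0 - h ** 2)           # through tanh
    return X.T @ d_z1

def numeric_grad_W1(W1, eps=1e-5):
    """Central-difference estimate of dL/dW1, one entry at a time."""
    grad = np.zeros_like(W1)
    for idx in np.ndindex(W1.shape):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (mse(forward(W_plus)[1]) - mse(forward(W_minus)[1])) / (2 * eps)
    return grad

analytic = analytic_grad_W1(W1)
numeric = numeric_grad_W1(W1)
rel_err = np.linalg.norm(analytic - numeric) / (
    np.linalg.norm(analytic) + np.linalg.norm(numeric)
)
print("relative error:", rel_err)
```

A relative error is scale-free, which matters on a converged model: when the gradients themselves are tiny, an absolute threshold such as `1e-7` can be less informative than the ratio.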


In [92]:
# Forward
y_pred = model.forward(X)
print("Initial predictions:\n", y_pred)
loss = loss_fn(y_true, y_pred)
print("Initial loss:", loss)

# Backprop once
grad = loss_fn.backward()
model.backward(grad)

# Show gradients stored in Dense layers
for i, layer in enumerate(model.layers):
    if layer.trainable:
        print(f"\nLayer {i} params shapes:", [p.shape for p in [layer.W, layer.b]])
        print("Layer grads shapes:", [g.shape for g in [layer.dW, layer.db]])
        print("Gradients dW:\n", layer.dW[:2,:2])
        print("Gradients db:\n", layer.db[:,:])
    else:
        print(f"\nLayer {i} not trainable")

# Copy params
old_params = [p.copy() for layer in model.layers if layer.trainable for p in [layer.W, layer.b]]

# Step optimizer
opt.step(model.layers)

# Compare params changed
new_params = [p for layer in model.layers if layer.trainable for p in [layer.W, layer.b]]
for old, new in zip(old_params, new_params):
    print("Param change norm:", np.linalg.norm(new - old))


Initial predictions:
 [[0.00321279]
 [0.99170484]
 [0.99166099]
 [0.00996228]]
Initial loss: 6.197942939023619e-05

Layer 0 params shapes: [(2, 4), (1, 4)]
Layer grads shapes: [(2, 4), (1, 4)]
Gradients dW:
 [[-1.27764440e-05 -1.75835763e-05]
 [ 1.85376577e-05  1.13883587e-05]]
Gradients db:
 [[ 3.30822820e-06 -2.33260785e-06  2.19968000e-06  1.27801622e-07]]

Layer 1 not trainable

Layer 2 params shapes: [(4, 1), (1, 1)]
Layer grads shapes: [(4, 1), (1, 1)]
Gradients dW:
 [[-4.77299017e-05]
 [ 4.77073961e-05]]
Gradients db:
 [[-1.43254514e-05]]

Layer 3 not trainable
Param change norm: 3.324000912889045e-05
Param change norm: 4.608726352107381e-06
Param change norm: 7.125137681653467e-05
Param change norm: 1.4325451397789735e-05
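
The reported change norms can be cross-checked against the gradients. For a plain SGD update `p <- p - lr * g`, the change norm equals `lr * ||g||`. Treating `SGDOptimizer` as plain SGD (no momentum, no weight decay) is an assumption, since its implementation is not shown here. A small sketch using the bias gradients printed above:

```python
import numpy as np

lr = 1.0  # learning rate passed to SGD in the setup cell

# Bias gradients copied from the printed output above.
db_layer0 = np.array([[3.30822820e-06, -2.33260785e-06, 2.19968000e-06, 1.27801622e-07]])
db_layer2 = np.array([[-1.43254514e-05]])

checks = [
    ("layer 0 b", db_layer0, 4.608726352107381e-06),
    ("layer 2 b", db_layer2, 1.4325451397789735e-05),
]

for name, grad, reported in checks:
    step = -lr * grad                    # assumed plain SGD update
    predicted = np.linalg.norm(step)
    print(name, predicted, reported)
    # atol=0 because NumPy's default absolute tolerance (1e-8) exceeds these values.
    assert np.isclose(predicted, reported, rtol=1e-6, atol=0.0)
```

Both bias norms match the printed values to within rounding of the displayed digits, which is consistent with a plain SGD step at `learning_rate=1.0`.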
