# Understanding Cross-Entropy Derivatives in Binary Classification
This notebook walks through one gradient descent step for a single sigmoid neuron. It computes the binary cross-entropy loss, derives the gradient of that scalar loss with respect to the weights and bias using the chain rule, and applies the resulting update.

## Step 1: Setup Inputs, Weights, and Target

In [None]:
import numpy as np

# Inputs
x1, x2 = 2.0, 3.0
x = np.array([x1, x2])

# Initial weights and bias
w = np.array([0.5, -0.4])
b = 0.0

# Target label
y = 1


## Step 2: Forward Pass (Logit, Sigmoid Output, Loss)

In [None]:
# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward pass
z = np.dot(w, x) + b
y_hat = sigmoid(z)

# Binary cross-entropy loss
loss = - (y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(f"z (logit): {z}")
print(f"ŷ (sigmoid output): {y_hat}")
print(f"Loss (cross-entropy): {loss}")
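
The direct formula above can evaluate `log(0)` when `|z|` is large and the sigmoid saturates. As a hedged aside, one common workaround is to compute the loss from the logit instead of from `y_hat`. The helper name `bce_from_logit` and the `_demo` variables below are hypothetical and not part of this notebook; this is a minimal sketch that should agree with the cell above for moderate `z`.

```python
import numpy as np

def bce_from_logit(z, y):
    """Binary cross-entropy computed from the logit z (hypothetical helper)."""
    # Algebraically equal to -(y*log(s) + (1-y)*log(1-s)) with s = sigmoid(z),
    # rewritten so that exp() is only ever called on a non-positive number.
    return np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))

# Same inputs, weights, bias, and target as in Step 1
z_demo = np.dot(np.array([0.5, -0.4]), np.array([2.0, 3.0])) + 0.0
y_demo = 1

naive = -np.log(1 / (1 + np.exp(-z_demo)))  # direct formula for y = 1
stable = bce_from_logit(z_demo, y_demo)

print(f"Loss (direct formula): {naive}")
print(f"Loss (from logit):     {stable}")
```

For the values in this notebook the two results should match to floating-point precision; the difference only matters for extreme logits.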


## Step 3: Backpropagation — Compute Gradient of Loss w.r.t. Each Weight

In [None]:
# Derivative of loss with respect to y_hat
dL_dyhat = - (y / y_hat) + ((1 - y) / (1 - y_hat))

# Derivative of y_hat with respect to z
dyhat_dz = y_hat * (1 - y_hat)

# Derivative of z with respect to each weight w_i is x_i
dz_dw = x  # because z = w1*x1 + w2*x2 + b

# Chain rule: dL/dw = dL/dyhat * dyhat/dz * dz/dw
dL_dz = dL_dyhat * dyhat_dz
grad_w = dL_dz * dz_dw  # gradient vector
grad_b = dL_dz          # gradient for bias, since dz/db = 1

print(f"dL/dz: {dL_dz}")
print(f"Gradient w.r.t. weights (dL/dw): {grad_w}")
print(f"Gradient w.r.t. bias (dL/db): {grad_b}")
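
Multiplying out the first two chain-rule factors gives `(-y/ŷ + (1-y)/(1-ŷ)) * ŷ(1-ŷ) = ŷ - y`, so `dL/dz` should equal `y_hat - y`. The sketch below uses that shortcut and compares it against central finite differences as a sanity check. The helper `bce_loss`, the step size `eps = 1e-6`, and the `analytic_*` / `numeric_*` names are assumptions made for illustration, not part of the original cells.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce_loss(w, b, x, y):
    """Loss as a function of the parameters, same formula as in Step 2."""
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Same values as in Step 1
x = np.array([2.0, 3.0])
w = np.array([0.5, -0.4])
b = 0.0
y = 1

# Analytic gradient using the simplification dL/dz = y_hat - y
y_hat = sigmoid(np.dot(w, x) + b)
analytic_w = (y_hat - y) * x
analytic_b = y_hat - y

# Central finite differences: perturb one parameter at a time
eps = 1e-6  # assumed step size
numeric_w = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric_w[i] = (bce_loss(w + step, b, x, y) - bce_loss(w - step, b, x, y)) / (2 * eps)
numeric_b = (bce_loss(w, b + eps, x, y) - bce_loss(w, b - eps, x, y)) / (2 * eps)

print(f"Analytic dL/dw: {analytic_w}, numeric: {numeric_w}")
print(f"Analytic dL/db: {analytic_b}, numeric: {numeric_b}")
```

If the analytic and numeric values agree to several decimal places, the chain-rule derivation is very likely correct.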


## Step 4: Update Weights with Gradient Descent

In [None]:
# Learning rate
lr = 0.1

# Gradient descent update
w_new = w - lr * grad_w
b_new = b - lr * grad_b

print(f"Updated weights: {w_new}")
print(f"Updated bias: {b_new}")
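
A quick way to confirm the update moved in the right direction is to recompute the loss with the new parameters. The sketch below repeats the forward pass, gradient, and update from the steps above in one place. The helper `bce` and the `_old` / `_upd` variable names are hypothetical, chosen so they do not overwrite the notebook's own variables.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(y, y_hat):
    """Binary cross-entropy for a single example (hypothetical helper)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Same values as in Steps 1 and 4
x_in = np.array([2.0, 3.0])
target = 1
w_old, b_old = np.array([0.5, -0.4]), 0.0
step_size = 0.1

# Forward pass and chain-rule gradient, as in Steps 2 and 3
y_hat_old = sigmoid(np.dot(w_old, x_in) + b_old)
dL_dz_old = (-(target / y_hat_old) + (1 - target) / (1 - y_hat_old)) * y_hat_old * (1 - y_hat_old)

# Gradient descent update, as in Step 4
w_upd = w_old - step_size * dL_dz_old * x_in
b_upd = b_old - step_size * dL_dz_old

# Forward pass again with the updated parameters
y_hat_upd = sigmoid(np.dot(w_upd, x_in) + b_upd)
loss_before = bce(target, y_hat_old)
loss_after = bce(target, y_hat_upd)

print(f"Loss before update: {loss_before}")
print(f"Loss after update:  {loss_after}")
```

With these inputs the loss after the update should be smaller than before, and `y_hat` should have moved toward the target of 1.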


## ✅ Summary
- Even though the loss is a scalar, its gradient w.r.t. the weights has one entry per weight, and stepping against that gradient reduces the loss.
- We used the chain rule to connect loss → prediction → logit → weights.
- This is the foundation of how neural networks learn through backpropagation.
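
To illustrate how these pieces repeat during training, the sketch below applies the same forward pass, gradient, and update in a loop on the single example from this notebook. The iteration count of 50 and the names `w_t`, `b_t`, and `losses` are arbitrary choices for illustration. Real training would average gradients over many examples, which this notebook does not cover.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Same starting point as Step 1, same learning rate as Step 4
x_in = np.array([2.0, 3.0])
target = 1
w_t = np.array([0.5, -0.4])
b_t = 0.0
step_size = 0.1

losses = []
for _ in range(50):  # assumed number of iterations
    # Forward pass (Step 2)
    y_hat_t = sigmoid(np.dot(w_t, x_in) + b_t)
    losses.append(-(target * np.log(y_hat_t) + (1 - target) * np.log(1 - y_hat_t)))

    # Backward pass via the chain rule (Step 3)
    dL_dyhat_t = -(target / y_hat_t) + (1 - target) / (1 - y_hat_t)
    dL_dz_t = dL_dyhat_t * y_hat_t * (1 - y_hat_t)

    # Gradient descent update (Step 4)
    w_t = w_t - step_size * dL_dz_t * x_in
    b_t = b_t - step_size * dL_dz_t

print(f"First loss: {losses[0]}")
print(f"Last loss:  {losses[-1]}")
```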