<a href="https://colab.research.google.com/github/CyberMonk999/Singe-Neuron-Learning-Life-cycle/blob/main/neural_net_cycle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**The Single-Cycle Learning Demonstration in Neural Networks**


This project provides a step-by-step, computational demonstration of the single learning cycle within a minimal neural network. The primary goal is to demystify the core mechanism of Deep Learning by showing how a network transitions from an initial, random guess to a measurably improved state. The methodology involves a complete trace of a single forward pass, the calculation of Mean Squared Error (MSE) Loss, and the subsequent corrective step using Backpropagation and Gradient Descent.

1. We start by importing the libraries and defining the helper functions.

 NumPy provides efficient multidimensional array objects and matrix multiplication.

 The sigmoid function is a non-linear squashing function that maps any input to a value between 0 and 1. This non-linearity is what lets the network learn patterns that a purely linear model cannot.

In [1]:
import numpy as np

# The sigmoid activation function maps any real value into the range (0, 1)

def sigmoid(z):
  return 1 / (1 + np.exp(-z))
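
To make the "squashing" behaviour concrete, here is a minimal sketch that evaluates the same `sigmoid` on a few sample inputs. The input values in `z_values` are arbitrary and chosen only for illustration; they are not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    # Same definition as in the notebook cell above
    return 1 / (1 + np.exp(-z))

# Illustrative inputs (an assumption): large negative, small, zero, large positive
z_values = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
outputs = sigmoid(z_values)

for z, a in zip(z_values, outputs):
    print(f"sigmoid({z:+.1f}) = {a:.5f}")
```

Large negative inputs land near 0, large positive inputs land near 1, and an input of 0 maps to exactly 0.5.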

2. Now we set up the initial parameters for a single learning cycle.

In [7]:
# Input vector X: the 3 features of one sample

X = np.array([[1.0, 0.5, 0.2]])

# Initial weights (W_initial): a 3x2 matrix of arbitrarily chosen starting weights.

# Rows 1-3 correspond to X's features; columns 1-2 correspond to the output neurons.
W_initial = np.array([[0.1, 0.5],
                  [0.9, 0.2],
                  [-0.3, 0.7]])

# Target output (Y): what the network should produce. Here we want neuron 1 inactive and neuron 2 active.

# Neuron 1 = 0.0 (silent), Neuron 2 = 1.0 (firing).

Y = np.array([[0.0, 1.0]])

print("Initial Weights (W_initial) is: \n \n", W_initial)

Initial Weights (W_initial) is: 
 
 [[ 0.1  0.5]
 [ 0.9  0.2]
 [-0.3  0.7]]


3. After initializing the input, target, and initial weights, the next step is the forward pass, in which the input data travels through the network. The forward pass begins with the pre-activation: the linear combination of the input and the weights (Z = X @ W).

In [8]:
# Pre-Activation (Z) is the Linear combination of Input and Weights (Z = X @ W)
# This is the sum of weighted inputs before the Sigmoid squashing

Z = X @ W_initial

print("Pre-Activation (Z) is: \n \n", Z)

Pre-Activation (Z) is: 
 
 [[0.49 0.74]]
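
To show what `X @ W_initial` does under the hood, the sketch below computes the same pre-activation with explicit loops, one weighted sum per output neuron. The name `Z_manual` is introduced here for illustration only.

```python
import numpy as np

X = np.array([[1.0, 0.5, 0.2]])
W_initial = np.array([[0.1, 0.5],
                      [0.9, 0.2],
                      [-0.3, 0.7]])

# Neuron 1: 1.0*0.1 + 0.5*0.9 + 0.2*(-0.3) = 0.49
# Neuron 2: 1.0*0.5 + 0.5*0.2 + 0.2*0.7    = 0.74
Z_manual = np.zeros((1, 2))
for j in range(W_initial.shape[1]):      # loop over output neurons
    for i in range(X.shape[1]):          # loop over input features
        Z_manual[0, j] += X[0, i] * W_initial[i, j]

print("Z computed with loops:\n", Z_manual)
print("Matches X @ W_initial:", np.allclose(Z_manual, X @ W_initial))
```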


Now we calculate the activation, which is the network's final guess. The pre-activation Z = X @ W_initial sums the weighted inputs for each neuron.

The pre-activation is the weighted sum of inputs before any activation function is applied. The activation function takes this pre-activation value as input and introduces non-linearity; common choices include Sigmoid and ReLU. Because Sigmoid maps its input into the range 0 to 1, its output can be read as a probability-like score.

Put another way, the pre-activation is the input to the decision maker, and the activation function is the decision maker itself: it turns the raw, unbounded value (Z) into the network's probability-like output (A), which is our initial guess at the target.

In [11]:
A = sigmoid(Z)
print("The activation output (A) is : \n", A)

The activation output (A) is : 
 [[0.62010643 0.67699586]]


4. Once the activation function has produced the initial guess, the next step is to check how wrong that guess is compared to the desired target. We measure this by calculating the loss.

The error is the raw difference between the target and the guess (Y - A). The loss is calculated using the Mean Squared Error (MSE) method.

Note - np.square(Error): Squaring the error ensures that positive and negative mistakes are treated equally, and it penalizes larger errors more heavily than small ones. The squared error is also smooth and differentiable, which gradient descent requires.

In [16]:
# --- Loss Calculation (The Mistake Score) ---

# 1. Error: The raw difference between the target and the guess (Y - A)
Error = Y - A

# 2. Loss (Mean Squared Error - MSE): Sum of squared errors, divided by 2 neurons
Loss_initial = np.sum(np.square(Error)) / 2

print("Initial Error:\n", Error)
print("Initial Loss (MSE):\n", Loss_initial)

Initial Error:
 [[-0.62010643  0.32300414]]
Initial Loss (MSE):
 0.24443183216018016
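
The notebook divides the summed squared error by 2. As a quick check, the sketch below compares that with `np.mean` over the squared errors. The two agree here only because there is one sample with exactly two outputs; the activation values are copied from the printed output above, and the names `loss_sum_over_2` and `loss_mean` are introduced for illustration.

```python
import numpy as np

Y = np.array([[0.0, 1.0]])
A = np.array([[0.62010643, 0.67699586]])  # copied from the notebook's printed output

Error = Y - A

# The notebook's formula: sum of squared errors divided by the 2 output neurons
loss_sum_over_2 = np.sum(np.square(Error)) / 2

# The generic MSE formula: mean over all squared errors
loss_mean = np.mean(np.square(Error))

print("Sum / 2:", loss_sum_over_2)
print("np.mean:", loss_mean)
```

With a different number of outputs or samples, the two formulas would differ by a constant factor, which rescales the gradient but does not change its direction.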


5. As calculated above, the initial loss is approximately 0.2444. Our goal is to adjust the weights so that the loss descends toward its minimum. Backpropagation is the process that makes this possible.

Backpropagation calculates the gradient ($\mathbf{dLoss/dW}$). The gradient is a detailed map telling you, for every single weight, which direction (positive or negative) to move it to most quickly reduce the loss. Think of it as being blindfolded on a hill: you stick your hand out and feel the ground to find the steepest downhill slope.

Gradient descent then uses this gradient, scaled by a learning rate, to step toward the point where the loss is minimal.

>>The learning rate controls how large a step gradient descent takes each time it updates the weights in search of the minimum loss.

d_sigmoid_dz = A * (1 - A): This calculates the slope of the Sigmoid curve at the neuron's current output, which determines how sensitive the output is to a change in the pre-activation Z (and therefore in the weights).

Delta = (A - Y) * d_sigmoid_dz: Delta is the adjusted error signal, found by multiplying the raw mistake by the slope. It gives each neuron its appropriate "blame score."

d_loss_d_w = X.T @ Delta: The gradient ($\mathbf{dLoss/dW}$) is calculated by distributing each neuron's Delta (blame) back to all of its connecting weights, using the value of the corresponding input ($\mathbf{X}$) as the multiplier.
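
The three expressions above follow from the chain rule. As a sketch, write the loss per weight using index notation, where $i$ indexes the input features, $j$ indexes the output neurons, and $\sigma$ is the Sigmoid function. The index notation and the symbol $\delta_j$ are introduced here for illustration.

$$
L = \frac{1}{2}\sum_{j}(y_j - a_j)^2, \qquad a_j = \sigma(z_j), \qquad z_j = \sum_{i} x_i\, w_{ij}
$$

$$
\frac{\partial L}{\partial w_{ij}}
= \frac{\partial L}{\partial a_j}\cdot\frac{\partial a_j}{\partial z_j}\cdot\frac{\partial z_j}{\partial w_{ij}}
= (a_j - y_j)\cdot a_j(1 - a_j)\cdot x_i
$$

Defining $\delta_j = (a_j - y_j)\,a_j(1 - a_j)$ gives $\partial L/\partial w_{ij} = x_i\,\delta_j$, which in matrix form is $\mathbf{dLoss/dW} = \mathbf{X}^{\top}\boldsymbol{\delta}$. The three factors correspond to `(A - Y)`, `d_sigmoid_dz`, and `X.T` respectively.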


In [18]:
# --- STEP 1: CALCULATE ERROR SIGNAL (DELTA) ---

# 1a. Derivative of Sigmoid (derivative of A with respect to Z): A * (1 - A)
# This calculates the slope of the activation function at the output point.
d_sigmoid_dz = A * (1 - A)

# 1b. The overall error signal (Delta):
# Delta = (Output - Target) * Slope
# This determines how much responsibility each output neuron has for the loss.
Delta = (A - Y) * d_sigmoid_dz

# Note: We use (A - Y) because the derivative of our loss, sum((Y - A)^2) / 2, with respect to A is (A - Y).

print("Delta (Error Signal):\n", Delta)


# --- STEP 2: CALCULATE THE GRADIENT (dLoss/dW) ---

# The Chain Rule says: dLoss/dW = Input.T @ Delta
# This maps the responsibility (Delta) back across the input lines (X) to every weight.
d_loss_d_w = X.T @ Delta

# Print the gradient: one value per weight, in the same 3x2 shape as W_initial.
print("Calculated Gradient (dLoss/dW):\n", d_loss_d_w)


# --- STEP 3: APPLY GRADIENT DESCENT ---

# Define a SAFE Learning Rate (LR)
LEARNING_RATE = 0.001

# 1. Calculate Adjustment: This is the step size we take in the direction of the gradient.
# Adjustment = LR * Gradient
Adjustment = LEARNING_RATE * d_loss_d_w

# 2. Calculate New Weights: W_new = W_initial - Adjustment
W_new = W_initial - Adjustment


print("New Weights (W_new):\n", W_new)

Delta (Error Signal):
 [[ 0.14608123 -0.07063211]]
Calculated Gradient (dLoss/dW):
 [[ 0.14608123 -0.07063211]
 [ 0.07304061 -0.03531606]
 [ 0.02921625 -0.01412642]]
New Weights (W_new):
 [[ 0.09985392  0.50007063]
 [ 0.89992696  0.20003532]
 [-0.30002922  0.70001413]]



Gradient descent finds the specific amount of blame (and the required corrective action) for every single weight in the network, not just the one that is "most to blame." The $\mathbf{dLoss/dW}$ matrix we calculated (a 3x2 matrix) gives an individual adjustment value for all six weights simultaneously. This adjustment is proportional: if one weight contributed 10 times more to the mistake than another, its value in the $\mathbf{dLoss/dW}$ matrix will be 10 times larger, so it gets changed 10 times more. In the output above, the first row has the largest values because its input (1.0) is the largest of the three features.

Sometimes the gradient value is negative, and since we subtract the adjustment ($\mathbf{W}_{\text{new}} = \mathbf{W}_{\text{old}} - \text{Adjustment}$), the weight value increases. Gradient descent finds the direction (increase or decrease) for each weight from the calculus: if the weight needs to increase to lower the loss, the gradient value will be negative; if the weight needs to decrease to lower the loss, the gradient value will be positive.
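
One way to gain confidence in the analytic gradient is a finite-difference check: nudge each weight slightly up and down, measure how the loss changes, and compare with the backpropagation result. The sketch below assumes the same `X`, `Y`, and `W_initial` as the notebook. The helper `loss_fn`, the step size `eps`, and the names `analytic_grad`, `numeric_grad`, and `max_diff` are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss_fn(W, X, Y):
    # Hypothetical helper: forward pass plus the notebook's loss formula
    A = sigmoid(X @ W)
    return np.sum(np.square(Y - A)) / 2

X = np.array([[1.0, 0.5, 0.2]])
Y = np.array([[0.0, 1.0]])
W_initial = np.array([[0.1, 0.5],
                      [0.9, 0.2],
                      [-0.3, 0.7]])

# Analytic gradient, exactly as computed in the notebook
A = sigmoid(X @ W_initial)
Delta = (A - Y) * A * (1 - A)
analytic_grad = X.T @ Delta

# Numerical gradient via central differences (eps is an assumed step size)
eps = 1e-6
numeric_grad = np.zeros_like(W_initial)
for i in range(W_initial.shape[0]):
    for j in range(W_initial.shape[1]):
        W_plus = W_initial.copy()
        W_minus = W_initial.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric_grad[i, j] = (loss_fn(W_plus, X, Y) - loss_fn(W_minus, X, Y)) / (2 * eps)

max_diff = np.max(np.abs(analytic_grad - numeric_grad))
print("Analytic gradient:\n", analytic_grad)
print("Numerical gradient:\n", numeric_grad)
print("Largest absolute difference:", max_diff)
```

The first column (neuron 1, whose output is too high) comes out positive, so those weights decrease. The second column (neuron 2, whose output is too low) comes out negative, so those weights increase.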

6. Once we have the adjusted weights, we run the forward pass again with them. The new loss is then calculated and compared with the old loss.

In [22]:
# 1. New Pre-Activation (Z_new): Use the corrected weights
Z_new = X @ W_new

# 2. New Activation (A_new): The network's new guess
A_new = sigmoid(Z_new)

# 3. New Loss: Calculate the new mistake score
Loss_new = np.sum(np.square(Y - A_new)) / 2

print("\n--- LEARNING CONFIRMATION ---")
print("New Activated Output (A_new):\n", A_new)
print("New Loss:", Loss_new)

# Final check: Did the Loss decrease?
if Loss_new < Loss_initial:
    print("\nSUCCESS! The Loss decreased. Learning confirmed.")
else:
    print("\nFAILURE! The Loss increased or stayed the same. Learning Rate was still too high.")

print(f"Loss Delta: {Loss_initial - Loss_new}")


--- LEARNING CONFIRMATION ---
New Activated Output (A_new):
 [[0.62006204 0.67701578]]
New Loss: 0.24439786890413875

SUCCESS! The Loss decreased. Learning confirmed.
Loss Delta: 3.396325604140826e-05
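
The notebook runs exactly one cycle, so the improvement is tiny. As a rough sketch of what happens when the same cycle is repeated, the loop below applies the identical forward pass, loss, and update many times. The learning rate of 0.5 and the count of 1000 cycles are illustrative assumptions, not values from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[1.0, 0.5, 0.2]])
Y = np.array([[0.0, 1.0]])
W = np.array([[0.1, 0.5],
              [0.9, 0.2],
              [-0.3, 0.7]])

LEARNING_RATE = 0.5   # assumption: larger than the notebook's 0.001, for visibility
N_CYCLES = 1000       # assumption: arbitrary number of repetitions

losses = []
for _ in range(N_CYCLES):
    A = sigmoid(X @ W)                               # forward pass
    losses.append(np.sum(np.square(Y - A)) / 2)      # loss before this update
    Delta = (A - Y) * A * (1 - A)                    # error signal
    W = W - LEARNING_RATE * (X.T @ Delta)            # gradient descent step

A_final = sigmoid(X @ W)
loss_final = np.sum(np.square(Y - A_final)) / 2

print("Loss before any update:", losses[0])
print("Loss after all cycles: ", loss_final)
print("Final output (A_final):", A_final)
```

Each pass through the loop is the same single cycle traced step by step above. Repetition moves the output toward the target of [0, 1].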


# Conclusion:

This project documented the most fundamental unit of learning in a neural network: a single cycle of forward pass, loss quantification, and corrective backpropagation. By calculating the gradient ($\mathbf{dLoss/dW}$) and applying it with a small learning rate of $0.001$, we demonstrated that the network can identify its mistake and begin to correct it: the loss fell from about 0.24443 to about 0.24440.

The simple network we modeled (3 inputs feeding 2 output neurons) changes considerably when tackling big, real-world tasks like classifying millions of images or translating languages.

**Specialized Loss Functions:** For classification tasks (like identifying cats/dogs), the Mean Squared Error (MSE) we used is replaced by Cross-Entropy Loss, which is better suited for probability distributions.

**Adaptive Optimizers**: Instead of a fixed Learning Rate that applies to all weights, algorithms like Adam or RMSProp are used. These adaptive optimizers adjust the effective step size for each weight individually at every iteration, which often speeds up convergence and makes training less sensitive to the initial choice of learning rate.

**Regularization**: To prevent the network from Overfitting (memorizing the training data instead of generalizing), techniques like Dropout (randomly turning off neurons during training) and weight decay are added to the learning cycle to force the network to be more robust.
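
To illustrate the cross-entropy point above, the sketch below compares the notebook's MSE with a binary cross-entropy computed on the same output and target. Treating each output neuron as an independent binary prediction is an assumption made for illustration; multi-class problems more commonly pair softmax with categorical cross-entropy. The clipping constant `eps` and the names `bce`, `mse`, `delta_bce`, and `delta_mse` are also introduced here.

```python
import numpy as np

Y = np.array([[0.0, 1.0]])
A = np.array([[0.62010643, 0.67699586]])  # copied from the notebook's printed output

# Clip to avoid log(0); eps is an assumed safety margin
eps = 1e-12
A_clipped = np.clip(A, eps, 1 - eps)

# Binary cross-entropy, averaged over the two outputs
bce = -np.mean(Y * np.log(A_clipped) + (1 - Y) * np.log(1 - A_clipped))

# The notebook's loss, for comparison
mse = np.mean(np.square(Y - A))

# Error signals with respect to Z:
# with Sigmoid and summed (unaveraged) binary cross-entropy, the slope term cancels
delta_bce = A - Y
# with Sigmoid and the notebook's loss, the slope term A * (1 - A) remains
delta_mse = (A - Y) * A * (1 - A)

print("Binary cross-entropy:", bce)
print("MSE:", mse)
print("Delta (cross-entropy):", delta_bce)
print("Delta (MSE):", delta_mse)
```

Because `A * (1 - A)` is never larger than 0.25, the MSE error signal is always smaller in magnitude than the cross-entropy one, which is one reason cross-entropy tends to learn faster when a Sigmoid output is confidently wrong.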

To recap, the network we modeled had four components:

**Input Layer:** A vector of features ($\mathbf{X}$).

**Weights:** Parameters ($\mathbf{W}$) that connect the inputs to the next layer.

**Non-linear Activation:** The Sigmoid function, which transforms the simple weighted sum into a squashed output between 0 and 1.

**Learning Mechanism:** Backpropagation and Gradient Descent, which adjust the weights based on a calculated loss.