# 🧠 2-Layer Neural Network from Scratch (NumPy)
This notebook implements a minimal neural network using only NumPy: an input layer connected directly to a single sigmoid output neuron, with no hidden layer. It demonstrates how forward propagation, error calculation, and backpropagation work at a low level.

In [1]:
import numpy as np  # Linear algebra library

## Step 1. Define the Sigmoid Activation Function
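The sigmoid function maps any real value into (0, 1), which makes it suitable for output probabilities.
We also define its derivative, needed for backpropagation. With `deriv=True`, the function expects a value that is already a sigmoid output, not the raw pre-activation.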

In [2]:
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)  # x is assumed to already be a sigmoid output s, so this is s * (1 - s)
    return 1 / (1 + np.exp(-x))
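
The `deriv=True` convention is easy to misuse, so here is a small sketch that compares it against a central finite difference. The names `z`, `s`, and `eps` are illustrative and not part of the notebook.

```python
import numpy as np

def sigmoid(x, deriv=False):
    # Same definition as the cell above; with deriv=True, x must already be a sigmoid output.
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

z = np.linspace(-4, 4, 9)            # illustrative pre-activation values
s = sigmoid(z)                       # activations, all inside (0, 1)
analytic = sigmoid(s, deriv=True)    # pass the activation s, not z

eps = 1e-6                           # step size for the numerical check (assumed)
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Passing `z` instead of `s` to `sigmoid(..., deriv=True)` would compute `z * (1 - z)`, which is not the sigmoid slope.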

## Step 2. Prepare the Training Data
Each row of `X` is a training example, and each column is an input feature. The target `y` is a column vector with one value per example.

In [3]:
X = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])

y = np.array([[0, 0, 1, 1]]).T
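
A quick look at the data explains what the network can learn from it. This sketch only inspects the arrays defined above: the target equals the first feature, and the third feature is constant, so its weight behaves like a bias.

```python
import numpy as np

X = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])
y = np.array([[0, 0, 1, 1]]).T

print(X.shape, y.shape)               # (4, 3) (4, 1)
print(np.array_equal(X[:, [0]], y))   # True: the target copies the first feature
print(np.unique(X[:, 2]))             # [1]: the third feature is always 1
```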

## Step 3. Initialize Weights
We initialize the weights (`syn0`) uniformly at random in [-1, 1), so their mean is zero. We set a random seed first for reproducibility.

In [4]:
np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1  # (3 inputs × 1 output)
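
As a sanity check on the initialization, the sketch below confirms the shape and range of `syn0` and that reseeding reproduces the same values. The name `syn0_again` is illustrative.

```python
import numpy as np

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1        # uniform samples rescaled from [0, 1) to [-1, 1)

np.random.seed(1)
syn0_again = 2 * np.random.random((3, 1)) - 1  # same seed, so the same draw

print(syn0.shape)
print(bool(np.all((syn0 >= -1) & (syn0 < 1))))
print(np.array_equal(syn0, syn0_again))
```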

## Step 4. Train the Network
We use a simple loop for 10,000 iterations. Each iteration performs:
- Forward propagation
- Error computation
- Backpropagation (weight update)
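
Written out, one iteration looks like the following. The notebook does not name a loss, so treating $E$ as half the summed squared error is an assumption. Under it, the update is plain gradient descent with a learning rate of 1, where $\sigma$ is the sigmoid and $\odot$ is the elementwise product:

$$
\begin{aligned}
l_1 &= \sigma(X\,\mathrm{syn0}) \\
E &= \tfrac{1}{2}\,\lVert y - l_1 \rVert^2 \\
\frac{\partial E}{\partial\,\mathrm{syn0}} &= -X^{\top}\big[(y - l_1) \odot l_1 \odot (1 - l_1)\big] \\
\mathrm{syn0} &\leftarrow \mathrm{syn0} + X^{\top}\big[(y - l_1) \odot l_1 \odot (1 - l_1)\big]
\end{aligned}
$$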


In [5]:
for step in range(10000):  # named `step` to avoid shadowing the built-in iter()
    # Forward propagation
    l0 = X
    l1 = sigmoid(np.dot(l0, syn0))

    # Compute error
    l1_error = y - l1

    # Backpropagation: scale the error by the sigmoid slope at l1 (small when l1 is near 0 or 1)
    l1_delta = l1_error * sigmoid(l1, deriv=True)

    # Update weights
    syn0 += np.dot(l0.T, l1_delta)
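
To watch the loop converge, one option is to log the mean absolute error at intervals. The sketch below repeats the same training procedure in a self-contained form. The `errors` list and the logging interval of 1,000 steps are additions for illustration only.

```python
import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)  # x is assumed to already be a sigmoid output
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

errors = []  # illustrative log of mean |y - l1|, not part of the original loop
for step in range(10000):
    l1 = sigmoid(np.dot(X, syn0))              # forward propagation
    l1_error = y - l1                          # error
    if step % 1000 == 0:
        errors.append(float(np.mean(np.abs(l1_error))))
    l1_delta = l1_error * sigmoid(l1, deriv=True)
    syn0 += np.dot(X.T, l1_delta)              # weight update

for i, e in enumerate(errors):
    print(f"step {i * 1000:5d}: mean |error| = {e:.4f}")
```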

## Step 5. Display Results

In [10]:
print("Output After Training:")
print(l1)
print("Rounded Output After Training:")
print(l1.round())
print("\nLearned Weights (syn0):")
print(syn0)

Output After Training:
[[0.00966449]
 [0.00786506]
 [0.99358898]
 [0.99211957]]
Rounded Output After Training:
[[0.]
 [0.]
 [1.]
 [1.]]

Learned Weights (syn0):
[[ 9.67299303]
 [-0.2078435 ]
 [-4.62963669]]
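
As a check on these results, the forward pass can be re-run with the weights copied from the printed output. This sketch hard-codes those rounded values instead of retraining, so it matches the printed predictions only to within rounding. The names `syn0_printed` and `pred` are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

# Weights copied from the printed output above (rounded to 8 decimals).
syn0_printed = np.array([[9.67299303], [-0.2078435], [-4.62963669]])

pred = sigmoid(np.dot(X, syn0_printed))
print(pred)
print(np.array_equal(pred.round(), y))
```

The large positive first weight and the negative third weight together push the output toward 1 when the first input is 1 and toward 0 otherwise. The second weight stays close to zero.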


### 🧩 Takeaways
- This is close to the simplest possible neural network: no hidden layer and no explicit bias term (the constant third input plays that role).
- It learns correlations between input features and outputs by adjusting weights.
- The sigmoid derivative scales the updates: confident predictions (outputs near 0 or 1) are adjusted less.
- The weight update is the dot product of the transposed inputs and the slope-scaled error, `np.dot(l0.T, l1_delta)`.
- Reflecting on the training data:
    - When both an input and the output are 1, we increase the weight between them.
    - When an input is 1 and the output is 0, we decrease the weight between them.
    - Thus, across the four training examples in the table below, the weight from the first input to the output is only ever increased or left unchanged, whereas the other two weights are both increased and decreased across training examples.
    - This is what causes the network to learn from correlations between the inputs and the output.

| Input 1 | Input 2 | Input 3 | Output |
|:--------|:--------|:--------|:-------|
| 0 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |
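
The sign pattern described in these takeaways can be checked by splitting one weight update into per-example contributions. This is a sketch at the initial weights. The names `contrib` and `signs` are illustrative, and rows follow the order of `X` in the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

l1 = sigmoid(np.dot(X, syn0))
l1_delta = (y - l1) * l1 * (1 - l1)   # error scaled by the sigmoid slope

# Row i holds example i's contribution to each of the three weights;
# summing over rows gives the full update np.dot(X.T, l1_delta).
contrib = X * l1_delta
signs = np.sign(contrib).astype(int)
print(signs)
# [[ 0  0 -1]
#  [ 0 -1 -1]
#  [ 1  0  1]
#  [ 1  1  1]]
```

The first column never goes negative, while the second and third columns receive pushes in both directions.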
