# Neural Networks Lesson 1A: XOR Neural Network

## Building Your First Neural Network from Scratch

**Learning Objectives:**
- Understand why XOR is a classic neural network problem
- Build a simple 2-layer neural network from scratch
- Visualize forward propagation
- Train the network and watch weights evolve
- See how a trained network solves XOR

**Duration:** ~45 minutes

---

## Part 1: Why XOR?

The XOR (exclusive OR) problem is historically important because a single-layer perceptron cannot represent it, a limitation highlighted in Minsky and Papert's 1969 book *Perceptrons*. Overcoming that limitation was one of the motivations for multi-layer neural networks!

**XOR Truth Table:**

| Input A | Input B | Output |
|---------|---------|--------|
| 0       | 0       | 0      |
| 0       | 1       | 1      |
| 1       | 0       | 1      |
| 1       | 1       | 0      |

**The Challenge:** These points are not linearly separable! You cannot draw a single straight line to separate the 0s from the 1s.

**The Solution:** A neural network with at least one hidden layer can learn this non-linear pattern.
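
One way to convince yourself of this is a brute-force check. The sketch below is not part of the lesson's network; the grid range and the names `grid` and `best` are arbitrary choices for illustration. It tries many single-neuron threshold classifiers and records the best score. A grid search is not a proof, but the result agrees with the known fact that no single line classifies all four points correctly.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR targets

# Candidate values for w1, w2 and the bias (illustrative range)
grid = np.linspace(-2, 2, 21)

best = 0
for w1, w2, b in itertools.product(grid, repeat=3):
    # A single linear threshold unit: predict 1 when w1*x1 + w2*x2 + b > 0
    pred = (X @ np.array([w1, w2]) + b > 0).astype(int)
    best = max(best, int((pred == y).sum()))

print(f"Best single-line classifier gets {best} of 4 points right")
```

However the line is placed, at least one of the four points ends up on the wrong side.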

In [None]:
# Setup: import required libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

# Set random seed for reproducibility
np.random.seed(42)

print("✅ Libraries imported successfully!")
print(f"NumPy version: {np.__version__}")

## Part 2: Network Architecture

We'll build a simple 2-layer network:

```
Input Layer (2 neurons)  →  Hidden Layer (2 neurons)  →  Output Layer (1 neuron)
    [x1, x2]             →      [h1, h2]              →         [y]
```

**Layer Connections:**
- Input → Hidden: 2×2 = 4 weights + 2 biases
- Hidden → Output: 2×1 = 2 weights + 1 bias
- **Total parameters:** 9 (4+2+2+1)

**Activation Function:** We'll use sigmoid: σ(x) = 1 / (1 + e^(-x))
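
Backpropagation also needs the sigmoid's derivative, which has the convenient closed form σ'(x) = σ(x)(1 − σ(x)). The sketch below checks that identity numerically against a central finite difference. The sample points and the step size `eps` are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-4, 4, 9)

# Closed form: the derivative expressed through the sigmoid's own output
s = sigmoid(x)
analytic = s * (1 - s)

# Central finite difference as an independent check
eps = 1e-5
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(f"Largest gap between the two estimates: {max_diff:.2e}")
```

Because the derivative depends only on the sigmoid's output, a backward pass can reuse the activations saved during the forward pass instead of recomputing the exponential.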

In [None]:
# Define activation functions
def sigmoid(x):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of sigmoid for backpropagation.

    Expects x to be a sigmoid *output* s = sigmoid(z), so the derivative is s * (1 - s).
    """
    return x * (1 - x)

# Visualize the sigmoid function
x = np.linspace(-6, 6, 100)
y = sigmoid(x)

plt.figure(figsize=(10, 4))
plt.plot(x, y, 'b-', linewidth=2, label='Sigmoid(x)')
plt.grid(True, alpha=0.3)
plt.xlabel('Input (x)', fontsize=12)
plt.ylabel('Output', fontsize=12)
plt.title('Sigmoid Activation Function: σ(x) = 1/(1+e^-x)', fontsize=14, fontweight='bold')
plt.axhline(y=0.5, color='r', linestyle='--', alpha=0.5, label='Middle (0.5)')
plt.axvline(x=0, color='r', linestyle='--', alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()

print("✅ Activation function defined")
print("\n🔍 Key Properties:")
print(f"  • Output range: (0, 1)")
print(f"  • Sigmoid(0) = {sigmoid(0):.3f}")
print(f"  • Sigmoid(-5) ≈ {sigmoid(-5):.3f} (almost 0)")
print(f"  • Sigmoid(5) ≈ {sigmoid(5):.3f} (almost 1)")

## Part 3: Initialize the Neural Network

Now let's create our network with random initial weights and biases.
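
Before reading the class, it can help to trace the array shapes through one forward pass. The sketch below uses standalone arrays rather than the class, and the short names `W1`, `b1`, `W2`, `b2` and the separate random generator are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Separate generator so this sketch does not disturb the lesson's global seed
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # (4, 2): four samples, two inputs

W1 = rng.standard_normal((2, 2))   # input -> hidden weights
b1 = rng.standard_normal((1, 2))   # hidden biases, broadcast over the 4 rows
W2 = rng.standard_normal((2, 1))   # hidden -> output weights
b2 = rng.standard_normal((1, 1))   # output bias

hidden = sigmoid(X @ W1 + b1)        # (4, 2) @ (2, 2) -> (4, 2)
output = sigmoid(hidden @ W2 + b2)   # (4, 2) @ (2, 1) -> (4, 1)

n_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, output.shape, n_params)
```

All four XOR samples pass through the network in one matrix product, and the parameter count matches the 9 worked out in Part 2.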

In [None]:
class SimpleNeuralNetwork:
    def __init__(self, input_size=2, hidden_size=2, output_size=1):
        """Initialize a simple 2-layer neural network"""
        # Weights and biases for input → hidden layer
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.bias_hidden = np.random.randn(1, hidden_size)
        
        # Weights and biases for hidden → output layer
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
        self.bias_output = np.random.randn(1, output_size)
        
        # Store training history
        self.loss_history = []
        
    def forward(self, X):
        """Forward propagation through the network"""
        # Input → Hidden layer
        self.hidden_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = sigmoid(self.hidden_input)
        
        # Hidden → Output layer
        self.output_input = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.output = sigmoid(self.output_input)
        
        return self.output
    
    def backward(self, X, y, learning_rate=0.5):
        """Backpropagation to update weights"""
        # Calculate output layer error
        output_error = y - self.output
        output_delta = output_error * sigmoid_derivative(self.output)
        
        # Calculate hidden layer error
        hidden_error = output_delta.dot(self.weights_hidden_output.T)
        hidden_delta = hidden_error * sigmoid_derivative(self.hidden_output)
        
        # Update weights and biases
        self.weights_hidden_output += self.hidden_output.T.dot(output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += X.T.dot(hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate
    
    def train(self, X, y, epochs=10000, print_every=1000, learning_rate=0.5):
        """Train the network with full-batch gradient descent"""
        for epoch in range(epochs):
            # Forward pass
            output = self.forward(X)
            
            # Calculate loss (Mean Squared Error)
            loss = np.mean((y - output) ** 2)
            self.loss_history.append(loss)
            
            # Backward pass (learning_rate is passed through so it can be tuned)
            self.backward(X, y, learning_rate)
            
            # Print progress
            if epoch % print_every == 0:
                print(f"Epoch {epoch:5d} | Loss: {loss:.6f}")
        
        print(f"\n✅ Training completed! Final loss: {loss:.6f}")

# Create the network
nn = SimpleNeuralNetwork(input_size=2, hidden_size=2, output_size=1)

print("üß† Neural Network Initialized")
print(f"\nInitial Weights (Input ‚Üí Hidden):")
print(nn.weights_input_hidden)
print(f"\nInitial Weights (Hidden ‚Üí Output):")
print(nn.weights_hidden_output)

## Part 4: Prepare Training Data

Let's prepare our XOR training dataset:

In [None]:
# XOR training data
X_train = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])

y_train = np.array([
    [0],
    [1],
    [1],
    [0]
])

print("üìä XOR Training Data:")
print("="*40)
for i in range(len(X_train)):
    print(f"Input: {X_train[i]} ‚Üí Target Output: {y_train[i][0]}")
print("="*40)

# Visualize the XOR problem
plt.figure(figsize=(8, 6))
colors = ['red' if label == 0 else 'blue' for label in y_train.flatten()]
plt.scatter(X_train[:, 0], X_train[:, 1], c=colors, s=200, alpha=0.6, edgecolors='black', linewidth=2)

# Use a, b for the coordinates so the earlier x and y variables are not overwritten
for i, (a, b) in enumerate(X_train):
    plt.annotate(f'({a},{b})→{y_train[i][0]}', 
                xy=(a, b), 
                xytext=(10, 10), 
                textcoords='offset points',
                fontsize=12,
                bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.xlabel('Input A', fontsize=14)
plt.ylabel('Input B', fontsize=14)
plt.title('XOR Problem Visualization\n🔴 Red = Output 0 | 🔵 Blue = Output 1', fontsize=16, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.xlim(-0.5, 1.5)
plt.ylim(-0.5, 1.5)
plt.tight_layout()
plt.show()

print("\n❓ Challenge: Can you draw a single straight line to separate red from blue?")
print("   (Hint: No! That's why we need a neural network!)")

## Part 5: Train the Network! 🚀

Now let's train our neural network to solve XOR. Watch the loss decrease as the network learns!
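
A useful reference point when reading the loss curve: a network that outputs 0.5 for every input has learned nothing about XOR. The sketch below computes the mean squared error of that constant guess. The names `constant_guess` and `baseline_mse` are illustrative.

```python
import numpy as np

y_true = np.array([[0], [1], [1], [0]])   # XOR targets

# A "know-nothing" predictor that always answers 0.5
constant_guess = np.full_like(y_true, 0.5, dtype=float)

# Same loss definition as the lesson: mean of squared errors
baseline_mse = np.mean((y_true - constant_guess) ** 2)
print(f"Baseline MSE: {baseline_mse:.2f}")
```

If the training loss plateaus near this value, the network is still effectively guessing; a successful run should end well below it.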

In [None]:
# Train the network
print("üèãÔ∏è Training the neural network...\n")
nn.train(X_train, y_train, epochs=10000, print_every=2000)

# Plot training loss
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(nn.loss_history, 'b-', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training Loss Over Time', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(nn.loss_history, 'b-', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training Loss (Log Scale)', fontsize=14, fontweight='bold')
plt.yscale('log')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nüìâ Loss decreased from {:.6f} to {:.6f}".format(
    nn.loss_history[0], nn.loss_history[-1]))

## Part 6: Test the Trained Network

Let's see how well our network learned XOR!

In [None]:
# Test the network
predictions = nn.forward(X_train)

print("üéØ Neural Network Predictions:")
print("="*60)
print(f"{'Input A':<10} {'Input B':<10} {'Target':<10} {'Prediction':<15} {'Correct?'}")
print("="*60)

for i in range(len(X_train)):
    input_a, input_b = X_train[i]
    target = y_train[i][0]
    prediction = predictions[i][0]
    rounded_pred = round(prediction)
    is_correct = "✅" if rounded_pred == target else "❌"
    
    print(f"{input_a:<10} {input_b:<10} {target:<10} {prediction:<15.4f} {is_correct}")

print("="*60)

# Calculate accuracy
rounded_predictions = np.round(predictions)
accuracy = np.mean(rounded_predictions == y_train) * 100
print(f"\nüéâ Accuracy: {accuracy:.1f}%")

if accuracy == 100:
    print("\nüèÜ Perfect! The network successfully learned XOR!")
else:
    print("\n‚ö†Ô∏è  The network needs more training or architecture adjustment.")

## Part 7: Visualize the Trained Network

Let's visualize what the network learned by looking at the final weights:

In [None]:
print("üß† Trained Network Architecture:")
print("\n" + "="*60)
print("LAYER 1: Input ‚Üí Hidden")
print("="*60)
print("\nWeights:")
print(nn.weights_input_hidden)
print("\nBiases:")
print(nn.bias_hidden)

print("\n" + "="*60)
print("LAYER 2: Hidden → Output")
print("="*60)
print("\nWeights:")
print(nn.weights_hidden_output)
print("\nBias:")
print(nn.bias_output)

# Visualize the decision boundary
def plot_decision_boundary(nn, X, y):
    """Plot the decision boundary learned by the network"""
    # Create a mesh grid
    x_min, x_max = -0.5, 1.5
    y_min, y_max = -0.5, 1.5
    h = 0.01
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                        np.arange(y_min, y_max, h))
    
    # Make predictions for each point in the mesh
    Z = nn.forward(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.figure(figsize=(10, 8))
    plt.contourf(xx, yy, Z, levels=20, cmap='RdBu', alpha=0.6)
    plt.colorbar(label='Network Output')
    
    # Plot training points
    colors = ['red' if label == 0 else 'blue' for label in y.flatten()]
    plt.scatter(X[:, 0], X[:, 1], c=colors, s=200, edgecolors='black', linewidth=2, zorder=5)
    
    # Add labels
    for i, (x, y_coord) in enumerate(X):
        plt.annotate(f'{y[i][0]}', 
                    xy=(x, y_coord), 
                    ha='center', 
                    va='center',
                    fontsize=14,
                    fontweight='bold',
                    color='white')
    
    plt.xlabel('Input A', fontsize=14)
    plt.ylabel('Input B', fontsize=14)
    plt.title('Decision Boundary Learned by Neural Network\n🔴 Red regions → 0 | 🔵 Blue regions → 1', 
             fontsize=16, fontweight='bold')
    plt.grid(True, alpha=0.3)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.tight_layout()
    plt.show()

plot_decision_boundary(nn, X_train, y_train)

print("\n✨ The network created a non-linear decision boundary!")
print("   This is what makes neural networks powerful.")

## Part 8: Understanding What Happened

### Key Insights:

1. **Non-linearity is crucial:** The sigmoid activation function allows the network to create curved decision boundaries.

2. **Hidden layers enable complexity:** The 2 hidden neurons learned to represent the problem in a way that makes it linearly separable in their space.

3. **Learning through gradients:** Backpropagation adjusted weights to minimize the error between predictions and targets.

4. **Small networks can solve XOR:** We only needed 9 parameters (weights + biases) to solve this problem!
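
To make the second insight concrete, here is a sketch with hand-picked weights (not the ones your trained network found, which will generally differ). Under these assumed values, one hidden neuron behaves roughly like OR, the other like NAND, and the output neuron combines them like AND.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (not learned) parameters, chosen only for illustration
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])      # column 0 feeds h1, column 1 feeds h2
b1 = np.array([[-10.0, 30.0]])      # h1 ~ OR(x1, x2), h2 ~ NAND(x1, x2)
W2 = np.array([[20.0], [20.0]])
b2 = np.array([[-30.0]])            # output ~ AND(h1, h2)

hidden = sigmoid(X @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

for inputs, h, out in zip(X, hidden, output):
    print(inputs, "-> hidden", np.round(h, 3), "-> output", np.round(out[0], 3))
```

In the hidden space, the two inputs with target 1 both land near (1, 1), while the inputs with target 0 land near (0, 1) and (1, 0). A single line such as h1 + h2 = 1.5 now separates the classes, which is what the output neuron implements.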

### What's Next?

In **Lesson 1B**, we'll scale up to recognize handwritten digits using the MNIST dataset. You'll see how these same principles apply to real-world image classification!

## 🎓 Exercises (Optional)

Try modifying the code to explore these questions:

1. **Change the hidden layer size:** What happens with 3 or 4 hidden neurons? Does it train faster?

2. **Adjust the learning rate:** Try values like 0.1, 1.0, or 2.0. What happens to training?

3. **Different activation functions:** Can you implement ReLU instead of sigmoid?

4. **Visualize intermediate steps:** Print the hidden layer activations for each input. What patterns do you see?
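
As a possible starting point for exercise 3, here is a minimal sketch of ReLU and its derivative. The function names `relu` and `relu_derivative` are suggestions, not part of the lesson's code, and wiring them into the network is left to you.

```python
import numpy as np

def relu(x):
    """ReLU activation: passes positive values through, zeroes out the rest."""
    return np.maximum(0, x)

def relu_derivative(x):
    """Slope of ReLU: 1 where x > 0, otherwise 0 (the value at exactly 0 is a convention)."""
    return (x > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(relu_derivative(z))
```

One design choice to consider: since the targets are 0 or 1, you may want to keep sigmoid on the output neuron and swap in ReLU only for the hidden layer. ReLU units can also get stuck outputting zero, so expect results to depend more on the random initialization.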

---

**Congratulations!** 🎉 You've built and trained your first neural network from scratch!

**Next:** Head to **Lesson 1B** to tackle handwritten digit recognition with MNIST.