# 🧠 Notebook 01: Neural Networks Fundamentals

**Week 3-4: Deep Learning & NLP Foundations**  
**Gen AI Masters Program**

---

## 📋 Objectives

By the end of this notebook, you will master:
1. ✅ Neural network architecture and components
2. ✅ Forward propagation
3. ✅ Backpropagation and gradient descent
4. ✅ Activation functions
5. ✅ Building neural networks with PyTorch
6. ✅ Training and evaluation

**Estimated Time:** 3-4 hours

---

## 📚 What are Neural Networks?

Neural networks are the foundation of deep learning and Gen AI. Loosely inspired by biological neurons, they stack simple computational units whose weights are learned from data, which lets them model complex, non-linear patterns.

**Key Components:**
- 🔵 **Neurons**: Basic computational units
- 🔗 **Layers**: Input, Hidden, Output
- ⚡ **Weights & Biases**: Learnable parameters
- 🎯 **Activation Functions**: Add non-linearity
- 📉 **Loss Function**: Measure error
- 🔄 **Optimizer**: Update weights

Let's dive deep! 🚀

In [None]:
# Import libraries
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✅ Using device: {device}")
print(f"PyTorch version: {torch.__version__}")

## 1️⃣ The Perceptron: Building Block of Neural Networks

### Single Neuron (Perceptron)

A perceptron performs:
```
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + b)
       = activation(W·X + b)
```
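
To make the formula concrete, here is a small worked example with hand-picked numbers. The weights, bias, and inputs are illustrative assumptions (they are not values used elsewhere in this notebook), and sigmoid is assumed as the activation.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values chosen only for illustration
weights = [0.4, -0.2]
inputs = [0.5, 0.8]
bias = 0.1

# Weighted sum: z = w1*x1 + w2*x2 + b = 0.20 - 0.16 + 0.10 = 0.14
z = sum(w * x for w, x in zip(weights, inputs)) + bias

# The activation squashes z into (0, 1)
output = sigmoid(z)

print(f"z = {z:.2f}")            # z = 0.14
print(f"output = {output:.4f}")  # output = 0.5349
```

Because the output is just above 0.5, a 0.5 threshold would classify this input as class 1.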

In [None]:
# Implement a simple perceptron from scratch
class Perceptron:
    def __init__(self, input_size):
        # Initialize weights and bias randomly
        self.weights = np.random.randn(input_size)
        self.bias = np.random.randn()
    
    def sigmoid(self, x):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-x))
    
    def forward(self, X):
        """Forward pass"""
        # Linear combination
        z = np.dot(X, self.weights) + self.bias
        # Apply activation
        return self.sigmoid(z)
    
    def predict(self, X):
        """Make binary prediction"""
        return (self.forward(X) >= 0.5).astype(int)

# Test perceptron
perceptron = Perceptron(input_size=2)
test_input = np.array([0.5, 0.8])
output = perceptron.forward(test_input)

print("🔵 Single Perceptron Test")
print("="*50)
print(f"Input: {test_input}")
print(f"Weights: {perceptron.weights}")
print(f"Bias: {perceptron.bias:.4f}")
print(f"Output: {output:.4f}")
print(f"Prediction: {perceptron.predict(test_input)}")

## 2️⃣ Activation Functions

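Activation functions introduce non-linearity. Without them, any stack of linear layers collapses into a single linear transformation, so the network could only learn linear decision boundaries no matter how deep it is.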
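
The sketch below illustrates why the non-linearity matters, using a tiny 2 → 3 → 1 network in NumPy. All weights and inputs are hand-picked, hypothetical values. It checks that two linear layers without an activation are equivalent to one merged linear layer, and that inserting a ReLU breaks this equivalence.

```python
import numpy as np

# Hand-picked (hypothetical) weights for a 2 -> 3 -> 1 network
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0],
               [0.5, 0.5]])
b1 = np.zeros(3)
W2 = np.array([[1.0, 1.0, 1.0]])
b2 = np.zeros(1)

# Three example inputs, one per row
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Two linear layers with no activation in between
two_linear_layers = (X @ W1.T + b1) @ W2.T + b2

# Equivalent single layer: W = W2 W1, b = W2 b1 + b2
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
one_linear_layer = X @ W_merged.T + b_merged

# The same two layers with a ReLU in between
with_relu = np.maximum(0, X @ W1.T + b1) @ W2.T + b2

print(np.allclose(two_linear_layers, one_linear_layer))  # True
print(np.allclose(two_linear_layers, with_relu))         # False
```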

In [None]:
# Common activation functions
x = np.linspace(-5, 5, 100)

# Define activation functions
sigmoid = 1 / (1 + np.exp(-x))
tanh = np.tanh(x)
relu = np.maximum(0, x)
leaky_relu = np.where(x > 0, x, 0.01 * x)

# Plot
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Sigmoid
axes[0, 0].plot(x, sigmoid, 'b-', linewidth=2)
axes[0, 0].set_title('Sigmoid: σ(x) = 1/(1+e⁻ˣ)', fontweight='bold', fontsize=12)
axes[0, 0].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[0, 0].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].set_ylabel('Output', fontweight='bold')
axes[0, 0].text(0.5, 0.9, 'Range: (0, 1)\nUse: Binary classification', 
                transform=axes[0, 0].transAxes, bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Tanh
axes[0, 1].plot(x, tanh, 'g-', linewidth=2)
axes[0, 1].set_title('Tanh: tanh(x)', fontweight='bold', fontsize=12)
axes[0, 1].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[0, 1].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].set_ylabel('Output', fontweight='bold')
axes[0, 1].text(0.5, 0.9, 'Range: (-1, 1)\nUse: Hidden layers', 
                transform=axes[0, 1].transAxes, bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))

# ReLU
axes[1, 0].plot(x, relu, 'r-', linewidth=2)
axes[1, 0].set_title('ReLU: max(0, x)', fontweight='bold', fontsize=12)
axes[1, 0].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[1, 0].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].set_xlabel('Input', fontweight='bold')
axes[1, 0].set_ylabel('Output', fontweight='bold')
axes[1, 0].text(0.5, 0.9, 'Range: [0, ∞)\nUse: Default for hidden layers', 
                transform=axes[1, 0].transAxes, bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.5))

# Leaky ReLU
axes[1, 1].plot(x, leaky_relu, 'm-', linewidth=2)
axes[1, 1].set_title('Leaky ReLU: max(0.01x, x)', fontweight='bold', fontsize=12)
axes[1, 1].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[1, 1].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].set_xlabel('Input', fontweight='bold')
axes[1, 1].set_ylabel('Output', fontweight='bold')
axes[1, 1].text(0.5, 0.9, 'Range: (-∞, ∞)\nUse: Fixes dying ReLU', 
                transform=axes[1, 1].transAxes, bbox=dict(boxstyle='round', facecolor='plum', alpha=0.5))

plt.tight_layout()
plt.show()

print("\n📊 Activation Function Properties:")
print("="*60)
print("Sigmoid: Squashes values to (0,1), good for probabilities")
print("Tanh: Squashes values to (-1,1), zero-centered")
print("ReLU: Cheap to compute, mitigates vanishing gradients for positive inputs, common default for hidden layers")
print("Leaky ReLU: Fixes dying ReLU problem")
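
The vanishing-gradient and dying-ReLU remarks above can be made concrete by looking at the derivatives. The sketch below is a simplification: it multiplies only the activation derivatives across a hypothetical 10-layer chain and ignores the weight matrices that also appear in the chain rule. The helper names are introduced here for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s(x) * (1 - s(x)); it peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, slope=0.01):
    """Derivative of Leaky ReLU: 1 for positive inputs, a small slope otherwise."""
    return 1.0 if x > 0 else slope

depth = 10  # hypothetical number of stacked layers

# Best case for sigmoid: every pre-activation sits exactly at 0
sigmoid_chain = sigmoid_grad(0.0) ** depth
# ReLU with positive pre-activations passes the gradient through unchanged
relu_chain = relu_grad(1.0) ** depth

print(f"sigmoid chain: {sigmoid_chain:.2e}")  # sigmoid chain: 9.54e-07
print(f"relu chain:    {relu_chain:.1f}")     # relu chain:    1.0

# Dying ReLU: a negative pre-activation gives zero gradient,
# while Leaky ReLU keeps a small non-zero gradient
print(relu_grad(-2.0), leaky_relu_grad(-2.0))  # 0.0 0.01
```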

## 3️⃣ Building a Neural Network with PyTorch

### Multi-Layer Perceptron (MLP)

In [None]:
# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)  # Input to hidden
        self.relu = nn.ReLU()                           # Activation
        self.fc2 = nn.Linear(hidden_size, output_size)  # Hidden to output
        self.sigmoid = nn.Sigmoid()                     # Output activation
    
    def forward(self, x):
        """Forward pass through the network"""
        x = self.fc1(x)      # Linear transformation
        x = self.relu(x)     # Non-linear activation
        x = self.fc2(x)      # Linear transformation
        x = self.sigmoid(x)  # Output activation
        return x

# Create model
model = SimpleNN(input_size=2, hidden_size=4, output_size=1)
print("🧠 Neural Network Architecture")
print("="*50)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params}")

# Test forward pass
test_input = torch.tensor([[0.5, 0.8]], dtype=torch.float32)
output = model(test_input)
print(f"\nTest input: {test_input.numpy()}")
print(f"Test output: {output.item():.4f}")
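
As a sanity check on the parameter count printed above, the total can be derived by hand: each `nn.Linear(n_in, n_out)` layer stores `n_in * n_out` weights plus `n_out` biases. The helper below, `count_mlp_params`, is a hypothetical name introduced for this sketch and is not part of PyTorch.

```python
def count_mlp_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first,
    e.g. [2, 4, 1] for a 2 -> 4 -> 1 network.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

# 2 -> 4 -> 1: (2*4 + 4) + (4*1 + 1) = 12 + 5
print(count_mlp_params([2, 4, 1]))  # 17
```

The result should match `total_params` for `SimpleNN(input_size=2, hidden_size=4, output_size=1)`, since the ReLU and Sigmoid modules have no learnable parameters.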

## 4️⃣ Training a Neural Network

### XOR Problem (Classic Non-Linear Problem)

In [None]:
# XOR dataset (not linearly separable)
X_xor = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y_xor = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

print("⚡ XOR Problem")
print("="*50)
print("Input (X) | Output (y)")
print("-"*50)
for i in range(len(X_xor)):
    print(f"  {X_xor[i].numpy()}  |    {int(y_xor[i].item())}")

# Visualize XOR problem
plt.figure(figsize=(8, 6))
X_np, y_np = X_xor.numpy(), y_xor.numpy().ravel()
# Plot each class separately so the legend gets one correctly colored entry per class
for cls, color in [(0, 'red'), (1, 'blue')]:
    mask = y_np == cls
    plt.scatter(X_np[mask, 0], X_np[mask, 1], c=color, s=200,
                edgecolors='black', linewidths=2, label=f'Class {cls}')
plt.title('XOR Problem (Not Linearly Separable)', fontweight='bold', fontsize=14)
plt.xlabel('X₁', fontweight='bold')
plt.ylabel('X₂', fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()
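
To see what "not linearly separable" means in practice, the sketch below brute-forces a grid of linear classifiers of the form `w1*x1 + w2*x2 + b > 0` on the four XOR points. The grid range and step are arbitrary choices made for illustration. No line gets more than 3 of the 4 points right, which is why a hidden layer is needed.

```python
from itertools import product

# The four XOR points and their labels
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]

def n_correct(w1, w2, b):
    """Number of XOR points classified correctly by the line w1*x1 + w2*x2 + b = 0."""
    preds = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in X]
    return sum(p == t for p, t in zip(preds, y))

# Candidate values from -2.0 to 2.0 in steps of 0.25 (an arbitrary grid)
grid = [i * 0.25 for i in range(-8, 9)]

best = max(n_correct(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(f"Best linear classifier: {best}/4 correct")  # Best linear classifier: 3/4 correct
```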

In [None]:
# Create and train model for XOR
model_xor = SimpleNN(input_size=2, hidden_size=4, output_size=1)

# Define loss and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer = optim.Adam(model_xor.parameters(), lr=0.1)

# Training loop
epochs = 1000
losses = []

print("\n🔄 Training Neural Network on XOR...")
print("="*50)

for epoch in range(epochs):
    # Forward pass
    outputs = model_xor(X_xor)
    loss = criterion(outputs, y_xor)
    
    # Backward pass and optimization
    optimizer.zero_grad()  # Clear gradients
    loss.backward()        # Compute gradients
    optimizer.step()       # Update weights
    
    losses.append(loss.item())
    
    # Print progress
    if (epoch + 1) % 100 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

print("\n✅ Training Complete!")

# Test the trained model
with torch.no_grad():
    predictions = model_xor(X_xor)
    predicted_classes = (predictions >= 0.5).float()

print("\n🎯 Results:")
print("="*50)
print("Input | True | Predicted | Probability")
print("-"*50)
for i in range(len(X_xor)):
    print(f"{X_xor[i].numpy()} | {int(y_xor[i].item())} | {int(predicted_classes[i].item())} | {predictions[i].item():.4f}")

accuracy = (predicted_classes == y_xor).float().mean().item() * 100  # .item() converts the 0-dim tensor to a Python float
print(f"\nAccuracy: {accuracy:.2f}%")

In [None]:
# Plot training loss
plt.figure(figsize=(10, 6))
plt.plot(losses, linewidth=2)
plt.title('Training Loss Over Time', fontweight='bold', fontsize=14)
plt.xlabel('Epoch', fontweight='bold')
plt.ylabel('Loss (BCE)', fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
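
The training loop above relies on `loss.backward()` to compute gradients and `optimizer.step()` to apply them. The sketch below opens that black box for the simplest possible case: a single sigmoid neuron with binary cross-entropy loss, where the gradient has the closed form `X.T @ (p - y) / N`. The toy data and learning rate are assumptions, and plain gradient descent is used here instead of Adam to keep the update rule visible.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data (hypothetical): 4 samples, 2 features, binary targets
X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([[0.0], [1.0], [1.0], [1.0]])

# Parameters of a single neuron
w = torch.randn(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Forward pass: linear combination, sigmoid, then BCE loss
p = torch.sigmoid(X @ w + b)
loss = F.binary_cross_entropy(p, y)

# Backward pass: autograd fills w.grad and b.grad
loss.backward()

# The same gradients derived by hand for sigmoid + BCE
with torch.no_grad():
    error = p - y                          # per-sample dL/dz, before averaging
    manual_grad_w = X.T @ error / len(X)
    manual_grad_b = error.mean(dim=0)

print(torch.allclose(w.grad, manual_grad_w, atol=1e-6))  # True
print(torch.allclose(b.grad, manual_grad_b, atol=1e-6))  # True

# One plain gradient descent step: move against the gradient
lr = 0.1
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad
    new_loss = F.binary_cross_entropy(torch.sigmoid(X @ w + b), y)

print(f"loss before: {loss.item():.4f}, after one step: {new_loss.item():.4f}")
```

For multi-layer networks the hand derivation gets longer because the chain rule has to pass through every layer, which is exactly the bookkeeping that autograd automates.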

## 5️⃣ Real-World Example: Defect Classification for Manufacturing Copilot

Our goal is to build a neural network that can predict whether a manufactured part is defective based on sensor readings. This is a critical component for our **Manufacturing Copilot**, enabling it to flag quality issues in real time.

We'll use two features, `temperature_deviation` and `pressure_deviation`, which represent deviations from the optimal manufacturing process. The data is synthetic: `make_circles` generates two noisy concentric rings, so the relationship between the deviations and the defect label is non-linear by construction.

In [None]:
# Generate synthetic manufacturing data
np.random.seed(42)
n_samples = 1000

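# Generate non-linear data: two noisy concentric rings
X, y = make_circles(n_samples=n_samples, noise=0.1, factor=0.5, random_state=42)
# make_circles labels the inner ring 1; flip the labels so that large
# deviations (outer ring) are the defective class
y = 1 - y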

# Add feature names for context
df = pd.DataFrame(X, columns=['temperature_deviation', 'pressure_deviation'])
df['is_defective'] = y

print("🏭 Manufacturing Dataset")
print("="*50)
print(f"Samples: {len(df)}")
print(f"Defective: {df['is_defective'].sum()} ({df['is_defective'].mean():.1%})")
print(f"Non-defective: {(1-df['is_defective']).sum()}")

# Visualize
plt.figure(figsize=(10, 6))
# Plot each class separately so the legend shows one correctly colored entry per class
for cls, color, name in [(0, 'green', 'Non-defective'), (1, 'red', 'Defective')]:
    mask = y == cls
    plt.scatter(X[mask, 0], X[mask, 1], c=color, alpha=0.6,
                edgecolors='black', linewidths=0.5, label=name)
plt.title('Manufacturing Data: Temperature vs Pressure Deviations', fontweight='bold', fontsize=14)
plt.xlabel('Temperature Deviation', fontweight='bold')
plt.ylabel('Pressure Deviation', fontweight='bold')
plt.legend(loc='upper right')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Split and preprocess data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

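# Convert to PyTorch tensors (explicit float32 dtype; labels get shape [N, 1] to match the model output)
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)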

print("✅ Data prepared for training")
print(f"Training samples: {len(X_train_tensor)}")
print(f"Test samples: {len(X_test_tensor)}")
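
The cell above fits the scaler on the training split only and then reuses it for the test split. The NumPy sketch below spells out what that means, using hypothetical random data: the mean and standard deviation come from the training set alone, so no information about the test set leaks into preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw features: 80 training rows and 20 test rows
train = rng.normal(loc=5.0, scale=2.0, size=(80, 2))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 2))

# "fit": statistics are estimated on the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same statistics are applied to both splits
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.mean(axis=0).round(6))  # approximately [0, 0]
print(train_scaled.std(axis=0).round(6))   # approximately [1, 1]
print(test_scaled.mean(axis=0).round(3))   # close to 0, but not exactly
```

The test split ends up only approximately standardized, which is expected: it is treated like unseen production data.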

In [None]:
# Build deeper neural network
class ManufacturingNN(nn.Module):
    def __init__(self):
        super(ManufacturingNN, self).__init__()
        self.fc1 = nn.Linear(2, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 4)
        self.fc4 = nn.Linear(4, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x

# Create model
model_manufacturing = ManufacturingNN()
print("🧠 Manufacturing Defect Classifier")
print("="*50)
print(model_manufacturing)
print(f"\nTotal parameters: {sum(p.numel() for p in model_manufacturing.parameters())}")

In [None]:
# Train the model
criterion = nn.BCELoss()
optimizer = optim.Adam(model_manufacturing.parameters(), lr=0.01)

epochs = 500
train_losses = []
test_losses = []
train_accuracies = []
test_accuracies = []

print("\n🔄 Training Manufacturing Defect Classifier...")
print("="*50)

for epoch in range(epochs):
    # Training
    model_manufacturing.train()
    optimizer.zero_grad()
    train_outputs = model_manufacturing(X_train_tensor)
    train_loss = criterion(train_outputs, y_train_tensor)
    train_loss.backward()
    optimizer.step()
    
    # Evaluation
    model_manufacturing.eval()
    with torch.no_grad():
        test_outputs = model_manufacturing(X_test_tensor)
        test_loss = criterion(test_outputs, y_test_tensor)
        
        # Calculate accuracies
        train_preds = (train_outputs >= 0.5).float()
        test_preds = (test_outputs >= 0.5).float()
        train_acc = (train_preds == y_train_tensor).float().mean().item()
        test_acc = (test_preds == y_test_tensor).float().mean().item()
    
    train_losses.append(train_loss.item())
    test_losses.append(test_loss.item())
    train_accuracies.append(train_acc)
    test_accuracies.append(test_acc)
    
    if (epoch + 1) % 100 == 0:
        print(f"Epoch [{epoch+1}/{epochs}]")
        print(f"  Train Loss: {train_loss.item():.4f}, Train Acc: {train_acc:.2%}")
        print(f"  Test Loss: {test_loss.item():.4f}, Test Acc: {test_acc:.2%}")

print("\n✅ Training Complete!")
print(f"Final Test Accuracy: {test_accuracies[-1]:.2%}")
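
The model above applies a sigmoid in `forward` and trains with `nn.BCELoss`. A common alternative, not used in this notebook, is to return raw scores (logits) from the last layer and train with `nn.BCEWithLogitsLoss`, which fuses the sigmoid into the loss in a more numerically stable way. The sketch below, using hypothetical logits and targets, shows that both routes give the same loss value.

```python
import torch
import torch.nn as nn

# Hypothetical raw model outputs (logits) and binary targets
logits = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])
targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# Route 1 (as in this notebook): sigmoid first, then BCELoss on probabilities
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# Route 2: BCEWithLogitsLoss applies the sigmoid internally
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(torch.allclose(loss_from_probs, loss_from_logits, atol=1e-6))  # True

# With route 2, probabilities for prediction are still obtained via sigmoid
probs = torch.sigmoid(logits)
preds = (probs >= 0.5).float()
print(preds.ravel().tolist())  # [1.0, 0.0, 1.0, 0.0]
```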

In [None]:
# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Loss plot
ax1.plot(train_losses, label='Train Loss', linewidth=2)
ax1.plot(test_losses, label='Test Loss', linewidth=2)
ax1.set_title('Training and Test Loss', fontweight='bold', fontsize=14)
ax1.set_xlabel('Epoch', fontweight='bold')
ax1.set_ylabel('Loss (BCE)', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Accuracy plot
ax2.plot(train_accuracies, label='Train Accuracy', linewidth=2)
ax2.plot(test_accuracies, label='Test Accuracy', linewidth=2)
ax2.set_title('Training and Test Accuracy', fontweight='bold', fontsize=14)
ax2.set_xlabel('Epoch', fontweight='bold')
ax2.set_ylabel('Accuracy', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6️⃣ Visualizing Decision Boundaries

In [None]:
# Create decision boundary visualization
def plot_decision_boundary(model, X, y, scaler):
    # Create mesh
    h = 0.02
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    
    # Predict on mesh
    mesh_data = np.c_[xx.ravel(), yy.ravel()]
    mesh_data_scaled = scaler.transform(mesh_data)
    mesh_tensor = torch.FloatTensor(mesh_data_scaled)
    
    model.eval()
    with torch.no_grad():
        Z = model(mesh_tensor).numpy()
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.figure(figsize=(12, 6))
    
    # Contour plot (reversed colormap so high defect probability is red, matching the defective points)
    plt.contourf(xx, yy, Z, levels=20, cmap='RdYlGn_r', alpha=0.6)
    plt.colorbar(label='Probability of Defect')
    
    # Scatter plot
    colors = ['green' if label == 0 else 'red' for label in y]
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black', linewidths=0.5, s=50)
    
    plt.title('Neural Network Decision Boundary', fontweight='bold', fontsize=14)
    plt.xlabel('Temperature Deviation', fontweight='bold')
    plt.ylabel('Pressure Deviation', fontweight='bold')
    plt.tight_layout()
    plt.show()

plot_decision_boundary(model_manufacturing, X_test, y_test, scaler)
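
Accuracy alone can hide the errors that matter most in defect detection: a missed defect (false negative) is usually costlier than a false alarm. The sketch below computes a confusion matrix, precision, and recall in plain Python. The label lists are made up for illustration and are not outputs of the model above.

```python
# Hypothetical ground truth and predictions (1 = defective, 0 = non-defective)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # defects correctly flagged
fn = sum(t == 1 and p == 0 for t, p in pairs)  # defects missed
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarms
tn = sum(t == 0 and p == 0 for t, p in pairs)  # good parts correctly passed

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # of the parts flagged, how many were really defective
recall = tp / (tp + fn)     # of the real defects, how many were caught

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=2 TN=2
print(f"Accuracy:  {accuracy:.3f}")        # Accuracy:  0.625
print(f"Precision: {precision:.3f}")       # Precision: 0.600
print(f"Recall:    {recall:.3f}")          # Recall:    0.750
```

To apply this to the classifier, `y_true` and `y_pred` could be replaced with the flattened test labels and thresholded test predictions.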

## 🎉 Summary

Congratulations! You've mastered neural network fundamentals!

### Key Concepts
- ✅ Neural network architecture (layers, weights, biases)
- ✅ Activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU)
- ✅ Forward propagation
- ✅ Loss functions and backpropagation
- ✅ Training with PyTorch
- ✅ Real-world classification

### What You Built
1. 🔵 Single perceptron from scratch
2. 🧠 Multi-layer neural networks
3. ⚡ XOR problem solver
4. 🏭 Manufacturing defect classifier

### Next Steps
Continue to **Notebook 02: Convolutional Neural Networks** to learn about image processing!

<div align="center">
<b>Neural networks unlocked! Ready for CNNs! 🚀</b>
</div>