# The Perceptron: Building Block of Neural Networks

## Learning Objectives
- Understand the perceptron model and its components
- Learn how perceptrons make binary decisions
- Implement the perceptron learning algorithm
- Understand linear separability and limitations
- Visualize decision boundaries

## What is a Perceptron?
A perceptron is the simplest form of artificial neural network: a single neuron that computes a weighted sum of its inputs and thresholds the result to make a binary decision.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification, make_blobs
from sklearn.model_selection import train_test_split
import pandas as pd

# Set style for better plots
# ('seaborn-v0_8' requires Matplotlib 3.6+; older releases name this style 'seaborn')
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
np.random.seed(42)

## 1. Perceptron Mathematical Model

### Mathematical Formula:
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$

Where:
- $y$: Predicted output (0 or 1)
- $x_i$: Input features
- $w_i$: Weights
- $b$: Bias
- $f$: Activation function; the perceptron uses a step function, $f(z) = 1$ if $z \ge 0$ and $f(z) = 0$ otherwise
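
As a small worked instance of the formula above, the sketch below evaluates a perceptron by hand for a single input. The weights, bias, and input values are made up for illustration and do not come from a trained model; `perceptron_output` is a hypothetical helper name.

```python
import numpy as np

def step(z):
    """Step activation: 1 when z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)

def perceptron_output(x, w, b):
    """Compute y = f(sum_i w_i * x_i + b) with a step activation."""
    z = np.dot(w, x) + b  # weighted sum plus bias
    return z, int(step(z))

# Illustrative (made-up) parameters and input
w = np.array([0.5, -0.25])
b = -0.125
x = np.array([1.0, 1.0])

# z = 0.5 * 1 + (-0.25) * 1 + (-0.125)
z, y = perceptron_output(x, w, b)
print(f"weighted sum z = {z}, output y = {y}")
```

Changing the input to `[0.0, 1.0]` with the same parameters makes the weighted sum negative, so the output flips to 0.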

In [None]:
class Perceptron:
    def __init__(self, learning_rate=0.01, max_iterations=1000):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations
        self.weights = None
        self.bias = None
        self.errors = []
    
    def step_function(self, x):
        """Step activation function"""
        return np.where(x >= 0, 1, 0)
    
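    def fit(self, X, y):
        """Train the perceptron with the sample-by-sample update rule"""
        # Initialize weights and bias; reset the error history so that
        # calling fit() again does not append to a previous run
        n_features = X.shape[1]
        self.weights = np.random.normal(0, 0.01, n_features)
        self.bias = 0
        self.errors = []
        
        # Training loop: each iteration is one full pass (epoch) over the data
        for iteration in range(self.max_iterations):
            errors = 0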
            
            for i in range(X.shape[0]):
                # Forward pass
                linear_output = np.dot(X[i], self.weights) + self.bias
                prediction = self.step_function(linear_output)
                
                # Calculate error
                error = y[i] - prediction
                
                # Update weights and bias if there's an error
                if error != 0:
                    self.weights += self.learning_rate * error * X[i]
                    self.bias += self.learning_rate * error
                    errors += 1
            
            self.errors.append(errors)
            
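            # Stop if no errors (convergence)
            if errors == 0:
                print(f"Converged after {iteration + 1} iterations")
                break
        else:
            # Runs only when the loop finished without hitting `break`
            print(f"Did not converge within {self.max_iterations} iterations")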
        
        return self
    
    def predict(self, X):
        """Make predictions"""
        linear_output = np.dot(X, self.weights) + self.bias
        return self.step_function(linear_output)
    
    def decision_boundary(self, X):
        """Calculate decision boundary values"""
        return np.dot(X, self.weights) + self.bias

print("Perceptron class implemented successfully!")

## 2. Simple Example: AND Gate

In [None]:
# AND gate truth table
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

print("AND Gate Truth Table:")
print("Input 1 | Input 2 | Output")
print("--------|---------|-------")
for i in range(len(X_and)):
    print(f"   {X_and[i][0]}    |    {X_and[i][1]}    |   {y_and[i]}")

# Train perceptron on AND gate
perceptron_and = Perceptron(learning_rate=0.1, max_iterations=100)
perceptron_and.fit(X_and, y_and)

# Test predictions
predictions = perceptron_and.predict(X_and)
print(f"\nPredictions: {predictions}")
print(f"Actual:      {y_and}")
print(f"Accuracy: {np.mean(predictions == y_and) * 100:.1f}%")

print(f"\nLearned weights: {perceptron_and.weights}")
print(f"Learned bias: {perceptron_and.bias:.3f}")

## 3. Visualizing the Learning Process

In [None]:
# Plot learning curve
plt.figure(figsize=(10, 6))
plt.plot(perceptron_and.errors, marker='o')
plt.title('Perceptron Learning Curve (AND Gate)')
plt.xlabel('Iteration')
plt.ylabel('Number of Errors')
plt.grid(True, alpha=0.3)
plt.show()

# Visualize decision boundary
def plot_decision_boundary(X, y, perceptron, title):
    plt.figure(figsize=(8, 6))
    
    # Create a mesh for plotting decision boundary
    h = 0.01
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    
    # Make predictions on the mesh
    mesh_points = np.c_[xx.ravel(), yy.ravel()]
    Z = perceptron.predict(mesh_points)
    Z = Z.reshape(xx.shape)
    
    # Plot decision boundary
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
    
    # Plot data points
    colors = ['red', 'blue']
    for i in range(2):
        plt.scatter(X[y == i, 0], X[y == i, 1], 
                   c=colors[i], marker='o', s=100, 
                   label=f'Class {i}', edgecolors='black')
    
    plt.title(title)
    plt.xlabel('Input 1')
    plt.ylabel('Input 2')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

plot_decision_boundary(X_and, y_and, perceptron_and, 'Perceptron Decision Boundary (AND Gate)')

## 4. OR Gate Example

In [None]:
# OR gate truth table
X_or = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])

print("OR Gate Truth Table:")
print("Input 1 | Input 2 | Output")
print("--------|---------|-------")
for i in range(len(X_or)):
    print(f"   {X_or[i][0]}    |    {X_or[i][1]}    |   {y_or[i]}")

# Train perceptron on OR gate
perceptron_or = Perceptron(learning_rate=0.1, max_iterations=100)
perceptron_or.fit(X_or, y_or)

# Test predictions
predictions = perceptron_or.predict(X_or)
print(f"\nPredictions: {predictions}")
print(f"Actual:      {y_or}")
print(f"Accuracy: {np.mean(predictions == y_or) * 100:.1f}%")

plot_decision_boundary(X_or, y_or, perceptron_or, 'Perceptron Decision Boundary (OR Gate)')

## 5. The XOR Problem: Perceptron Limitation
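
Before running the experiment, a short derivation shows why no choice of weights can work. Suppose a perceptron with weights $w_1, w_2$, bias $b$, and the step function (output 1 when the weighted sum is $\ge 0$) classified all four XOR inputs correctly. Each row of the truth table would then impose one condition:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b < 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b \ge 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b \ge 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0
\end{aligned}
$$

Adding the second and third conditions gives $w_1 + w_2 + 2b \ge 0$, so $w_1 + w_2 + b \ge -b > 0$ by the first condition. This contradicts the fourth condition, so no such $w_1, w_2, b$ exist.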

In [None]:
# XOR gate truth table
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

print("XOR Gate Truth Table:")
print("Input 1 | Input 2 | Output")
print("--------|---------|-------")
for i in range(len(X_xor)):
    print(f"   {X_xor[i][0]}    |    {X_xor[i][1]}    |   {y_xor[i]}")

# Try to train perceptron on XOR gate
perceptron_xor = Perceptron(learning_rate=0.1, max_iterations=100)
perceptron_xor.fit(X_xor, y_xor)

# Test predictions
predictions = perceptron_xor.predict(X_xor)
print(f"\nPredictions: {predictions}")
print(f"Actual:      {y_xor}")
print(f"Accuracy: {np.mean(predictions == y_xor) * 100:.1f}%")

# Plot XOR problem
plt.figure(figsize=(8, 6))
colors = ['red', 'blue']
for i in range(2):
    plt.scatter(X_xor[y_xor == i, 0], X_xor[y_xor == i, 1], 
               c=colors[i], marker='o', s=100, 
               label=f'Class {i}', edgecolors='black')

plt.title('XOR Problem: Not Linearly Separable')
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.legend()
plt.grid(True, alpha=0.3)
plt.text(0.5, 0.8, 'No single line can\nseparate these classes!', 
         ha='center', va='center', fontsize=12, 
         bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.7))
plt.show()

# Show learning curve for XOR
plt.figure(figsize=(10, 6))
plt.plot(perceptron_xor.errors, marker='o', color='red')
plt.title('Perceptron Learning Curve (XOR Gate) - No Convergence')
plt.xlabel('Iteration')
plt.ylabel('Number of Errors')
plt.grid(True, alpha=0.3)
plt.show()

## 6. Synthetic Dataset Example

In [None]:
# Generate a synthetic two-class dataset (two Gaussian blobs; linearly separable when the blobs do not overlap)
X, y = make_blobs(n_samples=100, centers=2, n_features=2, 
                  random_state=42, cluster_std=1.5)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train perceptron
perceptron_real = Perceptron(learning_rate=0.01, max_iterations=1000)
perceptron_real.fit(X_train, y_train)

# Make predictions
train_predictions = perceptron_real.predict(X_train)
test_predictions = perceptron_real.predict(X_test)

# Calculate accuracy
train_accuracy = np.mean(train_predictions == y_train) * 100
test_accuracy = np.mean(test_predictions == y_test) * 100

print(f"Training Accuracy: {train_accuracy:.1f}%")
print(f"Test Accuracy: {test_accuracy:.1f}%")

# Visualize results
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Training data
colors = ['red', 'blue']
for i in range(2):
    axes[0].scatter(X_train[y_train == i, 0], X_train[y_train == i, 1], 
                   c=colors[i], marker='o', s=50, 
                   label=f'Class {i}', alpha=0.7)

# Plot decision boundary for training data
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))
mesh_points = np.c_[xx.ravel(), yy.ravel()]
Z = perceptron_real.predict(mesh_points)
Z = Z.reshape(xx.shape)
axes[0].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
axes[0].set_title(f'Training Data (Accuracy: {train_accuracy:.1f}%)')
axes[0].set_xlabel('Feature 1')
axes[0].set_ylabel('Feature 2')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Test data
for i in range(2):
    axes[1].scatter(X_test[y_test == i, 0], X_test[y_test == i, 1], 
                   c=colors[i], marker='o', s=50, 
                   label=f'Class {i}', alpha=0.7)

# Plot decision boundary for test data
x_min, x_max = X_test[:, 0].min() - 1, X_test[:, 0].max() + 1
y_min, y_max = X_test[:, 1].min() - 1, X_test[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))
mesh_points = np.c_[xx.ravel(), yy.ravel()]
Z = perceptron_real.predict(mesh_points)
Z = Z.reshape(xx.shape)
axes[1].contourf(xx, yy, Z, alpha=0.3, cmap='RdYlBu')
axes[1].set_title(f'Test Data (Accuracy: {test_accuracy:.1f}%)')
axes[1].set_xlabel('Feature 1')
axes[1].set_ylabel('Feature 2')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Plot learning curve
plt.figure(figsize=(10, 6))
plt.plot(perceptron_real.errors, marker='o')
plt.title('Perceptron Learning Curve (Synthetic Dataset)')
plt.xlabel('Iteration')
plt.ylabel('Number of Errors')
plt.grid(True, alpha=0.3)
plt.show()

## 7. Perceptron vs Scikit-learn Comparison

In [None]:
from sklearn.linear_model import Perceptron as SKPerceptron
from sklearn.metrics import accuracy_score, classification_report

# Train scikit-learn perceptron
sk_perceptron = SKPerceptron(max_iter=1000, random_state=42)
sk_perceptron.fit(X_train, y_train)

# Make predictions
sk_train_pred = sk_perceptron.predict(X_train)
sk_test_pred = sk_perceptron.predict(X_test)

# Compare results
print("Comparison: Our Implementation vs Scikit-learn")
print("=" * 50)
print(f"Our Perceptron - Train Accuracy: {train_accuracy:.1f}%")
print(f"Our Perceptron - Test Accuracy:  {test_accuracy:.1f}%")
print(f"Scikit-learn - Train Accuracy:   {accuracy_score(y_train, sk_train_pred) * 100:.1f}%")
print(f"Scikit-learn - Test Accuracy:    {accuracy_score(y_test, sk_test_pred) * 100:.1f}%")

print("\nDetailed Classification Report (Scikit-learn):")
print(classification_report(y_test, sk_test_pred))

## 8. Understanding Linear Separability

In [None]:
# Create different types of datasets
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Linearly separable data
X1, y1 = make_blobs(n_samples=100, centers=2, n_features=2, 
                    random_state=42, cluster_std=1.0)
axes[0, 0].scatter(X1[y1 == 0, 0], X1[y1 == 0, 1], c='red', alpha=0.7, label='Class 0')
axes[0, 0].scatter(X1[y1 == 1, 0], X1[y1 == 1, 1], c='blue', alpha=0.7, label='Class 1')
axes[0, 0].set_title('Linearly Separable\n(Perceptron will work)')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Overlapping classes
X2, y2 = make_blobs(n_samples=100, centers=2, n_features=2, 
                    random_state=42, cluster_std=2.5)
axes[0, 1].scatter(X2[y2 == 0, 0], X2[y2 == 0, 1], c='red', alpha=0.7, label='Class 0')
axes[0, 1].scatter(X2[y2 == 1, 0], X2[y2 == 1, 1], c='blue', alpha=0.7, label='Class 1')
axes[0, 1].set_title('Overlapping Classes\n(Perceptron may struggle)')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# 3. Circular pattern (not linearly separable)
theta = np.linspace(0, 2*np.pi, 50)
r1, r2 = 2, 4
X3_inner = np.column_stack([r1 * np.cos(theta), r1 * np.sin(theta)])
X3_outer = np.column_stack([r2 * np.cos(theta), r2 * np.sin(theta)])
X3 = np.vstack([X3_inner, X3_outer])
y3 = np.hstack([np.zeros(50), np.ones(50)])
axes[1, 0].scatter(X3[y3 == 0, 0], X3[y3 == 0, 1], c='red', alpha=0.7, label='Class 0')
axes[1, 0].scatter(X3[y3 == 1, 0], X3[y3 == 1, 1], c='blue', alpha=0.7, label='Class 1')
axes[1, 0].set_title('Circular Pattern\n(Not linearly separable)')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 4. XOR pattern
X4 = np.array([[0, 0], [0, 1], [1, 0], [1, 1], 
               [0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]])
y4 = np.array([0, 1, 1, 0, 0, 1, 1, 0])
axes[1, 1].scatter(X4[y4 == 0, 0], X4[y4 == 0, 1], c='red', alpha=0.7, s=100, label='Class 0')
axes[1, 1].scatter(X4[y4 == 1, 0], X4[y4 == 1, 1], c='blue', alpha=0.7, s=100, label='Class 1')
axes[1, 1].set_title('XOR Pattern\n(Not linearly separable)')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 🎯 Key Takeaways

### What We Learned:
1. **Perceptron Structure**: Single neuron with weights, bias, and step activation
2. **Learning Algorithm**: Weight updates based on prediction errors
3. **Linear Separability**: A single perceptron can only solve linearly separable problems
4. **Limitations**: A single perceptron cannot solve XOR or other problems that are not linearly separable
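
Points 3 and 4 can be checked empirically. The sketch below is a compact, standalone version of the update rule; the helper name `converges`, the zero initialization, and the epoch cap are illustrative assumptions rather than part of the class defined earlier. An error-free epoch shows the data are linearly separable, while hitting the cap only suggests they are not.

```python
import numpy as np

def converges(X, y, learning_rate=1.0, max_epochs=100):
    """Return True if the perceptron rule reaches an error-free epoch.

    Hypothetical helper for illustration: weights start at zero, and
    training stops at the first full pass with no misclassifications.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            prediction = 1 if np.dot(w, xi) + b >= 0 else 0
            error = target - prediction
            if error != 0:
                w += learning_rate * error * xi
                b += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            return True
    return False

X_gate = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
gates = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

results = {name: converges(X_gate, labels) for name, labels in gates.items()}
for name, ok in results.items():
    print(f"{name}: converged = {ok}")
```

AND and OR reach an error-free epoch after a handful of passes, whereas XOR never does, whatever the cap is set to.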

### Mathematical Insights:
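- **Decision Boundary**: $w_1x_1 + w_2x_2 + b = 0$ (a straight line for two input features)
- **Weight Update Rule**: $w_{\text{new}} = w_{\text{old}} + \eta \cdot (y - \hat{y}) \cdot x$, where $y - \hat{y}$ is the prediction error; the bias follows $b_{\text{new}} = b_{\text{old}} + \eta \cdot (y - \hat{y})$
- **Convergence**: Guaranteed in a finite number of updates for linearly separable data (perceptron convergence theorem)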
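
The update rule above can be traced by hand for a single misclassified sample. The starting weights, learning rate, and sample below are illustrative values chosen for this sketch, not output from the training runs earlier; `boundary_x2` is a hypothetical helper.

```python
import numpy as np

def boundary_x2(x1, w, b):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (valid only when w2 != 0)."""
    return -(w[0] * x1 + b) / w[1]

# Illustrative starting point (made-up values)
w = np.array([0.5, -1.0])
b = -0.5
eta = 0.5                      # learning rate

x = np.array([1.0, 1.0])       # input sample
y_true = 1                     # desired label

# Forward pass: z = 0.5 - 1.0 - 0.5 = -1.0, so the step function predicts 0
z = np.dot(w, x) + b
y_pred = 1 if z >= 0 else 0
error = y_true - y_pred        # 1 - 0 = +1

# Update: w_new = w_old + eta * error * x, b_new = b_old + eta * error
w_new = w + eta * error * x
b_new = b + eta * error

# Re-evaluate the same sample with the updated parameters
z_after = np.dot(w_new, x) + b_new

print("w_new =", w_new, "b_new =", b_new)
print("weighted sum after update:", z_after)
print("boundary x2 at x1 = 1:", boundary_x2(1.0, w_new, b_new))
```

In this instance one update is enough to move the sample to the correct side of the boundary; in general several passes over the data may be needed.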

### Practical Applications:
- Binary classification tasks
- Linear decision boundaries
- Foundation for multi-layer networks

## 📝 Exercises

### Beginner:
1. Implement NAND gate using perceptron
2. Experiment with different learning rates
3. Visualize weight changes during training

### Intermediate:
1. Add momentum to the learning algorithm
2. Implement different activation functions
3. Create a perceptron for multi-class classification

### Advanced:
1. Analyze convergence properties mathematically
2. Implement pocket algorithm for non-separable data
3. Compare with other linear classifiers

## 🔗 Next Steps

The perceptron's limitation with non-linearly separable data leads us to:
1. **Multi-Layer Perceptrons (MLPs)**: Adding hidden layers
2. **Non-linear Activation Functions**: Beyond step functions
3. **Backpropagation**: Training multi-layer networks
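
To make the first item concrete, the sketch below hand-wires a two-layer network of step units that computes XOR as AND(OR(x), NAND(x)). The weights are picked by hand for illustration (one of many valid choices) rather than learned; learning such weights automatically is what backpropagation addresses.

```python
import numpy as np

def step(z):
    """Step activation: 1 when z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)

# Hidden layer (hand-picked weights): column 0 acts as OR, column 1 as NAND
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output unit: AND of the two hidden units
w_out = np.array([1.0, 1.0])
b_out = -1.5

def xor_network(X):
    """Forward pass through the two-layer step network."""
    hidden = step(X @ W_hidden + b_hidden)   # shape (n_samples, 2)
    return step(hidden @ w_out + b_out)      # shape (n_samples,)

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
predictions = xor_network(X_xor)
print("Predictions:", predictions)
```

Each hidden unit draws one straight line through the input space, and the output unit combines the two half-planes, which a single perceptron cannot do.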

**Next Notebook**: [Multi-Layer Perceptrons](./02_multilayer_perceptron.ipynb) →