# Logistic Regression for Cat vs Non-Cat Image Classification

## Overview
This notebook implements a logistic regression model from scratch for binary image classification. We'll classify images as either containing a cat (1) or not containing a cat (0).

## What is Logistic Regression?
Logistic regression is a binary classification algorithm that:
1. Takes input features (flattened image pixels)
2. Computes a weighted sum: z = w^T * x + b
3. Applies sigmoid activation: a = σ(z) = 1/(1 + e^(-z))
4. Outputs a probability between 0 and 1
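
The four steps above can be traced by hand on a tiny input. The sketch below uses a made-up 3-feature vector with hypothetical weights and bias (none of these values come from this notebook), purely to show the arithmetic.

```python
import numpy as np

# Hypothetical 3-feature example; all values are illustrative only.
x = np.array([0.2, 0.5, 0.9])    # step 1: input features
w = np.array([1.0, -2.0, 0.5])   # weights (made up for this example)
b = 0.1                          # bias (made up for this example)

z = np.dot(w, x) + b             # step 2: weighted sum w^T x + b
a = 1.0 / (1.0 + np.exp(-z))     # step 3: sigmoid activation

# step 4: a is a probability strictly between 0 and 1
print(f"z = {z:.3f}, a = {a:.3f}")  # prints: z = -0.250, a = 0.438
```

Because z is negative here, the probability falls below 0.5, so this input would be labeled 0 under a 0.5 threshold.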

## Model Architecture
- **Input**: Flattened image (64x64x3 = 12,288 features)
- **Parameters**: Weight vector w of shape (12288, 1) and bias b (scalar)
- **Output**: Probability of being a cat (0 to 1)

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

## Step 1: Create Dummy Dataset

We'll create synthetic image data to simulate cat and non-cat images:
- **Image dimensions**: 64x64 pixels with 3 color channels (RGB)
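- **Total**: 200 examples (100 cats, 100 non-cats), shuffled before splitting
- **Training set**: 150 examples (75%)
- **Test set**: 50 examples (25%); because the split follows a random shuffle, the class balance in each set is only approximately 50/50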

### Dataset characteristics:
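- **Cat images**: Uniform noise in [0, 100) with the central 32x32 region brightened by 100 (simulating a cat face/body)
- **Non-cat images**: Uniform noise in [0, 150) with no bright central region
- Both classes have the same average brightness (about 75), so the class signal is spatial rather than overall intensity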

In [None]:
def create_dummy_cat_dataset(n_samples=200, img_height=64, img_width=64, n_channels=3, test_ratio=0.2):
    """
    Create a dummy dataset of cat and non-cat images.
    
    Arguments:
    n_samples -- total number of samples to generate
    img_height -- height of each image
    img_width -- width of each image
    n_channels -- number of color channels (3 for RGB)
    test_ratio -- proportion of data to use for testing
    
    Returns:
    train_set_x -- training images (n_train, img_height, img_width, n_channels)
    train_set_y -- training labels (1, n_train)
    test_set_x -- test images (n_test, img_height, img_width, n_channels)
    test_set_y -- test labels (1, n_test)
    """
    
    # Calculate number of samples for each class
    # (n_non_cats takes the remainder so the counts sum to n_samples even when it is odd)
    n_cats = n_samples // 2
    n_non_cats = n_samples - n_cats
    
    # Generate CAT images (label = 1)
    # Cats have higher intensity in center region (simulating cat face/body)
    cat_images = np.random.rand(n_cats, img_height, img_width, n_channels) * 100
    # Add brighter central region for cats
    center_h_start, center_h_end = img_height // 4, 3 * img_height // 4
    center_w_start, center_w_end = img_width // 4, 3 * img_width // 4
    cat_images[:, center_h_start:center_h_end, center_w_start:center_w_end, :] += 100
    cat_images = np.clip(cat_images, 0, 255)
    
    # Generate NON-CAT images (label = 0)
    # Non-cats are uniform noise over a wider range, with no bright central region
    non_cat_images = np.random.rand(n_non_cats, img_height, img_width, n_channels) * 150
    non_cat_images = np.clip(non_cat_images, 0, 255)
    
    # Combine and create labels
    all_images = np.vstack([cat_images, non_cat_images])
    all_labels = np.hstack([np.ones(n_cats), np.zeros(n_non_cats)])
    
    # Shuffle the dataset
    indices = np.random.permutation(n_samples)
    all_images = all_images[indices]
    all_labels = all_labels[indices]
    
    # Split into train and test sets
    n_test = int(n_samples * test_ratio)
    n_train = n_samples - n_test
    
    train_set_x = all_images[:n_train]
    train_set_y = all_labels[:n_train].reshape(1, n_train)
    
    test_set_x = all_images[n_train:]
    test_set_y = all_labels[n_train:].reshape(1, n_test)
    
    return train_set_x, train_set_y, test_set_x, test_set_y


# Create the dataset
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y = create_dummy_cat_dataset(
    n_samples=200, img_height=64, img_width=64, n_channels=3, test_ratio=0.25
)

print("Dataset created successfully!")
print(f"Training set: {train_set_x_orig.shape[0]} examples")
print(f"Test set: {test_set_x_orig.shape[0]} examples")
print(f"Image shape: {train_set_x_orig.shape[1:]}")
print(f"Train labels shape: {train_set_y.shape}")
print(f"Number of cats in training: {int(np.sum(train_set_y))}")
print(f"Number of non-cats in training: {int(train_set_y.shape[1] - np.sum(train_set_y))}")

## Step 2: Visualize Sample Images

Let's visualize some examples from our dataset to understand what we're working with.

In [None]:
# Visualize some examples
def visualize_samples(X, y, n_samples=6):
    """
    Visualize sample images from the dataset.
    
    Arguments:
    X -- images array
    y -- labels array
    n_samples -- number of samples to display
    """
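    # Size the grid from n_samples instead of assuming exactly 6 images
    n_cols = 3
    n_rows = int(np.ceil(n_samples / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 4 * n_rows))
    axes = np.atleast_1d(axes).ravel()
    
    for i in range(n_samples):
        axes[i].imshow(X[i].astype('uint8'))
        label = "Cat" if y[0, i] == 1 else "Non-Cat"
        axes[i].set_title(f"{label}")
        axes[i].axis('off')
    
    # Hide any panels left over when n_samples is not a multiple of n_cols
    for ax in axes[n_samples:]:
        ax.axis('off')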
    
    plt.tight_layout()
    plt.show()

visualize_samples(train_set_x_orig, train_set_y, n_samples=6)

## Step 3: Preprocess the Data

### Preprocessing steps:
1. **Flatten images**: Convert from (height, width, channels) to a single vector
   - Original shape: (64, 64, 3)
   - Flattened shape: (12288,) where 12288 = 64 × 64 × 3

2. **Normalize pixel values**: Scale from [0, 255] to [0, 1]
   - Helps gradient descent converge faster
   - Prevents numerical instability

3. **Transpose**: Shape becomes (n_features, n_examples)
   - Enables efficient vectorized operations
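
The flatten-then-transpose step is easy to get wrong, so here is a minimal sketch on a toy batch. The sizes (2 images of 2x2x3) and the variable names are chosen only for readability and are not part of the notebook's dataset.

```python
import numpy as np

# Toy batch: 2 "images" of 2x2 pixels with 3 channels (sizes chosen for readability).
m, h, w_px, c = 2, 2, 2, 3
images = np.arange(m * h * w_px * c, dtype=float).reshape(m, h, w_px, c)

flat = images.reshape(m, -1)   # (2, 12): one row per image
X = flat.T                     # (12, 2): one column per image
X_scaled = X / 255.0           # pixel values scaled into [0, 1]

print(images.shape, "->", flat.shape, "->", X.shape)
# Column 0 of X holds exactly the pixels of image 0, in row-major order.
print(np.array_equal(X[:, 0], images[0].ravel()))
```

Reshaping directly with `images.reshape(-1, m)` would also give a (12, 2) array, but its columns would mix pixels from different images, which is why the sketch reshapes to `(m, -1)` first and transposes afterwards.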

In [None]:
def preprocess_data(train_x, test_x):
    """
    Flatten and normalize image data.
    
    Arguments:
    train_x -- training images (m_train, height, width, channels)
    test_x -- test images (m_test, height, width, channels)
    
    Returns:
    train_x_flatten -- flattened training images (n_features, m_train)
    test_x_flatten -- flattened test images (n_features, m_test)
    """
    
    # Flatten the images
    # reshape(m, -1) flattens each image: (m, height, width, channels) -> (m, height*width*channels)
    train_x_flatten = train_x.reshape(train_x.shape[0], -1).T
    test_x_flatten = test_x.reshape(test_x.shape[0], -1).T
    
    # Normalize pixel values from [0, 255] to [0, 1]
    train_x_flatten = train_x_flatten / 255.0
    test_x_flatten = test_x_flatten / 255.0
    
    return train_x_flatten, test_x_flatten


# Preprocess the data
train_set_x, test_set_x = preprocess_data(train_set_x_orig, test_set_x_orig)

print("Data preprocessed successfully!")
print(f"Training set shape: {train_set_x.shape}")
print(f"Test set shape: {test_set_x.shape}")
print(f"Training labels shape: {train_set_y.shape}")
print(f"Test labels shape: {test_set_y.shape}")
print(f"\nEach image is now a vector of size: {train_set_x.shape[0]}")
print(f"Number of training examples: {train_set_x.shape[1]}")

## Step 4: Implement Helper Functions

### Sigmoid Activation Function
The sigmoid function σ(z) = 1 / (1 + e^(-z)) maps any real number to a value between 0 and 1.

**Properties:**
- Output range: (0, 1)
- σ(0) = 0.5
- As z → ∞, σ(z) → 1
- As z → -∞, σ(z) → 0

**Why sigmoid for binary classification?**
- Outputs can be interpreted as probabilities
- Smooth and differentiable (needed for gradient descent)
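
One practical caveat: evaluating 1 / (1 + e^(-z)) directly can overflow inside the exponential when z is a large negative number, which NumPy reports as a runtime warning. The sketch below shows one common way to avoid that; `stable_sigmoid` is a hypothetical helper name and is not used elsewhere in this notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in [0, 1], so it cannot overflow
    # For z >= 0 use 1 / (1 + e^(-z)); for z < 0 use e^z / (1 + e^z).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])
print(stable_sigmoid(z))
```

Both branches are algebraically the same function, so results match the direct formula wherever that formula is well behaved. SciPy's `scipy.special.expit` computes the same function if adding that dependency is acceptable.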

In [None]:
def sigmoid(z):
    """
    Compute the sigmoid of z.
    
    Arguments:
    z -- A scalar or numpy array of any size
    
    Returns:
    s -- sigmoid(z)
    """
    s = 1 / (1 + np.exp(-z))
    return s


# Test sigmoid function
print("Testing sigmoid function:")
print(f"sigmoid(0) = {sigmoid(0):.4f} (should be 0.5)")
print(f"sigmoid(5) = {sigmoid(5):.4f} (should be close to 1)")
print(f"sigmoid(-5) = {sigmoid(-5):.4f} (should be close to 0)")

# Visualize sigmoid function
z = np.linspace(-10, 10, 100)
plt.figure(figsize=(8, 5))
plt.plot(z, sigmoid(z), linewidth=2)
plt.grid(True, alpha=0.3)
plt.xlabel('z', fontsize=12)
plt.ylabel('sigmoid(z)', fontsize=12)
plt.title('Sigmoid Activation Function', fontsize=14)
plt.axhline(y=0.5, color='r', linestyle='--', alpha=0.5, label='y=0.5')
plt.axvline(x=0, color='r', linestyle='--', alpha=0.5, label='z=0')
plt.legend()
plt.show()

## Step 5: Initialize Parameters

We need to initialize:
- **Weight vector (w)**: Shape (n_features, 1) - initialized to zeros
- **Bias (b)**: Scalar - initialized to zero

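**Why initialize to zeros for logistic regression?**
- Logistic regression has no hidden units, so there is no symmetry to break (in neural networks, identical initial weights make hidden units learn identical features)
- The cost is convex, so the starting point does not change which minimum gradient descent heads toward; zeros are simply a convenient choice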
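
A quick way to see what zero initialization implies: every prediction starts at σ(0) = 0.5, so the starting cost is ln 2 ≈ 0.6931 regardless of the data, while the gradient is generally non-zero. The sketch below checks this on a small made-up problem (the sizes, seed, and labels are arbitrary assumptions).

```python
import numpy as np

# Hypothetical small problem: 4 features, 6 examples, arbitrary binary labels.
rng = np.random.default_rng(0)
X = rng.random((4, 6))
Y = np.array([[1, 0, 1, 1, 0, 0]])

w = np.zeros((4, 1))
b = 0.0

A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))   # every entry is sigmoid(0) = 0.5
cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
dw = X @ (A - Y).T / X.shape[1]

print(A)                  # all entries are 0.5
print(cost, np.log(2))    # both about 0.6931
print(dw.ravel())         # non-zero here, so the first update already moves w
```

An initial cost far from 0.6931 with zero-initialized parameters is therefore a useful warning sign of a bug in the cost computation.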

In [None]:
def initialize_parameters(dim):
    """
    Initialize weights and bias to zeros.
    
    Arguments:
    dim -- size of the w vector (number of features)
    
    Returns:
    w -- initialized weight vector of shape (dim, 1)
    b -- initialized bias (scalar)
    """
    w = np.zeros((dim, 1))
    b = 0.0
    
    return w, b


# Test initialization
dim = train_set_x.shape[0]
w, b = initialize_parameters(dim)
print(f"Weight vector shape: {w.shape}")
print(f"Bias value: {b}")
print(f"Number of parameters: {w.shape[0] + 1}")

## Step 6: Forward and Backward Propagation

### Forward Propagation
Compute the predictions and cost:

1. **Linear transformation**: Z = w^T X + b
2. **Activation**: A = σ(Z)
3. **Cost function** (Binary Cross-Entropy):
   
   J = -1/m ∑[y log(a) + (1-y) log(1-a)]

### Backward Propagation
Compute gradients:

- **dw** = ∂J/∂w = 1/m X(A-Y)^T
- **db** = ∂J/∂b = 1/m ∑(A-Y)

**Why Binary Cross-Entropy?**
- Penalizes confident wrong predictions heavily
- Convex for logistic regression, so any local minimum is a global minimum and gradient descent cannot get stuck in a worse one
- Derives from maximum likelihood estimation
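
The gradient formulas above can be sanity-checked numerically. The following is a minimal sketch that compares them against central finite differences on a small random problem; `cost_and_grads` is a hypothetical helper written for this check, and the sizes, seed, and step size `h` are arbitrary choices.

```python
import numpy as np

def cost_and_grads(w, b, X, Y):
    """Cost J and analytic gradients, following the formulas above."""
    m = X.shape[1]
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = X @ (A - Y).T / m
    db = np.sum(A - Y) / m
    return J, dw, db

# Small random problem (sizes and seed are arbitrary choices for this check).
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 8))
Y = rng.integers(0, 2, size=(1, 8))
w = rng.normal(size=(3, 1)) * 0.1
b = 0.05

_, dw, db = cost_and_grads(w, b, X, Y)

# Central finite differences: (J(theta + h) - J(theta - h)) / (2h)
h = 1e-6
dw_num = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i, 0] = h
    dw_num[i, 0] = (cost_and_grads(w + step, b, X, Y)[0]
                    - cost_and_grads(w - step, b, X, Y)[0]) / (2 * h)
db_num = (cost_and_grads(w, b + h, X, Y)[0]
          - cost_and_grads(w, b - h, X, Y)[0]) / (2 * h)

max_diff = max(np.max(np.abs(dw - dw_num)), abs(db - db_num))
print(f"largest analytic-vs-numeric difference: {max_diff:.2e}")
```

A difference many orders of magnitude below the gradient values themselves suggests the analytic expressions and the cost are consistent with each other.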

In [None]:
def propagate(w, b, X, Y):
    """
    Implement forward and backward propagation.
    
    Arguments:
    w -- weights, numpy array of shape (n_features, 1)
    b -- bias, scalar
    X -- input data of shape (n_features, m_examples)
    Y -- true labels of shape (1, m_examples)
    
    Returns:
    grads -- dictionary with "dw" (gradient of the cost with respect to w, shape (n_features, 1))
             and "db" (gradient of the cost with respect to b, scalar)
    cost -- binary cross-entropy cost
    """
    
    m = X.shape[1]  # number of examples
    
    # FORWARD PROPAGATION
    # Compute activation
    Z = np.dot(w.T, X) + b  # Shape: (1, m)
    A = sigmoid(Z)           # Shape: (1, m)
    
    # Compute cost
    # Add small epsilon to prevent log(0)
    epsilon = 1e-8
    cost = -1/m * np.sum(Y * np.log(A + epsilon) + (1 - Y) * np.log(1 - A + epsilon))
    
    # BACKWARD PROPAGATION
    dZ = A - Y              # Shape: (1, m)
    dw = 1/m * np.dot(X, dZ.T)  # Shape: (n_features, 1)
    db = 1/m * np.sum(dZ)       # Scalar
    
    cost = np.squeeze(cost)  # Remove unnecessary dimensions
    
    grads = {
        "dw": dw,
        "db": db
    }
    
    return grads, cost


# Test propagation
w_test, b_test = initialize_parameters(train_set_x.shape[0])
grads, cost = propagate(w_test, b_test, train_set_x[:, :5], train_set_y[:, :5])
print(f"Initial cost (zero initialization): {cost:.4f}")
print(f"Gradient dw shape: {grads['dw'].shape}")
print(f"Gradient db: {grads['db']:.4f}")

## Step 7: Optimization using Gradient Descent

### Gradient Descent Algorithm
Iteratively update parameters to minimize cost:

**Update rules:**
- w = w - α × dw
- b = b - α × db

Where α is the learning rate.

**Learning rate (α):**
- Too large: May overshoot minimum, diverge
- Too small: Slow convergence
- Typical values: 0.001 to 0.1
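
The effect of the learning rate is easiest to see on a toy problem. The sketch below runs the same update rule on f(θ) = θ², which is an assumption made for illustration and not the logistic cost; the specific α values at which behavior changes will differ for the real model.

```python
def gradient_descent_1d(alpha, steps=25, theta0=1.0):
    """Minimise f(theta) = theta**2 (gradient 2*theta) with step size alpha."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # same form as w = w - alpha * dw
    return theta

for alpha in (0.01, 0.1, 1.1):
    print(f"alpha={alpha}: theta after 25 steps = {gradient_descent_1d(alpha):.4g}")
```

On this quadratic each step multiplies θ by (1 - 2α), so a small α shrinks θ slowly, a moderate α shrinks it quickly, and α = 1.1 makes the magnitude grow at every step, which is the divergence described above.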

In [None]:
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=True):
    """
    Optimize w and b by running gradient descent.
    
    Arguments:
    w -- weights, numpy array of shape (n_features, 1)
    b -- bias, scalar
    X -- input data of shape (n_features, m_examples)
    Y -- true labels of shape (1, m_examples)
    num_iterations -- number of iterations for gradient descent
    learning_rate -- learning rate for gradient descent
    print_cost -- if True, print cost every 100 iterations
    
    Returns:
    params -- dictionary containing weights w and bias b
    grads -- dictionary containing gradients
    costs -- list of costs computed during optimization
    """
    
    costs = []
    
    for i in range(num_iterations):
        # Forward and backward propagation
        grads, cost = propagate(w, b, X, Y)
        
        # Retrieve gradients
        dw = grads["dw"]
        db = grads["db"]
        
        # Update parameters
        w = w - learning_rate * dw
        b = b - learning_rate * db
        
        # Record cost every 10 iterations
        if i % 10 == 0:
            costs.append(cost)
        
        # Print cost every 100 iterations
        if print_cost and i % 100 == 0:
            print(f"Cost after iteration {i}: {cost:.6f}")
    
    params = {
        "w": w,
        "b": b
    }
    
    grads = {
        "dw": dw,
        "db": db
    }
    
    return params, grads, costs


print("Optimization function defined successfully!")

## Step 8: Prediction Function

Convert probabilities to binary predictions:
- If A > 0.5: predict class 1 (cat)
- If A ≤ 0.5: predict class 0 (non-cat)

The threshold 0.5 is standard but can be adjusted based on:
- Cost of false positives vs false negatives
- Class imbalance
- Business requirements
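
To make the threshold trade-off concrete, here is a small sketch using made-up probabilities and labels (they are not output from this notebook's model). The helper name `precision_recall` is hypothetical.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels (not model output).
probs  = np.array([0.95, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20, 0.05])
labels = np.array([1,    1,    0,    1,    1,    0,    0,    0])

def precision_recall(threshold):
    """Precision and recall when predicting 1 for probabilities above threshold."""
    pred = (probs > threshold).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

for t in (0.3, 0.5, 0.7):
    p, r = precision_recall(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

With these made-up values, raising the threshold from 0.3 to 0.7 moves precision from 0.80 to 1.00 while recall drops from 1.00 to 0.50, so the right threshold depends on which kind of error is more costly.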

In [None]:
def predict(w, b, X):
    """
    Predict labels using learned parameters.
    
    Arguments:
    w -- weights, numpy array of shape (n_features, 1)
    b -- bias, scalar
    X -- input data of shape (n_features, m_examples)
    
    Returns:
    Y_prediction -- array of shape (1, m_examples) containing 0/1 predictions
    """
    
    # Compute predictions
    Z = np.dot(w.T, X) + b
    A = sigmoid(Z)
    
    # Convert probabilities to binary predictions
    Y_prediction = (A > 0.5).astype(int)
    
    return Y_prediction


print("Prediction function defined successfully!")

## Step 9: Complete Model

Combine all components into a single model function that:
1. Initializes parameters
2. Runs gradient descent
3. Makes predictions on train and test sets
4. Returns learned parameters and predictions

In [None]:
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.005, print_cost=True):
    """
    Build the logistic regression model.
    
    Arguments:
    X_train -- training set of shape (n_features, m_train)
    Y_train -- training labels of shape (1, m_train)
    X_test -- test set of shape (n_features, m_test)
    Y_test -- test labels of shape (1, m_test)
    num_iterations -- number of iterations for optimization
    learning_rate -- learning rate for gradient descent
    print_cost -- if True, print cost during training
    
    Returns:
    d -- dictionary containing information about the model
    """
    
    # Initialize parameters
    w, b = initialize_parameters(X_train.shape[0])
    
    # Gradient descent
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters
    w = params["w"]
    b = params["b"]
    
    # Predict on train and test sets
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    
    # Calculate accuracies
    train_accuracy = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
    test_accuracy = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
    
    print(f"\nTrain accuracy: {train_accuracy:.2f}%")
    print(f"Test accuracy: {test_accuracy:.2f}%")
    
    d = {
        "costs": costs,
        "Y_prediction_test": Y_prediction_test,
        "Y_prediction_train": Y_prediction_train,
        "w": w,
        "b": b,
        "learning_rate": learning_rate,
        "num_iterations": num_iterations,
        "train_accuracy": train_accuracy,
        "test_accuracy": test_accuracy
    }
    
    return d


print("Model function defined successfully!")

## Step 10: Train the Model

Now let's train our logistic regression model on the cat vs non-cat dataset.

In [None]:
# Train the model
print("Training logistic regression model...\n")
model_results = model(
    train_set_x, 
    train_set_y, 
    test_set_x, 
    test_set_y, 
    num_iterations=2000, 
    learning_rate=0.005, 
    print_cost=True
)

## Step 11: Visualize Learning Curve

The learning curve shows how the cost decreases over iterations:
- **Decreasing cost**: Model is learning
- **Flat cost**: Model has converged
- **Increasing cost**: Learning rate too high or numerical issues

In [None]:
# Plot learning curve
costs = np.squeeze(model_results['costs'])
plt.figure(figsize=(10, 6))
plt.plot(costs, linewidth=2)
plt.ylabel('Cost', fontsize=12)
plt.xlabel('Iterations (per 10)', fontsize=12)
plt.title(f"Learning Curve (Learning Rate = {model_results['learning_rate']})", fontsize=14)
plt.grid(True, alpha=0.3)
plt.show()

print(f"Final cost: {costs[-1]:.6f}")
print(f"Initial cost: {costs[0]:.6f}")
print(f"Cost reduction: {((costs[0] - costs[-1]) / costs[0] * 100):.2f}%")

## Step 12: Analyze Model Performance

Let's examine prediction examples and compute detailed metrics.

In [None]:
# Visualize predictions on test set
def visualize_predictions(X_orig, y_true, y_pred, n_samples=6):
    """
    Visualize predictions vs true labels.
    """
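    # Size the grid from n_samples instead of assuming exactly 6 images
    n_cols = 3
    n_rows = int(np.ceil(n_samples / n_cols))
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 4 * n_rows))
    axes = np.atleast_1d(axes).ravel()
    
    for i in range(n_samples):
        axes[i].imshow(X_orig[i].astype('uint8'))
        true_label = "Cat" if y_true[0, i] == 1 else "Non-Cat"
        pred_label = "Cat" if y_pred[0, i] == 1 else "Non-Cat"
        
        # Color code: green for correct, red for incorrect
        color = 'green' if y_true[0, i] == y_pred[0, i] else 'red'
        axes[i].set_title(f"True: {true_label}\nPred: {pred_label}", color=color, fontweight='bold')
        axes[i].axis('off')
    
    # Hide any panels left over when n_samples is not a multiple of n_cols
    for ax in axes[n_samples:]:
        ax.axis('off')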
    
    plt.tight_layout()
    plt.show()

visualize_predictions(
    test_set_x_orig, 
    test_set_y, 
    model_results['Y_prediction_test'], 
    n_samples=6
)

## Step 13: Confusion Matrix and Metrics

**Confusion Matrix:**
- True Positives (TP): Correctly predicted cats
- True Negatives (TN): Correctly predicted non-cats
- False Positives (FP): Non-cats predicted as cats
- False Negatives (FN): Cats predicted as non-cats

**Metrics:**
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
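
As a worked example of these formulas, the sketch below plugs in hypothetical confusion-matrix counts (illustrative only, not results from this notebook) and also checks the equivalent count-based form of F1.

```python
# Hypothetical confusion-matrix counts (illustrative, not results from this notebook).
TP, TN, FP, FN = 20, 22, 5, 3

precision = TP / (TP + FP)                       # 20 / 25
recall = TP / (TP + FN)                          # 20 / 23
f1 = 2 * precision * recall / (precision + recall)
f1_from_counts = 2 * TP / (2 * TP + FP + FN)     # algebraically equivalent form

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Note that TN does not appear in precision, recall, or F1, which is why these metrics are often reported alongside accuracy rather than instead of it.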

In [None]:
def compute_metrics(y_true, y_pred):
    """
    Compute confusion matrix and classification metrics.
    """
    y_true = y_true.flatten()
    y_pred = y_pred.flatten()
    
    # Confusion matrix components
    TP = np.sum((y_true == 1) & (y_pred == 1))
    TN = np.sum((y_true == 0) & (y_pred == 0))
    FP = np.sum((y_true == 0) & (y_pred == 1))
    FN = np.sum((y_true == 1) & (y_pred == 0))
    
    # Metrics
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0
    recall = TP / (TP + FN) if (TP + FN) > 0 else 0
    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    
    return {
        'confusion_matrix': np.array([[TN, FP], [FN, TP]]),
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1_score
    }


# Compute metrics for test set
metrics = compute_metrics(test_set_y, model_results['Y_prediction_test'])

print("\n=== Test Set Performance ===")
print(f"Accuracy:  {metrics['accuracy']*100:.2f}%")
print(f"Precision: {metrics['precision']*100:.2f}%")
print(f"Recall:    {metrics['recall']*100:.2f}%")
print(f"F1-Score:  {metrics['f1_score']*100:.2f}%")

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(metrics['confusion_matrix'], annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Non-Cat', 'Cat'], yticklabels=['Non-Cat', 'Cat'],
            cbar_kws={'label': 'Count'})
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.title('Confusion Matrix - Test Set', fontsize=14)
plt.show()

## Step 14: Experiment with Different Learning Rates

Let's see how different learning rates affect model performance and convergence speed.

In [None]:
# Test different learning rates
learning_rates = [0.001, 0.005, 0.01, 0.05]
models = {}

print("Training models with different learning rates...\n")

for lr in learning_rates:
    print(f"\n{'='*50}")
    print(f"Learning Rate: {lr}")
    print(f"{'='*50}")
    models[lr] = model(
        train_set_x, 
        train_set_y, 
        test_set_x, 
        test_set_y, 
        num_iterations=1500, 
        learning_rate=lr,
        print_cost=False
    )

# Plot learning curves for comparison
plt.figure(figsize=(12, 6))
for lr in learning_rates:
    costs = np.squeeze(models[lr]['costs'])
    plt.plot(costs, label=f'LR = {lr}')

plt.ylabel('Cost', fontsize=12)
plt.xlabel('Iterations (per 10)', fontsize=12)
plt.title('Learning Curves for Different Learning Rates', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Summary table
print("\n" + "="*70)
print(f"{'Learning Rate':<15} {'Train Accuracy':<20} {'Test Accuracy':<20}")
print("="*70)
for lr in learning_rates:
    # Format the percentage first so the '%' sign stays attached to the number
    train_acc = f"{models[lr]['train_accuracy']:.2f}%"
    test_acc = f"{models[lr]['test_accuracy']:.2f}%"
    print(f"{lr:<15} {train_acc:<20} {test_acc:<20}")
print("="*70)

## Summary and Key Takeaways

### What We Implemented:
1. **Data Creation**: Generated synthetic cat/non-cat images
2. **Preprocessing**: Flattened and normalized images
3. **Model Components**:
   - Sigmoid activation function
   - Forward propagation (prediction)
   - Cost function (binary cross-entropy)
   - Backward propagation (gradients)
   - Gradient descent optimization
4. **Evaluation**: Accuracy, precision, recall, F1-score

### Mathematical Foundation:
- **Model**: ŷ = σ(w^T x + b)
- **Cost**: J = -1/m ∑[y log(ŷ) + (1-y) log(1-ŷ)]
- **Gradients**: ∂J/∂w = 1/m X(ŷ - y)^T, ∂J/∂b = 1/m ∑(ŷ - y)
- **Update**: w = w - α ∂J/∂w, b = b - α ∂J/∂b

### Key Concepts:
1. **Vectorization**: Efficient computation using NumPy
2. **Feature Representation**: Flattening images into raw-pixel vectors (no engineered or learned features)
3. **Normalization**: Scaling features for better convergence
4. **Hyperparameters**: Learning rate, iterations
5. **Evaluation**: Multiple metrics for comprehensive assessment

### Limitations:
- Linear decision boundary (can't learn complex patterns)
- No feature learning (uses raw pixels)
- Sensitive to learning rate
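
The linear decision boundary limitation shows up even on four points. The sketch below trains the same kind of model on XOR, a standard example that no straight line can separate; the learning rate and iteration count are arbitrary assumptions.

```python
import numpy as np

# XOR: the label is 1 when exactly one of the two inputs is 1.
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)   # shape (2, 4): features x examples
Y = np.array([[0, 1, 1, 0]], dtype=float)

w = np.zeros((2, 1))
b = 0.0
for _ in range(5000):
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    w -= 0.5 * (X @ (A - Y).T) / 4   # gradient step on the weights
    b -= 0.5 * np.sum(A - Y) / 4     # gradient step on the bias

A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
accuracy = np.mean((A > 0.5).astype(int) == Y)
print(f"XOR accuracy: {accuracy:.2f}")
```

Since a single line can put at most three of the four XOR points on the correct side, accuracy cannot exceed 0.75 however long training runs. A model with a hidden layer or suitable non-linear features is needed to fit this pattern.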

### Next Steps:
- Try neural networks for better feature learning
- Experiment with regularization (L2/L1)
- Use real cat image datasets
- Implement mini-batch gradient descent
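
As a starting point for the regularization experiment, here is a minimal sketch of how an L2 penalty could be added to the cost and gradients. The function name `propagate_l2` and the parameter `lambd` are hypothetical, and the (lambd / (2m)) scaling is one common convention rather than the only option.

```python
import numpy as np

def propagate_l2(w, b, X, Y, lambd=0.1):
    """Cost and gradients with an L2 penalty (lambd / (2m)) * ||w||^2 on the weights.

    `propagate_l2` and `lambd` are hypothetical names used only in this sketch.
    """
    m = X.shape[1]
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    eps = 1e-8  # guards against log(0)
    data_cost = -np.sum(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps)) / m
    cost = data_cost + (lambd / (2 * m)) * np.sum(w ** 2)
    dw = X @ (A - Y).T / m + (lambd / m) * w   # penalty contributes (lambd / m) * w
    db = np.sum(A - Y) / m                     # bias is left unregularized
    return {"dw": dw, "db": db}, cost

# Arbitrary small problem to compare the penalized and unpenalized versions.
rng = np.random.default_rng(0)
X = rng.random((5, 10))
Y = rng.integers(0, 2, size=(1, 10))
w = rng.normal(size=(5, 1))

g0, c0 = propagate_l2(w, 0.0, X, Y, lambd=0.0)
g1, c1 = propagate_l2(w, 0.0, X, Y, lambd=1.0)
print(f"extra cost from the penalty: {c1 - c0:.4f}")
```

With `lambd=0.0` the function reduces to the unregularized cost and gradients, so the penalty strength can be tuned like any other hyperparameter.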