# EE 467 Lab 2: Breaking CAPTCHAs with PyTorch
This notebook implements the same CAPTCHA breaking solution as the TensorFlow version, but using PyTorch instead. We will build and train a Convolutional Neural Network to automatically recognize CAPTCHA characters.

As usual, please check that the helper library, `lab_2_helpers.py`, and the extracted dataset directory, `captcha-images`, exist in the same directory as this notebook.

In [23]:
import subprocess
import sys
import shutil
import os
import sysconfig

# Step 1: Remove any broken matplotlib installations
print("Cleaning up corrupted matplotlib...")
# Ask the interpreter for its site-packages directory instead of
# patching Windows-style path strings by hand
site_packages = sysconfig.get_paths()["purelib"]
matplotlib_path = os.path.join(site_packages, 'matplotlib')

if os.path.exists(matplotlib_path):
    try:
        shutil.rmtree(matplotlib_path)
        print(f"✓ Removed corrupted matplotlib at {matplotlib_path}")
    except Exception as e:
        print(f"Could not remove directory: {e}")

# Step 2: Install matplotlib from PyPI with explicit version
print("\nInstalling matplotlib...")
result = subprocess.run([sys.executable, "-m", "pip", "install", "-U", "setuptools", "wheel"], 
                       capture_output=True, text=True)

result = subprocess.run([sys.executable, "-m", "pip", "install", "matplotlib==3.7.2", "--no-cache-dir", "--no-build-isolation"],
                       capture_output=True, text=True)

if result.returncode == 0:
    print("✓ Successfully installed matplotlib 3.7.2")
else:
    print(f"Installation output: {result.stdout}")
    print(f"Installation errors: {result.stderr}")

# Step 3: Install other packages
print("\nInstalling other dependencies...")
other_packages = ["scikit-learn", "opencv-python>4", "imutils", "torch", "torchvision"]

for package in other_packages:
    result = subprocess.run([sys.executable, "-m", "pip", "install", package, "--no-cache-dir"],
                           capture_output=True, text=True)
    if result.returncode == 0:
        print(f"✓ {package}")
    else:
        print(f"✗ {package}")

print("\n✓ Installation complete!")

Cleaning up corrupted matplotlib...

Installing matplotlib...
Installation output: Collecting matplotlib==3.7.2
  Downloading matplotlib-3.7.2-cp310-cp310-win_amd64.whl.metadata (5.8 kB)
Downloading matplotlib-3.7.2-cp310-cp310-win_amd64.whl (7.5 MB)
   ---------------------------------------- 0.0/7.5 MB ? eta -:--:--
   ------ --------------------------------- 1.3/7.5 MB 8.4 MB/s eta 0:00:01
   ---------------- ----------------------- 3.1/7.5 MB 9.2 MB/s eta 0:00:01
   ------------------------------ --------- 5.8/7.5 MB 10.3 MB/s eta 0:00:01
   ---------------------------------------- 7.5/7.5 MB 10.8 MB/s  0:00:00
Installing collected packages: matplotlib
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.10.8


[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip
error: uninstall-no-record-file

× Cannot uninstall matplotlib 3.10.8
╰─> The package's contents are unknown: no RECORD 

Next, we import all tools needed before starting:

In [None]:
import os, pickle, glob, math
from pprint import pprint

import cv2
import numpy as np
import imutils
from imutils import paths
from matplotlib import pyplot as plt
from matplotlib.gridspec import GridSpec
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

from lab_2_helpers import *

⚠ Matplotlib import failed: cannot import name 'pyplot' from 'matplotlib' (unknown location)
  Continuing without matplotlib...
✓ Helper functions imported successfully
✓ Core imports successful!


# Preprocessing
## Ground Truth Characters Extraction
As usual, we start the pre-processing stage by loading the CAPTCHA images into memory:

In [None]:
!tar -xJf captcha-images.tar.xz

In [None]:
# Dataset images folder
CAPTCHA_IMAGE_FOLDER = "./captcha-images"

# List of all the captcha images we need to process
captcha_image_paths = list(paths.list_images(CAPTCHA_IMAGE_FOLDER))
# Review image paths
pprint(captcha_image_paths[:10])

['./captcha-images\\2A2X.png',
 './captcha-images\\2A5R.png',
 './captcha-images\\2A5Z.png',
 './captcha-images\\2A98.png',
 './captcha-images\\2A9N.png',
 './captcha-images\\2AD9.png',
 './captcha-images\\2AEF.png',
 './captcha-images\\2APC.png',
 './captcha-images\\2AQ7.png',
 './captcha-images\\2AX2.png']


Note that for each image, its file name (without extension) happens to be its corresponding CAPTCHA text. Thus, we extract file names for all CAPTCHA images and save them as labels for future use:

In [None]:
def extract_captcha_text(image_path):
    """ Extract correct CAPTCHA texts from file name of images. """
    # Extract file name of image from its path
    # e.g. "./captcha-images/2A2X.png" -> "2A2X.png"
    image_file_name = os.path.basename(image_path)
    # Extract base name of image, omitting file extension
    # e.g. "2A2X.png" -> "2A2X"
    return os.path.splitext(image_file_name)[0]

captcha_texts = [extract_captcha_text(image_path) for image_path in captcha_image_paths]
# Review extraction results
pprint(captcha_texts[:10])

['2A2X', '2A5R', '2A5Z', '2A98', '2A9N', '2AD9', '2AEF', '2APC', '2AQ7', '2AX2']


## Loading and Transforming Images
For the feature extraction stage, we are going to extract individual characters from these CAPTCHAs. This is done by finding the contours of the characters, taking the bounding box around each contour, and cropping the CAPTCHA so that only the bounding box areas are preserved. We begin feature extraction by loading and transforming images:

In [1]:
import cv2

def load_transform_image(image_path):
    """ Load and transform image into grayscale. """
    # 1) Load image with OpenCV
    image = cv2.imread(image_path)

    # 2) Convert image to grayscale
    image_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # 3) Add extra padding (8px) around the image
    image_padded = cv2.copyMakeBorder(image_gray, 8, 8, 8, 8, cv2.BORDER_REPLICATE)

    return image_padded

captcha_images = [load_transform_image(image_path) for image_path in captcha_image_paths]

# Review loaded CAPTCHAs (skip visualization due to matplotlib issues)
print(f"✓ Loaded {len(captcha_images)} CAPTCHA images")
# print_images(
#     captcha_images[:10], n_rows=2, texts=captcha_texts[:10]
# )

NameError: name 'captcha_image_paths' is not defined
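
One practical effect of the replicated border shows up later, when characters are cropped with a small margin around each bounding box. The sketch below illustrates this with NumPy only, assuming `np.pad(..., mode="edge")` as a stand-in for `cv2.copyMakeBorder(..., cv2.BORDER_REPLICATE)`. The toy image and the `PAD` and `MARGIN` names are illustrative and not part of the lab code.

```python
import numpy as np

# Toy 10x12 "image"; the pixel values are arbitrary illustration data
image = np.arange(120, dtype=np.uint8).reshape(10, 12)

# Replicate the edge pixels outward by PAD pixels on every side
PAD = 8
padded = np.pad(image, PAD, mode="edge")
print(padded.shape)  # (26, 28)

# A bounding box touching the top-left corner of the original image
x, y, w, h = 0, 0, 2, 2
MARGIN = 2

# Without padding, y - MARGIN is negative, so the slice wraps around
# and comes back empty instead of raising an error
unpadded_crop = image[y - MARGIN:y + h + MARGIN, x - MARGIN:x + w + MARGIN]
print(unpadded_crop.size)  # 0

# In the padded image the same box is shifted by PAD, so the margin fits
xp, yp = x + PAD, y + PAD
padded_crop = padded[yp - MARGIN:yp + h + MARGIN, xp - MARGIN:xp + w + MARGIN]
print(padded_crop.shape)  # (6, 6)
```

In the notebook itself the contours are detected on the already padded image, so no coordinate shift is needed there; the shift above only mimics that situation on the toy data.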

Next, we split our dataset into a train-validation set and a test set. The former is used for training and validating the deep character classification model, while the latter is used for testing our CAPTCHA recognition pipeline end-to-end:

In [None]:
# Train-validation-test split seed
TVT_SPLIT_SEED = 31528476

# Perform split on CAPTCHA images as well as labels
captcha_images_tv, captcha_images_test, captcha_texts_tv, captcha_texts_test = train_test_split(
    captcha_images, captcha_texts, test_size=0.2, random_state=TVT_SPLIT_SEED
)

print("Train-validation:", len(captcha_texts_tv))
print("Test:", len(captcha_texts_test))

## Bounding Box Extraction
It's now time to perform the most important feature extraction step: finding contours and extracting characters. A contour is simply a curve joining all the continuous points along a boundary that share the same color or intensity. Contours are useful for shape analysis, object detection and recognition. For our task, however, we are **more interested in the bounding boxes around characters**, since these are the parts of the images we will use for character classification.

After these steps, we have transformed CAPTCHA images into images of single characters. This simplifies our task, since our model now only needs to deal with classification (from character image to the character itself) rather than also dealing with detection (finding and extracting characters).

In [None]:
# Character images folder template
CHAR_IMAGE_FOLDER = f"./char-images-{TVT_SPLIT_SEED}"

def extract_chars(image):
    """ Find contours and extract characters inside each CAPTCHA. """
    # Threshold image and convert it to black-white
    image_bw = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    # Find contours (continuous blobs of pixels) in the image
    contours = cv2.findContours(image_bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[0]

    char_regions = []
    # Loop through each contour
    for contour in contours:
        # Get the rectangle that contains the contour
        x, y, w, h = cv2.boundingRect(contour)

        # Compare the width and height of the bounding box,
        # detect if there are letters conjoined into one chunk
        if w / h > 1.25:
            # Bounding box is too wide for a single character
            # Split it in half into two letter regions
            half_width = int(w / 2)
            char_regions.append((x, y, half_width, h))
            char_regions.append((x + half_width, y, half_width, h))
        else:
            # Only a single letter in contour
            char_regions.append((x, y, w, h))

    # Ignore image if fewer or more than 4 regions are detected
    if len(char_regions) != 4:
        return None
    # Sort regions by their X coordinates
    char_regions.sort(key=lambda x: x[0])

    # Character images
    char_images = []
    # Save each character as a single image
    for x, y, w, h in char_regions:
        # Extract character from image with 2px margin
        char_image = image[y - 2:y + h + 2, x - 2:x + w + 2]
        # Save character images
        char_images.append(char_image)

    # Return character images
    return char_images

def save_chars(char_images, captcha_text, save_dir, char_counts):
    """ Save character images to directory. """
    for char_image, char in zip(char_images, captcha_text):
        # Get the folder to save the image in
        save_path = os.path.join(save_dir, char)
        os.makedirs(save_path, exist_ok=True)

        # Write letter image to file
        char_count = char_counts.get(char, 1)
        char_image_path = os.path.join(save_path, f"{char_count}.png")
        cv2.imwrite(char_image_path, char_image)

        # Update count
        char_counts[char] = char_count+1

# Force character extraction even if results are already available
FORCE_EXTRACT_CHAR = False

char_counts = {}
# Extract and save images for characters
if FORCE_EXTRACT_CHAR or not os.path.exists(CHAR_IMAGE_FOLDER):
    for captcha_image, captcha_text in zip(captcha_images_tv, captcha_texts_tv):
        # Extract character images
        char_images = extract_chars(captcha_image)
        # Skip if extraction failed
        if char_images is None:
            continue
        # Save character images
        save_chars(char_images, captcha_text, CHAR_IMAGE_FOLDER, char_counts)
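
The wide-box heuristic used in `extract_chars` can be tried on its own, without OpenCV. The following is a minimal sketch under stated assumptions: `split_wide_regions` is a hypothetical helper name, and the bounding boxes are made-up `(x, y, w, h)` tuples rather than values from the dataset.

```python
def split_wide_regions(boxes, max_ratio=1.25):
    """Split any box whose width/height ratio exceeds max_ratio in half."""
    regions = []
    for x, y, w, h in boxes:
        if w / h > max_ratio:
            # Too wide for one character: assume two conjoined characters
            half = w // 2
            regions.append((x, y, half, h))
            regions.append((x + half, y, half, h))
        else:
            regions.append((x, y, w, h))
    # Order regions left to right so they line up with the CAPTCHA text
    return sorted(regions, key=lambda region: region[0])

# Three contours were found, but the last one covers two touching characters
boxes = [(60, 10, 18, 24), (10, 12, 16, 22), (30, 11, 28, 20)]
regions = split_wide_regions(boxes)
print(regions)
# [(10, 12, 16, 22), (30, 11, 14, 20), (44, 11, 14, 20), (60, 10, 18, 24)]
```

Splitting in the middle is only a heuristic: it assumes the two conjoined characters are about equally wide, which is why CAPTCHAs that still do not yield exactly four regions are skipped.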

## Label Encoding
During the training stage, we are going to load character images from previous stages as features and generate corresponding labels from their path. We will then rescale features, one-hot encode labels (occurred characters) and save labels to an external file.

In [None]:
# Path of occurred characters (labels)
LABELS_PATH = "./labels_pytorch.pkl"

def make_feature(image):
    """ Process character image and turn it into feature. """
    # Resize letter to 20*20
    image_resized = resize_to_fit(image, 20, 20)
    # Add a leading channel dimension: PyTorch's nn.Conv2d expects each
    # sample as (channels, height, width), i.e. (1, 20, 20) here
    feature = image_resized[None, ...]

    return feature

def make_feature_label(image_path):
    """ Load character image and make feature-label pair from image path. """
    # Load image as grayscale and make feature
    # (cv2.IMREAD_GRAYSCALE is the imread flag; cv2.COLOR_BGR2GRAY is a cvtColor code)
    feature = make_feature(cv2.imread(image_path, cv2.IMREAD_GRAYSCALE))
    # Extract label based on the directory the image is in
    label = image_path.split(os.path.sep)[-2]

    return feature, label

# Make features and labels from character image paths
features_tv, labels_tv = unzip((
    make_feature_label(image_path) for image_path in paths.list_images(CHAR_IMAGE_FOLDER)
))

# Scale raw pixel values into range [0, 1]
features_tv = np.array(features_tv, dtype="float")/255
# Convert labels into one-hot encodings
lb = LabelBinarizer()
labels_one_hot_tv = lb.fit_transform(labels_tv)
# Number of classes
n_classes = len(lb.classes_)

# Further split the training data into training and validation set
X_train, X_vali, y_train, y_vali = train_test_split(
    features_tv, labels_one_hot_tv, test_size=0.25, random_state=955996
)
# Save mapping from labels to one-hot encoding
with open(LABELS_PATH, "wb") as f:
    pickle.dump(lb, f)
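
To make the one-hot encoding step concrete, here is a small NumPy-only sketch of the same idea. It is an illustration rather than a replacement for `LabelBinarizer`; the four sample labels are made up, and the variable names are not part of the lab code.

```python
import numpy as np

# Made-up character labels, as they would be read from folder names
labels = np.array(["A", "2", "X", "2"])

# Sorted unique classes, analogous to lb.classes_
classes = np.unique(labels)
print(classes)  # ['2' 'A' 'X']

# One row per sample, one column per class, with a single 1 per row
one_hot = (labels[:, None] == classes[None, :]).astype(int)
print(one_hot)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [1 0 0]]

# Decoding: the column index of the 1 selects the class
decoded = classes[one_hot.argmax(axis=1)]
print(decoded)  # ['A' '2' 'X' '2']
```

The same decoding works on model outputs: taking the arg max over class scores gives an index into the sorted class list.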

# Training
Next, we build a Convolutional Neural Network (CNN) as our classification model with PyTorch. The structure of the neural network is the same as the TensorFlow version:
- First conv block: Conv2D(20 channels) + ReLU + MaxPooling2D
- Second conv block: Conv2D(50 channels) + ReLU + MaxPooling2D
- Flatten layer
- Dense(500) + ReLU
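- Dense(n_classes) + Softmax (the PyTorch model outputs raw logits; `nn.CrossEntropyLoss` applies the softmax internally during training)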

After building the neural network, we train it and save weights.
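
The input size of the first dense layer follows from the layer list above. As a quick check, this sketch applies the standard output-size formula for convolution and pooling, assuming 20x20 inputs, 5x5 kernels with padding 2, and 2x2 pooling with stride 2. `conv_out` is a hypothetical helper name used only for this illustration.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output length along one spatial dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 20                                    # input character images are 20x20
size = conv_out(size, kernel=5, padding=2)   # first conv keeps 20
size = conv_out(size, kernel=2, stride=2)    # first max pool halves to 10
size = conv_out(size, kernel=5, padding=2)   # second conv keeps 10
size = conv_out(size, kernel=2, stride=2)    # second max pool halves to 5

flat = 50 * size * size                      # 50 channels of 5x5 feature maps
print(size, flat)  # 5 1250
```

If the input size or any kernel setting changes, the flattened size changes with it, so the first dense layer has to be sized accordingly.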

In [None]:
# Batch size
BATCH_SIZE = 32
# Number of epochs
N_EPOCHS = 10

# Path of model weights file
MODEL_WEIGHTS_PATH = "./captcha-model-pytorch.pth"
# Force training even if weights are already available
FORCE_TRAINING = True

# Define the CNN model in PyTorch
class CaptchaCNN(nn.Module):
    def __init__(self, num_classes):
        super(CaptchaCNN, self).__init__()
        
        # First convolution block: 20 channels, 5x5 kernel, ReLU, same padding
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5, padding=2)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Second convolution block: 50 channels, 5x5 kernel, ReLU, same padding
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5, padding=2)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Flatten and fully connected layers
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(50 * 5 * 5, 500)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(500, num_classes)
    
    def forward(self, x):
        # First conv block
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)  # (*, 20, 10, 10)
        
        # Second conv block
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)  # (*, 50, 5, 5)
        
        # Flatten
        x = self.flatten(x)  # (*, 1250)
        
        # Fully connected layers
        x = self.fc1(x)
        x = self.relu3(x)
        x = self.fc2(x)
        
        return x

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Create model
model = CaptchaCNN(n_classes).to(device)
print(model)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

# Convert numpy arrays to torch tensors
X_train_torch = torch.FloatTensor(X_train).to(device)
y_train_torch = torch.FloatTensor(y_train).to(device)
X_vali_torch = torch.FloatTensor(X_vali).to(device)
y_vali_torch = torch.FloatTensor(y_vali).to(device)

# Create data loaders
train_dataset = TensorDataset(X_train_torch, y_train_torch)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

vali_dataset = TensorDataset(X_vali_torch, y_vali_torch)
vali_loader = DataLoader(vali_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Training function
def train_epoch(model, train_loader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    
    for batch_X, batch_y in train_loader:
        batch_X = batch_X.to(device)
        batch_y = batch_y.to(device)
        
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, torch.argmax(batch_y, dim=1))
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        
        # Calculate accuracy
        _, predicted = torch.max(outputs, 1)
        total += batch_y.size(0)
        correct += (predicted == torch.argmax(batch_y, dim=1)).sum().item()
    
    avg_loss = total_loss / len(train_loader)
    accuracy = correct / total
    return avg_loss, accuracy

# Validation function
def validate(model, vali_loader, criterion, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for batch_X, batch_y in vali_loader:
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)
            
            outputs = model(batch_X)
            loss = criterion(outputs, torch.argmax(batch_y, dim=1))
            
            total_loss += loss.item()
            
            _, predicted = torch.max(outputs, 1)
            total += batch_y.size(0)
            correct += (predicted == torch.argmax(batch_y, dim=1)).sum().item()
    
    avg_loss = total_loss / len(vali_loader)
    accuracy = correct / total
    return avg_loss, accuracy

# Train the model
if FORCE_TRAINING or not os.path.exists(MODEL_WEIGHTS_PATH):
    print(f"Training for {N_EPOCHS} epochs...")
    for epoch in range(N_EPOCHS):
        train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
        vali_loss, vali_acc = validate(model, vali_loader, criterion, device)
        print(f"Epoch {epoch+1}/{N_EPOCHS} - Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, Vali Loss: {vali_loss:.4f}, Vali Acc: {vali_acc:.4f}")
    
    # Save model weights
    torch.save(model.state_dict(), MODEL_WEIGHTS_PATH)
    print(f"Model weights saved to {MODEL_WEIGHTS_PATH}")
else:
    model.load_state_dict(torch.load(MODEL_WEIGHTS_PATH, map_location=device))
    print(f"Model weights loaded from {MODEL_WEIGHTS_PATH}")
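
The network's last layer returns raw scores (logits), and the loss is computed directly from them. The NumPy sketch below illustrates what a softmax followed by a mean cross-entropy computes, and why the predicted class does not depend on whether softmax is applied. The logits and targets are made-up numbers, and the functions are simplified stand-ins rather than PyTorch's implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    """Mean negative log-probability assigned to the true class indices."""
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

# Two samples, three classes (illustrative values)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])

probs = softmax(logits)
loss = cross_entropy(logits, targets)

# Softmax is monotonic, so the arg max is the same before and after it
same_prediction = (probs.argmax(axis=1) == logits.argmax(axis=1)).all()
print(probs.sum(axis=1), same_prediction, loss)
```

Because the arg max is unchanged, applying softmax at prediction time is only needed when the class probabilities themselves are of interest.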

# Evaluation
During training, we validated the performance of our neural network model on images of single characters. Now it's time to evaluate the whole pipeline end-to-end on complete CAPTCHAs. First, we need to build the pipeline for CAPTCHA character prediction:

In [None]:
# Load labels from file (so we can translate model predictions to actual letters)
with open(LABELS_PATH, "rb") as f:
    lb = pickle.load(f)

# Test our pipeline (and model) with the test set.
# However, you'd want to replace this with some random CAPTCHAs in the real world.

# Dummy character images
DUMMY_CHAR_IMAGES = np.zeros((4, 20, 20, 1))

# Indices of CAPTCHAs on which extractions failed
extract_failed_indices = []
# Extracted character images
char_images_test = []

# Extract character images and make features
for i, captcha_image in enumerate(captcha_images_test):
    # Extract character images
    char_images = extract_chars(captcha_image)

    if char_images:
        char_images_test.extend(char_images)
    # Use dummy character images as placeholder if extraction failed
    else:
        extract_failed_indices.append(i)
        char_images_test.extend(DUMMY_CHAR_IMAGES)

# Make features for character images
features_test = [make_feature(char_image) for char_image in char_images_test]
# Scale raw pixel values into range [0, 1]
features_test = np.array(features_test, dtype="float")/255

# Convert to torch tensor
features_test_torch = torch.FloatTensor(features_test).to(device)

# Predict labels with neural network
model.eval()
with torch.no_grad():
    preds_test = model(features_test_torch)
    preds_test = torch.softmax(preds_test, dim=1).cpu().numpy()

# Convert predictions to class indices
preds_test = np.argmax(preds_test, axis=1)
# Convert class indices to actual characters
# (lb.inverse_transform expects a 2D score matrix, so index lb.classes_ directly)
preds_test = lb.classes_[preds_test]

# Group all 4 characters for the same CAPTCHA
preds_test = ["".join(chars) for chars in group_every(preds_test, 4)]
# Update result for CAPTCHAs on which extractions failed
for i in extract_failed_indices:
    preds_test[i] = "-"
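
The grouping and placeholder logic at the end of the pipeline can be illustrated in plain Python. This is a sketch under assumptions: `group_in_chunks` is a hypothetical stand-in for the `group_every` helper from `lab_2_helpers.py`, whose actual implementation is not shown here, and the character predictions are made up.

```python
def group_in_chunks(items, size):
    """Split a flat sequence into consecutive chunks of the given size."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Flat per-character predictions for three CAPTCHAs; the second CAPTCHA
# failed extraction, so its four predictions come from dummy images
char_preds = list("2A2X") + list("0000") + list("7KQ9")
failed_indices = [1]

# Join every four characters back into one CAPTCHA text
texts = ["".join(chunk) for chunk in group_in_chunks(char_preds, 4)]

# Overwrite predictions for CAPTCHAs whose extraction failed
for i in failed_indices:
    texts[i] = "-"

print(texts)  # ['2A2X', '-', '7KQ9']
```

Feeding dummy images for failed extractions keeps the flat prediction list aligned with the test set, so CAPTCHA `i` always maps to characters `4*i` through `4*i + 3`.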

Now we can compute the accuracy of our pipeline and take a look at correct and incorrect CAPTCHA text predictions:

In [None]:
# Number of CAPTCHAs to display
N_DISPLAY_SAMPLES = 10

# Number of test CAPTCHAs
n_test = len(captcha_texts_test)
# Number of correct predictions
n_correct = 0

# Indices of correct predictions
correct_indices = []
# Indices of incorrect predictions
incorrect_indices = []

for i, (pred_text, actual_text) in enumerate(zip(preds_test, captcha_texts_test)):
    if pred_text==actual_text:
        # 1) Update number of correct predictions
        n_correct += 1
        # 2) Collect index of correct prediction
        if len(correct_indices)<N_DISPLAY_SAMPLES:
            correct_indices.append(i)
    else:
        # 3) Collect index of incorrect prediction
        if len(incorrect_indices)<N_DISPLAY_SAMPLES:
            incorrect_indices.append(i)

# Show number of total / correct predictions and accuracy
print("# of test CAPTCHAs:", n_test)
print("# correctly recognized:", n_correct)
print("Accuracy:", n_correct/n_test, "\n")

# Visualization disabled due to matplotlib issues
# Uncomment below if matplotlib becomes available
# print_images(
#     [captcha_images_test[i] for i in correct_indices],
#     texts=[f"Correct: {captcha_texts_test[i]}" for i in correct_indices],
#     n_rows=2
# )
# print_images(
#     [captcha_images_test[i] for i in incorrect_indices],
#     texts=[
#         f"Prediction: {preds_test[i]}\nActual: {captcha_texts_test[i]}" \
#         for i in incorrect_indices
#     ],
#     n_rows=2,
#     fig_size=(20, 6),
#     text_center=(0.5, -0.25)
# )

print(f"\nCorrect predictions: {correct_indices[:N_DISPLAY_SAMPLES]}")
print(f"Incorrect predictions: {incorrect_indices[:N_DISPLAY_SAMPLES]}")
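
The accuracy above counts a CAPTCHA as correct only when all four characters match, which is stricter than the per-character accuracy reported during validation. The sketch below contrasts the two measures on made-up predictions; `captcha_accuracy` and `char_accuracy` are hypothetical helper names, not part of the lab code.

```python
def captcha_accuracy(preds, actuals):
    """Fraction of CAPTCHAs whose whole text is predicted correctly."""
    return sum(p == a for p, a in zip(preds, actuals)) / len(actuals)

def char_accuracy(preds, actuals):
    """Fraction of characters predicted correctly; failed CAPTCHAs count as all wrong."""
    correct = total = 0
    for pred, actual in zip(preds, actuals):
        total += len(actual)
        if len(pred) == len(actual):
            correct += sum(pc == ac for pc, ac in zip(pred, actual))
    return correct / total

# Made-up results: one exact miss on a single character, one failed extraction
preds = ["2A2X", "2A5B", "-", "2A98"]
actuals = ["2A2X", "2A5R", "2A5Z", "2A98"]

print(captcha_accuracy(preds, actuals))  # 0.5
print(char_accuracy(preds, actuals))     # 0.6875
```

As a rough rule of thumb, if character errors were independent, a per-character accuracy of p would give a whole-CAPTCHA accuracy of about p to the fourth power, so the end-to-end figure is expected to be lower than the validation accuracy.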

## Comparison with TensorFlow Results

Both the TensorFlow and PyTorch implementations follow the exact same architecture and training procedure:
- Same preprocessing and data split
- Same CNN architecture (2 conv blocks, 2 fully connected layers)
- Same batch size (32) and number of epochs (10)
- Same optimizer (Adam) and loss function (cross entropy)

The results should be similar, with minor differences expected due to randomness and implementation details in the respective frameworks.

## References
1. How to break a CAPTCHA system in 15 minutes with Machine Learning: https://medium.com/@ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710
2. CaptchaSolver Jupyter Notebook: https://github.com/BenjaminWegener/CaptchaSolver
3. PyTorch Documentation: https://pytorch.org/docs/stable/index.html
4. Keras Tutorial: The Ultimate Beginner's Guide to Deep Learning in Python: https://elitedatascience.com/keras-tutorial-deep-learning-in-python
5. TensorFlow API reference: https://www.tensorflow.org/api_docs