# Captcha Recognition Model Training

This notebook builds and trains a multi-output Convolutional Neural Network (CNN) that solves captchas. Each captcha is a 5-character alphanumeric string. The images are stored in `./captchas` and paired with their labels in `captchas.csv`, which has the columns `uniq_id` and `captcha_answer`.

In [None]:
import os
import numpy as np
import pandas as pd
import cv2

from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import Sequence
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam

from sklearn.model_selection import train_test_split

## Helper Function: Mapping Characters to Indices

The captcha characters are digits (`0-9`) and lowercase letters (`a-z`), giving 36 classes. This helper function maps a given character to its corresponding index.

In [None]:
def char_to_index(char):
    """
    Maps a character to an index:
    - Digits 0-9 map to indices 0-9
    - Letters a-z map to indices 10-35
    """
    if char.isdigit():
        return ord(char) - ord("0")
    # Lowercase first so that "A" and "a" both map to index 10.
    return 10 + ord(char.lower()) - ord("a")


IMG_HEIGHT, IMG_WIDTH = 50, 250
NUM_CLASSES = 36  # 10 digits + 26 letters
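
As a quick sanity check of the mapping described above, the sketch below re-implements the encoder next to a hypothetical inverse, `index_to_char`, and verifies that the two round-trip over all 36 classes. Both `index_to_char` and the `ALPHABET` constant are assumptions introduced for illustration; they are not part of the original notebook.

```python
import string

# Assumed alphabet: digits first, then lowercase letters (36 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase


def char_to_index(char):
    """Map '0'-'9' to 0-9 and 'a'-'z' (any case) to 10-35."""
    if char.isdigit():
        return ord(char) - ord("0")
    return 10 + ord(char.lower()) - ord("a")


def index_to_char(index):
    """Hypothetical inverse of char_to_index, useful for decoding predictions."""
    return ALPHABET[index]


# Every symbol should survive an encode/decode round trip.
round_trip_ok = all(index_to_char(char_to_index(c)) == c for c in ALPHABET)
print(len(ALPHABET), round_trip_ok)
```

An inverse like this is what turns the model's argmax indices back into readable characters at inference time.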

## Captcha Data Generator

A custom Keras data generator, implemented by subclassing `Sequence`, loads and preprocesses the images batch by batch together with their labels. Each image is resized to `img_width` x `img_height` and its pixel values are scaled to the range [0, 1]. Each five-character label is converted into five one-hot vectors, one per character.


In [None]:
class CaptchaDataGenerator(Sequence):
    def __init__(self, df, batch_size, img_dir, img_height, img_width, num_classes, shuffle=True):
        # Newer Keras releases expect Sequence subclasses to call the base constructor.
        super().__init__()
        self.df = df.reset_index(drop=True)
        self.batch_size = batch_size
        self.img_dir = img_dir
        self.img_height = img_height
        self.img_width = img_width
        self.num_classes = num_classes
        self.shuffle = shuffle
        self.indices = np.arange(len(df))
        self.on_epoch_end()

    def __len__(self):
        return int(np.ceil(len(self.df) / self.batch_size))

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)

    def __getitem__(self, index):
        batch_indices = self.indices[index * self.batch_size : (index + 1) * self.batch_size]
        batch_data = self.df.iloc[batch_indices]

        # Initialize an array for images (3 channels assumed)
        X = np.zeros((len(batch_data), self.img_height, self.img_width, 3), dtype=np.float32)

        # Create 5 outputs, one for each character in the captcha
        Y = [np.zeros((len(batch_data), self.num_classes), dtype=np.float32) for _ in range(5)]

        for i, (_, row) in enumerate(batch_data.iterrows()):
            uniq_id = row["uniq_id"]
            answer = row["captcha_answer"]
            img_path = os.path.join(self.img_dir, f"{uniq_id}.png")
            img = cv2.imread(img_path)
            if img is None:
                raise ValueError(f"Image not found: {img_path}")
            img = cv2.resize(img, (self.img_width, self.img_height))
            img = img.astype(np.float32) / 255.0  # normalize
            X[i] = img

            # Convert each character into a one-hot vector
            for j, char in enumerate(answer):
                idx = char_to_index(char)
                Y[j][i, idx] = 1.0

        return X, Y
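
The label-encoding step inside `__getitem__` can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical helper `encode_label` and an example answer `"2b8nc"`, neither of which appears in the original notebook. It builds the five one-hot vectors for a single label using only NumPy.

```python
import numpy as np

NUM_CLASSES = 36  # 10 digits + 26 letters
CAPTCHA_LENGTH = 5  # assumed fixed label length


def char_to_index(char):
    """Map '0'-'9' to 0-9 and 'a'-'z' (any case) to 10-35."""
    if char.isdigit():
        return ord(char) - ord("0")
    return 10 + ord(char.lower()) - ord("a")


def encode_label(answer, num_classes=NUM_CLASSES):
    """Return one one-hot vector per character (hypothetical helper)."""
    if len(answer) != CAPTCHA_LENGTH:
        raise ValueError(f"Expected {CAPTCHA_LENGTH} characters, got {len(answer)}")
    vectors = []
    for char in answer:
        one_hot = np.zeros(num_classes, dtype=np.float32)
        one_hot[char_to_index(char)] = 1.0
        vectors.append(one_hot)
    return vectors


encoded = encode_label("2b8nc")
# Position of the single 1.0 in each vector.
hot_positions = [int(np.argmax(v)) for v in encoded]
print(hot_positions)  # [2, 11, 8, 23, 12]
```

The explicit length check is a design choice of this sketch: it reports a malformed label directly instead of failing later with an index error.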

## Build the Multi-Output CNN Model

The model uses a shared convolutional backbone followed by five fully connected branches, one per captcha character. Each branch ends in a softmax layer that predicts one of the 36 possible classes.


In [None]:
def build_model(img_height, img_width, num_classes):
    input_img = Input(shape=(img_height, img_width, 3))

    # Shared convolutional layers
    x = Conv2D(32, (3, 3), activation="relu", padding="same")(input_img)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Conv2D(128, (3, 3), activation="relu", padding="same")(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dropout(0.5)(x)

    # Create 5 separate branches (one per character)
    outputs = []
    for i in range(5):
        fc = Dense(128, activation="relu")(x)
        out = Dense(num_classes, activation="softmax", name=f"char_{i+1}")(fc)
        outputs.append(out)

    model = Model(inputs=input_img, outputs=outputs)
    return model


model = build_model(IMG_HEIGHT, IMG_WIDTH, NUM_CLASSES)
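model.compile(
    optimizer=Adam(),
    loss="categorical_crossentropy",
    # Track accuracy separately for each named output head (char_1 ... char_5).
    metrics={f"char_{i+1}": "accuracy" for i in range(5)},
)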
model.summary()
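
To see where most of the parameters sit, it helps to trace the feature-map size through the backbone. The sketch below is plain arithmetic rather than Keras code. It assumes the default `valid` pooling behaviour of `MaxPooling2D`, where each 2x2 pool halves the spatial size and rounds down; the variable names are illustrative.

```python
IMG_HEIGHT, IMG_WIDTH = 50, 250
FILTERS = [32, 64, 128]  # Conv2D filter counts from build_model

height, width = IMG_HEIGHT, IMG_WIDTH
for filters in FILTERS:
    # "same"-padded 3x3 convolutions keep height and width unchanged;
    # each 2x2 max pool then halves both, rounding down.
    height, width = height // 2, width // 2
    print(f"after conv({filters}) + pool: {height} x {width} x {filters}")

# Size of the vector produced by Flatten().
flattened = height * width * FILTERS[-1]
# Weights plus biases of the first Dense(128) layer in a single branch.
dense_params_per_branch = flattened * 128 + 128
print(flattened, dense_params_per_branch)
```

Under these assumptions the flattened vector has 23,808 features, so the first dense layer of each branch holds about 3 million parameters, compared with under 100 thousand in the three convolutional layers combined.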

## Load Pre-Trained Model

In [None]:
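MODEL_PATH = "captcha_model.keras"

# Resume from a saved model only if one exists; otherwise keep the freshly built model.
if os.path.exists(MODEL_PATH):
    model = load_model(MODEL_PATH)
    print("Model loaded successfully from", MODEL_PATH)
else:
    print("No saved model found at", MODEL_PATH, "- training from scratch")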

## Load Data and Train the Model

The CSV file `captchas.csv` is loaded and split into training (80%) and validation (20%) sets. A data generator is created for each set, and the model is then trained for up to 10 epochs with early stopping on the validation loss.


In [None]:
df = pd.read_csv("./captchas.csv")

train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

train_gen = CaptchaDataGenerator(
    train_df,
    batch_size=32,
    img_dir="./captchas",
    img_height=IMG_HEIGHT,
    img_width=IMG_WIDTH,
    num_classes=NUM_CLASSES,
    shuffle=True,
)
val_gen = CaptchaDataGenerator(
    val_df,
    batch_size=32,
    img_dir="./captchas",
    img_height=IMG_HEIGHT,
    img_width=IMG_WIDTH,
    num_classes=NUM_CLASSES,
    shuffle=False,
)

early_stopping = EarlyStopping(monitor="val_loss", min_delta=0.001, patience=5, verbose=1)

history = model.fit(train_gen, validation_data=val_gen, epochs=10, callbacks=[early_stopping])

## Visualizing Training History

The following code plots the overall loss, summed across all five outputs, and then the accuracy curves for each of the five output branches, one per captcha character.


In [None]:
import matplotlib.pyplot as plt

# --------------------------
# Plot Overall Loss
# --------------------------
plt.figure(figsize=(10, 5))
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Overall Training and Validation Loss")
plt.legend()
plt.grid(True)
plt.show()

# --------------------------
# Plot Accuracy for Each Output Branch
# --------------------------
for i in range(5):
    train_key = f"char_{i+1}_accuracy"
    val_key = f"val_char_{i+1}_accuracy"

    plt.figure(figsize=(10, 5))
    plt.plot(history.history[train_key], label=f"Training Accuracy - Char {i+1}")
    plt.plot(history.history[val_key], label=f"Validation Accuracy - Char {i+1}")
    plt.xlabel("Epochs")
    plt.ylabel("Accuracy")
    plt.title(f"Accuracy for Captcha Character {i+1}")
    plt.legend()
    plt.grid(True)
    plt.show()
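
Per-character accuracy can overstate how often a whole captcha is solved, since all five characters must be right at once. The sketch below is not part of the original notebook. It assumes predictions arrive as a list of five arrays of shape `(batch, 36)`, the format a five-output Keras model returns from `predict`, and it uses the hypothetical helpers `decode_predictions`, `exact_match_accuracy`, and `fake_predictions`. Synthetic one-hot arrays stand in for real model output.

```python
import string

import numpy as np

ALPHABET = string.digits + string.ascii_lowercase  # index -> character


def decode_predictions(predictions):
    """Turn a list of per-character probability arrays into strings."""
    # Stack to (num_chars, batch), then read each column as one captcha.
    indices = np.stack([np.argmax(p, axis=1) for p in predictions])
    return ["".join(ALPHABET[i] for i in column) for column in indices.T]


def exact_match_accuracy(predicted, expected):
    """Fraction of captchas with every character correct."""
    matches = [p == e for p, e in zip(predicted, expected)]
    return sum(matches) / len(matches)


def fake_predictions(answers):
    """Build synthetic one-hot 'predictions' for the given strings."""
    outputs = [np.zeros((len(answers), len(ALPHABET)), dtype=np.float32) for _ in range(5)]
    for row, answer in enumerate(answers):
        for position, char in enumerate(answer):
            outputs[position][row, ALPHABET.index(char)] = 1.0
    return outputs


predictions = fake_predictions(["2b8nc", "x7k9q"])
decoded = decode_predictions(predictions)
# The second captcha differs from its expected label in the last character.
accuracy = exact_match_accuracy(decoded, ["2b8nc", "x7k9a"])
print(decoded, accuracy)
```

In this toy example nine of the ten characters are correct, yet only one of the two captchas is solved, which is the gap the per-branch curves cannot show.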

## Saving the Model


In [None]:
model.save(MODEL_PATH)
print(f"Model saved successfully to '{MODEL_PATH}'!")