<a href="https://colab.research.google.com/github/Leerish/Captcha-Decoding-using-TensorFlow/blob/main/Captcha_Decoding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Captcha to Text OCR using TensorFlow


**Introduction:**

**Captcha (Completely Automated Public Turing test to tell Computers and Humans Apart)** is a challenge-response test used in computing to determine whether or not the user is human. It typically involves presenting the user with a distorted image of text, and the user must correctly transcribe the text to prove they are human. Captchas are commonly used in online forms, website sign-ups, and account logins to prevent automated bots from abusing or spamming the system.


The model is designed to read and interpret captcha images using TensorFlow. It utilizes deep learning techniques, particularly convolutional neural networks (CNNs), to recognize the distorted text within captcha images. By training on a dataset of labeled captcha images, the model learns to extract features and classify the characters present in the images. Once trained, the model can then be used to predict the text present in unseen captcha images.

Key components of the captcha reading model may include:

1. **Data Preprocessing**: Captcha images are preprocessed to enhance their quality and make them suitable for training. Preprocessing steps may include resizing, normalization, noise reduction, and augmentation.

2. **Model Architecture**: The model architecture defines the structure of the neural network used for captcha recognition. It typically consists of multiple layers, including convolutional layers for feature extraction, pooling layers for downsampling, and fully connected layers for classification.

3. **Training**: The model is trained on a dataset of labeled captcha images. During training, the model learns to minimize the difference between its predictions and the ground truth labels using optimization algorithms like stochastic gradient descent (SGD) or Adam.

4. **Evaluation**: The trained model is evaluated on a separate validation dataset to assess its performance. Evaluation metrics such as accuracy, precision, recall, and F1 score are used to measure the model's effectiveness in captcha recognition.

5. **Deployment**: Once trained and evaluated, the model can be deployed in production environments to read captcha images in real-time. This may involve integrating the model into web applications, APIs, or other systems where captcha verification is required.

Overall, the captcha reading model using TensorFlow demonstrates the application of deep learning techniques in solving practical challenges related to computer vision and security. It provides an automated solution for reading and interpreting captcha images, contributing to improved user experience and security in online applications.

In [1]:
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py

--2024-04-17 14:50:40--  https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10246 (10K) [text/plain]
Saving to: ‘helper_functions.py’


2024-04-17 14:50:40 (20.2 MB/s) - ‘helper_functions.py’ saved [10246/10246]



##Setting up initial Configurations

In [4]:
import os
from datetime import datetime

from mltu.configs import BaseModelConfigs


class ModelConfigs(BaseModelConfigs):
    def __init__(self):
        super().__init__()
        self.model_path = os.path.join("Models/02_captcha_to_text", datetime.strftime(datetime.now(), "%Y%m%d%H%M"))
        self.vocab = ""
        self.height = 50
        self.width = 200
        self.max_text_length = 0
        self.batch_size = 64
        self.learning_rate = 1e-3
        self.train_epochs = 70
        self.train_workers = 20

## Installing Dependencies

In [4]:
! pip install stow

Collecting stow
  Downloading stow-1.3.1-py3-none-any.whl (74 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m74.8/74.8 kB[0m [31m1.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: stow
Successfully installed stow-1.3.1


In [1]:
!pip install mltu==0.1.4



## Data Preprocesing

In [2]:
import stow
from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile

def download_and_unzip(url, extract_to='Datasets'):
    http_response = urlopen(url)
    zipfile = ZipFile(BytesIO(http_response.read()))
    zipfile.extractall(path=extract_to)

if not stow.exists(stow.join('Datasets', 'captcha_images_v2')):
    download_and_unzip('https://github.com/AakashKumarNain/CaptchaCracker/raw/master/captcha_images_v2.zip', extract_to='Datasets')

In [6]:
dataset, vocab, max_len = [], set(), 0
captcha_path = os.path.join("Datasets", "captcha_images_v2")
for file in os.listdir(captcha_path):
    file_path = os.path.join(captcha_path, file)
    label = os.path.splitext(file)[0] # Get the file name without the extension
    dataset.append([file_path, label])
    vocab.update(list(label))
    max_len = max(max_len, len(label))

configs = ModelConfigs()

# Save vocab and maximum text length to configs
configs.vocab = "".join(vocab)
configs.max_text_length = max_len
configs.save()

In [17]:
import tensorflow as tf
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau, TensorBoard

from mltu.dataProvider import DataProvider
from mltu.losses import CTCloss
from mltu.callbacks import Model2onnx, TrainLogger
from mltu.metrics import CWERMetric

from mltu.preprocessors import ImageReader
from mltu.transformers import ImageResizer, LabelIndexer, LabelPadding
from mltu.augmentors import RandomBrightness, RandomRotate, RandomErodeDilate

from model import train_model
from configs import ModelConfigs

import os
from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile


In [9]:
from mltu.dataProvider import DataProvider
from mltu.preprocessors import ImageReader
from mltu.transformers import ImageResizer, LabelIndexer, LabelPadding
from mltu.augmentors import RandomBrightness, RandomRotate, RandomErodeDilate

data_provider = DataProvider(
    dataset=dataset,
    skip_validation=True,
    batch_size=configs.batch_size,
    data_preprocessors=[ImageReader()],
    transformers=[
        ImageResizer(configs.width, configs.height),
        LabelIndexer(configs.vocab),
        LabelPadding(max_word_length=configs.max_text_length, padding_value=len(configs.vocab))
        ],
)

INFO:mltu.dataProvider:Skipping Dataset validation...


In [10]:
train_data_provider, val_data_provider = data_provider.split(split = 0.9)

In [11]:
train_data_provider.augmentors = [RandomBrightness(), RandomRotate(), RandomErodeDilate()]

## Training Model


In [12]:
model = train_model(
    input_dim = (configs.height, configs.width, 3),
    output_dim = len(configs.vocab),
)

In [13]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=configs.learning_rate),
    loss=CTCloss(),
    metrics=[CWERMetric()],
)

In [14]:
# Define callbacks
earlystopper = EarlyStopping(monitor='val_CER', patience=40, verbose=1)
checkpoint = ModelCheckpoint(f"{configs.model_path}/model.h5", monitor='val_CER', verbose=1, save_best_only=True, mode='min')
trainLogger = TrainLogger(configs.model_path)
tb_callback = TensorBoard(f'{configs.model_path}/logs', update_freq=1)
reduceLROnPlat = ReduceLROnPlateau(monitor='val_CER', factor=0.9, min_delta=1e-10, patience=20, verbose=1, mode='auto')
model2onnx = Model2onnx(f"{configs.model_path}/model.h5")

In [19]:
# Train the model
model.fit(
    train_data_provider,
    validation_data=val_data_provider,
    epochs=70,
    callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
    workers=configs.train_workers
)

# Save training and validation datasets as csv files
train_data_provider.to_csv(stow.join(configs.model_path, 'train.csv'))
val_data_provider.to_csv(stow.join(configs.model_path, 'val.csv'))

Epoch 1/70
Epoch 1: val_CER did not improve from 1.00000


INFO:root:Epoch 0; loss: 16.321199417114258; CER: 1.0; WER: 1.0; val_loss: 16.35167121887207; val_CER: 1.0; val_WER: 1.0


Epoch 2/70
Epoch 2: val_CER did not improve from 1.00000


INFO:root:Epoch 1; loss: 16.31488609313965; CER: 1.0; WER: 1.0; val_loss: 16.328990936279297; val_CER: 1.0; val_WER: 1.0


Epoch 3/70
Epoch 3: val_CER did not improve from 1.00000


INFO:root:Epoch 2; loss: 16.315914154052734; CER: 1.0; WER: 1.0; val_loss: 16.349382400512695; val_CER: 1.0; val_WER: 1.0


Epoch 4/70
Epoch 4: val_CER did not improve from 1.00000


INFO:root:Epoch 3; loss: 16.28876304626465; CER: 1.0; WER: 1.0; val_loss: 16.31635856628418; val_CER: 1.0; val_WER: 1.0


Epoch 5/70
Epoch 5: val_CER did not improve from 1.00000


INFO:root:Epoch 4; loss: 16.29865074157715; CER: 1.0; WER: 1.0; val_loss: 16.32975196838379; val_CER: 1.0; val_WER: 1.0


Epoch 6/70
Epoch 6: val_CER did not improve from 1.00000


INFO:root:Epoch 5; loss: 16.280241012573242; CER: 1.0; WER: 1.0; val_loss: 16.33698081970215; val_CER: 1.0; val_WER: 1.0


Epoch 7/70
Epoch 7: val_CER did not improve from 1.00000


INFO:root:Epoch 6; loss: 16.23756217956543; CER: 1.0; WER: 1.0; val_loss: 16.319162368774414; val_CER: 1.0; val_WER: 1.0


Epoch 8/70
Epoch 8: val_CER did not improve from 1.00000


INFO:root:Epoch 7; loss: 16.18514060974121; CER: 1.0; WER: 1.0; val_loss: 16.334077835083008; val_CER: 1.0; val_WER: 1.0


Epoch 9/70
Epoch 9: val_CER did not improve from 1.00000


INFO:root:Epoch 8; loss: 16.07619285583496; CER: 1.0; WER: 1.0; val_loss: 16.247892379760742; val_CER: 1.0; val_WER: 1.0


Epoch 10/70
Epoch 10: val_CER did not improve from 1.00000


INFO:root:Epoch 9; loss: 15.885340690612793; CER: 1.0; WER: 1.0; val_loss: 16.239238739013672; val_CER: 1.0; val_WER: 1.0


Epoch 11/70
Epoch 11: val_CER did not improve from 1.00000


INFO:root:Epoch 10; loss: 15.597818374633789; CER: 1.0; WER: 1.0; val_loss: 16.165714263916016; val_CER: 1.0; val_WER: 1.0


Epoch 12/70
Epoch 12: val_CER did not improve from 1.00000


INFO:root:Epoch 11; loss: 15.177270889282227; CER: 0.9982905387878418; WER: 1.0; val_loss: 15.894679069519043; val_CER: 1.0; val_WER: 1.0


Epoch 13/70
Epoch 13: val_CER did not improve from 1.00000


INFO:root:Epoch 12; loss: 14.722421646118164; CER: 0.9931623935699463; WER: 1.0; val_loss: 14.561226844787598; val_CER: 1.0; val_WER: 1.0


Epoch 14/70
Epoch 14: val_CER improved from 1.00000 to 0.95000, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 13; loss: 14.11832332611084; CER: 0.9589743614196777; WER: 1.0; val_loss: 14.462864875793457; val_CER: 0.9500000476837158; val_WER: 1.0


Epoch 15/70
Epoch 15: val_CER improved from 0.95000 to 0.92500, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 14; loss: 13.437328338623047; CER: 0.8993590474128723; WER: 1.0; val_loss: 17.25698471069336; val_CER: 0.9249999523162842; val_WER: 1.0


Epoch 16/70
Epoch 16: val_CER did not improve from 0.92500


INFO:root:Epoch 15; loss: 12.786291122436523; CER: 0.8497862815856934; WER: 1.0; val_loss: 19.582263946533203; val_CER: 0.9249999523162842; val_WER: 1.0


Epoch 17/70
Epoch 17: val_CER improved from 0.92500 to 0.91538, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 16; loss: 11.98105239868164; CER: 0.7886751890182495; WER: 1.0; val_loss: 21.24038314819336; val_CER: 0.9153845906257629; val_WER: 1.0


Epoch 18/70
Epoch 18: val_CER did not improve from 0.91538


INFO:root:Epoch 17; loss: 11.297892570495605; CER: 0.7566239237785339; WER: 1.0; val_loss: 23.39720916748047; val_CER: 0.9307692646980286; val_WER: 1.0


Epoch 19/70
Epoch 19: val_CER improved from 0.91538 to 0.91346, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 18; loss: 10.674028396606445; CER: 0.7119658589363098; WER: 1.0; val_loss: 21.676023483276367; val_CER: 0.9134615659713745; val_WER: 1.0


Epoch 20/70
Epoch 20: val_CER improved from 0.91346 to 0.83269, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 19; loss: 10.04058837890625; CER: 0.6645299792289734; WER: 0.9989316463470459; val_loss: 15.628806114196777; val_CER: 0.8326922655105591; val_WER: 1.0


Epoch 21/70
Epoch 21: val_CER did not improve from 0.83269


INFO:root:Epoch 20; loss: 9.427144050598145; CER: 0.6185897588729858; WER: 0.995726466178894; val_loss: 32.5456657409668; val_CER: 0.9442307353019714; val_WER: 1.0


Epoch 22/70
Epoch 22: val_CER did not improve from 0.83269


INFO:root:Epoch 21; loss: 8.841062545776367; CER: 0.5829060077667236; WER: 0.995726466178894; val_loss: 31.69342613220215; val_CER: 0.9403846263885498; val_WER: 1.0


Epoch 23/70
Epoch 23: val_CER did not improve from 0.83269


INFO:root:Epoch 22; loss: 8.218981742858887; CER: 0.547435998916626; WER: 0.9903846383094788; val_loss: 31.902193069458008; val_CER: 0.9307692646980286; val_WER: 1.0


Epoch 24/70
Epoch 24: val_CER did not improve from 0.83269


INFO:root:Epoch 23; loss: 7.686049461364746; CER: 0.49871793389320374; WER: 0.9775640964508057; val_loss: 33.69876480102539; val_CER: 0.934615433216095; val_WER: 1.0


Epoch 25/70
Epoch 25: val_CER did not improve from 0.83269


INFO:root:Epoch 24; loss: 7.1088714599609375; CER: 0.45876067876815796; WER: 0.9540598392486572; val_loss: 36.56772994995117; val_CER: 0.9480769634246826; val_WER: 1.0


Epoch 26/70
Epoch 26: val_CER did not improve from 0.83269


INFO:root:Epoch 25; loss: 6.566474437713623; CER: 0.4239315986633301; WER: 0.9412392973899841; val_loss: 36.87615203857422; val_CER: 0.934615433216095; val_WER: 1.0


Epoch 27/70
Epoch 27: val_CER did not improve from 0.83269


INFO:root:Epoch 26; loss: 5.9743733406066895; CER: 0.3790598213672638; WER: 0.9209401607513428; val_loss: 40.6323356628418; val_CER: 0.9480769038200378; val_WER: 1.0


Epoch 28/70
Epoch 28: val_CER did not improve from 0.83269


INFO:root:Epoch 27; loss: 5.488729476928711; CER: 0.34294870495796204; WER: 0.8846153616905212; val_loss: 38.77775955200195; val_CER: 0.9615384340286255; val_WER: 1.0


Epoch 29/70
Epoch 29: val_CER did not improve from 0.83269


INFO:root:Epoch 28; loss: 5.038846492767334; CER: 0.3053418695926666; WER: 0.8344017267227173; val_loss: 40.12788391113281; val_CER: 0.9865384697914124; val_WER: 1.0


Epoch 30/70
Epoch 30: val_CER did not improve from 0.83269


INFO:root:Epoch 29; loss: 4.647592544555664; CER: 0.28803423047065735; WER: 0.8301281929016113; val_loss: 37.050785064697266; val_CER: 0.9461538791656494; val_WER: 1.0


Epoch 31/70
Epoch 31: val_CER did not improve from 0.83269


INFO:root:Epoch 30; loss: 4.27063512802124; CER: 0.2587606906890869; WER: 0.7873931527137756; val_loss: 40.24337387084961; val_CER: 0.9711538553237915; val_WER: 1.0


Epoch 32/70
Epoch 32: val_CER did not improve from 0.83269


INFO:root:Epoch 31; loss: 3.943861484527588; CER: 0.2410256266593933; WER: 0.754273533821106; val_loss: 40.43387222290039; val_CER: 0.942307710647583; val_WER: 1.0


Epoch 33/70
Epoch 33: val_CER did not improve from 0.83269


INFO:root:Epoch 32; loss: 3.5873095989227295; CER: 0.21645300090312958; WER: 0.7190170884132385; val_loss: 42.04409408569336; val_CER: 0.9576922655105591; val_WER: 1.0


Epoch 34/70
Epoch 34: val_CER did not improve from 0.83269


INFO:root:Epoch 33; loss: 3.39909029006958; CER: 0.20213676989078522; WER: 0.688034176826477; val_loss: 41.32634353637695; val_CER: 0.9673077464103699; val_WER: 1.0


Epoch 35/70
Epoch 35: val_CER did not improve from 0.83269


INFO:root:Epoch 34; loss: 3.1768181324005127; CER: 0.18888887763023376; WER: 0.6399572491645813; val_loss: 32.19667434692383; val_CER: 0.9711538553237915; val_WER: 1.0


Epoch 36/70
Epoch 36: val_CER did not improve from 0.83269


INFO:root:Epoch 35; loss: 2.9088709354400635; CER: 0.1754273623228073; WER: 0.627136766910553; val_loss: 35.311038970947266; val_CER: 1.076923131942749; val_WER: 1.0


Epoch 37/70
Epoch 37: val_CER did not improve from 0.83269


INFO:root:Epoch 36; loss: 2.7005527019500732; CER: 0.1536324918270111; WER: 0.5758547186851501; val_loss: 35.173500061035156; val_CER: 0.9615384340286255; val_WER: 1.0


Epoch 38/70
Epoch 38: val_CER did not improve from 0.83269


INFO:root:Epoch 37; loss: 2.53662371635437; CER: 0.14829060435295105; WER: 0.5512820482254028; val_loss: 25.88795280456543; val_CER: 0.876923143863678; val_WER: 1.0


Epoch 39/70
Epoch 39: val_CER did not improve from 0.83269


INFO:root:Epoch 38; loss: 2.3276312351226807; CER: 0.14017093181610107; WER: 0.5213675498962402; val_loss: 38.813804626464844; val_CER: 1.0211539268493652; val_WER: 1.0


Epoch 40/70
Epoch 40: val_CER did not improve from 0.83269


INFO:root:Epoch 39; loss: 2.1103103160858154; CER: 0.1220085620880127; WER: 0.4839743673801422; val_loss: 35.80543899536133; val_CER: 1.0692307949066162; val_WER: 1.0



Epoch 40: ReduceLROnPlateau reducing learning rate to 0.0009000000427477062.
Epoch 41/70
Epoch 41: val_CER did not improve from 0.83269


INFO:root:Epoch 40; loss: 2.0186805725097656; CER: 0.11880341917276382; WER: 0.46581196784973145; val_loss: 20.97980499267578; val_CER: 0.8423076868057251; val_WER: 1.0


Epoch 42/70
Epoch 42: val_CER did not improve from 0.83269


INFO:root:Epoch 41; loss: 1.968104362487793; CER: 0.11217950284481049; WER: 0.45085468888282776; val_loss: 22.076566696166992; val_CER: 0.857692301273346; val_WER: 1.0


Epoch 43/70
Epoch 43: val_CER improved from 0.83269 to 0.83077, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 42; loss: 1.8035746812820435; CER: 0.09615384787321091; WER: 0.3878205120563507; val_loss: 19.839141845703125; val_CER: 0.8307692408561707; val_WER: 1.0


Epoch 44/70
Epoch 44: val_CER improved from 0.83077 to 0.63846, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 43; loss: 1.7036070823669434; CER: 0.09935897588729858; WER: 0.40918803215026855; val_loss: 13.076189994812012; val_CER: 0.6384615302085876; val_WER: 0.9903846383094788


Epoch 45/70
Epoch 45: val_CER improved from 0.63846 to 0.63462, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 44; loss: 1.5630770921707153; CER: 0.08504272997379303; WER: 0.3611111044883728; val_loss: 14.755290031433105; val_CER: 0.6346153616905212; val_WER: 0.9807692170143127


Epoch 46/70
Epoch 46: val_CER did not improve from 0.63462


INFO:root:Epoch 45; loss: 1.473323941230774; CER: 0.08119659870862961; WER: 0.345085471868515; val_loss: 16.243385314941406; val_CER: 0.6865384578704834; val_WER: 1.0


Epoch 47/70
Epoch 47: val_CER did not improve from 0.63462


INFO:root:Epoch 46; loss: 1.3850525617599487; CER: 0.07649572193622589; WER: 0.3194444477558136; val_loss: 17.57558822631836; val_CER: 0.7653846740722656; val_WER: 1.0


Epoch 48/70
Epoch 48: val_CER improved from 0.63462 to 0.28269, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 47; loss: 1.3359936475753784; CER: 0.0724359005689621; WER: 0.31089743971824646; val_loss: 4.758038520812988; val_CER: 0.2826923131942749; val_WER: 0.8269230723381042


Epoch 49/70
Epoch 49: val_CER did not improve from 0.28269


INFO:root:Epoch 48; loss: 1.3142775297164917; CER: 0.06923076510429382; WER: 0.29914531111717224; val_loss: 5.233255863189697; val_CER: 0.3134615123271942; val_WER: 0.8653846383094788


Epoch 50/70
Epoch 50: val_CER did not improve from 0.28269


INFO:root:Epoch 49; loss: 1.2215579748153687; CER: 0.06474359333515167; WER: 0.2788461446762085; val_loss: 5.020917892456055; val_CER: 0.2826923131942749; val_WER: 0.875


Epoch 51/70
Epoch 51: val_CER improved from 0.28269 to 0.15000, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 50; loss: 1.1273868083953857; CER: 0.060256410390138626; WER: 0.26923078298568726; val_loss: 2.514333963394165; val_CER: 0.15000000596046448; val_WER: 0.5961538553237915


Epoch 52/70
Epoch 52: val_CER did not improve from 0.15000


INFO:root:Epoch 51; loss: 1.095484733581543; CER: 0.06111111119389534; WER: 0.2670940160751343; val_loss: 8.156960487365723; val_CER: 0.39615386724472046; val_WER: 0.9230769276618958


Epoch 53/70
Epoch 53: val_CER did not improve from 0.15000


INFO:root:Epoch 52; loss: 1.0195876359939575; CER: 0.05277778208255768; WER: 0.23397435247898102; val_loss: 3.0000431537628174; val_CER: 0.18269230425357819; val_WER: 0.682692289352417


Epoch 54/70
Epoch 54: val_CER did not improve from 0.15000


INFO:root:Epoch 53; loss: 0.9605056047439575; CER: 0.05021367967128754; WER: 0.2211538404226303; val_loss: 3.4005627632141113; val_CER: 0.19423078000545502; val_WER: 0.7115384340286255


Epoch 55/70
Epoch 55: val_CER improved from 0.15000 to 0.14423, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 54; loss: 0.9187546968460083; CER: 0.05106838047504425; WER: 0.23076923191547394; val_loss: 2.4711365699768066; val_CER: 0.14423076808452606; val_WER: 0.5480769276618958


Epoch 56/70
Epoch 56: val_CER did not improve from 0.14423


INFO:root:Epoch 55; loss: 0.9063342809677124; CER: 0.04850427806377411; WER: 0.21260683238506317; val_loss: 3.6957836151123047; val_CER: 0.20192307233810425; val_WER: 0.682692289352417


Epoch 57/70
Epoch 57: val_CER improved from 0.14423 to 0.11154, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 56; loss: 0.8463764786720276; CER: 0.045085471123456955; WER: 0.2083333283662796; val_loss: 2.0248324871063232; val_CER: 0.11153846234083176; val_WER: 0.4711538553237915


Epoch 58/70
Epoch 58: val_CER improved from 0.11154 to 0.06346, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 57; loss: 0.7722242474555969; CER: 0.0386752113699913; WER: 0.1805555522441864; val_loss: 1.3447237014770508; val_CER: 0.0634615421295166; val_WER: 0.3076923191547394


Epoch 59/70
Epoch 59: val_CER did not improve from 0.06346


INFO:root:Epoch 58; loss: 0.7316760420799255; CER: 0.03995726630091667; WER: 0.18482905626296997; val_loss: 2.2887823581695557; val_CER: 0.126923069357872; val_WER: 0.5192307829856873


Epoch 60/70
Epoch 60: val_CER did not improve from 0.06346


INFO:root:Epoch 59; loss: 0.7123139500617981; CER: 0.03782051429152489; WER: 0.17200854420661926; val_loss: 2.64404296875; val_CER: 0.1538461595773697; val_WER: 0.6153846383094788


Epoch 61/70
Epoch 61: val_CER did not improve from 0.06346


INFO:root:Epoch 60; loss: 0.6528772115707397; CER: 0.03611111268401146; WER: 0.17200854420661926; val_loss: 3.0187125205993652; val_CER: 0.18269230425357819; val_WER: 0.682692289352417


Epoch 62/70
Epoch 62: val_CER did not improve from 0.06346


INFO:root:Epoch 61; loss: 0.6037291884422302; CER: 0.029487179592251778; WER: 0.13354700803756714; val_loss: 4.977357387542725; val_CER: 0.30384618043899536; val_WER: 0.8653846383094788


Epoch 63/70
Epoch 63: val_CER did not improve from 0.06346


INFO:root:Epoch 62; loss: 0.651523232460022; CER: 0.03611110895872116; WER: 0.1677350401878357; val_loss: 4.6210503578186035; val_CER: 0.26346153020858765; val_WER: 0.9134615659713745


Epoch 64/70
Epoch 64: val_CER did not improve from 0.06346


INFO:root:Epoch 63; loss: 0.6692413687705994; CER: 0.03910256549715996; WER: 0.17628204822540283; val_loss: 2.324664354324341; val_CER: 0.14038461446762085; val_WER: 0.5961538553237915


Epoch 65/70
Epoch 65: val_CER improved from 0.06346 to 0.05192, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 64; loss: 0.5601702332496643; CER: 0.029700858518481255; WER: 0.1367521435022354; val_loss: 1.2158493995666504; val_CER: 0.051923077553510666; val_WER: 0.25


Epoch 66/70
Epoch 66: val_CER improved from 0.05192 to 0.05000, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 65; loss: 0.5495352149009705; CER: 0.028205130249261856; WER: 0.13141025602817535; val_loss: 1.307845115661621; val_CER: 0.05000000447034836; val_WER: 0.24038460850715637


Epoch 67/70
Epoch 67: val_CER improved from 0.05000 to 0.04423, saving model to Models/02_captcha_to_text/202404171552/model.h5


INFO:root:Epoch 66; loss: 0.49626028537750244; CER: 0.025427350774407387; WER: 0.11965811997652054; val_loss: 0.9587940573692322; val_CER: 0.04423076659440994; val_WER: 0.21153846383094788


Epoch 68/70
Epoch 68: val_CER did not improve from 0.04423


INFO:root:Epoch 67; loss: 0.4678451716899872; CER: 0.02606837823987007; WER: 0.12393162399530411; val_loss: 1.35899817943573; val_CER: 0.05384615808725357; val_WER: 0.26923078298568726


Epoch 69/70
Epoch 69: val_CER did not improve from 0.04423


INFO:root:Epoch 68; loss: 0.46646228432655334; CER: 0.027350427582859993; WER: 0.13247863948345184; val_loss: 1.0080604553222656; val_CER: 0.04999999701976776; val_WER: 0.24038460850715637


Epoch 70/70
Epoch 70: val_CER did not improve from 0.04423


INFO:root:Epoch 69; loss: 0.5081627368927002; CER: 0.028418803587555885; WER: 0.1378205120563507; val_loss: 1.5910942554473877; val_CER: 0.07499999552965164; val_WER: 0.3461538553237915




INFO:tf2onnx.tfonnx:Using tensorflow=2.10.1, onnx=1.12.0, tf2onnx=1.14.0/8f8d49
INFO:tf2onnx.tfonnx:Using opset <onnx, 15>
INFO:tf2onnx.tf_utils:Computed 0 values for constant folding
INFO:tf2onnx.tf_utils:Computed 0 values for constant folding
INFO:tf2onnx.tf_utils:Computed 0 values for constant folding
INFO:tf2onnx.tf_utils:Computed 0 values for constant folding
INFO:tf2onnx.tf_utils:Computed 2 values for constant folding
INFO:tf2onnx.tfonnx:folding node using tf type=StridedSlice, name=model/bidirectional/forward_lstm/PartitionedCall/strided_slice
INFO:tf2onnx.tfonnx:folding node using tf type=StridedSlice, name=model/bidirectional/backward_lstm/PartitionedCall/strided_slice
INFO:tf2onnx.optimizer:Optimizing ONNX model
INFO:tf2onnx.optimizer:After optimization: BatchNormalization -18 (18->0), Cast -4 (11->7), Concat -4 (10->6), Const -128 (194->66), Expand -3 (4->1), Gather +2 (2->4), Identity -2 (2->0), Shape -1 (4->3), Slice -1 (5->4), Squeeze -3 (5->2), Transpose -83 (87->4), Uns

In [22]:
! pip install text-utils

Collecting text-utils
  Downloading text_utils-0.0.5-py3-none-any.whl (3.6 kB)
Installing collected packages: text-utils
Successfully installed text-utils-0.0.5


In [5]:
!pip install python-Levenshtein

Collecting python-Levenshtein
  Downloading python_Levenshtein-0.25.1-py3-none-any.whl (9.4 kB)
Collecting Levenshtein==0.25.1 (from python-Levenshtein)
  Downloading Levenshtein-0.25.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (177 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m177.4/177.4 kB[0m [31m1.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting rapidfuzz<4.0.0,>=3.8.0 (from Levenshtein==0.25.1->python-Levenshtein)
  Downloading rapidfuzz-3.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.4/3.4 MB[0m [31m19.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: rapidfuzz, Levenshtein, python-Levenshtein
Successfully installed Levenshtein-0.25.1 python-Levenshtein-0.25.1 rapidfuzz-3.8.1


# Verifying Results

In [5]:
import cv2
import typing
import numpy as np
import Levenshtein

from mltu.inferenceModel import OnnxInferenceModel

class ImageToWordModel(OnnxInferenceModel):
    def __init__(self, char_list: typing.Union[str, list], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.char_list = char_list

    @staticmethod
    def ctc_decoder(preds, char_list):
        decoded_texts = []

        for pred in preds:
            decoded_text = ''
            previous_index = len(char_list)  # Initialize previous_index to an invalid value

            for timestep_pred in pred:
                max_index = np.argmax(timestep_pred)

                # Check if the max_index is valid and not the blank index
                if 0 <= max_index < len(char_list) and max_index != len(char_list) - 1:
                    # Append the character to the decoded text
                    decoded_text += char_list[max_index]

                # Update previous_index
                previous_index = max_index

            decoded_texts.append(decoded_text)

        return decoded_texts

    @staticmethod
    def get_cer(prediction_text, label):
       # Ensure both prediction_text and label are strings
        if not isinstance(prediction_text, str) or not isinstance(label, str):
            raise ValueError("Both prediction_text and label must be strings")

        # Initialize variables to store counts of insertions, deletions, and substitutions
        insertions = deletions = substitutions = 0

        # Lengths of prediction_text and label
        len_pred = len(prediction_text)
        len_label = len(label)

        # Initialize matrices for dynamic programming approach
        dp = [[0] * (len_label + 1) for _ in range(len_pred + 1)]

        # Initialize first row and column of dp matrix
        for i in range(len_pred + 1):
            dp[i][0] = i
        for j in range(len_label + 1):
            dp[0][j] = j

        # Calculate edit distance using dynamic programming
        for i in range(1, len_pred + 1):
            for j in range(1, len_label + 1):
                if prediction_text[i - 1] == label[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1]
                else:
                    dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

        # Edit distance is the last cell in the matrix
        edit_distance = dp[len_pred][len_label]

        # Calculate CER
        cer = edit_distance / len_label

        return cer

    def predict(self, image: np.ndarray):
        image = cv2.resize(image, self.input_shape[:2][::-1])

        image_pred = np.expand_dims(image, axis=0).astype(np.float32)

        preds = self.model.run(None, {self.input_name: image_pred})[0]

        text = self.ctc_decoder(preds, self.char_list)[0]

        return text

if __name__ == "__main__":
    import pandas as pd
    from tqdm import tqdm
    from mltu.configs import BaseModelConfigs

    configs = BaseModelConfigs.load("/content/Models/02_captcha_to_text/202404171552/configs.yaml")

    model = ImageToWordModel(model_path=configs.model_path, char_list=configs.vocab)

    df = pd.read_csv("/content/Models/02_captcha_to_text/202404171552/val.csv").values.tolist()

    accum_cer = []
    for image_path, label in tqdm(df):
        image = cv2.imread(image_path)

        prediction_text = model.predict(image)

        cer = model.get_cer(prediction_text, label)
        print(f"Image: {image_path}, Label: {label}, Prediction: {prediction_text}, CER: {cer}")

        accum_cer.append(cer)

    print(f"Average CER: {np.average(accum_cer)}")


 15%|█▌        | 16/104 [00:00<00:00, 154.37it/s]

Image: Datasets/captcha_images_v2/3ebnn.png, Label: 3ebnn, Prediction: 3ebnn, CER: 0.0
Image: Datasets/captcha_images_v2/wf684.png, Label: wf684, Prediction: wf6844, CER: 0.2
Image: Datasets/captcha_images_v2/6pfy4.png, Label: 6pfy4, Prediction: 6pfy4, CER: 0.0
Image: Datasets/captcha_images_v2/n7ff2.png, Label: n7ff2, Prediction: n7ff2, CER: 0.0
Image: Datasets/captcha_images_v2/478nx.png, Label: 478nx, Prediction: 478nx, CER: 0.0
Image: Datasets/captcha_images_v2/nnfx3.png, Label: nnfx3, Prediction: nnfxx3, CER: 0.2
Image: Datasets/captcha_images_v2/cnmnn.png, Label: cnmnn, Prediction: nmnnn, CER: 0.4
Image: Datasets/captcha_images_v2/ddcne.png, Label: ddcne, Prediction: ddne, CER: 0.2
Image: Datasets/captcha_images_v2/6cm6m.png, Label: 6cm6m, Prediction: 6m6m, CER: 0.2
Image: Datasets/captcha_images_v2/b28g8.png, Label: b28g8, Prediction: b28g8, CER: 0.0
Image: Datasets/captcha_images_v2/5n245.png, Label: 5n245, Prediction: 5n245, CER: 0.0
Image: Datasets/captcha_images_v2/244e2.png

 45%|████▌     | 47/104 [00:00<00:00, 145.64it/s]

Image: Datasets/captcha_images_v2/2p2y8.png, Label: 2p2y8, Prediction: 2p2yy8, CER: 0.2
Image: Datasets/captcha_images_v2/gpnxn.png, Label: gpnxn, Prediction: gpnxn, CER: 0.0
Image: Datasets/captcha_images_v2/g8gnd.png, Label: g8gnd, Prediction: g8gnnd, CER: 0.2
Image: Datasets/captcha_images_v2/mbf58.png, Label: mbf58, Prediction: mbf588, CER: 0.2
Image: Datasets/captcha_images_v2/4ycex.png, Label: 4ycex, Prediction: 4yex, CER: 0.2
Image: Datasets/captcha_images_v2/2356g.png, Label: 2356g, Prediction: 2356g, CER: 0.0
Image: Datasets/captcha_images_v2/mg5nn.png, Label: mg5nn, Prediction: mg5nn, CER: 0.0
Image: Datasets/captcha_images_v2/4w6mw.png, Label: 4w6mw, Prediction: 4w6mw, CER: 0.0
Image: Datasets/captcha_images_v2/bgd4m.png, Label: bgd4m, Prediction: bgd4m, CER: 0.0
Image: Datasets/captcha_images_v2/44fyb.png, Label: 44fyb, Prediction: 44fyb, CER: 0.0
Image: Datasets/captcha_images_v2/24pew.png, Label: 24pew, Prediction: 74pew, CER: 0.2
Image: Datasets/captcha_images_v2/5bb66.p

 74%|███████▍  | 77/104 [00:00<00:00, 145.84it/s]

Image: Datasets/captcha_images_v2/yyn57.png, Label: yyn57, Prediction: yyn57, CER: 0.0
Image: Datasets/captcha_images_v2/77wp4.png, Label: 77wp4, Prediction: 77wp44, CER: 0.2
Image: Datasets/captcha_images_v2/n4xx5.png, Label: n4xx5, Prediction: n4xxx5, CER: 0.2
Image: Datasets/captcha_images_v2/b4y5x.png, Label: b4y5x, Prediction: b4y5xx, CER: 0.2
Image: Datasets/captcha_images_v2/eng53.png, Label: eng53, Prediction: eng53, CER: 0.0
Image: Datasets/captcha_images_v2/245y5.png, Label: 245y5, Prediction: 245yy5, CER: 0.2
Image: Datasets/captcha_images_v2/ef4mn.png, Label: ef4mn, Prediction: ef4nmn, CER: 0.2
Image: Datasets/captcha_images_v2/d22bd.png, Label: d22bd, Prediction: d22bd, CER: 0.0
Image: Datasets/captcha_images_v2/xe6eb.png, Label: xe6eb, Prediction: xe6eb, CER: 0.0
Image: Datasets/captcha_images_v2/mc35n.png, Label: mc35n, Prediction: m35n, CER: 0.2
Image: Datasets/captcha_images_v2/p2dw7.png, Label: p2dw7, Prediction: p2dw7, CER: 0.0
Image: Datasets/captcha_images_v2/8w875

100%|██████████| 104/104 [00:00<00:00, 147.66it/s]

Image: Datasets/captcha_images_v2/c7gb3.png, Label: c7gb3, Prediction: 7gb3, CER: 0.2
Image: Datasets/captcha_images_v2/e3ndn.png, Label: e3ndn, Prediction: e3ndn, CER: 0.0
Image: Datasets/captcha_images_v2/4w76g.png, Label: 4w76g, Prediction: 4w76gg, CER: 0.2
Image: Datasets/captcha_images_v2/m23bp.png, Label: m23bp, Prediction: m23bp, CER: 0.0
Image: Datasets/captcha_images_v2/pybee.png, Label: pybee, Prediction: pybeee, CER: 0.2
Image: Datasets/captcha_images_v2/d2nbn.png, Label: d2nbn, Prediction: d2nbn, CER: 0.0
Image: Datasets/captcha_images_v2/gbxyy.png, Label: gbxyy, Prediction: gbxyyy, CER: 0.2
Image: Datasets/captcha_images_v2/wb3ed.png, Label: wb3ed, Prediction: wb3ed, CER: 0.0
Image: Datasets/captcha_images_v2/42dw4.png, Label: 42dw4, Prediction: 42dw4, CER: 0.0
Image: Datasets/captcha_images_v2/5p8fm.png, Label: 5p8fm, Prediction: 3p8fm, CER: 0.2
Image: Datasets/captcha_images_v2/mfb3x.png, Label: mfb3x, Prediction: mfb3x, CER: 0.0
Image: Datasets/captcha_images_v2/87nym.p


