DeCRUEHD Framework

Deep CAPTCHA Recognition Using Encapsulated Preprocessing and Heterogeneous Datasets

A research effort that applies Deep Learning (DL) techniques to recognize text-based CAPTCHAs.

Research Contributions:

Generating 'heterogeneous' CAPTCHA image samples, in which several CAPTCHA schemes are combined to create a diversified, labelled dataset.
Integrating the CRABI algorithm (CAPTCHA Recognition with Attached Binary Images), which preprocesses CAPTCHA samples by attaching black and white bars as markers to the bottom of copies of each CAPTCHA image. This allows CAPTCHA text to be recognized character by character without segmentation.
Demonstrating the effectiveness of this CAPTCHA-recognition pipeline through transfer (continuous) learning. This project uses Convolutional Neural Networks (CNNs) to recognize characters in CAPTCHA images.
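
The CRABI idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the bar height, the split of the bar into one segment per character, and the choice of white for the "active" segment are all assumptions made for illustration, and images are modelled as plain nested lists rather than the arrays a real pipeline would likely use.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # grayscale image: rows of pixel values in 0..255

BLACK, WHITE = 0, 255


def attach_marker_bar(image: Image, position: int, length: int,
                      bar_height: int = 4) -> Image:
    """Return a copy of `image` with a black-and-white bar appended below it.

    Assumed encoding: the bar is split into `length` equal-width segments,
    and only the segment at index `position` is white.
    """
    width = len(image[0])
    bar_row = [
        WHITE if (x * length // width) == position else BLACK
        for x in range(width)
    ]
    marked = [row[:] for row in image]                     # copy original pixels
    marked.extend(bar_row[:] for _ in range(bar_height))   # append the marker bar
    return marked


def crabi_samples(image: Image, text: str,
                  bar_height: int = 4) -> Iterator[Tuple[Image, str]]:
    """Yield one (marked image copy, single-character label) pair per character."""
    for position, character in enumerate(text):
        yield attach_marker_bar(image, position, len(text), bar_height), character


# Hypothetical 8x6 placeholder image labelled "AB7K".
captcha = [[128] * 8 for _ in range(6)]
samples = list(crabi_samples(captcha, "AB7K"))
print(len(samples), [label for _, label in samples])
```

Each copy carries the whole CAPTCHA plus a marker telling the classifier which character to predict, so a four-character CAPTCHA becomes four single-label training samples and no character segmentation is required.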

CRABI algorithm visualized:

Some CAPTCHA image samples:

Requirements

Python 3.9
Jupyter Notebook

Install the required Python modules by running the following command:

$ pip3 install -r requirements.txt

Example Training Workflow

New users are encouraged to run the following command for a sample workflow:

$ python3 run.py

This script creates a heterogeneous CAPTCHA dataset and trains a CNN using the MobileNet architecture.
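
For readers who prefer to run the two stages by hand, the following sketch assembles an equivalent sequence of commands from the flags documented in the sections below. It does not reproduce the contents of run.py. The dataset name, model name, history file name, and all numeric values are hypothetical, and writing all three CAPTCHA types into one destination directory is an assumption about how a heterogeneous dataset is assembled.

```python
import shlex
import subprocess
import sys
from typing import List

CAPTCHA_TYPES = ["SIMPLE", "COMPLEX", "MONOCHROME"]


def build_workflow_commands(dataset: str = "sample_dataset", length: int = 4,
                            iterations: int = 1, epochs: int = 5,
                            batch_size: int = 32) -> List[List[str]]:
    """Build one dataset command per CAPTCHA type, then one training command."""
    commands = [
        [sys.executable, "create_captcha_images.py",
         "-i", str(iterations), "-l", str(length),
         "-t", captcha_type, "-d", dataset]
        for captcha_type in CAPTCHA_TYPES
    ]
    commands.append(
        [sys.executable, "create_captcha_recognition_model.py",
         "-d", dataset, "-l", str(length), "-e", str(epochs),
         "-b", str(batch_size), "-a", "MOBILE-NET",
         "-m", "sample_model",      # hypothetical model file name
         "-t", "sample_history"]    # hypothetical history file name
    )
    return commands


def run_workflow(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Print the commands for inspection; call run_workflow(...) from the
# repository root to actually execute them.
for command in build_workflow_commands():
    print(shlex.join(command))
```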

Generate Dataset

$ python3 create_captcha_images.py
usage: create_captcha_images.py [-h] -i ITERATIONS -l {1,2,3,4,5} -t {SIMPLE,COMPLEX,MONOCHROME} -d DESTINATION

options:
  -h, --help            show this help message and exit
  -i ITERATIONS, --iterations ITERATIONS
                        The number of times to repeat unique CAPTCHA set generation.
  -l {1,2,3,4,5}, --length {1,2,3,4,5}
                        Number of characters for each CAPTCHA image.
  -t {SIMPLE,COMPLEX,MONOCHROME}, --captcha_type {SIMPLE,COMPLEX,MONOCHROME}
                        Variation of CAPTCHA image to create.
  -d DESTINATION, --destination DESTINATION
                        The name of the destination directory within "datasets" to save the CAPTCHA images to
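
All four options are required, and -l and -t accept only the listed values. The sketch below rebuilds a parser that is consistent with the help text above and parses a sample argument list. It is inferred from the usage output alone, so the script's actual source may differ; the destination name simple_4_char is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser reconstructed from the documented usage text (an approximation)."""
    parser = argparse.ArgumentParser(prog="create_captcha_images.py")
    parser.add_argument("-i", "--iterations", type=int, required=True,
                        help="The number of times to repeat unique CAPTCHA set generation.")
    parser.add_argument("-l", "--length", type=int, choices=range(1, 6), required=True,
                        help="Number of characters for each CAPTCHA image.")
    parser.add_argument("-t", "--captcha_type", required=True,
                        choices=["SIMPLE", "COMPLEX", "MONOCHROME"],
                        help="Variation of CAPTCHA image to create.")
    parser.add_argument("-d", "--destination", required=True,
                        help='Destination directory within "datasets".')
    return parser


# Equivalent to: python3 create_captcha_images.py -i 2 -l 4 -t SIMPLE -d simple_4_char
args = build_parser().parse_args(
    ["-i", "2", "-l", "4", "-t", "SIMPLE", "-d", "simple_4_char"]
)
print(args.iterations, args.length, args.captcha_type, args.destination)
```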

Model Training

$ python3 create_captcha_recognition_model.py
usage: create_captcha_recognition_model.py [-h] -d DATA_DIRECTORY -l {1,2,3,4,5} -e EPOCHS -b {1,16,32,64,128} -a {VGG16,MOBILE-NET,RESNET,T-NET} -m MODEL_NAME -t
                                           TRAINING_HISTORY_FILE_NAME

options:
  -h, --help            show this help message and exit
  -d DATA_DIRECTORY, --data_directory DATA_DIRECTORY
                        Name of the sub-directory inside "datasets" holding the CAPTCHA images for training.
  -l {1,2,3,4,5}, --length {1,2,3,4,5}
                        Number of characters for each CAPTCHA image.
  -e EPOCHS, --epochs EPOCHS
                        Number of epochs when training the model.
  -b {1,16,32,64,128}, --batch_size {1,16,32,64,128}
                        Number of samples for the model at each iteration of training.
  -a {VGG16,MOBILE-NET,RESNET,T-NET}, --model_architecture {VGG16,MOBILE-NET,RESNET,T-NET}
                        Type of neural network architecture for the model.
  -m MODEL_NAME, --model_name MODEL_NAME
                        Name of the model file when saving to disk.
  -t TRAINING_HISTORY_FILE_NAME, --training_history_file_name TRAINING_HISTORY_FILE_NAME
                        Name of the destination file name for storing training history information.
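
The -e and -b options together determine how many weight updates a training run performs. The short sketch below works through that arithmetic for a hypothetical dataset of 10,000 images. It assumes the final, partially filled batch of each epoch is still used rather than dropped, which may not match the training script.

```python
import math


def steps_per_epoch(sample_count: int, batch_size: int) -> int:
    """Number of training iterations needed to see every sample once."""
    return math.ceil(sample_count / batch_size)


def total_steps(sample_count: int, batch_size: int, epochs: int) -> int:
    """Number of training iterations across the whole run."""
    return steps_per_epoch(sample_count, batch_size) * epochs


# Hypothetical dataset size; batch sizes are the values accepted by -b.
for batch_size in (1, 16, 32, 64, 128):
    print(batch_size, steps_per_epoch(10_000, batch_size),
          total_steps(10_000, batch_size, epochs=20))
```

Larger batch sizes mean fewer iterations per epoch, at the cost of more memory per iteration.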

Model Evaluation

Example Jupyter notebooks that evaluate trained models can be found in the notebooks subdirectory.

These notebooks have 'evaluation' in their file names.
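
Because recognition happens character by character, it can be useful to report accuracy at two levels: per character and per whole CAPTCHA. The sketch below shows one way to compute both from predicted and true strings. It is not taken from the notebooks, and the function name and sample strings are hypothetical.

```python
from typing import Sequence, Tuple


def captcha_accuracies(predicted: Sequence[str],
                       actual: Sequence[str]) -> Tuple[float, float]:
    """Return (character-level accuracy, whole-CAPTCHA accuracy)."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")

    total_chars = correct_chars = correct_captchas = 0
    for guess, truth in zip(predicted, actual):
        if len(guess) != len(truth):
            raise ValueError("each prediction must match its label in length")
        matches = sum(g == t for g, t in zip(guess, truth))
        correct_chars += matches
        total_chars += len(truth)
        # A CAPTCHA counts as solved only if every character is correct.
        correct_captchas += matches == len(truth)

    return correct_chars / total_chars, correct_captchas / len(actual)


char_acc, captcha_acc = captcha_accuracies(
    predicted=["AB7K", "X9QZ", "M2PL"],
    actual=["AB7K", "X8QZ", "M2PL"],
)
print(f"character accuracy: {char_acc:.3f}, CAPTCHA accuracy: {captcha_acc:.3f}")
```

Whole-CAPTCHA accuracy is always the stricter figure, since a single wrong character fails the entire image.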


T-Visor/decruehd-framework
