CAPTCHAs_solver_CRNN

(image: intro)

A deep learning model that solves CAPTCHA codes with perfect accuracy using a CRNN (Convolutional Recurrent Neural Network).

Description

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. CAPTCHAs are tools used to differentiate between real users and automated ones, such as bots, by presenting challenges that are difficult for computers but relatively easy for humans.

However, these wavy, distorted text images are sometimes difficult even for humans to decipher.

Therefore, it would be great to have a highly accurate machine learning model that reveals the correct text every time without fail.

Requirement

  • tensorflow 2.0+
  • scikit-learn
  • opencv-python
  • editdistance

Dataset

The dataset is generated by the most popular WordPress CAPTCHA plugin, with nearly 8 million downloads (https://wordpress.org/plugins/really-simple-captcha/).

(image: dataset sample)

It generates 9,955 images of 4-letter CAPTCHAs using a random mix of four different fonts.
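Assuming each image file is named after its 4-letter answer (a common convention for shared versions of this dataset, e.g. a file named 2b8n.png contains the text "2b8n"; the paths below are hypothetical), the labels can be read directly from the filenames:

```python
import os

def label_from_filename(path):
    # Assumes each image is named after its 4-letter answer,
    # e.g. "samples/2b8n.png" -> "2b8n" (hypothetical path)
    return os.path.splitext(os.path.basename(path))[0]

print(label_from_filename("samples/2b8n.png"))  # -> 2b8n
```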

Architecture

Ideally, we want to detect text from a text image:

(image: architecture)

However, character segmentation is not practical because:

  • Too time-consuming
  • Too expensive
  • Impossible in most cases

For example, the character segmentation above is fine, but the one below is challenging. In fact, the traditional method runs into trouble whenever two or more characters are too close to each other, like this:

This project uses the state-of-the-art CRNN model, a combination of CNN, RNN, and CTC loss for image-based sequence recognition tasks, especially OCR (Optical Character Recognition), which makes it a perfect fit for CAPTCHAs.

(image: architecture)

This approach is far superior to the traditional one because it does not require any bounding-box detection for individual characters (character segmentation).

In this model, the image is dissected into a fixed number of timesteps in the RNN layers. As long as each character is split across two or three of these slices, to be processed and decoded later, the spacing between characters is irrelevant, like so:

(image: architecture)

Here are more details of the CRNN architecture:

(image: architecture)

As the diagram shows, the last CNN layer produces a feature map of shape 4×8×4. We then flatten the first and third dimensions into 16 and keep the second dimension unchanged, producing a 16×8 tensor. This effectively cuts the original image into 8 vertical slices (red lines), each containing 16 feature values. Since the CNN output gives us 8 slices to process, we also choose 8 as the number of timesteps in the LSTM layer. After stacked LSTM layers with a softmax (SM) activation function, CTC loss optimizes the resulting probability table.
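The shape bookkeeping above can be sketched in Keras. The 32×64 input size, filter counts, alphabet size, and LSTM width here are illustrative assumptions, not the notebook's exact values; three 2×2 poolings shrink the input to the 4×8×4 feature map, which is then rearranged into 8 timesteps of 16 features, and CTC loss would be attached on top of the softmax output:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_crnn(num_classes=36):  # 36 = a-z + 0-9, an assumption
    inp = layers.Input(shape=(32, 64, 1))
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D(2)(x)                      # -> 16 x 32
    x = layers.Conv2D(8, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                      # -> 8 x 16
    x = layers.Conv2D(4, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)                      # -> 4 x 8 x 4, as in the diagram
    # Keep the width (8) as timesteps; merge height and channels into 16 features
    x = layers.Permute((2, 1, 3))(x)                   # -> (8, 4, 4)
    x = layers.Reshape((8, 16))(x)                     # -> 8 timesteps x 16 features
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    # +1 output class for the CTC blank symbol
    return tf.keras.Model(inp, layers.Dense(num_classes + 1, activation="softmax")(x))

model = build_crnn()
print(model.output_shape)  # (None, 8, 37)
```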

More details of the implementation can be found in the Jupyter notebook in this repository.

Result

We need the right evaluation metrics for an OCR task, computed with the edit distance library.

This is inspired by https://github.com/arthurflor23/handwritten-text-recognition/blob/master/src/data/evaluation.py

This helps calculate three evaluation metrics for any OCR task:

  • CER (Character Error Rate)
  • WER (Word Error Rate)
  • SER (Sequence Error Rate)
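Here is a minimal self-contained sketch of these metrics, using a pure-Python Levenshtein distance standing in for the editdistance package; since each CAPTCHA is a single 4-letter word, WER coincides with SER in this setting:

```python
def edit_distance(a, b):
    # Minimal Levenshtein distance (the editdistance package is a faster drop-in)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(pred, truth):
    # Character Error Rate: character edits per ground-truth character
    return edit_distance(pred, truth) / max(len(truth), 1)

def ser(preds, truths):
    # Sequence Error Rate: fraction of predictions that are not exact matches
    # (for single-word CAPTCHAs, WER gives the same number)
    return sum(p != t for p, t in zip(preds, truths)) / len(truths)

print(cer("8n5p", "8n5d"))                      # 0.25
print(ser(["8n5p", "2b8n"], ["8n5d", "2b8n"]))  # 0.5
```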

Here is my result for a test set:

(image: result)

This is an easy dataset, so I got an absolutely perfect score on the 200 images of the test set! Not even a challenge for the CRNN:

  • Character Error Rate: 0.0
  • Word Error Rate: 0.0
  • Sequence Error Rate: 0.0

Afterthoughts:

  • CRNN + CTC is not that challenging; just make sure to follow the above process step by step, as in the notebook.
  • Keeping the height and width a power of 2 (or at least an even number) makes it much easier to halve them repeatedly (this is not critical, since it only concerns model design and preprocessing).
  • The bidirectional LSTM width should be larger than the number of timesteps, since half the biLSTM width is the hidden size of each individual LSTM.
  • The max label length should equal the number of timesteps, although some people report that setting it slightly lower helps. You should stick with the basics, though!
  • The data is super clean and all images share the same dimensions. For other datasets, a bit of noise cleaning and binarization may help!
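To make the relationship between timesteps and max label length concrete, here is a toy CTC loss call using Keras's built-in helper. The shapes follow the architecture above (8 timesteps, 37 classes including the blank, label length 4); the probability table and labels are dummies:

```python
import tensorflow as tf

batch, timesteps, num_classes, label_len = 2, 8, 37, 4
# Random probability table standing in for the CRNN's softmax output
y_pred = tf.nn.softmax(tf.random.uniform((batch, timesteps, num_classes)))
labels = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]])  # dummy 4-letter labels
input_length = tf.fill((batch, 1), timesteps)       # all 8 timesteps are valid
label_length = tf.fill((batch, 1), label_len)       # every label is 4 long
loss = tf.keras.backend.ctc_batch_cost(labels, y_pred, input_length, label_length)
print(loss.shape)  # one loss value per sample: (2, 1)
```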

Things to improve for other datasets:

  • Resize-image logic for multiple image sizes (perhaps as follows):
    • find min, max of height and width
    • resize to a fixed height you want
    • calculate the max width of all resized images
    • padding to all images to that max width
  • Combine the preprocessing logic for the train set and test set
  • Convert them to a tf.data pipeline (note that this is challenging since OpenCV won't work with tensors)

License

This project is licensed under the MIT License - see the LICENSE.md file for details

🏆 Author
