This repository contains code for solving text-based CAPTCHAs with machine learning. The approach achieves a higher success rate than other approaches while requiring significantly fewer real CAPTCHAs, because it uses a synthetic CAPTCHA generator. For security reasons, only the code is published, without the dataset; it can run independently on your own data. Note that it is not production-ready. If you encounter any problems, please file an issue on GitHub.
This work was carried out with the support of, and at the facilities of, the Laboratory of Theoretical and Interdisciplinary Problems of Informatics of the Federal State Budgetary Institution of Science "St. Petersburg Federal Research Center of the Russian Academy of Sciences" (St. Petersburg FRC RAS). Official website: https://dscs.pro/
This solver can recognize a range of CAPTCHAs. You can find some trained models in app/models.
pip install -r requirement.txt
git clone https://github.com/Alexander-Zadorozhnyy/CGAN_CAPTCHA_SOLVER.git
cd CGAN_CAPTCHA_SOLVER
conda create -n "CGAN_CAPTCHA_SOLVER" python=3.9.12
conda activate CGAN_CAPTCHA_SOLVER
pip install -r requirement.txt
If your CAPTCHA dataset contains many different styles, you can group them with the clustering algorithm:
python -m src.Clustering.clustering --dataset path_to_dataset
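As a rough illustration of what grouping CAPTCHAs by style can look like, here is a minimal k-means sketch in plain Python. It is not the algorithm used in src.Clustering, which is not described here; the two-dimensional feature vectors (for example, mean brightness and stroke density) and the function name `kmeans` are assumptions made for the example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group feature vectors into k clusters with plain k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical per-image features: (mean brightness, stroke density).
features = [(0.10, 0.20), (0.12, 0.18), (0.90, 0.85), (0.88, 0.90)]
labels = kmeans(features, k=2)
print(labels)  # the first two images share one label, the last two the other
```

Each resulting group could then be treated as a separate style when training a generator.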
If you have only a small amount of original data, you can train the GAN and generate synthetic CAPTCHAs:
python -m src.GAN.train --dataset_folder --symbols --model_name --saved_model_name
python -m src.GAN.create_dataset --dataset_folder --count
Then train the CNN solver on the generated and original data:
python -m src.CNN.train --gen_data --num_gen_train --num_gen_test --saved_model_name --orig_data --num_orig_train --num_orig_test --model_name
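The flag names suggest that the training and test sets each combine a chosen number of generated and original samples. The sketch below shows one way such a mix could be assembled; it is an assumption about what the flags mean, not the code in src.CNN.train. The helper `split_samples`, the file names, and the counts are all hypothetical.

```python
import random


def split_samples(files, num_train, num_test, seed=0):
    """Pick disjoint train and test subsets from a list of file names."""
    if num_train + num_test > len(files):
        raise ValueError("not enough files for the requested split")
    picked = random.Random(seed).sample(files, num_train + num_test)
    return picked[:num_train], picked[num_train:]


# Hypothetical file lists standing in for the generated and original datasets.
gen_files = [f"gen_{i}.png" for i in range(100)]
orig_files = [f"orig_{i}.png" for i in range(20)]

gen_train, gen_test = split_samples(gen_files, num_train=80, num_test=20)
orig_train, orig_test = split_samples(orig_files, num_train=15, num_test=5)

# Synthetic data supplies most of the volume; real data anchors the model.
train_set = gen_train + orig_train
test_set = gen_test + orig_test
print(len(train_set), len(test_set))  # 95 25
```

Sampling train and test from a single draw keeps the two subsets disjoint, so no image is evaluated after being trained on.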
Synthetic CAPTCHA
Time: 385 seconds to solve 5000 CAPTCHAs
Accuracy: ~99%
Real CAPTCHA
Time: 8 seconds to solve 100 CAPTCHAs
Accuracy: ~65%
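To compare the two timings on the same scale, the figures above can be converted to time per CAPTCHA. The helper name `per_captcha_seconds` is introduced only for this example.

```python
def per_captcha_seconds(total_seconds, num_captchas):
    """Average solving time for a single CAPTCHA."""
    return total_seconds / num_captchas


synthetic = per_captcha_seconds(385, 5000)  # synthetic CAPTCHA run
real = per_captcha_seconds(8, 100)          # real CAPTCHA run

print(f"synthetic: {synthetic * 1000:.0f} ms per CAPTCHA")  # synthetic: 77 ms per CAPTCHA
print(f"real: {real * 1000:.0f} ms per CAPTCHA")            # real: 80 ms per CAPTCHA
```

Both runs come out at roughly 80 ms per CAPTCHA, so the difference between them is in accuracy rather than speed.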
You can find more details about this solver in the docs directory:
- docs/report.pdf - educational practice report (in Russian)
The source code of this repository is released under the Apache-2.0 license.