Bingjie YAN edited this page Sep 12, 2022 · 18 revisions

Welcome to the hcaptcha-model-factory wiki!

This project is a 🏗 hCAPTCHA binary classification model factory.

If this project is helpful for you, please leave a ⭐star~!

Introduction and motivation

Image recognition is one of the most common CAPTCHA categories and is offered by many CAPTCHA services, such as hCaptcha and reCAPTCHA. But it can easily be solved with deep learning: collecting and labeling data is the only thing you need to do.

Any image recognition task can, for now, be regarded as a binary classification task. You just need to decide whether to "click" or "not click", "true" or "false".

So this project is a pluggable module for hcaptcha-challenger that can be iterated and updated quickly. When a new challenge appears, training a simple ResNet model for it is enough.

This ResNetMini model is only 295 KB in ONNX format. But I don't know how big the hCaptcha generation model is, haha!
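As a minimal sketch of the "click" or "not click" decision described above: assume a model has already produced one positive-class score per challenge tile. The function name `tiles_to_click`, the 0.5 threshold, and the nine example scores are all hypothetical and not taken from this project.

```python
def tiles_to_click(scores, threshold=0.5):
    """Return indices of tiles whose positive-class score meets the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical scores for a 3x3 challenge grid (one score per tile).
scores = [0.91, 0.08, 0.47, 0.66, 0.02, 0.88, 0.15, 0.30, 0.73]
clicks = tiles_to_click(scores)
print(clicks)
```

Every tile is judged independently, which is why a single binary classifier per challenge prompt can be enough.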

Make AI great again!

Project structure

hcaptcha-model-factory
 ├── data
 │   └── smiling_dog
 │       ├── unlabel (if you use the auto-label tool, place all images here)
 │       ├── bad (place images that do not contain a smiling dog here)
 │       ├── yes (place images that contain a smiling dog here)
 │       ├── all.yaml (auto generated)
 │       ├── train.yaml (auto generated)
 │       ├── val.yaml (auto generated)
 │       └── test.yaml (auto generated)
 ├── LICENSE
 ├── model (after training, your model is stored here)
 │    └── smiling_dog
 │        ├── smiling_dog.pth
 │        ├── smiling_dog_100.pth
 │        └── smiling_dog_200.pth
 ├── README.md
 ├── requirements.txt
 └── src
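
If you want to prepare the data folders for a task by hand, the layout above could be created with a small helper like the one below. This is only a sketch: `scaffold_task` is a hypothetical name, and this page does not say whether `python main.py new` already creates these folders for you.

```python
import tempfile
from pathlib import Path


def scaffold_task(root, task):
    """Create data/<task>/{unlabel,bad,yes} under root and return the task folder."""
    task_dir = Path(root) / "data" / task
    for sub in ("unlabel", "bad", "yes"):
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    return task_dir


# Demo in a temporary directory so nothing in a real checkout is touched.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = scaffold_task(tmp, "smiling_dog")
    created = sorted(p.name for p in task_dir.iterdir())

print(created)
```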

Model

  • ResNetMini
    • size: 295 KB
    • params: 75154 trainable parameters
    • structure: conv - bn - relu - conv - bn - conv - bn - relu
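
To give a feel for where trainable parameters come from in a conv - bn - relu stack, here is a small counting sketch. The channel widths and 3x3 kernels are hypothetical, so the total below does not reproduce the 75154 figure; the real ResNetMini definition lives in the project source.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights (out_ch * in_ch * k * k) plus one optional bias per output channel."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)


def batchnorm_params(channels):
    """Batch norm learns one scale and one shift per channel."""
    return 2 * channels


# conv - bn - relu - conv - bn - conv - bn - relu, with hypothetical widths.
layers = [
    ("conv", conv2d_params(3, 16, 3, bias=False)),
    ("bn", batchnorm_params(16)),
    ("relu", 0),  # activations have no trainable parameters
    ("conv", conv2d_params(16, 16, 3, bias=False)),
    ("bn", batchnorm_params(16)),
    ("conv", conv2d_params(16, 16, 3, bias=False)),
    ("bn", batchnorm_params(16)),
    ("relu", 0),
]
total = sum(count for _, count in layers)
print(total)
```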

Usage

Recommended environment: Python 3.8, PyTorch==1.8.2 [Optional: CUDA>=10.2]

System: Windows/Linux/Mac

(It supports every system on which PyTorch can be installed, but I have only tested it on Windows. Please keep that in mind, and PRs are welcome!)

Preparing

Run the following commands.

git clone https://github.com/beiyuouo/hcaptcha-model-factory.git
cd hcaptcha-model-factory
pip install -r requirements.txt
cd src

Full workflow to add a new challenge

Use this command to start a new challenge, and follow the prompt.

python main.py new
prompt[en] -> Please click each image containing a smiling dog
2022-09-12 19:17:08 | DEBUG - Diagnose task | task_name=smiling_dog
Use AI to automatically label datasets? {'y', 'n'} --> y
please put all the images in the `unlabel` folder and press any key to continue...
2022-09-12 19:17:55 | INFO - Found 1166 images in hcaptcha-model-factory\data\smiling_dog\unlabel
# After auto-labeling, you need to check and correct the labels.
2022-09-12 19:18:20 | INFO - Embeddings extracted
2022-09-12 19:18:20 | INFO - PCA..., shape of embs: (1166, 512)
2022-09-12 19:18:20 | INFO - PCA done, shape of embs: (1166, 128)
2022-09-12 19:18:20 | DEBUG - Clustering...
2022-09-12 19:18:20 | DEBUG - Clustering done
2022-09-12 19:18:20 | INFO - Saving labels...
2022-09-12 19:18:22 | DEBUG - Labels saved
2022-09-12 19:18:22 | SUCCESS - Auto labeling completed
Start automatic training? {'y', 'n'} --> y
# Your model is generated in the model folder.
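
The log above shows the auto-label steps: extract embeddings, reduce them with PCA, then cluster. A rough NumPy sketch of the last two steps might look like this. It is not the project's implementation: `pca_reduce` and `two_means` are hypothetical helpers, the clustering algorithm actually used is not stated on this page, and the toy data is far smaller than the (1166, 512) embeddings in the log.

```python
import numpy as np


def pca_reduce(embs, n_components):
    """Project row vectors onto their top principal components (via SVD)."""
    centered = embs - embs.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def two_means(points, n_iter=20):
    """Tiny 2-cluster k-means; returns a 0/1 label per row."""
    points = np.asarray(points, dtype=float)
    # Deterministic init: the first point and the point farthest from it.
    c0 = points[0]
    c1 = points[np.argmax(np.linalg.norm(points - c0, axis=1))]
    centers = np.stack([c0, c1])
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


# Toy "embeddings": two well-separated blobs of 20 vectors in 16 dimensions.
rng = np.random.default_rng(0)
embs = np.vstack([rng.normal(0.0, 0.1, (20, 16)), rng.normal(5.0, 0.1, (20, 16))])
reduced = pca_reduce(embs, 4)
labels = two_means(reduced)
```

Cluster ids carry no meaning on their own, which is why the labels still need to be checked and corrected by hand afterwards.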

One step usage

Auto Label

Train and Val

PS: main.py is the name of the scaffold file. After v0.1.x, the entry point will be moved to main.py.

# hcaptcha-model-factory/src
python main.py trainval --task=[labelName]

You must use quotation marks when the '--task' argument contains spaces.

python main.py trainval --task dog
python main.py trainval --task=smiling_dog
python main.py trainval --task="cat-shaped cookie"

The " ", "," and "-" characters are automatically replaced with "_".

  • "cat-shaped cookie" => "cat_shaped_cookie".
  • "cat with large, rounded head" => "cat_with_large_rounded_head"
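
The replacement rule above can be sketched with a single regular expression. `normalize_task_name` is a hypothetical helper, not the project's own function; collapsing a run such as ", " into one underscore is inferred from the second example.

```python
import re


def normalize_task_name(task):
    """Replace each run of spaces, commas, and hyphens with a single underscore."""
    return re.sub(r"[ ,\-]+", "_", task.strip())


print(normalize_task_name("cat-shaped cookie"))
print(normalize_task_name("cat with large, rounded head"))
```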

Train

# hcaptcha-model-factory/src
python main.py train --task=[labelName]

Val

# hcaptcha-model-factory/src
python main.py val --task=[labelName]

Test

# hcaptcha-model-factory/src
# --flag is optional and takes one of: "all", "train", "val", "test"
python main.py test_onnx --task=[labelName] --flag=["all" | "train" | "val" | "test"]