High-performance slide-captcha solver powered by FastAPI, Ultralytics YOLO, ONNX Runtime, and PyTorch.
- Robust FastAPI service with DI via Dishka
- Single endpoint to solve slide-style captchas
- GPU acceleration on Linux/Windows (CUDA) when available; optimized CPU paths otherwise
- Production-friendly logging and error handling
- OpenAPI/Scalar docs included
- Python 3.10–3.11
- macOS, Linux, or Windows
- Optional: NVIDIA GPU + CUDA drivers for acceleration on Linux/Windows
- Clone the repository
git clone <repository-url>
cd sigil-solver
- Create a virtual environment and install dependencies (recommended: uv)
pipx install uv || pip install uv
uv venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv sync
- Run the API
- Fastest way (ASGI):
uv run uvicorn sigil.main.api.native:app --host 0.0.0.0 --port 8000
- Alternatively (CLI wrapper):
uv run python -c "from sigil.main.cli.app import run_cli; run_cli()" api --host 0.0.0.0 --port 8000
- Explore docs
- Scalar UI: http://localhost:8000/scalar
- OpenAPI JSON: http://localhost:8000/docs
The app uses pydantic-settings with the SIGIL_ prefix. You can set variables in your environment or a .env file at the project root.
Available keys (nested config uses double underscores):
- SIGIL_SECRET_KEY: Secret key (default: secret_key)
- SIGIL_DEBUG: true/false to enable debug mode (default: false)
- SIGIL_OPENAI__API_KEY: Optional OpenAI API key
- SIGIL_OPENAI__MODEL: Optional OpenAI model (default: o3)
- SIGIL_ANTHROPIC__API_KEY: Optional Anthropic API key
- SIGIL_ANTHROPIC__MODEL: Optional Anthropic model (default: claude-opus-4-1-20250805)
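The double-underscore convention can be illustrated with a small sketch that uses only the standard library. This is not the project's settings code and does not call pydantic-settings; the helper name `nest_env` is hypothetical and only mimics how a prefix plus a `__` delimiter would map flat variables onto nested keys.

```python
from typing import Mapping


def nest_env(environ: Mapping[str, str], prefix: str = "SIGIL_", delimiter: str = "__") -> dict:
    """Group prefixed variables into a nested dict, splitting on the delimiter.

    Hypothetical helper for illustration only; the app itself relies on pydantic-settings.
    """
    nested: dict = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # ignore variables that do not belong to the app
        parts = key[len(prefix):].lower().split(delimiter)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend, creating sub-dicts as needed
        node[parts[-1]] = value
    return nested


example = {
    "SIGIL_DEBUG": "true",
    "SIGIL_OPENAI__API_KEY": "sk-xxxxx",
    "PATH": "/usr/bin",
}
settings = nest_env(example)
print(settings)
```

Passing `os.environ` instead of the sample mapping would show how the variables in the current shell would be grouped under this assumption.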
Example .env:
SIGIL_DEBUG=true
SIGIL_SECRET_KEY=change_me
SIGIL_OPENAI__API_KEY=sk-xxxxx
SIGIL_ANTHROPIC__API_KEY=anthropic-xxxxx
Base URL: http://localhost:8000
- GET / → Service status
  - Response: { "success": true, "msg": "running" }
- GET /health → Health check
  - Response: { "status": "ok" }
- GET /scalar → Interactive API docs (Scalar UI)
- POST /api/v1/captchas/slide → Solve slide captcha
  - Request body fields (JSON):
    - puzzle_image_b64: Base64 data URI or raw base64 string of the puzzle image (optional)
    - puzzle_image_url: URL to the puzzle image (optional)
    - piece_image_b64: (reserved) Base64 of the slider piece (optional)
    - piece_image_url: (reserved) URL of the slider piece (optional)
    - shrink_size: Optional shrink size (default: 340.0)
  - Exactly one of puzzle_image_b64 or puzzle_image_url is required.
  - Response body:
    - status: successful or failed
    - x: float, the estimated x-offset where the piece should slide
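For callers who prefer Python over curl, the request rules above can be sketched with the standard library alone. The function names `build_slide_payload` and `solve_slide` and the 30-second timeout are assumptions made for this example, not part of the project.

```python
import base64
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default address used throughout this README


def build_slide_payload(
    image_bytes: Optional[bytes] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Return a request body containing exactly one puzzle image source."""
    if (image_bytes is None) == (image_url is None):
        raise ValueError("provide exactly one of image_bytes or image_url")
    if image_url is not None:
        return {"puzzle_image_url": image_url}
    # Raw base64 without a data-URI prefix; the endpoint is documented to accept both forms.
    return {"puzzle_image_b64": base64.b64encode(image_bytes).decode("ascii")}


def solve_slide(payload: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """POST the payload to the slide endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/captchas/slide",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_slide_payload(image_url="https://example.com/captcha.jpg")
print(json.dumps(payload))
# With the server running: result = solve_slide(payload)
```

Validating the "exactly one source" rule on the client side gives a clear error before any network call is made; the server remains the authority on what it accepts.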
Using an image URL:
curl -X POST http://localhost:8000/api/v1/captchas/slide \
-H 'Content-Type: application/json' \
-d '{
"puzzle_image_url": "https://example.com/captcha.jpg"
}'
Using base64 (data URI or raw base64):
BASE64=$(base64 -w 0 path/to/captcha.jpg) # macOS: base64 path/to/captcha.jpg | tr -d '\n'
curl -X POST http://localhost:8000/api/v1/captchas/slide \
-H 'Content-Type: application/json' \
-d "{ \"puzzle_image_b64\": \"$BASE64\" }"
Example response:
{
"data": {
"status": "successful",
"x": 132.4
},
"meta": {}
}
- sigil.services.recognizer.RecognizerService loads two ONNX YOLO models from sigil/models/yolo/.
- The service predicts the likely gap location and returns an x-offset.
- CUDA is used automatically if available; otherwise, it falls back to CPU.
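The CUDA-then-CPU fallback described above can be sketched as a pure function. This is an assumption about the general pattern, not the actual logic inside RecognizerService; `pick_providers` is a hypothetical name, while the two provider strings are the names ONNX Runtime uses.

```python
from typing import List, Sequence

# Execution provider names as defined by ONNX Runtime.
CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"


def pick_providers(available: Sequence[str]) -> List[str]:
    """Prefer CUDA when it is reported as available, always keeping CPU as the fallback."""
    providers = [CUDA] if CUDA in available else []
    providers.append(CPU)
    return providers


print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_providers(["CPUExecutionProvider"]))
```

In a real setup the `available` list would typically come from `onnxruntime.get_available_providers()`, and the result would be passed as the `providers` argument of `onnxruntime.InferenceSession`.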
Run the server in reload mode:
uv run uvicorn sigil.main.api.native:app --reload --host 0.0.0.0 --port 8000
Code style and tooling:
- Formatter: black, isort
- Lint: ruff, flake8, mypy
- Pre-commit hooks are available in the dev group
- Torch/ONNX install issues:
  - macOS arm64 uses onnxruntime-silicon automatically; ensure Python 3.10–3.11.
  - Linux/Windows with NVIDIA GPU: ensure the CUDA toolkit and drivers are compatible with PyTorch 2.2.2.
- Large model memory usage:
  - Set ENVIRONMENT=production to avoid verbose inference output on stderr.
  - Reduce imgsz or confidence thresholds in RecognizerService._predict if customizing.
- HTTP 400 when downloading images:
  - Ensure the URL is reachable and returns an image/* content type.
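To debug the HTTP 400 case, it can help to check the image URL independently of the solver. The sketch below is a minimal check using only the standard library; `is_image_content_type` and `check_image_url` are hypothetical helpers, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request


def is_image_content_type(content_type: str) -> bool:
    """Return True when a Content-Type header value names an image/* media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")


def check_image_url(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the response looks like an image."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_image_content_type(response.headers.get("Content-Type", ""))


print(is_image_content_type("image/jpeg"))
print(is_image_content_type("text/html; charset=utf-8"))
```

Some servers reject HEAD requests, so a failure from `check_image_url` does not by itself prove that the solver's own download would fail.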
This project is licensed under the terms of the MIT License. See LICENSE for details.