---
title: Nepali handwritten digits
sdk: docker
---
Train with `train.py`, extract crops with `extract-digits.py`, and run multi-image inference with `predict_multi.py`.
Training options (environment variables):
- `TRAIN_EPOCHS` — max epochs (default `120`; early stopping usually ends training earlier).
- `EARLY_STOP_PATIENCE` — patience for early stopping (default `15`).
- `TRAIN_AUGMENT` — set to `0` to disable augmentation.
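A minimal sketch of how these variables might be read in `train.py` (only the names and defaults come from this README; the parsing itself is an assumption):

```python
import os

# Read training options from the environment, falling back to the
# documented defaults. Any non-"0" value for TRAIN_AUGMENT keeps
# augmentation enabled (assumed behavior).
epochs = int(os.environ.get("TRAIN_EPOCHS", "120"))
patience = int(os.environ.get("EARLY_STOP_PATIENCE", "15"))
augment = os.environ.get("TRAIN_AUGMENT", "1") != "0"

print(epochs, patience, augment)  # 120 15 True (with no overrides set)
```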
If the validation split has fewer than 8 images, it is merged back into training and callbacks monitor training loss instead (the test set stays separate), so the “best epoch” does not get stuck at epoch 1.
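The small-validation fallback described above might look like this in training code (a sketch assuming splits are plain lists; only the threshold of 8 and the loss-monitoring behavior come from this README):

```python
MIN_VAL_IMAGES = 8  # below this, validation is folded back into training

def resolve_splits(train, val):
    """Return (train, val, monitor) per the small-validation fallback."""
    if len(val) < MIN_VAL_IMAGES:
        # Too few images for a meaningful validation curve: merge them into
        # training and let early stopping / checkpointing watch "loss".
        return train + val, [], "loss"
    return train, val, "val_loss"

train, val, monitor = resolve_splits(list(range(40)), list(range(5)))
print(len(train), len(val), monitor)  # 45 0 loss
```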
```bash
uv sync  # or: pip install -r requirements.txt
uv run uvicorn app:app --reload --host 0.0.0.0 --port 8000
```

Open http://127.0.0.1:8000/ for the upload UI, or POST to `/api/predict` with form field `file`.
Environment variables:
- `MODEL_PATH` — Keras model (default: `best_nepali_digit_model.keras`)
- `DATASET_DIR` — folder with `0`…`9` subfolders for embedding decode (default: `dataset`)
- `CORS_ORIGINS` — comma-separated origins (default: `*`)
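For reference, this is what the multipart body a client POSTs to `/api/predict` looks like — a stdlib-only sketch (the endpoint and the `file` field name come from this README; the filename, boundary handling, and payload are illustrative):

```python
import io
import mimetypes
import uuid

def multipart_body(field, filename, payload):
    """Build a multipart/form-data body with a single file field."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body.write(f"Content-Type: {ctype}\r\n\r\n".encode())
    body.write(payload)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, body.getvalue()

boundary, data = multipart_body("file", "digit.png", b"<png bytes>")
print(b'name="file"' in data, b"image/png" in data)  # True True
```

With the server running, the same request is a one-liner: `curl -F "file=@digit.png" http://127.0.0.1:8000/api/predict`.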
- Create a Space → Docker (blank).
- Push this repo (include `Dockerfile`, `app.py`, `static/index.html`, `requirements.txt`, the `.keras` model, and `dataset/` if you use embedding decode).
- The container listens on port 7860 (required for Spaces).
- Optional: add a thumbnail and description in the Space settings.
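A minimal `Dockerfile` consistent with the checklist above might look like this (the base image, file layout, and start command are assumptions; only port 7860 is mandated by Spaces):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Spaces route traffic to port 7860.
EXPOSE 7860
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```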
Large files: use Git LFS for `.keras` weights if the push fails.
The Space shows the UI at the Space URL root; the API remains at `/api/predict`.
Nothing is “wrong” with the JSON — the three heads measure different things:
- Domain shift — The CNN was trained mostly on worksheet crops from `extract-digits.py` (scanned paper, line removal, fixed cells). Canvas / phone photos have different stroke weight, contrast, and padding. The model was not trained on that distribution, so softmax can confidently pick the wrong class (e.g. all 7).
- Layout `grid-equal` — Splits the image into three equal vertical strips. That only matches real data when you drew three digits in three columns. Freehand input or one wide digit is cut wrong, so each “box” is not a real character and predictions become meaningless.
- Tiny data — Dozens of images is not enough for stable generalization. Embedding nearest-neighbor picks the closest training patch in feature space (often class 3 if those samples are nearest), not “the true” digit for a new style.
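The embedding head's failure mode is easy to see in a sketch of nearest-neighbor decoding (pure-Python cosine similarity; the bank of labeled feature vectors uses toy values, not real model embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nn_label(query, bank):
    """Label of the training embedding closest to `query`."""
    return max(bank, key=lambda item: cosine(query, item[1]))[0]

# Toy bank: with only a handful of samples per class, whichever class
# happens to sit nearest in feature space wins, regardless of the
# "true" digit in the new image.
bank = [(3, [0.9, 0.1]), (7, [0.1, 0.9]), (3, [0.8, 0.3])]
print(nn_label([0.7, 0.4], bank))  # 3
```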
What to do
- For drawings, use layout `blobs` (or `grid` if you have clear gaps), not `grid-equal`, unless you really drew three equal columns.
- Add real examples of how users write (canvas or camera) into `dataset/` — use `add_real_samples.py` with correct labels — then retrain (`TRAIN_AUGMENT=1` is on by default for light augmentation).
- Treat low agreement between softmax and embedding as “uncertain”; don’t trust a single head on out-of-domain input.
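The last point can be made concrete with a small agreement check (the threshold and the exact notion of “uncertain” are illustrative, not taken from the app's code):

```python
def verdict(softmax_label, softmax_conf, embed_label, min_conf=0.8):
    """Trust a prediction only when both heads agree and softmax is confident."""
    if softmax_label == embed_label and softmax_conf >= min_conf:
        return softmax_label
    return "uncertain"

print(verdict(7, 0.99, 7))  # heads agree, confident -> 7
print(verdict(7, 0.99, 3))  # heads disagree -> uncertain
print(verdict(7, 0.55, 7))  # agree but low confidence -> uncertain
```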