broisnischal/number


---
title: Nepali handwritten digits
sdk: docker
---

Nepali handwritten digits

Train with train.py, extract digit crops with extract-digits.py, and run multi-image inference with predict_multi.py.

Training options (environment variables):

  • TRAIN_EPOCHS — max epochs (default 120; early stopping usually stops earlier).
  • EARLY_STOP_PATIENCE — patience for early stopping (default 15).
  • TRAIN_AUGMENT — set to 0 to disable augmentation.

If the validation split has fewer than 8 images, it is merged into training and callbacks monitor training loss instead (the test set stays separate), so the “best epoch” does not get stuck at epoch 1.
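A minimal sketch of that fallback, assuming the threshold from the text; the function name is invented and the exact logic lives in train.py:

```python
# Hypothetical sketch of the small-validation fallback described above.
# The threshold (8 images) comes from the README; everything else is assumed.
MIN_VAL_IMAGES = 8

def pick_monitor(n_val: int) -> dict:
    """Choose what Keras-style callbacks (early stopping, checkpoint) monitor."""
    if n_val < MIN_VAL_IMAGES:
        # Validation merged into training: monitor training loss instead,
        # so "best epoch" tracking still moves past epoch 1.
        return {"use_val": False, "monitor": "loss"}
    return {"use_val": True, "monitor": "val_loss"}

print(pick_monitor(5))    # validation too small: fall back to training loss
print(pick_monitor(100))  # normal case: monitor validation loss
```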

Web API (FastAPI)

uv sync   # or: pip install -r requirements.txt
uv run uvicorn app:app --reload --host 0.0.0.0 --port 8000

Open http://127.0.0.1:8000/ for the upload UI, or POST to /api/predict with form field file.
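A stdlib-only client sketch for that endpoint. The URL and the form field name file come from above; the helper names and the placeholder filename are assumptions:

```python
# Hypothetical client for POST /api/predict; assumes the server started
# above is listening on port 8000. build_multipart/predict names are invented.
import urllib.request
import uuid

def build_multipart(field: str, filename: str, payload: bytes):
    """Build a multipart/form-data body with a single file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"

def predict(image_path: str, url: str = "http://127.0.0.1:8000/api/predict") -> bytes:
    """POST an image to the running server and return the raw response bytes."""
    with open(image_path, "rb") as f:
        body, ctype = build_multipart("file", "upload.png", f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Equivalent to posting the file from the browser UI; only the field name file must match.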

Environment variables:

  • MODEL_PATH — Keras model (default: best_nepali_digit_model.keras)
  • DATASET_DIR — folder with 0–9 subfolders for embedding decode (default: dataset)
  • CORS_ORIGINS — comma-separated origins (default: *)
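A sketch of how the server might read these settings (the variable names and defaults come from the list above; the parsing itself is an assumption, not app.py verbatim):

```python
# Assumed env handling; names and defaults mirror the README's list.
import os

MODEL_PATH = os.environ.get("MODEL_PATH", "best_nepali_digit_model.keras")
DATASET_DIR = os.environ.get("DATASET_DIR", "dataset")
# Comma-separated list, e.g. CORS_ORIGINS="https://a.com,https://b.com"
CORS_ORIGINS = [o.strip() for o in os.environ.get("CORS_ORIGINS", "*").split(",")]

print(MODEL_PATH, DATASET_DIR, CORS_ORIGINS)
```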

Deploy on Hugging Face Spaces (Docker)

  1. Create a Space → Docker (blank).
  2. Push this repo (include Dockerfile, app.py, static/index.html, requirements.txt, model .keras, and dataset/ if you use embedding decode).
  3. The container listens on port 7860 (required for Spaces).
  4. Optional: add a thumbnail and description in the Space settings.

Large files: use Git LFS for .keras weights if the push fails.

The Space serves the upload UI at its root URL; the API remains at /api/predict.

Why softmax, embedding, and pixel NN disagree (e.g. 777 vs 373 vs 070)

Nothing is “wrong” with the JSON — the three heads measure different things:

  1. Domain shift — The CNN was trained mostly on worksheet crops from extract-digits.py (scanned paper, line removal, fixed cells). Canvas drawings and phone photos have different stroke weight, contrast, and padding. The model was not trained on that distribution, so softmax can confidently pick the wrong class (e.g. all 7s).

  2. Grid-equal layout — Splits the image into three equal vertical strips. That only matches the real data when you drew three digits in three columns. Freehand writing or one wide digit gets cut at the wrong places, so each “box” is not a real character and its prediction is meaningless.

  3. Tiny data — Dozens of images are not enough for stable generalization. The embedding nearest-neighbor head picks the closest training patch in feature space (often class 3 if those samples happen to be nearest), not necessarily the true digit for a new writing style.
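The grid-equal cut described above can be sketched in pure Python, treating the image as a list of equal-length pixel rows (the function name is invented):

```python
def grid_equal_split(img, n=3):
    """Cut an image (list of equal-length pixel rows) into n equal-width
    vertical strips, ignoring where the actual strokes are."""
    w = len(img[0])
    cuts = [i * w // n for i in range(n + 1)]
    return [[row[cuts[i]:cuts[i + 1]] for row in img] for i in range(n)]

# A 2 x 9 image becomes three 2 x 3 strips whether or not the digits line up
# with those boundaries -- which is exactly the failure mode described above.
strips = grid_equal_split([[0] * 9, [1] * 9])
print([len(s[0]) for s in strips])  # → [3, 3, 3]
```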

What to do

  • For drawings, use layout blobs (or grid if you have clear gaps), not grid-equal, unless you really drew three equal columns.
  • Add real examples of how users write (canvas or camera) into dataset/ — use add_real_samples.py with correct labels — then retrain (TRAIN_AUGMENT=1 is on by default for light augmentation).
  • Treat low agreement between softmax and embedding as “uncertain”; don’t trust a single head on out-of-domain input.
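The “low agreement means uncertain” rule can be sketched as a per-digit comparison between two heads (the function name is invented; the strings reuse the 777 / 373 example from the heading above):

```python
def fuse_heads(softmax: str, embedding: str):
    """Keep a digit only where both heads agree; None marks 'uncertain'."""
    return [s if s == e else None for s, e in zip(softmax, embedding)]

# The README's example: only the middle digit is trustworthy.
print(fuse_heads("777", "373"))  # → [None, '7', None]
```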
