FlashDet

Ultra-lightweight real-time object detection with LoRA fine-tuning, Knowledge Distillation, and a full desktop UI
Train, distill, fine-tune, monitor, test, export, and quantize — all from a single application.

What is this?

FlashDet is an end-to-end object detection system based on the NanoDet-Plus architecture with a ShuffleNetV2 backbone. It ships with a modern PyQt5 desktop application that covers the entire workflow — from dataset conversion to INT8 quantized deployment — without writing a single line of code. It also integrates advanced training techniques inspired by torchtune: LoRA/QLoRA for parameter-efficient fine-tuning and Knowledge Distillation for training compact student models from larger teachers.

Key Features

Ultra-Lightweight — 0.49M to 2.44M parameters (0.9 – 4.7 MB FP16)
Real-Time — 100+ FPS on GPU, 30+ FPS on edge devices
LoRA / QLoRA Fine-Tuning — Freeze the backbone, train only low-rank adapters; QLoRA adds INT8/NF4 quantization of frozen weights for even lower memory
Knowledge Distillation — Train a small student model from a larger teacher using logit + feature distillation with configurable temperature and loss weights
Full Desktop UI — Sidebar navigation, live charts, visual comparison tools — including LoRA/KD configuration panels
End-to-End — Data conversion → Training → Dashboard → Inference → Export → Quantization
Multiple Formats — Import from YOLO / VOC / COCO, export to ONNX
Advanced Quantization — Static, Dynamic, QAT, Hessian-based with side-by-side visual comparison
Mixed Precision & Multi-GPU — AMP (FP16) training with DataParallel support
COCO Pretrained Weights — Auto-download official pretrained checkpoints for transfer learning

Screenshots

Catppuccin Mocha dark theme — deep navy base with warm pastel accents.

Annotation	Data Conversion

Full-featured image annotation tool with bounding box, four-point, and polygon drawing. Zoom, pan, brightness/contrast, crosshair, undo, and COCO JSON export.	Convert YOLO, VOC, or custom formats to COCO with a validation split slider and class name editor.

Training	LoRA Fine-tuning

Configure device, model size, hyperparameters, dataset paths, and resume from checkpoint.	Dedicated LoRA/QLoRA dashboard with variant selection (Standard, DoRA, LoRA+, AdaLoRA, OrthoLoRA, LoRA-FA), rank/alpha/dropout controls, target module selection, and memory estimates.

Knowledge Distillation	Real-Time Dashboard

Teacher-student distillation with configurable temperature, logit/feature loss weights, teacher checkpoint selection, and student architecture options.	Live loss charts (Total, QFL, BBox, DFL), epoch/iteration views, learning rate tracking, and detection preview.

Inference	Model Export

Test on images, folders, videos, or live camera. Supports both `.pth` and `.onnx` models. Adjustable confidence and NMS thresholds with detection summary.	Export to ONNX with optional graph simplification and dynamic batch size. View exported model metadata.

Quantization + Visual Comparison

Static / Dynamic / QAT / Hessian quantization with algorithm selection, calibration settings, comparison charts, and visual side-by-side inference.

Quick Start

Option 1: Pre-built Executable

Download the pre-built binary for your platform — no Python required.

Windows

Download FlashDet_Setup.exe from Releases
Run the installer
Launch from the Start Menu

Linux

tar -xzf FlashDet-linux.tar.gz
cd FlashDet && ./FlashDet

Option 2: Run from Source (Recommended)

# Clone
git clone https://github.com/GauravGoswami/NanoDet-Plus-Lite.git FlashDet
cd FlashDet

# One-command setup — creates venv, installs correct PyTorch + all deps
bash setup_env.sh              # auto-detects GPU
# bash setup_env.sh --cpu      # force CPU-only
# bash setup_env.sh --cuda 12.4  # force specific CUDA version

# Activate the environment
source venv/bin/activate

# Launch the UI
python ui/main.py          # or: ./run_ui.sh

Manual setup (if you prefer)

python3 -m venv venv && source venv/bin/activate

# Install PyTorch (pick ONE):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu      # CPU
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118     # CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124     # CUDA 12.x

# Install project dependencies
pip install -r requirements.txt

Option 3: Command Line

# Train on the included demo dataset
python train.py --epochs 50 --batch-size 16 --device cuda

# Train on a custom dataset (0.5x ultra-lite model)
python train.py \
  --model-size m-0.5x \
  --input-size 320 \
  --epochs 100 \
  --batch-size 16 \
  --save-dir workspace/my_project \
  --device cuda \
  --class-file classes/my_classes.txt \
  --train-images data/my_dataset/train \
  --val-images data/my_dataset/valid

# LoRA fine-tuning (parameter-efficient)
python train.py --lora --lora-rank 8 --epochs 50 --device cuda

# Knowledge Distillation (teacher → student)
python train_kd.py \
  --teacher-checkpoint workspace/teacher/best.pth \
  --teacher-size m-1.5x \
  --model-size m-0.5x \
  --kd-temperature 4.0 \
  --epochs 100 --device cuda

# Inference
python test.py --model workspace/my_project/checkpoint_best.pth --image photo.jpg

# Export to ONNX
python scripts/convert_pth_to_onnx.py \
  --checkpoint workspace/my_project/checkpoint_best.pth \
  --output model.onnx --simplify

# INT8 quantization
python scripts/fp16_to_int8_quantize.py --model model.onnx --output model_int8.onnx

Model Variants

Model	Backbone	FPN	Params	FP16 Size	INT8 Size	Input
m-0.5x	ShuffleNetV2 0.5x	96	0.49M	~1.2 MB	~0.6 MB	320
m (default)	ShuffleNetV2 1.0x	96	1.17M	~2.6 MB	~1.3 MB	320 / 416
m-1.5x	ShuffleNetV2 1.5x	128	2.44M	~5.2 MB	~2.6 MB	320 / 416

COCO val2017 Benchmarks (official NanoDet-Plus numbers)

Model	Input	mAP	CPU (ms)	GPU (ms)	GFLOPs
m	320	27.0	11.97	5.25	0.9
m	416	30.4	19.77	8.32	1.52
m-1.5x	320	29.9	15.90	7.21	1.75
m-1.5x	416	34.1	25.49	11.50	2.97

Architecture

Input (320x320x3)
       |
  ShuffleNetV2 Backbone (0.5x / 1.0x / 1.5x)
   C2 ──── C3 ──── C4
       |
  GhostPAN Neck (top-down + bottom-up FPN)
   P3 ──── P4 ──── P5
       |
  FlashDet detection head (based on NanoDet-Plus; per-level classification + regression)
   QFL + GIoU + DFL losses
       |
  NMS + Post-processing
       |
  Detections [x1, y1, x2, y2, score, class]

UI Features at a Glance

Tab	What it does
Annotation	Full-featured image annotation tool. Bounding box, four-point quadrilateral, and polygon drawing. Zoom/pan, brightness/contrast, crosshair, undo, and COCO JSON export.
Data Conversion	Convert YOLO / VOC / custom formats to COCO. Validation split slider, class name editor, symlink option, annotation viewer.
Training	Configure device (CPU/GPU/multi-GPU), model size (0.5x/1.0x/1.5x), hyperparameters (LR, batch, warmup, patience, grad accum), AMP, resume from checkpoint, COCO pretrained init.
LoRA Fine-tune	Dedicated LoRA/QLoRA dashboard. Select variant (Standard, DoRA, LoRA+, AdaLoRA, OrthoLoRA, LoRA-FA), configure rank/alpha/dropout, choose target modules, select base checkpoint, view memory estimates.
Distillation	Knowledge Distillation dashboard. Select teacher checkpoint and size, configure temperature, logit/feature loss weights, hard loss weight. Supports LoRA on student and activation checkpointing.
Dashboard	Live loss charts (Total, QFL, BBox, DFL) updated per batch. Epoch vs iteration views, LR tracking, live detection preview from workspace. Auto-discovers experiments.
Inference	Load `.pth` or `.onnx` model. Test on single images, folders, video files, or live webcam. Confidence/NMS sliders, detection summary table, FPS counter, zoom/pan support.
Export Model	Export to ONNX with optional graph simplification, dynamic batch size, and input size selection. View exported model metadata.
Quantization	Static, Dynamic, QAT, and Hessian-based quantization. Multiple algorithms per type (MinMax, Histogram, Entropy, MSE, Per-Channel, Fisher, etc.). Comparison charts (size, speed, accuracy). Visual Comparison tool runs both original and quantized models side-by-side on the same image.

Supported Dataset Formats

Format	Description
YOLO	`.txt` label files with `class cx cy w h` (normalised 0–1)
Pascal VOC	XML annotations with bounding boxes
COCO	JSON annotations (native format — no conversion needed)

The UI converts YOLO and VOC to COCO format automatically.

Quantization Options

Type	Algorithms	Size Reduction	Speed-up	Accuracy Loss
Static	MinMax, Histogram, Entropy, MSE, Per-Channel	~4x	3–4x	1–2%
Dynamic	PyTorch Dynamic, ONNX Runtime Dynamic	~2–4x	2–3x	1–2%
QAT	Default QAT, Per-Channel QAT	~4x	3–4x	< 1%
Hessian-based	Layer Sensitivity, Fisher Information	~4x	3–4x	< 1%

The Quantization tab includes a Visual Comparison tool that runs both the original and quantized model on the same images side-by-side, showing match rate, IoU, confidence changes, and latency.

Export & Deployment Targets

Target	Format	Notes
Edge (Raspberry Pi, Jetson)	ONNX INT8	Use static quantization with calibration data
Mobile (Android)	ONNX / NCNN	Convert ONNX to NCNN for best performance
Server	ONNX Runtime / TensorRT	FP16 or INT8 with dynamic batching
Web	ONNX.js / TF.js	Convert ONNX to TF.js for browser inference
Intel	OpenVINO	Direct ONNX import

Project Structure

FlashDet/
├── config/                     # Model, data, training configuration
├── data/demo/                  # Included demo dataset (64 train + 16 valid)
├── docs/screenshots/           # UI screenshots for README
├── scripts/
│   ├── prepare_data.py         # YOLO → COCO conversion
│   ├── convert_pth_to_onnx.py  # Export to ONNX
│   ├── fp16_to_int8_quantize.py# INT8 quantization
│   ├── build_executable.py     # PyInstaller packaging
│   ├── take_screenshots.py     # Capture UI screenshots
│   ├── build_linux.sh          # Linux build script
│   └── build_windows.bat       # Windows build script
├── src/
│   ├── models/
│   │   ├── backbone/           # ShuffleNetV2 (0.5x / 1.0x / 1.5x)
│   │   ├── neck/               # GhostPAN feature pyramid
│   │   ├── head/               # FlashDet detection head + aux head
│   │   ├── assignment/         # Dynamic soft label assigner
│   │   ├── lora.py             # LoRA / QLoRA implementation
│   │   └── detector.py         # FlashDet top-level module
│   ├── losses/
│   │   ├── focal_loss.py       # Quality Focal Loss (QFL), Distribution Focal Loss (DFL)
│   │   ├── iou_loss.py         # GIoU Loss
│   │   └── kd_loss.py          # Knowledge Distillation losses (logit + feature)
│   ├── data/                   # Dataset, dataloader, transforms, prepare
│   └── utils/                  # Visualization, metrics, checkpoint, box ops
├── ui/
│   ├── main.py                 # App entry point with sidebar navigation
│   ├── styles.py               # Shared colour palette and widget styles
│   ├── helpers.py              # Model/class listing helpers
│   ├── tabs/
│   │   ├── data_tab.py         # Data Conversion
│   │   ├── training_tab.py     # Standard Training
│   │   ├── lora_tab.py         # LoRA / QLoRA Fine-tuning
│   │   ├── kd_tab.py           # Knowledge Distillation
│   │   ├── dashboard_tab.py    # Dashboard
│   │   ├── inference_tab.py    # Inference
│   │   ├── export_tab.py       # Export Model
│   │   └── quantization_tab.py # Quantization + Visual Comparison
│   └── widgets/                # Shared file-dialog widgets
├── train.py                    # CLI training entry point (supports --lora, --qlora)
├── train_kd.py                 # CLI knowledge distillation entry point
├── test.py                     # CLI inference entry point
├── run_ui.sh                   # Launch desktop UI
└── requirements.txt            # Python dependencies

Training Details

Loss Functions

Quality Focal Loss (QFL) — joint classification and IoU quality
Generalised IoU (GIoU) — robust bounding-box regression
Distribution Focal Loss (DFL) — flexible localisation distribution

Data Augmentation

Random horizontal flip
Multi-scale resize (0.5x – 1.5x)
Colour jittering (brightness, contrast, saturation)
Mosaic (4-image combination)

Training Strategies

Cosine annealing LR schedule with warmup
Gradient clipping and accumulation
Mixed-precision training (FP16) with torch.amp
Early stopping with configurable patience
Multi-GPU via DataParallel
EMA (Exponential Moving Average) for stable evaluation

LoRA / QLoRA Fine-Tuning

FlashDet supports parameter-efficient fine-tuning inspired by torchtune, adapted for convolutional object detection:

Method	What it does	Memory Savings
LoRA	Injects low-rank decomposition matrices into Conv2d layers. Only adapters are trained.	~60–70% fewer trainable params
QLoRA	Same as LoRA but additionally quantizes frozen base weights to INT8 or NF4.	~75–85% memory reduction

Usage (CLI)

# LoRA fine-tuning (rank=8, alpha=16)
python train.py --lora --lora-rank 8 --lora-alpha 16 --epochs 50 --device cuda

# QLoRA with INT8 quantized base
python train.py --qlora --qlora-dtype int8 --lora-rank 8 --epochs 50 --device cuda

Usage (UI)

Open the LoRA Fine-tune tab in the sidebar:

Select a base model (COCO pretrained or custom checkpoint)
Choose LoRA or QLoRA mode and pick a variant (Standard, DoRA, LoRA+, AdaLoRA, OrthoLoRA, LoRA-FA)
Set Rank, Alpha, Dropout, and target modules
Click Start LoRA Fine-tuning

After training, LoRA weights are automatically merged into the base model for zero-overhead inference.

Knowledge Distillation

Train a smaller, faster student model by distilling knowledge from a larger, more accurate teacher:

Component	Loss Function	Purpose
Logit Distillation	KL-divergence (classification) + Smooth L1 (regression)	Transfer soft predictions
Feature Distillation	Normalised L2 on FPN feature maps	Transfer intermediate representations
Hard Loss	Standard QFL + GIoU + DFL	Ground-truth supervised learning

Usage (CLI)

python train_kd.py \
  --teacher-checkpoint workspace/teacher/best.pth \
  --teacher-size m-1.5x \
  --model-size m-0.5x \
  --kd-temperature 4.0 \
  --kd-logit-weight 1.0 \
  --kd-feature-weight 0.5 \
  --epochs 100 --device cuda

Usage (UI)

Open the Distillation tab in the sidebar:

Browse for the teacher checkpoint (.pth) and select teacher model size
Choose student model size and input resolution
Configure temperature, logit weight, feature weight, and hard loss weight
Click Start Distillation — launches train_kd.py with full KD configuration

Requirements

torch >= 2.0.0
torchvision >= 0.15.0
PyQt5 >= 5.15.0
opencv-python >= 4.5.0
matplotlib >= 3.5.0
numpy >= 1.20.0
Pillow >= 8.0.0
onnx >= 1.10.0
onnxruntime >= 1.15.0
pycocotools >= 2.0.0

Full list in requirements.txt.

Example Use Cases

FlashDet works for any object detection task where speed and small model size matter:

Safety & PPE — hardhat, vest, mask detection on construction sites
Container & Logistics — number plate / code reading on shipping containers
Retail — product detection, shelf monitoring
Agriculture — crop and pest identification
Industrial QA — defect and part-count inspection
Autonomous vehicles — pedestrian and sign detection at the edge

Build Desktop Executable

# Linux
./scripts/build_linux.sh

# Windows (Command Prompt)
scripts\build_windows.bat

See docs/BUILD.md for full packaging instructions.

References

NanoDet — original implementation by RangiLyu
torchtune — PyTorch fine-tuning library (LoRA/KD inspiration)
LoRA: Low-Rank Adaptation — parameter-efficient fine-tuning
QLoRA — quantized low-rank adaptation
ShuffleNetV2 — efficient backbone
GhostNet — ghost modules for lightweight FPN
Generalised Focal Loss — QFL and DFL

License

MIT License — see LICENSE for details.

_{Built by Gaurav Goswami}

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
classes		classes
config		config
container_num_0.5x		container_num_0.5x
data/demo		data/demo
docs		docs
samples		samples
scripts		scripts
src		src
ui		ui
workspace/indoor_detector		workspace/indoor_detector
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
run_ui.sh		run_ui.sh
setup_env.sh		setup_env.sh
test.py		test.py
train.py		train.py
train_kd.py		train_kd.py

Folders and files

Latest commit

History

Repository files navigation

FlashDet

What is this?

Key Features

Screenshots

Quick Start

Option 1: Pre-built Executable

Option 2: Run from Source (Recommended)

Option 3: Command Line

Model Variants

COCO val2017 Benchmarks (official NanoDet-Plus numbers)

Architecture

UI Features at a Glance

Supported Dataset Formats

Quantization Options

Export & Deployment Targets

Project Structure

Training Details

Loss Functions

Data Augmentation

Training Strategies

LoRA / QLoRA Fine-Tuning

Usage (CLI)

Usage (UI)

Knowledge Distillation

Usage (CLI)

Usage (UI)

Requirements

Example Use Cases

Build Desktop Executable

References

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages