Home
Gaurav14cs17 edited this page Jun 21, 2026 · 2 revisions
Welcome to the FlashOptim documentation. FlashOptim is a model optimization toolkit for FlashVision models.
It provides the following model compression and optimization techniques:
- Quantization — Reduce model precision (INT8, FP16) for faster inference
- Pruning — Remove redundant weights and channels
- Distillation — Transfer knowledge from large to small models
- NAS — Automatically search for efficient architectures
- LoRA — Low-Rank Adaptation for efficient fine-tuning
- Export — Deploy to ONNX, TensorRT, OpenVINO
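To make the first two techniques concrete, here is a minimal sketch of the arithmetic behind symmetric INT8 quantization and magnitude-based weight pruning, written in plain Python. It does not use FlashOptim's API: the function names `quantize_int8`, `dequantize`, and `magnitude_prune` are hypothetical and exist only for illustration, and the per-tensor symmetric scheme is an assumed, commonly used variant rather than a description of what FlashOptim does internally.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from INT8 codes."""
    return [scale * v for v in q]


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


weights = [0.5, -1.27, 0.03, 0.9, -0.2, 0.0]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

sparse = magnitude_prune(weights, sparsity=0.5)
```

With this scheme the rounding error of each weight is at most `scale / 2`, which is why INT8 inference can stay close to the original model when the weight range is small. Real toolkits typically add per-channel scales, calibration data, and structured (channel-level) pruning on top of this idea.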
| Page | Description |
|---|---|
| Installation | Setup and installation guide |
| Quick Start | Get running in minutes |
| Quantization | Post-training quantization (PTQ) and quantization-aware training (QAT) guide |
| Pruning | Weight and channel pruning |
| Distillation | Knowledge distillation |
| NAS | Neural Architecture Search |
| FAQ | Frequently asked questions |
Requirements:
- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA >= 11.8 (recommended for GPU acceleration)
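The Python and PyTorch version requirements listed above can be checked before installing with a short script. The following is a sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helper names, not part of FlashOptim, and the version parsing is deliberately simple (it ignores pre-release and local build suffixes).

```python
import sys
from importlib import metadata


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into a tuple like (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as '2.0' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements():
    """Return a dict of booleans for the Python and PyTorch requirements."""
    report = {"python": sys.version_info[:2] >= (3, 8)}
    try:
        installed = metadata.version("torch")
        report["torch"] = parse_version(installed) >= (2, 0, 0)
    except metadata.PackageNotFoundError:
        # PyTorch is not installed in this environment.
        report["torch"] = False
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The CUDA version is not checked here because it depends on the installed PyTorch build; once PyTorch is installed, `torch.version.cuda` reports the CUDA version it was built against (or `None` for CPU-only builds).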
FlashOptim — Model optimization toolkit | PyPI | MIT License