⚡ FlashOptim

Model Optimization Toolkit for FlashVision

Quantization • Pruning • Distillation • Neural Architecture Search • Deployment


🚀 What is FlashOptim?

FlashOptim is a model optimization toolkit for FlashVision models. It combines quantization, pruning, distillation, neural architecture search, and export tooling to compress, accelerate, and deploy deep learning models on edge devices, mobile platforms, and cloud infrastructure.

Key Features

| Feature | Description |
|---|---|
| Quantization | Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT): INT8, FP16, mixed precision |
| Pruning | Unstructured (magnitude), Structured (channel/filter), Lottery Ticket Hypothesis |
| Distillation | Knowledge distillation (logit-level), Feature distillation, Self-distillation |
| NAS | Neural Architecture Search with configurable search spaces and strategies |
| LoRA | Low-Rank Adaptation and QLoRA for efficient fine-tuning |
| Export | ONNX, TensorRT, OpenVINO, CoreML export with optimization |
| Auto-Optimizer | One-click optimization pipeline with automatic method selection |
| Benchmarking | Latency, throughput, memory, and accuracy profiling |

📦 Installation

From PyPI (recommended)

pip install flashoptim

From Source

git clone https://github.com/FlashVision/FlashOptim.git
cd FlashOptim
pip install -e ".[all]"

With Optional Dependencies

# Quote the extras so shells such as zsh do not expand the brackets
pip install "flashoptim[export]"        # ONNX export support
pip install "flashoptim[quantization]"  # Quantization extras
pip install "flashoptim[analytics]"     # Visualization & profiling
pip install "flashoptim[all]"           # Everything
pip install "flashoptim[dev]"           # Development tools

⚡ Quick Start

Quantize a Model (INT8)

from flashoptim import FlashOptim, PTQuantizer

model = FlashOptim("pretrained/model.pth")

quantizer = PTQuantizer(dtype="int8", calibration_samples=500)
quantized_model = quantizer.quantize(model, calibration_data="data/calibration/")

quantized_model.export("optimized/model_int8.onnx")
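
For readers new to INT8 post-training quantization, the following is a minimal, library-independent sketch of the underlying idea: calibration data fixes a value range, which is then mapped onto the 256 int8 levels with a scale and zero point. The function names (`calibrate`, `quantize`, `dequantize`) are hypothetical and are not FlashOptim's API; the way PTQuantizer actually calibrates is not described here and may differ.

```python
def calibrate(values):
    """Derive (scale, zero_point) from observed calibration values.

    Assumption: simple min/max calibration with an asymmetric int8 range.
    """
    lo = min(min(values), 0.0)  # the range must contain 0 so that 0.0 is exact
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant input: any positive scale works
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point):
    """Map floats to int8 codes, clamping to [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate floats."""
    return [(q - zero_point) * scale for q in codes]


calibration = [-1.0, -0.25, 0.0, 0.4, 1.0]
scale, zero_point = calibrate(calibration)
codes = quantize(calibration, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
```

The round trip loses at most about one quantization step (`scale`) per value, which is the source of the small accuracy drop that quantized models typically show.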

Prune a Model

from flashoptim import FlashOptim, UnstructuredPruner

model = FlashOptim("pretrained/model.pth")

pruner = UnstructuredPruner(sparsity=0.5, method="magnitude")
pruned_model = pruner.prune(model)

pruned_model.export("optimized/model_pruned.onnx")
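
As an illustration of what `sparsity=0.5` with `method="magnitude"` conventionally means, here is a small standalone sketch of unstructured magnitude pruning on a flat list of weights. The helper `magnitude_prune` is hypothetical and is not part of FlashOptim; whether UnstructuredPruner ranks weights per layer or globally is not stated in this README.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Rank indices by absolute value; the first k are the ones to prune.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:k])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.0, -0.9, 0.0, 0.4]
```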

Knowledge Distillation

from flashoptim import FlashOptim, KnowledgeDistiller, Trainer

teacher = FlashOptim("pretrained/teacher_large.pth")
student = FlashOptim("pretrained/student_small.pth")

distiller = KnowledgeDistiller(temperature=4.0, alpha=0.7)
trainer = Trainer(distiller=distiller, epochs=50)
trainer.train(teacher=teacher, student=student, data="data/train/")
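
The `temperature` and `alpha` parameters above correspond to the usual knobs of a logit-level distillation loss. The sketch below shows one common formulation in plain Python. It assumes that `alpha` weights the soft (teacher-matching) term and `1 - alpha` the hard-label term; KnowledgeDistiller may define these differently, and `distillation_loss` is a hypothetical name, not a FlashOptim function.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    soft = temperature ** 2 * kl  # T^2 keeps gradient magnitudes comparable
    hard = -math.log(softmax(student_logits)[label])  # standard cross-entropy
    return alpha * soft + (1 - alpha) * hard


teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(student, teacher, label=0)
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of the wrong classes rather than only from its top prediction.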

Auto-Optimize (One-Click)

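from flashoptim import FlashOptim
from flashoptim.solutions import AutoOptimizer

model = FlashOptim("pretrained/model.pth")  # load the model to optimize

optimizer = AutoOptimizer(target="edge")  # "edge", "mobile", "server"
optimized = optimizer.optimize(model)
print(optimizer.get_report())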

🖥️ CLI Usage

# Quantize a model
flashoptim quantize --config configs/flashoptim_quantize_int8.yaml

# Prune a model
flashoptim prune --config configs/flashoptim_prune_unstructured.yaml

# Knowledge distillation
flashoptim distill --config configs/flashoptim_distill_det.yaml

# Neural Architecture Search
flashoptim nas --config configs/flashoptim_nas_search.yaml

# Export optimized model
flashoptim export --model optimized/model.pth --format onnx

# Benchmark
flashoptim benchmark --model optimized/model.onnx --device cpu
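
To make the latency and throughput figures easier to interpret, here is a generic sketch of how such numbers are commonly measured for any Python callable. It is not the implementation behind `flashoptim benchmark`; the function name `benchmark` and its `warmup` and `runs` parameters are assumptions chosen for illustration.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Time repeated calls to `fn` and summarize latency and throughput."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the statistics
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.fmean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "p50_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "throughput_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


# Stand-in workload; in practice `fn` would run one model inference.
result = benchmark(lambda: sum(range(1000)), warmup=2, runs=20)
print(result)
```

Reporting the median alongside the mean helps separate typical latency from occasional slow runs caused by the scheduler or caches.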

📁 Project Structure

FlashOptim/
├── configs/          # YAML configuration files
├── docker/           # Docker support
├── docs/             # Documentation
├── examples/         # Runnable example scripts
├── flashoptim/       # Core library
│   ├── cfg/          # Configuration management
│   ├── data/         # Data loading & calibration
│   ├── engine/       # Training, validation, export
│   ├── models/       # Model architectures
│   ├── losses/       # Loss functions
│   ├── quantization/ # Quantization methods
│   ├── pruning/      # Pruning methods
│   ├── distillation/ # Distillation methods
│   ├── nas/          # Neural Architecture Search
│   ├── solutions/    # High-level optimization solutions
│   ├── analytics/    # Benchmarking & profiling
│   └── utils/        # Utilities
└── tests/            # Unit tests

📊 Benchmarks

Model Method Size Reduction Latency Speedup Accuracy Drop
FlashDet-S INT8 PTQ 2.3× < 0.5%
FlashDet-M 50% Pruning 1.8× < 1.0%
FlashDet-L Distillation 2.5× < 0.3%
FlashDet-S Auto-Optimize 3.1× < 1.0%

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.


📄 License

This project is licensed under the MIT License — see the LICENSE file for details.


🙏 Acknowledgements


Made with ❤️ by the FlashVision team
