FAQ

Frequently Asked Questions

General

What is FlashOptim?

FlashOptim is a model optimization toolkit for compressing and accelerating deep learning models. It supports quantization, pruning, distillation, and neural architecture search.

Which models are supported?

FlashOptim works with any PyTorch model, with first-class support for FlashVision detection and classification models.

What hardware is required?

Training/Optimization: GPU recommended (CUDA 11.8+)
Inference: CPU, GPU, or edge devices (via ONNX/TensorRT export)

Quantization

PTQ vs QAT — which should I use?

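PTQ (post-training quantization) is faster because it needs no training, but it may cost 1-2% accuracy
QAT (quantization-aware training) requires training but typically recovers accuracy to within 0.5% of the unquantized baseline
Start with PTQ; switch to QAT if the accuracy drop is unacceptable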

How many calibration samples do I need?

Typically 100-500 representative samples are sufficient. More samples help with histogram-based calibration.
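To illustrate why calibration data matters, here is a minimal sketch of how a quantization range might be derived from calibration samples. It is not FlashOptim's implementation: the function names are hypothetical, and percentile clipping is used as a simple stand-in for histogram-based calibration.

```python
def minmax_range(samples):
    """Range taken from the observed extremes (sensitive to outliers)."""
    return min(samples), max(samples)


def percentile_range(samples, lower_pct=1.0, upper_pct=99.0):
    """Range that clips the tails; a simple stand-in for histogram calibration."""
    ordered = sorted(samples)
    n = len(ordered)
    lo_idx = int(round((lower_pct / 100) * (n - 1)))
    hi_idx = int(round((upper_pct / 100) * (n - 1)))
    return ordered[lo_idx], ordered[hi_idx]


def affine_qparams(lo, hi, num_bits=8):
    """Scale and zero point for unsigned affine quantization over [lo, hi]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain zero
    qmax = 2 ** num_bits - 1
    span = hi - lo
    scale = span / qmax if span > 0 else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


# Illustrative activations: values in [-2, 2] plus a single outlier at 50.
activations = [i / 100 for i in range(-200, 201)] + [50.0]

for name, get_range in [("min-max", minmax_range), ("percentile", percentile_range)]:
    lo, hi = get_range(activations)
    scale, zero_point = affine_qparams(lo, hi)
    print(f"{name}: range=({lo}, {hi}) scale={scale:.5f} zero_point={zero_point}")
```

With min-max, one outlier stretches the range and coarsens the step size for every other value. Tail-clipping methods avoid this, but their estimates of the tails only stabilize with enough samples, which is one reason more calibration data helps them.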

Pruning

Does unstructured pruning actually speed up inference?

Not without sparse hardware or sparse inference engines. Use structured pruning for guaranteed speedup on standard hardware.
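The difference can be sketched in plain Python. This is an illustrative example with hypothetical helper names, not FlashOptim's API: unstructured pruning zeroes individual weights but leaves the matrix shape unchanged, so a dense kernel still performs every multiplication, while structured pruning removes whole rows and shrinks the work.

```python
def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude entries; the matrix keeps its shape."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def structured_prune(weights, keep_ratio):
    """Drop whole rows (output channels) with the smallest L1 norm."""
    n_keep = max(1, int(round(len(weights) * keep_ratio)))
    ranked = sorted(
        range(len(weights)),
        key=lambda i: sum(abs(w) for w in weights[i]),
        reverse=True,
    )
    kept = sorted(ranked[:n_keep])  # preserve the original row order
    return [weights[i][:] for i in kept]


def dense_multiplications(weights):
    """Multiplications in a dense matrix-vector product, zeros included."""
    return sum(len(row) for row in weights)


W = [
    [0.90, -0.10, 0.05, 0.80],
    [0.02, 0.01, -0.03, 0.04],
    [-0.70, 0.60, 0.20, -0.50],
    [0.10, -0.05, 0.03, 0.02],
]

sparse_w = unstructured_prune(W, sparsity=0.5)
small_w = structured_prune(W, keep_ratio=0.5)

print("dense:", dense_multiplications(W))
print("unstructured:", dense_multiplications(sparse_w))
print("structured:", dense_multiplications(small_w))
```

Only a kernel that skips zeros would benefit from the unstructured result, which is why sparse hardware or a sparse inference engine is needed.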

Can I combine pruning with quantization?

Yes! Apply pruning first, fine-tune, then quantize. This gives both sparsity and reduced precision.
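One reason this order works: with symmetric quantization, a pruned weight of exactly 0.0 maps to the integer 0, so the sparsity survives the quantization step. The sketch below assumes symmetric per-tensor int8 quantization and skips the fine-tuning step; the function names are hypothetical and do not reflect FlashOptim's API.

```python
def quantize_symmetric(weights, num_bits=8):
    """Symmetric per-tensor quantization: zero always maps to integer 0."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def sparsity(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Weights after pruning (and, in practice, fine-tuning): half are zero.
pruned = [0.0, 0.42, 0.0, -0.87, 0.0, 0.0, 0.15, -0.31]

quantized, scale = quantize_symmetric(pruned)
print("quantized:", quantized)
print("sparsity before:", sparsity(pruned), "after:", sparsity(quantized))
```

Very small non-zero weights can also round to 0, so sparsity after quantization may be slightly higher than before, never lower.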

Distillation

Does the teacher need to be the same architecture?

No. Logit distillation works regardless of architecture because it only compares model outputs. Feature distillation can also bridge different architectures when projection layers map the student's features to the teacher's dimensions.

What temperature should I use?

Start with T=4. Higher temperatures (5-10) work better when the teacher is much larger than the student.
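To show what the temperature does, here is a minimal sketch of a temperature-scaled softmax and the commonly used logit distillation loss (KL divergence between softened distributions, scaled by T squared). This is the standard textbook formulation, not necessarily the exact loss FlashOptim uses, and the function names are illustrative.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / T; higher T gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher_logits = [8.0, 2.0, 0.5]
student_logits = [5.0, 3.0, 1.0]

for t in (1.0, 4.0, 10.0):
    probs = softmax_with_temperature(teacher_logits, t)
    print(f"T={t}: teacher probs = {[round(p, 3) for p in probs]}")

print("loss at T=4:", distillation_loss(student_logits, teacher_logits, 4.0))
```

At T=1 nearly all of the teacher's probability mass sits on the top class. Raising T spreads mass onto the other classes, which exposes the relative similarities the student is meant to learn.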

NAS

How long does a NAS search take?

Depends on strategy and search space:

Random search: minutes to hours
Evolutionary: hours to days
Use proxy tasks to speed up evaluation
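As a rough illustration of random search with a proxy, the sketch below samples architectures from a small search space and ranks them with a cheap scoring function. The search space, the `proxy_score` formula, and all names are hypothetical. In practice the proxy would be something like a short training run on a data subset.

```python
import random

# Hypothetical search space, for illustration only.
SEARCH_SPACE = {
    "depth": [2, 4, 6, 8],
    "width": [32, 64, 128],
    "kernel_size": [3, 5, 7],
}


def sample_architecture(rng):
    """Pick one option per dimension uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def proxy_score(arch):
    """Made-up cheap proxy: rewards capacity, penalizes compute cost."""
    capacity = arch["depth"] * arch["width"]
    cost = capacity * arch["kernel_size"] ** 2
    return capacity ** 0.5 - 0.001 * cost


def random_search(num_trials, seed=0):
    """Evaluate num_trials random candidates and keep the best one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search(num_trials=50)
print("best architecture:", best_arch)
print("proxy score:", round(best_score, 3))
```

Total search time is roughly the number of trials multiplied by the cost of one evaluation, so a cheaper proxy shortens the whole search proportionally.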

Deployment

What export formats are supported?

ONNX (recommended)
TensorRT (planned)
OpenVINO (planned)
CoreML (planned)

How do I benchmark my optimized model?

flashoptim benchmark --model optimized/model.onnx --device cpu --warmup 10 --runs 100
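For context on what the warmup and runs options typically mean, here is a generic timing loop in Python. It is a sketch of the usual benchmarking pattern, not the internals of the `flashoptim benchmark` command. `dummy_inference` is a placeholder for a real model call.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Time fn() over `runs` calls after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: lets caches, allocators, and lazy init settle

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    return {
        "runs": runs,
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[min(runs - 1, int(0.95 * runs))],
    }


def dummy_inference():
    """Placeholder workload standing in for a model forward pass."""
    sum(i * i for i in range(10_000))


stats = benchmark(dummy_inference, warmup=10, runs=100)
print({key: round(value, 4) for key, value in stats.items()})
```

Reporting a median or a high percentile alongside the mean helps, because a few slow runs can skew the mean considerably.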

FlashOptim — Model optimization toolkit | PyPI | MIT License
