Gaurav14cs17 edited this page Jun 21, 2026 · 1 revision

Frequently Asked Questions

General

What is FlashOptim?

FlashOptim is a model optimization toolkit for compressing and accelerating deep learning models. It supports quantization, pruning, distillation, and neural architecture search.

Which models are supported?

FlashOptim works with any PyTorch model, with first-class support for FlashVision detection and classification models.

What hardware is required?

  • Training/Optimization: GPU recommended (CUDA 11.8+)
  • Inference: CPU, GPU, or edge devices (via ONNX/TensorRT export)

Quantization

PTQ vs QAT — which should I use?

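  • PTQ (post-training quantization) is faster because it needs no training, but it may cost 1-2% accuracy
  • QAT (quantization-aware training) requires training, but typically recovers accuracy to within 0.5% of the full-precision model
  • Start with PTQ; switch to QAT if the accuracy drop is unacceptable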

How many calibration samples do I need?

Typically 100-500 representative samples are sufficient. Histogram-based calibration benefits from more samples, because it estimates the distribution of activations rather than just their range.
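To make the role of calibration samples concrete, here is a minimal sketch in plain Python. It is not FlashOptim's implementation; the function names (`minmax_scale`, `percentile_scale`, and so on) are hypothetical and chosen only for illustration. It compares a min-max scale with a percentile scale on 500 synthetic samples that contain one outlier, which is the kind of case where distribution-aware calibration helps.

```python
import random

QMAX = 127  # largest magnitude of a symmetric int8 value


def minmax_scale(samples):
    """Scale from the largest absolute value seen during calibration."""
    return max(abs(x) for x in samples) / QMAX


def percentile_scale(samples, pct=99.0):
    """Scale from a percentile of |x|, so rare outliers are clipped."""
    ordered = sorted(abs(x) for x in samples)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx] / QMAX


def quantize(x, scale):
    """Round to the nearest int8 step and clip to the representable range."""
    return max(-QMAX, min(QMAX, round(x / scale)))


def dequantize(q, scale):
    return q * scale


def mean_abs_error(values, scale):
    """Average round-trip error after quantizing and dequantizing."""
    total = sum(abs(v - dequantize(quantize(v, scale), scale)) for v in values)
    return total / len(values)


# Synthetic calibration set: 500 well-behaved activations plus one outlier.
rng = random.Random(0)
calib = [rng.gauss(0.0, 1.0) for _ in range(500)]
calib.append(40.0)
inliers = calib[:-1]

s_minmax = minmax_scale(calib)
s_pct = percentile_scale(calib, pct=99.0)
err_minmax = mean_abs_error(inliers, s_minmax)
err_pct = mean_abs_error(inliers, s_pct)

print(f"min-max scale:    {s_minmax:.4f}, mean error {err_minmax:.4f}")
print(f"percentile scale: {s_pct:.4f}, mean error {err_pct:.4f}")
```

The single outlier stretches the min-max range, so typical values get coarser steps. The percentile scale clips the outlier and keeps finer resolution for the bulk of the data.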


Pruning

Does unstructured pruning actually speed up inference?

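Not without sparse hardware or a sparse inference engine: the zeroed weights are still stored and multiplied in a dense tensor. Structured pruning removes whole channels or filters, so the resulting smaller dense model runs faster on standard hardware.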
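The difference can be sketched with a small weight matrix in plain Python. This is an illustration under simplifying assumptions, not FlashOptim code: the helper names are hypothetical, rows stand in for output channels, and the multiply-accumulate count assumes a dense kernel that does not skip zeros.

```python
def unstructured_prune(weights, sparsity):
    """Zero the smallest-magnitude individual weights; the shape is unchanged."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]  # ties at the threshold are pruned as well
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def structured_prune(weights, keep_ratio):
    """Drop whole rows (output channels) with the smallest L1 norm."""
    n_keep = max(1, round(len(weights) * keep_ratio))
    ranked = sorted(
        range(len(weights)),
        key=lambda i: sum(abs(w) for w in weights[i]),
        reverse=True,
    )
    kept = sorted(ranked[:n_keep])  # preserve the original row order
    return [weights[i][:] for i in kept]


def dense_macs(weights):
    """Multiply-accumulates a dense matrix-vector product performs."""
    return sum(len(row) for row in weights)


W = [
    [0.90, -0.10, 0.40, 0.05],
    [0.02, 0.03, -0.01, 0.04],
    [-0.70, 0.60, 0.20, -0.30],
    [0.08, -0.50, 0.15, 0.25],
]

sparse = unstructured_prune(W, sparsity=0.5)
compact = structured_prune(W, keep_ratio=0.5)
zeros = sum(1 for row in sparse for w in row if w == 0.0)

print("unstructured:", zeros, "zeros, dense MACs =", dense_macs(sparse))
print("structured:  ", len(compact), "rows, dense MACs =", dense_macs(compact))
```

Both variants remove half of the weights, but only the structured result is a smaller dense matrix, so only it does less work on a standard dense kernel.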

Can I combine pruning with quantization?

Yes! Apply pruning first, fine-tune to recover the accuracy lost to pruning, then quantize. This gives both sparsity and reduced precision.
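The ordering can be sketched on a flat list of weights. This is a toy illustration, not FlashOptim's pipeline: `magnitude_prune` and `quantize_int8` are hypothetical helpers, the fine-tuning step is only marked with a comment, and symmetric int8 quantization is assumed. With a symmetric scheme, zero maps exactly to zero, so the sparsity introduced by pruning survives quantization.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude weights."""
    ordered = sorted(abs(w) for w in weights)
    k = int(len(ordered) * sparsity)
    if k == 0:
        return list(weights)
    threshold = ordered[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_int8(weights):
    """Symmetric int8 quantization with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


weights = [0.9, -0.1, 0.4, 0.05, -0.7, 0.6, 0.2, -0.3]

# Step 1: prune.
pruned = magnitude_prune(weights, sparsity=0.5)

# Step 2: fine-tune the surviving weights here (omitted in this sketch).

# Step 3: quantize the pruned, fine-tuned weights.
q_weights, scale = quantize_int8(pruned)

print("pruned:   ", pruned)
print("quantized:", q_weights)
```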


Distillation

Does the teacher need to be the same architecture?

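No. Logit distillation works regardless of architecture, as long as teacher and student predict over the same output classes. Feature distillation can also bridge different architectures by adding projection layers that map student features to the teacher's dimensions.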
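As a rough illustration of the projection idea, the sketch below maps a 2-dimensional student feature vector to a 3-dimensional teacher feature space and compares them with a mean squared error. All names and values are hypothetical; in practice the projection matrix would be a learned layer trained together with the student.

```python
def project(features, weight):
    """Linear projection from the student's width to the teacher's width."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def mse(a, b):
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


student_feat = [1.0, 2.0]        # student layer width: 2
teacher_feat = [0.5, 1.0, 1.5]   # teacher layer width: 3

# 3x2 projection matrix (fixed here; learned during real training).
projection = [
    [0.5, 0.0],
    [0.0, 0.5],
    [0.5, 0.25],
]

aligned = project(student_feat, projection)
feature_loss = mse(aligned, teacher_feat)

print("projected student features:", aligned)
print("feature distillation loss: ", feature_loss)
```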

What temperature should I use?

Start with T=4. Higher temperatures (5-10) tend to work better when the teacher is much larger than the student.
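To see what the temperature does, here is a small sketch of temperature-scaled softmax and the usual logit distillation loss (KL divergence between softened teacher and student distributions, scaled by T squared). The function names and example logits are made up for illustration and are not FlashOptim's API.

```python
import math


def softmax_t(logits, T=1.0):
    """Softmax over logits divided by temperature T."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return T * T * kl


teacher = [8.0, 2.0, 1.0]
student = [5.0, 3.0, 0.5]

sharp = softmax_t(teacher, T=1.0)
soft = softmax_t(teacher, T=4.0)

print("teacher probs at T=1:", [round(p, 4) for p in sharp])
print("teacher probs at T=4:", [round(p, 4) for p in soft])
print("distillation loss at T=4:", kd_loss(student, teacher, T=4.0))
```

At T=1 almost all probability mass sits on the top class. At T=4 the non-target classes receive visible probability, which is the extra signal the student learns from.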


NAS

How long does a NAS search take?

It depends on the strategy and the search space:

  • Random search: minutes to hours
  • Evolutionary: hours to days

With either strategy, proxy tasks speed up the evaluation of each candidate.
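For intuition about where the time goes, here is a minimal random-search sketch in plain Python. The search space and `proxy_score` are invented for illustration: the score is a cheap stand-in for a real proxy task (for example, a short training run on a data subset) and does not reflect how FlashOptim evaluates candidates. Total search time is roughly the number of trials multiplied by the cost of one evaluation, which is why a cheaper proxy helps.

```python
import random

# Hypothetical search space, for illustration only.
SEARCH_SPACE = {
    "depth": [2, 4, 6, 8],
    "width": [32, 64, 128],
    "kernel": [3, 5],
}


def sample_architecture(rng):
    """Pick one option per dimension uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def proxy_score(arch):
    """Made-up cheap score: rewards capacity, penalizes compute cost."""
    capacity = arch["depth"] * arch["width"]
    cost = capacity * arch["kernel"] ** 2
    return capacity ** 0.5 - 0.001 * cost


def random_search(num_trials, seed=0):
    """Evaluate num_trials random candidates and keep the best one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search(num_trials=50, seed=0)
print("best architecture:", best_arch)
print("proxy score:", round(best_score, 3))
```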

Deployment

What export formats are supported?

  • ONNX (recommended)
  • TensorRT (planned)
  • OpenVINO (planned)
  • CoreML (planned)

How do I benchmark my optimized model?

flashoptim benchmark --model optimized/model.onnx --device cpu --warmup 10 --runs 100
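The --warmup and --runs options presumably correspond to untimed warmup iterations followed by timed runs. The sketch below shows that general pattern in plain Python; it is an assumption about what such a benchmark does, not the actual code behind the flashoptim command. The `fake_inference` function is a placeholder for a real model call.

```python
import statistics
import time


def benchmark(fn, warmup=10, runs=100):
    """Call fn untimed `warmup` times, then time `runs` calls in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup: lets caches and lazy initialization settle

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    return {
        "runs": runs,
        "mean_ms": statistics.fmean(timings_ms),
        "p50_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[min(runs - 1, int(0.95 * runs))],
    }


def fake_inference():
    """Placeholder workload standing in for a model forward pass."""
    return sum(i * i for i in range(10_000))


stats = benchmark(fake_inference, warmup=10, runs=100)
print(f"mean {stats['mean_ms']:.3f} ms, "
      f"p50 {stats['p50_ms']:.3f} ms, p95 {stats['p95_ms']:.3f} ms")
```

Reporting a median and a tail percentile alongside the mean makes occasional slow runs visible instead of letting them skew a single number.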