[quantization] Enable prefill-decode modeling #586

@stamalakhov

What

Let's implement prefill-decode logic for Llama-based model quantization.

How

On demand, quantize_full_qmodel_with_gptq.py should produce two Circle models (one for prefill, one for decode), along with quality values for them.
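
To make the split concrete, here is a rough sketch of how the two static-shape graphs might differ in their inputs. Everything in it is an assumption for discussion, not the script's current behavior: the `plan_exports` helper, the `ExportSpec` type, the `prefill_decode` switch, the input names, and the KV-cache layout `(1, n_kv_heads, max_seq_len - 1, head_dim)` are all hypothetical. The idea is that prefill consumes the whole prompt window, while decode consumes one new token plus the cached keys and values of each layer.

```python
from dataclasses import dataclass


@dataclass
class ExportSpec:
    """Hypothetical description of one exported graph."""
    name: str           # illustrative output stem, e.g. "prefill"
    input_shapes: dict  # input name -> static shape


def plan_exports(prefill_decode, max_seq_len, n_layers, n_kv_heads, head_dim):
    """Return the specs of the graphs to export (assumed layout, not the real API)."""
    if not prefill_decode:
        # Default path: a single graph over the full sequence window.
        return [ExportSpec("model", {"input_ids": (1, max_seq_len)})]

    # Prefill: processes the whole prompt window in one pass.
    prefill = ExportSpec("prefill", {"input_ids": (1, max_seq_len)})

    # Decode: one new token plus per-layer cached keys and values.
    cache_shape = (1, n_kv_heads, max_seq_len - 1, head_dim)
    decode_inputs = {"input_ids": (1, 1)}
    for i in range(n_layers):
        decode_inputs[f"past_key_{i}"] = cache_shape
        decode_inputs[f"past_value_{i}"] = cache_shape
    decode = ExportSpec("decode", decode_inputs)

    return [prefill, decode]


if __name__ == "__main__":
    for spec in plan_exports(True, max_seq_len=8, n_layers=2, n_kv_heads=4, head_dim=16):
        print(spec.name, spec.input_shapes)
```

Keeping the switch off by default would leave the existing single-model flow unchanged, and the quality values could then be measured per exported graph.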
