Description
llama.cpp recently removed bit-shuffling in some of the quantized file formats to improve performance. This is a breaking change that currently has no upgrade path other than re-quantizing the base models, and some users only have the quantized ggml models.
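For anyone who still has the original f16 weights, re-quantizing with llama.cpp's `quantize` tool looks roughly like the sketch below. The paths are examples, and the exact form of the type argument (a name like `q4_0` vs. a numeric id) varies between llama.cpp versions, so treat this as a rough illustration rather than an official upgrade path.

```sh
# Rebuild a q4_0 ggml model from the original f16 weights using
# llama.cpp's quantize tool (run from the llama.cpp build directory).
# Paths are example placeholders; the quantization type may need to be
# passed as a numeric id instead of a name on older builds.
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
```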
I support any change that makes inference faster; I think we still have a long way to go, and we shouldn't slow the pace of improvements for what is still essentially alpha software. That said, I also don't want to cause too much disruption to existing users who have working setups. I'll create a PR for this change and keep it open for at least the weekend before publishing a new PyPI release with the changes. After that, the old version will remain available on PyPI and through the releases so people can keep it pinned.
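For users who want to keep reading old-format models, pinning would look something like the following. The package name `llama-cpp-python` and the version `0.1.x` are assumed placeholders here; check the PyPI release history for the actual last pre-change release.

```sh
# Pin the last release that still reads the old (bit-shuffled) format.
# "0.1.x" is a placeholder -- substitute the actual last pre-change
# version from the PyPI release history before running this.
pip install llama-cpp-python==0.1.x
```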