
Merge llama.cpp commit b9fd7ee (remove bit-shuffling) breaking changes to old quantized models #200

@abetlen

Description


llama.cpp recently removed bit-shuffling from some of the quantized file formats to improve performance. This is a breaking change that currently has no upgrade path other than re-quantizing the base models (and some users only have the quantized ggml models).

I support any change that makes inference faster; I think we still have a long way to go, and we shouldn't slow the pace of improvements for what is essentially still alpha software. That said, I also don't want to cause too much disruption for existing users who have working setups. I'll create a PR for this change and keep it open for at least the weekend before publishing a new PyPI release with the changes. After that, the old version will remain available on PyPI and through the releases, so people can keep it pinned.
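For users who want to stay on a working setup in the meantime, the pre-change release can be pinned in a requirements file. The version number below is a placeholder, not the actual last compatible release; check the project's release notes to find the right one:

```
# requirements.txt
# Pin llama-cpp-python to the last release before the quantization
# format change ("X.Y.Z" is a placeholder -- see the release notes).
llama-cpp-python==X.Y.Z
```

Pinning with `==` (rather than an upper bound) guarantees pip will never silently pull in the format-breaking release on a fresh install.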
