
Recommendations model quantisation #6

Merged 1 commit into main on Jun 11, 2024
Conversation

ae9is (Owner) commented on Jun 11, 2024

Closes #5

By default, model quantisation (int4) is turned off: quantised results are much worse, and keeping it off avoids a breaking change.

The Quanto integration in the Transformers library is fairly new, and it's broken in the few releases that include it. So for now, a temporary build from main that fixes it was created at ae9is/transformers. (This seemed simpler than setting up the Docker builds to check out and build the source repository.)

With int4 quantisation the model API needs only ~256 MB of memory instead of ~512 MB; the latter was right at the cut-off for small VMs and was causing out-of-memory errors.
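For reference, a minimal sketch of how the int4 path can be wired up through the Transformers Quanto integration. `QuantoConfig` and the `quantization_config` argument are the real Transformers API; the `QUANTIZE` flag and the model id below are hypothetical placeholders rather than this repo's actual names, and it assumes the `quanto` package is installed alongside the ae9is/transformers build noted above:

```python
import os

from transformers import AutoModel, QuantoConfig

# Hypothetical placeholders: the real flag name and model id live in this repo's config.
MODEL_ID = "example/recommendations-model"
quantize = os.getenv("QUANTIZE", "false").lower() == "true"

kwargs = {}
if quantize:
    # int4 weights roughly halve resident memory (~256 MB vs ~512 MB for this model).
    kwargs["quantization_config"] = QuantoConfig(weights="int4")

# Off by default, since quantised results are much worse.
model = AutoModel.from_pretrained(MODEL_ID, **kwargs)
```

Pinning the API's dependency to the temporary build would then be something like `transformers @ git+https://github.com/ae9is/transformers.git` in the requirements, assuming the fix sits on the fork's default branch.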

ae9is linked an issue on Jun 11, 2024 that may be closed by this pull request.
ae9is merged commit d08f56d into main on Jun 11, 2024.
1 check passed
Successfully merging this pull request may close these issues:

Support recommendations model quantisation (#5)