Closes #5
By default, model quantisation (int4) is turned off, since quantised results are much worse and defaulting to off avoids a breaking change.
The Quanto integration in the Transformers library is fairly new and broken in the few releases that include it. So for now, a temporary build from main with the fix was published at ae9is/transformers. (This seemed simpler than setting up the Docker builds to check out and build the source repository.)
With int4 quantisation the model API needs only ~256 MB instead of ~512 MB, which was right at the cut-off for small VMs and was causing out-of-memory errors.