Theoretically, it is possible to use different formats for different matrices. From what I saw, not all matrices in RWKV have the same pattern of outliers. Also, INT8 matmul is coming to ggml, which will make the Q4_0/Q4_1 formats even better for RWKV.
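To make the per-matrix idea concrete, here is a minimal sketch of how a format could be picked for each matrix from a simple outlier measure. Everything in it is an assumption for illustration: `outlier_ratio`, `choose_format`, `plan_formats`, the threshold of 20, and the matrix names are hypothetical and are not part of rwkv.cpp or ggml. Only the format names Q4_1 and Q4_1_O come from this thread.

```python
from statistics import median


def outlier_ratio(values):
    """Largest magnitude divided by the median magnitude (hypothetical metric)."""
    mags = [abs(v) for v in values]
    med = median(mags)
    if med == 0:
        # All-zero (or mostly zero) data: treat any nonzero value as an outlier.
        return float("inf") if max(mags) > 0 else 1.0
    return max(mags) / med


def choose_format(values, threshold=20.0):
    """Pick the outlier-aware format only when the matrix seems to need it.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return "Q4_1_O" if outlier_ratio(values) > threshold else "Q4_1"


def plan_formats(matrices, threshold=20.0):
    """Map each matrix name to a chosen format; values are flattened weights."""
    return {name: choose_format(vals, threshold) for name, vals in matrices.items()}


# Hypothetical matrix names and toy weights, for illustration only.
example = {
    "smooth_matrix": [0.10, -0.20, 0.15, 0.12],
    "spiky_matrix": [0.10, -0.20, 0.15, 50.0],
}
plan = plan_formats(example)
```

In this toy example, only the matrix with a single very large weight is assigned the outlier-aware format. A real decision would need to be based on measured perplexity and speed rather than on a ratio like this one.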
I think that with the new Q5_0 and Q5_1 formats, which are almost as good as unquantized and very fast, there is no need to introduce the additional complexity of handling different quantization formats for different matrices.
Thanks for the great work. Maybe we can use Q4_1 for some of the matrices (and Q4_1_O for others)?