
Adding bert - WIP #328

Open · wants to merge 1 commit into main

Conversation

michaelfeil

#4 Trying to implement BERT. Opening this PR for visibility.

I am blocked on the following:

  • Some layers have no previous LayerNorm, so they cannot be scaled based on the previous layer and are currently ignored (sketched below).
  • The other MLP layers are not converted.
    => The outputs are very far from identical.

Feel free to pick up this PR if it is helpful
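
For context, a minimal sketch of the scale-folding step that is blocked: AWQ computes per-input-channel scales for a group of linear layers and folds the inverse scale into the operation that feeds them, usually the preceding LayerNorm. The helper name and signature below are illustrative, not code from this PR:

import torch
import torch.nn as nn

@torch.no_grad()
def fold_scales_into_layernorm(prev_ln: nn.LayerNorm, linears: list[nn.Linear], scales: torch.Tensor):
    # Divide the preceding LayerNorm's affine parameters by the scales and
    # multiply the linear weights column-wise by the same scales, so the
    # composed mapping is numerically unchanged. Without a preceding
    # LayerNorm (or linear) there is nothing to absorb the inverse scale,
    # which is the first blocker above.
    prev_ln.weight.div_(scales)
    if prev_ln.bias is not None:
        prev_ln.bias.div_(scales)
    for fc in linears:
        fc.weight.mul_(scales.view(1, -1))

Applying this fold to every LayerNorm -> linear pair and checking that the FP16 outputs are unchanged is a useful sanity check before quantizing anything.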

@casper-hansen
Owner

Hi @michaelfeil, great work on this! I am indeed interested in having support for BERT models. However, the main issues you highlighted were the same ones I ran into.

Do you have any ideas on how to solve the blockers? Or do you plan to leave it as-is for now?

@michaelfeil
Author

@casper-hansen I saw that the outputs of the model differ substantially in embedding space.

  • Do I need to quantize all layers? I saw that all layers are replaced with GEMM, but I only quantized a few of them (see the code).
  • Do you have any idea what the reason for this difference could be?

I don't have the time to invest in this PR during the week at the moment.

@casper-hansen
Owner

  • Do I need to quantize all layers? I saw that all layers are replaced with GEMM, but I only quantized a few of them (see the code).

The layers that are not defined will use the RTN (round-to-nearest) method to round down to 4-bit. You can also make use of the modules_to_not_convert argument like we do for Mixtral.
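
For reference, this is roughly how the argument is passed in the Mixtral example (the model path is a placeholder and the exact quant_config keys may differ between versions):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # placeholder model path

quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM",
    # modules listed here are kept in FP16 instead of being quantized
    "modules_to_not_convert": ["gate"],
}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)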

  • Do you have any idea what the reason for this difference could be?

A good start is to use a standard benchmark for the model. For LLMs, we usually measure perplexity, and a 1-2% degradation in a benchmark is acceptable. The reasons could be manifold and are hard to reason about; one potential issue is that some layers are very sensitive to quantization.
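
As a concrete example of the perplexity check, a minimal sketch over WikiText-2 (the checkpoint path and window size are placeholders):

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/model-under-test"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

window, nlls, n_pred = 2048, [], 0
for i in range(0, ids.size(1), window):
    chunk = ids[:, i : i + window].to(model.device)
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        # labels=chunk makes the model return the mean NLL over chunk.size(1) - 1 positions
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss.float() * (chunk.size(1) - 1))
    n_pred += chunk.size(1) - 1

print("perplexity:", torch.exp(torch.stack(nlls).sum() / n_pred).item())

Comparing this number between the FP16 checkpoint and the quantized one gives the degradation figure mentioned above.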

@michaelfeil
Author

Thanks for the hint. I have not tried out modules_to_not_convert yet - are you referring to this example?

modules_to_not_convert = ["gate"]

I am trying to directly use the cosine similarity between a query and a paragraph as the metric; in this case, the result was similar to that of a randomly initialized model.
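
For concreteness, a sketch of that check with mean-pooled embeddings (the checkpoint and texts are illustrative; the idea is to run the same function on the FP16 reference and on the quantized model and compare the scores):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def query_paragraph_similarity(model_name: str, query: str, paragraph: str) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tokenizer([query, paragraph], padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (2, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean-pool over real tokens only
    return F.cosine_similarity(emb[0:1], emb[1:2]).item()

print(query_paragraph_similarity(
    "bert-base-uncased",  # illustrative checkpoint
    "what is activation-aware weight quantization?",
    "AWQ protects the most salient weight channels by scaling them before quantization.",
))

If the quantized model scores near a randomly initialized model, comparing hidden states layer by layer against the FP16 model is one way to narrow down where the divergence starts.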
