Hey there,
I'm learning about quantization, and I learn best when I explain stuff, so here are my notes, which over time should morph into a "zero to hero" guide for quanting.
The goal is to understand how quanting is done (i.e., well enough that you could implement it from scratch), not to evaluate quanting methods. Doing that well would take more resources than I can allocate to this.
You can help by opening an issue if:
- you don't understand something
- something is wrong
- you think I should cover a certain topic
I invite you to ask many questions. I don't bite :)
Cheers from Germany, Umer (@UmerHAdil)