This is an implementation of the DoWG optimization algorithm as laid out by Khaled et al. I have not verified that the implementation is correct. Because the ordinary version uses a very large amount of VRAM, I also took the liberty of adding a quantized implementation, available as DoWG8bit.
Pull requests are welcome.
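
For anyone who wants to check the implementation against the paper, here is a minimal plain-Python sketch of the DoWG update rule as I understand it. It is not the code in this repository: the function name `dowg_minimize`, the parameter names `r_eps` and `eps`, and the small `eps` added to the denominator for numerical safety are all assumptions made for illustration. The idea is to track the largest distance travelled from the starting point, accumulate squared gradient norms weighted by that distance, and use their ratio as the step size.

```python
import math


def dowg_minimize(grad_fn, x0, steps=100, r_eps=1e-4, eps=1e-12):
    """Illustrative DoWG loop on a list of floats (hypothetical helper)."""
    x = list(x0)
    r = r_eps  # running max distance from x0, seeded with a small value
    v = 0.0    # running sum of r_t^2 * ||g_t||^2
    for _ in range(steps):
        g = grad_fn(x)
        # Distance of the current iterate from the starting point.
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        r = max(r, dist)
        # Weight the squared gradient norm by the squared distance estimate.
        v += r * r * sum(gi * gi for gi in g)
        # Step size: r_t^2 / sqrt(v_t); eps is an assumption to avoid 0/0.
        lr = r * r / (math.sqrt(v) + eps)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

As a quick sanity check, `dowg_minimize(lambda x: [x[0] - 3.0], [0.0], steps=500)` minimizes the quadratic `0.5 * (x - 3)^2` and should end up close to 3 without any learning rate being supplied. Note that this sketch keeps everything in full precision and says nothing about how DoWG8bit quantizes its state.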