This repository accompanies the paper "Patching Leaks in the Charformer for Efficient Character-Level Generation" (arXiv link to be added soon).
The aim is to show that the GBST layer in Charformer makes character-level models more efficient, but that it cannot function out of the box for generative tasks. This work marks my transition into researching character-level models and is far from complete. I am expanding it in a future paper, so there may be bugs in this repo that I have since fixed, but it should be capable of reproducing all of the experiments in the arXiv paper.
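To illustrate the kind of problem the paper addresses: GBST pools blocks of characters into coarser positions, and naive non-causal pooling mixes future characters into earlier positions, which breaks left-to-right generation. The sketch below is not the paper's code; it uses a toy sequence and a hypothetical `block_pool` helper purely to show the leak.

```python
def block_pool(seq, block_size):
    """Mean-pool non-overlapping blocks of a sequence.

    Toy stand-in for GBST-style block pooling; each pooled position
    averages every character inside its block, including characters
    that come *after* the position a decoder would be predicting.
    """
    pooled = []
    for start in range(0, len(seq), block_size):
        block = seq[start:start + block_size]
        pooled.append(sum(block) / len(block))
    return pooled


seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pooled = block_pool(seq, block_size=3)
# pooled[0] already mixes in seq[1] and seq[2], so a model predicting
# the second character from pooled[0] has effectively seen the answer.
print(pooled)
```

A generative variant needs each pooled position to depend only on characters at or before it, which is the kind of patch the paper investigates.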