Investigate triton #258
Labels: difficulty/hard (May take a week or more), project/model (Related to modeling decisions and implementations), severity/could (A nice-to-have that we might not get to)
I've been messing around with Triton to see whether it makes sense to start replacing some of our components with Triton implementations. So far, preliminary results look good.
I have been working on a Triton version of LayerNorm, both with and without the element-wise affine transform. These are the benchmarking results for a batch of 4096 tokens and a d_model of 4096 (representative of a typical microbatch with our medium model) or 8192 (our large model) on an A100 GPU. Throughput is reported in GB/s, so higher is better.
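For checking a kernel like this, it can help to have a plain reference of what LayerNorm computes per token, plus the arithmetic behind a GB/s number. The sketch below is a minimal pure-Python version under stated assumptions: it is not the Triton kernel from this issue, the function names are hypothetical, the affine transform is modeled as optional weight and bias vectors, and the bandwidth formula assumes the kernel reads the input once and writes the output once (ignoring the small weight/bias traffic). The issue does not say which dtype or byte-counting convention was used for the reported numbers.

```python
import math
from typing import List, Optional


def layer_norm_row(
    x: List[float],
    weight: Optional[List[float]] = None,
    bias: Optional[List[float]] = None,
    eps: float = 1e-5,
) -> List[float]:
    """Reference LayerNorm over one token's d_model-sized activation vector.

    Hypothetical helper for illustration; weight/bias model the optional
    element-wise affine transform.
    """
    n = len(x)
    mean = sum(x) / n
    # LayerNorm uses the biased variance (divide by n, not n - 1).
    var = sum((v - mean) ** 2 for v in x) / n
    rstd = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * rstd for v in x]
    # Optional element-wise affine transform: y * weight + bias.
    if weight is not None:
        y = [yi * w for yi, w in zip(y, weight)]
    if bias is not None:
        y = [yi + b for yi, b in zip(y, bias)]
    return y


def layer_norm_gbps(n_tokens: int, d_model: int, elem_bytes: int, ms: float) -> float:
    """Effective bandwidth in GB/s, ASSUMING one read of x and one write of y.

    Weight/bias reads (d_model elements each) are ignored as negligible.
    """
    bytes_moved = 2 * n_tokens * d_model * elem_bytes
    return bytes_moved / (ms * 1e-3) / 1e9
```

As a worked example of the arithmetic, with 4096 tokens, d_model = 4096, and an assumed 2-byte dtype (fp16 or bf16), one read plus one write moves 2 × 4096 × 4096 × 2 = 67,108,864 bytes, so a hypothetical 0.1 ms run would correspond to about 671 GB/s. At d_model = 8192 the bytes moved double, so the same GB/s requires twice the runtime.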