Xopes: Toolbox for Accelerating Deep Learning Operators
- 240910
- add fwd_fn, bwd_fn for lrpe, md_lrpe;
- add test, benchmark for flao_fal;
- add lrpe document;
- 240911
- add md_lrpe document;
- add act;
- add softmax
- relu
- sigmoid
- silu
- none
- add jit act
- add flao_fal document;
- clear flao code, add interface;
- fuse act + lrpe;
- relu
- sigmoid
- silu
- none
- softmax
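The `act` module added above dispatches over the activation names listed (relu, sigmoid, silu, none, softmax). As a rough, hedged sketch of what such a dispatcher computes (the actual repo implements these as Triton/JIT kernels; this pure-Python reference and the function signature are assumptions for illustration):

```python
import math

def act(x, fn="none"):
    # Reference activation dispatcher over a 1D list of floats.
    # Names follow the changelog; the implementation is an assumption.
    if fn == "relu":
        return [max(v, 0.0) for v in x]
    if fn == "sigmoid":
        return [1.0 / (1.0 + math.exp(-v)) for v in x]
    if fn == "silu":
        # silu(v) = v * sigmoid(v)
        return [v / (1.0 + math.exp(-v)) for v in x]
    if fn == "softmax":
        m = max(x)  # subtract max for numerical stability
        e = [math.exp(v - m) for v in x]
        s = sum(e)
        return [v / s for v in e]
    return list(x)  # "none": identity
```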
- 240912
- add md_lrpe document;
- add act;
- fuse act + lrpe;
- softmax
- test fused act + lrpe + linear attention with output gate
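The fused "act + lrpe + linear attention with output gate" recipe being tested above can be sketched as an unfused pure-Python reference: apply an activation to q/k, apply a rotary-style lrpe at each position, accumulate the running k^T v state, and gate the output. The shapes, the choice of silu, and the rotation-pair lrpe are assumptions for illustration; the real kernels fuse all of this in Triton:

```python
import math

def silu(v):
    return v / (1.0 + math.exp(-v))

def lrpe_rotate(x, t, theta):
    # Rotary-style positional transform over feature pairs (an assumption).
    out = []
    for i in range(0, len(x), 2):
        c, s = math.cos(t * theta[i // 2]), math.sin(t * theta[i // 2])
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def linear_attn(q, k, v, g, theta):
    # q, k, v, g: per-step feature lists (seq_len x d); d must be even.
    n, d = len(q), len(q[0])
    kv = [[0.0] * d for _ in range(d)]  # running sum of k^T v
    out = []
    for t in range(n):
        qt = lrpe_rotate([silu(x) for x in q[t]], t, theta)
        kt = lrpe_rotate([silu(x) for x in k[t]], t, theta)
        for i in range(d):
            for j in range(d):
                kv[i][j] += kt[i] * v[t][j]
        ot = [sum(qt[i] * kv[i][j] for i in range(d)) for j in range(d)]
        # output gate: elementwise silu(g) on the attention output
        out.append([silu(g[t][j]) * ot[j] for j in range(d)])
    return out
```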
- 240913
- add md_lrpe document;
- add act;
- add mask for lrpe sp 1d
- test fused act + lrpe + linear attention with output gate
- left softmax + dim = -2
- 240914
- add md_lrpe document;
- add act;
- add mask for lrpe sp 1d
- test fused act + lrpe + linear attention with output gate
- left softmax + dim = -2
- custom benchmark function
- 240918
- add md_lrpe cosine document;
- add feature mask for lrpe cosine 1d;
- add feature mask for lrpe cosine md;
- triton
- triton cache
- add act for lrpe cosine md;
- triton
- triton cache
- left softmax + dim = -2
- benchmark lrpe cosine md with act;
- add act;
- add mask for lrpe sp 1d
- test fused act + lrpe + linear attention with output gate
- left softmax + dim = -2
- custom benchmark function
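The lrpe cosine 1d variant with a feature mask, mentioned above, splits each feature into cos and sin branches, doubling the head dimension; the feature mask presumably pads the feature dim up to a Triton block size. A minimal pure-Python sketch of the cosine transform (the masking/padding details are an assumption and omitted here):

```python
import math

def lrpe_cosine_1d(x, t, theta):
    # Position t modulates each feature into a cos branch and a sin branch,
    # so the output has 2 * len(x) features. Layout is an assumption.
    cos_part = [v * math.cos(t * th) for v, th in zip(x, theta)]
    sin_part = [v * math.sin(t * th) for v, th in zip(x, theta)]
    return cos_part + sin_part
```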
- 240919
- add act for lrpe cosine md;
- triton
- triton cache
- triton block parallel
- left softmax + dim = -2
- left bwd
- left softmax + dim = -2
- add act;
- test fused act + lrpe + linear attention with output gate
- left softmax + dim = -2
- lrpe cosine md
- custom benchmark function
- add act for lrpe cosine md;
- 240923
- add act for lrpe cosine md;
- triton block parallel
- left bwd
- triton block parallel
- add act;
- test fused act + lrpe + linear attention with output gate
- left softmax + dim = -2
- lrpe cosine md
- custom benchmark function
- multinomial
- torch
- torch online
- triton online
- triton parallel
- add act for lrpe cosine md;
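The `multinomial` kernels above (torch/triton, online/parallel) can be understood through the Gumbel-max trick: sampling from unnormalized logits by adding independent Gumbel noise and taking the argmax. A hedged pure-Python reference (the actual kernels parallelize this reduction over blocks):

```python
import math, random

def gumbel_multinomial(logits, rng=random):
    # Gumbel-max trick: argmax_i (logit_i + G_i), G_i ~ Gumbel(0, 1),
    # samples index i with probability softmax(logits)_i.
    gumbels = [l - math.log(-math.log(rng.random())) for l in logits]
    return max(range(len(gumbels)), key=gumbels.__getitem__)
```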
- 240924
- add act for lrpe cosine md;
- triton block parallel
- left bwd
- triton block parallel
- add act;
- test fused act + lrpe + linear attention with output gate
- left softmax + dim = -2
- lrpe cosine md
- custom benchmark function
- multinomial
- triton online
- triton parallel
- triton parallel gumbel
- document
- add act for lrpe cosine md;
- 240929
- multinomial
- triton parallel gumbel small vocab bug
- unify input shape
- reduce kernel bug
- online_gumbel_multinomial_torch
- multinomial
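The `online_gumbel_multinomial_torch` entry above suggests a streaming variant of the Gumbel-max sampler: scan the vocabulary in chunks, keeping only the running (best Gumbel value, best index) pair, so the full logits never need to be materialized at once. The chunking scheme here is an assumption:

```python
import math, random

def online_gumbel_multinomial(logit_chunks, rng=random):
    # Streaming Gumbel-max: reduce (max, argmax) over chunks of logits.
    best, best_idx, offset = -math.inf, -1, 0
    for chunk in logit_chunks:
        for i, l in enumerate(chunk):
            gum = l - math.log(-math.log(rng.random()))
            if gum > best:
                best, best_idx = gum, offset + i
        offset += len(chunk)
    return best_idx
```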
Change tags:
- [Feature Add]
- [Bug Fix]
- [Benchmark Add]
- [Document Add]
Symbol Explanation: When benchmarking, use `o` to represent the output of the function. For the function name, use `fn_version`, where `fn` is the function name and `version` can be either `torch` or `triton`.