dumpmemory/100kernels

 
 

Kernels in CUDA || Triton

Kernels for common deep learning functions.

activation

  • ELU (fp32, fp16, fp16x2, fp16x8_packed)
  • GeLU (fp32, fp16, fp16x4_packed)
  • Sigmoid (fp32, fp16, fp16x8_packed)
  • ReLU (fp32, fp16)
  • Swish (fp32, fp16)
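When writing kernels like the ones listed above, it can help to have a small CPU reference to compare outputs against. The following is a minimal pure-Python sketch, not code from this repository. The function names, the `alpha=1.0` default for ELU, the erf-based (exact) GeLU form, and the `to_fp16` helper are all assumptions made for illustration; the actual kernels may use different parameters or a tanh approximation for GeLU.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value.

    Uses the struct 'e' format; values outside the fp16 range raise OverflowError.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def elu(x: float, alpha: float = 1.0) -> float:
    # alpha = 1.0 is an assumed default, not taken from the repository.
    return x if x > 0.0 else alpha * math.expm1(x)


def gelu(x: float) -> float:
    # Exact erf-based form; a kernel may use the tanh approximation instead.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x: float) -> float:
    return max(0.0, x)


def swish(x: float) -> float:
    return x * sigmoid(x)


def apply(fn, values, half=False):
    """Apply fn elementwise; with half=True, round inputs and outputs to fp16."""
    if half:
        return [to_fp16(fn(to_fp16(v))) for v in values]
    return [fn(v) for v in values]
```

A typical use would be to run a kernel on a handful of inputs, copy the result back to the host, and compare it with `apply(gelu, inputs, half=True)` within a tolerance suited to fp16.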

embedding

  • A kernel similar to torch.nn.functional.embedding, in fp32 and fp16
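An embedding lookup is a row gather: each index selects one row of a weight table. The sketch below is a pure-Python illustration of that indexing, assuming a row-major flat weight buffer and one loop iteration per output element (roughly how one GPU thread per element might be mapped). The name `embedding_flat` and its parameters are hypothetical and are not taken from this repository.

```python
def embedding_flat(indices, weight, embedding_dim):
    """Gather rows from a flat, row-major weight table.

    indices:       list of integer row ids
    weight:        flat list of length num_embeddings * embedding_dim
    embedding_dim: number of values per row
    """
    num_embeddings = len(weight) // embedding_dim
    out = [0.0] * (len(indices) * embedding_dim)
    # Each iteration computes one output element, as a single thread might.
    for flat in range(len(out)):
        row, col = divmod(flat, embedding_dim)
        idx = indices[row]
        if not 0 <= idx < num_embeddings:
            raise IndexError(f"index {idx} out of range for {num_embeddings} rows")
        out[flat] = weight[idx * embedding_dim + col]
    return out
```

For example, with a 3-row table of width 2 stored as `[0.0, 0.1, 1.0, 1.1, 2.0, 2.1]`, the indices `[2, 0]` select the last row followed by the first.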

About

100 days of learning and writing kernels in CUDA / Triton

Languages

  • Cuda 79.8%
  • Python 19.9%
  • C++ 0.3%