Triton tutorials

Triton is a language for writing GPU kernels. It's easier to use than CUDA, and interoperates well with PyTorch.

If you want to speed up PyTorch training or inference, you can try writing Triton kernels for the heavier operations. (Flash attention is a good example of a custom GPU kernel that speeds up training.)
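To give a feel for the language, here is a minimal sketch of a Triton vector-addition kernel (the subject of the second tutorial below). The block size of 1024 and the `add`/`add_kernel` names are illustrative choices, not anything the notebooks prescribe.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds loads/stores
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # Launch one program instance per block of 1024 elements.
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```

Note how PyTorch tensors are passed straight into the kernel launch as pointers; this is the interoperability mentioned above.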

This repo has my notes as I learn to use Triton. They include a lot of code, and some discussion of the key concepts. They're geared towards people new to GPU programming and Triton.

Hopefully you will find them useful.

Contents

  1. GPU Basics
  2. Vector Addition
  3. Matrix Multiplication
  4. Softmax forward and backward
  5. Block matmul
  6. Matmul forward and backward

Install

To install Triton, run `pip install triton`. You need a CUDA-compatible GPU with CUDA installed to use it.
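As a quick sanity check after installing (assuming PyTorch is also set up), you can confirm that Triton imports and that a CUDA device is visible:

```python
import torch
import triton

print(triton.__version__)          # Triton installed correctly
print(torch.cuda.is_available())   # True if a CUDA GPU is visible to PyTorch
```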

References

Material in these notebooks came from the following sources, which are generally good documentation:
