Ghiora/TritonLearning
Use the same training instructions as in `../transformer_translation_python/HowToTrain.README`:

    python transformer_translation_triton.py
    python transformer_translation_triton.py --train train.en train.de
RTX 4090 optimizations:

- Block sizes tuned for the SM89 architecture (BLOCK_M=64, BLOCK_N=64)
- Memory-efficient attention avoids materializing the full attention matrix
- Fused operations reduce global memory traffic
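As a rough illustration of the memory-efficient attention idea (this is a NumPy sketch of the general blockwise online-softmax technique, not the repository's actual Triton kernel), attention can be accumulated one key/value block at a time so that only an `(N, BLOCK_N)` score tile is ever held, instead of the full `(N, N)` matrix:

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference version: materializes the full (N, N) score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def blockwise_attention(q, k, v, block_n=64):
    # Online-softmax accumulation over key/value blocks; only an
    # (N, block_n) score tile exists at any time.
    n, d = q.shape
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax denominator
    for j in range(0, k.shape[0], block_n):
        s = q @ k[j:j + block_n].T / np.sqrt(d)
        m_new = np.maximum(m, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)          # rescale old accumulators
        l = l * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ v[j:j + block_n]
        m = m_new
    return out / l[:, None]
```

The two functions produce the same result; the blockwise version is the form that maps onto a Triton kernel, where `block_n` corresponds to a tuned tile size such as BLOCK_N=64.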
About
Comments I added while going through the Triton GPU Tutorial code (using Claude AI).