MARD1NO/nanoPyC

Prerequisite

  • pytorch
  • cupy

You will also need an NVIDIA GPU to run the code.

Day 1

Implement a JIT compiler using a Python decorator!
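As a rough illustration of the decorator idea (not the repository's actual code), the sketch below compiles a kernel on first call and reuses it afterwards. The names `jit`, `cupy_compile`, and the convention that the decorated function returns CUDA source are all assumptions made for this example.

```python
import functools


def jit(compile_fn):
    """Decorator factory: compile the decorated function's source lazily.

    `compile_fn(source, name)` is a hypothetical backend hook that must
    return a callable kernel. The decorated function returns CUDA source.
    """
    def decorator(src_fn):
        cache = {}  # holds the compiled kernel after the first call

        @functools.wraps(src_fn)
        def wrapper(*args):
            if "kernel" not in cache:
                # "Just in time": compilation happens on first use only.
                cache["kernel"] = compile_fn(src_fn(), src_fn.__name__)
            return cache["kernel"](*args)

        return wrapper
    return decorator


def cupy_compile(source, name):
    """One possible backend, assuming CuPy is installed (imported lazily)."""
    import cupy as cp
    return cp.RawKernel(source, name)
```

With the CuPy backend, a function decorated with `@jit(cupy_compile)` would be launched like any `RawKernel`, that is `kernel((grid,), (block,), (args...))`. Passing the backend in as a parameter keeps the caching logic easy to exercise without a GPU.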

Day 2

Implement a simple matrix exp function in CUDA!
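A minimal sketch of what a first, deliberately simple kernel could look like, assuming "matrix exp" means element-wise `exp` over a row-major `float32` matrix and assuming one thread handles one whole row. The kernel name `exp_rows`, its signature, and the helper `ceil_div` are illustrative, not taken from the repository.

```python
# CUDA C source kept as a Python string; one thread processes one row.
EXP_ROWS_SRC = r"""
extern "C" __global__
void exp_rows(const float* x, float* y, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;           // guard threads past the last row
    for (int col = 0; col < cols; ++col) {
        y[row * cols + col] = expf(x[row * cols + col]);
    }
}
"""


def ceil_div(a, b):
    """Number of size-b blocks needed to cover a items."""
    return (a + b - 1) // b


def exp_rows(x, threads=128):
    """Launch the kernel on a C-contiguous float32 CuPy array of shape (rows, cols)."""
    import cupy as cp  # lazy import: only needed when actually launching
    rows, cols = x.shape
    y = cp.empty_like(x)
    kernel = cp.RawKernel(EXP_ROWS_SRC, "exp_rows")
    # Scalars are passed as 32-bit ints to match the `int` kernel parameters.
    kernel((ceil_div(rows, threads),), (threads,),
           (x, y, cp.int32(rows), cp.int32(cols)))
    return y
```

On a machine with a GPU, the result can be compared against `cupy.exp(x)` with `cupy.allclose`.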

Day 3

Make the exp kernel more efficient by exposing more parallelism! The performance now matches cuBLAS.
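One common way to get more parallelism out of an element-wise kernel is to give every element its own thread instead of looping inside a thread. The sketch below assumes that is the intended change; the kernel name `exp_elementwise` and the helper `launch_1d` are hypothetical.

```python
# CUDA C source: the matrix is treated as a flat array of n = rows * cols floats.
EXP_ELEMWISE_SRC = r"""
extern "C" __global__
void exp_elementwise(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = expf(x[i]);      // one thread, one element
}
"""


def launch_1d(n, threads=256):
    """Return (grid, block) tuples that cover n elements with a 1D launch."""
    return ((n + threads - 1) // threads,), (threads,)


def exp_elementwise(x):
    """Launch on a C-contiguous float32 CuPy array of any shape."""
    import cupy as cp  # lazy import: only needed when actually launching
    y = cp.empty_like(x)
    grid, block = launch_1d(x.size)
    kernel = cp.RawKernel(EXP_ELEMWISE_SRC, "exp_elementwise")
    kernel(grid, block, (x, y, cp.int32(x.size)))
    return y
```

Because neighboring threads read neighboring addresses, global memory accesses coalesce, which usually matters more for this kind of memory-bound kernel than the arithmetic itself.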

Day 4

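Simplify the kernel code by using 2D partitioning. The pitfall is mapping the rows to the x dimension: in a row-major matrix, threadIdx.x should index the columns so that neighboring threads touch neighboring addresses.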
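The 2D layout can be sketched as follows, assuming a row-major `float32` matrix. Columns go on the x dimension and rows on the y dimension. The kernel name `exp_2d`, the helper `launch_2d`, and the default block shape are choices made for this example only.

```python
# CUDA C source: x indexes columns (contiguous in memory), y indexes rows.
EXP_2D_SRC = r"""
extern "C" __global__
void exp_2d(const float* x, float* y, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        y[row * cols + col] = expf(x[row * cols + col]);
    }
}
"""


def launch_2d(rows, cols, block=(32, 8)):
    """Return (grid, block) for a 2D launch; grid.x covers cols, grid.y covers rows."""
    bx, by = block
    grid = ((cols + bx - 1) // bx, (rows + by - 1) // by)
    return grid, block


def exp_2d(x):
    """Launch on a C-contiguous float32 CuPy array of shape (rows, cols)."""
    import cupy as cp  # lazy import: only needed when actually launching
    rows, cols = x.shape
    y = cp.empty_like(x)
    grid, block = launch_2d(rows, cols)
    kernel = cp.RawKernel(EXP_2D_SRC, "exp_2d")
    kernel(grid, block, (x, y, cp.int32(rows), cp.int32(cols)))
    return y
```

Swapping the two mappings still produces correct results, but threads in a warp would then stride through memory by `cols` elements and the accesses would no longer coalesce.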

Day 5

A first taste of fusion: create a fused exp-div kernel!
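To make the fusion idea concrete, here is a hedged sketch that assumes "exp-div" means computing `exp(x) / denom` with one denominator per row, as in a softmax-style normalization. That interpretation, the kernel name `exp_div`, and the pure-Python reference are assumptions for illustration. An unfused version would write `exp(x)` to global memory and read it back for the division; the fused kernel does one read and one write per element.

```python
import math

# CUDA C source: exp and divide happen in the same thread, with no intermediate array.
EXP_DIV_SRC = r"""
extern "C" __global__
void exp_div(const float* x, const float* denom, float* y, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        y[row * cols + col] = expf(x[row * cols + col]) / denom[row];
    }
}
"""


def exp_div_reference(x, denom):
    """CPU reference on nested lists: y[r][c] = exp(x[r][c]) / denom[r]."""
    return [[math.exp(v) / denom[r] for v in row] for r, row in enumerate(x)]


def exp_div(x, denom, block=(32, 8)):
    """Launch on float32 CuPy arrays: x of shape (rows, cols), denom of shape (rows,)."""
    import cupy as cp  # lazy import: only needed when actually launching
    rows, cols = x.shape
    y = cp.empty_like(x)
    grid = ((cols + block[0] - 1) // block[0], (rows + block[1] - 1) // block[1])
    kernel = cp.RawKernel(EXP_DIV_SRC, "exp_div")
    kernel(grid, block, (x, denom, y, cp.int32(rows), cp.int32(cols)))
    return y
```

The reference function gives a CPU baseline to check the GPU output against.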
