Optimizations #4

Jerry-Master · 2024-05-06T18:02:51Z

Using the benchmark from here, I wrote some optimizations to your CUDA kernels that improve the backward pass while keeping numerical accuracy within a tolerance of 1e-4. The results are below; compare fusedfourierkan-gpu with myfusedfourierkan-gpu.

                       |     fwd time  |     bwd time  |   fwd memory  |   bwd memory  |   num params  |  num trainable params
----------------------------------------------------------------------------------------------------------------------------------
effkan-gpu             |      4.41 ms  |      5.97 ms  |      0.13 GB  |      0.19 GB  |     10010000  |              10010000
fourierkan-gpu         |     17.98 ms  |     14.73 ms  |      1.96 GB  |      2.01 GB  |     10011001  |              10011001
fusedfourierkan-gpu    |     29.08 ms  |   2218.09 ms  |      0.09 GB  |      0.13 GB  |     10011001  |              10011001
myfusedfourierkan-gpu  |     30.46 ms  |     49.09 ms  |      0.09 GB  |      0.13 GB  |     10011001  |              10011001
mlp-gpu                |      0.37 ms  |      1.09 ms  |      0.10 GB  |      0.13 GB  |     10020001  |              10020001
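The thread does not say how the 1e-4 tolerance was checked. As a minimal sketch, assuming an element-wise absolute-tolerance comparison in the spirit of `torch.allclose`, the Python below compares a reference derivative of a hypothetical FourierKAN-style term `f(x) = sum_k a_k*cos(k*x) + b_k*sin(k*x)` against a rewritten version that builds the harmonics by angle addition instead of calling `sin`/`cos` once per harmonic. The names `grad_reference`, `grad_recurrence` and `allclose` are illustrative, and the recurrence is only an example of an optimization that changes rounding; it is not claimed to be what either kernel in this thread does.

```python
import math

# Hypothetical FourierKAN-style 1-D term (an assumption, not taken from this thread):
#   f(x)  = sum_{k=1..K} a[k-1] * cos(k x) + b[k-1] * sin(k x)
#   f'(x) = sum_{k=1..K} k * (b[k-1] * cos(k x) - a[k-1] * sin(k x))


def grad_reference(x, a, b):
    """Derivative computed with one sin and one cos call per harmonic."""
    return sum(k * (b[k - 1] * math.cos(k * x) - a[k - 1] * math.sin(k * x))
               for k in range(1, len(a) + 1))


def grad_recurrence(x, a, b):
    """Same derivative, with cos(kx) and sin(kx) built by angle addition.

    Only one sin and one cos call are made; higher harmonics use
      cos((k+1)x) = cos(kx)cos(x) - sin(kx)sin(x)
      sin((k+1)x) = sin(kx)cos(x) + cos(kx)sin(x)
    which trades transcendental calls for a small rounding drift.
    """
    c1, s1 = math.cos(x), math.sin(x)
    ck, sk = c1, s1  # cos(1*x), sin(1*x)
    total = 0.0
    for k in range(1, len(a) + 1):
        total += k * (b[k - 1] * ck - a[k - 1] * sk)
        # advance to harmonic k+1 (right-hand side uses the old ck, sk)
        ck, sk = ck * c1 - sk * s1, sk * c1 + ck * s1
    return total


def allclose(xs, ys, atol=1e-4, rtol=0.0):
    """Element-wise |x - y| <= atol + rtol * |y|, similar in spirit to torch.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    K = 8
    a = [0.1 * (k % 3) - 0.1 for k in range(K)]
    b = [0.05 * k for k in range(K)]
    points = [-3.0 + 0.25 * i for i in range(25)]
    ref = [grad_reference(x, a, b) for x in points]
    opt = [grad_recurrence(x, a, b) for x in points]
    print("max abs diff:", max(abs(r - o) for r, o in zip(ref, opt)))
    print("within 1e-4:", allclose(opt, ref, atol=1e-4))
```

In double precision the drift here stays far below 1e-4; in a float32 GPU kernel it would be larger, which is why a tolerance rather than exact equality is the sensible acceptance test.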

If you change the license to MIT or Apache, I will make a pull request. There are more optimizations to be made, and I will keep adding them when I have time. So if you want them, just change the license.


unrealwill · 2024-05-06T21:41:18Z

@Jerry-Master Thanks for your interest, and good work optimizing the backward pass :)

I'll probably add some optimizations in the future. The backward pass looks really bad :) but it was getting late and I wanted to push.

For now, you can enjoy having an edge in investigating the properties of fourierKAN more efficiently :)

Sorry, I don't want to change the license to a more open one.

I need to earn some money, and this project is something of an experiment in monetizing research algorithms. It's about striking a balance between offering enough that people can investigate the properties, and not offering so much that it gets blatantly copied and people have no edge to gain from using the commercial version. Kind of like selling more efficient pickaxes to miners during a gold rush.

There are probably plenty of private research labs around the world like mine, sitting on tons of tricks, techniques and algorithms of varying value, looking for ways to monetize them while having some positive impact on the world.

The whole economics of deep learning and algorithmic research is completely messed up:

You've got big actors in favorable positions milking their cows for as long as possible while trying to release as slowly as possible and controlling research tools; wannabe big actors running on VC fumes, selling at a loss to gain market share; hardware manufacturers controlling the compute; public universities offering research for free; small actors looking for attention to exist; state actors sponsoring their flocks to various degrees for various purposes.

Interesting times ahead :)

unrealwill · 2024-05-06T23:51:19Z

I've just pushed an optimization for the backward pass. It should be much better (probably on par with what you've done, though I haven't benchmarked it yet).

Jerry-Master · 2024-05-07T10:07:44Z

Thanks for the answer. It is true that the business models are a bit messed up. In any case, I will probably publish my tricks on my own.

Jerry-Master · 2024-05-09T20:01:54Z

I have updated my benchmark with your new implementation and with mine. Cross-posting here.

                     |     fwd time  |     bwd time  |   fwd memory  |   bwd memory  |   num params  |  num trainable params
----------------------------------------------------------------------------------------------------------------------------------
effkan-cpu           |     33.31 ms  |     43.63 ms  |       nan GB  |       nan GB  |     10010000  |              10010000
effkan-gpu           |      4.15 ms  |      3.69 ms  |      0.13 GB  |      0.19 GB  |     10010000  |              10010000
fourierkan-cpu       |    798.43 ms  |    929.11 ms  |       nan GB  |       nan GB  |     10011001  |              10011001
fourierkan-gpu       |     19.20 ms  |     14.80 ms  |      1.96 GB  |      2.01 GB  |     10011001  |              10011001
fusedfourierkan-cpu  |    914.66 ms  |   1646.11 ms  |       nan GB  |       nan GB  |     10011001  |              10011001
fusedfourierkan-gpu  |     30.14 ms  |     84.01 ms  |      0.09 GB  |      0.13 GB  |     10011001  |              10011001
cufkan-cpu           |   1454.64 ms  |   3807.97 ms  |       nan GB  |       nan GB  |     10011001  |              10011001
cufkan-gpu           |      6.24 ms  |     50.71 ms  |      0.09 GB  |      0.13 GB  |     10011001  |              10011001
chebykan-cpu         |     22.16 ms  |     26.90 ms  |       nan GB  |       nan GB  |     10010000  |              10010000
chebykan-gpu         |      5.89 ms  |      8.03 ms  |      0.14 GB  |      0.13 GB  |     10010000  |              10010000
mlp-cpu              |      6.60 ms  |     10.56 ms  |       nan GB  |       nan GB  |     10020001  |              10020001
mlp-gpu              |      0.45 ms  |      1.06 ms  |      0.10 GB  |      0.13 GB  |     10020001  |              10020001
----------------------------------------------------------------------------------------------------------------------------------
pykan-cpu            |     15.59 ms  |     17.53 ms  |       nan GB  |       nan GB  |         2431  |                  1551
pykan-gpu            |     50.56 ms  |     93.93 ms  |      0.02 GB  |      0.02 GB  |         2431  |                  1551
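To put the reported numbers side by side, a small helper can turn the GPU timings into speedup ratios (baseline time divided by candidate time, so values above 1 mean the candidate is faster). The values are copied from the two tables in this thread; the dictionary layout and the `speedup` helper are only for illustration.

```python
# GPU timings in milliseconds, copied from the two benchmark tables in this thread.
first_run = {
    "fusedfourierkan-gpu": {"forward": 29.08, "backward": 2218.09},
    "myfusedfourierkan-gpu": {"forward": 30.46, "backward": 49.09},
}
second_run = {
    "fourierkan-gpu": {"forward": 19.20, "backward": 14.80},
    "fusedfourierkan-gpu": {"forward": 30.14, "backward": 84.01},
    "cufkan-gpu": {"forward": 6.24, "backward": 50.71},
}


def speedup(table, baseline, candidate, phase):
    """Return baseline_time / candidate_time for one phase ("forward" or "backward")."""
    return table[baseline][phase] / table[candidate][phase]


if __name__ == "__main__":
    comparisons = [
        (first_run, "fusedfourierkan-gpu", "myfusedfourierkan-gpu"),
        (second_run, "fusedfourierkan-gpu", "cufkan-gpu"),
        (second_run, "fourierkan-gpu", "cufkan-gpu"),
    ]
    for table, base, cand in comparisons:
        for phase in ("forward", "backward"):
            ratio = speedup(table, base, cand, phase)
            print(f"{cand} vs {base}, {phase}: {ratio:.2f}x")
```

With these numbers the first table's backward pass goes from 2218.09 ms to 49.09 ms (about 45x), and in the second table cufkan-gpu is about 4.8x faster than fusedfourierkan-gpu in the forward pass and about 1.7x faster in the backward pass, while its backward is still slower than the unfused fourierkan-gpu (14.80 ms), which uses 1.96 GB instead of 0.09 GB. Each entry is a single reported timing with no variance information, so small differences should be read with care.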

Mine is cufkan. Your memory accesses in the forward pass were not coalesced. The standard indexing of a monolithic kernel seems to be faster than your grid-stride loop, and separating out the bias addition helps too. If you make further optimizations, please let me know and I will update the benchmark.
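The two kernel-level points in this comment can be illustrated with a toy Python model (a simplification, not real hardware behavior). For coalescing, assuming a warp of 32 threads whose float32 loads are served in 128-byte segments, threads reading along a row of a 1024-wide row-major matrix touch one segment, while threads reading down a column touch 32. For indexing, a monolithic launch (one thread per element plus a bounds guard) and a grid-stride loop (a fixed grid whose threads step by `blockDim * gridDim`) visit exactly the same indices. All function names here are hypothetical.

```python
# Toy model only: real GPUs serve loads in cache lines/sectors with more rules.
WARP_SIZE = 32       # threads per warp
SEGMENT_BYTES = 128  # assumed size of one memory transaction
FLOAT_BYTES = 4      # float32


def segments_touched(byte_addresses):
    """Count the distinct 128-byte segments that one warp's loads fall into."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


def warp_addresses(rows, cols, thread_to_index):
    """Byte addresses read by threads 0..31 of one warp on a row-major float32 array."""
    return [thread_to_index(t, rows, cols) * FLOAT_BYTES for t in range(WARP_SIZE)]


def along_row(t, rows, cols):
    # Thread t reads element (0, t): adjacent threads hit adjacent addresses.
    return t


def down_column(t, rows, cols):
    # Thread t reads element (t, 0): adjacent threads are `cols` floats apart (assumes t < rows).
    return t * cols


def monolithic_indices(n, block_dim):
    """Indices visited when the grid has one thread per element plus a bounds guard."""
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    visited = []
    for block in range(grid_dim):
        for thread in range(block_dim):
            idx = block * block_dim + thread
            if idx < n:  # guard for the partially filled last block
                visited.append(idx)
    return visited


def grid_stride_indices(n, block_dim, grid_dim):
    """Indices visited by a grid-stride loop over a fixed-size grid."""
    stride = block_dim * grid_dim  # total number of threads in the grid
    visited = []
    for block in range(grid_dim):
        for thread in range(block_dim):
            # each thread starts at its global id and jumps by the grid size
            visited.extend(range(block * block_dim + thread, n, stride))
    return visited


if __name__ == "__main__":
    rows, cols = 1024, 1024
    print("along a row:", segments_touched(warp_addresses(rows, cols, along_row)), "segment(s)")
    print("down a column:", segments_touched(warp_addresses(rows, cols, down_column)), "segment(s)")
    n = 10_000
    print("monolithic covers all:", sorted(monolithic_indices(n, 256)) == list(range(n)))
    print("grid-stride covers all:", sorted(grid_stride_indices(n, 256, 8)) == list(range(n)))
```

The model only shows that the two launch schemes are interchangeable for correctness; why the monolithic version measured faster in cufkan is not explained in the thread and would need profiling on the actual kernels.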

akaashdash mentioned this issue on May 7, 2024: time complexity, Blealtan/efficient-kan#7 (closed)

unrealwill mentioned this issue on May 7, 2024: Slow running time for FusedFourierKAN #2 (open)
