Optimizations #4
Comments
@Jerry-Master Thanks for your interest, and good work optimizing the backward pass :). I'll probably add some optimizations in the future (the backward pass looks really bad :), but it was getting late and I wanted to push). For now you can enjoy having an edge to investigate the properties of fourierKAN more efficiently :)

Sorry, I don't want to change the license to a more open one. I need to earn some money, and this project is kind of an experiment in monetizing some research algorithms. It's about striking a balance: offering enough that people can investigate the properties, but not so much that it gets blatantly copied and people have no edge to gain by using the commercial version. Kind of like selling more efficient pickaxes to miners during a gold rush.

There are probably plenty of private research labs around the world like mine, sitting on tons of tricks, techniques and algorithms of varying value, looking for ways to monetize them while having some positive impact on the world.

The whole economics of deep learning and algorithmic research is completely messed up: you've got big actors in favorable positions milking their cows for as long as possible while releasing as slowly as possible and controlling research tools; wannabe big actors running on VC fumes, selling at a loss to gain market share; hardware manufacturers controlling the compute; public universities offering research for free; small actors looking for attention to exist; state actors sponsoring their flocks to various degrees for various purposes.

Interesting times ahead :)
I've just pushed an optimization for the backward pass. It should be much better (probably on par with what you've done, though I haven't benchmarked it yet).
Thanks for the answer. It is true that the business models are a bit messed up. In any case, I will probably publish my tricks on my own.
I have updated my benchmark with your new implementation and with mine. Cross-posting here.
Mine is cufkan. Your memory accesses in the forward pass were not coalesced. The standard indexing of a monolithic kernel seems to be faster than your grid-stride loop, and separating the bias addition helps too. If you make further optimizations, please let me know and I will update the benchmark.
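To make the coalescing and indexing points concrete, here is a rough pure-Python model rather than the actual kernels. It assumes a simplified memory model in which a warp of 32 threads loading float32 values is served in aligned 128-byte segments, and all function names are made up for illustration. It counts how many segments a warp touches for a given thread-to-index mapping, and shows that a monolithic kernel and a grid-stride loop cover the same elements with different thread assignments.

```python
WARP_SIZE = 32       # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128  # simplified transaction size (assumption of this model)
FLOAT_BYTES = 4      # size of one float32 element


def segments_touched(index_of_thread):
    """Count distinct 128-byte segments one warp touches for a float32 load."""
    segments = {
        (index_of_thread(t) * FLOAT_BYTES) // SEGMENT_BYTES
        for t in range(WARP_SIZE)
    }
    return len(segments)


def monolithic_indices(n, block_dim):
    """One element per thread: idx = blockIdx.x * blockDim.x + threadIdx.x."""
    num_blocks = (n + block_dim - 1) // block_dim
    out = []
    for block in range(num_blocks):
        for thread in range(block_dim):
            idx = block * block_dim + thread
            if idx < n:  # bounds guard, as in the usual kernel prologue
                out.append(idx)
    return out


def grid_stride_indices(n, block_dim, grid_dim):
    """Fixed-size grid; each thread loops with stride gridDim.x * blockDim.x."""
    stride = grid_dim * block_dim
    out = []
    for block in range(grid_dim):
        for thread in range(block_dim):
            out.extend(range(block * block_dim + thread, n, stride))
    return out


# Neighbouring threads read neighbouring floats: one segment per warp.
coalesced = segments_touched(lambda t: t)
# Neighbouring threads read floats 64 elements apart: one segment per thread.
strided = segments_touched(lambda t: t * 64)
print(coalesced, strided)
```

In this model the coalesced mapping needs a single segment per warp, while the strided one needs a separate segment for every thread. Which indexing scheme is actually faster on real hardware still has to be measured, as in the benchmark above.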
Using the benchmark from here, I wrote some optimizations to your CUDA kernels to improve the backward pass while maintaining numerical accuracy within a tolerance of 1e-4. The result is below; compare fusedfourierkan-gpu with myfusedfourierkan-gpu.
If you change the license to MIT or Apache, I will make a pull request. There are more optimizations to make, and I will keep adding them when I have time. So if you want them, just change the license.
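As a minimal sketch of what a 1e-4 tolerance check can look like, the snippet below compares two evaluations of a toy scalar Fourier layer, one with the bias folded into the accumulation and one with the bias added in a separate step. The layer formula (frequencies 1..K with cosine and sine coefficients plus a bias) and all names are assumptions for illustration, not the repository's code or the benchmark's actual check.

```python
import math


def fourier_forward_fused(x, cos_coeffs, sin_coeffs, bias):
    """Toy reference: the bias is part of the same accumulation."""
    total = bias
    for k, (a, b) in enumerate(zip(cos_coeffs, sin_coeffs), start=1):
        total += a * math.cos(k * x) + b * math.sin(k * x)
    return total


def fourier_forward_split(x, cos_coeffs, sin_coeffs, bias):
    """Toy variant: accumulate the series first, then add the bias separately."""
    series = sum(
        a * math.cos(k * x) + b * math.sin(k * x)
        for k, (a, b) in enumerate(zip(cos_coeffs, sin_coeffs), start=1)
    )
    return series + bias


def within_tolerance(reference, candidate, atol=1e-4):
    """True if both sequences have equal length and differ by at most atol."""
    return len(reference) == len(candidate) and all(
        abs(r - c) <= atol for r, c in zip(reference, candidate)
    )


# Hypothetical inputs and coefficients, chosen only to exercise the check.
xs = [-1.5, -0.3, 0.0, 0.7, 2.1]
cos_coeffs = [0.5, -0.25, 0.125]
sin_coeffs = [0.1, 0.2, -0.3]
bias = 0.05

reference = [fourier_forward_fused(x, cos_coeffs, sin_coeffs, bias) for x in xs]
candidate = [fourier_forward_split(x, cos_coeffs, sin_coeffs, bias) for x in xs]
print(within_tolerance(reference, candidate))
```

The two variants differ only in floating-point summation order, which is the kind of discrepancy an absolute tolerance is meant to absorb. A real comparison would run the same check over the kernels' outputs and gradients.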