
FastKAN approximates KANs with RBFs achieving 3+ times acceleration #141

Closed
ZiyaoLi opened this issue May 9, 2024 · 22 comments

ZiyaoLi commented May 9, 2024

Since 3rd-order (cubic) B-splines are the most commonly used, their basis functions can be well approximated by Gaussian RBFs.

LayerNorm on the inputs keeps them within the RBF grid range, avoiding grid re-scaling.

These two together yield a much faster implementation (approximation) of KAN: FastKAN. See here: https://github.com/ZiyaoLi/fast-kan
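A minimal sketch of the construction described above, with assumed layer sizes and parameter names (an illustration, not the actual fast-kan code): normalize inputs with LayerNorm, expand each feature onto a fixed grid of Gaussian RBFs, then mix everything with one linear layer.

```python
import torch
import torch.nn as nn

class GaussianRBFKANLayer(nn.Module):
    """Sketch of an RBF-based KAN layer: LayerNorm -> Gaussian RBF basis -> Linear."""

    def __init__(self, in_dim, out_dim, grid_min=-2.0, grid_max=2.0, num_grids=8):
        super().__init__()
        self.norm = nn.LayerNorm(in_dim)  # keeps inputs inside the fixed grid range
        self.register_buffer("grid", torch.linspace(grid_min, grid_max, num_grids))
        self.h = (grid_max - grid_min) / (num_grids - 1)  # RBF width
        # one basis response per (input feature, grid point), mixed into out_dim outputs
        self.linear = nn.Linear(in_dim * num_grids, out_dim)

    def forward(self, x):
        x = self.norm(x)
        # Gaussian RBF basis: exp(-((x_i - g_j) / h)^2) for every feature i and grid point j
        basis = torch.exp(-((x[..., None] - self.grid) / self.h) ** 2)
        return self.linear(basis.flatten(start_dim=-2))

# usage: out = GaussianRBFKANLayer(28 * 28, 64)(torch.randn(32, 28 * 28))
```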


ZiyaoLi commented May 10, 2024

FastKAN's forward pass is now 3+ times faster than efficient_kan. I believe you'll all want to try it.

https://github.com/ZiyaoLi/fast-kan


1ssb commented May 10, 2024

Very interesting. What are your benchmark and validation performance numbers?


ZiyaoLi commented May 10, 2024

> Very interesting. What are your benchmark and validation performance numbers?

Results are shown in the linked repo. I tested the forward speed of my FastKANLayer against efficient_kan's KANLinear: 740 µs -> 220 µs.
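For reference, a rough sketch of how such a forward-pass timing comparison can be made (the layer classes, sizes, and batch shape below are assumptions, not the exact setup behind the 740 µs -> 220 µs numbers):

```python
import time
import torch

def time_forward(layer, x, warmup=10, iters=100):
    """Average forward-pass wall time in microseconds (CUDA-synchronized)."""
    for _ in range(warmup):
        layer(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        layer(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e6

# usage (assumed layer classes and sizes):
# x = torch.randn(64, 784, device="cuda")
# print(time_forward(FastKANLayer(784, 64).cuda(), x))
# print(time_forward(KANLinear(784, 64).cuda(), x))
```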


AthanasiosDelis commented May 10, 2024

I think it is even faster:

https://github.com/AthanasiosDelis/fast-kan-playground

I ran benchmarks similar to https://github.com/Jerry-Master/KAN-benchmarking for uniformity of comparison.

For me, the most important thing is to test whether pykan really has the continual-learning capabilities it promises, and whether FastKAN inherits those capabilities along with the ability to do pruning and symbolic regression.


1ssb commented May 10, 2024

My version gets around:

Forward pass took 0.000297 to 0.0016 seconds (best to worst),

while hitting ~98% accuracy on MNIST. What is the time-vs-accuracy comparison for FastKAN?

Edit: training dynamics are expected to vary with the loss function, data dimensionality, and a host of other factors. What exactly is the benchmark?


AthanasiosDelis commented May 10, 2024

I inspected your GitHub and matched my hyperparameters to yours, @1ssb: lr=1e-3, weight_decay=1e-5, gamma=0.85.

Results after 15 epochs:

With FastKAN([28 * 28, 64, 10], grid_min=-3., grid_max=3., num_grids=4, exponent=2, denominator=1.7):

Total parameters: 255858
Trainable parameters: 255850

100%|█| 938/938 [00:16<00:00, 58.10it/s, accuracy=0.969, loss=0.045, lr=0.0
Epoch 15, Val Loss: 0.07097885620257162, Val Accuracy: 0.9798964968152867

With MLP(layers=[28 * 28, 320, 10], device='cuda'):

Total parameters: 254410
Trainable parameters: 254410

100%|█| 938/938 [00:15<00:00, 59.52it/s, accuracy=0.969, loss=1.47, lr=0.00
Epoch 15, Val Loss: 1.4862790791092404, Val Accuracy: 0.9756170382165605
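A rough sketch of a training setup consistent with the numbers above (AdamW plus an exponential LR schedule with gamma=0.85 is an assumption here, as is the fastkan import path; the actual script may differ):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from fastkan import FastKAN  # assumed import path for the fast-kan package

model = FastKAN([28 * 28, 64, 10]).cuda()  # layer widths as in the results above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.85)
criterion = nn.CrossEntropyLoss()

train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=64, shuffle=True,
)

for epoch in range(15):
    for images, labels in train_loader:
        images = images.view(images.size(0), -1).cuda()  # flatten 28x28 -> 784
        loss = criterion(model(images), labels.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```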

Results from comparing these networks on a dataset of comparable scale to MNIST, generated with create_dataset:

              forward    backward   fwd memory   bwd memory   num params   num trainable params
fastkan-gpu   0.83 ms    1.27 ms    0.02 GB      0.02 GB      255858       255850
mlp-gpu       0.25 ms    0.62 ms    0.02 GB      0.02 GB      254410       254410
effkan-gpu    2.39 ms    2.18 ms    0.03 GB      0.03 GB      508160       508160

Accuracy is comparable, the network width is smaller, FastKAN remains slower than the MLP, and memory use is comparable (I have not tested the original FastKAN yet).
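A sketch of how the forward/backward times and peak memory in such a table can be measured (the helper below is illustrative; the KAN-benchmarking repo's actual procedure may differ):

```python
import time
import torch

def benchmark(model, x, target, iters=100):
    """Return average forward/backward time (ms) and peak CUDA memory (GB)."""
    loss_fn = torch.nn.MSELoss()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    fwd = bwd = 0.0
    for _ in range(iters):
        t0 = time.perf_counter()
        out = model(x)
        torch.cuda.synchronize()
        t1 = time.perf_counter()
        loss_fn(out, target).backward()
        torch.cuda.synchronize()
        t2 = time.perf_counter()
        fwd += t1 - t0
        bwd += t2 - t1
        model.zero_grad(set_to_none=True)
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    return fwd / iters * 1e3, bwd / iters * 1e3, peak_gb

# usage (assumed model and shapes):
# model = FastKAN([784, 64, 1]).cuda()
# print(benchmark(model, torch.randn(64, 784, device="cuda"),
#                 torch.randn(64, 1, device="cuda")))
```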

Still, the big questions are how to adapt FastKAN to perform symbolic regression, how to test for continual learning, and what the relationship is between the RBF parameters and the B-spline grid parameters. Are you currently working on any of those three, @ZiyaoLi?


1ssb commented May 10, 2024

Thanks. Kindly take a look at KAL Net, which I have just released, if you get time.

@AthanasiosDelis

[image: benchmark comparison results]
@1ssb


1ssb commented May 10, 2024

Thanks a lot. I think it indicates that my model is, as expected, a bit bulkier because of the recursive stacking.

@AthanasiosDelis

I also put the original FastKAN into the comparison. For some reason, I cannot easily reduce the trainable parameters of efficient-kan using only grid_size and spline_order.

[image: updated benchmark table including the original FastKAN]


1ssb commented May 10, 2024

I think it's a bit incomplete not to have a measure of expressivity or task performance. Does FastKAN outperform overall?

@AthanasiosDelis

I think so. FastKAN-like implementations that use RBF approximations are the fastest. I am aware of three implementations so far, all of which are more or less extremely similar:

RBF-KAN
fast-kan-playground
fast-kan

Later tonight, I will also compare against RBF-KAN.

@LiZhenzhuBlog

wonderful


ZiyaoLi commented May 11, 2024

> I think so. FastKAN-like implementations that use RBF approximations are the fastest. I am aware of three implementations so far, all of which are more or less extremely similar:
>
> RBF-KAN, fast-kan-playground, fast-kan
>
> Later tonight, I will also compare against RBF-KAN.

This RBF-KAN is simply a copy of my FastKAN code without acknowledgement, even with the same variable names.


ZiyaoLi commented May 11, 2024


> Still, the big questions are how to adapt FastKAN to perform symbolic regression, how to test for continual learning, and what the relationship is between the RBF parameters and the B-spline grid parameters. Are you currently working on any of those three, @ZiyaoLi?

Not exactly. What FastKAN shows is that KANs are essentially RBF networks. If you check the history of RBF networks, you'll see they have been widely studied. Efficiency wouldn't be the most important problem; the question now is whether KANs are really that good.


1ssb commented May 11, 2024 via email


ZiyaoLi commented May 11, 2024

> I don't think that always holds: that KANs are RBFs. If you approximate splines with RBFs that may be true, but I do not agree with this blanket generalisation. This is why I think one should inquire deeper into other approximations; if those start exhibiting a variety of properties, that just proves that KANs are dominated by the approximating basis functions, which is intuitive. First, can you clearly outline why you think KANs are essentially RBFs? From the history of RBFs, we know they are neither very scalable nor as expressive as their affine counterparts.

@1ssb This would be an interesting discussion that is far beyond this issue lol.

The claim indeed isn't rigorous. My claim that KANs are RBFs should be narrowed to: "3rd-order B-spline KANs as implemented in pykan are very much the same as FastKAN, which is a univariate RBF network." This is because the 3rd-order B-spline basis can be numerically approximated by univariate Gaussian RBFs, as you've concluded.
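A small numerical check of that approximation claim (the amplitude and width below come from a brute-force least-squares search in this sketch, not from the fast-kan code):

```python
import numpy as np

def cubic_bspline(x):
    """Centered uniform cubic B-spline basis function, supported on [-2, 2]."""
    ax = np.abs(x)
    return np.where(
        ax < 1, 2 / 3 - ax**2 + ax**3 / 2,
        np.where(ax < 2, (2 - ax) ** 3 / 6, 0.0),
    )

x = np.linspace(-3, 3, 1001)
target = cubic_bspline(x)

# brute-force fit of a * exp(-(x / h)^2) to the cubic B-spline basis
err, a, h = min(
    (np.max(np.abs(a * np.exp(-(x / h) ** 2) - target)), a, h)
    for a in np.linspace(0.5, 0.8, 61)
    for h in np.linspace(0.7, 1.3, 61)
)
print(f"max |B3(x) - a*exp(-(x/h)^2)| ~ {err:.4f} with a ~ {a:.3f}, h ~ {h:.3f}")
# the maximum deviation is small (on the order of 1e-2), supporting the approximation
```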

ZiyaoLi changed the title from "A fast approximation of KAN wrt BSpline & grid scaling" to "FastKAN approximates KANs with RBFs achieving 3+ times acceleration" on May 11, 2024.

1ssb commented May 11, 2024 via email


ZiyaoLi commented May 11, 2024

Haha relax, it's just about bridging KANs with the theories that you mentioned. It's not going to be something as important as MLPs or say Transformers anyway :p


1ssb commented May 11, 2024

I was really hopeful 😭

@AthanasiosDelis

> I think so. FastKAN-like implementations that use RBF approximations are the fastest. I am aware of three implementations so far, all of which are more or less extremely similar:
> RBF-KAN, fast-kan-playground, fast-kan
> Later tonight, I will also compare against RBF-KAN.

> This RBF-KAN is simply a copy of my FastKAN code without acknowledgement, even with the same variable names.

Yeah, it was my sad realisation too.

@AthanasiosDelis

I have updates. I switched from the Gaussian RBF to the RSWAF approximation function:
[image: RSWAF basis function definition]
I also zeroed out the SiLU part. The results are even faster, and MNIST still yields 97.7% accuracy:
[images: updated benchmark and accuracy results]
Now that the mathematics has diverged significantly from FastKAN's implementation, I thought it proper to rename it FasterKAN, but I still keep the original references because it is based on @ZiyaoLi's code base.
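For context, a sketch of swapping the Gaussian basis for a tanh-based switch basis of the kind described above (the exact RSWAF parametrization in FasterKAN may differ; the 1 - tanh²((x - g)/h) form here is an assumption):

```python
import torch
import torch.nn as nn

class RSWAFBasis(nn.Module):
    """Sketch: replace exp(-((x - g)/h)^2) with 1 - tanh((x - g)/h)^2.
    Both are bell-shaped around each grid point g, but tanh is cheap and its
    derivative reuses the tanh value, which helps the backward pass."""

    def __init__(self, grid_min=-2.0, grid_max=2.0, num_grids=8):
        super().__init__()
        self.register_buffer("grid", torch.linspace(grid_min, grid_max, num_grids))
        self.h = (grid_max - grid_min) / (num_grids - 1)

    def forward(self, x):
        t = torch.tanh((x[..., None] - self.grid) / self.h)
        return 1.0 - t * t  # shape (..., in_dim, num_grids)
```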
