
FYI: convolutional layer with KAN #145

Open
StarostinV opened this issue May 9, 2024 · 17 comments

@StarostinV

https://github.com/StarostinV/convkan

A basic yet efficient (thanks to efficient-kan) implementation of the convolution operation with KAN, built on F.unfold.
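
For anyone skimming, here is a minimal sketch of the unfold-based idea (the class name PatchwiseConv2d is made up for this illustration, and nn.Linear stands in for the KAN layer so the snippet runs without extra dependencies; the convkan repo uses efficient-kan's KAN layer instead and additionally supports groups and padding modes):

```python
# Minimal sketch: slide a k x k window over the input with F.unfold, flatten each
# patch, and apply one "linear-like" layer to every patch. Replacing nn.Linear with
# a KAN layer of the same in/out feature sizes gives a KAN convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchwiseConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        self.kernel_size, self.stride, self.padding = kernel_size, stride, padding
        # Swap this for a KAN layer with the same (in_features, out_features) shape.
        self.layer = nn.Linear(in_channels * kernel_size * kernel_size, out_channels)

    def forward(self, x):
        n, _, h, w = x.shape
        # (N, C*k*k, L): one column per position of the sliding window
        patches = F.unfold(x, self.kernel_size, stride=self.stride, padding=self.padding)
        out = self.layer(patches.transpose(1, 2))                     # (N, L, C_out)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.transpose(1, 2).reshape(n, -1, h_out, w_out)


x = torch.randn(2, 3, 32, 32)
print(PatchwiseConv2d(3, 16, kernel_size=3, padding=1)(x).shape)      # torch.Size([2, 16, 32, 32])
```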

@paulestano

Are you really sure you coded all that on your own, my friend? 😉
[attached screenshot: IMG_3922]

@StarostinV

StarostinV commented May 10, 2024

I am pretty sure. Please share the link so that we can compare the implementations. Everybody would benefit from that!

EDIT: I found it, looks good! My implementation supports grouped convolutions and is tested, but otherwise it is very similar.

@paulestano

Be my guest mate
https://github.com/paulestano/LeKan

@StarostinV

> Be my guest mate https://github.com/paulestano/LeKan

I was not aware that one could use unfold as a module. However, your implementation lacks support for padding_mode and groups, and it has not been thoroughly tested. In contrast, my implementation serves as a direct replacement for Conv2d. Sharing the code for the benefit of everyone is more productive than making accusations of theft. Frankly, it's an obvious idea to implement convolution with KAN. The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package. Cheers!

@paulestano

> The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package

Except I had done it 4 days ago and shared it on the said issue (#9 (comment)) 2 days ago… I even described it in very similar words to those in your issue... How surprising is that?

@paulestano

On a more scientific note, I can't wait for you to share convincing results on CIFAR. Unless that is obvious as well 😉

@StarostinV

> > The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package
>
> Except I had done it 4 days ago and shared it on the said issue (#9 (comment)) 2 days ago… I even described it in very similar words to those in your issue... How surprising is that?

I meant this comment. As you can see, it was made two days ago, and they also didn't know about your code. Seriously, if you think your GitHub account and your comment are that visible, I don't know what to say. For instance, there are dozens of independent implementations of efficient-kan - are you going to accuse them of stealing ideas, too? I am trying to be polite, but this is just nonsense.

@paulestano

Concurrent work happens but the phrasing as well as the timeline are unfortunate here. Everyone will make their own mind…

@hesamsheikh

Could you guys please explain what you mean by implementing a "conv layer in KAN"? KAN is the equivalent of a dense layer, while a conv layer is an operation defined in mathematics. How can you implement an operation in KAN, and why would you do it?

It seems more plausible to replace the classification dense layers with KAN, but the feature extraction?

@minh-nguyenhoang

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer. That's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.
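
A minimal sketch of that equivalence in PyTorch (an illustration only, not code from any of the repos mentioned here): an ordinary conv2d produces the same result as unfolding the input into patches and multiplying by the flattened kernels.

```python
# Illustration: a standard convolution equals im2col (F.unfold) followed by
# a single matrix multiplication with the flattened kernels.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
weight = torch.randn(16, 3, 3, 3)                     # (C_out, C_in, k, k)

reference = F.conv2d(x, weight, padding=1)

patches = F.unfold(x, kernel_size=3, padding=1)       # (1, C_in*k*k, L)
as_matmul = weight.view(16, -1) @ patches             # (1, C_out, L)
as_matmul = as_matmul.view(1, 16, 8, 8)

print(torch.allclose(reference, as_matmul, atol=1e-5))  # True
```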

@hesamsheikh

> @hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer. That's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear entirely. KANs aren't local or spatial in nature, so what makes them suitable to be used in place of Conv?

@minh-nguyenhoang

> > @hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer. That's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.
>
> But there's a reason we're not substituting Conv layers with Linear entirely. KANs aren't local or spatial in nature, so what makes them suitable to be used in place of Conv?

@hesamsheikh I'm not saying they are the same; a Linear layer isn't local or spatial either. I'm just saying that we can implement the convolution operation using matrix multiplication (with some modification of the input, of course). That's what encourages us to think of a way to incorporate KAN into convolution.

KAN just rephrases how the next layer's features are computed: instead of taking a weighted sum of the input features and then applying an activation, we take a weighted sum of B-spline functions, which is a better way to interpret how a NN works. Then, instead of treating the whole input space as potential contributors (as a Linear layer does), we look only at a neighborhood of features, and we "judge" every neighborhood in the same way, so what we get should behave like a convolution layer.
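
As a heavily simplified, self-contained illustration of that structure (a toy stand-in only: it uses a fixed Gaussian basis instead of the B-splines that actual KAN implementations such as efficient-kan use, and the class name ToyKANLayer is made up for this sketch):

```python
# Toy illustration of the KAN idea: each edge applies a learnable univariate
# function phi_ij(x_j) and the results are summed over the inputs j.
# Real KAN layers use B-spline bases; a fixed Gaussian grid stands in here.
import torch
import torch.nn as nn


class ToyKANLayer(nn.Module):
    def __init__(self, in_features, out_features, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers", torch.linspace(*grid_range, num_basis))  # (B,)
        # One set of basis coefficients per (output, input) pair, i.e. per phi_ij.
        self.coeff = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_basis))

    def forward(self, x):                                   # x: (N, in_features)
        # Evaluate every basis function at every input coordinate: (N, in, B)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2))
        # Sum phi_ij(x_j) over inputs j for each output i: (N, out)
        return torch.einsum("nib,oib->no", basis, self.coeff)


layer = ToyKANLayer(in_features=27, out_features=16)   # e.g. a flattened 3x3x3 patch -> 16 channels
print(layer(torch.randn(4, 27)).shape)                  # torch.Size([4, 16])
```

Applying the same such layer to every unfolded patch (as in the unfold-based sketch earlier in this thread) is what keeps the operation local and weight-shared, i.e. convolution-like.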

@StarostinV

> > @hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer. That's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.
>
> But there's a reason we're not substituting Conv layers with Linear entirely. KANs aren't local or spatial in nature, so what makes them suitable to be used in place of Conv?

There might be some confusion here: convolution is of course not limited to its typical implementation as an affine transformation followed by an activation function. One can choose any trainable function as the kernel, and KANs are no exception. When a KAN is used as the convolutional kernel, the operation is of course still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we are doing something strange just because convolution can be written as a matrix multiplication; we simply use a KAN as the convolutional kernel.

@hesamsheikh

> There might be some confusion here: convolution is of course not limited to its typical implementation as an affine transformation followed by an activation function. One can choose any trainable function as the kernel, and KANs are no exception. When a KAN is used as the convolutional kernel, the operation is of course still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we are doing something strange just because convolution can be written as a matrix multiplication; we simply use a KAN as the convolutional kernel.

So, to make sure I'm getting this right: you're using KAN as the kernel function of the convolution?

@minh-nguyenhoang

> So, to make sure I'm getting this right: you're using KAN as the kernel function of the convolution?

Yep, that's the whole idea.

@XiangboGaoBarry

Hi, here I implemented ConvKAN with different activation formulations and their corresponding inference times: https://github.com/XiangboGaoBarry/ConvKAN-Zoo
We evaluate the results on the CIFAR-10 dataset.

@StarostinV

> Hi, here I implemented ConvKAN with different activation formulations and their corresponding inference times: https://github.com/XiangboGaoBarry/ConvKAN-Zoo We evaluate the results on the CIFAR-10 dataset.

You can also make a pull request to add your repo to this collection of KANs https://github.com/mintisan/awesome-kan
