
Pure-Julia convolutions #9

Closed
wants to merge 2 commits into from

Conversation

MikeInnes
Copy link
Member

Accepts an input of shape `W... × C × N` (any number of spatial dimensions `W...`, then channels, then batch) and a kernel of shape `W... × Cin × Cout` (similar to Knet, although we don't flip the channel dimension of the kernel).

This is fairly naive – it's very fast for basic convolutions but struggles a bit more when you have multiple channels.
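To make the shape convention concrete, here's a rough sketch of what a naive direct convolution over one spatial dimension pair looks like (a hypothetical illustration under the `W... × C × N`-style layout above, not the code in this PR):

```julia
# Naive "valid" cross-correlation for a single image:
# x is W × H × Cin, w is KW × KH × Cin × Cout.
function naive_conv(x::AbstractArray{T,3}, w::AbstractArray{T,4}) where T
    KW, KH, Cin, Cout = size(w)
    OW, OH = size(x, 1) - KW + 1, size(x, 2) - KH + 1
    y = zeros(T, OW, OH, Cout)
    for co in 1:Cout, j in 1:OH, i in 1:OW
        s = zero(T)
        # Sum over the kernel window and all input channels.
        for ci in 1:Cin, kj in 1:KH, ki in 1:KW
            s += x[i+ki-1, j+kj-1, ci] * w[ki, kj, ci, co]
        end
        y[i, j, co] = s
    end
    return y
end
```

The inner loop over `Cin` is where the multi-channel case starts to hurt relative to a single-channel convolution.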

@iblislin
Copy link
Contributor

need to bump REQUIRE?

@MikeInnes
Copy link
Member Author

Good catch.

FWIW there's a bunch of stuff up for grabs here, e.g. implementing gradients and pooling. I'll happily take extremely slow / naive implementations to get things going.

@iblislin
Copy link
Contributor

Any example usage for this API?

@dfdx
Copy link
Contributor

dfdx commented Nov 21, 2017

A couple of notes to keep in mind:

  1. At least padding and strides should be supported.
  2. Backpropagation also requires the gradient of the convolution. The basic gradient is itself a convolution, but I'm not sure whether the same parameters (padding and strides) can be passed the same way.
  3. I haven't yet seen a deep learning practitioner doing many convolutions on CPU, so it's worth aligning the API with its GPU counterparts (most notably cuDNN). In particular, cuDNN always assumes 4D data (width, height, channels and batch size).

@MikeInnes
Copy link
Member Author

MikeInnes commented Nov 21, 2017

  1. Yes, although in Julia we can implement e.g. a pad function that doesn't copy, so it doesn't have to be special-cased in the core convolution algorithm; you can just write conv(pad(x, 2), w). I'm hoping to prototype this soon.
  2. If you do the above you will also get gradients for free (though you might still want fused versions of common cases).
  3. We do need to be able to take advantage of cuDNN's optimisations, but I don't think that will pose an issue (e.g. we can easily reshape to 4D where necessary in wrappers). cuDNN is also not known for being an exceptionally clean API, so it's not necessarily where we want to take inspiration from.
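A non-copying `pad` could look something like the following sketch (hypothetical names and a simplified version that pads every dimension; a real implementation would presumably pad only the spatial ones):

```julia
# A lazy zero-padded view: indexing outside the wrapped array returns zero,
# without copying or allocating a larger array.
struct Padded{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    data::A
    pad::NTuple{N,Int}
end

pad(x::AbstractArray{T,N}, p::Int) where {T,N} = Padded(x, ntuple(_ -> p, N))

Base.size(P::Padded) = size(P.data) .+ 2 .* P.pad

function Base.getindex(P::Padded{T,N}, I::Vararg{Int,N}) where {T,N}
    J = I .- P.pad
    all(1 .<= J .<= size(P.data)) ? P.data[J...] : zero(T)
end
```

With something like this, `conv(pad(x, 2), w)` falls out of generic indexing, and the core convolution never needs a padding special case.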

@FluxML FluxML deleted a comment from codecov-io Nov 21, 2017
@MikeInnes
Copy link
Member Author

@iblis17 the API is pretty simple, we just define a weight like

w = randn(2, 2, 3, 5)

which is a 2x2 convolution from 3 channels to 5. Then we can call it with

im = rand(100, 100, 3)
conv(im, w)

A nice property of this is that you can drop trailing dimensions of the image (particularly the batch dimension, which is implicitly 1 in this case). It's also completely generic across number of dimensions, which seems a lot nicer than having several convNd functions up to some arbitrary N. (Most current systems do this, but only because being generic is impractical in other languages). Would be interested to hear of potential downsides though.
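Concretely, the implicit batch dimension is just a reshape away from the explicit 4-D form:

```julia
# A single 100×100 RGB image with the batch dimension dropped,
# and the same data viewed as an explicit batch of one.
im  = rand(100, 100, 3)
imb = reshape(im, 100, 100, 3, 1)
```

Since the batch dimension is trailing, the two layouts share identical memory, so supporting both costs nothing.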

@dfdx
Copy link
Contributor

dfdx commented Nov 22, 2017

A nice property of this is that you can drop trailing dimensions of the image (particularly the batch dimension, which is implicitly 1 in this case). It's also completely generic across number of dimensions, which seems a lot nicer than having several convNd functions up to some arbitrary N

Not quite: this way you wouldn't be able to distinguish between a single 2D image and a batch of 1D inputs, or a single 3D image and a batch of 2D ones. Maybe we can dispatch properly using the second argument, but it still doesn't sound like a clear API to me.

you can just write conv(pad(x, 2), w)

I believe we need to preserve the way people use convolutions in other languages/libraries, which is to use keyword arguments. Also, it's again unclear how to map this to the cuDNN case.

Anyway, I'm more worried about strides. I don't think you can implement them using any kind of view, but even if you could, it would break array memory contiguity and, I suppose, invalidate or slow down some algorithms.

All this stuff requires quite a lot of investigation, I should say.

If you do the above you will also get gradients for free (though you might still want fused versions of common cases).

Once again, strides are harder to handle than padding. Also keep in mind pooling, which requires strides ~99% of the time; its gradients need to be thought out separately.
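To illustrate why strides don't reduce to a view, here's a minimal sketch of strided 2-D max-pooling (hypothetical helper, not part of this PR), where each output element reads an overlapping-or-skipping window rather than a contiguous slice:

```julia
# 2-D max-pooling with window k and stride s (defaults to non-overlapping).
function maxpool(x::AbstractMatrix, k::Int, s::Int = k)
    OW = fld(size(x, 1) - k, s) + 1
    OH = fld(size(x, 2) - k, s) + 1
    [maximum(@view x[(i-1)*s+1:(i-1)*s+k, (j-1)*s+1:(j-1)*s+k])
     for i in 1:OW, j in 1:OH]
end
```

The gradient here is a scatter back to the argmax positions, which is a different shape of computation from the convolution gradient.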

@ChrisRackauckas
Copy link
Member

w = randn(2, 2, 3, 5)

How come the weight is like that instead of rand(2,2,3)? It seems to me like you're talking about a 2x2 weight stencil for 3 channels, so I don't understand what the 5 is for.

@ChrisRackauckas
Copy link
Member

Does this accept static array stencils?

@iblislin
Copy link
Contributor

How come the weight is like that instead of rand(2,2,3)? It seems to me like you're talking about a 2x2 weight stencil for 3 channels, so I don't understand what the 5 is for.

5 indicates there are 5 filters (each filter is 2x2 with 3 channels).

@ChrisRackauckas
Copy link
Member

ChrisRackauckas commented Feb 15, 2018

But if the channels are independent (the stencil doesn't apply between channels, just on each channel), is this the same operation as if it were reshaped to rand(2,2,15)?

@iblislin
Copy link
Contributor

iblislin commented Feb 15, 2018

But if the channels are independent, is this the same operation as if it were reshaped to rand(2,2,15)?

well, I think it depends on your data. There is a slight difference.

Consider the 2x2x3x5 case: the image input is in R,G,B order, so each filter learns the relationship between R,G,B. That means at test time we can't change the R,G,B input order.

For the "channels are independent" case, please check out depthwise convolution.
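For reference, a depthwise convolution applies one filter per channel with no cross-channel mixing; a minimal hypothetical sketch ("valid" padding, stride 1):

```julia
# Depthwise 2-D convolution: x is W × H × C, w is K1 × K2 × C,
# and channel c of the output depends only on channel c of the input.
function depthwise_conv(x::AbstractArray{T,3}, w::AbstractArray{T,3}) where T
    K1, K2, C = size(w)
    @assert size(x, 3) == C
    OW, OH = size(x, 1) - K1 + 1, size(x, 2) - K2 + 1
    y = zeros(T, OW, OH, C)
    for c in 1:C, j in 1:OH, i in 1:OW
        y[i, j, c] = sum(x[i:i+K1-1, j:j+K2-1, c] .* w[:, :, c])
    end
    return y
end
```

Unlike the full 2x2x3x5 kernel, nothing here couples the R, G and B channels, which is exactly the difference being discussed.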

@staticfloat
Copy link
Contributor

Superseded by an absolutely mind-bogglingly less efficient amount of code (in terms of SLOC) in #94.

ToucheSir pushed a commit that referenced this pull request Feb 13, 2023
Add group support for convolutions