Define a working default Flux chain builder for ImageClassifier #162

Closed
ablaom opened this issue Jun 15, 2021 · 6 comments · Fixed by #208
Labels
bug Something isn't working

Comments

@ablaom
Collaborator

ablaom commented Jun 15, 2021

The current default Short() is not suitable for images and throws an error. Any suggestions for a sensible default builder for images? (Recall that a builder is a recipe for defining a Flux chain once the data has been inspected.)

The relatively simple builder defined in the mnist example works for any Image scitype. It consists of three convolution/max-pool pairs, with the filter size and the number of channels after each convolution being user-specifiable. Perhaps a modification/generalization of this would serve?
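
For concreteness, here is a minimal sketch of such a builder using plain Flux; the keyword names `filter_size` and `channels` are illustrative, not the mnist example's actual API:

```julia
using Flux

# Three convolution/max-pool pairs, in the style of the mnist example.
function build_chain(imsize, n_channels, n_out; filter_size=3, channels=(16, 32, 32))
    k, p = filter_size, div(filter_size - 1, 2)   # pad to preserve spatial size
    front = Chain(
        Conv((k, k), n_channels => channels[1], relu, pad=p), MaxPool((2, 2)),
        Conv((k, k), channels[1] => channels[2], relu, pad=p), MaxPool((2, 2)),
        Conv((k, k), channels[2] => channels[3], relu, pad=p), MaxPool((2, 2)),
    )
    # infer the flattened length without running a forward pass:
    d = prod(Flux.outputsize(front, (imsize..., n_channels, 1)))
    return Chain(front, Flux.flatten, Dense(d, n_out))
end

chain = build_chain((28, 28), 1, 10)   # e.g. MNIST-sized greyscale input, 10 classes
```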

Choosing the right default can make or break adoption, so it would be good to get some expert advice.

@ToucheSir @ayush-1506 @CarloLucibello @darsnack @lorenzoh

@ablaom added the bug label Jun 15, 2021
@lorenzoh
Member

Would giving the full input and output sizes work, i.e. (256, 256, 3) and (10,) for a 10-class classification problem? This is how more complicated architectures like the U-Net are dynamically constructed in FastAI.jl, though that makes liberal use of Flux.outputsize to get the output sizes of arbitrary model building blocks.
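
As a sketch of that idea (not FastAI.jl's actual code), Flux.outputsize lets you size a head against an arbitrary backbone without pushing data through it:

```julia
using Flux

insize = (256, 256, 3)   # full input size: (width, height, channels)
nclasses = 10

backbone = Chain(Conv((3, 3), 3 => 16, relu), MaxPool((2, 2)))   # any blocks here
h, w, c, _ = Flux.outputsize(backbone, (insize..., 1))           # shape after backbone
model = Chain(backbone, Flux.flatten, Dense(h * w * c, nclasses))
```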

@darsnack
Member

In terms of generalizing the MNIST builder, using Flux.outputsize like Lorenz suggested will be helpful. You can check out vgg.jl and resnet.jl from this PR to see how we wrote "skeletons" for VGG/ResNet that can adapt to whatever data the user wants to pass in. Of course, in that code the user specifies the input sizes, but maybe in MLJ some of that info can be inferred?

In terms of what model should be the default, I think it depends on what you want out of the model. If the user wants a model that roughly approximates SOTA, then a small ResNet default makes sense. If you want a model that is straightforward to understand/debug but non-trivial, then a VGG default makes sense. I wouldn't go for other variants as a default, because they are either more niche or require more compute. In my mind, the VGG default is the generalized version of your MNIST builder.
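
For reference, the VGG pattern, repeated same-size convolution blocks of increasing width, each followed by max pooling, generalizes the MNIST builder roughly like this (an illustrative sketch, not the vgg.jl code):

```julia
using Flux

# One VGG-style block: `depth` 3×3 convolutions, then a 2×2 max pool.
vgg_block(ch_in, ch_out, depth) = Chain(
    [Conv((3, 3), (i == 1 ? ch_in : ch_out) => ch_out, relu, pad=1) for i in 1:depth]...,
    MaxPool((2, 2)),
)

# Stacking blocks with doubling channel counts gives the VGG "skeleton":
skeleton = Chain(vgg_block(3, 64, 2), vgg_block(64, 128, 2), vgg_block(128, 256, 3))
```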

@ablaom
Collaborator Author

ablaom commented Jun 16, 2021

Thanks both for the awesome suggestions.

It looks like vgg.jl provides what we need here as is. That is, the vgg constructor is essentially a "builder" in our sense.

I might be wrong, but it looks like the resnet.jl builder has a fixed number of output classes (1000) and I don't see how to control the image size. Is this configurable?

It would be convenient to just call one of these Metalhead builders. @darsnack How far away is a merge of that PR, if you can say? Also, is it possible to suggest a good default for the vgg argument config (one of those given here, I guess)?

I have not taken a closer look at U-Net. @lorenzoh Could you say a wee bit about what a "backbone" is? In particular, is this something that depends on the image size you want to classify? Or is this some kind of transfer learning?

By the way, I didn't know about outputsize - that's very handy.

@lorenzoh
Member

As you correctly noted, the output size of a classification model has to change depending on the number of classes. A backbone is itself a model that can be adapted for different learning tasks. For example, in computer vision we often use ResNets as backbones but with the dense and classification layers hacked off, meaning only the convolutional and pooling layers remain. This splits the model into a task-agnostic convolutional feature extractor (the "backbone") and a task-specific classification "head".

**Backbones can be used for different tasks.** Now we can use the same backbone to create a classification model with a different number of classes (by tacking on a different head), but also to create a semantic segmentation model by tacking on a segmentation head (a sequence of upsampling layers).

**Backbones for transfer learning.** Another advantage is that we can reuse most of the weights of a (pre)trained classification model for different tasks (transfer learning).
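
A toy illustration of the backbone/head split (made-up layers, not Metalhead's):

```julia
using Flux

# Task-agnostic convolutional feature extractor (the "backbone"):
backbone = Chain(Conv((3, 3), 3 => 16, relu, pad=1), MaxPool((2, 2)),
                 Conv((3, 3), 16 => 32, relu, pad=1), MaxPool((2, 2)))

# Task-specific classification "head" for, say, a 5-class problem:
head = Chain(GlobalMeanPool(), Flux.flatten, Dense(32, 5))

classifier = Chain(backbone, head)
# The same backbone could instead feed a segmentation head of upsampling layers.
```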

U-Net is a specific model architecture for pixel-level prediction tasks like semantic segmentation or image-to-image tasks that produces outputs with the same height and width as its input (see graphic below). It is constructed from a backbone by adding upsampling layers and skip connections for every downsampling layer in the backbone. FastAI.UNetDynamic implements that for arbitrary convolutional backbones.
[figure: U-Net architecture diagram]

**Image sizes.** Also, since ResNets can take inputs of different resolutions, they can be used with different image sizes (as long as the sides are divisible by some factor of 2 corresponding to the number of downsampling/pooling steps). Since classification models usually apply a global average pooling layer before the dense layers, the output size is always the same, so the input size does not need to be configured.
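
That size-independence is easy to check on a toy model:

```julia
using Flux

# Global average pooling collapses any spatial extent to 1×1, so the dense
# layer's input width depends only on the channel count, not the resolution:
m = Chain(Conv((3, 3), 3 => 8, relu), GlobalMeanPool(), Flux.flatten, Dense(8, 10))

Flux.outputsize(m, (64, 64, 3, 1))    # (10, 1)
Flux.outputsize(m, (256, 256, 3, 1))  # (10, 1), same despite the larger input
```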

outputsize also works on arbitrary models without needing to run a forward pass, so there is not really a cost associated with using it.

@darsnack
Member

> I might be wrong, but it looks like the resnet.jl builder has a fixed number of output classes (1000) and I don't see how to control the image size. Is this configurable?

That seems like an oversight. Like @lorenzoh mentioned, the pooling operation before the fully-connected layers means their input size is fixed independently of the image size. Generally, you can use different ResNet configurations without worrying too much about the input image size. The output number of classes is hardcoded, but it doesn't need to be! I'll make that configurable.

> How far away is a merge of that PR

I am trying to run a validation evaluation on the pre-trained models so I can add the numbers to the PR. It's taken a bit longer because I had CUDA.jl bugs on my normal setup with ImageNet, and moving the dataset to a different machine has been a PITA. But the copy just finished overnight, so hopefully that will be done soon. As the PR author, I think it is ready to merge. It's been reviewed many times, and it should be a simple merge at this point 🤞🏾. I would like to see it merged by the end of the week.

> Also, is it possible to suggest a good default for the vgg argument config

Each config corresponds to a variant introduced in the paper. I would normally go for VGG16 (:D). Each variant is just a larger/deeper version of the previous one, where the number (11 in VGG11) corresponds to the depth. I don't think any option is a bad choice; it just depends on how big a network you want.

@ablaom
Collaborator Author

ablaom commented Sep 28, 2021

Waiting for:

  • A post 0.3.5 release of Metalhead.jl; see here.

ablaom added a commit that referenced this issue Jun 28, 2022:

  • first attempt Metalhead integration (with hack); tests lacking
  • minor
  • add docstring comment
  • rm invalidated test
  • mv metalhead stuff out to separate src file
  • add show methods for Metalhead wraps
  • add forgotten files with tests
  • fix test
  • rename metal -> image_builder