Define a working default Flux chain builder for ImageClassifier #162

Closed
ablaom opened this issue Jun 15, 2021 · 6 comments · Fixed by #208
Labels
bug Something isn't working

Comments

@ablaom
Collaborator

ablaom commented Jun 15, 2021

The current default Short() is not suitable for images and throws an error. Any suggestions for a sensible default builder for images? (Recall that a builder is a recipe for defining a Flux chain once the data has been inspected.)

The relatively simple builder defined in the mnist example works for any Image scitype. It consists of three convolution/max-pool pairs, with the filter size and the number of channels after each convolution being user-specifiable. Perhaps a modification/generalization of this would serve?
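
For concreteness, here is a minimal sketch of such a builder using plain Flux; the keyword names `filter_size` and `channels` are illustrative, not the mnist example's actual API:

```julia
using Flux

# Three convolution/max-pool pairs, in the style of the mnist example.
function build_chain(imsize, n_channels, n_out; filter_size=3, channels=(16, 32, 32))
    k, p = filter_size, div(filter_size - 1, 2)   # pad to preserve spatial size
    front = Chain(
        Conv((k, k), n_channels => channels[1], relu, pad=p), MaxPool((2, 2)),
        Conv((k, k), channels[1] => channels[2], relu, pad=p), MaxPool((2, 2)),
        Conv((k, k), channels[2] => channels[3], relu, pad=p), MaxPool((2, 2)),
    )
    # infer the flattened length without running a forward pass:
    d = prod(Flux.outputsize(front, (imsize..., n_channels, 1)))
    return Chain(front, Flux.flatten, Dense(d, n_out))
end

chain = build_chain((28, 28), 1, 10)   # e.g. MNIST-sized greyscale input, 10 classes
```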

Choosing the right default can make or break adoption, so it would be good to get some expert advice.

@ToucheSir @ayush-1506 @CarloLucibello @darsnack @lorenzoh

@ablaom added the bug label Jun 15, 2021
@lorenzoh
Member

Would giving the full input and output sizes work, i.e. (256, 256, 3) and (10,) for a 10-class classification problem? This is how more complicated architectures like the U-Net are dynamically constructed in FastAI.jl, though that makes liberal use of Flux.outputsize to get the output sizes of arbitrary model building blocks.
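
As a sketch of that idea (not FastAI.jl's actual code), Flux.outputsize lets you size a head against an arbitrary backbone without pushing data through it:

```julia
using Flux

insize = (256, 256, 3)   # full input size: (width, height, channels)
nclasses = 10

backbone = Chain(Conv((3, 3), 3 => 16, relu), MaxPool((2, 2)))   # any blocks here
h, w, c, _ = Flux.outputsize(backbone, (insize..., 1))           # shape after backbone
model = Chain(backbone, Flux.flatten, Dense(h * w * c, nclasses))
```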

@darsnack
Member

In terms of generalizing the MNIST builder, using Flux.outputsize like Lorenz suggested will be helpful. You can check out vgg.jl and resnet.jl from this PR to see how we wrote "skeletons" for VGG/ResNet that can adapt to whatever data the user wants to pass in. Of course, in that code the user specifies the input sizes, but maybe in MLJ some of that info can be inferred?

In terms of what model should be the default, I think it depends on what you want out of the model. If the user wants a model that roughly approximates SOTA, then a small ResNet default makes sense. If you want a model that is straightforward to understand/debug but non-trivial, then a VGG default makes sense. I wouldn't go for other variants as a default, because they are either more niche or require more compute. In my mind, the VGG default is the generalized version of your MNIST builder.
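
For reference, the VGG pattern, repeated same-size convolution blocks of increasing width, each followed by max pooling, generalizes the MNIST builder roughly like this (an illustrative sketch, not the vgg.jl code):

```julia
using Flux

# One VGG-style block: `depth` 3×3 convolutions, then a 2×2 max pool.
vgg_block(ch_in, ch_out, depth) = Chain(
    [Conv((3, 3), (i == 1 ? ch_in : ch_out) => ch_out, relu, pad=1) for i in 1:depth]...,
    MaxPool((2, 2)),
)

# Stacking blocks with doubling channel counts gives the VGG "skeleton":
skeleton = Chain(vgg_block(3, 64, 2), vgg_block(64, 128, 2), vgg_block(128, 256, 3))
```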

@ablaom
Collaborator Author

ablaom commented Jun 16, 2021

Thanks both for the awesome suggestions.

It looks like vgg.jl provides what we need here as is. That is, the vgg constructor is essentially a "builder" in our sense.

I might be wrong, but it looks like the resnet.jl builder has a fixed number of output classes (1000) and I don't see how to control the image size. Is this configurable?

It would be convenient to just call one of these Metalhead builders. @darsnack How far away is a merge of that PR, if you can say? Also, is it possible to suggest a good default for the vgg argument config (one of those given here, I guess)?

I have not taken a closer look at U-Net. @lorenzoh Could you say a wee bit about what a "backbone" is? In particular, is this something that depends on the image size you want to classify? Or is this some kind of transfer learning?

By the way, I didn't know about outputsize - that's very handy.

@lorenzoh
Member

As you correctly noted, the output size of a classification model has to change depending on the number of classes. A backbone is itself a model that can be adapted for different learning tasks. For example, in computer vision we often use ResNets as backbones but with the dense and classification layers hacked off, meaning only the convolutional and pooling layers remain. This splits the model into a task-agnostic convolutional feature extractor (the "backbone") and a task-specific classification "head".

**Backbones can be used for different tasks.** Now we can use the same backbone to create a classification model with a different number of classes (by tacking on a different head), but also to create a semantic segmentation model by tacking on a segmentation head (a sequence of upsampling layers).

**Backbones for transfer learning.** Another advantage is that we can reuse most of the weights of a (pre)trained classification model for different tasks (transfer learning).
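
A toy illustration of the backbone/head split (made-up layers, not Metalhead's):

```julia
using Flux

# Task-agnostic convolutional feature extractor (the "backbone"):
backbone = Chain(Conv((3, 3), 3 => 16, relu, pad=1), MaxPool((2, 2)),
                 Conv((3, 3), 16 => 32, relu, pad=1), MaxPool((2, 2)))

# Task-specific classification "head" for, say, a 5-class problem:
head = Chain(GlobalMeanPool(), Flux.flatten, Dense(32, 5))

classifier = Chain(backbone, head)
# The same backbone could instead feed a segmentation head of upsampling layers.
```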

U-Net is a specific model architecture for pixel-level prediction tasks like semantic segmentation or image-to-image tasks that produces outputs with the same height and width as its input (see graphic below). It is constructed from a backbone by adding upsampling layers and skip connections for every downsampling layer in the backbone. FastAI.UNetDynamic implements that for arbitrary convolutional backbones.
[figure: U-Net architecture diagram]

**Image sizes.** Also, since ResNets can take inputs of different resolutions, they can be used with different image sizes (as long as the sides are divisible by some factor of 2 corresponding to the number of downsampling/pooling steps). Since classification models usually apply a global average pooling layer before the dense layers, the output size is always the same, so the input size does not need to be configured.
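
That size-independence is easy to check on a toy model:

```julia
using Flux

# Global average pooling collapses any spatial extent to 1×1, so the dense
# layer's input width depends only on the channel count, not the resolution:
m = Chain(Conv((3, 3), 3 => 8, relu), GlobalMeanPool(), Flux.flatten, Dense(8, 10))

Flux.outputsize(m, (64, 64, 3, 1))    # (10, 1)
Flux.outputsize(m, (256, 256, 3, 1))  # (10, 1), same despite the larger input
```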

outputsize also works on arbitrary models without needing to run a forward pass, so there is not really a cost associated with using it.

@darsnack
Member

> I might be wrong, but it looks like the resnet.jl builder has a fixed number of output classes (1000) and I don't see how to control the image size. Is this configurable?

That seems like an oversight. Like @lorenzoh mentioned, the pooling operation before the fully-connected layers means their input size is fixed independently of the image size. Generally, you can use different ResNet configurations without worrying too much about the input image size. The output number of classes is hardcoded, but it doesn't need to be! I'll make that configurable.

> How far away is a merge of that PR

I am trying to run a validation evaluation on the pre-trained models so I can add the numbers to the PR. It's taken a bit longer because I had CUDA.jl bugs on my normal setup with ImageNet, and moving the dataset to a different machine has been a PITA. But the copy just finished overnight, so hopefully that will be done soon. As the PR author, I think it is ready to merge. It's been reviewed many times, and it should be a simple merge at this point 🤞🏾. I would like to see it merged by the end of the week.

> Also, is it possible to suggest a good default for the vgg argument config

Each config corresponds to a variant introduced in the paper. I would normally go for VGG16 (:D). Each variant is just a larger/deeper version of the previous one, where the number (11 in VGG11) corresponds to the depth. I don't think any option is a bad choice; it just depends on how big a network you want.

@ablaom
Collaborator Author

ablaom commented Sep 28, 2021

Waiting for:

  • A post 0.3.5 release of Metalhead.jl; see here.

ablaom added a commit that referenced this issue Jun 28, 2022:

  • first attempt Metalhead integration (with hack); tests lacking
  • minor
  • add docstring comment
  • rm invalidated test
  • mv metalhead stuff out to separate src file
  • add show methods for Metalhead wraps
  • add forgotten files with tests
  • fix test
  • rename metal -> image_builder