This repository has been archived by the owner on Jul 2, 2021. It is now read-only.

Add ResNet #321

Merged
merged 47 commits into from
Mar 6, 2018

Conversation

yuyu2172
Member

@yuyu2172 yuyu2172 commented Jul 6, 2017

Merge after #265 (DONE)

EDIT:
merge after #427 (DONE)

@yuyu2172
Member Author

yuyu2172 commented Sep 22, 2017

I did a quick survey of the various architectures that are called ResNet.
There seem to be three.

  1. Original architecture (https://arxiv.org/pdf/1512.03385.pdf)
  2. Facebook's ResNet.
     • It differs from the original architecture in the "strided convolution" (link).
     • TensorFlow calls this ResNet v1. They seem to have no implementation of the original architecture in the official repository (?) link
  3. ResNet v2 (also called Pre-ResNet in Torch)

Names

I think naming the ResNet class as ResNet*** (e.g. ResNet50) is good.

  • We do not need to support ResNet v2 because it seems to be less popular than the other variants.
  • We can support both the original architecture and FB ResNet in one class by switching between the two using a variable. The logic is relatively simple.
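
The switch mentioned in the last bullet could look roughly like this. It is a minimal sketch, not the code from this PR: the function name `bottleneck_strides` and the option values `'he'` and `'fb'` are assumptions made for illustration. It relies on the usual description of the difference: the original model applies the stride in the first 1x1 convolution of a bottleneck, while Facebook's variant applies it in the 3x3 convolution.

```python
def bottleneck_strides(stride, arch='fb'):
    """Return the strides of the (1x1, 3x3, 1x1) convolutions in a bottleneck.

    `arch` is a hypothetical switch: 'he' selects the original
    architecture and 'fb' selects Facebook's variant.
    """
    if arch == 'he':
        # Original architecture: downsample in the first 1x1 convolution.
        return (stride, 1, 1)
    if arch == 'fb':
        # Facebook's variant: downsample in the 3x3 convolution.
        return (1, stride, 1)
    raise ValueError("arch must be 'he' or 'fb', got {!r}".format(arch))


print(bottleneck_strides(2, arch='he'))  # (2, 1, 1)
print(bottleneck_strides(2, arch='fb'))  # (1, 2, 1)
```

Everything else in the block (channel sizes, the residual connection) stays the same, which is why a single class with one switch is enough.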

@yuyu2172 yuyu2172 changed the title [WIP] Add ResNet Add ResNet Sep 22, 2017
@yuyu2172 yuyu2172 added this to the v0.8 milestone Sep 22, 2017
@Hakuyume
Member

The third model is called Pre-ResNet in torch (https://github.com/facebook/fb.resnet.torch/blob/master/models/preresnet.lua).

@yuyu2172
Member Author

Thanks. I added that information to the summary.

@yuyu2172 yuyu2172 mentioned this pull request Sep 29, 2017
6 tasks
@yuyu2172
Member Author

yuyu2172 commented Oct 8, 2017

@Hakuyume

Can you briefly take a look at resnet.py?
https://github.com/yuyu2172/chainercv/blob/9288eb869a64f6f17aaf8dbf90f4206faf6cfc42/chainercv/links/model/resnet/resnet.py

Note that the uploaded pretrained weights will not work with the current organization of the weights.

@yuyu2172 yuyu2172 modified the milestones: v0.8, v0.9 Dec 19, 2017
@yuyu2172
Member Author

yuyu2172 commented Feb 9, 2018

mode sounds like switching between inference and training.
arch is better.

@yuyu2172
Member Author

@Hakuyume
Could you check this?

@Hakuyume
Member

Hakuyume commented Feb 27, 2018

@yuyu2172 OK, I'll check. Please fix the coding style first.

@yuyu2172
Member Author

Oh. Sorry about that.

Member

@Hakuyume Hakuyume left a comment

First comments

                                   nobias=True)
        self.conv3 = Conv2DBNActiv(mid_channels, out_channels, 1, 1, 0,
                                   initialW=initialW, nobias=True,
                                   activ=lambda x: x)
Member

Calling Conv2DBNActiv without an activation looks tricky. How about adding Conv2DBN? (It is OK to make it a private class.)

Member Author

@yuyu2172 yuyu2172 Feb 28, 2018

Hmm...
I think the current version is fine, but alternatively I can explicitly use conv and bn.

I do not like the idea of adding a private class. This usage comes up a lot.
I am less hesitant about the idea of adding Conv2DBN, but personally it looks redundant given that this is just a Conv2DBNActiv with no activation...

Member

If you don't like adding a private class, using Conv2D + BatchNormalization is better.
Another solution is adding an activ='no' option to Conv2DBNActiv.

Member Author

OK. I like the second idea.
How about setting the name of the special string to identity?

Member

Since the default value of activ is chainer.functions.relu (https://github.com/chainer/chainercv/blob/master/chainercv/links/connection/conv_2d_bn_activ.py#L65), we can use activ=None for no activation.

Member Author

OK

Member Author

updated
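
The convention agreed on in this thread (passing `None` to skip the activation) can be sketched as follows. The class `ConvBNActivSketch` and the scalar stand-ins for the convolution, normalization and ReLU are hypothetical; this is not ChainerCV's Conv2DBNActiv, only an illustration of how a `None` default check replaces the `lambda x: x` workaround.

```python
def relu(x):
    # Scalar stand-in for chainer.functions.relu.
    return max(x, 0.0)


def toy_conv(x):
    # Toy "convolution": scale the input.
    return 2.0 * x


def toy_bn(x):
    # Toy "normalization": shift the input.
    return x - 1.0


class ConvBNActivSketch(object):
    """Apply conv, then bn, then an optional activation."""

    def __init__(self, conv, bn, activ=relu):
        self.conv = conv
        self.bn = bn
        self.activ = activ

    def __call__(self, x):
        h = self.bn(self.conv(x))
        if self.activ is None:
            # activ=None means "no activation": return the normalized value.
            return h
        return self.activ(h)


with_relu = ConvBNActivSketch(toy_conv, toy_bn)
no_activ = ConvBNActivSketch(toy_conv, toy_bn, activ=None)
print(with_relu(-3.0))  # 0.0
print(no_activ(-3.0))   # -7.0
```

Because ReLU stays the default, existing callers are unaffected, and `None` reads more naturally than a special string such as 'no' or 'identity'.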

stride (int or tuple of ints): Stride of filter application.
initialW (4-D array): Initial weight value used in
the convolutional layers.
conv_shortcut (bool): If :obj:`True`, apply a 1x1 convolution
Member

How about residual_conv? I prefer residual to shortcut. Using both residual and shortcut is confusing and residual is more common.


class ResNet(PickableSequentialChain):

"""Base class for ResNet Network.
Member

"ResNet architecture" is better because "ResNet Network" expands to "Residual Network Network".


"""Base class for ResNet Network.

This is a feature extraction link.
Member

pickable sequential link? We have not defined feature extraction link officially.

This is only supported when :obj:`arch=='he'`.

Args:
model_name (str): Name of the resnet model to instantiate.
Member

How about n_layer? It takes one of 50, 101, 152 as an integer. model_name is difficult to understand.
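
A sketch of how an integer `n_layer` argument could select the configuration. The names `_BLOCKS` and `blocks_for` are hypothetical and not taken from this PR; the block counts per stage are the ones listed for the bottleneck models in the ResNet paper.

```python
# Number of bottleneck blocks in each of the four stages,
# keyed by the total number of layers.
_BLOCKS = {
    50: (3, 4, 6, 3),
    101: (3, 4, 23, 3),
    152: (3, 8, 36, 3),
}


def blocks_for(n_layer):
    """Return the per-stage block counts for a given depth."""
    try:
        return _BLOCKS[n_layer]
    except KeyError:
        raise ValueError(
            'n_layer must be one of {}, got {!r}'.format(
                sorted(_BLOCKS), n_layer))


for n_layer in sorted(_BLOCKS):
    blocks = blocks_for(n_layer)
    # Each bottleneck has 3 convolutions; add the first convolution
    # and the final fully connected layer to recover the depth.
    assert 3 * sum(blocks) + 2 == n_layer
    print(n_layer, blocks)
```

An integer argument also makes invalid depths easy to reject with a clear error, which is harder with a free-form model name.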

the mean value used to train the pretrained model is used.
Otherwise, the mean value calculated from ILSVRC 2012 dataset
is used.
initialW (callable): Initializer for the weights.
Member

weights for convolution kernels?

@@ -20,6 +20,7 @@ Feature extraction links extract feature(s) from given images.
.. toctree::

links/vgg
links/resnet
Member

alphabetical order

@yuyu2172
Member Author

yuyu2172 commented Mar 5, 2018

@Hakuyume

"""A bottleneck layer.

Args:
in_channels (int): The number of channels of input arrays.
Member

input arrays -> (the) input array? From my understanding, this link takes only one array.

Args:
in_channels (int): The number of channels of input arrays.
mid_channels (int): The number of channels of intermediate arrays.
out_channels (int): The number of channels of output arrays.
Member

ditto.

| VGG16 | 27.1 % | |
| ResNet50 | 23.0 % | 22.9 % [2] |
| ResNet101 |21.8 % | 21.8 % [2] |
| ResNet152 |21.4 % | 21.4 % [2] |
Member

| 21.4 % (space after |).

@yuyu2172
Member Author

yuyu2172 commented Mar 5, 2018

Thanks.

        for name in self._forward:
            l = getattr(self, name)
            x = l(x)
        return x
Member

I prefer using PickableSequentialChain to managing _forward manually. Although the pickable feature is not used, it will be simpler.
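
To illustrate the point, here is a minimal sketch of a container that remembers the registration order of its layers, so the forward pass needs no separately maintained list of names. The class `SequentialSketch` and its `add` method are hypothetical; this is not how PickableSequentialChain is implemented, only the general idea behind preferring a sequential container.

```python
from collections import OrderedDict


class SequentialSketch(object):
    """Call registered layers in the order they were added."""

    def __init__(self):
        self._layers = OrderedDict()

    def add(self, name, layer):
        # Registration order doubles as execution order.
        self._layers[name] = layer
        return self

    def __call__(self, x):
        for layer in self._layers.values():
            x = layer(x)
        return x


block = SequentialSketch()
block.add('a', lambda x: x + 1).add('b', lambda x: x * 2)
print(block(3))  # 8
```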

Member

@Hakuyume Hakuyume left a comment

LGTM

@Hakuyume Hakuyume merged commit fbe0331 into chainer:master Mar 6, 2018
@yuyu2172 yuyu2172 deleted the resnet-link branch March 6, 2018 02:11
3 participants