
Modifications when batch_size of data and label are not the same #1972

Closed
wants to merge 1 commit into from

Conversation

jonbakerfish
Contributor

Modifications to the Python training code for the case where the batch_size of the input data and the label are not the same. Related to issue #214. Intended for Fast R-CNN training.

For now we can call model.fit(X=your_data_iter) to train the network, where your_data_iter uses a different batch_size for the image and the label.
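For context, a rough sketch of what such an iterator could look like (the names and shapes below are illustrative assumptions, not code from this PR); the key point is that provide_data and provide_label report different batch sizes:

```python
import mxnet as mx
import numpy as np

# Hypothetical sketch of an iterator whose data and label batch sizes differ
# (one image per batch, many ROI labels per image); not the iterator from this PR.
class RoiIter(mx.io.DataIter):
    def __init__(self, images, rois, labels):
        super(RoiIter, self).__init__()
        # lists of per-image numpy arrays: image, its ROIs, and the per-ROI labels
        self.images, self.rois, self.labels = images, rois, labels
        self.cur = 0

    @property
    def provide_data(self):
        # data batch size comes from the image tensor, e.g. (1, 3, 600, 800)
        return [('data', self.images[0].shape), ('rois_data', self.rois[0].shape)]

    @property
    def provide_label(self):
        # label batch size comes from the ROIs, e.g. (128,), and need not match
        return [('softmax_label', self.labels[0].shape)]

    def reset(self):
        self.cur = 0

    def next(self):
        if self.cur >= len(self.images):
            raise StopIteration
        i, self.cur = self.cur, self.cur + 1
        return mx.io.DataBatch(data=[mx.nd.array(self.images[i]), mx.nd.array(self.rois[i])],
                               label=[mx.nd.array(self.labels[i])],
                               pad=0, index=None)
```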

@piiswrong
Contributor

Can you work around this by reshaping the label into (data_batch_size, label_batch_size/data_batch_size, ...)?
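In case it helps, a small numpy illustration of that workaround (the batch sizes below are made up):

```python
import numpy as np

# Suppose each data batch holds 2 images and the label batch holds 128 per-ROI labels.
data_batch_size, label_batch_size = 2, 128
labels = np.arange(label_batch_size)  # flat per-ROI labels, shape (128,)

# Reshape so the leading dimension matches the data batch size:
# (data_batch_size, label_batch_size / data_batch_size) = (2, 64)
labels = labels.reshape(data_batch_size, label_batch_size // data_batch_size)
print(labels.shape)  # (2, 64)
```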

@jonbakerfish
Contributor Author

jonbakerfish commented Apr 27, 2016

@piiswrong Can I add a flatten layer for the label and use the flattened label as the input to the loss layer?

Currently, the network looks like this:

```python
import mxnet as mx

# num_classes is assumed to be defined earlier in the training script
data = mx.symbol.Variable(name="data")
rois_data = mx.symbol.Variable(name="rois_data")

# group 1
conv1_1 = mx.symbol.Convolution(data=data, kernel=(3, 3), pad=(1, 1), num_filter=64, name="conv1_1")
relu1_1 = mx.symbol.Activation(data=conv1_1, act_type="relu", name="relu1_1")
pool1 = mx.symbol.Pooling(
    data=relu1_1, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool1")
# group 2
conv2_1 = mx.symbol.Convolution(
    data=pool1, kernel=(3, 3), pad=(1, 1), num_filter=128, name="conv2_1")
relu2_1 = mx.symbol.Activation(data=conv2_1, act_type="relu", name="relu2_1")
pool2 = mx.symbol.Pooling(
    data=relu2_1, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool2")
# group 3
conv3_1 = mx.symbol.Convolution(
    data=pool2, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_1")
relu3_1 = mx.symbol.Activation(data=conv3_1, act_type="relu", name="relu3_1")
conv3_2 = mx.symbol.Convolution(
    data=relu3_1, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_2")
relu3_2 = mx.symbol.Activation(data=conv3_2, act_type="relu", name="relu3_2")
pool3 = mx.symbol.Pooling(
    data=relu3_2, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool3")
# group 4
conv4_1 = mx.symbol.Convolution(
    data=pool3, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_1")
relu4_1 = mx.symbol.Activation(data=conv4_1, act_type="relu", name="relu4_1")
conv4_2 = mx.symbol.Convolution(
    data=relu4_1, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_2")
relu4_2 = mx.symbol.Activation(data=conv4_2, act_type="relu", name="relu4_2")
pool4 = mx.symbol.Pooling(
    data=relu4_2, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool4")
# group 5
conv5_1 = mx.symbol.Convolution(
    data=pool4, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_1")
relu5_1 = mx.symbol.Activation(data=conv5_1, act_type="relu", name="relu5_1")
conv5_2 = mx.symbol.Convolution(
    data=relu5_1, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_2")
relu5_2 = mx.symbol.Activation(data=conv5_2, act_type="relu", name="relu5_2")

# roi pooling
roi_5 = mx.sym.ROIPooling(
    data=relu5_2, rois=rois_data, pooled_size=(7, 7), spatial_scale=1/16.0, name='roi_pool5')

# group 6
fc6 = mx.symbol.FullyConnected(data=roi_5, num_hidden=4096, name="fc6")
relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
# group 7
fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
# output
fc8 = mx.symbol.FullyConnected(data=drop7, num_hidden=num_classes, name="fc8")
softmax = mx.symbol.SoftmaxOutput(data=fc8, name='softmax')
```

@pluskid
Contributor

pluskid commented Apr 27, 2016

Yes, you can add a flatten layer to flatten the labels (see the symbol currently constructed for the char-LSTM example). There is one caveat, however: once you start applying operators to the label, then at prediction time, if you want to bind an executor without providing a label shape (because there is no label), binding will fail, since shape inference for the label pathway cannot complete. Of course, you can always use a slightly different symbol without the whole label pathway. Just letting you know of this potential issue.
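A minimal sketch of what that flattened-label pathway might look like (the variable names, the stub fc8, and the 1-D reshape are assumptions for illustration, not code from this PR; older MXNet versions spell the reshape argument target_shape instead of shape):

```python
import mxnet as mx

# Label arrives with its own batch layout, e.g. (num_images, rois_per_image).
label = mx.symbol.Variable(name="softmax_label")

# Collapse it to one entry per ROI so it lines up with the per-ROI class scores.
flat_label = mx.symbol.Reshape(data=label, shape=(-1,), name="flat_label")

# Stub classifier output standing in for fc8 from the network above.
data = mx.symbol.Variable(name="data")
fc8 = mx.symbol.FullyConnected(data=data, num_hidden=21, name="fc8")
softmax = mx.symbol.SoftmaxOutput(data=fc8, label=flat_label, name="softmax")
```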

@jonbakerfish
Contributor Author

Which value of batch_size should be used for the optimizer's rescale_grad=(1.0/batch_size) in this case? Is it the batch_size of the label?
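For reference, rescale_grad is just a scaling factor handed to the optimizer; the snippet below (my own illustration, not a confirmed answer to the question) shows where it is set, with the label batch size plugged in as one possible choice:

```python
import mxnet as mx

# Open question from this thread: divide by the data batch size or the label
# batch size? The value below uses the label batch size purely as an illustration.
label_batch_size = 128
optimizer = mx.optimizer.SGD(learning_rate=0.01, momentum=0.9,
                             rescale_grad=1.0 / label_batch_size)
```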

jonbakerfish deleted the forpull branch on April 28, 2016
@zdwong

zdwong commented Jul 26, 2017

What should rescale_grad be set to in this situation?
