
Modifications when batch_size of data and label are not the same #1972

Closed
wants to merge 1 commit into from

Conversation

jonbakerfish
Contributor

Modifications to the Python training code for the case where the batch_size of the input data and the label are not the same. Related to issue #214. Intended for Fast R-CNN training.

For now we can call model.fit(X=your_data_iter) to train the network, where your_data_iter uses a different batch_size for the image and the label.
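For context, a rough sketch of what such an iterator could look like (the names and shapes below are illustrative assumptions, not code from this PR); the key point is that provide_data and provide_label report different batch sizes:

```python
import mxnet as mx
import numpy as np

# Hypothetical sketch of an iterator whose data and label batch sizes differ
# (one image per batch, many ROI labels per image); not the iterator from this PR.
class RoiIter(mx.io.DataIter):
    def __init__(self, images, rois, labels):
        super(RoiIter, self).__init__()
        # lists of per-image numpy arrays: image, its ROIs, and the per-ROI labels
        self.images, self.rois, self.labels = images, rois, labels
        self.cur = 0

    @property
    def provide_data(self):
        # data batch size comes from the image tensor, e.g. (1, 3, 600, 800)
        return [('data', self.images[0].shape), ('rois_data', self.rois[0].shape)]

    @property
    def provide_label(self):
        # label batch size comes from the ROIs, e.g. (128,), and need not match
        return [('softmax_label', self.labels[0].shape)]

    def reset(self):
        self.cur = 0

    def next(self):
        if self.cur >= len(self.images):
            raise StopIteration
        i, self.cur = self.cur, self.cur + 1
        return mx.io.DataBatch(data=[mx.nd.array(self.images[i]), mx.nd.array(self.rois[i])],
                               label=[mx.nd.array(self.labels[i])],
                               pad=0, index=None)
```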

@piiswrong
Contributor

Can you work around this by reshaping the label into (data_batch_size, label_batch_size/data_batch_size, ...)?
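In case it helps, a small numpy illustration of that workaround (the batch sizes below are made up):

```python
import numpy as np

# Suppose each data batch holds 2 images and the label batch holds 128 per-ROI labels.
data_batch_size, label_batch_size = 2, 128
labels = np.arange(label_batch_size)  # flat per-ROI labels, shape (128,)

# Reshape so the leading dimension matches the data batch size:
# (data_batch_size, label_batch_size / data_batch_size) = (2, 64)
labels = labels.reshape(data_batch_size, label_batch_size // data_batch_size)
print(labels.shape)  # (2, 64)
```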

@jonbakerfish
Contributor Author

jonbakerfish commented Apr 27, 2016

@piiswrong Can I add a flatten layer for the label and use the flattened label as the input to the loss layer?

Currently, the network looks like this:

```python
import mxnet as mx

# num_classes is assumed to be defined earlier in the training script
data = mx.symbol.Variable(name="data")
rois_data = mx.symbol.Variable(name="rois_data")

# group 1
conv1_1 = mx.symbol.Convolution(data=data, kernel=(3, 3), pad=(1, 1), num_filter=64, name="conv1_1")
relu1_1 = mx.symbol.Activation(data=conv1_1, act_type="relu", name="relu1_1")
pool1 = mx.symbol.Pooling(
    data=relu1_1, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool1")
# group 2
conv2_1 = mx.symbol.Convolution(
    data=pool1, kernel=(3, 3), pad=(1, 1), num_filter=128, name="conv2_1")
relu2_1 = mx.symbol.Activation(data=conv2_1, act_type="relu", name="relu2_1")
pool2 = mx.symbol.Pooling(
    data=relu2_1, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool2")
# group 3
conv3_1 = mx.symbol.Convolution(
    data=pool2, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_1")
relu3_1 = mx.symbol.Activation(data=conv3_1, act_type="relu", name="relu3_1")
conv3_2 = mx.symbol.Convolution(
    data=relu3_1, kernel=(3, 3), pad=(1, 1), num_filter=256, name="conv3_2")
relu3_2 = mx.symbol.Activation(data=conv3_2, act_type="relu", name="relu3_2")
pool3 = mx.symbol.Pooling(
    data=relu3_2, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool3")
# group 4
conv4_1 = mx.symbol.Convolution(
    data=pool3, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_1")
relu4_1 = mx.symbol.Activation(data=conv4_1, act_type="relu", name="relu4_1")
conv4_2 = mx.symbol.Convolution(
    data=relu4_1, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv4_2")
relu4_2 = mx.symbol.Activation(data=conv4_2, act_type="relu", name="relu4_2")
pool4 = mx.symbol.Pooling(
    data=relu4_2, pool_type="max", kernel=(2, 2), stride=(2,2), name="pool4")
# group 5
conv5_1 = mx.symbol.Convolution(
    data=pool4, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_1")
relu5_1 = mx.symbol.Activation(data=conv5_1, act_type="relu", name="relu5_1")
conv5_2 = mx.symbol.Convolution(
    data=relu5_1, kernel=(3, 3), pad=(1, 1), num_filter=512, name="conv5_2")
relu5_2 = mx.symbol.Activation(data=conv5_2, act_type="relu", name="relu5_2")

# roi pooling
roi_5 = mx.sym.ROIPooling(
    data=relu5_2, rois=rois_data, pooled_size=(7, 7), spatial_scale=1/16.0, name='roi_pool5')

# group 6
fc6 = mx.symbol.FullyConnected(data=roi_5, num_hidden=4096, name="fc6")
relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
# group 7
fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
# output
fc8 = mx.symbol.FullyConnected(data=drop7, num_hidden=num_classes, name="fc8")
softmax = mx.symbol.SoftmaxOutput(data=fc8, name='softmax')
```

@pluskid
Contributor

pluskid commented Apr 27, 2016

Yes, you can add a flatten layer to flatten the labels (see the symbol currently constructed for the char-LSTM example). There is one caveat, however: once you start applying operators to the label, then at prediction time, if you want to bind an executor without providing a label shape (because there is no label), binding will fail, since shape inference for the label pathway cannot complete. Of course, you can always use a slightly different symbol without the whole label pathway. Just letting you know of this potential issue.
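A minimal sketch of what that flattened-label pathway might look like (the variable names, the stub fc8, and the 1-D reshape are assumptions for illustration, not code from this PR; older MXNet versions spell the reshape argument target_shape instead of shape):

```python
import mxnet as mx

# Label arrives with its own batch layout, e.g. (num_images, rois_per_image).
label = mx.symbol.Variable(name="softmax_label")

# Collapse it to one entry per ROI so it lines up with the per-ROI class scores.
flat_label = mx.symbol.Reshape(data=label, shape=(-1,), name="flat_label")

# Stub classifier output standing in for fc8 from the network above.
data = mx.symbol.Variable(name="data")
fc8 = mx.symbol.FullyConnected(data=data, num_hidden=21, name="fc8")
softmax = mx.symbol.SoftmaxOutput(data=fc8, label=flat_label, name="softmax")
```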

@jonbakerfish
Contributor Author

Which value of batch_size should be used for the optimizer's rescale_grad=(1.0/batch_size) in this case? Is it the batch_size of the label?
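For reference, rescale_grad is just a scaling factor handed to the optimizer; the snippet below (my own illustration, not a confirmed answer to the question) shows where it is set, with the label batch size plugged in as one possible choice:

```python
import mxnet as mx

# Open question from this thread: divide by the data batch size or the label
# batch size? The value below uses the label batch size purely as an illustration.
label_batch_size = 128
optimizer = mx.optimizer.SGD(learning_rate=0.01, momentum=0.9,
                             rescale_grad=1.0 / label_batch_size)
```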

jonbakerfish deleted the forpull branch on April 28, 2016
@zdwong

zdwong commented Jul 26, 2017

What should rescale_grad be set to in this situation?
