
Runtime errors during forward in a custom gluon.Block #8593

Closed
jdchoi77 opened this issue Nov 8, 2017 · 2 comments

Comments

@jdchoi77

jdchoi77 commented Nov 8, 2017

I created a dummy Block that takes a 2D array, performs a 2D convolution, and feeds the convolved output to a fully connected layer:

class DummyBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(DummyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(channels=3, kernel_size=(1, 5), strides=(1, 1), activation='relu')
            self.fc = gluon.nn.Dense(5)

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)
        x = self.fc(x)
        return x

I tested DummyBlock using the following code:

import numpy as np
import mxnet as mx
from mxnet import gluon, nd, autograd

X = nd.array([
    [[1,0,0,0,0],[2,0,0,0,0],[3,0,0,0,0],[4,0,0,0,0]],
    [[0,1,0,0,0],[0,2,0,0,0],[0,3,0,0,0],[0,4,0,0,0]],
    [[0,0,1,0,0],[0,0,2,0,0],[0,0,3,0,0],[0,0,4,0,0]],
    [[0,0,0,1,0],[0,0,0,2,0],[0,0,0,3,0],[0,0,0,4,0]],
    [[0,0,0,0,1],[0,0,0,0,2],[0,0,0,0,3],[0,0,0,0,4]]
])

Y = nd.array([0,1,2,3,4])

ctx = mx.cpu()
net = DummyBlock()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

batch_size = 2
loss_func = gluon.loss.SoftmaxCrossEntropyLoss()
dataloader = gluon.data.DataLoader(gluon.data.ArrayDataset(X, Y), batch_size=batch_size)

for i, (data, label) in enumerate(dataloader):
    data = data.as_in_context(ctx)
    data = data.reshape((0, 1, data.shape[1], data.shape[2]))
    label = label.as_in_context(ctx)
    with autograd.record():
        output = net(data)
        loss = loss_func(output, label)
        loss.backward()
    trainer.step(data.shape[0])

Besides the fact that it doesn't do anything useful, this runs fine without any error. When I transpose x and feed it into the fully connected layer:

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)

        # transpose: <NDArray 2x1x4x3 @cpu(0)>
        x = nd.array([nd.transpose(a).asnumpy() for a in x])
        
        x = self.fc(x)
        return x

it fails after the first batch and gives the following error message:

Traceback (most recent call last):
  File "/Users/jdchoi/workspace/elit/elit/component/postag.py", line 519, in <module>
    trainer.step(data.shape[0])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/trainer.py", line 147, in step
    %(param.name, str(data.context)))
UserWarning: Gradient of Parameter `dummyblock0_conv0_weight` on context cpu(0) has not been updated by backward since last `step`. This could mean a bug in your model that maked it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient

In fact, it gives the same error message if I make a copy of x and pass it to the fully connected layer:

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)
        x = x.copy()
        x = self.fc(x)
        return x

When I reshape x and copy the transposed values back into it, it runs fine:

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)

        # reshape and copy: <NDArray 2x1x4x3 @cpu(0)>
        y = [nd.transpose(a).asnumpy() for a in x]
        x = x.reshape((-1, 1, x.shape[2], x.shape[1]))
        for i in range(len(x)): x[i] = y[i]

        x = self.fc(x)
        return x

This is very hacky and not efficient. Could someone explain to me why the first two approaches fail? I often need to transpose the output of the convolution (or even concatenate another vector with the output) and feed it into the next layer, so it would be great to know whether I can do this with Gluon. Thank you.

@szha
Member

szha commented Feb 8, 2018

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

@safrooze
Contributor

This kind of question is best asked on discuss.mxnet.io. To answer it here: you cannot call asnumpy() in the middle of the computational graph, because autograd can only record operations performed on NDArrays. Converting to NumPy detaches the result from the graph, so the convolution parameters never receive a gradient and trainer.step() warns that their gradient is stale.
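
To illustrate this, here is a minimal sketch (not from this thread; the TransposeBlock class and the random input are made up for illustration) that performs the transpose with nd.transpose inside forward. Because every step stays an NDArray operation, autograd records it and the convolution weights receive a gradient, so the stale-gradient warning does not appear:

import mxnet as mx
from mxnet import gluon, nd, autograd

class TransposeBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(TransposeBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(channels=3, kernel_size=(1, 5),
                                        strides=(1, 1), activation='relu')
            self.fc = gluon.nn.Dense(5)

    def forward(self, x):
        # 2D convolution: (N, 3, 4, 1) for an (N, 1, 4, 5) input
        x = self.conv(x)
        # Transpose with an NDArray operator so autograd records it: (N, 1, 4, 3)
        x = nd.transpose(x, axes=(0, 3, 2, 1))
        # Concatenating another NDArray works the same way, e.g.
        #   x = nd.concat(x.flatten(), extra_vector, dim=1)
        # where extra_vector is a hypothetical (N, K) NDArray.
        return self.fc(x)  # Dense flattens the trailing dimensions by default

net = TransposeBlock()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())

x = nd.random.uniform(shape=(2, 1, 4, 5))
with autograd.record():
    out = net(x)
out.backward()

# The convolution weights now have a real gradient, so trainer.step()
# would not warn about stale gradients.
print(net.conv.weight.grad())

The same rule covers the concatenation case from the question: nd.concat on NDArrays is recorded by autograd, whereas anything that goes through asnumpy() or Python lists is not.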
