
Runtime errors during forward in a custom gluon.Block #8593

Closed
jdchoi77 opened this issue Nov 8, 2017 · 2 comments

Comments

@jdchoi77

jdchoi77 commented Nov 8, 2017

I created a dummy Block that takes a 2D array, performs a 2D convolution, and feeds the convolved output to a fully connected layer:

class DummyBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(DummyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(channels=3, kernel_size=(1, 5), strides=(1, 1), activation='relu')
            self.fc = gluon.nn.Dense(5)

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)
        x = self.fc(x)
        return x

I tested DummyBlock using the following code:

import numpy as np
import mxnet as mx
from mxnet import gluon, nd, autograd

X = nd.array([
    [[1,0,0,0,0],[2,0,0,0,0],[3,0,0,0,0],[4,0,0,0,0]],
    [[0,1,0,0,0],[0,2,0,0,0],[0,3,0,0,0],[0,4,0,0,0]],
    [[0,0,1,0,0],[0,0,2,0,0],[0,0,3,0,0],[0,0,4,0,0]],
    [[0,0,0,1,0],[0,0,0,2,0],[0,0,0,3,0],[0,0,0,4,0]],
    [[0,0,0,0,1],[0,0,0,0,2],[0,0,0,0,3],[0,0,0,0,4]]
])

Y = nd.array([0,1,2,3,4])

ctx = mx.cpu()
net = DummyBlock()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

batch_size = 2
loss_func = gluon.loss.SoftmaxCrossEntropyLoss()
dataloader = gluon.data.DataLoader(gluon.data.ArrayDataset(X, Y), batch_size=batch_size)

for i, (data, label) in enumerate(dataloader):
    data = data.as_in_context(ctx)
    data = data.reshape((0, 1, data.shape[1], data.shape[2]))
    label = label.as_in_context(ctx)
    with autograd.record():
        output = net(data)
        loss = loss_func(output, label)
        loss.backward()
    trainer.step(data.shape[0])

Besides the fact that it doesn't do anything useful, this runs fine without any error. When I transpose x and feed it into the fully connected layer:

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)

        # transpose: <NDArray 2x1x4x3 @cpu(0)>
        x = nd.array([nd.transpose(a).asnumpy() for a in x])
        
        x = self.fc(x)
        return x

it fails after the first batch and gives the following error message:

Traceback (most recent call last):
  File "/Users/jdchoi/workspace/elit/elit/component/postag.py", line 519, in <module>
    trainer.step(data.shape[0])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/mxnet/gluon/trainer.py", line 147, in step
    %(param.name, str(data.context)))
UserWarning: Gradient of Parameter `dummyblock0_conv0_weight` on context cpu(0) has not been updated by backward since last `step`. This could mean a bug in your model that maked it only use a subset of the Parameters (Blocks) for this iteration. If you are intentionally only using a subset, call step with ignore_stale_grad=True to suppress this warning and skip updating of Parameters with stale gradient

In fact, it gives the same error message if I make a copy of x and pass it to the fully connected layer:

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)
        x = x.copy()
        x = self.fc(x)
        return x

When I reshape x and copy the transposed values back into it, it runs fine:

    def forward(self, x):
        # 2D convolution: <NDArray 2x3x4x1 @cpu(0)>
        x = self.conv(x)

        # reshape and copy: <NDArray 2x1x4x3 @cpu(0)>
        y = [nd.transpose(a).asnumpy() for a in x]
        x = x.reshape((-1, 1, x.shape[2], x.shape[1]))
        for i in range(len(x)): x[i] = y[i]

        x = self.fc(x)
        return x

This is very hacky and not efficient. Could someone explain to me why the first two approaches fail? I often need to transpose the output of the convolution (or even concatenate another vector with the output) and feed it into the next layer, so it would be great to know whether I can do this with Gluon. Thank you.

@szha
Member

szha commented Feb 8, 2018

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

@safrooze
Contributor

This kind of question is best asked on discuss.mxnet.io. To answer it here: you cannot call asnumpy() in the middle of the computational graph, because autograd can only record operations performed on NDArrays. Converting to NumPy detaches the result from the graph, so the convolution parameters never receive a gradient and trainer.step() warns that their gradient is stale.
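
To illustrate this, here is a minimal sketch (not from this thread; the TransposeBlock class and the random input are made up for illustration) that performs the transpose with nd.transpose inside forward. Because every step stays an NDArray operation, autograd records it and the convolution weights receive a gradient, so the stale-gradient warning does not appear:

import mxnet as mx
from mxnet import gluon, nd, autograd

class TransposeBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(TransposeBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = gluon.nn.Conv2D(channels=3, kernel_size=(1, 5),
                                        strides=(1, 1), activation='relu')
            self.fc = gluon.nn.Dense(5)

    def forward(self, x):
        # 2D convolution: (N, 3, 4, 1) for an (N, 1, 4, 5) input
        x = self.conv(x)
        # Transpose with an NDArray operator so autograd records it: (N, 1, 4, 3)
        x = nd.transpose(x, axes=(0, 3, 2, 1))
        # Concatenating another NDArray works the same way, e.g.
        #   x = nd.concat(x.flatten(), extra_vector, dim=1)
        # where extra_vector is a hypothetical (N, K) NDArray.
        return self.fc(x)  # Dense flattens the trailing dimensions by default

net = TransposeBlock()
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())

x = nd.random.uniform(shape=(2, 1, 4, 5))
with autograd.record():
    out = net(x)
out.backward()

# The convolution weights now have a real gradient, so trainer.step()
# would not warn about stale gradients.
print(net.conv.weight.grad())

The same rule covers the concatenation case from the question: nd.concat on NDArrays is recorded by autograd, whereas anything that goes through asnumpy() or Python lists is not.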
