This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Deferred Initialization Error after a forward pass #9226

Closed
rlantz-cfa opened this issue Dec 29, 2017 · 6 comments


rlantz-cfa commented Dec 29, 2017

Description

I'm following the Gluon tutorial and am attempting to build a custom object detection model by modifying the code found here: http://gluon.mxnet.io/chapter08_computer-vision/object-detection.html. After building out the architecture, I try running the training loop and end up with a DeferredInitializationError, even though I've (apparently) already made a successful forward pass.

Environment info (Required)

Using SageMaker with the Python 3.6 MXNet kernel. The MXNet version is 0.12.1.

Error Message:

I'm here
and here
and here too

[ 0.16604483  0.12181112  0.13917704  0.12500151  0.33701715  0.10331827
  0.13935642  0.16480497  0.22345504  0.48216474  0.11068322  0.14671497]
<NDArray 12 @cpu(0)>
---------------------------------------------------------------------------
DeferredInitializationError               Traceback (most recent call last)
<ipython-input-26-170751fb746e> in <module>()
     24             print(loss)
     25             loss.backward()
---> 26         trainer.step(batch_size)
     27         cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
     28         box_metric.update([box_target], [box_predictions * box_mask])

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in step(self, batch_size, ignore_stale_grad)
    160         """
    161         if not self._kv_initialized:
--> 162             self._init_kvstore()
    163 
    164         self._optimizer.rescale_grad = self._scale / batch_size

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in _init_kvstore(self)
    101 
    102     def _init_kvstore(self):
--> 103         arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
    104         kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
    105                                                      arg_arrays)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in <dictcomp>(.0)
    101 
    102     def _init_kvstore(self):
--> 103         arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
    104         kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
    105                                                      arg_arrays)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
    359         NDArray on ctx
    360         """
--> 361         return self._check_and_get(self._data, ctx)
    362 
    363     def list_data(self):

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
    161                 "Please pass one batch of data through the network before accessing Parameters. " \
    162                 "You can also avoid deferred initialization by specifying in_units, " \
--> 163                 "num_features, etc., for network layers."%(self.name))
    164         raise RuntimeError(
    165             "Parameter %s has not been initialized. Note that " \

DeferredInitializationError: Parameter toyssd1_conv2_weight has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
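
For context, here is a minimal sketch of the behavior the error message describes (arbitrary layer sizes, not the actual model from this issue): a Dense layer without in_units stays deferred until data flows through it, while specifying in_units makes its parameters available immediately after initialize().

from mxnet import nd
from mxnet.gluon import nn

# Without in_units, the weight shape is unknown at construction time,
# so initialization is deferred until the first forward pass.
deferred = nn.Dense(10)
deferred.initialize()
# deferred.weight.data() here would raise DeferredInitializationError
deferred(nd.ones((1, 20)))           # first forward pass fixes the input shape
print(deferred.weight.data().shape)  # (10, 20)

# With in_units, the shape is known up front, so the parameter is
# usable right after initialize().
eager = nn.Dense(10, in_units=20)
eager.initialize()
print(eager.weight.data().shape)     # (10, 20)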

Minimum reproducible example

for epoch in range(start_epoch, epochs):
    train_data.reset()
    cls_metric.reset()
    box_metric.reset()
    tic = time.time()
    for i, batch in enumerate(train_data):
        btic = time.time()
        print(batch)
        with ag.record():
            print("I'm here")
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)
            print("and here")
            default_anchors, class_predictions, box_predictions = net(x)
            box_target, box_mask, cls_target = training_targets(default_anchors, class_predictions, y)
            print("and here too")
            loss1 = cls_loss(class_predictions, cls_target)
            loss2 = box_loss(box_predictions, box_target, box_mask)
            
            loss = loss1 + loss2
            print(loss)
            loss.backward()
        trainer.step(batch_size)
        cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
        box_metric.update([box_target], [box_predictions * box_mask])
        if (i + 1) % log_interval == 0:
            name1, val1 = cls_metric.get()
            name2, val2 = box_metric.get()
            print('[Epoch %d Batch %d] speed: %f samples/s, training: %s=%f, %s=%f'
                  %(epoch, i, batch_size/(time.time()-btic), name1, val1, name2, val2))
            
    name1, val1 = cls_metric.get()
    name2, val2 = box_metric.get()
    print('[Epoch %d] training: %s=%f, %s=%f'%(epoch, name1, val1, name2, val2))
    print('[Epoch %d] time cost: %f'%(epoch, time.time()-tic))

# we can save the trained parameters to disk
net.save_params('ssd_%d.params' % epochs)

If helpful, I can paste in the network architecture as well, though it's pretty much the same as in the Gluon tutorial linked above, with the small exception that my data input shape is (400, 400).


sampathchanda commented Feb 27, 2018

I'm also facing the same issue. Is there any update on this?

@sampathchanda

Turns out I was not using some layers in the forward function that were already defined under the block's scope. Fixed now!
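
A minimal sketch of that pattern (hypothetical block and layer names): a layer registered in __init__ but never called in forward keeps its parameters deferred, and trainer.step() then fails exactly as in the traceback above.

from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

class BuggyBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(BuggyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(16)
            self.dense1 = nn.Dense(8)  # registered but never used in forward

    def forward(self, x):
        return self.dense0(x)  # dense1 never sees data, so it stays deferred

net = BuggyBlock()
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

with autograd.record():
    loss = net(nd.ones((4, 32))).sum()
loss.backward()
trainer.step(4)  # raises DeferredInitializationError for dense1's parameters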

@SuperLinguini

Proposed Labels: Python, Example, HowTo

@ThomasDelteil (Contributor)

@rlantz-cfa I think your problem is the same as @sampathchanda's: you have some parameters that have never been initialized because they were never needed in the first forward pass. However, you are trying to update them, probably because you passed net.collect_params() to your Trainer, which handed the Trainer every parameter of your network.
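
Continuing the hypothetical sketch above, one fix is to make sure every registered layer is actually used in forward (or to remove unused layers from __init__), so the first forward pass initializes everything that net.collect_params() hands to the Trainer:

from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

class FixedBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(FixedBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(16)
            self.dense1 = nn.Dense(8)

    def forward(self, x):
        # both layers see data, so the first forward pass initializes
        # every parameter the Trainer will later update
        return self.dense1(self.dense0(x))

net = FixedBlock()
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

with autograd.record():
    loss = net(nd.ones((4, 32))).sum()
loss.backward()
trainer.step(4)  # every parameter is initialized, so this succeeds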
@rlantz-cfa If you want to follow up, please create a post on https://discuss.mxnet.io. Thanks!
@indhub could you please close this issue?


antran89 commented Jul 16, 2018

@ThomasDelteil sorry, I am a newbie to MXNet; what should we pass to the Trainer?


indhub (Contributor) commented Jul 16, 2018

The problem is not what you pass to the Trainer. The problem is that you have some layer that is not being used in the forward pass. Please ask the question at discuss.mxnet.io with a reproducible example. This is not a bug, hence closing this issue.

indhub closed this as completed Jul 16, 2018