This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Deferred Initialization Error after a forward pass #9226

Closed
rlantz-cfa opened this issue Dec 29, 2017 · 6 comments


rlantz-cfa commented Dec 29, 2017

Description

I'm following the Gluon tutorial and am attempting to build a custom object detection model by modifying the code found here: http://gluon.mxnet.io/chapter08_computer-vision/object-detection.html. After building out the architecture, I try running the training loop and end up with a DeferredInitializationError, even though I've (apparently) already made a successful forward pass.

Environment info (Required)

Using SageMaker with the Python 3.6 MXNet kernel. The MXNet version is 0.12.1.

Error Message:

I'm here
and here
and here too

[ 0.16604483  0.12181112  0.13917704  0.12500151  0.33701715  0.10331827
  0.13935642  0.16480497  0.22345504  0.48216474  0.11068322  0.14671497]
<NDArray 12 @cpu(0)>
---------------------------------------------------------------------------
DeferredInitializationError               Traceback (most recent call last)
<ipython-input-26-170751fb746e> in <module>()
     24             print(loss)
     25             loss.backward()
---> 26         trainer.step(batch_size)
     27         cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
     28         box_metric.update([box_target], [box_predictions * box_mask])

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in step(self, batch_size, ignore_stale_grad)
    160         """
    161         if not self._kv_initialized:
--> 162             self._init_kvstore()
    163 
    164         self._optimizer.rescale_grad = self._scale / batch_size

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in _init_kvstore(self)
    101 
    102     def _init_kvstore(self):
--> 103         arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
    104         kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
    105                                                      arg_arrays)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/trainer.py in <dictcomp>(.0)
    101 
    102     def _init_kvstore(self):
--> 103         arg_arrays = {param.name: param.data(self._contexts[0]) for param in self._params}
    104         kvstore, update_on_kvstore = _create_kvstore(self._kvstore, len(self._contexts),
    105                                                      arg_arrays)

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
    359         NDArray on ctx
    360         """
--> 361         return self._check_and_get(self._data, ctx)
    362 
    363     def list_data(self):

~/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
    161                 "Please pass one batch of data through the network before accessing Parameters. " \
    162                 "You can also avoid deferred initialization by specifying in_units, " \
--> 163                 "num_features, etc., for network layers."%(self.name))
    164         raise RuntimeError(
    165             "Parameter %s has not been initialized. Note that " \

DeferredInitializationError: Parameter toyssd1_conv2_weight has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
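
For context, here is a minimal sketch of the behavior the error message describes (arbitrary layer sizes, not the actual model from this issue): a Dense layer without in_units stays deferred until data flows through it, while specifying in_units makes its parameters available immediately after initialize().

from mxnet import nd
from mxnet.gluon import nn

# Without in_units, the weight shape is unknown at construction time,
# so initialization is deferred until the first forward pass.
deferred = nn.Dense(10)
deferred.initialize()
# deferred.weight.data() here would raise DeferredInitializationError
deferred(nd.ones((1, 20)))           # first forward pass fixes the input shape
print(deferred.weight.data().shape)  # (10, 20)

# With in_units, the shape is known up front, so the parameter is
# usable right after initialize().
eager = nn.Dense(10, in_units=20)
eager.initialize()
print(eager.weight.data().shape)     # (10, 20)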

Minimum reproducible example

for epoch in range(start_epoch, epochs):
    train_data.reset()
    cls_metric.reset()
    box_metric.reset()
    tic = time.time()
    for i, batch in enumerate(train_data):
        btic = time.time()
        print(batch)
        with ag.record():
            print("I'm here")
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)
            print("and here")
            default_anchors, class_predictions, box_predictions = net(x)
            box_target, box_mask, cls_target = training_targets(default_anchors, class_predictions, y)
            print("and here too")
            loss1 = cls_loss(class_predictions, cls_target)
            loss2 = box_loss(box_predictions, box_target, box_mask)
            
            loss = loss1 + loss2
            print(loss)
            loss.backward()
        trainer.step(batch_size)
        cls_metric.update([cls_target], [nd.transpose(class_predictions, (0, 2, 1))])
        box_metric.update([box_target], [box_predictions * box_mask])
        if (i + 1) % log_interval == 0:
            name1, val1 = cls_metric.get()
            name2, val2 = box_metric.get()
            print('[Epoch %d Batch %d] speed: %f samples/s, training: %s=%f, %s=%f'
                  %(epoch, i, batch_size/(time.time()-btic), name1, val1, name2, val2))
            
    name1, val1 = cls_metric.get()
    name2, val2 = box_metric.get()
    print('[Epoch %d] training: %s=%f, %s=%f'%(epoch, name1, val1, name2, val2))
    print('[Epoch %d] time cost: %f'%(epoch, time.time()-tic))

# we can save the trained parameters to disk
net.save_params('ssd_%d.params' % epochs)

If helpful, I can paste in the network architecture as well, though it's pretty much the same as in the Gluon tutorial linked above, with the small exception that my data input shape is (400, 400).


sampathchanda commented Feb 27, 2018

I'm also facing the same issue. Is there any update on this?

@sampathchanda

Turns out I was not using some layers in the forward function that were already defined under the block's scope. Fixed now!
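
A minimal sketch of that pattern (hypothetical block and layer names): a layer registered in __init__ but never called in forward keeps its parameters deferred, and trainer.step() then fails exactly as in the traceback above.

from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

class BuggyBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(BuggyBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(16)
            self.dense1 = nn.Dense(8)  # registered but never used in forward

    def forward(self, x):
        return self.dense0(x)  # dense1 never sees data, so it stays deferred

net = BuggyBlock()
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

with autograd.record():
    loss = net(nd.ones((4, 32))).sum()
loss.backward()
trainer.step(4)  # raises DeferredInitializationError for dense1's parameters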

@SuperLinguini

Proposed Labels: Python, Example, HowTo

@ThomasDelteil (Contributor)

@rlantz-cfa I think your problem is the same as @sampathchanda's: you have some parameters that have never been initialized because they were never needed in the first forward pass. However, you are trying to update them, probably because you passed net.collect_params() to your Trainer, which handed the Trainer every parameter of your network.
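
Continuing the hypothetical sketch above, one fix is to make sure every registered layer is actually used in forward (or to remove unused layers from __init__), so the first forward pass initializes everything that net.collect_params() hands to the Trainer:

from mxnet import autograd, gluon, nd
from mxnet.gluon import nn

class FixedBlock(gluon.Block):
    def __init__(self, **kwargs):
        super(FixedBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = nn.Dense(16)
            self.dense1 = nn.Dense(8)

    def forward(self, x):
        # both layers see data, so the first forward pass initializes
        # every parameter the Trainer will later update
        return self.dense1(self.dense0(x))

net = FixedBlock()
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

with autograd.record():
    loss = net(nd.ones((4, 32))).sum()
loss.backward()
trainer.step(4)  # every parameter is initialized, so this succeeds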
@rlantz-cfa If you want to follow up, please create a post on https://discuss.mxnet.io. Thanks!
@indhub could you please close this issue?


antran89 commented Jul 16, 2018

@ThomasDelteil sorry, I am a newbie to MXNet; what should we pass to the Trainer?


indhub (Contributor) commented Jul 16, 2018

The problem is not what you pass to the Trainer. The problem is that you have some layer that is not being used in the forward pass. Please ask the question at discuss.mxnet.io with a reproducible example. This is not a bug, hence closing this issue.

indhub closed this as completed Jul 16, 2018