This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Symbolic .json file not compatible with .params file generated since MXNet 1.2 #11091

Closed
ThomasDelteil opened this issue May 29, 2018 · 5 comments

Comments

@ThomasDelteil
Contributor

ThomasDelteil commented May 29, 2018

Since MXNet 1.2.0, one previously working way of serializing Gluon models no longer works.

Reproducible here:

import mxnet as mx
from mxnet import gluon
ctx = mx.cpu()

# Create network
net = gluon.nn.HybridSequential(prefix="test_")
with net.name_scope():
    net.add(gluon.nn.Conv2D(10, (3, 3)))
    net.add(gluon.nn.Dense(50))
net.initialize()
net(mx.nd.ones((1,1,50,50)))

# Save network
a = net(mx.sym.var('data'))
a.save('test.json')
net.save_params('test.params')

# Load network
net2 = gluon.nn.HybridSequential(prefix="test_")
with net2.name_scope():
    sym = mx.sym.load_json(open('test.json', 'r').read())
    net2.add(gluon.nn.SymbolBlock(outputs=sym, inputs=mx.sym.var('data')))
net2.load_params('test.params', ctx=ctx)

Gives the following error:

AssertionError: Parameter 'conv0_weight' is missing in file 'test.params', which contains parameters: '0.weight', '0.bias', '1.weight', '1.bias'. Set allow_missing=True to ignore missing parameters.

The same code worked in 1.1.0.

This way of exporting the symbol is recommended in this tutorial in The Straight Dope.

The current recommended way, as described in the upcoming tutorial here, is to use the hybridized .export() function.

It looks like this, and works in both 1.1.0 and 1.2.0:

import mxnet as mx
from mxnet import gluon
ctx = mx.cpu()

# Create network
net = gluon.nn.HybridSequential(prefix="test_")
with net.name_scope():
    net.add(gluon.nn.Conv2D(10, (3, 3)))
    net.add(gluon.nn.Dense(50))
net.initialize()

# Save network
net.hybridize()
net(mx.nd.ones((1,1,50,50)))
net.export('test', epoch=0)

# Load network
sym = mx.sym.load_json(open('test-symbol.json', 'r').read())
net2 = gluon.nn.SymbolBlock(outputs=sym, inputs=mx.sym.var('data'))
net2.load_params('test-0000.params', ctx=ctx)

This affects people who have been using this method until now.

@ifeherva reported this issue is affecting his team.

@piiswrong @marcoabreu @szha

@chinakook
Contributor

Using Module's save_checkpoint or Gluon’s export would be OK. Mixing Gluon’s save with Symbol’s save is a bad idea.

@anirudhacharya
Member

@nswamy please label - "Breaking","Bug", "Gluon"

@ThomasDelteil
Contributor Author

ThomasDelteil commented Jun 4, 2018

One way I think we could fix this issue would be to make the old behaviour the default, save_params(filename, format='named_params'), and add an option such as save_params(filename, format='numbered_params') that uses the new behaviour. We could then switch the default in 2.0.0.
What do you think, @piiswrong?
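
A minimal sketch of how such a switch could behave, in plain Python with no MXNet dependency. Everything here is hypothetical: `serialize_params` is not an MXNet function, the argument is called `fmt` only to avoid shadowing the built-in `format`, and the full parameter names are assumed for illustration.

```python
def serialize_params(params, fmt="named_params"):
    """Build the key -> value mapping that would be written to disk.

    `params` is a list of (structural_path, full_name, value) tuples.
    """
    if fmt == "named_params":
        # Pre-1.2 style: keys are the full parameter names.
        return {full_name: value for _, full_name, value in params}
    if fmt == "numbered_params":
        # 1.2 style: keys follow the position of each child block.
        return {path: value for path, _, value in params}
    raise ValueError("unknown format: %r" % (fmt,))


# Placeholder values stand in for the real arrays; names are assumptions.
params = [
    ("0.weight", "test_conv0_weight", "W0"),
    ("0.bias", "test_conv0_bias", "b0"),
]

named = serialize_params(params)                       # default: old layout
numbered = serialize_params(params, "numbered_params")  # opt-in: new layout
```

Keeping the old layout as the default means existing callers see no change, while the new layout stays available to anyone who asks for it explicitly.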

@wikier
Member

wikier commented Jun 6, 2018

SGTM

@ThomasDelteil
Contributor Author

ThomasDelteil commented Jun 13, 2018

A more in-depth analysis written by @piiswrong about the cause of the issue and possible solutions:

Background
Gluon provides the save_params API, which saves the model parameters (but not the model definition) as a binary file ‘xxx.params’. It can be loaded back with the load_params API. The saved file is intended to be opaque, but you can load it with mx.nd.load and inspect its contents (we don’t advertise this).
Gluon also provides the export API for saving a Gluon model definition (.json) and parameters (.params), which can be loaded with the MXNet Module API or other language bindings.
Gluon provides the SymbolBlock API, which can load a .json model definition and a .params parameter file, but an import helper for this functionality is missing.

The change
We changed the internal structure of the .params file saved by save_params to resolve a bug. Parameters saved by previous versions can still be loaded in the new version.

The complaint from the user
A user saved the model definition and model parameters with mx.sym.save_json and save_params, following the straight dope book. Because the book doesn’t show how to load them back, the user invented a hacky way to load them into SymbolBlock with load_params. The user's code broke after upgrading from 1.1 to 1.2.

The cause
The user's hack depended on internal similarities between the .params files saved by save_params and by export. After the save_params format changed, this hack stopped working.
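
The mismatch can be illustrated with a toy model in plain Python. This is a sketch only, not MXNet's implementation: `check_loadable` and `remap_keys` are hypothetical helpers, and the names in `path_to_name` are assumed for illustration (the error message in this issue confirms only `conv0_weight` and the four position-based keys).

```python
def check_loadable(expected_names, file_keys):
    """Raise AssertionError if an expected parameter name is absent from the file."""
    for name in expected_names:
        assert name in file_keys, (
            "Parameter '%s' is missing in file, which contains parameters: %s"
            % (name, ", ".join("'%s'" % key for key in file_keys))
        )


def remap_keys(saved, path_to_name):
    """Return a copy of `saved` keyed by parameter name instead of structural path."""
    return {path_to_name[path]: value for path, value in saved.items()}


# Keys as reported in the error message above (layout written by 1.2.0).
# Placeholder strings stand in for the real arrays.
saved_1_2 = {"0.weight": "W0", "0.bias": "b0", "1.weight": "W1", "1.bias": "b1"}

# Assumed mapping from structural path to the name the symbol expects.
path_to_name = {
    "0.weight": "conv0_weight",
    "0.bias": "conv0_bias",
    "1.weight": "dense0_weight",
    "1.bias": "dense0_bias",
}

remapped = remap_keys(saved_1_2, path_to_name)
check_loadable(path_to_name.values(), remapped)  # passes once keys are remapped
```

Calling `check_loadable(["conv0_weight"], saved_1_2)` on the unmapped keys fails in the same way as the reported error, because no key in the file matches the name the symbol asks for.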

Faults on our part

  • The straight dope book should have recommended saving the model definition with export instead of mx.sym.save_json and save_params.
  • We should have provided an import utility so that users don’t need to invent their own hacks to load a model definition into SymbolBlock.
  • (?) We changed the file format saved by save_params. Although it is intended as an opaque binary whose format is not defined in the documentation, some customers could be depending on undefined behavior.

Solutions

  1. Revert save_params to the previous format. Add a new API, save_parameters, for the new format.

Pros

  • Users relying on the internal format of save_params won’t see breakage.

Cons

  • All users need to manually migrate to the new API, save_parameters.
  • Having both save_params and save_parameters could be confusing.

  2. Issue warnings and error messages to instruct users to move to export and import, and stop depending on undefined behavior.

Pros

  • Most users won’t see breakage and won’t need to do anything.

Cons

  • Users depending on save_params’s internal format will see breakage.

For both solutions we can add more documentation and helper API to minimize impact.
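
As a rough illustration of how a warning could steer users during a migration, here is a sketch using Python's standard `warnings` module. The two functions are stand-ins that only build dictionaries; they are not the MXNet implementations, and the message text and parameter names are assumptions.

```python
import warnings


def save_parameters(params):
    """New-layout stand-in: key each value by its structural path."""
    return {path: value for path, _, value in params}


def save_params(params):
    """Old-layout stand-in: key each value by full name and warn on every call."""
    warnings.warn(
        "save_params is deprecated; use save_parameters, or export "
        "if the model will be loaded through SymbolBlock",
        DeprecationWarning,
        stacklevel=2,
    )
    return {full_name: value for _, full_name, value in params}


# (structural_path, full_name, value) tuples; placeholder values, assumed names.
params = [("0.weight", "conv0_weight", "W0"), ("0.bias", "conv0_bias", "b0")]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_layout = save_params(params)

new_layout = save_parameters(params)
```

The old entry point keeps producing the layout existing code relies on, while each call tells the caller where to move.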

Current open PRs related to this issue: #11236 #11127 #11210
