Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.8: "an integer is required (got type bytes)" when loading models saved under older Python versions #171

Closed
ManifoldFR opened this issue Oct 1, 2020 · 13 comments
Labels
bug Something isn't working

Comments

@ManifoldFR
Copy link
Contributor

ManifoldFR commented Oct 1, 2020

Describe the bug

There is an issue loading trained agent files in Python 3.8.

Code example

I checked this on the rl-baselines3-zoo repository.

(base) manifold$ python enjoy.py --algo sac --env Pendulum-v0
Loading latest experiment, id=1
Traceback (most recent call last):
  File "enjoy.py", line 225, in <module>
    main()
  File "enjoy.py", line 141, in main
    model = ALGOS[algo].load(model_path, env=env, **kwargs)
  File "/home/manifold/stable-baselines3/stable_baselines3/common/base_class.py", line 561, in load
    data, params, pytorch_variables = load_from_zip_file(path, device=device)
  File "/home/manifold/stable-baselines3/stable_baselines3/common/save_util.py", line 390, in load_from_zip_file
    data = json_to_data(json_data)
  File "/home/manifold/stable-baselines3/stable_baselines3/common/save_util.py", line 164, in json_to_data
    deserialized_object = cloudpickle.loads(base64_object)
TypeError: an integer is required (got type bytes)

System Info
Describe the characteristic of your environment:

  • SB3 downloaded from git trunk
  • Python version 3.8.5
  • PyTorch version 1.6.0
  • Gym version: latest from git
  • Cloudpickle 1.6.0

Additional context

I printed out info and saw that it occurs when loading the data key learning_rate on the SAC example above. For policies trained using recent versions of SB3 it happens for lr_schedule.

@ManifoldFR
Copy link
Contributor Author

ManifoldFR commented Oct 1, 2020

After further investigation, this also applies to clip_range in PPO.

I think the common point is that these can all be callables, right? They were also created using common.utils.constant_fn from what I see in the bytestring.

@Miffyli Miffyli added the bug Something isn't working label Oct 1, 2020
@Miffyli
Copy link
Collaborator

Miffyli commented Oct 1, 2020

Does the same happen with saving/loading models with stable-baselines3 alone without the files in zoo? The code for saving and loading has changed since those zoo models were trained so it might have broken a thing or two.

@ManifoldFR
Copy link
Contributor Author

It happens for models trained using a version of SB3 pulled from Git a week ago, albeit with Python 3.7

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 1, 2020

Just to clarify: You train a model with older version of SB3 (week ago), save it, update to current SB3 (master) and loading fails?

@araffin
Copy link
Member

araffin commented Oct 1, 2020

Just to clarify: You train a model with older version of SB3 (week ago), save it, update to current SB3 (master) and loading fails?

the issue comes with the pretrained models (using python 3.6) that are available in the zoo.

@araffin
Copy link
Member

araffin commented Oct 1, 2020

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 1, 2020

Hmm either rl-zoo should provide parameters of networks which can be loaded (which then skips training parameters...) or update loading code to catch errors like this and then ask for user to provide that object as part of the custom objects (that override the loaded values), and then rl-zoo provides these values.

@ManifoldFR
Copy link
Contributor Author

Rolling back to commit 00595b0 was not sufficient to get the rl-zoo examples working again. I think it's indeed a problem with pickling in 3.8.
AFAIK it might not be the protocol version because pickle/cloudpickle knows to use the right protocol version given a file or byte sequence.

@Miffyli
Copy link
Collaborator

Miffyli commented Oct 1, 2020

Given that pickle/cloudpickle are not meant for long term usage (the explicitly state that it should not be used for such purpose), I think this kind of errors are expected. Moving completely away from pickling does not make much sense since models can have custom agents, but I think we need to add something to make this kind of situations easier to handle (e.g. like providing the invalid values manually).

@ManifoldFR
Copy link
Contributor Author

ManifoldFR commented Oct 1, 2020

I found that it comes from this PR way, way upstream: python/cpython#12701, referenced here. It breaks the constructor of code objects, which include functions defined from functions and saved using cloudpickle.

Seeing this I agree having a nice fallback would be more elegant until things are sorted out upstream.

@ManifoldFR ManifoldFR changed the title Python 3.8: "an integer is required (got type bytes)" when loading models Python 3.8: "an integer is required (got type bytes)" when loading models saved under older Python versions Oct 1, 2020
@pablogsal
Copy link

pablogsal commented Oct 1, 2020

Seeing this I agree having a nice fallback would be more elegant until things are sorted out upstream.

✋ Hi, CPython maintainer here. Unfortunately, there is nothing to be done in our side, as the signature of code objects is not considered stable between versions. The constructor is purpously not even documented and we have this advice here:

If you instantiate any of these types, note that signatures may vary between Python versions.

(https://docs.python.org/3/library/types.html#standard-interpreter-types)

You can also use this new convenience method that we added to make the transition easier for people using code objects:

https://docs.python.org/3/library/types.html#types.CodeType.replace

@ManifoldFR
Copy link
Contributor Author

ManifoldFR commented Oct 1, 2020

Thanks for your input, I'll see about contributing a fallback for our specific problem and a more general fix for the cloudpickle dependency, though their loading code seems actually be pickle.load and this isn't really recommended/supported usage of the library.

@araffin
Copy link
Member

araffin commented Oct 3, 2020

closing this in favor of #172 (and as it is not an issue coming from SB3)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants