Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pickle support to ReadOnly{Dict,List} #508

Closed
wants to merge 2 commits into from

Conversation

AdamGleave
Copy link
Contributor

@AdamGleave AdamGleave commented Jul 11, 2019

Closes #499.

Initially I tried to add pickle support using __getstate__ and __setstate. However, pickle, seems to special-case built-in container types: my __setstate__ method never even got called in ReadOnlyDict. Accordingly I decided to switch to a proxy style, where we are wrapping a dict/list object. This is similar to the MappingProxyType, which we could probably replace ReadOnlyDict with once Python 2.7 support is dropped (it's only available in Python 3.3+).

@coveralls
Copy link

coveralls commented Jul 11, 2019

Coverage Status

Coverage decreased (-0.03%) to 85.047% when pulling 3c5fc9f on HumanCompatibleAI:readonly-picklable into b3ed1f1 on IDSIA:master.

@boeddeker
Copy link
Contributor

How about removing ReadOnly{Dict,List} from sacred?
That container introduces much trouble.

  • Original type gets lost: When the user inherent from dict/list, this type gets lost
  • The config is json/pickle/yaml/... incompatible
  • Using collections.Mapping will destroy isinstance(..., dict)

There may come up even more difficulties.

@AdamGleave
Copy link
Contributor Author

The config shouldn't be modified, so I am sympathetic to the intended purpose of ReadOnly{Dict,List}.

If the only downside is that type-checking against dict or list fails I'm happy to live with this: you should normally be duck-typing in Python anyway, and collections.Mapping is a better thing to check against than dict.

The lack of JSON/YAML support is a downside I hadn't considered. I think you can get around this by doing copy.deepcopy first. Not ideal though!

@boeddeker
Copy link
Contributor

I do not say, that the config should be modified.
I also like it, when you get an error msg, when you did something wrong.
But I hate it, when I get an error msg, when I did everything correct.
Writing the config object to the disk with yaml or json is one of these cases.

While the advantages of this feature are nice, but not necessary, the disadvantages, that are known until now are terrible: Implicit changing the type and no support to dump the config.

@JarnoRFB
Copy link
Collaborator

Well this turns out to make more trouble than expected. In the original discussion with @thequilo we decided that it was a priority to keep type checking intact. @AdamGleave you are of course right that type checking against explicit types it bad practice anyway. Checking against collections.abc should be the way the go. Maybe the only downside is that it is not widely used in the community.
However, if we now decide to have read only containers that type check only against collections.abc we could use pyristent right away, so we do not have to maintain our own immutable data structures.
@boeddeker could you please explain me what you mean by

The config is json/pickle/yaml/... incompatible

@JarnoRFB
Copy link
Collaborator

@AdamGleave by the way python 2 support has already been dropped, so no need to write any compatibility code anymore :)

@AdamGleave
Copy link
Contributor Author

For me pickle support is a priority. I'm writing distributed experiments using Sacred, and so objects often get pickled to send to another machine or process, and it's not always obvious where this happens. I expect this is a fairly common use case.

Breaking type checking is sad, but seems to me like the least-bad evil. Inheriting from built-in types have a lot of pitfalls anyway, e.g. this gotcha. It should probably be announced as a breaking change in the changelist though.

pyristent looks nice -- hadn't seen that before. Assuming it does what it says on the tin I'd be in favour of using that over my hacky implementations.

I think @boeddeker meant that just calling e.g. json.dump on a ReadOnlyDict will fail.

@boeddeker
Copy link
Contributor

When you dump custom types with yaml, you get

!!python/object/new:__main__.ReadOnlyDict
dictitems:
  '1': 2
state:
  message: This ReadOnlyDict is read-only!

or when you follow the advice from [1] you can not dump custom types.

I tested now also json. I didn't know that it does an isinstance check , so json still works.
But the people may use other serializes that support build in types, but not custom types (e.g. pickle)

pyristent is even worse than custom types, because it does not duck type the build in types:
e.g.: list.append is inplace pyrsistent.v.append is not inplace.

[1] https://github.com/yaml/pyyaml/wiki/PyYAML-yaml.load(input)-Deprecation

@JarnoRFB
Copy link
Collaborator

My suggestion would be that we specify the requirements that we have towards serializability of the config objects clearly using some tests. Then we can search more specifically for possible solutions.

@Qwlouse
Copy link
Collaborator

Qwlouse commented Jul 12, 2019

How about this:

class ReadOnlyDict(dict):
    def __init__(self, container, _message=None):
        super().__init__(container)
        self._message = _message or 'This dict is read-only!'
        
    def _readonly(self, *args, **kwargs):
        raise SacredError(
            self.message,
            filter_traceback='always'
        )
        
    def __reduce__(self):
        return ReadOnlyDict, (dict(self.items()), self._message)
       
    clear = _readonly
    pop = _readonly
    popitem = _readonly
    setdefault = _readonly
    update = _readonly
    __setitem__ = _readonly
    __delitem__ = _readonly

This can be serialized/deserialized with pickle and json and should work the same as the current version. I removed the inheritance from ReadOnlyContainer because it is a bit tricky to get right, and I don't think it added much. But it might be possible to bring it back if a common base-class turns out to be important.

@boeddeker
Copy link
Contributor

There are always serializers that check the type and don't use isinstance.
And for a serializer this is reasonable.

yaml.safe_dump is a good example where the ReadOnlyDict fails.
I think it is impossible to define a ReadOnlyDict that can be dumped with yaml.safe_dump.

@AdamGleave
Copy link
Contributor Author

@Qwlouse this seems to be a good trade-off of the different requirements. It's a shame YAML doesn't support it out the box but yaml.safe_dump(copy.deepcopy(cfg)) should still work (if we add back in the methods that are in the current implementation).

@boeddeker
Copy link
Contributor

@AdamGleave : It is correct, that YAML does not support this.
First for security reasons, see warning in pickle documentations and second automatic fallback for a serialization protocol that supports custom types is a bad idea.

@gabrieldemarmiesse
Copy link
Collaborator

Wouldn't SETTINGS.CONFIG.READ_ONLY_CONFIG = False fix this issue?

@AdamGleave
Copy link
Contributor Author

Wouldn't SETTINGS.CONFIG.READ_ONLY_CONFIG = False fix this issue?

This is a viable workaround, but I think it has some drawbacks as a solution. You obviously lose the benefit of read-only dictionaries/lists. Serialization also seems like a very common use case, so I'd strongly advocate for making it work out of the box, without setting obscure flags. If that's not possible, then I'd suggest at least having an informative error message. In my case, it was quite hard to even identify that Sacred was the problem (pickle failed on a function, which happened to close over a data structure, which deeply inside of it contained a Sacred config).

@stale
Copy link

stale bot commented Nov 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 21, 2019
@stale stale bot closed this Nov 28, 2019
@shwang
Copy link
Contributor

shwang commented Mar 29, 2020

Well this turns out to make more trouble than expected. In the original discussion with @thequilo we decided that it was a priority to keep type checking intact. @AdamGleave you are of course right that type checking against explicit types it bad practice anyway. Checking against collections.abc should be the way the go. Maybe the only downside is that it is not widely used in the community.
However, if we now decide to have read only containers that type check only against collections.abc we could use pyristent right away, so we do not have to maintain our own immutable data structures.

@JarnoRFB: To check that I'm understanding this thread right (and that this hasn't changed over the past few months), would the Sacred maintainers likely accept a variant of this PR that uses pyrsistent.PMap to replace ReadOnlyDict?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ReadOnlyDict not pickleable
7 participants