Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make config read only #472

Merged
merged 2 commits into from
Jun 6, 2019
Merged

Make config read only #472

merged 2 commits into from
Jun 6, 2019

Conversation

thequilo
Copy link
Collaborator

@thequilo thequilo commented May 3, 2019

Add new ReadOnlyContainer types and use them for the config. This solves #370.

Copy link
Collaborator

@JarnoRFB JarnoRFB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that we have progress on this! Also looking forward on using the ReadOnlyContainers in incense.

sacred/config/custom_containers.py Show resolved Hide resolved
sacred/config/custom_containers.py Outdated Show resolved Hide resolved
sacred/config/custom_containers.py Show resolved Hide resolved
@thequilo
Copy link
Collaborator Author

thequilo commented May 3, 2019

I now added some more unittests for the container types with and without the experiment.

I didn't know that UserDict and UserList exist. What is the benefit of using them?

@thequilo
Copy link
Collaborator Author

thequilo commented May 3, 2019

Also, currently test_double_nested_config fails due to the modifications because it is doing _config.pop('seed'). Should we include some mechanism to create a writable copy of the read-only containers? Maybe implement __copy__ and __deepcopy__ to return a mutable dict?

@JarnoRFB
Copy link
Collaborator

JarnoRFB commented May 4, 2019

I didn't know that UserDict and UserList exist. What is the benefit of using them?

These classes where designed for inheritance, so all the methods that access some internal state go through the dunder methods (__setitem__, __delitem__). This would mean that you only have to overwrite these methods to get a functioning read only container. Other public methods like clear should than work as expected right away. You can read more about it here https://treyhunner.com/2019/04/why-you-shouldnt-inherit-from-list-and-dict-in-python/

@thequilo
Copy link
Collaborator Author

thequilo commented May 5, 2019

That makes sense. But it is apparently not correct that all methods that access the internal state go through the dunder methods, at least not for UserList. Also for UserDict, setdefault does not use __setitem__ when there is already a value set for that key and clear only calls __delitem__ when the dict is not empty (which makes sense), so those two methods must again be overwritten explicitly. I made the containers inherit from the user types, but it did not make the code cleaner, and certainly not faster.

When used in interactive mode (ipyhton, jupyter notebook), it may make sense to overwrite all methods that can potentially change the contents to provide the user with a corresponding docstring.

Another possibility would be to completely drop the inheritance from any dict or list type and write a wrapper around it that delegates the method calls to the wrapped object.

@thequilo
Copy link
Collaborator Author

thequilo commented May 5, 2019

The approach to drop the inheritance to all list/dict types would look something like this:

class ReadOnlyList:
    def __init__(self, *args, **kwargs):
        self.data = list(*args, **kwargs)

    def __repr__(self):
        return repr(self.data)

    def __lt__(self, other):
        return self.data < self.__cast(other)

    def __le__(self, other):
        return self.data <= self.__cast(other)

    def __eq__(self, other):
        return self.data == self.__cast(other)

    def __gt__(self, other):
        return self.data > self.__cast(other)

    def __ge__(self, other):
        return self.data >= self.__cast(other)

    def __cast(self, other):
        return other.data if isinstance(other, self.__class__) else other
    
    def __contains__(self, item):
        return item in self.data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[i]

    def __add__(self, other):
        if isinstance(other, self.__class__):
            return self.__class__(self.data + other.data)
        elif isinstance(other, type(self.data)):
            return self.__class__(self.data + other)
        return self.__class__(self.data + list(other))

    def __radd__(self, other):
        if isinstance(other, self.__class__):
            return self.__class__(other.data + self.data)
        elif isinstance(other, type(self.data)):
            return self.__class__(other + self.data)
        return self.__class__(list(other) + self.data)

    def __iadd__(self, other):
        if isinstance(other, self.__class__):
            self.data += other.data
        elif isinstance(other, type(self.data)):
            self.data += other
        else:
            self.data += list(other)
        return self

    def __mul__(self, n):
        return self.__class__(self.data * n)

    __rmul__ = __mul__

    def __imul__(self, n):
        self.data *= n
        return self

    def copy(self):
        return self.__class__(self)

    def count(self, item):
        return self.data.count(item)

    def index(self, item, *args):
        return self.data.index(item, *args)

which is basically a copy of UserList without the modifying methods.

@boeddeker
Copy link
Contributor

Maybe two alternatives should be considered:

  • Use tuple and types.MappingProxyType instead of list and dict
  • Use deepcopy for all arguments

Advantages of the second:

  • no captured function can change the global config. This is guaranteed.
  • the datatype is preserved
  • isinstance checks work

When the ReadOnlyList does not inherit from list, why not using tuple? Tuple is an immutable (ReadOnly) list.

@thequilo
Copy link
Collaborator Author

thequilo commented May 7, 2019

Use tuple and types.MappingProxyType instead of list and dict

Then, isinstance checks don't work anymore. They neither work with UserList and UserDict though.

When the ReadOnlyList does not inherit from list, why not using tuple? Tuple is an immutable (ReadOnly) list.

Correct. Does not make sense to make ReadOnlyList not inherit from list.

Use deepcopy for all arguments

There are two possibilities: copy when (a) the configuration is initialized or (b) when the function is called.

  • a) This makes sure that the global configuration is not changed. But still, the (captured function) local configuration could accidentally be changed and the changes would persist between successive calls of the same captured function, again, not recorded.
  • b) This solves the problems of (a) and lets the user do anything he wants inside the captured function (which is nice), but performing a deepcopy operation every time a captured function is called could drastically increase the overhead, depending on the size of the configuration.

If we assume that the only container types in the configuration are tuple, list and dict (which is obviously not guaranteed), I would prefer to subclass from those types and raise an exception if there is an attempt to change something. Then, isinstance checks still work and the user gets feedback where exactly he tries to change the configuration.

We could also save a copy of the configuration and check after each function call if any value was changed. This probably introduces less overhead than doing a deepcopy on every call, and it works without any new types.

I'm now completely unsure about what is the best solution for this. @JarnoRFB what is your opinion on this?

@JarnoRFB
Copy link
Collaborator

JarnoRFB commented May 8, 2019

This seems more complex than I initially thought. To throw something else in the arena we might consider using freeze for converting everything to pyristent data structures https://github.com/tobgu/pyrsistent. However, I have not personally worked with it, but maybe it is good for our usecase

@thequilo
Copy link
Collaborator Author

thequilo commented May 9, 2019

This seems more complex than I initially thought.

Definitely.

I listed all options that currently are under debate below. What we might want to achieve:

  • keep isinstance checks intact
  • give user feedback where the write to the config values happens
  • provide a way to get a mutable copy of the config
  • low overhead

No custom types

  • types are preserved -> isinstance and similar things still work as expected
  • works for any type, even user-defined classes, not just the built-in collection types

deepcopy on every call

  • no feedback, user can do anything in a captured function
  • high overhead

deepcopy on initialization

  • does not prevent changes of local config values that persist between function calls
  • no feedback, invalid writes to the config are not detected

deepcopy on initialization and equals checks after each function call

  • low-resolution feedback, invalid writes are detected
  • a "mutable" copy can be obtained in the captured function doing a deepcopy again
  • lower overhead than deepcopy on every call

Special Types

  • Everything listed here introduces new types for dict and list, and only prevents writes to those collections. Any write to user-defined types is not prevented.
  • have a low overhead during runtime (for the call of a captured function), only introduce overhead during initialization

pyrsistent

  • isinstance checks don't work
  • new dependency, maybe too complicated for our use case
  • no feedback
  • a modified copy can be created from the config

subclass of UserDict/UserList

  • isinstance does not work as expected
  • feedback is possible on a very fine-grained level

subclass of list/dict

  • isinstance intact
  • feedback on very fine-grained level possible
  • probably faster than UserDict/UserList

tuple/types.MappingProxyType

  • breaks isinstance
  • uses the same type for list and tuple config values, which may be confusing or break existing code

I think we shouldn't break isinstance checks, so from the list of special types only the subclasses of list/dict are possible. But those still only prevent writes to lists and dicts and don't work with user defined types. At the moment, I think we should keep a copy of the original config and check if the config passed to the function was modified. This works with all types, does not require a deepcopy on every call of a captured function and keeps the types intact while still detecting invalid writes and giving the user feedback.

@thequilo
Copy link
Collaborator Author

What about something along these lines? (look into captured_function.py)

@JarnoRFB
Copy link
Collaborator

Thanks for the detailed and structured overview! After thinking a bit about it I would say that it is more or less impossible to guarantee immutability to data that is intrinsically mutable. The only solution that disables changes that persist between function calls is really deepcopy on every call. But as you said that might lead to large overheads. I believe deepcopy on initialization and equals checks after each function call that you implemented in your last PR will only work for custom types that implement __eq__. For everything else, comparisons against deep copied versions will fail. Therefore I would go with the following.

  1. subclass of list/dict so we do at least prevent obvious mistakes and keep isinstance checks intact
  2. provide an optional setting for deepcopy on every call in a separate PR. I think this would be useful for everyone that does not put custom types in their configs
  3. As an idea to achieve something close to equality checks we might consider comparing hash values of configs as proposed in Suggestion: MD5-hashed config for duplication checking #467 .

@thequilo
Copy link
Collaborator Author

I believe deepcopy on initialization and equals checks after each function call that you implemented in your last PR will only work for custom types that implement eq. For everything else, comparisons against deep copied versions will fail.

You are right. But also computing a hash of the configuration only works for types that implement __hash__, and btw, dict is not hashable. It is a completely different discussion of how to compute a hash value for the configuration, and I assume that it involves some serious thinking about it.

I'll then implement your suggestions and squash the commits of this pull request in the following week.

@JarnoRFB
Copy link
Collaborator

Thanks for the change! Time to get this merged. I think the CI error could just be fixed by explicitly setting the seed in config.

@ex.config
def config():
    a = 1
    seed = 42

...

@ex.main
def main(_config):
    assert _config == {
        'a': 1,
        'seed': 42,
        'sub_sub_ing': {'d': 3},
        'sub_ing': {'c': 2},
        'ing': {'b': 1}
    }, _config

    ing_main()
    sub_ing_main()
    sub_sub_ing_main()

Then just fix the flake8 errors and I think this is good to go.

@thequilo thequilo force-pushed the read_only_config branch 2 times, most recently from 7203945 to 585153f Compare June 6, 2019 06:51
@coveralls
Copy link

coveralls commented Jun 6, 2019

Coverage Status

Coverage increased (+0.2%) to 84.617% when pulling a93d46a on thequilo:read_only_config into fe3a95a on IDSIA:master.

@thequilo
Copy link
Collaborator Author

thequilo commented Jun 6, 2019

Python 2.7 is driving me crazy. And Codacity is complaining about unused variables in config scopes, intentional uses of assert in testcases and intentionally using type==... instead of isinstance.

@JarnoRFB
Copy link
Collaborator

JarnoRFB commented Jun 6, 2019

@thequilo thanks! Looks good now. I hope to get rid of py 27 asap, but I can't make a release myself. If it takes to long, we could also just switch without a release... Codacy is bullshit anyway, most of the time just false alarms. Merging then!

@JarnoRFB JarnoRFB merged commit cd1501a into IDSIA:master Jun 6, 2019
@Qwlouse
Copy link
Collaborator

Qwlouse commented Jun 20, 2019

Wow, thanks @JarnoRFB and @thequilo. This is very well thought through feature. Thanks a lot for this!

And I agree about codacy: I thought it was a useful tool to improve code quality at the time. But it has mostly produced false alarms, so I am removing it again.

rueberger added a commit to rueberger/sacred that referenced this pull request Jun 20, 2019
* 'master' of https://github.com/IDSIA/sacred:
  Release 0.7.5
  update dependencies to safer version of SQLAlchemy
  fixed numpy deprecation warning
  fix yaml deprecation warning
  Make config read only (IDSIA#472)
  Remove broken codacy badge (IDSIA#486)
  Add queue based observer (IDSIA#471)
  Run CI against python 36 and 37 (IDSIA#485)
  Make failed mongo observer dump configurable (IDSIA#462)
  Make stale time longer (IDSIA#484)
  Fix a bug where a unicode char in README.rst would fail install (IDSIA#481)
@boeddeker
Copy link
Contributor

Now overwriting entries in the config does not work, but when someone did this in the right way, this also does not work:

import copy
def foo(_config):
     _config = copy.deepcopy(_config)
     _config['something'] = ...

How about changing the datatype back to the original datatype when using copy?

import copy

class ReadOnlyDict(dict):
    ...
    def __copy__(self):
        return {**self}

    def __deepcopy__(self, memo):
        d = dict(self)
        return copy.deepcopy(d, memo=memo)
>>> type(copy.copy(ReadOnlyDict()))
dict
>>> type(copy.deepcopy(ReadOnlyDict()))
dict

@JarnoRFB
Copy link
Collaborator

@boeddeker sounds very reasonable to me. PR welcome!

@thequilo
Copy link
Collaborator Author

Hmm, this actually was in the list of features that the read-only contains should have ("provide a way to get a mutable copy of the config"), but it somehow didn't get implemented

@boeddeker
Copy link
Contributor

Another thing, is it intended to lose the original datatype in the config?
make_read_only makes an isinstance check but does not preserve the original datatype.

>>> class MyDict(dict):
...     pass
>>> isinstance(MyDict(), dict)
True

Solutions:

  1. Make a type check: type(o, dict)
  2. Use a dynamic type: type('ReadOnlyDict', (ReadOnlyDict, *inspect.getmro(MyDict)), {})()

I prefer the first.

rueberger pushed a commit to rueberger/sacred that referenced this pull request Jun 21, 2019
…d into run_kwargs_fix

* 'run_kwargs_fix' of github.com-rueberger:rueberger/sacred:
  Release 0.7.5
  update dependencies to safer version of SQLAlchemy
  fixed numpy deprecation warning
  fix yaml deprecation warning
  Make config read only (IDSIA#472)
  Remove broken codacy badge (IDSIA#486)
  Add queue based observer (IDSIA#471)
  Run CI against python 36 and 37 (IDSIA#485)
  Make failed mongo observer dump configurable (IDSIA#462)
  Make stale time longer (IDSIA#484)
  Fix a bug where a unicode char in README.rst would fail install (IDSIA#481)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants