Skip to content

🐛 FIX: Allow for dereferencing of saved instance state#191

Merged
chrisjsewell merged 3 commits intodevelopfrom
fix/save
Jan 13, 2021
Merged

🐛 FIX: Allow for dereferencing of saved instance state#191
chrisjsewell merged 3 commits intodevelopfrom
fix/save

Conversation

@chrisjsewell
Copy link
Copy Markdown
Member

As a toy example:

import plumpy

persister = plumpy.InMemoryPersister()

class PersistWorkChain(plumpy.WorkChain):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.outline(
            cls.step1,
            cls.step2,
        )
        
    def step1(self):
        self.ctx.step = 1
        persister.save_checkpoint(self, 'step1')

    def step2(self):
        self.ctx.step = 2
        persister.save_checkpoint(self, 'step2')

        
workchain = PersistWorkChain()
workchain.execute()
{k: v['_context'] for k, v in persister._checkpoints[workchain.pid].items()}
{'step1': {'step': 2}, 'step2': {'step': 2}}

This obviously won't affect Persisters that serialize the data, but it is definitely a bug in this instance.

Strictly we should use copy.deepcopy, but this could run into issues if a value was not pickleable, and then I'm not sure how exceptions should be handled

@muhrin
Copy link
Copy Markdown
Collaborator

muhrin commented Jan 11, 2021

Good catch. Yes, this is a problem (for InMemoryPersister at least). There are (at least) two strategies one can adopt here:

  1. What you've implemented, as you say, ideally with a deepcopy as here we're just kicking the can down one reference along.
  2. In a 'proper' programming language one woud have things that enter the bundle be constant references (or pointers to constant data) and the contract with the persister is that it has to serialise or copy the bundle immediately after the save_instance_state call. Naturally, this limits the the assumptions a piece of code can make about a Bundle produced by a save_instance_state call.

Option 1. is almost certainly safer while option 2 is, in general, more performant (and potentially safer if we don't use a deepcopy). Option 2. would require a change to InMemoryPersister (and maybe others if they're relying on having a copy).

Do you have a preference? I'm fairly happy with either so long as there is just a sentence added to the save_instance_method docstring explaining the choice.

@chrisjsewell
Copy link
Copy Markdown
Member Author

Hmm, yeh I think (2) then, because I'm "scared" that deepcopy has the potential to introduce failures.
save_instance_state should probably be named something like get_instance_state, to not imply it is creating a "static" state, but that's way too much effort lol.

@chrisjsewell
Copy link
Copy Markdown
Member Author

@muhrin how about a7b8aca?

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jan 11, 2021

Codecov Report

Merging #191 (a7b8aca) into develop (16e7713) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #191      +/-   ##
===========================================
+ Coverage    90.94%   90.94%   +0.01%     
===========================================
  Files           22       22              
  Lines         2922     2924       +2     
===========================================
+ Hits          2657     2659       +2     
  Misses         265      265              
Impacted Files Coverage Δ
plumpy/mixins.py 91.31% <ø> (ø)
plumpy/persistence.py 93.94% <100.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 16e7713...a7b8aca. Read the comment docs.

Comment thread plumpy/mixins.py Outdated
@chrisjsewell chrisjsewell changed the title 🐛 FIX: copy context when saving instance state 🐛 FIX: Allow for dereferencing of saved instance state Jan 13, 2021
@chrisjsewell chrisjsewell merged commit 2001fd3 into develop Jan 13, 2021
@chrisjsewell chrisjsewell deleted the fix/save branch January 13, 2021 11:15
unkcpz pushed a commit to unkcpz/plumpy that referenced this pull request Dec 14, 2024
When saving the context of the workchain,
a reference to it is supplied which will change when the context is updated
(see `ContextMixin.save_instance_state`).
This is an issue for persisters that do not serialize the saved state,
in particular the `InMemoryPersister`, since the context will not remain static.

In this commit, we add a `dereference` option to the `Bundle.__init__`,
which will deepcopy the returned save data, to remove pointers.
agoscinski pushed a commit to agoscinski/plumpy that referenced this pull request Apr 13, 2026
When saving the context of the workchain,
a reference to it is supplied which will change when the context is updated
(see `ContextMixin.save_instance_state`).
This is an issue for persisters that do not serialize the saved state,
in particular the `InMemoryPersister`, since the context will not remain static.

In this commit, we add a `dereference` option to the `Bundle.__init__`,
which will deepcopy the returned save data, to remove pointers.
agoscinski pushed a commit that referenced this pull request Apr 13, 2026
When saving the context of the workchain,
a reference to it is supplied which will change when the context is updated
(see `ContextMixin.save_instance_state`).
This is an issue for persisters that do not serialize the saved state,
in particular the `InMemoryPersister`, since the context will not remain static.

In this commit, we add a `dereference` option to the `Bundle.__init__`,
which will deepcopy the returned save data, to remove pointers.
agoscinski pushed a commit that referenced this pull request Apr 13, 2026
When saving the context of the workchain,
a reference to it is supplied which will change when the context is updated
(see `ContextMixin.save_instance_state`).
This is an issue for persisters that do not serialize the saved state,
in particular the `InMemoryPersister`, since the context will not remain static.

In this commit, we add a `dereference` option to the `Bundle.__init__`,
which will deepcopy the returned save data, to remove pointers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants