Custom pickler #615

Closed

Conversation

dmitriy-serdyuk
Contributor

Via @bartvm in the comments on #685 (adding here for posterity):

[This PR] combines the previous Checkpoint extension (which pickled the main loop) and the Dump extension (which collected the values of the parameters and saved them to an NPZ file). It does so by making use of pickle's "persistent ID" support (which we already merged into Theano a while ago, Theano/Theano#2802).

In practice, it means that during pickling, each shared variable gets intercepted by the PersistentSharedVariableID class and is assigned a name (if it's owned by a brick, the path to the brick is used; otherwise the name of the shared variable is used). The value of the shared variable is saved as an NPY file to a ZIP file, together with the pickled main loop (which contains string references to the names of the parameters). The final result is a single ZIP file with a pickled main loop and a series of NumPy arrays containing the values of all the shared variables contained within the main loop.
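Pickle's persistent ID support works as follows: a Pickler subclass defines a persistent_id method, and any object for which it returns something other than None is left out of the pickle stream, with only the returned reference stored in its place. A minimal sketch of the idea (the class name and the duck-typed check are illustrative, not this PR's actual code):

import pickle

class ParameterPickler(pickle.Pickler):
    """Sketch: divert shared-variable-like objects out of the stream."""
    def __init__(self, file, arrays, protocol=None):
        pickle.Pickler.__init__(self, file, protocol)
        self.arrays = arrays  # name -> value, written to the zip afterwards

    def persistent_id(self, obj):
        if hasattr(obj, 'get_value') and getattr(obj, 'name', None):
            self.arrays[obj.name] = obj.get_value()
            return obj.name  # only this reference ends up in the stream
        return None  # everything else is pickled as usual

On the way back, an Unpickler subclass defines persistent_load to map each stored reference back to a value.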

Advantages over the old method:

  • No need to use both Checkpoint and Dump. This single method allows you to resume training seamlessly, while also allowing you to inspect and share parameter values across platforms; no unpickling needed.
  • Parameters are no longer silently overwritten if they have the same name and/or don't belong to a brick, as happens with Dump.
  • The entire serialization situation gets (a) less confusing and (b) documented (because the current docs are out of date, and still talk about using Dill).

if obj.name:
    if obj.name == 'pkl':
        ValueError("can't pickle shared variable with name `pkl`")
if hasattr(obj, 'tag'):
Member

Don't shared variables always have the tag attribute? It could be an empty dictionary though if it's not annotated.

Thinking about it, this class should probably inherit from PersistentNdarrayID. Right now, when super gets called, doesn't the parent class (PersistentSharedVariableID) simply override the name you've just created?

Contributor Author

Oh, funny: it was working because I made a wrong super call.

@dmitriy-serdyuk
Contributor Author

I suppose we won't need DumpManager anymore, but the load_parameters and save_parameters functions might be useful for some people.

I'd like to keep the LoadFromDump extension. To make it work with the new serialization, I'll pickle the log and iteration state separately.

Do you agree?

@bartvm
Member

bartvm commented May 5, 2015

We'd only need load_parameters, right? There's no real need for save_parameters, considering that it's exactly the same as dump, except perhaps that save_parameters could raise an error for non-unique names.

I would change LoadFromDump to something like Load or Resume then, because the reason we had "dump" was to make a distinction between "checkpointing" (pickling everything) and "dumping" (which saved the parameters, but neglected to save a lot of other things). Ideally we get rid of this distinction, so then there will just be Checkpoint, Resume, load_parameters, and potentially save_parameters as a very thin wrapper around dump (for symmetry's sake).
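If kept, that wrapper could be as thin as the following sketch (function names are taken from this discussion; the duplicate-name check is only suggested, not implemented here):

def save_parameters(obj, file_handler, **kwargs):
    # Thin wrapper around dump, kept for symmetry with load_parameters.
    # A stricter version would first check that all parameter names are
    # unique and raise an error otherwise.
    dump(obj, file_handler, **kwargs)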

@dmitriy-serdyuk
Contributor Author

@bartvm, the problem was completely different, and I've solved it now.

But now blocks-continue doesn't work for the reason we discussed: objects from the global namespace cannot be pickled. And actually it didn't work with the normal pickler either.

I propose to remove blocks-continue and force people to use the Load extension.

@dmitriy-serdyuk
Contributor Author

It is also possible to use a construction in a script like

if __name__ == '__main__':
    if continue_:  # `continue` itself is a reserved word in Python
        continue_training(path)
    ...

@bartvm
Member

bartvm commented May 7, 2015

So blocks-continue always had this limitation? Or did something change with this new approach?

The Load extension doesn't do the same thing though. The purpose of pickling is being able to completely reconstruct the main loop, including the state of extensions, etc. So it doesn't make sense to expect the user to first construct a MainLoop instance, because it would be entirely replaced by whatever the Load extension unpickles (and performing this replacement from within the extensions would be a mess).

If this limitation has always existed, I'd argue that we just need to document it clearly. I imagine that something like this should work fine, right?

# my_model.py
def some_helper_function():
    pass


# train_my_model.py
from my_model import some_helper_function
main_loop = MainLoop(...)  # With Checkpoint, depends on some_helper_function
main_loop.run()

Because when unpickling, it can load the function from my_model.

@dmitriy-serdyuk
Contributor Author

Yes, it always had this limitation.

Yes, this is going to work, but we cannot provide a general script that is able to continue any kind of experiment. So I think the best we can provide is a continue_training function and explain in the documentation that it should be used like

# define or import objects from the global namespace as you defined them before
if __name__ == '__main__':
    if continue_:
        continue_training(path)
    # Construct main loop with Checkpoint extension

@bartvm
Member

bartvm commented May 7, 2015

Right, we could just make it a function instead of a script, but we should make it clear that using it within a script isn't necessarily the only way to use it. Technically speaking, the user could have just created their model in the interpreter, without any sort of script anywhere. They should still be able to fire up a new interpreter and call continue_training(path).

Anyway, that sounds like a reasonable solution to me. The one thing that I think would be very nice (though I have no idea how difficult this is) is being able to give a warning whenever the main loop depends on functions defined in the __main__ namespace. It could say something like:

WARNING: Main loop depends on the function `_array_tuple` in the `__main__` namespace.

Because of limitations to pickling, this means that you will not be able to resume
your model outside of a namespace containing this function. In other words, you
can only call `continue_training` from within this script.

Looking at pickle.py, maybe it would work by overriding save_global to check whether the module is '__main__' or something?
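A minimal sketch of that idea for Python 3 (it has to target the pure-Python pickle._Pickler, since the C implementation cannot be customized this way; the warning text is illustrative):

import warnings
from pickle import _Pickler
from types import FunctionType

class PicklerWithWarning(_Pickler):
    dispatch = _Pickler.dispatch.copy()

    def save_global(self, obj, name=None):
        module = getattr(obj, '__module__', None)
        if module == '__main__':
            warnings.warn('Pickling depends on `{}` from `__main__`; it can '
                          'only be unpickled in a namespace that defines '
                          'it.'.format(getattr(obj, '__name__', obj)))
        _Pickler.save_global(self, obj, name=name)

    # The dispatch table stores plain functions, so the override must be
    # re-registered for functions to actually reach it.
    dispatch[FunctionType] = save_global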

@dmitriy-serdyuk
Contributor Author

@bartvm, is there any documentation on Python 3 pickle internals? This one doesn't seem complete.

@dmitriy-serdyuk
Contributor Author

How can I test that arrays were not pickled in the regular way? Is there a better way than checking the pkl file size?

@bartvm
Member

bartvm commented May 12, 2015

You could load the zip file using numpy.load and check whether the array is one of the keys there, right?

I don't know of any documentation on pickle's internals that is any better... It's pretty badly documented I'm afraid. You can look at custom picklers like joblib's to get a better idea of how to write custom ones. Is there anything in particular you're running into?
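For example, a test along those lines might look like this (the archive name 'model.zip' and the parameter name are hypothetical; this PR stores the pickled main loop under 'pkl'):

import zipfile

import numpy

with zipfile.ZipFile('model.zip') as archive:
    names = archive.namelist()
# The parameter should appear exactly once, next to the pickled main loop.
assert names.count('mlp/linear/W_0.npy') == 1
assert 'pkl' in names
# numpy.load reads the same archive, so the array is reachable without
# unpickling anything ('.npy' extensions are stripped from the keys).
assert 'mlp/linear/W_0' in numpy.load('model.zip').files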

@dmitriy-serdyuk
Contributor Author

Yes, but I'd like to check that it was not saved twice. If something goes wrong in the __call__ method, it may add a NumPy file to the zip without returning an id.

It seems there is no save_global function in Python 3, or its signature is different.

@bartvm
Member

bartvm commented May 12, 2015

I just opened up pickle.py from Python 3, and it seems the class is called _Pickler and the method looks like def save_global(self, obj, name=None).

if isinstance(obj, SharedVariable):
    if obj.name:
        if obj.name == 'pkl':
            ValueError("can't pickle shared variable with name `pkl`")
Member

Forgot the raise statement here.

@bartvm
Member

bartvm commented May 13, 2015

I think this is starting to look pretty good, and it'd be great if some other people could have a look, because it's a pretty big change (@rizar, @dwf).

It replaces the Dump and Checkpoint extensions with a custom pickler (an extension of the ones we recently merged into Theano as part of Theano/Theano#2802). The result is a single zip archive that contains the pickled main loop (as a file called pkl) and all the shared variables stored as NPY files. The names of these arrays are either the path leading to them (numbered if need be, e.g. mlp/linear/W_0, mlp/linear/W_1) or the name of the shared variable (also numbered if necessary).

The advantage is that it effectively does what Dump does, but without needing a Model instance, while still pickling the entire main loop (so the state of e.g. extensions is maintained). The resulting zip archive can be loaded using NumPy's numpy.load function if necessary in order to inspect parameters without unpickling.
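Inspecting parameters from such an archive might then look like this (the file name is hypothetical):

import numpy

archive = numpy.load('main_loop.zip')
print(archive.files)           # e.g. ['pkl', 'mlp/linear/W_0', ...]
W = archive['mlp/linear/W_0']  # a plain ndarray; no unpickling involved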

@dmitriy-serdyuk
Contributor Author

Maybe I need to rebase it because I made a lot of stupid commits changing things back and forth.

@bartvm
Member

bartvm commented May 13, 2015

Yeah, a rebase would be nice eventually. Also, I made a comment about a missing raise statement.

@bartvm
Member

bartvm commented May 13, 2015

Oh, and we'll need a bunch of tests I guess.

@bartvm
Member

bartvm commented May 13, 2015

The main test I'd like to add is to make sure that #480 is fixed for both brick parameters as well as normal shared variables.

@dmitriy-serdyuk
Contributor Author

For my log it decreases pickling time from 5.3 s to 1.4 s, while dumping without a persistent id takes 238 ms with cPickle and 946 ms with pickle.

@dmitriy-serdyuk
Contributor Author

How can I create a pull request to two repositories simultaneously? Tests fail for the blocks-examples repository.

@bartvm
Member

bartvm commented May 25, 2015

The way I've done it is by making a pull request to blocks-examples, and then changing this PR's .travis.yml to do a git clone of your branch instead. Then, if everything works and things are ready to merge, we change the git clone line back just before merging.

@bartvm
Member

bartvm commented May 25, 2015

Great, so two things left to do then:

  • Undo deleting blocks-continue; if people like @rizar use it we might as well keep it, knowing that it sometimes won't work.
  • Make dump optionally use cPickle by doing something like

# zipfile, closing and cPickle imports added for completeness; the other
# names (DEFAULT_PROTOCOL, PersistentParameterID, PicklerWithWarning,
# pkl_utils) come from the surrounding module.
import zipfile
from contextlib import closing

from six.moves import cPickle


def dump(obj, file_handler, protocol=DEFAULT_PROTOCOL,
         persistent_id=PersistentParameterID, use_cpickle=False):
    with closing(zipfile.ZipFile(file_handler, 'w', zipfile.ZIP_DEFLATED,
                                 allowZip64=True)) as zip_file:
        def func(f):
            if use_cpickle:
                p = cPickle.Pickler(f, protocol=protocol)
            else:
                p = PicklerWithWarning(f, protocol=protocol)
            p.persistent_id = persistent_id(zip_file)
            p.dump(obj)
        pkl_utils.zipadd(func, zip_file, 'pkl')

@rizar
Contributor

rizar commented May 26, 2015

On my log in Python 3, the slowdown from switching from pickle.dump and pickle.load to the functions from this PR is now roughly 3x. This is much less than before, good job, but still quite a lot of overhead. I will try it with Python 2 as well when you guys support cPickle there.

@bartvm
Member

bartvm commented May 29, 2015

Oops, my bad on the formatting errors, but at least tests are passing again. We need a bunch of tests now, because I realised that the previous PR was actually not storing a lot of things in the ZIP file (and the initial one was storing too many things...).

@rizar You might want to re-run your benchmarks, and test the support for cPickle.

@bartvm
Member

bartvm commented May 29, 2015

Needs to be rebased on top of #680 when that is merged.

@bartvm
Member

bartvm commented May 29, 2015

#680 was merged, so could you rebase, @dmitriy-serdyuk? Earlier the doctests were failing.

@@ -61,14 +61,14 @@ def main(save_to, num_batches, continue_=False):
            step_rule=Scale(learning_rate=0.001)),
        get_data_stream(range(100)),
        model=Model(cost),
        extensions=([LoadFromDump(save_to)] if continue_ else []) +
        extensions=([Load(save_to)] if continue_ else []) +
Member

Oops, missed this line, needs to be deleted

name = '{}.{}'.format(
    BRICK_DELIMITER.join([brick.name for brick in
                          get_brick(obj).get_unique_path()]),
    [obj.name]
Member

@dmitriy-serdyuk The current error is my bad, this should be obj.name instead of [obj.name]

@bartvm
Member

bartvm commented May 29, 2015

So there's only one thing left to do that I can think of, which is changing Load to use numpy.load instead of unpickling when loading the parameters, and making the loading of the iteration state and log optional. This will allow someone to load the parameters from a model, even if unpickling fails completely.
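The parameter-loading half of that could be as simple as this sketch (the function name comes from the discussion above; 'pkl' is the main loop's entry name in this PR):

import numpy

def load_parameters(path):
    # Read the parameter arrays straight from the zip archive, skipping
    # the pickled main loop stored under 'pkl'. This works even when
    # unpickling the main loop itself would fail.
    archive = numpy.load(path)
    return {name: archive[name] for name in archive.files if name != 'pkl'}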
