
MAINT/ENH: SaveModel based serialization #128

Merged: 57 commits into develop on Jan 16, 2021

Conversation

adriangb (Owner) commented Nov 9, 2020

Building upon the test added in #126.

This PR implements:

  1. SaveModel based serialization.
  2. General serialization for optimizers, metrics and losses.
  3. Tests for serializing optimizers, metrics and losses.

Closes #126, closes #70, closes #125
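
As a rough illustration (an editor's sketch, not code from this PR), the end goal is that a compiled, fitted Keras model survives pickling once the __reduce__ hooks are registered; the toy model here is hypothetical, and it assumes the hooks are registered when scikeras is imported:

import pickle

import numpy as np
from tensorflow import keras

import scikeras  # assumption: importing scikeras registers the __reduce__ hooks discussed below

# Hypothetical toy model; any compiled keras.Model would behave the same way.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(32, 4), np.random.rand(32), verbose=0)

# With SaveModel-based __reduce__ in place, the model round-trips through pickle,
# which is what sklearn grid search and joblib-based parallelism ultimately need.
restored = pickle.loads(pickle.dumps(model))
np.testing.assert_allclose(model.predict(np.ones((1, 4))),
                           restored.predict(np.ones((1, 4))))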

adriangb changed the title from "Savemodel serialization" to "SaveModel based serialization" on Nov 9, 2020
codecov-io commented Nov 9, 2020

Codecov Report

Merging #128 (fcf91ca) into develop (e419d8e) will increase coverage by 0.18%.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop     #128      +/-   ##
===========================================
+ Coverage    99.52%   99.70%   +0.18%     
===========================================
  Files            5        6       +1     
  Lines          627      678      +51     
===========================================
+ Hits           624      676      +52     
+ Misses           3        2       -1     
Impacted Files | Coverage Δ
scikeras/__init__.py | 100.00% <100.00%> (ø)
scikeras/_saving_utils.py | 100.00% <100.00%> (ø)
scikeras/_utils.py | 100.00% <100.00%> (ø)
scikeras/wrappers.py | 99.44% <100.00%> (+0.25%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

adriangb (Owner, Author) commented Nov 9, 2020

Looks like this is only working on tensorflow>=2.4.0rc1 at the moment. The RAM filesystem is not fully functional in 2.3.1, which is the latest stable release. This will probably have to wait until 2.4.0, then.

adriangb (Owner, Author) commented Nov 10, 2020

So tests are mostly working, but in CI runs I'm now seeing `SavedModels saved from Tensorflow V1 or Estimator (any version) cannot be loaded with node filters`, and I'm not able to reproduce it locally...

tests/test_serialization.py (resolved review thread)
keras.Model.__reduce__ = saving_utils.pack_keras_model
keras.losses.Loss.__reduce__ = saving_utils.pack_keras_loss
keras.metrics.Metric.__reduce__ = saving_utils.pack_keras_metric
keras.optimizers.Optimizer.__reduce__ = saving_utils.pack_keras_optimizer
stsievert (Collaborator):

Shouldn't these lines go in the definitions of Model/Loss/etc?

adriangb (Owner, Author):

Do you mean in the class def within Tensorflow? Maybe I'm not understanding...

stsievert (Collaborator):

Oh, I see. These are Keras classes, not SciKeras classes. Then why not put this into a Keras RFC?

adriangb (Owner, Author), Nov 10, 2020:

Right now, these things are very hacky. The optimizer hack is never going to make it past any sort of review (I'm not even sure it works for 100% of cases; it relies on private methods). Model.__reduce__ does a bunch of zip file stuff that is also pretty hacky, and really should be replaced with TF utilities that need to be implemented on the C++ backend (I think) as discussed in the TF PR (tensorflow/tensorflow#39609 (comment)). TL;DR this should be upstreamed, but it's not going to happen in the current form, and not anytime soon.

stsievert (Collaborator):

I think there are two levels for RFCs: the goal and how to achieve that goal.

This PR shows the goal is feasible and practical, and serves as a prototype (_restore_optimizer_weights is clearly a rough prototype). It's clear what needs to be implemented in a cleaner way. Inside TF, there's no downside to depending on private functions.

adriangb (Owner, Author), Nov 11, 2020:

I agree that this PR shows that it's feasible, but a (similar) thing is already at tensorflow/tensorflow#39609.

I guess, let me just ask, what steps do you suggest be taken? Open a new RFC or PR the optimizer weights part specifically?

For this PR/SciKeras, I'm thinking of reverting to the old model serialization code (so that it works on TF 2.2.0 and Windows) but keeping the optimizer stuff (so that all of the good new tests you added pass).

stsievert (Collaborator):

I might open a new PR to TF after this PR is merged. Or maybe add it to that PR with an explanatory comment if the implementation is simple.

If they want an RFC they'll call for it.

adriangb (Owner, Author):

I have no problem opening a hacky PR in TF just for visibility. Just to be clear, we're talking specifically about the optimizer weights part, right (not about Metrics or Model)? And specifically, about the hack to restore the weights for Adam and other optimizers that use slots, not about implementing __reduce__ for all optimizers?

stsievert (Collaborator):

> we're talking specifically about the optimizer weights part, right

My concern is Keras having a working serialization method. I think Models and optimizers are the most important use cases (plus, all metrics I can think of are stateless past some rolling average).

I'm not sure what the difference is between "restoring weights for Adam" and "implementing __reduce__." Do they both accomplish a working serialization method? Do they have a different implementation?

adriangb (Owner, Author):

Yes, they are fundamentally two distinct things.

In this PR, we are fixing the weight restoration bug by implementing some hacks within Model.__reduce__. I reported the bug and showed how this hack works in tensorflow/tensorflow#44670, in hopes that that will make it easier for them to fix it. To actually fix the bug, one would have to implement a real fix within the TF SaveModel ecosystem. I tried and failed to do that, so I don't see any further PRs I can submit to TensorFlow.

As for implementing Model.__reduce__, I think we should update the existing PR with whatever implementation we land on here, sans the hack to restore optimizer weights since that is functionally unrelated.

I hope this makes sense! I know it's a convoluted topic.
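
To make the distinction concrete, here is a minimal sketch (not the PR's actual code) of a __reduce__-style pack/unpack for optimizers using only the public TF 2.x OptimizerV2 API; the separate hack is about getting the rebuilt optimizer to accept slot weights it has not created yet:

import tensorflow as tf

def pack_keras_optimizer(optimizer: tf.keras.optimizers.Optimizer):
    # __reduce__ contract: return (callable, args) so pickle can rebuild the object.
    return unpack_keras_optimizer, (
        type(optimizer),
        optimizer.get_config(),
        optimizer.get_weights(),
    )

def unpack_keras_optimizer(cls, config, weights):
    opt = cls.from_config(config)
    # Calling opt.set_weights(weights) here would fail: slot variables such as
    # Adam's `m`/`v` only exist after the optimizer has seen the model's
    # variables, which is what the separate _restore_optimizer_weights hack
    # in this PR works around.
    return opt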

adriangb (Owner, Author):

> So tests are mostly working, but in CI runs I'm now seeing `SavedModels saved from Tensorflow V1 or Estimator (any version) cannot be loaded with node filters`, and I'm not able to reproduce it locally...

Looks like the problem was something related to collisions of the temporary RAM folders. Fixed by using a uuid instead of id(object), roughly as sketched below.
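
A sketch of the idea (the actual code in the PR may differ):

import uuid

from tensorflow import keras

def save_to_ram(model: keras.Model) -> str:
    # A uuid is collision-free across objects and processes, unlike id(model),
    # whose value can be reused once an object is garbage-collected.
    destination = f"ram://{uuid.uuid4().hex}"  # previously something like f"ram://{id(model)}"
    model.save(destination)
    return destination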

scikeras/_saving_utils.py (resolved review thread, outdated)
scikeras/_saving_utils.py (resolved review thread)
Comment on lines 50 to 55
with tempfile.TemporaryDirectory() as tmpdirname:
model.save(tmpdirname)
b = BytesIO()
with tarfile.open(fileobj=b, mode="w:gz") as tar:
tar.add(tmpdirname, arcname=os.path.sep)
b.seek(0)
adriangb (Owner, Author):

@stsievert how do you feel about saving to a temp directory? It works (that's the best thing I can say about it). But it seems wrong to me to serialize to disk and then load from disk to memory.

stsievert (Collaborator):

Yeah, I wouldn't write to disk. Python's io module has in-memory file pointers with StringIO and BytesIO. Would those work?

adriangb (Owner, Author):

That is what we're using here. The pickle transfer happens as bytes. The issue is that Keras/TF can't save to BytesIO or StringIO. That is:

model.save("some/directory/on/disk")   # works
model.save(BytesIO())  # does not work

So what we are doing here is (see the sketch after this list):

  1. Save TF model to a temporary folder on disk.
  2. Load from that temp dir into a BytesIO object.
  3. Wrap the BytesIO object in Numpy and pickle that.
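
Put together, a minimal sketch of that pack/unpack path (helper names and the numpy wrapping are illustrative; the PR's code differs in its details):

import os
import tarfile
import tempfile
from io import BytesIO

import numpy as np
from tensorflow import keras

def pack_keras_model(model: keras.Model):
    with tempfile.TemporaryDirectory() as tmpdirname:
        model.save(tmpdirname)                        # 1. SavedModel written to a temp dir on disk
        b = BytesIO()
        with tarfile.open(fileobj=b, mode="w:gz") as tar:
            tar.add(tmpdirname, arcname=os.path.sep)  # 2. temp dir loaded into an in-memory tarball
        b.seek(0)
    packed = np.asarray(memoryview(b.getvalue()))     # 3. bytes wrapped in numpy so pickle can handle them
    return unpack_keras_model, (packed,)

def unpack_keras_model(packed: np.ndarray) -> keras.Model:
    b = BytesIO(packed.tobytes())
    with tempfile.TemporaryDirectory() as tmpdirname:
        with tarfile.open(fileobj=b, mode="r:gz") as tar:
            tar.extractall(tmpdirname)                # reverse: tarball back onto disk
        return keras.models.load_model(tmpdirname)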

stsievert (Collaborator):

Why wouldn't pickle work to serialize a Keras model? I thought that's the point of tensorflow/tensorflow#39609.

adriangb (Owner, Author):

This is the same implementation as tensorflow/tensorflow#39609. The only difference is that there I am using TF's ram:// filesystem instead of writing to disk (i.e. write to a disk-like thing in RAM, then load that into BytesIO). But that doesn't work on Windows, which is why I am considering writing to actual disk here. We could stop supporting Windows, in which case we could implement things exactly like tensorflow/tensorflow#39609.

That said, obviously the goal is for this to eventually end up in TensorFlow itself, via tensorflow/tensorflow#39609 (possibly plus other PRs), at which point we'd simply delete the implementation from SciKeras.

adriangb (Owner, Author):

I think that's a good clarification.

The optimizer hack is technically separate from the move from the old Model serialization code (an assortment of Keras methods that is not officially supported by TF/Keras) to this code (which uses SaveModel as the backend).

The move to SaveModel was going to happen at some point because that's what Keras is going to have more support for going forward and that's what will eventually be upstreamed.

That said, from an implementation perspective, these two things are pretty intermingled, and since they're private implementation details anyway, it makes sense to make both changes in the same PR.

stsievert (Collaborator), Nov 11, 2020:

> That said, from an implementation perspective, these two things are pretty intermingled, and since they're private implementation details anyway, it makes sense to make both changes in the same PR.

How complex will the implementation of optimizer.__reduce__ be? Or, how many LOC do you estimate will need to be added to implement optimizer.__reduce__ in tensorflow/tensorflow#39609? I'm only talking about the implementation (not the tests/etc.).

If it's a pretty simple implementation, I think the optimizer.__reduce__ implementation should go in that PR with a comment explaining the change. I'm not sure if the TF team will accept the addition; in the minutes (tensorflow/community#286 (comment)), Francois said:

  • makes sense for models, not so much for optimizers (since state needs to be maintained, we only serialize configs)
  • ...
  • we should not implement at the moment for objects (callbacks, optimizers, metrics, losses with lambdas, etc.) where we only promise to save the config, not the internal state (as this breaks guarantees and API safety; state is usually retrieved via different API calls; results of these can be used to set state too)

adriangb (Owner, Author):

I was referring to putting them together in this PR. But I agree: currently tensorflow/tensorflow#39609 is at the proof-of-concept stage, so adding code that may not make it into the final version, in order to chart/scope the project in general, is probably a good idea.

stsievert (Collaborator):

I'd be inclined to reach out to the TF team to hear what they have to say; they had some not-mild pushback on serializing optimizer state. But I'd wait until after this PR is firmed up.

adriangb (Owner, Author), Nov 11, 2020:

I think the pushback wasn't because "it shouldn't be done" but rather because "it's not implemented cleanly in TF, and we don't want a complex __reduce__ implementation". But yeah, I'll follow up after this PR.

Do we both agree on reviewing/merging this PR into SciKeras and tabling the TF upstreaming discussion until a later date?

adriangb changed the base branch from master to develop on December 29, 2020 05:07
adriangb merged commit 8a5bd8d into develop on Jan 16, 2021
adriangb deleted the savemodel-serialization branch on January 16, 2021 06:21
adriangb added a commit that referenced this pull request Jan 16, 2021
* TST: compare to sklearn.neural_network (#155)

* MAINT/ENH: SaveModel based serialization (#128)

* Bump min TensorFlow version to 2.4.0 to accommodate other changes

Co-authored-by: Scott <stsievert@users.noreply.github.com>
adriangb added a commit that referenced this pull request Jan 16, 2021
Co-authored-by: Scott <stsievert@users.noreply.github.com>
Successfully merging this pull request may close these issues.

MAINT: move tf.keras.Model serialization to use SaveModel
BUG: can't grid search optimizers