
REF: Hardcode Keras params #47

Merged: 13 commits merged into master from hardcode-params on Aug 14, 2020

Conversation

adriangb (Owner) commented Aug 12, 2020

This implements hardcoding of common Keras parameters in BaseWrapper.__init__ as discussed in #37.

The parameters chosen come from Model.compile, Model.predict, and Model.fit (a sketch of the resulting signature follows the list below). Parameters were not hardcoded if:

  • They don't make sense for Scikit-Learn/NumPy arrays (for example, steps_per_epoch).
  • They should be fed to fit directly because they depend on the data (for example, sample_weight and class_weight). I think validation_data also falls into this category, but I am not sure it makes much sense to implement since AFAIK it is not used anywhere else in Scikit-Learn. I am going to leave this for another PR, where I'll also deprecate fit's **kwargs.
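
For orientation, a rough sketch of the hardcoded BaseWrapper.__init__ this PR describes, assembled only from the parameter names and defaults visible in the diff excerpts quoted further down this thread (build_fn is assumed from the existing wrapper API and is not in those excerpts; the list is not exhaustive, and some of these parameters, e.g. epochs/initial_epoch and run_eagerly, are dropped again before merge per the discussion below):

def __init__(
    self,
    build_fn=None,  # assumed from the existing wrapper API; not in the excerpts
    *,
    random_state=None,
    optimizer="rmsprop",
    loss=None,
    metrics=None,
    run_eagerly=None,
    epochs=1,
    initial_epoch=0,
    validation_split=0.0,
    shuffle=True,
    callbacks=None,
    verbose=1,
    steps=None,
    **kwargs,
):
    # One plain `self.<name> = <name>` assignment per parameter, e.g.:
    self.initial_epoch = initial_epoch
    ...
    # Unpack kwargs: extra user-supplied parameters become attributes too.
    vars(self).update(**kwargs)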

codecov-commenter commented Aug 12, 2020

Codecov Report

Merging #47 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #47      +/-   ##
==========================================
- Coverage   99.53%   99.51%   -0.02%     
==========================================
  Files           3        3              
  Lines         426      414      -12     
==========================================
- Hits          424      412      -12     
  Misses          2        2              
Impacted Files Coverage Δ
scikeras/wrappers.py 99.70% <100.00%> (-0.02%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 367f606...b0d2d20. Read the comment docs.

Comment on lines -1227 to -1242
class TestUnsetParameter:
    """Tests for appropriate error on unfitted models.
    """

    @pytest.mark.filterwarnings("ignore::FutureWarning")
    def test_unset_input_parameter(self):
        class ClassBuildFnClf(wrappers.KerasClassifier):
            def __init__(self, input_param):
                # does not set input_param
                super().__init__()

            def _keras_build_fn(self, hidden_dim):
                return build_fn_clf(hidden_dim)

        with pytest.raises(RuntimeError):
            ClassBuildFnClf(input_param=10)
adriangb (Owner, Author):
Removing this check and test. It is really a redundant feature that no other estimators have. It was much more helpful previously when the interface was less defined.

Comment on lines +641 to +645
return (
    k
    for k in self.__dict__
    if not k.endswith("_") and not k.startswith("_")
)
adriangb (Owner, Author) commented Aug 13, 2020

This is borrowed from Skorch. For where we are headed, I think this makes more sense than storing _sk_params. It'll be up to users to make sure they don't set any attributes without a leading/trailing _ that they don't want to show up in get_params.

adriangb (Owner, Author):

I added k.startswith("_") to accommodate parameters like self._random_state. So the rule will be that all model parameters (i.e. the __init__ ones) cannot start/end with _, at least for now. We'll have to see how that plays with my overwriting proposal in #47 (comment)
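
A minimal sketch of how this generator plugs into get_params, assuming a simplified BaseEstimator-style get_params (ignoring deep/nested sub-estimator params); the real SciKeras implementation may differ:

def _get_param_names(self):
    # Model parameters are exactly the instance attributes that neither
    # start nor end with "_": self._random_state and fitted attributes
    # like self.model_ are therefore excluded automatically.
    return (
        k
        for k in self.__dict__
        if not k.endswith("_") and not k.startswith("_")
    )

def get_params(self, deep=True):
    # Driven by instance attributes rather than __init__ introspection.
    return {name: getattr(self, name) for name in self._get_param_names()}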

self.initial_epoch = initial_epoch

# Unpack kwargs
vars(self).update(**kwargs)
adriangb (Owner, Author) commented Aug 13, 2020

Also borrowed from Skorch. Much cleaner with the new _get_param_names implementation.
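
To illustrate the combined effect of vars(self).update(**kwargs) and the new _get_param_names (hypothetical usage; hidden_dim is just an example user parameter routed through **kwargs):

clf = wrappers.KerasClassifier(build_fn=build_fn, hidden_dim=32)

# vars(self).update(**kwargs) stores the routed kwarg as a plain attribute...
assert clf.hidden_dim == 32

# ...and because "hidden_dim" has no leading/trailing underscore,
# _get_param_names exposes it through get_params like any other
# Scikit-Learn hyperparameter.
assert clf.get_params()["hidden_dim"] == 32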

adriangb (Owner, Author):

tagging @stsievert in case you want to take a look

adriangb (Owner, Author):

@stsievert do you think we are good to go here or is there anything else you see that should be changed?

stsievert (Collaborator) left a comment

This looks good. Will returning un-compiled models from build_fn be supported in the next release, as per #50 (comment)? Otherwise I'm not sure some parameters here make any sense (e.g., loss, optimizer).

I've glanced at the Keras.fit API. There might be some missing parameters:

  • use_multiprocessing, workers. These might be important for performance, though I suspect threading will have the same or higher performance because loading the data is I/O bound. It might be good to leave them to the user though.
    • I think Dask will require use_multiprocessing=False, but haven't verified.
  • steps_per_epoch, sample_weight, class_weight.

verbose=1,
steps=None,
callbacks=None,
epochs=1,
stsievert (Collaborator):

Maybe rename to max_epochs for the fit implementation. I think partial_fit should always have epochs=1.

epochs=1,
validation_split=0.0,
shuffle=True,
initial_epoch=0,
stsievert (Collaborator):

This only makes sense if a pretrained model is passed to build_fn. I'd be okay removing support for this, especially if max_epochs is supported.

optimizer="rmsprop",
loss=None,
metrics=None,
run_eagerly=None,
stsievert (Collaborator):

Does SciKeras have tests for run_eagerly in [True, False]? It seems run_eagerly will affect the serialization.

adriangb (Owner, Author):

No, we do not. Maybe I'll just remove that parameter for now; I don't want to go down that rabbit hole in this PR...

*,
random_state=None,
optimizer="rmsprop",
loss=None,
stsievert (Collaborator):

Where are optimizer and loss used? Are they passed to the compile method?

adriangb (Owner, Author):

As it stands right now, it would be up to the user to declare their build_fn to use them. I.e.:

def build_fn(optimizer=..., loss=...):
    model = Model()
    ...
    model.compile(loss=loss, optimizer=optimizer)
    return model

adriangb (Owner, Author) commented Aug 13, 2020

There might be some missing parameters

The reasons they are not being included are:

  • steps_per_epoch: the docs say "This argument is not supported with array inputs", so I think it only makes sense for tf.data, which we do not support here.
  • sample_weight: this is/will still be handled in fit like Scikit-Learn does.
  • class_weight: I think this one requires a bit of massaging to make Scikit-Learn's interpretation of this (ref) match up with Keras' (ref). I think Keras maps "integer class indices" to weights while Scikit-Learn maps "class labels" to weights (see the sketch after this list). I'd like to leave this for a separate PR.
  • workers and use_multiprocessing: Keras docs say "Used for generator or keras.utils.Sequence input only", so they do not apply here.
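
For the class_weight bullet, a rough sketch of the kind of label-to-index translation a future PR might need, assuming Scikit-Learn keys are class labels and Keras keys are integer class indices (illustrative only; the helper name is made up and this is not part of this PR):

from sklearn.preprocessing import LabelEncoder

def sklearn_to_keras_class_weight(class_weight, y):
    # Scikit-Learn style: {class_label: weight}
    # Keras style:        {class_index: weight}
    encoder = LabelEncoder().fit(y)
    return {
        int(encoder.transform([label])[0]): weight
        for label, weight in class_weight.items()
    }

# e.g. {"cat": 1.0, "dog": 2.0} with y = ["cat", "dog", "dog"] -> {0: 1.0, 1: 2.0}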

Regarding max_epochs/epochs.
I think what Keras does is that it runs for epochs-initial_epoch epochs. Are you proposing that instead we just have a single parameter max_epochs and always start at epoch 0?
And separately, you are suggesting that partial_fit override whatever max_epochs was set to and pass epochs=1 to model_.fit?

stsievert (Collaborator) commented Aug 13, 2020

And separately, you are suggesting that partial_fit override whatever max_epochs was set to and pass epochs=1 to model_.fit?

Yes. Scikit-Learn's SGDClassifier/SGDRegressor/MLPClassifier do the same thing. Here's the docstring for MLPClassifier.partial_fit: "Update the model with a single iteration over the given data."
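
A minimal sketch of the suggested split (glossing over model building/compilation; self.model_ and self.epochs follow the names used in this thread, and this is the proposal rather than the merged code):

def fit(self, X, y, **kwargs):
    # Honor the user-facing epoch count (epochs/max_epochs, whichever
    # name is ultimately chosen).
    self.model_.fit(X, y, epochs=self.epochs, **kwargs)
    return self

def partial_fit(self, X, y, **kwargs):
    # Always a single pass over the data, regardless of self.epochs,
    # matching MLPClassifier.partial_fit's "single iteration" contract.
    self.model_.fit(X, y, epochs=1, **kwargs)
    return self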

I think what Keras does is that it runs for epochs - initial_epoch epochs. Are you proposing that instead we just have a single parameter max_epochs and always start at epoch 0?

I think the difference is semantic, and I don't think it's super important. max_epochs provides a little more leeway to perform some early stopping. Sticking with the Keras API isn't a bad choice, I think (although I like max_epochs more; there's only one parameter and it gives more leeway for early stopping).

I mention it because Scikit-Learn calls this parameter max_iter. In the documentation they always mention max_iter is "the maximum number of passes over the training data (aka epochs)."

adriangb (Owner, Author) commented Aug 13, 2020

Hmm okay. I think what I'd like to do is keep this PR smallish and leave the following parameters for a future PR (that can also contain any logic related to their handling, like what you mention about fit and epochs).

  • class_weight
  • epochs/initial_epoch/max_epochs
  • run_eagerly

So I'll remove them from this PR.

adriangb merged commit 6da5ec3 into master on Aug 14, 2020
adriangb deleted the hardcode-params branch on Aug 14, 2020 at 00:16
stsievert mentioned this pull request on Aug 27, 2020