In Which We Add Support for Sample Weights to MLPs #75

Merged
merged 3 commits into from Jan 26, 2018
Conversation

waltaskew
Contributor

@waltaskew waltaskew commented Jan 25, 2018

  • support sample weights in MLPClassifier
  • support sample weights in MLPRegressor
  • add tests ensuring sample weights are handled reasonably

@mheilman

@waltaskew waltaskew changed the title In Which We Add Support for Sample Weights to MLP Classifier WIP In Which We Add Support for Sample Weights to MLP Classifier Jan 25, 2018
@waltaskew waltaskew changed the title WIP In Which We Add Support for Sample Weights to MLP Classifier [DSRD-575] WIP In Which We Add Support for Sample Weights to MLP Classifier Jan 25, 2018
@beckermr
Contributor

A quick comment here. We should normalize the losses by the sum of the sample weights in order to keep them on a decent numerical scale.
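For reference, a minimal numpy sketch of the kind of normalization being suggested (illustrative values only; the actual change would use the equivalent TensorFlow ops):

import numpy as np

# Per-example losses and sample weights for one batch (made-up numbers).
losses = np.array([0.7, 1.2, 0.3, 2.0])
weights = np.array([1.0, 5.0, 1.0, 0.5])

# An unnormalized weighted sum grows with the magnitude of the weights ...
unnormalized = np.sum(weights * losses)

# ... while dividing by the sum of the weights keeps the loss on the same
# numerical scale as an ordinary mean over the batch.
normalized = np.sum(weights * losses) / np.sum(weights)

print(unnormalized, normalized)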

@@ -129,19 +129,23 @@ def _init_model_objective_fn(self, t):
        if self.multilabel_:
            cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
                logits=t, labels=tf.cast(self.input_targets_, np.float32))
            cross_entropy = tf.multiply(cross_entropy, self._sample_weight)
Contributor

Does this broadcast correctly?

For multilabel, I think the shape of cross_entropy is [n_examples, n_classes] and the weights are [n_examples]. You might need to add a trailing dimension to get this to work?
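A small numpy sketch of the shape problem (TensorFlow follows the same broadcasting rules; the numbers are made up):

import numpy as np

n_examples, n_classes = 4, 3
cross_entropy = np.ones((n_examples, n_classes))  # per-example, per-class losses
weights = np.array([1.0, 2.0, 0.5, 1.0])          # shape (n_examples,)

# Shapes are aligned from the trailing dimension, so (4, 3) * (4,) does not
# broadcast the way we want here; it raises an error instead.
try:
    cross_entropy * weights
except ValueError as exc:
    print(exc)

# Adding a trailing dimension gives shape (4, 1), which broadcasts across classes.
weighted = cross_entropy * weights[:, np.newaxis]
print(weighted.shape)  # (4, 3)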

Contributor Author

It does not broadcast correctly. The fact that the tests passed makes me worry how well we're testing the multi-label case.

Contributor

Agreed. Sam and I worked on those tests a lot, but I'm not going to claim they're bulletproof.

@mheilman mheilman changed the title [DSRD-575] WIP In Which We Add Support for Sample Weights to MLP Classifier WIP In Which We Add Support for Sample Weights to MLP Classifier Jan 25, 2018
- support sample weights in MLPClassifier
- support sample weights in MLPRegressor
- add tests ensuring sample weights are handled reasonably
@waltaskew waltaskew changed the title WIP In Which We Add Support for Sample Weights to MLP Classifier In Which We Add Support for Sample Weights to MLP Classifier Jan 25, 2018
@waltaskew waltaskew changed the title In Which We Add Support for Sample Weights to MLP Classifier In Which We Add Support for Sample Weights to MLPs Jan 25, 2018
Contributor

@mheilman mheilman left a comment

Could you also please add this to the changelog in this PR?

sample_weight : numpy array of shape [n_samples,]
    Per-sample weights. Re-scale the loss per sample.
    Higher weights force the estimator to put more emphasis
    on these samples.
Contributor

It's probably worth noting here or in the class docstring that the sample weights are normalized per batch. I understand the argument for this, and I think that as the batch size approaches the size of the dataset it becomes equivalent to normalizing the sample weights once ahead of time, but it does alter the objective function a bit, so I imagine it could lead to odd behavior in edge cases. In practice it probably doesn't matter either way, so I'm fine with keeping it this way, but it'd be good to document in case a user expects the weights not to be normalized.
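To illustrate the point, a rough numpy sketch (hypothetical batch_weighted_loss helper, made-up data) of per-batch normalization versus normalizing once over the whole dataset:

import numpy as np

def batch_weighted_loss(losses, weights):
    # Weighted loss for one batch, normalized by that batch's weight sum.
    return np.sum(weights * losses) / np.sum(weights)

rng = np.random.default_rng(0)
losses = rng.random(100)
weights = rng.uniform(0.5, 2.0, size=100)

# Normalizing once over the full dataset ...
full = batch_weighted_loss(losses, weights)

# ... versus averaging two per-batch-normalized half-size batches.
halves = np.mean([
    batch_weighted_loss(losses[:50], weights[:50]),
    batch_weighted_loss(losses[50:], weights[50:]),
])

# The two agree exactly only when each batch is the whole dataset; otherwise
# they differ slightly, which is the objective-function change noted above.
print(full, halves)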

tf.where(y_finite, self._zeros, cross_entropy))

# reshape to broadcast multiplication along cross_entropy
sample_weight = tf.reshape(self._sample_weight, (-1, 1))
Contributor

So this and the next line of code turn the original (batch_size,) sample weights array into a (batch_size, n_classes_) one with the same value across a row? If so, could you add a comment to that effect?

Also, we're not actually using broadcasting here, right? That comment might be confusing, unless I'm misunderstanding something here.

assert_clfs_almost_equal(clf1, clf2)

# Fitting twice with half sample-weights should result
# in same classifier as fitting once with full weights.
Contributor

Hmm... I don't think running for twice as many epochs with sample weights scaled by 0.5 will lead to the same model for non-vanilla SGD. Also, shouldn't the models be the same after a single fit of clf2 because of the per-batch sample weight normalization?

Does this fail if partial_fit is only called once below for clf2?

Contributor Author

Yeah, I don't think this test makes sense because we're normalizing by the sum of the weights.
My original thinking was that shrinking the loss with the weights would force the training to take smaller steps, but the normalization means that isn't happening -- I'll remove this test.


clf1 = make_classifier_func().fit(X_train[ind], y_train[ind])
clf1 = make_classifier_func().fit(
X_train, y_train, sample_weight=sample_weight
Contributor

Does this fail if you don't include the sample weights here (after fixing the copy-paste error)?

sample_weight = np.bincount(ind, minlength=X_train.shape[0])

clf1 = make_classifier_func().fit(X_train[ind], y_train[ind])
clf1 = make_classifier_func().fit(
Contributor

Copy-paste error? I think this should be clf2.

Why didn't that cause a test failure?

Contributor Author

It doesn't cause a failure because all of the clf1s and clf2s are learning the same model. It's comparing against the wrong clf2, but the wrong clf2 is still trained on the same data.

The tolerance for deciding whether two classifiers make the same predictions may be too loose, though.

Contributor

@beckermr beckermr left a comment

My main comment here is that the tests seem a bit excessive. It seems sufficient to test that the weights are handled properly in a simple technical sense, as opposed to testing our assumptions about how the weights should work. To this end, only the last test in the assert function in the test suite seems relevant.

An even simpler test which just checks some of the computations against similar versions in numpy might be easier to write and more direct.
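As a rough illustration of that idea, here is a numpy-only reference for the weighted, per-batch-normalized sigmoid cross-entropy that such a test could compare the TensorFlow graph's output against (the function name, the exact reduction, and the data are all made up for this sketch):

import numpy as np

def reference_weighted_loss(logits, targets, weights):
    # Numerically stable sigmoid cross-entropy, matching the formula
    # implemented by tf.nn.sigmoid_cross_entropy_with_logits.
    cross_entropy = (np.maximum(logits, 0) - logits * targets
                     + np.log1p(np.exp(-np.abs(logits))))
    # Weight each example and normalize by the sum of the weights; the exact
    # reduction here is one plausible choice, not necessarily the library's.
    weighted = cross_entropy * weights[:, np.newaxis]
    return np.sum(weighted) / np.sum(weights)

logits = np.array([[2.0, -1.0], [0.5, 0.3], [-2.0, 1.5]])
targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights = np.array([1.0, 2.0, 0.5])

print(reference_weighted_loss(logits, targets, weights))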

@waltaskew
Contributor Author

waltaskew commented Jan 26, 2018

> My main comment here is that the tests seem a bit excessive. It seems sufficient to test that the weights are handled properly in a simple technical sense, as opposed to testing our assumptions about how the weights should work. To this end, only the last test in the assert function in the test suite seems relevant.

I'd like to keep the test asserting that not passing sample weights gives the same result as passing sample weights of all 1s, because that seems like a reasonable invariant that could catch errors in handling absent sample weights.

I'm definitely removing the two-partial-fits test, which leaves the test asserting that duplicated inputs are equivalent to up-weighted inputs. I don't have a particular allegiance to that test.
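A rough sketch of those two checks, reusing the make_classifier_func and assert_clfs_almost_equal helpers that appear in this PR's test diff (the fixture wiring and data are placeholders):

import numpy as np

def check_sample_weight_invariants(make_classifier_func, assert_clfs_almost_equal,
                                   X_train, y_train):
    # Passing weights of all ones should match passing no weights at all.
    clf1 = make_classifier_func().fit(X_train, y_train)
    clf2 = make_classifier_func().fit(
        X_train, y_train, sample_weight=np.ones(X_train.shape[0]))
    assert_clfs_almost_equal(clf1, clf2)

    # Duplicating rows should match up-weighting them by their duplicate counts.
    ind = np.random.RandomState(0).randint(0, X_train.shape[0], X_train.shape[0])
    sample_weight = np.bincount(ind, minlength=X_train.shape[0])
    clf1 = make_classifier_func().fit(X_train[ind], y_train[ind])
    clf2 = make_classifier_func().fit(
        X_train, y_train, sample_weight=sample_weight)
    assert_clfs_almost_equal(clf1, clf2)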

@waltaskew
Contributor Author

I improved the docs and comments and removed the brittle tests. Anything else?

- better docs
- better comments
- fewer brittle tests
- updated CHANGELOG
@beckermr beckermr self-assigned this Jan 26, 2018
CHANGELOG.md Outdated
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## Unreleased

- Added support for the `sample_weight` keyword argument to the `fit`
Contributor

Could you put this under a "### Added" section, please?

    Per-sample weights. Re-scale the loss per sample.
    Higher weights force the estimator to put more emphasis
    on these samples.
    Sample weights are normalized per-batch. As the batch size
Contributor

I think just "Sample weights are normalized per-batch." is fine here.

    Per-sample weights. Re-scale the loss per sample.
    Higher weights force the estimator to put more emphasis
    on these samples.
    Sample weights are normalized per-batch. As the batch size
Contributor

See comment above.

@beckermr
Contributor

LGTM from me modulo Mike's comments. Thanks a bunch for doing this, Walt! (Hammer) (huge fan)

@waltaskew
Contributor Author

CHANGELOG changed and docstrings shortened. Anything else?

Contributor

@mheilman mheilman left a comment

LGTM

@waltaskew waltaskew merged commit 3c90d1e into civisanalytics:master Jan 26, 2018
@waltaskew waltaskew deleted the support-sample-weights branch January 26, 2018 18:26
@beckermr beckermr mentioned this pull request Jan 26, 2018