Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sklearn dev tests #474

Merged
merged 10 commits into from Mar 4, 2019

Conversation

Projects
None yet
2 participants
@jrbourbeau
Copy link
Member

jrbourbeau commented Feb 28, 2019

There are a couple of recent updates to scikit-learn master that are causing the sklearn_dev CI build to fail (ref https://circleci.com/gh/dask/dask-ml/3634?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link). The recent changes are:

To fix these failures, this PR:

  • Adds a drop= keyword to the dask_ml.preprocessing.OneHotEncoder. By default drop=None, which maintains the current behavior. For now, a NotImplementedError error is raised when drop is not None. I also added a test to ensure NotImplementedError is raised appropriately.

  • I ported over the changes made to QuantileTransformer

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

jrbourbeau commented Feb 28, 2019

Well it looks like I got the tests to pass with scikit-learn master...but broke all the other builds. I'll have to go back and add appropriate checks for the sklearn version.

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

jrbourbeau commented Feb 28, 2019

Addressing the sklearn_dev DeprecationWarning in scikit-learn/scikit-learn#13342

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

jrbourbeau commented Mar 1, 2019

The QuantileTransformer update in scikit-learn results in a slightly different transform() output (ref the conversation starting at scikit-learn/scikit-learn#12827 (comment)). Because we test against multiple versions of sklearn, some of which include this update to QuantileTransformer and other that don't, we need to decide how we want to incorporate the update here. I see a couple of ways to do this.

First, we could keep two versions of the relevant method and use the appropriate one depending on which sklearn version is being used. Something like:

def _transform_col(self, X_col, quantiles, inverse):
    if SK_VERSION > packaging.version.parse("0.20.2"):
        return self._transform_col_new(X_col, quantiles, inverse)
    else:
        return self._transform_col_legacy(X_col, quantiles, inverse)

Second, we can port over the changes made to QuantileTransformer in sklearn and add a small allowed tolerance, e.g. atol=1e-7, when comparing the sklearn and dask_ml transformations here

assert_eq_ar(a.transform(X), b.transform(X))

Given that QuantileTransformer in dask_ml already differs from the sklearn implementation because it uses approximate quantiles, I have a slight preference for the second option.

@TomAugspurger do you have a preference here?

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

jrbourbeau commented Mar 3, 2019

I increased the number of samples in the test data for tests/preprocessing/test_data.py::TestQuantileTransformer to avoid the new UserWarning introduced in scikit-learn/scikit-learn#13333

@TomAugspurger

This comment has been minimized.

Copy link
Member

TomAugspurger commented Mar 3, 2019

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

jrbourbeau commented Mar 4, 2019

Yep, I think this should be good to merge!

@TomAugspurger TomAugspurger merged commit 4e5e78b into dask:master Mar 4, 2019

7 checks passed

ci/circleci: earliest_supported Your tests passed on CircleCI!
Details
ci/circleci: py27 Your tests passed on CircleCI!
Details
ci/circleci: py36 Your tests passed on CircleCI!
Details
ci/circleci: sklearn_dev Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.33%)
Details
codecov/project 95.37% (+0.04%) compared to 1058ab4
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jrbourbeau jrbourbeau deleted the jrbourbeau:fix-sklearn-dev branch Mar 4, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.