Update dependencies to fix ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' errors #1540

Merged
angela97lin merged 6 commits into main from 1519_update_dependencies on Dec 14, 2020

Contributor

@angela97lin angela97lin commented Dec 10, 2020

Closes #1519

Repro steps:

  1. Create a new virtualenv with no packages installed. I used Python 3.7.6.
  2. Install the following:
    numpy>=1.19.1
    pandas>1.1.0
    scikit-learn==0.23
    scikit-optimize==0.8
  3. Run pip install -r requirements.txt and install pytest
  4. Run tests

This should raise the following:

================================================================= ERRORS =================================================================
_____________________________________________________ ERROR collecting test session ______________________________________________________
../../.pyenv/versions/3.7.6/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:953: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:953: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:728: in exec_module
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
evalml/__init__.py:9: in <module>
    import evalml.pipelines
evalml/pipelines/__init__.py:1: in <module>
    from .components import (
evalml/pipelines/components/__init__.py:2: in <module>
    from .estimators import (
evalml/pipelines/components/estimators/__init__.py:2: in <module>
    from .classifiers import (LogisticRegressionClassifier,
evalml/pipelines/components/estimators/classifiers/__init__.py:1: in <module>
    from .logistic_regression import LogisticRegressionClassifier
evalml/pipelines/components/estimators/classifiers/logistic_regression.py:3: in <module>
    from skopt.space import Real
../../.pyenv/versions/3.7.6/lib/python3.7/site-packages/skopt/__init__.py:55: in <module>
    from .searchcv import BayesSearchCV
../../.pyenv/versions/3.7.6/lib/python3.7/site-packages/skopt/searchcv.py:16: in <module>
    from sklearn.utils.fixes import MaskedArray
E   ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' (/Users/angela.lin/.pyenv/versions/3.7.6/lib/python3.7/site-packages/sklearn/utils/fixes.py)
======================================================== short test summary info =========================================================
ERROR  - ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes' (/Users/angela.lin/.pyenv/versions/3.7.6/lib/python3....
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================ 1 error in 3.06s ============================================================

Upgrading to scikit-learn==0.23.1 and scikit-optimize==0.8.1 fixes this :)
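As a rough illustration of the fix, the sketch below checks whether an environment meets the two minimum versions named in this PR. The helper names (`version_tuple`, `find_outdated`, `installed_versions`) are hypothetical and not part of evalml; the version parsing is deliberately simplistic and only handles plain numeric releases such as `0.23.1`.

```python
from importlib import metadata  # Python 3.8+; on 3.7 the importlib_metadata backport is needed

# Minimum versions taken from this PR; everything else here is illustrative.
MINIMUMS = {"scikit-learn": "0.23.1", "scikit-optimize": "0.8.1"}


def version_tuple(version, width=3):
    """Turn '0.23' into (0, 23, 0). Handles plain numeric releases only."""
    parts = [int(p) for p in version.split(".")[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def find_outdated(installed, minimums=MINIMUMS):
    """Return the names in `minimums` whose installed version is too old or missing."""
    outdated = []
    for name, minimum in minimums.items():
        current = installed.get(name)
        if current is None or version_tuple(current) < version_tuple(minimum):
            outdated.append(name)
    return outdated


def installed_versions(names):
    """Look up installed versions, mapping missing packages to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Prints the packages that would need upgrading in the current environment.
    print(find_outdated(installed_versions(MINIMUMS)))
```

With the versions from the repro steps (`scikit-learn==0.23`, `scikit-optimize==0.8`), `find_outdated` flags both packages; with 0.23.1 and 0.8.1 it returns an empty list.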

--

This PR also tracks updating numpy and pandas versions. Since we rely on woodwork, we'll get errors (see issue) with our current minimal versions.

codecov bot commented Dec 10, 2020

Codecov Report

Merging #1540 (6bc73da) into main (d77e854) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1540   +/-   ##
=======================================
  Coverage   100.0%   100.0%           
=======================================
  Files         232      232           
  Lines       16639    16639           
=======================================
  Hits        16631    16631           
  Misses          8        8           

@angela97lin angela97lin self-assigned this Dec 10, 2020
@angela97lin angela97lin added this to the December 2020 milestone Dec 10, 2020
@angela97lin angela97lin marked this pull request as ready for review December 10, 2020 16:45
Contributor

@jeremyliweishih jeremyliweishih left a comment

I think this looks good! Adding a warning to our release notes would be helpful especially since we're jumping quite a bit on our min versions. Maybe just like how we do breaking changes.

Comment on lines +1 to +2
numpy>=1.19.1
pandas>=1.1.0
Contributor Author

Woodwork/Featuretools minimum requirements

Contributor

@freddyaboulton freddyaboulton left a comment

Thanks for fixing this @angela97lin !

core-requirements.txt (review thread resolved)
Contributor

@bchen1116 bchen1116 left a comment

LGTM! Agree with Freddy, if numpy and pandas are already on WW/FT requirements, do we need them on our requirements.txt as well?

Contributor

@dsherry dsherry left a comment

@angela97lin looks good! I left some questions. Could we try not updating the min numpy version and see what happens?

I'm also curious why pandas 1.1.0 is required for woodwork (?) -- let's follow up with them on that. I wonder if they can lower the min pandas version. The more permissive we can make these ranges, the more likely people will be to use our stuff heh (certainly at the risk of bugs)

core-requirements.txt
@@ -1,8 +1,8 @@
-numpy>=1.16.4
-pandas>=0.25.0
+numpy>=1.19.1
Contributor

@angela97lin what happens if you keep this PR as-is, but delete this change so that older numpy versions are still allowed? Do the other changes still solve the problem, or is updating our min numpy also required?

Contributor Author

@dsherry I posted in the original issue, but we get this error if we try to specifically install a version less than these because of Woodwork requirements:

ERROR: featuretools 0.22.0 has requirement pandas!=1.1.0,!=1.1.1,<2.0.0,>=0.24.1, but you'll have pandas 1.1.0 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
woodwork 0.0.6 requires numpy>=1.19.1, but you have numpy 1.16.4 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
woodwork 0.0.6 requires pandas>=1.1.0, but you have pandas 0.25.0 which is incompatible.
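To make the conflicts in these messages concrete, here is a minimal sketch that evaluates the version specifiers quoted above. The `satisfies` and `pandas_ok` helpers are hypothetical and only understand simple numeric versions; in practice `packaging.specifiers.SpecifierSet` is the more robust tool for this. The sketch only considers the featuretools and woodwork constraints shown in the errors, not any other dependency.

```python
import operator

# Comparison operators that appear in the pip messages above.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def version_tuple(version, width=3):
    """Turn '1.1' into (1, 1, 0). Handles plain numeric releases only."""
    parts = [int(p) for p in version.split(".")[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Check `version` against a comma-separated spec such as '>=1.1.0,<2.0.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in sorted(OPERATORS, key=len, reverse=True):
            if clause.startswith(symbol):
                bound = clause[len(symbol):]
                if not OPERATORS[symbol](version_tuple(version), version_tuple(bound)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Specs copied from the error messages above.
FEATURETOOLS_PANDAS = "!=1.1.0,!=1.1.1,<2.0.0,>=0.24.1"
WOODWORK_PANDAS = ">=1.1.0"
WOODWORK_NUMPY = ">=1.19.1"


def pandas_ok(version):
    """True if `version` satisfies both the featuretools and woodwork pandas specs."""
    return satisfies(version, FEATURETOOLS_PANDAS) and satisfies(version, WOODWORK_PANDAS)
```

Under these two specs, the old minimums (numpy 1.16.4, pandas 0.25.0) fail the woodwork constraints, and pandas 1.1.0 passes woodwork but is excluded by featuretools, which matches the three errors above.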

core-requirements.txt
 scipy>=1.2.1
-scikit-learn>=0.23
-scikit-optimize>=0.8
+scikit-learn>=0.23.1
Contributor

@angela97lin so the problem was specifically in sklearn 0.23.0?

Contributor Author

> @angela97lin so the problem was specifically in sklearn 0.23.0?

It's the combination of scikit-learn 0.23, scikit-optimize 0.8, and numpy: see scikit-optimize/scikit-optimize#902

@angela97lin
Contributor Author

@dsherry I could be wrong, but I don't think Woodwork can handle pandas 0.25, because it relies so heavily on the new nullable structures (pd.NA, and the new nullable float type that is coming soon or was recently introduced in pandas 1.x) 🤔

Contributor

@dsherry dsherry left a comment

@angela97lin thanks for the details :) I filed an issue in woodwork to track lowering the numpy/pandas limits to match featuretools and evalml.

Yes I see that pd.NA was first added in pandas 1.0.0, so perhaps the pandas version needs to stay at something like pandas>=1.1.0. I still think the numpy version limit in woodwork is too high and that we can lower it.

Let's merge this PR, and once woodwork puts out a release with lower pip reqs for numpy and/or pandas, we can update.

@angela97lin angela97lin merged commit 8a4e763 into main Dec 14, 2020
@angela97lin angela97lin deleted the 1519_update_dependencies branch December 14, 2020 17:43
@dsherry dsherry mentioned this pull request Dec 29, 2020
Development

Successfully merging this pull request may close these issues.

Update scikit-optimize requirements to be >=0.8.1 and scikit-learn>=0.23.1
5 participants