Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add Series.__iter__, closes #4871 #5071

Merged
merged 17 commits into from Aug 6, 2019

Conversation

@BlaneG
Copy link
Contributor

commented Jul 5, 2019

  • Tests added / passed
  • Passes black dask / flake8 dask

Here is the test output:

================================================= test session starts =================================================
platform win32 -- Python 3.7.3, pytest-4.6.3, py-1.8.0, pluggy-0.12.0 -- c:\users\b_gra\anaconda3\envs\dask\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\b_gra\Desktop\projects\2019\dask, inifile: setup.cfg
collected 6927 items / 2 errors / 14 skipped / 6911 selected

======================================================= ERRORS ========================================================
__________________________________ ERROR collecting dask/array/tests/test_xarray.py ___________________________________
dask\array\tests\test_xarray.py:6: in <module>
    xr = pytest.importorskip("xarray")
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\__init__.py:10: in <module>
    from .core.alignment import align, broadcast, broadcast_arrays
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\alignment.py:11: in <module>
    from .indexing import get_indexer_nd
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\indexing.py:11: in <module>
    from . import duck_array_ops, nputils, utils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\duck_array_ops.py:16: in <module>
    from . import dask_array_ops, dtypes, npcompat, nputils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\dask_array_ops.py:13: in <module>
    if LooseVersion(dask.__version__) <= LooseVersion('0.18.2'):
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:58: in __le__
    c = self._cmp(other)
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:337: in _cmp
    if self.version < other.version:
E   TypeError: '<' not supported between instances of 'str' and 'int'
__________________________________ ERROR collecting dask/array/tests/test_xarray.py ___________________________________
dask\array\tests\test_xarray.py:6: in <module>
    xr = pytest.importorskip("xarray")
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\__init__.py:10: in <module>
    from .core.alignment import align, broadcast, broadcast_arrays
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\alignment.py:11: in <module>
    from .indexing import get_indexer_nd
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\indexing.py:11: in <module>
    from . import duck_array_ops, nputils, utils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\duck_array_ops.py:16: in <module>
    from . import dask_array_ops, dtypes, npcompat, nputils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\dask_array_ops.py:13: in <module>
    if LooseVersion(dask.__version__) <= LooseVersion('0.18.2'):
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:58: in __le__
    c = self._cmp(other)
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:337: in _cmp
    if self.version < other.version:
E   TypeError: '<' not supported between instances of 'str' and 'int'
================================================== warnings summary ===================================================
c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\heapdict.py:11
  c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\heapdict.py:11: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    class heapdict(collections.MutableMapping):

c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\py\_path\local.py:701
  c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\py\_path\local.py:701: UserWarning: ShareDict has been deprecated in favor of HighLevelGraph and will be removed in future versions
    __import__(modname)

c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\pywt\_utils.py:6
  c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\pywt\_utils.py:6: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Iterable

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=============================================== short test summary info ===============================================
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\array\tests\test_cupy.py:10: could not import 'cupy': No module named 'cupy'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\array\tests\test_sparse.py:12: could not import 'sparse': No module named 'sparse'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\bag\tests\test_avro.py:7: could not import 'fastavro': No module named 'fastavro'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\bytes\tests\test_s3.py:13: could not import 's3fs': No module named 's3fs'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\dataframe\io\tests\test_orc.py:13: could not import 'pyarrow.orc': No module named 'pyarrow'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\tests\test_cache.py:10: could not import 'cachey': No module named 'cachey'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\tests\test_dot.py:11: could not import 'graphviz': No module named 'graphviz'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================== 14 skipped, 3 warnings, 2 error in 31.77 seconds ===================================
BlaneG added 2 commits Jul 5, 2019
@mrocklin
Copy link
Member

left a comment

Thanks @BlaneG !

Can I ask you for a small test for this functionality? If you search for def test.*iter in dask/dataframe/tests/test_dataframe.py you should find some examples of tests for similar functionality.

dask/dataframe/core.py Outdated Show resolved Hide resolved
BlaneG added 2 commits Jul 12, 2019
@BlaneG

This comment has been minimized.

Copy link
Contributor Author

commented Jul 14, 2019

@mrocklin test added.

@mrocklin
Copy link
Member

left a comment

Thanks @BlaneG !

Looks good. I made a small suggestion but other than that eveything looks fine. You should be able to just press the green button to commit the suggested change.

dask/dataframe/tests/test_dataframe.py Outdated Show resolved Hide resolved
delete unnecessary iter() calls
Co-Authored-By: Matthew Rocklin <mrocklin@gmail.com>
@TomAugspurger

This comment has been minimized.

Copy link
Member

commented Jul 15, 2019

@BlaneG the test failure looks related: https://travis-ci.org/dask/dask/jobs/558604032#L977 can you take a look?

pytest dask/dataframe/tests/test_dataframe.py -k test_rename_series_method
@BlaneG

This comment has been minimized.

Copy link
Contributor Author

commented Jul 31, 2019

@TomAugspurger, @mrocklin, I addressed the test failure. Do you want to review changes to the if/else statement I made in the rename function to address this test failure?

After adding the __iter__ method, the condition (is_list_like(index) and not is_dict_like(index) changed from False to True when the index was a dask series. I looked through the applications of Series.rename() in the repo, but couldn't figure out what this statement was trying to catch with the types of indexes that were being passed to series.rename, i.e.:

  • string (e.g. "y")
  • dict (e.g. {"B": "C"})
  • dask series
  • lambda functions (e.g. lambda: x: x**2)

Do you have some more insight on the if condition that I changed?

Also, maybe I should add an else statement with an assertion error when the if/elif conditions are not met?

@TomAugspurger

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

I don't really follow the changes to the if / elif conditions. Can you make it smaller? Perhaps just a and not isinstance(index, dd.Index) to the condition (IIUC)?

@BlaneG

This comment has been minimized.

Copy link
Contributor Author

commented Aug 5, 2019

That makes sense @TomAugspurger.


if is_scalar(index) or (is_list_like(index) and not is_dict_like(index)):

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Aug 5, 2019

Member

Should be something like not isinstance(index, Index). You might need this condition in addition to the is_list_like and is_dict_like checks.

This comment has been minimized.

Copy link
@BlaneG

BlaneG Aug 6, 2019

Author Contributor

I see what you mean...adding and not isinstance(index, Index) to the end of the if statement will avoid the error when passing a dd.Series index to rename. I was originally trying to make the conditions noted in the docstring explicit in the else statement: If dict-like or callable, the transformation is applied to the index.

BlaneG and others added 6 commits Aug 5, 2019
-revert to if/else conditions from cb4fb56,
-replace is_list_like(index) with isinstance(index, dd.Series)
@TomAugspurger

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

Merged master to fix the CI issues.

@TomAugspurger TomAugspurger merged commit 4c9bdb6 into dask:master Aug 6, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@TomAugspurger

This comment has been minimized.

Copy link
Member

commented Aug 6, 2019

Thanks @BlaneG!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.