Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add Series.__iter__, closes #4871 #5071

Merged
merged 17 commits into from Aug 6, 2019
Merged

ENH: add Series.__iter__, closes #4871 #5071

merged 17 commits into from Aug 6, 2019

Conversation

BlaneG
Copy link
Contributor

@BlaneG BlaneG commented Jul 5, 2019

  • Tests added / passed
  • Passes black dask / flake8 dask

Here is the test output:

================================================= test session starts =================================================
platform win32 -- Python 3.7.3, pytest-4.6.3, py-1.8.0, pluggy-0.12.0 -- c:\users\b_gra\anaconda3\envs\dask\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\b_gra\Desktop\projects\2019\dask, inifile: setup.cfg
collected 6927 items / 2 errors / 14 skipped / 6911 selected

======================================================= ERRORS ========================================================
__________________________________ ERROR collecting dask/array/tests/test_xarray.py ___________________________________
dask\array\tests\test_xarray.py:6: in <module>
    xr = pytest.importorskip("xarray")
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\__init__.py:10: in <module>
    from .core.alignment import align, broadcast, broadcast_arrays
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\alignment.py:11: in <module>
    from .indexing import get_indexer_nd
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\indexing.py:11: in <module>
    from . import duck_array_ops, nputils, utils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\duck_array_ops.py:16: in <module>
    from . import dask_array_ops, dtypes, npcompat, nputils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\dask_array_ops.py:13: in <module>
    if LooseVersion(dask.__version__) <= LooseVersion('0.18.2'):
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:58: in __le__
    c = self._cmp(other)
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:337: in _cmp
    if self.version < other.version:
E   TypeError: '<' not supported between instances of 'str' and 'int'
__________________________________ ERROR collecting dask/array/tests/test_xarray.py ___________________________________
dask\array\tests\test_xarray.py:6: in <module>
    xr = pytest.importorskip("xarray")
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\__init__.py:10: in <module>
    from .core.alignment import align, broadcast, broadcast_arrays
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\alignment.py:11: in <module>
    from .indexing import get_indexer_nd
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\indexing.py:11: in <module>
    from . import duck_array_ops, nputils, utils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\duck_array_ops.py:16: in <module>
    from . import dask_array_ops, dtypes, npcompat, nputils
..\..\..\..\anaconda3\envs\dask\lib\site-packages\xarray\core\dask_array_ops.py:13: in <module>
    if LooseVersion(dask.__version__) <= LooseVersion('0.18.2'):
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:58: in __le__
    c = self._cmp(other)
..\..\..\..\anaconda3\envs\dask\lib\distutils\version.py:337: in _cmp
    if self.version < other.version:
E   TypeError: '<' not supported between instances of 'str' and 'int'
================================================== warnings summary ===================================================
c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\heapdict.py:11
  c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\heapdict.py:11: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    class heapdict(collections.MutableMapping):

c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\py\_path\local.py:701
  c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\py\_path\local.py:701: UserWarning: ShareDict has been deprecated in favor of HighLevelGraph and will be removed in future versions
    __import__(modname)

c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\pywt\_utils.py:6
  c:\users\b_gra\anaconda3\envs\dask\lib\site-packages\pywt\_utils.py:6: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Iterable

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=============================================== short test summary info ===============================================
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\array\tests\test_cupy.py:10: could not import 'cupy': No module named 'cupy'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\array\tests\test_sparse.py:12: could not import 'sparse': No module named 'sparse'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\bag\tests\test_avro.py:7: could not import 'fastavro': No module named 'fastavro'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\bytes\tests\test_s3.py:13: could not import 's3fs': No module named 's3fs'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\dataframe\io\tests\test_orc.py:13: could not import 'pyarrow.orc': No module named 'pyarrow'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\tests\test_cache.py:10: could not import 'cachey': No module named 'cachey'
SKIPPED [2] C:\Users\b_gra\Desktop\projects\2019\dask\dask\tests\test_dot.py:11: could not import 'graphviz': No module named 'graphviz'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================== 14 skipped, 3 warnings, 2 error in 31.77 seconds ===================================

Copy link
Member

@mrocklin mrocklin left a comment

Thanks @BlaneG !

Can I ask you for a small test for this functionality? If you search for def test.*iter in dask/dataframe/tests/test_dataframe.py you should find some examples of tests for similar functionality.

dask/dataframe/core.py Outdated Show resolved Hide resolved
@BlaneG
Copy link
Contributor Author

@BlaneG BlaneG commented Jul 14, 2019

@mrocklin test added.

Copy link
Member

@mrocklin mrocklin left a comment

Thanks @BlaneG !

Looks good. I made a small suggestion but other than that eveything looks fine. You should be able to just press the green button to commit the suggested change.

dask/dataframe/tests/test_dataframe.py Outdated Show resolved Hide resolved
Co-Authored-By: Matthew Rocklin <mrocklin@gmail.com>
@TomAugspurger
Copy link
Member

@TomAugspurger TomAugspurger commented Jul 15, 2019

@BlaneG the test failure looks related: https://travis-ci.org/dask/dask/jobs/558604032#L977 can you take a look?

pytest dask/dataframe/tests/test_dataframe.py -k test_rename_series_method

@BlaneG
Copy link
Contributor Author

@BlaneG BlaneG commented Jul 31, 2019

@TomAugspurger, @mrocklin, I addressed the test failure. Do you want to review changes to the if/else statement I made in the rename function to address this test failure?

After adding the __iter__ method, the condition (is_list_like(index) and not is_dict_like(index) changed from False to True when the index was a dask series. I looked through the applications of Series.rename() in the repo, but couldn't figure out what this statement was trying to catch with the types of indexes that were being passed to series.rename, i.e.:

  • string (e.g. "y")
  • dict (e.g. {"B": "C"})
  • dask series
  • lambda functions (e.g. lambda: x: x**2)

Do you have some more insight on the if condition that I changed?

Also, maybe I should add an else statement with an assertion error when the if/elif conditions are not met?

@TomAugspurger
Copy link
Member

@TomAugspurger TomAugspurger commented Aug 5, 2019

I don't really follow the changes to the if / elif conditions. Can you make it smaller? Perhaps just a and not isinstance(index, dd.Index) to the condition (IIUC)?

@BlaneG
Copy link
Contributor Author

@BlaneG BlaneG commented Aug 5, 2019

That makes sense @TomAugspurger.


if is_scalar(index) or (is_list_like(index) and not is_dict_like(index)):
Copy link
Member

@TomAugspurger TomAugspurger Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be something like not isinstance(index, Index). You might need this condition in addition to the is_list_like and is_dict_like checks.

Copy link
Contributor Author

@BlaneG BlaneG Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean...adding and not isinstance(index, Index) to the end of the if statement will avoid the error when passing a dd.Series index to rename. I was originally trying to make the conditions noted in the docstring explicit in the else statement: If dict-like or callable, the transformation is applied to the index.

@TomAugspurger
Copy link
Member

@TomAugspurger TomAugspurger commented Aug 6, 2019

Merged master to fix the CI issues.

@TomAugspurger TomAugspurger merged commit 4c9bdb6 into dask:master Aug 6, 2019
2 checks passed
@TomAugspurger
Copy link
Member

@TomAugspurger TomAugspurger commented Aug 6, 2019

Thanks @BlaneG!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants