Use da.eye with chunks="auto" fails #4635

Open
jakirkham opened this issue Mar 26, 2019 · 7 comments
@jakirkham
Member

Currently, if a user tries to create an array with da.eye(..., chunks="auto"), the call raises a ValueError. An MRE is included below.

Example:
In [1]: import dask.array as da

In [2]: da.eye(100, chunks=10)                                                  
Out[2]: dask.array<eye, shape=(100, 100), dtype=float64, chunksize=(10, 10)>

In [3]: da.eye(100, chunks="auto")                                              
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-80d20afa2a92> in <module>
----> 1 da.eye(100, chunks="auto")

~/Developer/dask/dask/array/creation.py in eye(N, chunks, M, k, dtype)
    437     """
    438     if not isinstance(chunks, Integral):
--> 439         raise ValueError('chunks must be an int')
    440 
    441     token = tokenize(N, chunk, M, k, dtype)

ValueError: chunks must be an int

Environment:
name: daskdev
channels:
  - conda-forge
  - defaults
dependencies:
  - appnope=0.1.0=py37_1000
  - asn1crypto=0.24.0=py37_1003
  - attrs=19.1.0=py_0
  - backcall=0.1.0=py_0
  - blas=1.1=openblas
  - bleach=3.1.0=py_0
  - bokeh=1.0.4=py37_1000
  - bzip2=1.0.6=h1de35cc_1002
  - ca-certificates=2019.3.9=hecc5488_0
  - certifi=2019.3.9=py37_0
  - cffi=1.12.2=py37h2d6ddff_1
  - chardet=3.0.4=py37_1003
  - click=7.0=py_0
  - cloudpickle=0.8.0=py_0
  - cryptography=2.6.1=py37hc2b1221_0
  - cycler=0.10.0=py_1
  - cytoolz=0.9.0.1=py37h470a237_1
  - dask-glm=0.1.0=0
  - dask-ml=0.12.0=py_0
  - decorator=4.3.2=py_0
  - defusedxml=0.5.0=py_1
  - distributed=1.26.0=py37_1
  - entrypoints=0.3=py37_1000
  - freetype=2.9.1=h597ad8a_1005
  - heapdict=1.0.0=py37_1000
  - idna=2.8=py37_1000
  - imageio=2.5.0=py37_0
  - ipykernel=5.1.0=py37h24bf2e0_1002
  - ipython=7.3.0=py37h24bf2e0_0
  - ipython_genutils=0.2.0=py_1
  - jedi=0.13.3=py37_0
  - jinja2=2.10=py_1
  - jpeg=9c=h1de35cc_1001
  - jsonschema=3.0.1=py37_0
  - jupyter_client=5.2.4=py_3
  - jupyter_core=4.4.0=py_0
  - kiwisolver=1.0.1=py37h04f5b5a_1002
  - libcxx=7.0.0=h2d50403_1
  - libffi=3.2.1=h0a44026_1005
  - libgfortran=3.0.1=0
  - libpng=1.6.36=ha441bb4_1000
  - libsodium=1.0.16=h1de35cc_1001
  - libtiff=4.0.10=h79f4b77_1001
  - llvm-meta=7.0.0=0
  - llvmlite=0.26.0=py37h8c7ce04_0
  - locket=0.2.0=py_2
  - markupsafe=1.1.1=py37h1de35cc_0
  - matplotlib-base=3.0.3=py37hf043ca5_0
  - mistune=0.8.4=py37h1de35cc_1000
  - msgpack-python=0.6.1=py37h04f5b5a_0
  - multipledispatch=0.6.0=py_0
  - nbconvert=5.4.1=py_2
  - nbformat=4.4.0=py_1
  - ncurses=6.1=h0a44026_1002
  - networkx=2.2=py_1
  - notebook=5.7.6=py37_0
  - numba=0.41.0=py37h1702cab_1000
  - numpy=1.16.2=py37_blas_openblash486cb9f_0
  - olefile=0.46=py_0
  - openblas=0.3.3=hdc02c5d_1001
  - openssl=1.1.1b=h1de35cc_1
  - packaging=19.0=py_0
  - pandas=0.24.2=py37h0a44026_0
  - pandoc=2.6=1
  - pandocfilters=1.4.2=py_1
  - parso=0.3.4=py_0
  - partd=0.3.9=py_0
  - pexpect=4.6.0=py37_1000
  - pickleshare=0.7.5=py37_1000
  - pillow=5.4.1=py37hbddbef0_1000
  - pip=19.0.3=py37_0
  - prometheus_client=0.6.0=py_0
  - prompt_toolkit=2.0.9=py_0
  - psutil=5.6.1=py37h1de35cc_0
  - ptyprocess=0.6.0=py37_1000
  - pycparser=2.19=py37_1
  - pygments=2.3.1=py_0
  - pyopenssl=19.0.0=py37_0
  - pyparsing=2.3.1=py_0
  - pyrsistent=0.14.11=py37h1de35cc_0
  - pysocks=1.6.8=py37_1002
  - python=3.7.1=hbdd33cc_1003
  - python-dateutil=2.8.0=py_0
  - pytz=2018.9=py_0
  - pywavelets=1.0.2=py37h917ab60_0
  - pyyaml=5.1=py37h1de35cc_0
  - pyzmq=18.0.1=py37h4cc6ddd_0
  - readline=7.0=hcfe32e1_1001
  - requests=2.21.0=py37_1000
  - scikit-image=0.14.2=py37h0a44026_1
  - scikit-learn=0.20.3=py37_blas_openblashc6dc708_0
  - scipy=1.2.1=py37_blas_openblash486cb9f_0
  - send2trash=1.5.0=py_0
  - setuptools=40.8.0=py37_0
  - six=1.12.0=py37_1000
  - sortedcontainers=2.1.0=py_0
  - sparse=0.6.0=py_0
  - sqlite=3.26.0=h1765d9f_1001
  - tblib=1.3.2=py_1
  - terminado=0.8.1=py37_1001
  - testpath=0.4.2=py37_1000
  - tk=8.6.9=ha441bb4_1000
  - toolz=0.9.0=py_1
  - tornado=6.0.1=py37h1de35cc_0
  - traitlets=4.3.2=py37_1000
  - urllib3=1.24.1=py37_1000
  - wcwidth=0.1.7=py_1
  - webencodings=0.5.1=py_1
  - wheel=0.33.1=py37_0
  - xz=5.2.4=h1de35cc_1001
  - yaml=0.1.7=h1de35cc_1001
  - zeromq=4.2.5=h0a44026_1006
  - zict=0.1.4=py_0
  - zlib=1.2.11=h1de35cc_1004
@mrocklin
Member

It looks like the chunks parameter to eye differs from how it is used everywhere else. This is unfortunate, but also not very surprising. Supporting arbitrary chunks= values would be more difficult. Maybe we look at the value, verify that it is an integer or a string (otherwise raise an informative error), then call normalize_chunks if it's a string and pull out the resulting integer chunk size again?

if not isinstance(chunks, (int, str)):
    raise ...
elif isinstance(chunks, str):
    chunks = normalize_chunks(chunks, ...)
    chunks = chunks[0][0]  # grab out an integer again

?
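
A minimal sketch of that validate-then-resolve flow, written in plain Python so it runs without dask. The names resolve_eye_chunks and auto_chunk_side are hypothetical, and auto_chunk_side is only a simplified stand-in for normalize_chunks: it picks the largest square chunk that fits a byte limit and will not necessarily match the sizes dask chooses. The 128 MiB default limit is likewise an assumption made for illustration.

```python
import math
from numbers import Integral


def auto_chunk_side(n, itemsize, limit_bytes):
    """Hypothetical stand-in for normalize_chunks (NOT dask's algorithm).

    Returns the largest square chunk side whose size in bytes stays
    within limit_bytes, capped at n and never smaller than 1.
    """
    side = math.isqrt(limit_bytes // itemsize)
    return max(1, min(n, side))


def resolve_eye_chunks(chunks, n, itemsize=8, limit_bytes=128 * 2**20):
    """Turn an int or "auto" into a single integer chunk size.

    Anything else is rejected with an informative error, mirroring the
    check proposed above.
    """
    if not isinstance(chunks, (Integral, str)):
        raise ValueError(
            "chunks must be an int or 'auto', got %r" % (chunks,)
        )
    if isinstance(chunks, str):
        if chunks != "auto":
            raise ValueError("unsupported chunks string: %r" % chunks)
        # In dask this is where normalize_chunks would be called and the
        # first chunk length pulled back out as an integer.
        return auto_chunk_side(n, itemsize, limit_bytes)
    return int(chunks)


print(resolve_eye_chunks(10, 100))
print(resolve_eye_chunks("auto", 100))
```

Because the result is always a single integer, the rest of an eye implementation that assumes square chunks would not need to change under this approach.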

@andersy005
Member

andersy005 commented May 2, 2019

Would it be useful to add chunks='auto' as the default argument to da.eye(), then apply the logic that @mrocklin described above? It seems to me that this is the common behavior for most, if not all, other functions in the array/creation.py module.

In [9]: with dask.config.set({'array.chunk-size': '50 MiB'}): 
   ...:     x = da.eye(500000, 'auto') 
   ...:     print(x) 
   ...:                                                                              
dask.array<eye, shape=(500000, 500000), dtype=float64, chunksize=(2500, 2500)>

@jakirkham
Member Author

Good question. IIRC I was talking with @lightsighter, which prompted this issue. Maybe he can advise us on desirable behavior.

@lightsighter

I would like da.eye to support arbitrary chunk sizes because I was using a brute-force autotuning script to select the chunk sizes. Autotuning gets hard when you can't choose arbitrary reasonable values.

@jakirkham
Member Author

So chunks='auto' is now supported in da.eye thanks to PR #4834.

The remaining task is to support anisotropic chunks for da.eye as well. For example, we'd like to support the following: da.eye(12, chunks=(4, 3)) (stolen from Tom's example ;).
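
One way to reason about anisotropic chunks is that every block is itself an eye-like array whose diagonal is shifted by the block's row and column offsets. The sketch below is plain Python and is not how dask implements eye; eye_blocks and chunk_starts are hypothetical names. For each block it only computes the local diagonal offset, or None when the diagonal misses the block entirely.

```python
def chunk_starts(size, chunk):
    """Return (start, length) pairs covering range(size) in steps of chunk."""
    return [(s, min(chunk, size - s)) for s in range(0, size, chunk)]


def eye_blocks(n, chunks, m=None, k=0):
    """Map each block index (bi, bj) to that block's local diagonal offset.

    Blocks that the diagonal does not cross map to None (all zeros).
    """
    m = n if m is None else m
    row_chunk, col_chunk = chunks
    blocks = {}
    for bi, (r0, rows) in enumerate(chunk_starts(n, row_chunk)):
        for bj, (c0, cols) in enumerate(chunk_starts(m, col_chunk)):
            # Global entry (r, c) is 1 when c - r == k. With r = r0 + i and
            # c = c0 + j, this becomes j - i == k + r0 - c0 inside the block.
            local_k = k + r0 - c0
            # The shifted diagonal touches the block only in this range.
            crosses = -rows < local_k < cols
            blocks[(bi, bj)] = local_k if crosses else None
    return blocks


blocks = eye_blocks(12, (4, 3))
for index in sorted(blocks):
    print(index, blocks[index])
```

A block with local offset local_k and shape (rows, cols) would then correspond to np.eye(rows, cols, k=local_k) in NumPy, and the None blocks to zeros of the same shape.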

@mrocklin
Member

mrocklin commented May 28, 2019 via email

@jakirkham
Member Author

Personally I don't see a problem with keeping this issue open (especially if we think it is in scope). Though I agree this seems like lower priority relative to other issues.
