Dask reshape size error #7496

Closed
RichardScottOZ opened this issue Apr 1, 2021 · 15 comments
Labels: array, needs info (Needs further information from the user)

Comments

@RichardScottOZ

RichardScottOZ commented Apr 1, 2021

What happened:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-119-47f24d3281a7> in <module>
----> 1 geotiff_rs = dask.array.reshape(geotiff_flat, (geotiff_flat.shape[2]*geotiff_flat.shape[3], geotiff_flat.shape[0] * geotiff_flat.shape[1]))

~\AppData\Local\Continuum\anaconda3\envs\pangeo3\lib\site-packages\dask\array\reshape.py in reshape(x, shape)
    176 
    177     if reduce(mul, shape, 1) != x.size:
--> 178         raise ValueError("total size of new array must be unchanged")
    179 
    180     if x.shape == shape:

ValueError: total size of new array must be unchanged

What you expected to happen:

Reshape DataArray

Minimal Complete Verifiable Example:

geotiff_rs = dask.array.reshape(geotiff_flat, (geotiff_flat.shape[2]*geotiff_flat.shape[3], geotiff_flat.shape[0] * geotiff_flat.shape[1]))

Anything else we need to know?:

Environment:

  • Dask version: 2.3.0
  • Python version: 3.8.6
  • Operating System: Windows10
  • Install method (conda, pip, source): conda
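
For reference, the check that raises in the traceback above only compares element counts. A minimal sketch of that comparison in plain Python follows. The helper name `same_total_size` and the example shapes are illustrative assumptions, not Dask's code or the reporter's data.

```python
from functools import reduce
from operator import mul


def same_total_size(old_shape, new_shape):
    """Return True when both shapes describe the same number of elements."""
    return reduce(mul, new_shape, 1) == reduce(mul, old_shape, 1)


old_shape = (42, 1, 879, 786)  # illustrative 4D shape
new_shape = (old_shape[2] * old_shape[3], old_shape[0] * old_shape[1])

print(same_total_size(old_shape, new_shape))   # True: same element count
print(same_total_size(old_shape, (879, 786)))  # False: one axis was dropped
```
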
@jsignell
Member

jsignell commented Apr 1, 2021

Thanks for raising this. I get a slightly different error on the latest dask, but I still see this behavior. Here is a reproducer:

import dask.array as da

shape = (42, 1, 879, 786)
arr = da.arange(shape[0]*shape[1]*shape[2]*shape[3]).reshape(shape)
arr.reshape(shape[2]*shape[3], shape[0]*shape[1])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-c510dcbec9fd> in <module>
      3 shape = (42, 1, 879, 786)
      4 arr = da.arange(shape[0]*shape[1]*shape[2]*shape[3]).reshape(shape)
----> 5 arr.reshape(shape[2]*shape[3], shape[0]*shape[1])

~/dask/dask/array/core.py in reshape(self, merge_chunks, *shape)
   1917         if len(shape) == 1 and not isinstance(shape[0], Number):
   1918             shape = shape[0]
-> 1919         return reshape(self, shape, merge_chunks=merge_chunks)
   1920 
   1921     def topk(self, k, axis=-1, split_every=None):

~/dask/dask/array/reshape.py in reshape(x, shape, merge_chunks)
    226         x = x.rechunk({i: 1 for i in range(din - dout)})
    227 
--> 228     inchunks, outchunks = reshape_rechunk(x.shape, shape, x.chunks)
    229     x2 = x.rechunk(inchunks)
    230 

~/dask/dask/array/reshape.py in reshape_rechunk(inshape, outshape, inchunks)
     72                 oleft -= 1
     73             if reduce(mul, outshape[oleft : oi + 1]) != din:
---> 74                 raise ValueError("Shapes not compatible")
     75 
     76             # TODO: don't coalesce shapes unnecessarily

ValueError: Shapes not compatible
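
For comparison, NumPy accepts the same reshape because it only requires the element count to match. The sketch below uses a small stand-in shape (an assumption, chosen so the result is easy to inspect) and shows that the result is the original row-major buffer cut into new rows, not a reordering of axes.

```python
import numpy as np

shape = (4, 1, 3, 5)  # small stand-in for (42, 1, 879, 786)
arr = np.arange(np.prod(shape)).reshape(shape)

# NumPy allows this: 3 * 5 rows, 4 * 1 columns.
out = arr.reshape(shape[2] * shape[3], shape[0] * shape[1])

# The flat element order is unchanged, so row 0 holds the first four
# elements of the original buffer rather than one value per leading index.
print(out.shape)
print(out[0])
print(np.array_equal(out.ravel(), arr.ravel()))
```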

@RichardScottOZ
Author

RichardScottOZ commented Apr 1, 2021

When I printed it, the reduce(mul, shape, 1) value in my example was also many times bigger, but the shape created at the start was OK.

@jsignell
Member

jsignell commented Apr 2, 2021

Possibly related to #7171

@jsignell
Member

jsignell commented Apr 2, 2021

Actually I just read the reshape docstring more carefully and I think the behavior that you are seeing is intentional:

This is a parallelized version of the np.reshape function with the
following limitations:

  1. It assumes that the array is stored in row-major order
  2. It only allows for reshapings that collapse or merge dimensions like
    (1, 2, 3, 4) -> (1, 6, 4) or (64,) -> (4, 4, 4)
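
The two reshapes named in the docstring can be checked on small arrays. This sketch uses NumPy only to show the row-major index arithmetic behind merging and splitting adjacent axes. It does not exercise Dask's chunk handling.

```python
import numpy as np

# Merge: (1, 2, 3, 4) -> (1, 6, 4) joins the adjacent axes of length 2 and 3.
a = np.arange(24).reshape(1, 2, 3, 4)
merged = a.reshape(1, 6, 4)
j, k, l = 1, 2, 3
# In row-major order the merged index is j * 3 + k.
print(merged[0, j * 3 + k, l] == a[0, j, k, l])

# Split: (64,) -> (4, 4, 4) cuts one axis into three.
b = np.arange(64)
split = b.reshape(4, 4, 4)
# Flat index i maps to (i // 16, (i // 4) % 4, i % 4).
i = 27
print(split[i // 16, (i // 4) % 4, i % 4] == b[i])
```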

You'll note that it works to reshape in a way that preserves the order like:

arr.reshape(shape[0]*shape[1], shape[2]*shape[3])

I am wondering if there is a way to tidy up the docstring to make this constraint clearer. I am also wondering if what you are actually trying to achieve is a transpose followed by a reshape:

arr.transpose().reshape(shape[2]*shape[3], shape[0]*shape[1])

Hopefully that solves the issue for you. If you have ideas about how the docstring or the error could be clearer, please open a pull request or make a suggestion here.
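
A small sketch of the transpose-then-reshape approach, assuming the goal is one row per (y, x) pixel and one column per leading index. The shape is a small stand-in. Note that `transpose()` with no arguments reverses every axis, so rows come out in x-major order, while an explicit axis order such as `(2, 3, 0, 1)` keeps y-major order. Which one is wanted depends on the downstream code.

```python
import dask.array as da
import numpy as np

shape = (4, 1, 3, 5)  # (variable, band, y, x), small stand-in
n = int(np.prod(shape))
arr = da.arange(n, chunks=n).reshape(shape)

n_pixels = shape[2] * shape[3]
n_columns = shape[0] * shape[1]

# Reverse all axes, then merge: rows are ordered x-major.
x_major = arr.transpose().reshape(n_pixels, n_columns)

# Move y and x to the front, then merge: rows are ordered y-major.
y_major = arr.transpose(2, 3, 0, 1).reshape(n_pixels, n_columns)

ref = arr.compute()
v, y, x = 2, 1, 4
print(x_major.compute()[x * shape[2] + y, v] == ref[v, 0, y, x])
print(y_major.compute()[y * shape[3] + x, v] == ref[v, 0, y, x])
```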

@RichardScottOZ
Author

Thanks, I will try that sometime soon.

Perhaps the docstring could include a common 4D and 3D to 2D example that is a bit clearer than 4 cubed, such as a 3-band image or a 4D tensor, and not all on one line with multiple arrows? It is unlikely that anyone uses Dask for 64 elements, generally speaking.

If I get this to work, perhaps I can do an example.

@RichardScottOZ
Author

Perhaps cross-link transpose too, as you mention there, since others may misunderstand this as I have?

@RichardScottOZ
Author

RichardScottOZ commented Apr 4, 2021

OK, so just trying that, I get:

Geotiff_flat
<xarray.DataArray 'stack-4286d9eeb8766ba3c7466e4b6118c8e8' (variable: 41, band: 1, y: 81794, x: 78063)>
dask.array<stack, shape=(41, 1, 81794, 78063), dtype=int16, chunksize=(1, 1, 8000, 8000), chunktype=numpy.ndarray>
Coordinates:
  * band      (band) int64 1
  * y         (y) float64 -11.89 -11.89 -11.89 -11.89 ... -19.24 -19.24 -19.24
  * x         (x) float64 132.1 132.1 132.1 132.1 ... 139.2 139.2 139.2 139.2
  * variable  (variable) <U17 'Pt-Mal' 'Pt-LyYa' ... 'b7.img' 'b11.img'
Geotiff_flat type <class 'xarray.core.dataarray.DataArray'>

Geotff_squeeze
<xarray.DataArray 'stack-4286d9eeb8766ba3c7466e4b6118c8e8' (variable: 41, y: 81794, x: 78063)>
dask.array<astype, shape=(41, 81794, 78063), dtype=uint16, chunksize=(1, 8000, 8000), chunktype=numpy.ndarray>
Coordinates:
    band      int64 1
  * y         (y) float64 -11.89 -11.89 -11.89 -11.89 ... -19.24 -19.24 -19.24
  * x         (x) float64 132.1 132.1 132.1 132.1 ... 139.2 139.2 139.2 139.2
  * variable  (variable) <U17 'Pt-Mal' 'Pt-LyYa' ... 'b7.img' 'b11.img'
Geotiff squeeze type <class 'xarray.core.dataarray.DataArray'>
Traceback (most recent call last):
  File "enerzai2.py", line 90, in <module>
    geotiff_rs = geotiff_squeeze.transpose().reshape( geotiff_squeeze.shape[1]*geotiff_squeeze.shape[2], geotiff_squeeze.shape[0] )
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/common.py", line 239, in __getattr__
    raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'reshape'

But if I explicitly convert it to a dask array again, I get warnings about unexpected behaviour

@RichardScottOZ
Author

RichardScottOZ commented Apr 4, 2021

and an eventual error from xarray

geotiff_rs = dask.array.from_array(geotiff_squeeze).transpose().reshape( geotiff_squeeze.shape[1]*geotiff_squeeze.shape[2], geotiff_squeeze.shape[0] )

Then I try to compute geotiff_rs as a test:

Geotiff_rs
dask.array<reshape, shape=(6385085022, 41), dtype=uint16, chunksize=(59709620, 1), chunktype=numpy.ndarray>
Geotiff_rs type <class 'dask.array.core.Array'>

computing array as test
[##                                      ] | 5% Completed |  5.9s
Traceback (most recent call last):
  File "enerzai2.py", line 100, in <module>
    geotiff_rsNP =  geotiff_rs.compute()
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/base.py", line 284, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/base.py", line 566, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/local.py", line 514, in get_async
    raise_exception(exc, tb)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/local.py", line 325, in reraise
    raise exc
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/optimization.py", line 963, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/core.py", line 151, in get
    result = _execute_task(task, cache)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/utils.py", line 34, in apply
    return func(*args, **kwargs)
  File "<__array_function__ internals>", line 5, in transpose
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 658, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/dataarray.py", line 2210, in transpose
    dims = tuple(utils.infix_dims(dims, self.dims, missing_dims))
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/utils.py", line 788, in infix_dims
    existing_dims = drop_missing_dims(dims_supplied, dims_all, missing_dims)
  File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/utils.py", line 879, in drop_missing_dims
    raise ValueError(
ValueError: Dimensions {(2, 1, 0)} do not exist. Expected one or more of ('variable', 'y', 'x')
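
One plausible reading of the traceback above (an inference, not confirmed in the thread) is that `dask.array.from_array` wrapped the DataArray itself, so the later `transpose` was dispatched to xarray with integer axes. A hedged alternative is to take the array that already backs the DataArray through its `.data` attribute and work on that. The sketch uses a small NumPy-backed DataArray as a stand-in. With a Dask-backed one, `.data` returns the Dask array.

```python
import numpy as np
import xarray as xr

# Small stand-in for the (variable, y, x) DataArray in the thread.
data = np.arange(2 * 3 * 4).reshape(2, 3, 4)
squeezed = xr.DataArray(data, dims=("variable", "y", "x"))

# .data is the array backing the DataArray (NumPy here, Dask if it was
# created lazily), so integer axes are valid for transpose.
raw = squeezed.data
n_var, n_y, n_x = raw.shape

# Move y and x to the front, then merge them into one pixel axis.
pixels = raw.transpose(1, 2, 0).reshape(n_y * n_x, n_var)

print(pixels.shape)
```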

@jsignell
Member

jsignell commented Apr 5, 2021

This looks like a separate issue, possibly with transpose in xarray. It'd be best to open a new issue with a minimal reproducible example.

@dcherian
Contributor

dcherian commented Apr 5, 2021

Xarray expects dimension names, not axis numbers.

A reshape in xarray land is stack: https://xarray.pydata.org/en/stable/generated/xarray.Dataset.stack.html?highlight=stack#xarray.Dataset.stack Can you try that and open an issue on Xarray's discussions if you still have trouble?
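
A minimal sketch of the `stack` approach suggested above, on a small stand-in DataArray. The new dimension name `pixel` is an arbitrary choice for illustration.

```python
import numpy as np
import xarray as xr

data = np.arange(2 * 3 * 4).reshape(2, 3, 4)
squeezed = xr.DataArray(data, dims=("variable", "y", "x"))

# Merge y and x into one dimension, addressed by name rather than axis number.
stacked = squeezed.stack(pixel=("y", "x"))

# stack appends the new dimension last; reorder by name to get
# one row per pixel and one column per variable.
table = stacked.transpose("pixel", "variable")

print(table.dims)
print(table.shape)
```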

@jsignell
Member

jsignell commented Apr 5, 2021

Ah perfect - thanks for the speedy response @dcherian. I'm not quite sure what's going on since it seems like they are not explicitly including any args or kwargs in transpose.

@jsignell added the "array" and "needs info" (Needs further information from the user) labels on Apr 5, 2021
@RichardScottOZ
Author

Thanks @dcherian. I was looking at stack last night but haven't tried it yet. One version is really slow, but to_stacked_array might be useful?

@RichardScottOZ
Author

Interestingly, a quick test of this works: it dies on my laptop from memory issues, not shape issues.
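
The memory failure is consistent with the sizes printed earlier in the thread. A quick back-of-the-envelope calculation, using the shape (6385085022, 41) and dtype uint16 reported for geotiff_rs:

```python
rows, cols = 6_385_085_022, 41  # shape reported for geotiff_rs
bytes_per_element = 2           # uint16

total_bytes = rows * cols * bytes_per_element
total_gib = total_bytes / 2**30

print(total_bytes)
print(round(total_gib, 1))
```

Materializing the whole result with `.compute()` therefore asks for several hundred GiB in one NumPy array, so a memory error on a laptop is expected regardless of how the reshape is expressed.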

@jsignell
Member

I am going to close this issue, since I think the original question has been addressed.

@RichardScottOZ
Author

Yes, thanks Julia.
