Dask reshape size error #7496
Comments
Thanks for raising this. I get a slightly different error on latest dask, but I still see this behavior. Here is a reproducer:
import dask.array as da
shape = (42, 1, 879, 786)
arr = da.arange(shape[0]*shape[1]*shape[2]*shape[3]).reshape(shape)
arr.reshape(shape[2]*shape[3], shape[0]*shape[1])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-c510dcbec9fd> in <module>
3 shape = (42, 1, 879, 786)
4 arr = da.arange(shape[0]*shape[1]*shape[2]*shape[3]).reshape(shape)
----> 5 arr.reshape(shape[2]*shape[3], shape[0]*shape[1])
~/dask/dask/array/core.py in reshape(self, merge_chunks, *shape)
1917 if len(shape) == 1 and not isinstance(shape[0], Number):
1918 shape = shape[0]
-> 1919 return reshape(self, shape, merge_chunks=merge_chunks)
1920
1921 def topk(self, k, axis=-1, split_every=None):
~/dask/dask/array/reshape.py in reshape(x, shape, merge_chunks)
226 x = x.rechunk({i: 1 for i in range(din - dout)})
227
--> 228 inchunks, outchunks = reshape_rechunk(x.shape, shape, x.chunks)
229 x2 = x.rechunk(inchunks)
230
~/dask/dask/array/reshape.py in reshape_rechunk(inshape, outshape, inchunks)
72 oleft -= 1
73 if reduce(mul, outshape[oleft : oi + 1]) != din:
---> 74 raise ValueError("Shapes not compatible")
75
76 # TODO: don't coalesce shapes unnecessarily
ValueError: Shapes not compatible |
When I printed the reduce(mul, ...) value in my example, it was also many times bigger than expected, but the shape created at the start was OK. |
Possibly related to #7171 |
Actually I just read the reshape docstring more carefully and I think the behavior that you are seeing is intentional:
You'll note that it works to reshape in a way that preserves the order, like:
arr.reshape(shape[0]*shape[1], shape[2]*shape[3])
I am wondering if there is a way to clear up the docstring to make this constraint clearer, and I am also wondering if what you are actually trying to achieve is a transpose and a reshape:
arr.transpose().reshape(shape[2]*shape[3], shape[0]*shape[1])
Hopefully that solves the issue for you. If you have ideas about how the docstring or the error could be clearer, please open a pull request or make a suggestion here. |
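To make the difference between the two calls concrete, here is a minimal sketch using plain NumPy on a small array. The shape (3, 1, 4, 5) and the variable names are made up for illustration, and the assumption is that dask.array follows the same C-order semantics as NumPy here. It shows that reshaping straight to the target shape and transposing before reshaping put different elements in each row.

```python
import numpy as np

# Hypothetical small stand-in for (variable, band, y, x).
shape = (3, 1, 4, 5)
arr = np.arange(np.prod(shape)).reshape(shape)

# Order-preserving reshape: merges adjacent axes without moving data.
in_order = arr.reshape(shape[0] * shape[1], shape[2] * shape[3])

# Transpose first, then merge: each row is one pixel, each column one variable.
pixels_by_var = arr.transpose().reshape(shape[2] * shape[3], shape[0] * shape[1])

# Reshaping straight to the target shape keeps memory order,
# so a row is NOT "one pixel across all variables".
naive = arr.reshape(shape[2] * shape[3], shape[0] * shape[1])

print(in_order.shape)     # (3, 20)
print(pixels_by_var[0])   # values 0, 20, 40: pixel (y=0, x=0) in each variable
print(naive[0])           # values 0, 1, 2: the first three elements in memory order
```

Note that transpose() with no arguments reverses all axes, so the rows above are ordered with x varying slowest. If y-major pixel order is wanted, explicit axes such as arr.transpose(2, 3, 0, 1) can be passed before the reshape.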
Thanks, I will try that sometime soon. Perhaps the docstring could include a common 4d and 3d to 2d example that is a bit clearer than 4 cubed? A 3 band image, a 4d tensor, et al? Not all on one line with multiple arrows? It seems unlikely that anyone uses Dask for 64 elements, generally speaking. If I get this to work, perhaps I can do an example. |
Perhaps crosslink transpose too as you mention there, as maybe there will be others that misunderstand as I have? |
Ok, so just trying that I get
Geotiff_flat
<xarray.DataArray 'stack-4286d9eeb8766ba3c7466e4b6118c8e8' (variable: 41, band: 1, y: 81794, x: 78063)>
dask.array<stack, shape=(41, 1, 81794, 78063), dtype=int16, chunksize=(1, 1, 8000, 8000), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1
* y (y) float64 -11.89 -11.89 -11.89 -11.89 ... -19.24 -19.24 -19.24
* x (x) float64 132.1 132.1 132.1 132.1 ... 139.2 139.2 139.2 139.2
* variable (variable) <U17 'Pt-Mal' 'Pt-LyYa' ... 'b7.img' 'b11.img'
Geotiff_flat type <class 'xarray.core.dataarray.DataArray'>
Geotff_squeeze
<xarray.DataArray 'stack-4286d9eeb8766ba3c7466e4b6118c8e8' (variable: 41, y: 81794, x: 78063)>
dask.array<astype, shape=(41, 81794, 78063), dtype=uint16, chunksize=(1, 8000, 8000), chunktype=numpy.ndarray>
Coordinates:
band int64 1
* y (y) float64 -11.89 -11.89 -11.89 -11.89 ... -19.24 -19.24 -19.24
* x (x) float64 132.1 132.1 132.1 132.1 ... 139.2 139.2 139.2 139.2
* variable (variable) <U17 'Pt-Mal' 'Pt-LyYa' ... 'b7.img' 'b11.img'
Geotiff squeeze type <class 'xarray.core.dataarray.DataArray'>
Traceback (most recent call last):
File "enerzai2.py", line 90, in <module>
geotiff_rs = geotiff_squeeze.transpose().reshape( geotiff_squeeze.shape[1]*geotiff_squeeze.shape[2], geotiff_squeeze.shape[0] )
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/common.py", line 239, in __getattr__
raise AttributeError(
AttributeError: 'DataArray' object has no attribute 'reshape'
but if I explicitly recast again to a dask array I do get warnings of unexpected behaviours and an eventual error from xarray
geotiff_rs = dask.array.from_array(geotiff_squeeze).transpose().reshape( geotiff_squeeze.shape[1]*geotiff_squeeze.shape[2], geotiff_squeeze.shape[0] )
then I try and compute geotiff_rs as a test
Geotiff_rs
dask.array<reshape, shape=(6385085022, 41), dtype=uint16, chunksize=(59709620, 1), chunktype=numpy.ndarray>
Geotiff_rs type <class 'dask.array.core.Array'>
computing array as test
[## ] | 5% Completed | 5.9s
Traceback (most recent call last):
File "enerzai2.py", line 100, in <module>
geotiff_rsNP = geotiff_rs.compute()
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/base.py", line 284, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/base.py", line 566, in compute
results = schedule(dsk, keys, **kwargs)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/threaded.py", line 79, in get
results = get_async(
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/local.py", line 514, in get_async
raise_exception(exc, tb)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/local.py", line 325, in reraise
raise exc
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/local.py", line 223, in execute_task
result = _execute_task(task, data)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/optimization.py", line 963, in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/core.py", line 151, in get
result = _execute_task(task, cache)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/dask/utils.py", line 34, in apply
return func(*args, **kwargs)
File "<__array_function__ internals>", line 5, in transpose
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 658, in transpose
return _wrapfunc(a, 'transpose', axes)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/dataarray.py", line 2210, in transpose
dims = tuple(utils.infix_dims(dims, self.dims, missing_dims))
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/utils.py", line 788, in infix_dims
existing_dims = drop_missing_dims(dims_supplied, dims_all, missing_dims)
File "/home/ubuntu/anaconda3/envs/richard/lib/python3.8/site-packages/xarray/core/utils.py", line 879, in drop_missing_dims
raise ValueError(
ValueError: Dimensions {(2, 1, 0)} do not exist. Expected one or more of ('variable', 'y', 'x')
|
This looks like a separate issue, possibly with transpose in xarray. It'd be best to open a new issue with a minimal reproducible example. |
Xarray expects dimension names, not axes numbers. A reshape in xarray land is |
Ah perfect - thanks for the speedy response @dcherian. I'm not quite sure what's going on since it seems like they are not explicitly including any args or kwargs in transpose. |
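One plausible reading of the traceback above is that dask.array.from_array(geotiff_squeeze) wrapped the DataArray object itself, so NumPy's transpose ended up calling DataArray.transpose with axis numbers. A minimal sketch of one way around that, assuming the goal is to work on the dask array that already backs the DataArray, is to take it from the .data attribute. The small shapes and names below are illustrative only.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical small stand-in for the (variable, y, x) stack.
nvar, ny, nx = 3, 4, 5
values = np.arange(nvar * ny * nx).reshape(nvar, ny, nx)
xarr = xr.DataArray(
    da.from_array(values, chunks=(1, ny, nx)),
    dims=("variable", "y", "x"),
)

# .data is the underlying dask array, so positional axes work as usual.
dask_arr = xarr.data
flat = dask_arr.transpose().reshape(nx * ny, nvar)

# Only compute on something this small; the real array is far larger.
result = flat.compute()
print(type(dask_arr).__name__, result.shape)
```

This drops the xarray coordinates and dimension names, so it only fits cases where a bare array is what the next step needs.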
Thanks @dcherian I was looking at stack last night but haven't tried it yet - one version is really slow, but to_stacked_array might be useful? |
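For reference, a minimal sketch of what a stack-based reshape could look like, assuming the stack mentioned here is DataArray.stack. The dimension name "pixel" and the small shapes are chosen for illustration and do not come from the thread.

```python
import numpy as np
import xarray as xr

# Hypothetical small stand-in for the (variable, y, x) stack.
nvar, ny, nx = 3, 4, 5
values = np.arange(nvar * ny * nx).reshape(nvar, ny, nx)
xarr = xr.DataArray(values, dims=("variable", "y", "x"))

# Merge y and x into one new dimension, then order dimensions by name.
stacked = xarr.stack(pixel=("y", "x")).transpose("pixel", "variable")

print(stacked.dims)    # ('pixel', 'variable')
print(stacked.shape)   # (20, 3)
```

Because dimensions are addressed by name, there is no need to track which axis number is which. The rows here are in y-major order, following the order of the names given to stack.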
I am going to close this issue, since I think the original question has been addressed. |
Yes, thanks Julia. |
What happened:
What you expected to happen:
Reshape DataArray
Minimal Complete Verifiable Example:
Anything else we need to know?:
Environment: