
Tasks which are obviously root tasks not considered rootish #7274

Open
gjoseph92 opened this issue Nov 8, 2022 · 6 comments · May be fixed by #7221

Comments

@gjoseph92
Collaborator

If a task does not have dependencies, it's obviously a root task.

However, is_rootish currently requires that the task also have > 2 * total_nthreads other tasks in its TaskGroup. (The 2x is its own problem: #7273.)

This means some tasks which are clearly root tasks don't go down root-task code paths.
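
A minimal sketch of the check described above, for illustration only. It assumes the heuristic compares the TaskGroup size against twice the cluster's thread count and also caps the number of dependency groups and dependency tasks. The function name `is_rootish_sketch`, its parameters, and the `< 5` cutoffs are assumptions made for this example and are not copied from the scheduler; check `is_rootish` in your installed version of distributed before relying on them.

```python
def is_rootish_sketch(group_len, total_nthreads, n_dep_groups, n_dep_tasks):
    """Hypothetical stand-in for the scheduler's rootish heuristic.

    group_len:      number of tasks in the task's TaskGroup
    total_nthreads: total worker threads in the cluster
    n_dep_groups:   number of TaskGroups this group depends on
    n_dep_tasks:    total number of tasks in those dependency groups
    """
    return (
        group_len > 2 * total_nthreads  # the "> 2 * total_nthreads" requirement
        and n_dep_groups < 5            # assumed cutoff
        and n_dep_tasks < 5             # assumed cutoff
    )


# A single task with no dependencies on a 4-thread cluster: 1 > 8 is false.
single = is_rootish_sketch(group_len=1, total_nthreads=4, n_dep_groups=0, n_dep_tasks=0)

# 100 such tasks on the same cluster: 100 > 8 is true.
many = is_rootish_sketch(group_len=100, total_nthreads=4, n_dep_groups=0, n_dep_tasks=0)

# The same 100 tasks on a 64-thread cluster: 100 > 128 is false.
bigger_cluster = is_rootish_sketch(group_len=100, total_nthreads=64, n_dep_groups=0, n_dep_tasks=0)

print(single, many, bigger_cluster)
```

In this sketch the dependency-free tasks are classified differently depending only on how many siblings they have and how large the cluster is, which is the behavior this issue is about.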

The consequences of this aren't obvious. They may not be particularly noticeable as a user. See #7221 (comment) for discussion of practical implications. My main concern so far is that this could be hiding untested and possibly incorrect behavior. (So there might be practical implications, we just haven't noticed them yet.)

  1. Many of our tests don't use root-task code paths, even when queuing is enabled, because they don't submit enough tasks. So whatever situations they're testing for (which presumably could happen with more tasks too) are un-tested with queuing on. At least 12 tests fail with queuing on when this problem is fixed; we don't know how many pass in both cases, but were not using the queuing code path and would not have caught a regression.
  2. More codepaths make reasoning harder. Combined with the fact that not all tests may be running, it's harder to avoid incorrect behavior at edge cases. Specifically, the round-robin code path doesn't get much explicit testing, is something we've already wanted to remove (#6974, "Simplify decide_worker"), and does not interact well with worker-saturation (#7197, "Round-robin worker selection makes poor choices with worker-saturation > 1.0"); CI didn't show this until we tried changing the default.

@dcherian

dcherian commented Apr 21, 2023

I think I'm running into some version of this.

The behaviour changes drastically depending on whether a couple of xarray DataArrays are chunked at read-time. I have a reproducible workflow but it is complicated. I can try to minimize but maybe you can already tell what's wrong by looking at the graphs.

The computation involving these dataarrays is

u_e = (merged.UVEL*dyu*dzu/1.e4)
v_e = (merged.VVEL*dxu*dzu/1.e4)

UVEL, VVEL are chunked in the xr.open_dataset call, dzu always wraps a numpy array, the only difference between the two below is whether dxu and dyu are chunked.

unchunked

Looks great: open_dataset tasks were identified as "root tasks" (is that what the diagonal lines mean?) and memory usage was great.

[Images not preserved: "graph1" and an untitled screenshot]

chunked

Not good. Note that open_dataset is not identified as a root task, and memory use is much higher. The difference is open_dataset-DXU and open_dataset-DYU on the right side; I think the equivalents are array-24 and array-48 in the previous graph.

[Images not preserved: "graph2" and an untitled screenshot]

@mrocklin
Member

UVEL, VVEL are chunked in the xr.open_dataset call, dzu always wraps a numpy array, the only difference between the two below is whether dxu and dyu are chunked.

This is going to sound dumb, but can you leave them unchunked? 😉. Seriously though, is this you reporting a case that should theoretically be smoother but you're alright, or is this blocking you painfully?

@fjetter do you have anyone who can take a look at this meaningfully? I wouldn't be surprised if there was some special-case here, such as literals being fused in.

I have a reproducible workflow but it is complicated. I can try to minimize but maybe you can already tell what's wrong by looking at the graphs.

Thanks for the offer. Let's see what @fjetter comes back with. I wouldn't be surprised if minimizing ends up being necessary.

@fjetter
Member

fjetter commented Apr 25, 2023

First gut reaction without any meaningful thought: Is it easy and not too expensive for you to try #7531? Over there I played a bit with the rootish classification logic and our benchmarks looked fine. However, the logic I am introducing there is something we were not fully convinced of without more data.
Maybe this helps (or destroys) your workload which would be an interesting data point. Bonus points if we can run this on Coiled ;)


Can somebody help me understand what the difference between chunked and unchunked is? Does this mean chunks is explicitly provided in the call to open_dataset, i.e. the rechunking happening behind the scenes is different? Looking at the task screens it looks like the graphs contain the same prefixes and same task counts in both instances but I can't puzzle out the difference and I'm not sufficiently familiar with xarray.

@fjetter
Member

fjetter commented Apr 25, 2023

We can maybe run some diagnostics with the following snippet

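def get_tg_info(dask_scheduler):
    # Collect, for every TaskGroup, the quantities the rootish check looks at.
    return {
        tg.name: {
            'nthreads': dask_scheduler.total_nthreads,  # total threads in the cluster
            'len': len(tg),  # number of tasks in this group
            'len_dep': len(tg.dependencies),  # number of groups this group depends on
            'sum_map': sum(map(len, tg.dependencies)),  # total tasks in those groups
        }
        for tg in dask_scheduler.task_groups.values()
    }

# `client` is an already-connected distributed.Client
client.run_on_scheduler(get_tg_info)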

Can you run this once for the good and the bad example? This extracts the internal info that is being used to classify the tasks by rootish/non-rootish and we can see what exactly is leading to the wrong classification

@mrocklin
Member

My guess (uneducated) is that chunked xarray datasets use dask arrays while unchunked xarray datasets use numpy.

@dcherian

dcherian commented May 8, 2023

This is going to sound dumb, but can you leave them unchunked? 😉. Seriously though, is this you reporting a case that should theoretically be smoother but you're alright, or is this blocking you painfully?

The problem is knowing the right thing to do. As illustration, here's how this adventure went:

  1. I received the notebook with the complaint that it only worked for a 100 year long dataset when batching the inputs 10 years at a time.
  2. I made some edits, increasing chunk size, and got the calculation to work for a 50 year chunk of data. Great!
  3. I then sized up the cluster and tried all 100 years. No luck, dask is reading more data than it is reducing, so we eventually run out of memory. I notice this warning:
/glade/u/home/dcherian/miniconda3/envs/ipogs/lib/python3.10/site-packages/distributed/client.py:3106: UserWarning: Sending large graph of size 284.27 MiB.
This may cause some slowdown.
Consider scattering data ahead of time and using futures.
  4. I looked for places where non-dask things were being used. Then I spotted an xarray.open_dataset line with no chunks specified. So I added some chunking there, still no luck.
  5. At that point I go back to a 50 year batch size but keep the new chunking specified in (4). Now things are still broken.
  6. After a long time I realize the dashboard shows me different root tasks depending on whether I specify chunks in that single open_dataset call. After that I posted my earlier comment in this issue (#7274 (comment)).

Honestly if I wrote this code from scratch, I would have written the open_dataset call with chunks specified and never gotten the thing to work for a 50 year batch size.

Also I apologize, I intended to post this thread on #7273 but messed up. This comment from Gabe seems very descriptive of step (3) in my adventure.

If you make your problem size smaller, or make your cluster bigger—two things that you'd expect to reduce per-worker memory usage—you may cross an opaque magic threshold at which your workload suddenly uses up to 2x more memory.


Can you run this once for the good and the bad example? This extracts the internal info that is being used to classify the tasks by rootish/non-rootish and we can see what exactly is leading to the wrong classification

@fjetter Thanks, here's the info with

distributed: 2023.4.1
xarray     : 2023.1.0
dask       : 2023.4.1
numpy      : 1.23.5

and cluster setup

from dask_jobqueue import PBSCluster

cluster = PBSCluster(
    cores=2,
    memory='30GiB',
    processes=1,
    queue='casper',
    resource_spec='select=1:ncpus=2:mem=30GB',
    account='ncgd0011',
    walltime='01:00:00',
    interface='ib0',
)
cluster.scale(20)

no chunking

{'concatenate-ef587e065dd7f163c9d8da9d9ceac7e2': {'nthreads': 22,
  'len': 1080,
  'len_dep': 2,
  'sum_map': 1080},
 'astype-broadcast_to-getitem-concatenate-ef587e065dd7f163c9d8da9d9ceac7e2': {'nthreads': 22,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540},
 'transpose-66f4b0e6276e5fd00d43c8a24eaee84f': {'nthreads': 22,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540},
 'nancumsum-setitem-transpose-66f4b0e6276e5fd00d43c8a24eaee84f': {'nthreads': 22,
  'len': 540,
  'len_dep': 2,
  'sum_map': 541},
 'sum-aggregate-63289758805633ec5d11a2b18edc35fa': {'nthreads': 22,
  'len': 540,
  'len_dep': 1,
  'sum_map': 1080},
 'sum-partial-a7790e47e5bf8ac649ed7beeee92e81a': {'nthreads': 22,
  'len': 1080,
  'len_dep': 2,
  'sum_map': 4320},
 'sum-partial-7ae749f995f3a8d2c9da01a1b81f939a': {'nthreads': 22,
  'len': 1080,
  'len_dep': 1,
  'sum_map': 3240},
 'sum-partial-27e0673968a08a2227d6ac56a83e14e0': {'nthreads': 22,
  'len': 3240,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-f091cfdce318729306ca22f0ed2f53a9': {'nthreads': 22,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 12972},
 'rechunk-merge-a776a5520df945d007b880d33065e912': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-a776a5520df945d007b880d33065e912': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-87529c8ca92197f637d36731667ec0c7': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-71afddeaaffc893c65f92b85d2036300': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'nancumsum-neg-getitem-71afddeaaffc893c65f92b85d2036300': {'nthreads': 22,
  'len': 6480,
  'len_dep': 4,
  'sum_map': 25920},
 'sum-aggregate-ddbe845f4a9f5cc272c17a0239f7d978': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-sum-aggregate-ddbe845f4a9f5cc272c17a0239f7d978': {'nthreads': 22,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 19440},
 'truediv-2e823a27a1723f60226f921ae4adba24': {'nthreads': 22,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 6504},
 'rechunk-merge-3856c120ccfb41360af5857eebea1327': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-3856c120ccfb41360af5857eebea1327': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-1bd89cb79248107ca0615e68383dec7a': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-afb52e51fd4eeffb787d1455fce2927e': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'where-getitem-afb52e51fd4eeffb787d1455fce2927e': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 4},
 'original': {'nthreads': 22, 'len': 4, 'len_dep': 0, 'sum_map': 0},
 'rechunk-merge-8830b6989eefa8c88f47df888b306486': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-8830b6989eefa8c88f47df888b306486': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-1f757566036ece865da6871676abaf9b': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'sub-059fd4cda2da80609cb64874c950acc4': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'compute_eos-compute_eos_0-sub-059fd4cda2da80609cb64874c950acc4': {'nthreads': 22,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 12961},
 'getitem-76fb022692bef9a7df06a0785026fda4': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'open_dataset-getitem-76fb022692bef9a7df06a0785026fda4': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 4},
 'array-f3e5592e80649dafe228ccf744173f68': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-2fa58ffd3db3842832c121f0670a98c9': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'open_dataset-getitem-2fa58ffd3db3842832c121f0670a98c9': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 4},
 'rechunk-merge-3b848316b0a560d079527b61ff318aee': {'nthreads': 22,
  'len': 6480,
  'len_dep': 2,
  'sum_map': 18900},
 'rechunk-split-3b848316b0a560d079527b61ff318aee': {'nthreads': 22,
  'len': 11880,
  'len_dep': 1,
  'sum_map': 7020},
 'concatenate-d035a941fd0b9bdd5e095879b94b4452': {'nthreads': 22,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 7020},
 'getitem-concatenate-d035a941fd0b9bdd5e095879b94b4452': {'nthreads': 22,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-aggregate-48b3672b7d8e36557e73b877df64c939': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-sum-aggregate-48b3672b7d8e36557e73b877df64c939': {'nthreads': 22,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 19440},
 'truediv-ee7c157bc1dae95d15520bbec4085a7c': {'nthreads': 22,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 6504},
 'rechunk-merge-f8ece5b208dc2f428ebf3ac767fec239': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-f8ece5b208dc2f428ebf3ac767fec239': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-b80326d1a72990951e13925dcfa30388': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-e55a84c9d024b023fde7432ddddb19d1': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'where-getitem-e55a84c9d024b023fde7432ddddb19d1': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 4},
 'rechunk-merge-c13fa64f8d6146c9a0df20d1af4dd764': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 12960},
 'concatenate-b64b3c85cce74ff1d81b30e669e909dd': {'nthreads': 22,
  'len': 12960,
  'len_dep': 2,
  'sum_map': 12960},
 'getitem-concatenate-b64b3c85cce74ff1d81b30e669e909dd': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'broadcast_to-concatenate-b64b3c85cce74ff1d81b30e669e909dd': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 1},
 'array-ace5aaf197c34f0f95a5763ab398f518': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'rechunk-merge-c07df27804c2e13e02e944248d292db0': {'nthreads': 22,
  'len': 6480,
  'len_dep': 2,
  'sum_map': 18900},
 'rechunk-split-c07df27804c2e13e02e944248d292db0': {'nthreads': 22,
  'len': 11880,
  'len_dep': 1,
  'sum_map': 7020},
 'concatenate-642f0fe92d439ee90261af31158f5895': {'nthreads': 22,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 7020},
 'getitem-concatenate-642f0fe92d439ee90261af31158f5895': {'nthreads': 22,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 6480},
 'rechunk-merge-fb0b6f5a4aba108df1bc7e10170bd1d5': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 12960},
 'concatenate-509259909755a23c3ccdf4f25dc7c889': {'nthreads': 22,
  'len': 12960,
  'len_dep': 2,
  'sum_map': 12960},
 'broadcast_to-concatenate-509259909755a23c3ccdf4f25dc7c889': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 1},
 'getitem-concatenate-509259909755a23c3ccdf4f25dc7c889': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'broadcast_to-fa995185568331a08fdcd587f7c3d6bd': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-merge-0e17780e50b355adcd0639a18baf2e4b': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-0e17780e50b355adcd0639a18baf2e4b': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-ed8f282a14efa5790b6732ee307e8546': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'array-b66d90a056a045428594c8b8a8314b99': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'add-broadcast_to-concatenate-ef587e065dd7f163c9d8da9d9ceac7e2': {'nthreads': 22,
  'len': 540,
  'len_dep': 2,
  'sum_map': 1080},
 'getitem-d3b4fcae341916dda5a41d70f71e562c': {'nthreads': 22,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540},
 'nancumsum-sum-aggregate-truediv-getitem-d3b4fcae341916dda5a41d70f71e562c': {'nthreads': 22,
  'len': 540,
  'len_dep': 1,
  'sum_map': 1620},
 'sum-partial-f41e4944f3107f928d0ccff6ae640065': {'nthreads': 22,
  'len': 1620,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-570cb34bb0cc8c6029f5e2884e7f5155': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'getitem-mul-sum-570cb34bb0cc8c6029f5e2884e7f5155': {'nthreads': 22,
  'len': 6480,
  'len_dep': 2,
  'sum_map': 6492},
 'getitem-81430ec04a2dbff95b234368c5688bd5': {'nthreads': 22,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'rechunk-merge-c09f381e48a86b37adc1a3ef4092bc58': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-c09f381e48a86b37adc1a3ef4092bc58': {'nthreads': 22,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-0e6673e35ec726f7a3c9ad9b7f233d83': {'nthreads': 22,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-da6642d820d08f86bbaf4f9a86a018da': {'nthreads': 22,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540}}

chunked inputs

{'concatenate-733df5b2704164247ff6d63f51c495d4': {'nthreads': 38,
  'len': 1080,
  'len_dep': 2,
  'sum_map': 1080},
 'astype-broadcast_to-getitem-concatenate-733df5b2704164247ff6d63f51c495d4': {'nthreads': 38,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540},
 'transpose-b768baf3cb5735cb69952595c5a9ea60': {'nthreads': 38,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540},
 'nancumsum-setitem-transpose-b768baf3cb5735cb69952595c5a9ea60': {'nthreads': 38,
  'len': 540,
  'len_dep': 2,
  'sum_map': 541},
 'sum-aggregate-2e380536a9edf8437ec06ac3fabf8050': {'nthreads': 38,
  'len': 540,
  'len_dep': 1,
  'sum_map': 1080},
 'sum-partial-2226556116ebab4dcd6f1a21e3ec2768': {'nthreads': 38,
  'len': 1080,
  'len_dep': 2,
  'sum_map': 4320},
 'sum-partial-8c93eacd14e04e868388f1d751c4b1b1': {'nthreads': 38,
  'len': 3240,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-d6761d747b4a07b4596994acffb78ace': {'nthreads': 38,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 12972},
 'getitem-9fe0d01220cdfb631a971e6c38279cfb': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'nancumsum-neg-getitem-9fe0d01220cdfb631a971e6c38279cfb': {'nthreads': 38,
  'len': 6480,
  'len_dep': 4,
  'sum_map': 25920},
 'rechunk-merge-f8b47c7dc5bb413d347828d20457dff2': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 12960},
 'concatenate-155f47a90e15756fbb735fd200d7e820': {'nthreads': 38,
  'len': 12960,
  'len_dep': 2,
  'sum_map': 12960},
 'getitem-concatenate-155f47a90e15756fbb735fd200d7e820': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-aggregate-682f5ffc221dd5aa9c70e1d9a28649fa': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-sum-aggregate-682f5ffc221dd5aa9c70e1d9a28649fa': {'nthreads': 38,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 19440},
 'sub-059fd4cda2da80609cb64874c950acc4': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'compute_eos-compute_eos_0-sub-059fd4cda2da80609cb64874c950acc4': {'nthreads': 38,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 12961},
 'getitem-2fa58ffd3db3842832c121f0670a98c9': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'open_dataset-getitem-2fa58ffd3db3842832c121f0670a98c9': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6},
 'original': {'nthreads': 38, 'len': 6, 'len_dep': 0, 'sum_map': 0},
 'array-f3e5592e80649dafe228ccf744173f68': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-76fb022692bef9a7df06a0785026fda4': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'open_dataset-getitem-76fb022692bef9a7df06a0785026fda4': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6},
 'truediv-fd79c61798c02a52d21bcb31515f93b9': {'nthreads': 38,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 6504},
 'getitem-afb52e51fd4eeffb787d1455fce2927e': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'where-getitem-afb52e51fd4eeffb787d1455fce2927e': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6},
 'getitem-b9f972f7d26cd390af6b0d80504fa4af': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'open_dataset-getitem-b9f972f7d26cd390af6b0d80504fa4af': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 6},
 'rechunk-merge-8830b6989eefa8c88f47df888b306486': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-8830b6989eefa8c88f47df888b306486': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-1f757566036ece865da6871676abaf9b': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'rechunk-merge-7df5493ebbfc6795a377924cf5ca4756': {'nthreads': 38,
  'len': 6480,
  'len_dep': 2,
  'sum_map': 18900},
 'concatenate-daee6a6a478cd533af401f56519c35ca': {'nthreads': 38,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 7020},
 'getitem-concatenate-daee6a6a478cd533af401f56519c35ca': {'nthreads': 38,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 6480},
 'rechunk-split-7df5493ebbfc6795a377924cf5ca4756': {'nthreads': 38,
  'len': 11880,
  'len_dep': 1,
  'sum_map': 7020},
 'broadcast_to-concatenate-155f47a90e15756fbb735fd200d7e820': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 1},
 'array-ace5aaf197c34f0f95a5763ab398f518': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'rechunk-merge-127f91cb60aafedd13d5336e05e12091': {'nthreads': 38,
  'len': 6480,
  'len_dep': 2,
  'sum_map': 18900},
 'rechunk-split-127f91cb60aafedd13d5336e05e12091': {'nthreads': 38,
  'len': 11880,
  'len_dep': 1,
  'sum_map': 7020},
 'concatenate-1f713f3fb69990eb35b3ea3b1322c040': {'nthreads': 38,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 7020},
 'getitem-concatenate-1f713f3fb69990eb35b3ea3b1322c040': {'nthreads': 38,
  'len': 7020,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-aggregate-eff4f15c7ea9fdb5cc761d6779f58419': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-sum-aggregate-eff4f15c7ea9fdb5cc761d6779f58419': {'nthreads': 38,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 19440},
 'truediv-df98e2a579e59210d6894a134a57008f': {'nthreads': 38,
  'len': 6480,
  'len_dep': 3,
  'sum_map': 6504},
 'getitem-e55a84c9d024b023fde7432ddddb19d1': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'where-getitem-e55a84c9d024b023fde7432ddddb19d1': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6},
 'getitem-340403d31219797c50bd554f8cce10e7': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'open_dataset-getitem-340403d31219797c50bd554f8cce10e7': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 6},
 'rechunk-merge-2dede9e59077d227dcb29cd60b6cbd02': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 12960},
 'concatenate-b159ac3ee8a1e48d7f40260ecad71e67': {'nthreads': 38,
  'len': 12960,
  'len_dep': 2,
  'sum_map': 12960},
 'broadcast_to-concatenate-b159ac3ee8a1e48d7f40260ecad71e67': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 1},
 'getitem-concatenate-b159ac3ee8a1e48d7f40260ecad71e67': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'rechunk-merge-a776a5520df945d007b880d33065e912': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-a776a5520df945d007b880d33065e912': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-87529c8ca92197f637d36731667ec0c7': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'broadcast_to-fa995185568331a08fdcd587f7c3d6bd': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-merge-0e17780e50b355adcd0639a18baf2e4b': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-0e17780e50b355adcd0639a18baf2e4b': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-ed8f282a14efa5790b6732ee307e8546': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'sum-partial-6c1221d15698d45745d47391f7c06f4c': {'nthreads': 38,
  'len': 1080,
  'len_dep': 1,
  'sum_map': 3240},
 'array-b66d90a056a045428594c8b8a8314b99': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'add-broadcast_to-concatenate-733df5b2704164247ff6d63f51c495d4': {'nthreads': 38,
  'len': 540,
  'len_dep': 2,
  'sum_map': 1080},
 'getitem-05fa60a16e5f6ea114e9a39975770593': {'nthreads': 38,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540},
 'nancumsum-sum-aggregate-truediv-getitem-05fa60a16e5f6ea114e9a39975770593': {'nthreads': 38,
  'len': 540,
  'len_dep': 1,
  'sum_map': 1620},
 'sum-partial-3a8991a6975d564ff5f7abb90f6681a6': {'nthreads': 38,
  'len': 1620,
  'len_dep': 1,
  'sum_map': 6480},
 'sum-b30eca24887c566767effdae080d51ef': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'getitem-mul-sum-b30eca24887c566767effdae080d51ef': {'nthreads': 38,
  'len': 6480,
  'len_dep': 2,
  'sum_map': 6492},
 'rechunk-merge-c09f381e48a86b37adc1a3ef4092bc58': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 12},
 'rechunk-split-rechunk-merge-c09f381e48a86b37adc1a3ef4092bc58': {'nthreads': 38,
  'len': 12,
  'len_dep': 1,
  'sum_map': 1},
 'array-0e6673e35ec726f7a3c9ad9b7f233d83': {'nthreads': 38,
  'len': 1,
  'len_dep': 0,
  'sum_map': 0},
 'getitem-92ccf89e869227e086bc16e5ce4456ee': {'nthreads': 38,
  'len': 6480,
  'len_dep': 1,
  'sum_map': 6480},
 'getitem-1eca1f800555c63612153ead329f7edf': {'nthreads': 38,
  'len': 540,
  'len_dep': 1,
  'sum_map': 540}}
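
To make the two dumps easier to compare, here is a small sketch that applies an assumed version of the rootish criteria to a few entries copied from the output above. The helper name `failed_criteria`, the labels it returns, and the `< 5` cutoffs are assumptions for illustration, not the scheduler's actual code; only the numbers in `reported` come from the dumps.

```python
def failed_criteria(info):
    """Return labels for the assumed rootish criteria that `info` does not meet.

    `info` has the same keys as the dicts produced by get_tg_info.
    An empty list means the group would count as rootish under these assumptions.
    """
    failed = []
    if not info["len"] > 2 * info["nthreads"]:
        failed.append("group too small")
    if not info["len_dep"] < 5:  # assumed cutoff
        failed.append("too many dependency groups")
    if not info["sum_map"] < 5:  # assumed cutoff
        failed.append("too many dependency tasks")
    return failed


# Values copied from the "no chunking" and "chunked inputs" dumps above.
reported = {
    "unchunked open_dataset-getitem": {"nthreads": 22, "len": 6480, "len_dep": 1, "sum_map": 4},
    "chunked open_dataset-getitem": {"nthreads": 38, "len": 6480, "len_dep": 1, "sum_map": 6},
    "unchunked original": {"nthreads": 22, "len": 4, "len_dep": 0, "sum_map": 0},
    "chunked original": {"nthreads": 38, "len": 6, "len_dep": 0, "sum_map": 0},
}

results = {name: failed_criteria(info) for name, info in reported.items()}
for name, failed in results.items():
    print(name, "->", failed if failed else "rootish")
```

Under these assumptions, the only thing that changes for the open_dataset-getitem groups between the two runs is the dependency task count (4 vs 6), which matches the size of the `original` group in each run. The `original` groups themselves have no dependencies but fail the group-size criterion in both runs, which is the situation the issue title describes.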

Can somebody help me understand what the difference between chunked and unchunked is?

@fjetter This gets really into the weeds, but chunked means xarray.open_dataset(..., chunks={...}), which creates a dask array where each chunk wraps an xarray lazy array that gets replaced by a numpy array at compute time. Unchunked means xarray.open_dataset(...), which returns xarray lazy arrays that get replaced by numpy arrays at compute time.

Here's an example:

import xarray as xr

xr.tutorial.open_dataset("air_temperature", chunks={"time": 1}).air.variable._data  #  dask array
xr.tutorial.open_dataset("air_temperature").air.variable._data  # weird xarray lazy array thing
