Skip to content

AsyncBuffer#88

Merged
fjetter merged 4 commits into
dask:mainfrom
crusaderky:async_buffer2
Apr 6, 2023
Merged

AsyncBuffer#88
fjetter merged 4 commits into
dask:mainfrom
crusaderky:async_buffer2

Conversation

@crusaderky
Copy link
Copy Markdown
Collaborator

@crusaderky crusaderky commented Mar 21, 2023

@crusaderky crusaderky self-assigned this Mar 21, 2023
@crusaderky crusaderky marked this pull request as draft March 21, 2023 01:09
@crusaderky crusaderky force-pushed the async_buffer2 branch 27 times, most recently from bd957ed to 2bcd214 Compare March 27, 2023 23:02
@crusaderky crusaderky changed the title [WIP] AsyncBuffer AsyncBuffer Apr 4, 2023
@crusaderky crusaderky marked this pull request as ready for review April 4, 2023 12:42
Copy link
Copy Markdown
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly nits. I think the one thing I care about is the add_done_callback question. Most other things are rather an API design question but that concerns functionality that might cause surprising behavior

Comment thread zict/async_buffer.py Outdated
Comment on lines +144 to +157
if self.evicting is not None:
weight = min(weight, self.evicting)
if weight <= n:
return

self.evicting = n

def done(_: asyncio.Future) -> None:
self.evicting = None

# Note: this can get cancelled by LRU.close(), which in turn is
# triggered by Buffer.close()
future = self._offload(self.evict_until_below_target, n)
future.add_done_callback(done)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is possibly called multiple times (even if it is locked) but the done callbacks of concurrently running evict_untiL_below_target would overwrite / first one would already reset evicting. Is this intentional?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, fixed

Comment thread zict/async_buffer.py
Comment on lines +48 to +51
if self.executor is None:
self.executor = ThreadPoolExecutor(
1, thread_name_prefix="zict.Buffer offloader"
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: From an API POV I would suggest to allow the init to take an TPE and if none is provided, we'll initialize it during initialization of the class

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread zict/async_buffer.py
Comment on lines +64 to +66
def async_get(
self, keys: Collection[KT], missing: Literal["raise", "omit"] = "raise"
) -> asyncio.Future[dict[KT, VT]]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly curious, I don't think we should change this now but why is the API accepting multiple keys? is this somehow a requirement or an optimization?

The reason why I'm asking is because I think this drives some complexity and I would expect missing not to be required otherwise. I'm not entirely sure about this.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an optimization to avoid having an extremely long queue to the thread pool, with each individual key incurring in thread synchronization overhead.

Comment thread zict/async_buffer.py
Comment on lines +101 to +103
# Do not pull keys towards the top of the LRU unless they are all available.
# This matters when there is a very long queue of async_get futures.
d = self.fast.get_all_or_nothing(keys)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, fast doesn't need to be a LRU, does it? In this case, I suspect we'll get an AttributeError here, won't we?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer.fast is always an LRU:

zict/zict/buffer.py

Lines 79 to 85 in b96afc4

self.fast = LRU(
n,
fast,
weight=weight,
on_evict=[self.fast_to_slow],
on_cancel_evict=[self._cancel_evict],
)

@crusaderky
Copy link
Copy Markdown
Collaborator Author

@fjetter all review comments have been addressed

@fjetter fjetter merged commit 78b4b7a into dask:main Apr 6, 2023
crusaderky added a commit that referenced this pull request Apr 6, 2023
@crusaderky
Copy link
Copy Markdown
Collaborator Author

Ouch. That was a merge without squash. I force-pushed to fix it; could we force squashing in the repo like we already did in distributed?

@crusaderky crusaderky deleted the async_buffer2 branch April 6, 2023 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants