Illustrate streaming over decomposition approach #1
Merged
rjzamora merged 7 commits on May 4, 2026
…` are columns (rapidsai#22132)

This commit adds the initial support for calling `replace()` on strings columns, with the `targets` and `repls` arguments specified as strings columns.

The erstwhile implementation of `replace()` supports the `targets` and `repls` being STRING scalars. This commit extends the implementation to also support entire STRING columns for the second and third arguments.

This initial implementation only supports a "one-string-per-thread" approach. We might consider adding a "character-parallel" version of this function at a later date.

Authors:
- MithunR (https://github.com/mythrocks)
- Yunsong Wang (https://github.com/PointKernel)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Jihoon Son (https://github.com/jihoonson)

URL: rapidsai#22132
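
As a rough mental model of the column-argument behaviour described above, the following plain-Python sketch processes one row per loop iteration, mirroring the "one-string-per-thread" idea. It is not libcudf code. The function name `replace_rowwise` is hypothetical, and the row-wise pairing of `strings[i]` with `targets[i]` and `repls[i]` is an assumption: the commit message implies it but does not spell it out. Null handling is left out, and treating an empty target as a no-op is likewise an assumption.

```python
from typing import List, Sequence


def replace_rowwise(
    strings: Sequence[str],
    targets: Sequence[str],
    repls: Sequence[str],
) -> List[str]:
    """Hypothetical model: replace targets[i] with repls[i] inside strings[i]."""
    if not (len(strings) == len(targets) == len(repls)):
        raise ValueError("all three columns must have the same number of rows")

    out: List[str] = []
    # Each iteration stands in for one thread handling one string.
    for s, t, r in zip(strings, targets, repls):
        if t == "":
            # Assumption: an empty target leaves the row unchanged.
            out.append(s)
        else:
            out.append(s.replace(t, r))
    return out


result = replace_rowwise(["a_b_c", "hello"], ["_", "l"], ["-", "L"])
```

Because every row is independent, the same loop body maps naturally onto one GPU thread per string; a character-parallel variant would instead split the work within each string.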
This PR introduces `RayEngine._reset()`, which allows reusing existing actor processes while updating per-rank state.

Constructing a `RayEngine` is expensive. Each instance spawns one Ray actor per rank (Python startup, imports, CUDA context creation) and bootstraps a UCXX communicator. In contrast:

* `SPMDEngine` reuses a session-scoped communicator, so construction is cheap.
* `DaskEngine` connects to long-lived workers, so construction is cheap.
* `RayEngine` always creates fresh actors; there is no reuse.

This is not an issue in production, where a single engine is typically reused across many queries. It is costly in tests, where we parametrize over many `StreamingOptions` variants and rebuild the engine repeatedly, paying the full startup cost each time.

With this change, test suite runtime with `RayEngine` **drops from hours to minutes!**

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Peter Andreas Entschev (https://github.com/pentschev)

URL: rapidsai#22348
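
The reuse pattern behind this change can be sketched without Ray at all. The classes below are hypothetical stand-ins (`FakeActor`, `ReusableEngine`), and the `_reset(options)` signature is an assumption, since the commit message does not show the real one. The point is only that the expensive per-rank workers are created once, while the cheap per-rank state is swapped on each reset.

```python
from typing import Any, Dict, List

# Counts how many stand-in actors were ever constructed (illustration only).
SPAWN_COUNT = 0


class FakeActor:
    """Hypothetical stand-in for an expensive per-rank worker process."""

    def __init__(self, rank: int) -> None:
        global SPAWN_COUNT
        SPAWN_COUNT += 1  # models the costly startup (imports, CUDA context, ...)
        self.rank = rank
        self.options: Dict[str, Any] = {}


class ReusableEngine:
    """Hypothetical engine that keeps its actors alive across resets."""

    def __init__(self, num_ranks: int, options: Dict[str, Any]) -> None:
        # Expensive step: done exactly once per engine.
        self.actors: List[FakeActor] = [FakeActor(r) for r in range(num_ranks)]
        self._reset(options)

    def _reset(self, options: Dict[str, Any]) -> None:
        # Cheap step: update per-rank state, keep the same actor objects.
        for actor in self.actors:
            actor.options = dict(options)


engine = ReusableEngine(num_ranks=2, options={"variant": "a"})
for variant in ("b", "c", "d"):
    engine._reset({"variant": variant})  # no new actors are spawned here
```

In a parametrized test suite, a session-scoped fixture could hold one such engine and call the reset once per parameter set instead of constructing a new engine each time.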
Fixes several potential use-after-free issues where host/device memory is copied and the source may be freed before the async copy has finished.

Follow-on to rapidsai#22321.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#22332
Merged commit f8db130 into Matt711:fea/polars/streaming-over
5 of 6 checks passed
TODO