Ray Pool Executor #1415

Merged
lbliii merged 4 commits into NVIDIA-NeMo:main from lbliii:llane/ray-actor-pool-executor
Jan 26, 2026

@lbliii lbliii commented Jan 22, 2026

No description provided.

Signed-off-by: Lawrence Lane <llane@nvidia.com>
@lbliii lbliii self-assigned this Jan 22, 2026

greptile-apps bot commented Jan 22, 2026

Greptile Overview

Greptile Summary

This PR updates the executor documentation to remove maturity labels and reorganize content, promoting RayActorPoolExecutor to a more prominent position. The changes improve the overall presentation by:

  • Removing "Production Ready" and "Experimental" labels from the architecture diagram in release notes
  • Reorganizing the execution backends guide to present RayActorPoolExecutor before RayDataExecutor
  • Streamlining descriptions to focus on use cases rather than maturity status
  • Removing the comparison table that highlighted experimental status

However, the documentation has an inconsistency: while the changes suggest RayActorPoolExecutor is being promoted from experimental status, the module still lives in nemo_curator.backends.experimental.ray_actor_pool throughout the codebase. Additionally, unlike the other executors, RayActorPoolExecutor lacks an import code example, which could confuse users about the correct import path.

Confidence Score: 3/5

  • Documentation-only changes with potential usability issues but no runtime impact
  • The changes are technically safe (documentation only, no code changes). However, the missing import example for RayActorPoolExecutor and the mismatch between the documentation (which suggests a stable, promoted status) and the code (still in the experimental namespace) could confuse users and should be addressed before merge
  • docs/reference/infrastructure/execution-backends.md requires attention to add the missing import example and clarify the experimental status of RayActorPoolExecutor

Important Files Changed

| Filename | Overview |
|---|---|
| docs/about/release-notes/index.md | Removed maturity status labels from backend executors diagram - cleaner visual presentation |
| docs/reference/infrastructure/execution-backends.md | Reorganized executor documentation with improved descriptions; missing import example for RayActorPoolExecutor raises concerns about usability |

Sequence Diagram

sequenceDiagram
    participant User
    participant Docs as Documentation
    participant ReleaseNotes as Release Notes
    participant ExecutionBackends as Execution Backends Guide
    
    User->>Docs: Read about executors
    Docs->>ExecutionBackends: Navigate to execution backends
    
    Note over ExecutionBackends: XennaExecutor section<br/>Clear import example provided
    ExecutionBackends-->>User: Shows XennaExecutor usage
    
    Note over ExecutionBackends: RayActorPoolExecutor section<br/>No import example shown
    ExecutionBackends-->>User: Shows conceptual usage only
    
    User->>ExecutionBackends: Try to use RayActorPoolExecutor
    Note over User: Confusion: Where to import from?<br/>nemo_curator.backends.ray_actor_pool?<br/>nemo_curator.backends.experimental.ray_actor_pool?
    
    User->>ReleaseNotes: Check architecture diagram
    Note over ReleaseNotes: Diagram shows all executors<br/>without maturity labels
    ReleaseNotes-->>User: No clarity on experimental status
    
    Note over User: Import fails at runtime<br/>Needs to check source code

@greptile-apps greptile-apps bot left a comment

2 files reviewed, 2 comments

@lbliii lbliii requested a review from sarahyurick January 23, 2026 22:27

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments

Signed-off-by: Lawrence Lane <llane@nvidia.com>

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

Comment on lines +111 to +120
+### `RayActorPoolExecutor`

-`RayDataExecutor` uses Ray Data, a scalable data processing library built on Ray Core. Ray Data provides a familiar DataFrame-like API for distributed data transformations. This executor is experimental and best suited for large-scale batch processing tasks that benefit from Ray Data's optimized data loading and transformation pipelines.
+`RayActorPoolExecutor` uses Ray's ActorPool for efficient distributed processing with fine-grained resource management. This executor creates pools of Ray actors per stage, enabling better load balancing and fault tolerance through Ray's native mechanisms. Deduplication workflows automatically use this executor for GPU-accelerated stages.

 **Key Features**:
-- **Ray Data API**: Leverages Ray Data's optimized data processing primitives
-- **Scalable transformations**: Efficient map-batch operations across distributed workers
-- **Experimental status**: API and performance characteristics may change
+- **ActorPool-based execution**: Creates dedicated actor pools per stage for optimal resource utilization
+- **Load balancing**: Uses `map_unordered` for efficient work distribution across actors
+- **RAFT support**: Native integration with [RAFT](https://github.com/rapidsai/raft) (RAPIDS Analytics Framework Toolbox) for GPU-accelerated clustering and nearest-neighbor operations
+- **Head node exclusion**: Optional `ignore_head_node` parameter to reserve the Ray cluster's [head node](https://docs.ray.io/en/latest/cluster/key-concepts.html#head-node) for coordination tasks only
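
The load-balancing bullet mentions `map_unordered`. The idea behind unordered mapping is that results are consumed in completion order, so one slow item does not hold up results that are already finished. The sketch below illustrates that idea with Python's standard `concurrent.futures` as a stand-in; it does not use Ray or NeMo Curator, and `process` and `run_unordered` are hypothetical names chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process(task_id, seconds):
    """Pretend to do work for `seconds`, then return the task id."""
    time.sleep(seconds)
    return task_id


def run_unordered(tasks, max_workers=2):
    """Run all tasks and return their ids in completion order, not submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process, task_id, seconds) for task_id, seconds in tasks]
        # as_completed yields each future as soon as it finishes,
        # which is the stdlib analogue of an unordered map.
        return [future.result() for future in as_completed(futures)]


# The slow task is submitted first but finishes last.
tasks = [("slow", 0.5), ("fast", 0.01)]
order = run_unordered(tasks)
print(order)
```

With an ordered map, the caller would wait on "slow" before seeing "fast"; with unordered collection, downstream work can start on whichever result arrives first.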

missing import example for RayActorPoolExecutor

Unlike XennaExecutor and RayDataExecutor, there's no code example showing how to import RayActorPoolExecutor. Based on the codebase, the correct import is:

from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor

Consider adding an import example here for consistency and to help users.
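
Given the import-path uncertainty raised in this review, a defensive import can be sketched as follows. The first module path is the one quoted in the comment; the second is the non-experimental path that the sequence diagram floats as a possibility and is purely hypothetical. `resolve_attr` is an illustrative helper, not part of NeMo Curator.

```python
import importlib

# Candidate module paths, tried in order.
CANDIDATE_MODULES = (
    "nemo_curator.backends.experimental.ray_actor_pool",  # path stated in the review comment
    "nemo_curator.backends.ray_actor_pool",  # hypothetical path if the module is promoted
)


def resolve_attr(name, modules):
    """Return (attribute, module_path) from the first module that provides `name`.

    Returns (None, None) if no candidate module can be imported or has the attribute.
    """
    for module_path in modules:
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # module missing or its dependencies are unavailable
        if hasattr(module, name):
            return getattr(module, name), module_path
    return None, None


executor_cls, found_in = resolve_attr("RayActorPoolExecutor", CANDIDATE_MODULES)
if executor_cls is None:
    print("RayActorPoolExecutor not found; is nemo_curator installed?")
else:
    print(f"Imported RayActorPoolExecutor from {found_in}")
```

In documentation, a single explicit import line is still preferable; a fallback like this is only useful in code that must run across versions where the module location may differ.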

@lbliii lbliii merged commit 3974061 into NVIDIA-NeMo:main Jan 26, 2026
17 checks passed
@sarahyurick sarahyurick mentioned this pull request Feb 11, 2026
44 tasks