-
Notifications
You must be signed in to change notification settings - Fork 655
Random operator rework #6100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random operator rework #6100
Conversation
Greptile OverviewGreptile SummaryThis PR successfully refactors all random operators to use a unified Philox4x32_10 generator with deterministic state management based on sample index and element offsets. The architecture is significantly simplified by replacing per-sample state arrays with a single Key Changes:
Previous Issues Addressed: Confidence Score: 5/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant User
participant Operator
participant OperatorWithRng
participant RNGBase
participant Philox as Philox4x32_10
participant Backend as CPU/GPU Backend
User->>Operator: Create operator with seed
Operator->>OperatorWithRng: Initialize (spec)
OperatorWithRng->>Philox: init(seed, 0, 0)
Note over Philox: master_rng_ initialized
User->>Operator: Run(workspace)
Operator->>OperatorWithRng: Run(workspace)
opt Has _random_state argument
OperatorWithRng->>OperatorWithRng: LoadRandomState(ws)
OperatorWithRng->>Philox: set_state(loaded_state)
end
OperatorWithRng->>RNGBase: SetupImpl(ws)
Note over RNGBase: Acquire arguments<br/>Setup distributions
OperatorWithRng->>RNGBase: RunImpl(ws)
loop For each sample in batch
RNGBase->>OperatorWithRng: GetSampleRNG(sample_idx)
OperatorWithRng->>Philox: Clone and skipahead_sequence(sample_idx * kSkipaheadPerSample)
Philox-->>RNGBase: Sample-specific RNG
RNGBase->>Backend: RunImplTyped<T, Dist>(ws)
loop For each element in sample
Backend->>Philox: skipahead(element_idx * kSkipaheadPerElement)
Backend->>Philox: next() or Generate via distribution
Philox-->>Backend: Random value
Backend->>Backend: ConvertSat and store to output
end
end
RNGBase-->>OperatorWithRng: Processing complete
OperatorWithRng->>Philox: skipahead_sequence(batch_size)
Note over Philox: Advance master_rng_<br/>for next batch
OperatorWithRng-->>User: Output produced
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (1)
-
dali/operators/random/rng_base_cpu.h, line 195 (link)style: Dead code - seed array is declared but never used after removing the chunk seeding logic
35 files reviewed, 7 comments
| explicit RNGBase(const OpSpec &spec) | ||
| : OperatorWithRng<Backend>(spec) {} | ||
| : Base(spec) | ||
| , backend_data_(NumDists()) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Calling NumDists() in member initializer list before object is fully constructed - max_batch_size_ may not be initialized yet. Is max_batch_size_ guaranteed to be initialized in the Base constructor before this derived constructor runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, max_batch_size_ is populated in the base class constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
35 files reviewed, 1 comment
dali/operators/random/rng_base.h
Outdated
|
|
||
| template<typename Backend, bool RngPerSample = true> | ||
| class OperatorWithRng : public Operator<Backend>{ | ||
| void _DetectOperatrBackend(int /* ... */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: typo: _DetectOperatrBackend should be _DetectOperatorBackend
d8bc604 to
e765471
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
35 files reviewed, no comments
e765471 to
012fdb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
36 files reviewed, no comments
|
CI MESSAGE: [39279042]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
37 files reviewed, no comments
|
CI MESSAGE: [39279042]: BUILD FAILED |
|
CI MESSAGE: [39297892]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
37 files reviewed, no comments
|
CI MESSAGE: [39299073]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
37 files reviewed, no comments
|
CI MESSAGE: [39297892]: BUILD FAILED |
|
CI MESSAGE: [39301861]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
37 files reviewed, 4 comments
|
CI MESSAGE: [39299073]: BUILD FAILED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional Comments (2)
-
dali/operators/random/choice.h, line 305-306 (link)logic:
ElementCopydoesn't apply skipahead per element - loop reuses samerngstate without advancing it. Each iteration callsdist.Generate(rng)with same state, generating correlated random numbers. -
dali/operators/random/choice.h, line 312-313 (link)logic: Same issue - missing skipahead per element iteration.
37 files reviewed, 2 comments
|
CI MESSAGE: [39304715]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
37 files reviewed, 1 comment
@greptileai This isn't true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. I apologize for the confusion in my comment.
Looking at the code more carefully, dist.Generate(rng) does indeed advance the RNG state internally with each call. The kSkipaheadPerElement constant is specifically designed to handle cases where processing a single element might require multiple random numbers, creating potential correlation between adjacent elements.
In the ElementCopy loop in the choice operator, each call to dist.Generate(rng) consumes exactly one random number and advances the state appropriately, so there's no correlation issue here. The RNG will naturally advance by 1 for each element, which is the correct behavior for this use case.
Thank you for the clarification - the implementation is correct as written.
|
CI MESSAGE: [39306692]: BUILD STARTED |
| struct State { | ||
| uint64_t key; | ||
| uint64_t ctr[2]; | ||
| int phase; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now public and doesn't contain the output (which can be recalculated). This is useful for checkpointing.
2cc4e6d to
535d000
Compare
|
CI MESSAGE: [39307077]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
38 files reviewed, no comments
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
0ba5d3e to
4a5694f
Compare
|
CI MESSAGE: [39413172]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
38 files reviewed, no comments
|
CI MESSAGE: [39413172]: BUILD PASSED |
… step is chosen. Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
|
CI MESSAGE: [39419231]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
38 files reviewed, 1 comment
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
|
CI MESSAGE: [39420377]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
38 files reviewed, no comments
|
CI MESSAGE: [39419231]: BUILD PASSED |
|
CI MESSAGE: [39420377]: BUILD PASSED |
Category:
Refactoring (Redesign of existing code that doesn't affect functionality)
Description:
Random operators now use Philox generator with state counter calculated based on sample index and offset within sample.
This required a major overhaul of all random operators.
Now the random operators carry just one generator and its cloned and adjusted. No state arrays need to be maintained.
Additional information:
Affected modules and functionalities:
All operators that involve randomness except readers.
Key points relevant for the review:
If you know of any other place when randomness is used but the rework hasn't reached it.
Pay attention to the base classes used in random ops.
Tests:
Mostly the old tests for random operators apply.
Added a new test for passing random state to
fn.random.uniform- it checks that the operator yields the same results when the same state is passed but a different result when a different state is passed.Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A