refactor: extract distributed attempt ownership#2183
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR extracts distributed-execution bookkeeping into a new attemptOwnership helper and rewires the coordinator to use it for status validation, lease/active-run synchronization, task-claim recording, and tracking cleanup. ChangesDistributed Attempt Ownership Encapsulation
Sequence Diagram(s)sequenceDiagram
participant Handler
participant attemptOwnership
participant LeaseStore
participant ActiveRunStore
Handler->>attemptOwnership: statusDecision(incomingStatus, latestStatus)
attemptOwnership->>LeaseStore: Get lease / validate freshness
LeaseStore-->>attemptOwnership: lease (or not found)
attemptOwnership->>LeaseStore: Upsert or Delete lease
attemptOwnership->>ActiveRunStore: Upsert or Delete active-run
ActiveRunStore-->>attemptOwnership: ack
attemptOwnership-->>Handler: syncFromStatus result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/service/coordinator/distributed_attempt_ownership_test.go (1)
134-141: ⚡ Quick winConsider verifying the active record's
UpdatedAttimestamp.The test verifies the active record's DAGRun, Root, AttemptID, WorkerID, and Status but omits the
UpdatedAtfield.TestDistributedAttemptOwnershipTaskClaimTrackingexplicitly checksUpdatedAtat line 198, suggesting this timestamp is important to verify for consistency.📋 Suggested addition
record, err := activeStore.Get(ctx, "attempt-key-1") require.NoError(t, err) assert.Equal(t, run, record.DAGRun) assert.Equal(t, run, record.Root) assert.Equal(t, "attempt-1", record.AttemptID) assert.Equal(t, "worker-1", record.WorkerID) assert.Equal(t, core.Running, record.Status) +assert.Equal(t, now.UnixMilli(), record.UpdatedAt)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/service/coordinator/distributed_attempt_ownership_test.go` around lines 134 - 141, The test block reading the active record via activeStore.Get does not assert the UpdatedAt timestamp; add an assertion on record.UpdatedAt (e.g., assert.False(t, record.UpdatedAt.IsZero()) or assert.Equal(t, expectedUpdatedAt, record.UpdatedAt) depending on the existing test pattern) so the UpdatedAt field is validated alongside DAGRun, Root, AttemptID, WorkerID and Status; reference the activeStore.Get call and the record variable in distributed_attempt_ownership_test.go and mirror how UpdatedAt is asserted in TestDistributedAttemptOwnershipTaskClaimTracking.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/service/coordinator/distributed_attempt_ownership.go`:
- Around line 120-124: The switch in distributed_attempt_ownership.go is
incorrectly treating core.Queued as a terminal status and removing ownership;
update the handling so core.Queued is treated as a live attempt like
core.Running and core.NotStarted by calling upsertLeaseFromStatus(ctx, workerID,
status, fallbackAttemptID) instead of dropping the lease/active-run record, and
make the same change in the other switch (around the block that mirrored lines
232-236). Ensure this aligns with the other helpers that consider Queued live
(restoreConfirmedFromStatus, indexedRunMatchesStatus, lease reconciliation) so
queued updates do not delete the durable lease before a worker reaches Running.
In `@internal/service/coordinator/handler.go`:
- Around line 927-929: In AckTaskClaim, avoid recording an empty worker ID: when
calling h.distributedAttempts().recordTaskClaim(ctx, claimed.Task, req.WorkerId)
ensure you use a non-empty worker ID by falling back to claimed.WorkerID if
req.WorkerId is empty (e.g., compute workerID := req.WorkerId; if workerID == ""
{ workerID = claimed.WorkerID } and pass workerID), or alternatively reject the
ACK when req.WorkerId is empty; update the call site that invokes
recordTaskClaim and keep references to claimed.Task, req.WorkerId and
claimed.WorkerID consistent so upsertActiveFromTask receives the correct owner.
---
Nitpick comments:
In `@internal/service/coordinator/distributed_attempt_ownership_test.go`:
- Around line 134-141: The test block reading the active record via
activeStore.Get does not assert the UpdatedAt timestamp; add an assertion on
record.UpdatedAt (e.g., assert.False(t, record.UpdatedAt.IsZero()) or
assert.Equal(t, expectedUpdatedAt, record.UpdatedAt) depending on the existing
test pattern) so the UpdatedAt field is validated alongside DAGRun, Root,
AttemptID, WorkerID and Status; reference the activeStore.Get call and the
record variable in distributed_attempt_ownership_test.go and mirror how
UpdatedAt is asserted in TestDistributedAttemptOwnershipTaskClaimTracking.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 22d0ae54-4b8f-4132-b55a-266d18b4bacf
📒 Files selected for processing (3)
internal/service/coordinator/distributed_attempt_ownership.gointernal/service/coordinator/distributed_attempt_ownership_test.gointernal/service/coordinator/handler.go
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
409d4ed to
4a435e2
Compare
Summary
Refactor distributed attempt ownership logic into a focused coordinator module while preserving distributed run behavior.
Changes
distributedAttemptOwnership.httptest.Servercleanup cannot close another test client connection.Related Issues
N/A
Checklist
Testing
Summary by CodeRabbit
Release Notes
Refactor
Tests