
feat(api): implement production unit management (Order/Batch/Task/Episode) #23

Merged
shark0F0497 merged 3 commits into main from feat/production-management on Mar 31, 2026

Conversation

@shark0F0497
Collaborator

Pull Request Checklist

Please ensure your PR meets the following requirements:

  • Code follows the style guidelines
  • Tests pass locally
  • Code is formatted
  • Documentation updated if needed
  • Commit messages follow conventional commits
  • PR description is complete and clear

Summary

This PR implements a complete production unit management system for Keystone Edge, adding Order, Batch, Task, and Episode CRUD operations with idempotency guarantees, transactional consistency, and referential integrity controls.


Motivation

The production workflow requires managing the full lifecycle of data collection tasks:

  • Orders represent production requests with target quantities and priorities
  • Batches group tasks by order and workstation
  • Tasks are atomic execution units bound to SOPs, scenes, and workstations
  • Episodes record capture artifacts (MCAP + sidecar files)

This change establishes the Edge-side closed loop for task generation, recording, upload, and episode persistence, enabling downstream QA, sync, and search capabilities.


Changes

Modified Files

Added Files

  • docs/designs/production-units.md - Design document defining domain relationships, data model semantics, HTTP API contracts, state machines, and key flows for Order→Batch→Task→Episode lineage
  • internal/api/handlers/batch.go - BatchHandler with ListBatches, GetBatch, DeleteBatch (soft delete), PatchBatch (status transitions with state machine validation)
  • internal/api/handlers/order.go - OrderHandler with ListOrders, GetOrder, CreateOrder, UpdateOrder, DeleteOrder (with referential integrity checks for batches/tasks/episodes)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update (documentation changes only)
  • Refactoring (code improvement without functional changes)
  • Performance improvement (code changes that improve performance)
  • Test changes (adding, modifying, or removing tests)

Impact Analysis

Breaking Changes

None

Backward Compatibility

  • GET /tasks/:id now uses numeric task PK (tasks.id) instead of public task_id string (tasks.task_id). Existing clients using numeric IDs will continue to work.
  • GET /episodes now returns task_id as int64 instead of string. Clients expecting string type will need to update.
  • Order name uniqueness constraint changed from (organization_id, scene_id, name) to (organization_id, name). Existing orders with duplicate names across scenes may be affected.
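
For clients that have to read `task_id` from both older and newer servers during a rollout, one option is a small decoding type that accepts either JSON shape. The sketch below is a client-side illustration, not part of this PR; `TaskRef`, `EpisodeSummary`, and `DecodeEpisode` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskRef is a hypothetical client-side type that records whichever form
// of task_id the server sent.
type TaskRef struct {
	ID       int64  // set when task_id arrived as a JSON number (new behavior)
	LegacyID string // set when task_id arrived as a JSON string (old behavior)
}

// UnmarshalJSON tries the numeric form first and falls back to a string.
func (r *TaskRef) UnmarshalJSON(data []byte) error {
	var n int64
	if err := json.Unmarshal(data, &n); err == nil {
		*r = TaskRef{ID: n}
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return fmt.Errorf("task_id must be a JSON number or string: %w", err)
	}
	*r = TaskRef{LegacyID: s}
	return nil
}

// EpisodeSummary models only the field discussed here; real responses carry more.
type EpisodeSummary struct {
	TaskID TaskRef `json:"task_id"`
}

// DecodeEpisode parses a single episode object.
func DecodeEpisode(payload []byte) (EpisodeSummary, error) {
	var ep EpisodeSummary
	err := json.Unmarshal(payload, &ep)
	return ep, err
}
```

The string form is kept verbatim instead of being parsed as a number, because the old value may have been the public task identifier, not the numeric primary key.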

Testing

Test Environment

Test Cases

  • Unit tests pass locally
  • Integration tests pass locally
  • E2E tests pass (if applicable)
  • Manual testing completed

Manual Testing Steps

  1. Create an order with POST /orders
  2. Create tasks with POST /tasks (verifies batch creation/reuse, target_count enforcement)
  3. Query tasks with GET /tasks/:id and GET /tasks/:id/config
  4. Trigger upload callback and verify episode creation is idempotent
  5. Verify task status transitions to completed after Transfer Verified ACK
  6. Attempt to delete scene/order with existing references (should fail)

Test Coverage

  • New tests added
  • Existing tests updated
  • Coverage maintained or improved

Screenshots / Recordings


Performance Impact

  • Memory usage: No change
  • CPU usage: No change
  • Throughput: No change
  • Lock contention: Slightly higher on contended rows (row-level locking with FOR UPDATE serializes concurrent batch/task creation, which prevents race conditions)

Documentation


Related Issues

  • Fixes #
  • Related to #
  • Refers to #

Additional Notes

Key design decisions:

  1. Batch reuse logic: POST /tasks reuses existing active/pending batches for the same (order_id, workstation_id) pair to avoid fragmenting batches
  2. Idempotency: Episode creation checks for existing episodes by task_id before inserting, preventing duplicates on device retry
  3. Episode-count maintenance: batches.episode_count is incremented atomically within the same transaction as episode creation
  4. Task completion source of truth: Task status is set to completed by Transfer Verified ACK, not by device callbacks

Reviewers

@kilo-code-bot


Notes for Reviewers

  • Please review the concurrency safety of batch/task creation with FOR UPDATE locking
  • Verify the idempotency logic in transfer.go handles all retry scenarios
  • Check the referential integrity checks performed on deletion in scene.go and order.go

Checklist for Reviewers

  • Code changes are correct and well-implemented
  • Tests are adequate and pass
  • Documentation is updated and accurate
  • No unintended side effects
  • Performance impact is acceptable
  • Backward compatibility maintained (if applicable)

c.JSON(http.StatusBadRequest, gin.H{"error_msg": "invalid task id"})
return
}
if idStr == "" {

SUGGESTION: Dead code — idStr == "" is unreachable here

strconv.ParseInt("", 10, 64) always returns an error, so the if err != nil check on line 358 already catches empty strings and returns early. This block can never be reached and should be removed.

Suggested change
if idStr == "" {
}
var req UpdateTaskRequest

FROM episodes
WHERE task_id = ? AND deleted_at IS NULL
LIMIT 1
`, taskRow.ID).Scan(&existingEpisodeID); err != nil || existingEpisodeID == "" {

WARNING: Idempotency check treats all DB errors as "episode does not exist"

The condition err != nil || existingEpisodeID == "" proceeds to insert a new episode for any error — including real database failures (connection drops, timeouts), not just sql.ErrNoRows. This could lead to duplicate episode creation under transient DB errors. Consider checking specifically for sql.ErrNoRows:

Suggested change
- `, taskRow.ID).Scan(&existingEpisodeID); err != nil || existingEpisodeID == "" {
+ `, taskRow.ID).Scan(&existingEpisodeID); err != nil && err != sql.ErrNoRows {
+     logger.Printf("[TRANSFER] Device %s: DB query failed for task=%s: %v", dc.DeviceID, taskID, err)
+     return
+ }
+ if err == sql.ErrNoRows || existingEpisodeID == "" {

status = 'completed',
completed_at = CASE WHEN completed_at IS NULL THEN ? ELSE completed_at END,
updated_at = ?
WHERE id = ? AND deleted_at IS NULL

WARNING: Task status unconditionally overwritten to completed

The WHERE id = ? AND deleted_at IS NULL clause does not guard on the current status. If a task was manually set to cancelled or failed, a retried upload callback would silently overwrite it back to completed. Consider adding a status guard:

Suggested change
- WHERE id = ? AND deleted_at IS NULL
+ WHERE id = ? AND status NOT IN ('cancelled', 'failed') AND deleted_at IS NULL
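
The same guard can be expressed in application code, which may be useful if the status rules later move into a shared state machine. The sketch below is an assumption-laden illustration: the status names come from the review comment, while `NextStatusOnTransferAck` and the choice to treat `cancelled` and `failed` as terminal are hypothetical.

```go
package main

// terminalTaskStatuses lists statuses that an upload retry should not override.
// The names come from the review comment; treating them as terminal is an assumption.
var terminalTaskStatuses = map[string]bool{
	"cancelled": true,
	"failed":    true,
}

// NextStatusOnTransferAck decides what a Transfer Verified ACK does to a task.
// It reports the resulting status and whether a write is needed.
func NextStatusOnTransferAck(current string) (next string, changed bool) {
	if terminalTaskStatuses[current] {
		return current, false // keep manual cancellations and failures
	}
	if current == "completed" {
		return current, false // repeated ACKs are a no-op
	}
	return "completed", true
}
```

With the SQL form of the guard, checking `RowsAffected` on the UPDATE result gives the caller the same "changed" signal.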

Collaborator Author

State machine management will be implemented uniformly later.

@kilo-code-bot

kilo-code-bot bot commented Mar 31, 2026

Code Review Summary (Incremental — commit b0c3590)

Status: 1 Issue Remaining | Recommendation: Address before merge

Overview

Severity | Count
CRITICAL | 0
WARNING | 1
SUGGESTION | 0

Incremental Changes

Two previous issues were fixed in this commit:

File | Previous Issue | Status
internal/api/handlers/task.go | Dead code: idStr == "" unreachable after ParseInt | ✅ Fixed (moved before ParseInt)
internal/api/handlers/transfer.go | Idempotency check treats all DB errors as "episode does not exist" | ✅ Fixed (now uses errors.Is(err, sql.ErrNoRows))

Remaining Issues

WARNING

File | Line | Issue
internal/api/handlers/transfer.go | 449 | Task status unconditionally overwritten to completed: cancelled/failed tasks can be overridden by upload retries
Files Reviewed (2 files changed in this increment)
  • internal/api/handlers/task.go - 0 new issues (1 previous issue fixed)
  • internal/api/handlers/transfer.go - 0 new issues (1 previous issue fixed, 1 carried forward)


@shark0F0497 shark0F0497 merged commit aaa0337 into main Mar 31, 2026
6 checks passed
@shark0F0497 shark0F0497 deleted the feat/production-management branch March 31, 2026 07:57

1 participant