13 changes: 13 additions & 0 deletions .flow/epics/fn-24.json
@@ -0,0 +1,13 @@
{
"branch_name": "fn-24",
"created_at": "2026-01-22T18:01:43.795974Z",
"depends_on_epics": [],
"id": "fn-24",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-24.md",
"status": "open",
"title": "Month 0-2: Triage + Policy Engine",
"updated_at": "2026-01-22T18:01:43.796354Z"
}
44 changes: 44 additions & 0 deletions .flow/specs/fn-24.md
@@ -0,0 +1,44 @@
# fn-24: Month 0-2: Triage + Policy Engine

## Goal
Ship policy-driven triage and investigation automation for data platform teams, with per-team rules, dataset overrides, queueing, and measurable activation/usage metrics.

## Scope
- Team policy engine with dataset overrides and per-team rate limits.
- Integrations -> issues pipeline that applies policy actions (auto, review, issue-only).
- Investigation queue/batch executor with Redis rate limiting per team.
- Redis-backed SSE event storage + API rate limiting.
- Policy editor UI in Settings > Teams.
- Activation + weekly usage analytics.

## Non-Goals
- Automated fixing.
- SCIM provisioning.
- SAML (planned for Month 4-6).

## Approach
- Add policy tables and repository methods for team rules and dataset overrides.
- Implement a policy evaluator that returns an action + queue config for each issue.
- Route integration events through policy evaluation and trigger issue creation + optional investigations.
- Add Redis-backed queue + rate limiter for investigations and SSE event replay storage.
- Build a team policy editor in Settings > Teams with dataset overrides.
- Instrument issue/investigation lifecycle events and weekly usage metrics.

## Quick commands
- `just test`

## Acceptance
- [ ] Team policies with dataset overrides are persisted and can be read/written via API.
- [ ] Integration ingestion applies policy actions (auto, review, issue-only).
- [ ] Auto investigations are queued and rate-limited per team via Redis.
- [ ] SSE events and API rate limiting are no longer in-memory.
- [ ] Team policy editor UI is usable in Settings > Teams.
- [ ] Activation + weekly usage metrics are available via API or queries.

## References
- `python-packages/dataing/src/dataing/entrypoints/api/routes/integrations.py`
- `python-packages/dataing/src/dataing/entrypoints/api/routes/runs.py`
- `python-packages/dataing/src/dataing/entrypoints/api/middleware/rate_limit.py`
- `frontend/app/src/features/issues/IssueList.tsx`
- `frontend/app/src/features/issues/IssueWorkspace.tsx`
- `frontend/app/src/features/settings/teams/teams-settings.tsx`
23 changes: 23 additions & 0 deletions .flow/tasks/fn-24.1.json
@@ -0,0 +1,23 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-22T18:07:36.595638Z",
"created_at": "2026-01-22T18:02:02.585217Z",
"depends_on": [],
"epic": "fn-24",
"evidence": {
"commits": [
"b63b00483c7d664738b6efd4e4b0d5837d83930a"
],
"prs": [],
"tests": [
"uv run pytest python-packages/dataing/tests/unit/adapters/db/test_team_policy_repository.py"
]
},
"id": "fn-24.1",
"priority": null,
"spec_path": ".flow/tasks/fn-24.1.md",
"status": "done",
"title": "Data model + migrations for team policies and overrides",
"updated_at": "2026-01-22T19:08:26.558940Z"
}
29 changes: 29 additions & 0 deletions .flow/tasks/fn-24.1.md
@@ -0,0 +1,29 @@
# fn-24.1 Data model + migrations for team policies and overrides

## Description
Add database tables and repository helpers for team policy rules, dataset overrides, and per-team queue limits. Support overrides by dataset_id and tag-based selectors.

## Acceptance
- [ ] Migrations add tables for team policies, overrides, and queue limits.
- [ ] Repository methods exist for CRUD on policies and overrides.
- [ ] Dataset/tag selectors are represented in the schema (dataset_id or tag_id).
- [ ] Basic unit tests cover create/read/update for policies.

## Done summary
- Added migration 028_team_policies.sql with 4 tables: team_policies, team_policy_overrides, team_queue_limits, dataset_tags
- Created TeamPolicyRepository with full CRUD operations for policies, overrides, queue limits, and dataset tags
- Added PolicyAction enum and dataclasses for type-safe domain entities
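
A minimal sketch of what the domain entities might look like, for orientation only. The three action values come from the epic spec; the class `SketchOverride`, its field names, and the rule that exactly one selector must be set are assumptions for illustration, not the contents of the actual migration or repository.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PolicyAction(str, Enum):
    """The three triage actions named in the spec."""

    AUTO = "auto"
    REVIEW = "review"
    ISSUE_ONLY = "issue-only"


@dataclass(frozen=True)
class SketchOverride:
    """Hypothetical override row: selects by dataset_id or by tag_id."""

    team_id: str
    action: PolicyAction
    dataset_id: Optional[str] = None
    tag_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: an override targets exactly one selector.
        if (self.dataset_id is None) == (self.tag_id is None):
            raise ValueError("set exactly one of dataset_id or tag_id")


override = SketchOverride(team_id="t1", action=PolicyAction.REVIEW, dataset_id="d1")
```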

Why:
- Foundation for policy-driven triage and investigation automation
- Enables per-team configuration with dataset/tag-specific overrides

Verification:
- 28 unit tests passing
- ruff check passing
- mypy type check passing
- just test-ce passing (1257 tests)
## Evidence
- Commits: b63b00483c7d664738b6efd4e4b0d5837d83930a
- Tests: uv run pytest python-packages/dataing/tests/unit/adapters/db/test_team_policy_repository.py
- PRs:
25 changes: 25 additions & 0 deletions .flow/tasks/fn-24.2.json
@@ -0,0 +1,25 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-22T19:09:11.846118Z",
"created_at": "2026-01-22T18:02:12.872541Z",
"depends_on": [
"fn-24.1"
],
"epic": "fn-24",
"evidence": {
"commits": [
"5a240ac3e4f38322c857f7789c2f77edb75a629f"
],
"prs": [],
"tests": [
"uv run pytest python-packages/dataing/tests/unit/services/test_policy.py"
]
},
"id": "fn-24.2",
"priority": null,
"spec_path": ".flow/tasks/fn-24.2.md",
"status": "done",
"title": "Policy engine: evaluate team + dataset rules",
"updated_at": "2026-01-22T19:12:48.936850Z"
}
30 changes: 30 additions & 0 deletions .flow/tasks/fn-24.2.md
@@ -0,0 +1,30 @@
# fn-24.2 Policy engine: evaluate team + dataset rules

## Description
Implement a policy evaluation service that resolves the effective action for an issue using team rules and dataset/tag overrides. Output should include action (auto, review, issue-only) and queue/rate-limit settings.

## Acceptance
- [ ] Policy evaluator resolves precedence: dataset overrides > team default.
- [ ] Action outputs include auto/review/issue-only and queue settings.
- [ ] Evaluation is exercised via unit tests with team + dataset scenarios.
- [ ] API layer can fetch evaluated policy results for an issue.

## Done summary
- Added PolicyService with precedence-based policy evaluation
- Implemented resolution order: dataset overrides > tag overrides > team default > system defaults
- Added severity-based action resolution (auto/review thresholds)
- Added QueueConfig, IssueContext, PolicyResult dataclasses
- Added evaluate_policy_for_issue convenience function for API layer
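
The resolution order above can be sketched as a simple fall-through lookup. This is an illustration under assumptions: `SketchIssueContext`, `resolve_action`, the dict-based inputs, and the `"issue-only"` system default are hypothetical, and the severity thresholds are left out. It is not the real `PolicyService` API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchIssueContext:
    """Hypothetical stand-in for the issue fields the evaluator needs."""

    dataset_id: Optional[str] = None
    tag_ids: list[str] = field(default_factory=list)


def resolve_action(
    ctx: SketchIssueContext,
    dataset_overrides: dict[str, str],
    tag_overrides: dict[str, str],
    team_default: Optional[str],
    system_default: str = "issue-only",  # assumption: the real default may differ
) -> tuple[str, str]:
    """Return (action, source) using dataset > tag > team > system precedence."""
    if ctx.dataset_id is not None and ctx.dataset_id in dataset_overrides:
        return dataset_overrides[ctx.dataset_id], "dataset_override"
    for tag_id in ctx.tag_ids:
        if tag_id in tag_overrides:
            return tag_overrides[tag_id], "tag_override"
    if team_default is not None:
        return team_default, "team_default"
    return system_default, "system_default"


ctx = SketchIssueContext(dataset_id="orders", tag_ids=["pii"])
result = resolve_action(ctx, {"orders": "auto"}, {"pii": "review"}, "issue-only")
```

Returning the source alongside the action makes it easier to explain in the UI or logs why a given issue was triaged the way it was.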

Why:
- Central policy engine needed by integrations to determine triage actions
- Enables automatic investigation triggering based on configured rules

Verification:
- 19 unit tests passing (test_policy.py)
- ruff check passing
- mypy type check passing
## Evidence
- Commits: 5a240ac3e4f38322c857f7789c2f77edb75a629f
- Tests: uv run pytest python-packages/dataing/tests/unit/services/test_policy.py
- PRs:
23 changes: 23 additions & 0 deletions .flow/tasks/fn-24.3.json
@@ -0,0 +1,23 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-22T19:14:17.576326Z",
"created_at": "2026-01-22T18:02:31.734575Z",
"depends_on": [
"fn-24.2"
],
"epic": "fn-24",
"evidence": {
"commits": [],
"prs": [],
"tests": [
"uv run pytest python-packages/dataing/tests/unit/api/test_integrations_routes.py python-packages/dataing/tests/unit/services/test_policy.py python-packages/dataing/tests/unit/adapters/db/test_team_policy_repository.py"
]
},
"id": "fn-24.3",
"priority": null,
"spec_path": ".flow/tasks/fn-24.3.md",
"status": "done",
"title": "Integrations to issues: policy-driven actions",
"updated_at": "2026-01-22T19:21:02.616765Z"
}
36 changes: 36 additions & 0 deletions .flow/tasks/fn-24.3.md
@@ -0,0 +1,36 @@
# fn-24.3 Integrations to issues: policy-driven actions

## Description
Route integration events through the policy engine to create issues and trigger the correct action: auto investigation, review required, or issue-only. Ensure notifications for review-required flows.

## Acceptance
- [ ] Integration ingestion calls policy evaluation and records the action taken.
- [ ] Auto actions enqueue investigations; review actions create approval notifications.
- [ ] Issue-only path creates issue without starting investigation.
- [ ] Idempotency behavior remains intact for integration events.

## Done summary
- Integrated policy evaluation into webhook issue creation flow
- Policy determines action: auto investigation, review required, or issue-only
- AUTO: Starts Temporal investigation workflow (if configured)
- REVIEW: Sends notification via NotificationService
- ISSUE_ONLY: No additional action beyond issue creation
- Added policy_action and investigation_id fields to WebhookIssueResponse
- Added get_default_team_for_tenant helper to TeamPolicyRepository
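
The three branches above could be dispatched roughly as follows. This is a sketch: `dispatch_policy_action` and its callback parameters are hypothetical stand-ins for the Temporal client and `NotificationService`, and the returned dict only mirrors the two fields added to `WebhookIssueResponse`.

```python
from typing import Callable, Optional


def dispatch_policy_action(
    action: str,
    issue_id: str,
    start_investigation: Optional[Callable[[str], str]] = None,
    notify_reviewers: Optional[Callable[[str], None]] = None,
) -> dict:
    """Apply the policy action for a freshly created issue."""
    investigation_id: Optional[str] = None
    if action == "auto":
        # Only start a workflow when an investigation backend is configured.
        if start_investigation is not None:
            investigation_id = start_investigation(issue_id)
    elif action == "review":
        if notify_reviewers is not None:
            notify_reviewers(issue_id)
    elif action != "issue-only":
        raise ValueError(f"unknown policy action: {action}")
    return {"policy_action": action, "investigation_id": investigation_id}


notified: list[str] = []
auto_result = dispatch_policy_action("auto", "issue-1", start_investigation=lambda i: f"inv-{i}")
review_result = dispatch_policy_action("review", "issue-2", notify_reviewers=notified.append)
```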

Why:
- Integration events need policy-driven triage to determine appropriate action
- Enables automatic investigation triggering based on configured rules
- Maintains idempotency behavior for integration events

Verification:
- 21 unit tests for integration routes (6 new policy-related tests)
- 19 unit tests for PolicyService
- 28 unit tests for TeamPolicyRepository
- All 68 tests passing
- ruff check passing
- mypy type check passing
## Evidence
- Commits:
- Tests: uv run pytest python-packages/dataing/tests/unit/api/test_integrations_routes.py python-packages/dataing/tests/unit/services/test_policy.py python-packages/dataing/tests/unit/adapters/db/test_team_policy_repository.py
- PRs:
23 changes: 23 additions & 0 deletions .flow/tasks/fn-24.4.json
@@ -0,0 +1,23 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-22T19:21:55.582374Z",
"created_at": "2026-01-22T18:02:44.208277Z",
"depends_on": [
"fn-24.2"
],
"epic": "fn-24",
"evidence": {
"commits": [],
"prs": [],
"tests": [
"uv run pytest python-packages/dataing/tests/unit/adapters/queue/ -v"
]
},
"id": "fn-24.4",
"priority": null,
"spec_path": ".flow/tasks/fn-24.4.md",
"status": "done",
"title": "Investigation queue + per-team rate limits (Redis)",
"updated_at": "2026-01-22T19:30:31.922472Z"
}
38 changes: 38 additions & 0 deletions .flow/tasks/fn-24.4.md
@@ -0,0 +1,38 @@
# fn-24.4 Investigation queue + per-team rate limits (Redis)

## Description
Add a Redis-backed investigation queue with per-team rate limits and batch processing. Policies should control queue thresholds and rate limits.

## Acceptance
- [ ] Redis queue exists for investigation jobs with per-team routing.
- [ ] Rate limiting is enforced per team with configurable limits.
- [ ] Worker can batch-dequeue and start Temporal workflows.
- [ ] Failures retry with backoff and do not block other teams.

## Done summary
- Added Redis-backed investigation queue with per-team routing
- Implemented sliding window rate limiter using Redis Lua script
- Created InvestigationWorker that processes jobs and starts Temporal workflows
- Jobs support priority, retry with exponential backoff, and status tracking
- Worker polls teams, respects rate limits, and processes in batches
- Failures don't block other teams (isolated per-team queues)
- Added redis>=5.0.0 dependency
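
To illustrate the sliding window idea, here is a pure-Python, in-memory sketch. It is not the Redis Lua script from this change: the class name `SketchSlidingWindowLimiter` and its parameters are assumptions, and the clock is passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SketchSlidingWindowLimiter:
    """Allow at most `limit` events per team within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, team_id: str, now: float) -> bool:
        hits = self._hits[team_id]
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SketchSlidingWindowLimiter(limit=2, window_seconds=10)
decisions = [
    limiter.allow("team-a", 0.0),
    limiter.allow("team-a", 1.0),
    limiter.allow("team-a", 2.0),   # third hit inside the window is rejected
    limiter.allow("team-b", 2.0),   # other teams are unaffected
    limiter.allow("team-a", 10.5),  # the hit at t=0 has expired
]
```

Keeping one window per team is what isolates teams from each other: a team that exhausts its budget does not consume anyone else's.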

Why:
- Per-team rate limiting prevents any single team from overwhelming the system
- Batch processing improves throughput for investigation workflows
- Retry with backoff handles transient failures gracefully
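
For reference, a capped exponential backoff can be computed as below. The function name, base delay, and cap are assumptions chosen for illustration; the values used by the actual worker are not stated here.

```python
def backoff_delay(attempt: int, base_seconds: float = 1.0, cap_seconds: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based), doubling each time up to a cap."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap_seconds, base_seconds * (2 ** attempt))


delays = [backoff_delay(n) for n in range(8)]
```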

Components:
- InvestigationQueue: Per-team job queue with priority sorting
- RedisRateLimiter: Sliding window rate limiter per team
- InvestigationWorker: Background worker that processes queues

Verification:
- 25 unit tests for queue and rate limiter
- mypy type check passing
- ruff check passing
## Evidence
- Commits:
- Tests: uv run pytest python-packages/dataing/tests/unit/adapters/queue/ -v
- PRs:
21 changes: 21 additions & 0 deletions .flow/tasks/fn-24.5.json
@@ -0,0 +1,21 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-22T19:32:07.904710Z",
"created_at": "2026-01-22T18:02:54.454759Z",
"depends_on": [],
"epic": "fn-24",
"evidence": {
"commits": [],
"prs": [],
"tests": [
"uv run pytest python-packages/dataing/tests/unit/adapters/sse/ python-packages/dataing/tests/unit/middleware/test_redis_rate_limit.py -v"
]
},
"id": "fn-24.5",
"priority": null,
"spec_path": ".flow/tasks/fn-24.5.md",
"status": "done",
"title": "Redis-backed SSE event store + rate limiting",
"updated_at": "2026-01-22T19:39:04.950959Z"
}
39 changes: 39 additions & 0 deletions .flow/tasks/fn-24.5.md
@@ -0,0 +1,39 @@
# fn-24.5 Redis-backed SSE event store + rate limiting

## Description
Replace in-memory SSE event storage and API rate limiting with Redis-backed implementations.

## Acceptance
- [ ] SSE run events are persisted in Redis and survive process restart.
- [ ] Replay window reads from Redis instead of in-memory dicts.
- [ ] API rate limiting uses Redis with per-tenant identifiers.
- [ ] Existing SSE API behavior remains backward compatible.

## Done summary
- Added Redis-backed SSE event store for run events persistence
- SSE events now survive process restart with configurable TTL
- Replay window reads from Redis instead of in-memory dicts
- Added Redis-backed API rate limiting middleware with sliding window algorithm
- Rate limiting uses per-tenant identifiers (tenant > API key > IP fallback)
- Both components fail open on Redis errors for reliability
- Included in-memory fallback store for local development
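
The identifier fallback (tenant > API key > IP) might look like the sketch below. The function name, the key prefixes, and the choice to hash the API key before using it as a rate-limit key are assumptions, not details confirmed by this change.

```python
import hashlib
from typing import Optional


def rate_limit_identifier(
    tenant_id: Optional[str],
    api_key: Optional[str],
    client_ip: str,
) -> str:
    """Pick the most specific identifier available for rate limiting."""
    if tenant_id:
        return f"tenant:{tenant_id}"
    if api_key:
        # Assumption: hash the key so the raw secret never appears in Redis keys.
        digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:16]
        return f"key:{digest}"
    return f"ip:{client_ip}"


by_tenant = rate_limit_identifier("acme", "secret", "10.0.0.1")
by_key = rate_limit_identifier(None, "secret", "10.0.0.1")
by_ip = rate_limit_identifier(None, None, "10.0.0.1")
```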

Why:
- SSE events were lost on process restart, causing client reconnection issues
- In-memory rate limiting didn't work in multi-instance deployments
- Redis provides distributed state for horizontal scaling

Components:
- RedisSSEEventStore: Store/retrieve events with automatic sequencing and TTL
- RunMetadata: Track run status and replay window expiration
- InMemoryFallbackSSEEventStore: Local development fallback
- RedisRateLimitMiddleware: Distributed rate limiting with Lua script
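
As an illustration of automatic sequencing, TTL, and replay, here is a small in-memory sketch. `SketchEventStore` and its method names are hypothetical; it is neither `RedisSSEEventStore` nor `InMemoryFallbackSSEEventStore`, and it takes the current time as an argument to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class StoredEvent:
    seq: int
    payload: dict
    stored_at: float


class SketchEventStore:
    """Per-run event log with monotonically increasing sequence numbers and a TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._events: dict[str, list[StoredEvent]] = {}
        self._next_seq: dict[str, int] = {}

    def append(self, run_id: str, payload: dict, now: float) -> int:
        seq = self._next_seq.get(run_id, 0) + 1
        self._next_seq[run_id] = seq
        self._events.setdefault(run_id, []).append(StoredEvent(seq, payload, now))
        return seq

    def replay(self, run_id: str, after_seq: int, now: float) -> list[StoredEvent]:
        """Return unexpired events with seq greater than `after_seq`."""
        live = [e for e in self._events.get(run_id, []) if now - e.stored_at < self.ttl_seconds]
        if run_id in self._events:
            self._events[run_id] = live  # prune expired events
        return [e for e in live if e.seq > after_seq]


store = SketchEventStore(ttl_seconds=60)
first = store.append("run-1", {"type": "started"}, now=0.0)
second = store.append("run-1", {"type": "progress"}, now=10.0)
missed = store.replay("run-1", after_seq=1, now=20.0)
after_expiry = store.replay("run-1", after_seq=0, now=65.0)
```

A reconnecting client passes the last sequence number it saw as `after_seq` and receives only what it missed, provided those events are still inside the replay window.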

Verification:
- 41 unit tests for SSE event store and rate limit middleware
- mypy type check passing
- ruff check passing
## Evidence
- Commits:
- Tests: uv run pytest python-packages/dataing/tests/unit/adapters/sse/ python-packages/dataing/tests/unit/middleware/test_redis_rate_limit.py -v
- PRs:
24 changes: 24 additions & 0 deletions .flow/tasks/fn-24.6.json
@@ -0,0 +1,24 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-22T19:40:38.864418Z",
"created_at": "2026-01-22T18:03:03.205338Z",
"depends_on": [
"fn-24.2"
],
"epic": "fn-24",
"evidence": {
"files_changed": [
"frontend/app/src/features/settings/teams/team-policy-editor.tsx",
"frontend/app/src/features/settings/teams/teams-settings.tsx"
],
"lint_pass": true,
"tests_pass": true
},
"id": "fn-24.6",
"priority": null,
"spec_path": ".flow/tasks/fn-24.6.md",
"status": "done",
"title": "Policy editor UI in Settings > Teams",
"updated_at": "2026-01-22T19:46:27.896902Z"
}
17 changes: 17 additions & 0 deletions .flow/tasks/fn-24.6.md
@@ -0,0 +1,17 @@
# fn-24.6 Policy editor UI in Settings > Teams

## Description
Build a policy editor under Settings > Teams for managing per-team alert sources, auto-investigate thresholds, review requirements, and dataset/tag overrides.

## Acceptance
- [ ] UI lives under Settings > Teams and loads/saves policy via API.
- [ ] Supports editing default team policy and dataset/tag overrides.
- [ ] Displays queue/rate limit settings per team.
- [ ] Error and empty states are handled.

## Done summary
Added team policy editor UI with default policy settings, dataset/tag overrides, and queue limit management. Integrated into Settings > Teams page with settings button for each team.
## Evidence
- Commits:
- Tests:
- PRs: