refactor: v0.9.2–0.9.3 restructuring #68
Merged
dwsmith1983 merged 17 commits into main on Mar 14, 2026
EvaluateRules now uses strings.ToUpper(mode) so "any", "Any", and "ANY" all route to the ANY branch. Previously lowercase variants fell through to the default ALL case.
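A minimal sketch of the case-insensitive mode switch described above. The helper name `evaluateMode` and its `[]bool` input are assumptions for illustration; only the `strings.ToUpper(mode)` normalization and the ANY/ALL branches come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// evaluateMode is a hypothetical stand-in for the mode switch inside
// EvaluateRules. results holds one pass/fail value per evaluated rule.
func evaluateMode(mode string, results []bool) bool {
	switch strings.ToUpper(mode) {
	case "ANY":
		// ANY: one passing rule is enough.
		for _, ok := range results {
			if ok {
				return true
			}
		}
		return false
	default:
		// ALL (default): every rule must pass.
		for _, ok := range results {
			if !ok {
				return false
			}
		}
		return true
	}
}

func main() {
	results := []bool{true, false}
	for _, mode := range []string{"any", "Any", "ANY", "all"} {
		fmt.Printf("%s -> %v\n", mode, evaluateMode(mode, results))
	}
}
```

Without the `strings.ToUpper` call, "any" would not match the "ANY" case and would be evaluated as ALL.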
- lambda_trigger_arns default [] with precondition (SEC-1)
- Slack plaintext token deprecation warning via check block (SEC-2)
- New variables for trigger IAM scoping: glue_job_arns, emr_cluster_arns, emr_serverless_app_arns, sfn_trigger_arns, all default [] (SEC-4)
- EventBridge bus policy restricts PutEvents to Lambda roles (SEC-5)
…s (BUG-5, CQ-5)
BUG-5: handleSLACancel now checks for trigger existence before publishing the SLA verdict. Pipelines that were never triggered no longer emit false SLA_MET events.
CQ-5: Replaced _ = publishEvent(...) with error-logged calls in the SLA reconcile path.
closeSensorTriggerWindow now reads timezone from cfg.Schedule.Timezone (the schedule's own timezone) instead of cfg.SLA.Timezone. Falls back to SLA timezone if schedule timezone is not set. Prevents incorrect deadline calculation when schedule and SLA use different timezones.
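The fallback order can be sketched as follows. The struct shapes and the helper name `windowTimezone` are assumptions; only the field paths `cfg.Schedule.Timezone` and `cfg.SLA.Timezone` and the schedule-first ordering come from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical, trimmed-down config types carrying only the two fields
// named in the commit message.
type scheduleConfig struct{ Timezone string }
type slaConfig struct{ Timezone string }

type pipelineConfig struct {
	Schedule scheduleConfig
	SLA      slaConfig
}

// windowTimezone prefers the schedule's own timezone and falls back to
// the SLA timezone only when the schedule does not set one.
func windowTimezone(cfg pipelineConfig) string {
	if cfg.Schedule.Timezone != "" {
		return cfg.Schedule.Timezone
	}
	return cfg.SLA.Timezone
}

func main() {
	cfg := pipelineConfig{
		Schedule: scheduleConfig{Timezone: "America/New_York"},
		SLA:      slaConfig{Timezone: "UTC"},
	}
	// time.LoadLocation treats an empty name as UTC.
	loc, err := time.LoadLocation(windowTimezone(cfg))
	if err != nil {
		fmt.Println("load location:", err)
		return
	}
	fmt.Println("deadline computed in:", loc)
}
```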
…RY-1) New ExtractFloatOk distinguishes absent keys from actual zero values. New DetectDrift consolidates 3 identical drift comparison sites into one shared function. Transitions like 5000→0 now correctly trigger drift detection instead of being silently skipped.
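To illustrate why a presence flag matters here, the sketch below shows one possible shape for the two helpers. The signatures, the `map[string]any` input, and the threshold parameter are assumptions and not taken from the PR; only the names ExtractFloatOk and DetectDrift and the 5000→0 behavior come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// ExtractFloatOk returns the numeric value stored under key and reports
// whether the key was present and numeric. A stored 0 yields (0, true);
// a missing key yields (0, false).
func ExtractFloatOk(data map[string]any, key string) (float64, bool) {
	v, ok := data[key]
	if !ok {
		return 0, false
	}
	switch n := v.(type) {
	case float64:
		return n, true
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	}
	return 0, false
}

// DetectDrift reports drift when both values are present and differ by
// more than threshold. Absent values are skipped; zero values are not.
func DetectDrift(baseline, current map[string]any, key string, threshold float64) bool {
	prev, okPrev := ExtractFloatOk(baseline, key)
	cur, okCur := ExtractFloatOk(current, key)
	if !okPrev || !okCur {
		return false
	}
	return math.Abs(cur-prev) > threshold
}

func main() {
	baseline := map[string]any{"row_count": 5000.0}
	current := map[string]any{"row_count": 0.0}
	fmt.Println(DetectDrift(baseline, current, "row_count", 0))
}
```

A helper that returns only a float cannot tell a missing key from a real zero, which is how a 5000→0 transition can end up treated as "no data" and skipped.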
…poch normalization (BUG-2,3,4,9,10)
BUG-4: HandleStreamEvent returns DynamoDBEventResponse with failed record EventIDs for Lambda partial batch retry.
BUG-10: Namespace postrun baseline by rule key to prevent field collision between rules with the same field name. Clean break: existing flat baselines self-heal on next pipeline completion.
BUG-2: RemapPerPeriodSensors collects additions in a staging map to avoid nondeterministic map mutation during range iteration.
BUG-3: Reorder handleRerunRequest to lock-before-write, preventing orphaned rerun records when lock reset fails.
BUG-9: Normalize updatedAt epoch timestamps < 1e12 to milliseconds for consistent rerun freshness comparison.
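The BUG-9 normalization can be sketched as below. The function name `normalizeEpochMillis` is hypothetical; the `< 1e12` cutoff is the one stated in the commit message, and treating smaller values as seconds is the assumption it implies.

```go
package main

import "fmt"

// normalizeEpochMillis treats any timestamp below 1e12 as seconds and
// converts it to milliseconds, so that second- and millisecond-precision
// values can be compared directly.
func normalizeEpochMillis(ts int64) int64 {
	if ts < 1e12 {
		return ts * 1000
	}
	return ts
}

func main() {
	seconds := int64(1700000000)
	millis := int64(1700000000500)
	// After normalization both values are in milliseconds.
	fmt.Println(normalizeEpochMillis(seconds) < normalizeEpochMillis(millis))
}
```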
resolveHTTPClient(timeoutSec) replaces identical 7-line blocks in ExecuteHTTP and ExecuteAirflow.
createSLASchedules() replaces duplicated warning/breach schedule creation loops in scheduleSLAAlerts (watchdog) and handleSLASchedule (sla-monitor). onConflictSkip parameter handles the differing error behavior between the two callers.
Eliminates shell interpretation entirely. No pipes, redirects, or variable expansion. strings.Fields splits the command into argv. Prevents command injection via crafted pipeline configs.
Pure refactor, no logic changes. Functions grouped by domain:
- watchdog.go: HandleWatchdog entry point only (34 lines)
- watchdog_stale.go: stale trigger detection + reconciliation
- watchdog_missed.go: missed schedule detection (cron + inclusion)
- watchdog_sla.go: SLA alerting + trigger deadlines
- watchdog_postrun.go: post-run sensor monitoring + relative SLA
Add 4 new EventDetailType constants for dry-run rerun/retry observability: DRY_RUN_WOULD_RERUN, DRY_RUN_RERUN_REJECTED, DRY_RUN_WOULD_RETRY, DRY_RUN_RETRY_EXHAUSTED.
Replace the 5-line early returns in handleRerunRequest and handleJobFailure with self-contained evaluation blocks that run all checks (calendar exclusion, rerun limits, circuit breaker, retry budget) and publish observation events instead of executing side effects.
New functions: handleDryRunRerunRequest, handleDryRunJobFailure.
Tests: 2 updated + 6 new covering all decision branches (would-rerun, calendar-rejected, limit-exceeded, circuit-breaker-reject, no-job-history, would-retry, retry-exhausted, calendar-excluded).
Add DRY_RUN_WOULD_RERUN, DRY_RUN_RERUN_REJECTED, DRY_RUN_WOULD_RETRY, and DRY_RUN_RETRY_EXHAUSTED to the EventBridge alert-events pattern.
publishEvent now checks FailedEntryCount on PutEventsOutput. AWS EventBridge can return FailedEntryCount > 0 with error == nil for partial failures — these were previously silently discarded.
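The partial-failure check might look like the sketch below. To stay self-contained it uses local stand-in types that mirror only the relevant fields of the AWS SDK's PutEventsOutput; the stand-ins and the helper name `checkPutEvents` are assumptions, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-ins for the SDK result types, limited to the fields used here.
type putEventsResultEntry struct {
	ErrorCode    *string
	ErrorMessage *string
}

type putEventsOutput struct {
	FailedEntryCount int32
	Entries          []putEventsResultEntry
}

// checkPutEvents turns a partial failure (err == nil but
// FailedEntryCount > 0) into an error instead of discarding it.
func checkPutEvents(out *putEventsOutput, err error) error {
	if err != nil {
		return fmt.Errorf("put events: %w", err)
	}
	if out == nil || out.FailedEntryCount == 0 {
		return nil
	}
	var reasons []string
	for _, e := range out.Entries {
		if e.ErrorCode == nil {
			continue // entry succeeded
		}
		msg := ""
		if e.ErrorMessage != nil {
			msg = *e.ErrorMessage
		}
		reasons = append(reasons, *e.ErrorCode+": "+msg)
	}
	return fmt.Errorf("put events: %d entries failed: %s",
		out.FailedEntryCount, strings.Join(reasons, "; "))
}

func main() {
	code, msg := "InternalFailure", "try again"
	out := &putEventsOutput{
		FailedEntryCount: 1,
		Entries:          []putEventsResultEntry{{ErrorCode: &code, ErrorMessage: &msg}},
	}
	fmt.Println(checkPutEvents(out, nil))
}
```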
Custom http.Transport with dial-time IP validation rejects connections to private, loopback, link-local, and multicast addresses. Catches all bypass vectors including DNS rebinding and HTTP redirects. Protects HTTP, Airflow, and Databricks triggers against targeting internal endpoints (AWS IMDS, ECS metadata, VPC services).
Summary