Skip to content

feat: expose iac.provider module type via SDK ModuleProvider + ServiceInvoker#1

Merged
intel352 merged 1 commit intomainfrom
feat/iac-provider-module-exposure
Apr 21, 2026
Merged

feat: expose iac.provider module type via SDK ModuleProvider + ServiceInvoker#1
intel352 merged 1 commit intomainfrom
feat/iac-provider-module-exposure

Conversation

@intel352
Copy link
Copy Markdown
Contributor

Summary

Closes the gap documented in workflow PR #428: wfctl ci run --phase deploy with a digitalocean provider now works end-to-end (once the companion workflow PR below is merged).

Changes:

  • doPlugin implements sdk.ModuleProvider: ModuleTypes() returns ["iac.provider"]; CreateModule("iac.provider", ...) initialises a DOProvider and wraps it in doModuleInstance
  • New doModuleInstance implements sdk.ModuleInstance (lifecycle no-ops) and sdk.ServiceInvoker with dispatch for:
    • IaCProvider.Initialize — no-op (already done in CreateModule)
    • IaCProvider.Name, .Version, .Capabilities
    • ResourceDriver.Update — decodes resource_type, ref_*, spec_* args; calls the matching driver's Update
    • ResourceDriver.HealthCheck — calls driver's HealthCheck, returns {"healthy": bool, "message": string}
    • ResourceDriver.Create/Read/Delete/Scale — return actionable error pointing at wfctl infra apply
  • Version bumped to 0.2.0 in Manifest(), DOProvider.Version(), and plugin.json
  • plugin.json capabilities.moduleTypes updated to ["iac.provider"]

Companion PR (must merge together): workflow feat/remote-iac-provider-adapter — adds remoteIaCProvider adapter that routes interfaces.IaCProvider calls through InvokeService on the RemoteModule.

Test plan

  • TestDoPlugin_ModuleTypes — returns ["iac.provider"]
  • TestDoPlugin_CreateModule_UnknownType — errors on unknown type
  • TestDoPlugin_CreateModule_MissingToken — Initialize error propagated
  • TestDoPlugin_CreateModule_ValidConfig — returns non-nil ModuleInstance implementing ServiceInvoker
  • TestDoModuleInstance_Lifecycle — Init/Start/Stop all no-ops
  • TestDoModuleInstance_InvokeMethod_Name / Version / Capabilities / Initialize_NoOp
  • TestDoModuleInstance_InvokeMethod_Update_DispatchesToDriver — calls stub driver; verifies updateCalled=true
  • TestDoModuleInstance_InvokeMethod_HealthCheck_DispatchesToDriver / _Unhealthy
  • TestDoModuleInstance_InvokeMethod_Unknown — returns error
  • TestDoModuleInstance_InvokeMethod_Update_MissingResourceType — returns error
  • Full ./... suite passes; no real DO API calls in tests

🤖 Generated with Claude Code

…eInvoker

- doPlugin now implements sdk.ModuleProvider:
  - ModuleTypes() returns ["iac.provider"]
  - CreateModule("iac.provider", ...) creates and initialises a DOProvider,
    wraps it in doModuleInstance
- doModuleInstance implements sdk.ModuleInstance (Init/Start/Stop) and
  sdk.ServiceInvoker with dispatch for:
  - IaCProvider.Initialize (no-op — already done in CreateModule)
  - IaCProvider.Name, .Version, .Capabilities
  - ResourceDriver.Update — decodes ref/spec args, calls driver.Update
  - ResourceDriver.HealthCheck — calls driver.HealthCheck, returns healthy/message
  - ResourceDriver.Create/Read/Delete/Scale — return actionable error (use wfctl infra apply)
- Version bumped to 0.2.0 in Manifest, DOProvider.Version, and plugin.json
- plugin.json capabilities.moduleTypes updated to ["iac.provider"]
- Tests: ModuleTypes, CreateModule unknown type/missing token/valid config,
  ModuleInstance lifecycle, InvokeMethod dispatch for all supported methods
  using stub drivers (no real DO API calls)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@intel352 intel352 merged commit 4fa455f into main Apr 21, 2026
2 of 3 checks passed
intel352 added a commit that referenced this pull request May 3, 2026
…inding #1)

strSliceFromConfig silently drops non-string and empty entries, which is
dangerous for volume attachments — a typo or wrong type leaves the Droplet
running without an expected disk and emits no diagnostic. Add
dropletVolumesFromConfig that returns an explicit error for any non-string
or empty entry, mirroring dropletSSHKeysFromConfig's error-message style.

Tests cover both shapes: []any{123, "data"} (type mismatch) and []any{""}
(empty string).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 3, 2026
…finding #1)

dropletOutput previously stored godo's raw VolumeIDs (UUIDs) in
Outputs["volumes"], while desired config carries volume *names*. After
every successful Create/Read, the next Diff would compare e.g.
["vol-uuid-1"] against ["pg-data"] and force-replace the Droplet —
catastrophic for stateful workloads (the PG data Droplet would be
destroyed and recreated on every deploy, losing all game state).

Resolve each VolumeID to its name via Storage.GetVolume in dropletOutput.
Cache lookups within a single Read so duplicate VolumeIDs only hit the
API once. On resolution failure (e.g. volume deleted out-of-band) fall
back to the raw ID and record the unresolved IDs in
Outputs["volumes_resolution"] so operators can debug.

Update Diff comment to reflect that comparison is now name-vs-name and
stable across plans.

Tests:
- Read with VolumeIDs returns names in Outputs
- Diff with matching names returns no change (regression guard)
- Diff with desired missing a name returns ForceNew
- Resolution failure falls back to ID + records unresolved_ids
- Duplicate IDs only call GetVolume once (cache hit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 3, 2026
…+ new infra.volume driver (#55)

* feat(drivers): add boolFromConfig and strSliceFromConfig helpers

Both helpers are structpb-aware: boolFromConfig falls through to the
default for non-bool values, and strSliceFromConfig accepts the typed
[]string shape Go-native callers emit AND the []any shape that survives
a YAML/JSON → structpb round-trip. Empty strings drop silently. Used by
the upcoming droplet/volume drivers; placed in util.go alongside the
existing strFromConfig/intFromConfig helpers so all drivers can reach
them without import cycles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(droplet): support user_data / vpc_uuid / ssh_keys / volumes / tags / bools

Self-hosted services (e.g. Apache-AGE Postgres on a Droplet, since DO
Managed Postgres has no AGE extension) need more than the size/image/
region the DropletDriver previously honoured. New optional config keys,
all additive and defaulted-empty (no behaviour change for existing
callers):

  - user_data        string         — cloud-init payload
  - vpc_uuid         string         — VPC the Droplet joins
  - ssh_keys         []string|[]int — fingerprints OR numeric IDs;
                                      element type detected at runtime
                                      (mixed lists OK; structpb floats
                                      must be whole numbers)
  - tags             []string
  - enable_backups   bool           — godo Backups field
  - monitoring       bool
  - ipv6             bool
  - volumes          []string of    — names resolved to IDs at create
                     volume names     time via Storage.ListVolumes(name=,
                                      region=). Region-bound: a name not
                                      present in the droplet's region
                                      returns "droplet volumes: volume
                                      %q not found".

Outputs gain `private_ip` (godo droplet.PrivateIPv4()) so downstream
services in the same VPC can be wired without a second Read.

DropletDriver now holds an optional Storage client (set by
NewDropletDriver from c.Storage; tests inject via the variadic
NewDropletDriverWithClient param). When `volumes` is non-empty but the
storage client is nil, Create returns an explicit error rather than
silently dropping the attachment.

mockDropletClient now captures the godo.DropletCreateRequest so tests
can assert on populated struct fields. Adds focused tests for each new
key, plus the SSH-keys fractional-float rejection path that prevents
silent ID truncation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(drivers): add infra.volume driver (DO Block Storage)

New ResourceDriver covering CRUD + Diff + HealthCheck for DigitalOcean
Block Storage volumes. Backed by godo.StorageService; in-place size
growth uses godo.StorageActionsService.Resize. ProviderIDFormat is UUID
(DO volume IDs are 36-character UUIDs).

Config keys:
  - name             from spec.Name
  - region           defaults to provider region
  - size_gb          required, > 0
  - filesystem_type  optional ("ext4" / "xfs"); empty = raw block device
  - description      optional
  - tags             optional []string

Outputs: id (UUID), name, region, size_gb, filesystem_type. Status is a
stable "available" string — godo.Volume exposes no Status field, so a
successful Read on a real volume is the strongest health signal we have.

Update semantics:
  - size growth → in-place StorageActions.Resize, then re-read for state
  - size shrink → explicit error (DO has no shrink API; replace required)
  - region or filesystem_type change → Diff sets ForceNew so plan
    classifies as replace, not update

Tests cover Create / Read / Update (resize, no-op, shrink-rejected,
empty-ProviderID guard) / Delete / Diff (nil current, grow=update-only,
shrink=replace, region=replace, filesystem_type=replace, no-changes)
/ HealthCheck (success, read-error, empty-ProviderID) / ProviderIDFormat
/ Scale-not-supported. The volume mock pool supports filter-by-name so
the same struct exercises both VolumeDriver tests and the droplet
volumes-by-name path.

providerid_format_test.go adds the volume entry to the manually
maintained registry — without it, a future drift in ProviderIDFormat
would not be caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provider): register infra.volume driver and capability

Wire VolumeDriver into Initialize() alongside the other infra drivers
and declare its IaCCapabilityDeclaration with the noScale operation set
(volumes have no scale concept; size_gb is dimensional). Add
"infra.volume" to the sizing map as a noopSizing entry so ResolveSizing
returns "n/a" for every abstract tier — the operator must set size_gb
explicitly. Provider Capabilities test gains the matching required
entry so removal of the driver registration would be caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: declare infra.volume in plugin.json and CHANGELOG

plugin.json: add infra.volume to the iacProvider.resourceTypes array
and update the descriptor to mention Block Storage volumes.

CHANGELOG: document the new infra.volume driver and the extended
Droplet config keys (user_data / vpc_uuid / ssh_keys / volumes / tags
/ enable_backups / monitoring / ipv6) plus the new private_ip Droplet
output. No README exists in this repo, so the CHANGELOG is the
canonical surface for these notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(droplet): strict validation for volumes config entries (Copilot finding #1)

strSliceFromConfig silently drops non-string and empty entries, which is
dangerous for volume attachments — a typo or wrong type leaves the Droplet
running without an expected disk and emits no diagnostic. Add
dropletVolumesFromConfig that returns an explicit error for any non-string
or empty entry, mirroring dropletSSHKeysFromConfig's error-message style.

Tests cover both shapes: []any{123, "data"} (type mismatch) and []any{""}
(empty string).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(volume): reject fractional size_gb instead of silent truncation (Copilot finding #2)

intFromConfig truncates float64 silently — size_gb: 100.9 (e.g. via JSON or
YAML float coercion) was creating a 100 GB volume with no diagnostic. Add
intStrictFromConfig that mirrors the float64 fractional-rejection logic
already in dropletSSHKeysFromConfig and use it for size_gb in Create,
Update, and Diff.

Whole-valued float64 (the structpb wire shape) still passes through.
intFromConfig is unchanged for callers where rounding is acceptable
(node_count, instance_count, probe seconds, etc.).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(droplet): accept []int and []int64 ssh_keys at top level (Copilot finding #3)

dropletSSHKeysFromConfig accepted []any and []string at the top level but
bailed on Go-native []int{101} or []int64{101} with "expected list, got
[]int" — even though numeric IDs are documented as supported and the
per-element handler already covers int / int64 / float64. Add type-switches
that mirror the per-element validation (non-positive rejection, error
message style).

Tests cover three shapes: []int{101,102}, []int64{555,666}, and the
non-positive rejection path []int{0,101}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sizing): cover infra.volume in resolveSizing test cases (Copilot finding #4)

infra.volume was added to doSizingMap as noopSizing but the manually-
maintained TestResolveSizing cases table was not updated. Without the
case, the registration is unverified and a future refactor that drops
the entry would not fail this test. Add SizeS + SizeXL coverage so the
"n/a" contract is locked in alongside every other noop-sized resource.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(volume): treat description/tags changes as ForceNew (Copilot finding #5)

Diff previously tracked only size_gb / region / filesystem_type. Changes to
description and tags were silently ignored because godo.StorageActions has
no update endpoint for either, and the godo Volume API itself sets both at
creation time only — so there is no in-place update path to make available.

Resolution: surface drift as ForceNew (matching the way region / filesystem
shrinks are handled) so the planner emits a planned replace rather than
dropping the change. Document the constraint in the diff comment.

Surface description and tags in volumeOutput so Diff has values to compare.
Reuse outputsAsStringSlice + equalStringSet (firewall.go) so reordered tags
do NOT trigger a needless replace.

Tests: tags-add forces replace; reordered tags do NOT; description change
forces replace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(droplet): Diff flags vpc/tags/volumes/backups drift as ForceNew (Copilot finding #6)

Extended droplet fields (user_data, vpc_uuid, ssh_keys, volumes, tags,
enable_backups, monitoring, ipv6) were wired into Create only; Diff
compared just "size", so changing any of these on an existing droplet
produced no plan action and the drift was silently dropped. godo
explicitly disallows Update on droplets (PUT only resizes), so Diff is
the only place to flag these — every detected change must be ForceNew.

Surface the comparable fields in dropletOutput (vpc_uuid, enable_backups
derived from BackupIDs, tags, volumes) so Diff has values to compare;
extend Diff with order-irrelevant set comparison for tags / volumes
(reuses outputsAsStringSlice + equalStringSet from firewall.go).

user_data, monitoring, ipv6, ssh_keys are NOT drift-checked: godo.Droplet
does not surface them on Read, so comparing against current Outputs would
produce a perpetually-dirty plan. Documented as a known limitation.

Tests: vpc_uuid change forces replace; tags add forces replace; tags
reorder does NOT force replace; enable_backups toggle forces replace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plugin): include 'Block Storage volumes' in Manifest.Description

Manifest.Description must match plugin.json's description (test enforced
in plugin_test.go). The merge from main pulled in a longer description
that didn't yet mention Block Storage volumes; this aligns both.

* fix(ci): wfctl-strict-contracts must copy plugin.contracts.json alongside plugin.json

The Validate strict plugin contracts step (added in PR #41) copies only
plugin.json to a tempfile when simulating GoReleaser's before-hook
version rewrite. But wfctl plugin validate --strict-contracts looks for
plugin.contracts.json CO-LOCATED with plugin.json (same directory). The
rewrite-validation therefore always fails with
"missing_module_contract_descriptor: module type \"iac.provider\" has no
strict contract descriptor" — even when plugin.contracts.json IS valid
in the repo root.

This was masked on main because main has a single configurable contract
file and any change rebuilds in-place; surfaced now by PR #55 which
introduces a new resource type and exercises the second validation path.

Fix: use a temp DIR and copy both plugin.json + plugin.contracts.json
into it so the rewritten plugin.json's neighbour is intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(droplet): resolve VolumeIDs to names in Outputs (Copilot round-2 finding #1)

dropletOutput previously stored godo's raw VolumeIDs (UUIDs) in
Outputs["volumes"], while desired config carries volume *names*. After
every successful Create/Read, the next Diff would compare e.g.
["vol-uuid-1"] against ["pg-data"] and force-replace the Droplet —
catastrophic for stateful workloads (the PG data Droplet would be
destroyed and recreated on every deploy, losing all game state).

Resolve each VolumeID to its name via Storage.GetVolume in dropletOutput.
Cache lookups within a single Read so duplicate VolumeIDs only hit the
API once. On resolution failure (e.g. volume deleted out-of-band) fall
back to the raw ID and record the unresolved IDs in
Outputs["volumes_resolution"] so operators can debug.

Update Diff comment to reflect that comparison is now name-vs-name and
stable across plans.

Tests:
- Read with VolumeIDs returns names in Outputs
- Diff with matching names returns no change (regression guard)
- Diff with desired missing a name returns ForceNew
- Resolution failure falls back to ID + records unresolved_ids
- Duplicate IDs only call GetVolume once (cache hit)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(util): intStrictFromConfig surfaces type mismatches explicitly (Copilot round-2 finding #2)

Previously `intStrictFromConfig` returned (defaultVal, false, nil) when
the config value was an unsupported type (e.g. `size_gb: "100"` as a
string). The caller then emitted "size_gb is required" — a misleading
diagnostic that hid the real bug (operator quoted a number).

Return present=true with an explicit "expected integer, got <type>"
error so the caller surfaces the type problem directly.

Tests: string-typed size_gb and bool-typed size_gb both reject with
"expected integer, got <type>" before any CreateVolume call is made.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(droplet): tags clear (non-empty -> empty) surfaces as drift (Copilot round-2 finding #3)

The Diff path skipped tag comparison when len(desired.tags)==0, so
clearing tags on a Droplet was silently ignored. Operators sometimes
need to strip tags to remove a Droplet from a tag-based firewall or
backup schedule; "no diff" is dangerously wrong here.

Switch the guard to "key present in desired" so empty desired vs
non-empty current surfaces as ForceNew, while absent desired still
skips (preserves backwards-compat for YAML that predates the field).

Tests: clearing tags forces replace; absent tags key does NOT trigger
drift; reorder still ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(volume): filesystem_type transitions raw<->ext4 surface as drift (Copilot round-2 finding #4)

The empty-side guard skipped raw(empty)->ext4 and ext4->raw transitions,
so changing filesystem_type after Create produced no diff. DO Block
Storage cannot reformat a volume in place, so any change must surface
as ForceNew rather than being silently ignored.

Switch to "key present in desired" guard so empty<->non-empty surfaces
as drift. Absent desired key still skips (backwards-compat for YAML
predating the field).

Tests: raw->ext4 and ext4->raw both force replace; absent key does NOT
trigger drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(volume): description add/clear from empty surfaces as drift (Copilot round-2 finding #5)

The empty-side guard skipped both empty->non-empty (adding a
description after Create) and non-empty->empty (clearing a description)
transitions. DO Block Storage exposes no description-update endpoint,
so any change must surface as ForceNew rather than being silently
ignored.

Switch to "key present in desired" guard. Absent key still skips
(backwards-compat).

Tests: add-from-empty and clear-to-empty both force replace; absent
key does NOT trigger drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(volume): tags clear surfaces as drift (Copilot round-2 finding #6)

Same pattern as droplet finding #3 and volume description finding #5.
Clearing tags (non-empty current -> empty desired) was silently ignored
because Diff skipped when len(desired.tags)==0. DO Block Storage has no
tag-update endpoint, so any tag change must surface as ForceNew.

Switch to "key present in desired" guard. Absent key still skips.

Tests: clearing tags forces replace; absent tags key does NOT trigger
drift (existing reorder/equal-set behavior preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(droplet): vpc_uuid add-from-empty surfaces as drift (Copilot round-2 finding #7)

The curVPC != "" guard skipped vpc_uuid drift when the current state
had no vpc_uuid. Pre-release Droplet states (created before vpc_uuid
became part of Outputs) would silently ignore an operator adding a
vpc_uuid pin to YAML — exactly when explicit drift detection matters
most.

Switch to "key present in desired" guard. Empty current vs non-empty
desired now triggers ForceNew. Absent desired key still skips
(backwards-compat).

Tests: vpc_uuid add-from-empty forces replace; absent key does NOT
trigger drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(droplet): explicit limitation block for user_data/ssh_keys/monitoring/ipv6 (Copilot round-2 finding #8)

Round-1 documented the gap inline but the silent fallthrough at the end
of Diff was easy to miss. Replace with an explicit block that:

- Names each undetected field and the godo reason
- States the operator workaround (`taint` / delete + re-apply)
- References follow-up issue #56 for the resolution path
- Updates the function-level Diff doc-comment to point at the inline
  rationale

Issue: #56
"Droplet Diff misses user_data / ssh_keys / monitoring / ipv6 (godo Read limitation)"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 5, 2026
5/5 findings addressed:

#1 (validate_plan.go:52) — byName index must exclude delete-action
   resources. Previously a vpc_ref pointing to a VPC scheduled for
   deletion in the same plan would silently 'resolve' as if live; now
   delete-targets are excluded from the index so cross-resource
   references to them surface as Severity=Error dangling references.
   New regression test:
   TestDOProvider_ValidatePlan_DatabaseVPCRefToDeleteTargetIsDangling.

#2 (validate_plan.go:91) — appendDatabaseDiagnostics docstring
   incorrectly said 'Warning when missing'; the implementation has
   always emitted PlanDiagnosticError, and the conformance scenario +
   TDD tests both REQUIRE Error severity. Doc rewritten to match the
   implemented contract.

#3 (validate_plan.go:225) — zonesInGroup docstring promised sorted
   output but returned the raw underlying slice unsorted. Now copies +
   sorts (lexicographic) so diagnostic messages are deterministic and
   the returned slice is owned by the caller (safe to mutate). nil
   for unknown groups; the caller's strings.Join still works.

#4 (provider_conformance_test.go:70) — comment described a
   'stub-then-real swap' with two Initialize calls; the implementation
   has always made one call with the right token chosen up-front. Doc
   rewritten to match the actual single-call flow.

#5 (codemod-report.yml:73) — fork PRs run with a read-only
   GITHUB_TOKEN per GitHub's pull_request workflow security model, so
   issues:create-comment + issues:update-comment fail 403 and would
   block CI. Gate the comment step on
   github.event.pull_request.head.repo.fork == false. The artifact
   upload step still runs unconditionally so the report remains
   reachable from the Actions tab.

Verified locally: go test ./... -count=1 -race PASS, go test
-tags=conformance ./internal/ -run TestConformance PASS, go vet
clean, codemod-report.yml YAML valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intel352 added a commit that referenced this pull request May 5, 2026
…lyPlan) (#61)

* chore(deps): bump workflow to e2c582b (W-7 conformance + W-8 codemod)

Pseudo-version v0.20.6-0.20260505011403-e2c582bece90, the workflow main HEAD
that includes:
  - W-7: iac/conformance scenario suite (12 scenarios) and DO smoke gate
  - W-8: cmd/iac-codemod 4-mode AST tool

Required for TP1-TP5 of PR P-DO (IaC conformance plan §P-DO).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(plugin): codemod-report workflow uploads dry-run output as artifact + sticky PR comment summary

PR P-DO TP1: per pull_request, runs the iac-codemod refactor-apply mode
in -dry-run against the plugin source, uploads the full Markdown report as
a 90-day retention GitHub Actions artifact, and posts/updates a sticky PR
comment with the top-30 lines of the report so drive-by reviewers see the
key findings without downloading the artifact.

Supply-chain note: actions/github-script SHA-pinned per workflow security
policy (Renovate tracks upstream releases via .github/renovate.json).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(provider): collapse Apply to wfctlhelpers.ApplyPlan

PR P-DO TP2: replace the in-Apply per-action switch (create/update/replace/
delete + upsert recovery + nil-out diagnostic) with a single dispatch to
wfctlhelpers.ApplyPlan. The helper handles:

  - ErrResourceAlreadyExists upsert recovery via interfaces.UpsertSupporter
    (DO drivers AppPlatform, VPC, Firewall, Database already implement
    SupportsUpsert() bool, so they satisfy the canonical interface without
    code change — the local upsertSupporter declaration is now removed).
  - JIT ${MODULE.id} / ${VAR} substitution (W-5).
  - Replace decomposition + ReplaceIDMap propagation (W-3a/W-3b).
  - Input-drift postcondition (W-3a).
  - Per-action context cancellation between iterations.

The DO-plugin-specific deferred-update flush (DatabaseDriver type=app
trusted_sources referencing apps created later in the plan; regression
gated by provider_deferred_test.go and CHANGELOG entry for staging-deploy-
blockers Blocker 2) is preserved by wrapping ApplyPlan with the second-
pass loop that calls FlushDeferredUpdates on any deferredUpdater driver.

The wrapper deviates from the codemod's canonical
'return wfctlhelpers.ApplyPlan(ctx, p, plan)' single-statement shape; the
deviation is documented and marked with // wfctl:skip-iac-codemod so
AssertApplyDelegatesToHelper recognizes the intentional shape. When
wfctlhelpers grows a deferred-update lifecycle hook, the wrapper can
collapse and the marker can drop.

The provider_apply_test.go DeleteAction_MissingCurrent regression test
was a v1 pre-flight defense (synthesize an error when action.Current is
nil before calling Delete). Under v2 dispatch the contract is 'driver is
the authority on what an empty ProviderID means' (per wfctlhelpers/
apply.go::doUpdate's analogous comment). The test was rewritten to lock
the new contract: dispatch IS made with an empty-ProviderID ref; real
drivers like FirewallDriver surface the diagnostic via their typed
validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provider): ValidatePlan for App Platform region-VPC constraints

PR P-DO TP3: implement interfaces.ProviderValidator (W-4) on DOProvider
to surface DO region constraints at `wfctl infra align` time before
any cloud API call. Per the W-4 design, R-A10 invokes ValidatePlan via
type-assertion; providers that do not implement it continue to work
unchanged.

The first pass covers three constraint families:

  1. App Platform infra.container_service requires a region GROUP slug
     (nyc, ams, fra, sfo, sgp, syd, tor, blr, lon). Zone slugs (nyc1,
     sfo3 …) are rejected with PlanDiagnosticError. This is the
     "copy-pasted nyc3 from a Droplet config" defense.

  2. Zone-bound resources (infra.vpc, infra.droplet, infra.volume) MUST
     use a zone slug. Bare group slugs are rejected with
     PlanDiagnosticError. Inverse of (1).

  3. Cross-resource: an App Platform with vpc_ref pointing to a VPC in
     the same plan must have a region group whose zones include the
     referenced VPC's region. This locks the recurring 'App Platform in
     nyc cannot reach VPC in sfo3' production bug class (root-cause
     issue D from the conformance design). Cross-resource resolution
     looks at desired spec first, falls back to action.Current's
     Outputs["region"] for unchanged-VPC scenarios. vpc_ref pointing
     to a name not in the plan emits a Severity=Warning so non-strict
     align tolerates external VPCs while --strict escalates.

ValidatePlan is read-only and makes no remote calls per the W-4
contract. Compile-time interface assertion lives at the bottom of
validate_plan.go.

11 TDD tests in validate_plan_test.go cover: nil/empty plan,
group-slug accepted, zone-slug rejected for AP, zone-slug accepted for
VPC/Droplet/Volume, group-slug rejected for VPC, mismatch error
(flagship), happy-path match, unknown-vpc_ref Warning, current-state
fallback, delete-action skipped, compile-time assertion.

Future extensions (deferred follow-ups): database/cache zone slugs,
load balancer zone matching against attached droplets, registry
regional restrictions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(plugin): opt into computePlanVersion: v2

PR P-DO TP4: declare iacProvider.computePlanVersion=v2 at the TOP LEVEL
of plugin.json so wfctl's runtime dispatcher (cmd/wfctl/deploy_providers.go
::iacPluginManifest) routes Apply through wfctlhelpers.ApplyPlan instead
of the legacy in-provider switch.

Schema note: the SDK manifest schema (plugin/sdk/manifest_schema.json)
expects iacProvider.computePlanVersion at the top level of plugin.json.
The existing capabilities.iacProvider sub-block (name, resourceTypes,
configSchema) is a DIFFERENT consumer (plugin discovery + capability
declaration); the two structures coexist in the same file. The runtime
loader unmarshals both into one struct (one Capabilities.IaCProvider.Name
field plus one top-level IaCProvider.ComputePlanVersion field) so a
single plugin.json read serves both code paths.

Validated via three checkers:
  - go run ./cmd/wfctl plugin validate --file plugin.json --strict-contracts (OK)
  - JSON-schema validation against plugin/sdk/manifest_schema.json (OK)
  - sdk.ParseManifest decode confirms EffectiveComputePlanVersion()==v2

Backward compat: wfctl < v0.21.0 ignores the new field; the legacy v1
dispatch (provider.Apply switch, now wrapping wfctlhelpers.ApplyPlan)
continues to work for all existing callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provider): add conformance test + extend ValidatePlan for database vpc_ref + bump to v0.10.0

PR P-DO TP5: ship the conformance test entry point, extend ValidatePlan
to satisfy the conformance suite's cross-resource constraint scenario,
and bump plugin.json + CHANGELOG to v0.10.0.

provider_conformance_test.go (new, behind 'conformance' build tag):
  - Invokes iac/conformance.Run against a freshly-constructed and
    initialized DOProvider. Initialize is always called with a stub
    token so the driver registry is populated for the non-cloud
    scenarios that probe ResourceDriver lookups (structpb-roundtrip,
    cross-module resolution, etc.) — these exercise read-only or
    pure-data paths that don't hit DO's API.
  - LiveCloud (CONFORMANCE_LIVE_CLOUD=1 + DIGITALOCEAN_ACCESS_TOKEN)
    swaps the stub for the real token before driver instantiation.
  - SmokeOnly = testing.Short() limits to Smoke=true scenarios for
    the per-PR smoke gate's narrow contract.

ValidatePlan extension (validate_plan.go):
  - Added appendDatabaseDiagnostics: infra.database vpc_ref pointing to
    a name not in the plan emits a Severity=Error diagnostic. Closes
    the conformance Scenario_CrossResourceConstraintRejection assertion
    that 'at least one Severity=Error diagnostic' fires for a dangling
    cross-resource reference. The assertion was failing before this
    change (database vpc_ref was previously unhandled by ValidatePlan).
  - Two new TDD tests: dangling-vpc_ref → Error (mirrors the
    conformance contract in-tree); in-plan vpc_ref → no diagnostic
    (happy path).

plugin.json:
  - version 0.9.0 → 0.10.0
  - download URL paths v0.9.0 → v0.10.0 (per-OS/arch)

CHANGELOG.md:
  - New v0.10.0 section catalogues TP1-TP5 changes (ValidatePlan,
    computePlanVersion: v2 opt-in, Apply collapse, conformance test,
    codemod-report workflow), the Apply-delete v2-contract change,
    and the workflow dep bump.
  - Migrates the previously-Unreleased infra.vpc id-output fix into
    the v0.10.0 Fixed section (it ships in this release).

Test results:
  - go test -tags=conformance ./internal/ -run TestConformance: 6/6
    non-cloud scenarios PASS (CrossResourceConstraintRejection,
    DiffSurvivesGRPCRoundTrip, InfraOutputCrossModuleResolution,
    PlanStaleDiagnostic, ProtectedReplaceWithOverride,
    ProtectedReplaceWithoutOverride). Upsert-on-already-exists +
    grpc-roundtrip skip when their opt-in driver types are absent
    (DO does not expose infra.compute).
  - go test ./... -count=1 -race: ALL packages PASS.
  - go run ./cmd/wfctl plugin validate --strict-contracts: OK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Copilot review round 1 — 5 substantive findings

5/5 findings addressed:

#1 (validate_plan.go:52) — byName index must exclude delete-action
   resources. Previously a vpc_ref pointing to a VPC scheduled for
   deletion in the same plan would silently 'resolve' as if live; now
   delete-targets are excluded from the index so cross-resource
   references to them surface as Severity=Error dangling references.
   New regression test:
   TestDOProvider_ValidatePlan_DatabaseVPCRefToDeleteTargetIsDangling.

#2 (validate_plan.go:91) — appendDatabaseDiagnostics docstring
   incorrectly said 'Warning when missing'; the implementation has
   always emitted PlanDiagnosticError, and the conformance scenario +
   TDD tests both REQUIRE Error severity. Doc rewritten to match the
   implemented contract.

#3 (validate_plan.go:225) — zonesInGroup docstring promised sorted
   output but returned the raw underlying slice unsorted. Now copies +
   sorts (lexicographic) so diagnostic messages are deterministic and
   the returned slice is owned by the caller (safe to mutate). nil
   for unknown groups; the caller's strings.Join still works.

#4 (provider_conformance_test.go:70) — comment described a
   'stub-then-real swap' with two Initialize calls; the implementation
   has always made one call with the right token chosen up-front. Doc
   rewritten to match the actual single-call flow.

#5 (codemod-report.yml:73) — fork PRs run with a read-only
   GITHUB_TOKEN per GitHub's pull_request workflow security model, so
   issues:create-comment + issues:update-comment fail 403 and would
   block CI. Gate the comment step on
   github.event.pull_request.head.repo.fork == false. The artifact
   upload step still runs unconditionally so the report remains
   reachable from the Actions tab.

Verified locally: go test ./... -count=1 -race PASS, go test
-tags=conformance ./internal/ -run TestConformance PASS, go vet
clean, codemod-report.yml YAML valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Copilot review round 2 — 2 vpc_ref type-mismatch findings

2/2 findings addressed (both legitimate type-safety improvements):

#6 (validate_plan.go:123) — appendDatabaseDiagnostics validated that
   vpc_ref name resolved in the plan but did NOT validate that the
   target type is actually infra.vpc. A vpc_ref pointing to a Droplet
   or App Platform resource silently passed prior validation. Now
   surfaces a typed Severity=Error diagnostic when target.spec.Type
   != "infra.vpc". New regression test:
   TestDOProvider_ValidatePlan_DatabaseVPCRefToNonVPCType.

#7 (validate_plan.go:189) — same bug in appendAppPlatformDiagnostics:
   vpc_ref pointing to a non-VPC resource would silently bypass the
   region-match check (target.spec.Config["region"] would be a
   region GROUP for an App Platform target, not a zone for a VPC) and
   the operator would never see a clear diagnostic. Same fix: typed
   Error when target.spec.Type != "infra.vpc". New regression test:
   TestDOProvider_ValidatePlan_AppPlatformVPCRefToNonVPCType.

Both diagnostics carry the offending type in the message body so the
operator immediately knows whether they typo'd a name (resolves to
nothing) vs. typo'd a TYPE in a name (resolves to wrong resource).

Verified locally: go test ./... -count=1 -race PASS, go test
-tags=conformance ./internal/ -run TestConformance PASS, go vet
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Copilot review round 3 — 4 deeper findings (vpc_ref UUID/template + classifyRegion + delete-test comment)

4/4 findings addressed (deeper architectural correctness):

#8/#9 (validate_plan.go:137,205) — vpc_ref's accepted DO API shapes
   are: (a) a VPC UUID literal, OR (b) a wfctl JIT template like
   ${vpc.id} that resolves to a UUID at apply time via
   wfctlhelpers.ApplyPlan's jitsubst.ResolveSpec. The prior validator
   unconditionally treated vpc_ref as an in-plan resource Name and
   would have rejected production configs that use the canonical UUID
   shape (godo.AppVpcSpec{ID: vpcID} consumes it directly per
   internal/drivers/app_platform_buildspec.go:674). New
   looksLikeResourceName() helper detects UUID literals (RFC-4122
   pattern) and JIT templates (${...} / $(...)), and the in-plan-
   name lookup is skipped for both. Only plain-name vpc_ref values
   trigger the dangling-reference + non-VPC-type checks. Diagnostic
   messages updated to call out the 'plain resource name' branch
   explicitly so an operator who hit the diagnostic understands it
   does not apply to UUID/template forms. Two new TDD tests cover the
   UUID-literal + JIT-template paths for both database and App
   Platform.

#10 (validate_plan.go:307) — classifyRegion emitted 'a zone slug in
   group ""' for legacy zones (nyc2, ams2) that intentionally map
   to an empty group. Now special-cases the empty-group case to emit
   'a zone slug not in any App Platform region group'. New TDD test
   exercises the path via an App Platform action with region=nyc2.

#11 (provider_apply_test.go:94) — the test comment claimed real
   drivers reject empty-ProviderID deletes via typed validation;
   FirewallDriver.Delete actually resolves by name when ProviderID
   is empty. Comment rewritten to reflect that v2 dispatch contract
   is 'driver knows what an empty ProviderID means for its resource
   shape', not 'all drivers reject empty ProviderID'.

Net new test count: 3 (UUID-deferred, JIT-template-deferred,
classify-empty-group). All existing tests still pass.

Verified locally: go test ./... -count=1 -race PASS, go test
-tags=conformance ./internal/ -run TestConformance PASS, go vet
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Copilot review round 4 — 3 polish findings (UUID case + t.Context + workflow perms)

3/3 findings addressed:

#12 (validate_plan.go:16) — uuidPattern was lowercase-only; UUIDs are
   case-insensitive in practice (operator clipboards, templating
   engines, mixed-case API responses). Added (?i) flag so upper-case
   and mixed-case VPC UUIDs also classify as deferred-to-apply, not
   as plain resource names that would trigger false dangling-reference
   diagnostics. New test:
   TestDOProvider_ValidatePlan_VPCRefAsUpperCaseUUIDIsDeferred.

#13 (provider_conformance_test.go:86) — switched
   p.Initialize(context.Background(), ...) to p.Initialize(t.Context(), ...)
   so live-cloud Initialize is interrupted promptly when the test
   is canceled or hits its deadline. Removed the now-unused
   "context" import.

#14 (codemod-report.yml:12) — dropped the unused pull-requests:write
   permission. The workflow only creates/updates an issue comment
   (PR comments are issues at the GitHub API layer) so the surviving
   issues:write is sufficient. Inline doc-comment captures the
   reasoning so future maintainers don't restore the broader grant.
   Aligns with ci.yml's contents:read-only baseline.

Verified locally: go test ./... -count=1 -race PASS, go test
-tags=conformance ./internal/ -run TestConformance PASS, go vet
clean, codemod-report.yml YAML valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Copilot review round 5 — 2 findings (Initialize-ctx comment + unknown-region forward-compat)

2/2 findings addressed:

#15 (provider_conformance_test.go:83) — the prior comment claimed
   t.Context() makes Initialize cancelable, but DOProvider.Initialize
   today constructs its godo client with its own context.Background()
   and ignores the passed-in ctx. Comment rewritten to reflect that
   the change is forward-prep (so any future rev of Initialize that
   honors ctx picks up the test-scoped cancellation/deadline) rather
   than an immediate behavior fix. Tracked as a follow-up to thread
   ctx through the godo client construction.

#16 (validate_plan.go:334) — the hardcoded region/zone allowlists
   would Severity=Error any brand-new DO region the plugin hasn't
   caught up to, blocking apply until the plugin is bumped. Severity
   is now two-bucket:
     - Documented misconfig (group-where-zone-required, zone-where-
       group-required) → Error (the original anti-pattern stays loud).
     - Unknown slug (neither known group nor known zone, e.g. a
       hypothetical 'atl' or 'atl1') → Warning so non-strict align
       lets operators on bleeding-edge DO regions proceed; --strict
       still escalates for cautious operators.
   New regression test:
   TestDOProvider_ValidatePlan_UnknownRegionSlugWarnsNotErrors covers
   both VPC zone and AP group unknown-slug paths.

Net new test count: 1. All existing tests still PASS (the documented
misconfig branches use known specific slugs that hit the Error path).

Verified locally: go test ./... -count=1 -race PASS, go test
-tags=conformance ./internal/ -run TestConformance PASS, go vet
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant