ECI-1486 scaleway onboard setup script #166
4a1f185 to
066c892
Compare
Can we delete old API keys if this is re-run? I see we are just generating keys but never cleaning them up.
EamonBrady1
left a comment
Is it possible to embed the YAML/Docker files directly into the script so that it doesn't have to install any files at all? I think it would be if we just specified the files inline. It might also be good to run this script by the telemetry-onboarding team to get their opinions.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abbf6a3be0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
If there is any concern with writing files, couldn't the script also fetch them from [...]? @rheei, where do we anticipate this being run: in a local shell on the user's laptop, or in a cloud shell? If it's the latter, I'd check whether writing files or making requests to external domains is blocked by default in that environment. Otherwise, I wouldn't have concerns with either of these approaches; the Azure install script does both. For the sake of maintainability, I think it'd be preferable not to embed these files as strings in a bash script if possible, but it also wouldn't be the end of the world if there aren't other good options.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 428426fe78
@codex review
To use Codex here, create a Codex account and connect to github.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf14ee08d8
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c30afcba8
da00dd1 to
96a227e
Compare
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96a227e641
    local policy_resp
    if [[ -n "$policy_id" ]]; then
      log "Updating IAM policy '${IAM_POLICY_NAME}' (id=${policy_id})..."
      policy_resp=$(scw_request PATCH "/iam/v1alpha1/policies/${policy_id}" "$policy_body") \
Update IAM policy rules through the rules endpoint
When the datadog-integration-policy already exists but is missing the current rule set, this PATCH does not actually refresh the permissions used by the new API key. Scaleway's IAM SDK/API docs describe UpdatePolicyRequest as only accepting name, description, tags, and principal fields, while rule replacement is done via SetRulesRequest (Rules []*RuleSpecs). In that scenario the script prints "Updated" but leaves stale rules in place, so later Cockpit or audit-trail calls can still fail with insufficient permissions.
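A minimal sketch of the rule refresh this comment asks for, assuming the script's `scw_request <method> <path> <body>` helper behaves as in the excerpt above. The `set_policy_rules` name is hypothetical, and the `policy_id`/`rules` body fields are an assumption to check against the Scaleway IAM API reference before relying on them.

```bash
# Hypothetical helper: replace a policy's rule set through the rules endpoint
# instead of the policy PATCH. scw_request is the script's own wrapper and is
# expected to be defined by the caller.
set_policy_rules() {
  local policy_id="$1" rules_json="$2" body
  # Assumed body shape: {"policy_id": "...", "rules": [...]}
  body=$(printf '{"policy_id":"%s","rules":%s}' "$policy_id" "$rules_json")
  scw_request PUT "/iam/v1alpha1/rules" "$body"
}
```

The policy PATCH can then be limited to name, description, and tags, with the rules always sent through this second call.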
    local old_key
    while IFS= read -r old_key; do
      [[ -z "$old_key" ]] && continue
      if scw iam api-key delete "access-key=${old_key}" 2>/dev/null; then
Pass the API key as the delete positional argument
For rotations where _IAM_OLD_KEYS is non-empty, this cleanup never deletes the stale keys because the Scaleway CLI documents scw iam api-key delete <access-key ...> and shows scw iam api-key delete SCW000...; access-key=${old_key} is treated as the positional value itself rather than the access key. The setup still succeeds, but every rerun leaves the previous Datadog integration key active instead of revoking it.
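A sketch of the cleanup loop with the access key passed positionally, as the comment describes. It assumes `_IAM_OLD_KEYS` holds one access key per line; `delete_old_keys` and the `_IAM_DELETED` counter are hypothetical names.

```bash
# Hypothetical counter for how many stale keys were revoked.
_IAM_DELETED=0

# Revoke each stale key by passing the access key as the positional argument.
delete_old_keys() {
  local old_key
  while IFS= read -r old_key; do
    [[ -z "$old_key" ]] && continue
    # </dev/null keeps the child process from consuming the loop's stdin.
    if scw iam api-key delete "$old_key" </dev/null >/dev/null 2>&1; then
      _IAM_DELETED=$((_IAM_DELETED + 1))
    else
      echo "WARN: could not delete API key ${old_key}" >&2
    fi
  done <<< "$1"
}

# Hypothetical usage:
#   delete_old_keys "$_IAM_OLD_KEYS"
```

Failures are reported instead of being discarded, so a rerun that leaves a key behind is visible in the output.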
    audit_dir="${script_dir}/audit-trail"

    local _audit_ref="${AUDIT_TRAIL_REF:-main}"
    local _audit_base="https://raw.githubusercontent.com/DataDog/integrations-management/${_audit_ref}/scaleway/log_forwarding/audit-trail"
Since the in-code comment above flags this as a pre-GA TODO, dropping a reminder: any push to main can change what gets downloaded onto a deployed instance on the next re-run. Pinning to a tagged release or a SHA would close that. Easy to forget once this merges.
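One way to enforce the pinning is to reject branch names before building the download URL. This is only a sketch: `is_pinned_ref` is a hypothetical helper, and the `vX.Y.Z` tag pattern is an assumption about how releases would be tagged.

```bash
# Sketch: accept a full commit SHA or a release-style tag, reject anything
# else (branch names such as "main", abbreviated SHAs).
is_pinned_ref() {
  local ref="$1"
  [[ "$ref" =~ ^[0-9a-f]{40}$ ]] && return 0              # full commit SHA
  [[ "$ref" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]] && return 0   # assumed tag format
  return 1
}

# Hypothetical usage inside the script:
#   is_pinned_ref "${AUDIT_TRAIL_REF:-}" || {
#     echo "AUDIT_TRAIL_REF must be a release tag or full commit SHA" >&2
#     exit 1
#   }
```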
    application_id: $app_id,
    rules: [
      {
        permission_set_names: ["ObservabilityFullAccess", "AllProductsReadOnly"],
The Step 0 docstring describes the policy as "least-privilege," but AllProductsReadOnly here grants read access to every Scaleway product on the project, which is more than log forwarding needs. If something specific requires it (AMS tag resolution, account validation, etc.), a one-line note would help; otherwise it could be dropped to match the docstring.
Would be great to test with another fresh user like @gpalmz before merging, just to see if they run into any more blockers |
Addresses Grant's review comment on PR #166: after provision_iam_application swaps SCW_SECRET_KEY to the new app's secret (least-privilege), the cleanup loop calls `scw iam api-key delete` which would normally need IAM write permissions the app doesn't have. The script's bash-local var swap happens to "work" in interactive shells because the env vars aren't exported, so the `scw` child process falls back to the user's owner-level config-file creds. But in CI / containerized / fresh-env runs where SCW_SECRET_KEY is exported before invocation, the child process inherits the restricted app key and the delete 403s, leaking IAM keys across re-runs (matches the 4-key leak observed in earlier testing). Fix: stash _ORIG_SCW_{ACCESS,SECRET}_KEY in provision_iam_application before the swap, then run the cleanup-loop delete with an env-var override so the child `scw` process picks up the original (owner-level) creds.
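The env-var override described here can be sketched as a small wrapper. `_ORIG_SCW_ACCESS_KEY` and `_ORIG_SCW_SECRET_KEY` are the stashed values named in this commit message; `with_owner_creds` is a hypothetical name, not necessarily how the script structures it.

```bash
# Sketch: run one command with the original (owner-level) credentials in its
# environment, leaving the shell's current SCW_* values untouched.
with_owner_creds() {
  SCW_ACCESS_KEY="$_ORIG_SCW_ACCESS_KEY" \
  SCW_SECRET_KEY="$_ORIG_SCW_SECRET_KEY" \
    "$@"
}

# Hypothetical usage in the cleanup loop:
#   with_owner_creds scw iam api-key delete "$old_key"
```

Because the assignments prefix a single command, the override holds whether or not the restricted app key was exported before the script started.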
@codex review
To use Codex here, create a Codex account and connect to github.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4bcb529faf
    : "${DD_API_KEY:?DD_API_KEY is required (your Datadog API key)}"
    : "${DD_APP_KEY:?DD_APP_KEY is required (your Datadog application key)}"
    : "${DD_SITE:?DD_SITE is required (e.g. datadoghq.com)}"
Don't require Datadog keys for teardown
When the user runs the documented --teardown path, the script still evaluates these Datadog key assertions before main() gets a chance to return from the teardown branch. That makes bash setup-logs.sh --teardown fail immediately unless unrelated Datadog credentials are present, even though teardown only deletes Scaleway audit-trail instances; gate these checks on TEARDOWN=false or move them after the teardown early return.
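A minimal sketch of the gating suggested above. `require_env` and `check_required_env` are hypothetical names, and `TEARDOWN` is assumed to be set to `true` by the `--teardown` flag parser before this runs.

```bash
# Sketch: fail with a message if any named variable is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "ERROR: ${name} is required" >&2
      return 1
    fi
  done
}

check_required_env() {
  # Teardown only deletes Scaleway resources, so skip the Datadog checks.
  [[ "${TEARDOWN:-false}" == true ]] && return 0
  require_env DD_API_KEY DD_APP_KEY DD_SITE
}
```

Calling `check_required_env` after flag parsing keeps `bash setup-logs.sh --teardown` usable without Datadog credentials.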
    if grep -qxF "$ds_id" <<< "$exported_ids"; then
      ok "Already exported project=$SCW_PROJECT_ID region=$region datasource=$ds_id"
      skipped=$((skipped + 1))
Update existing Cockpit exporters on rerun
For an existing exporter, the script skips solely because the datasource ID is already exported with this name. On a rerun for the same DD_SITE after rotating DD_API_KEY or changing SCALEWAY_PRODUCTS, this leaves the old Datadog destination/products in Scaleway and logs can continue going to a revoked or unintended key; compare the existing exporter configuration and PATCH it instead of treating every name/datasource match as current.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4bcb529faf
    permission_set_names: ["ObservabilityFullAccess", "AllProductsReadOnly"],
    project_ids: [$project_id]
Grant instance provisioning permissions before switching keys
When SCW_SECRET_KEY/SCW_ACCESS_KEY are supplied as environment variables (the tile pre-filled flow), reassigning them to the new application key keeps them exported, so the later scw instance server create inherits this policy. The policy only grants Observability/AllProductsReadOnly/AuditTrailReadOnly, while Scaleway documents InstancesFullAccess as the permission set for “create, read, list, edit and delete Instances” (https://www.scaleway.com/en/docs/iam/reference-content/permission-sets/). With audit trail enabled and no SCW_INSTANCE_IP, auto-provisioning therefore fails with an authorization error; provision the VM before switching credentials or grant the instance permissions required by the create/delete path.
    local datasource_ids=()
    while IFS= read -r _id; do
      [[ -n "$_id" ]] && datasource_ids+=("$_id")
    done < <(get_log_datasource_ids "$SCW_PROJECT_ID" "$region" 2>/dev/null || true)
Track Cockpit list failures instead of counting them as success
If listing data sources fails for a region (for example because the new app key lacks the expected Cockpit permission, the region is invalid, or Scaleway returns a transient error), this || true converts the failure into an empty list. The loop then increments _COCKPIT_SKIPPED, and main treats skipped entries as cockpit_working, so with audit trail disabled/not deployed the script can still register a Datadog account even though no exporter was created or confirmed. Propagate these list failures into _COCKPIT_FAILED separately from the legitimate empty-project case.
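A sketch of how a failed listing could be kept apart from a legitimately empty one. `list_datasources_checked` is a hypothetical wrapper; in the script the lister passed to it would be `get_log_datasource_ids`, and the counter names follow the ones mentioned in the comment.

```bash
_COCKPIT_FAILED=0
_COCKPIT_SKIPPED=0
DATASOURCE_IDS=()

# Sketch: run a lister command for one region and record whether it failed
# or simply returned nothing. Remaining arguments are the lister command.
list_datasources_checked() {
  local region="$1"; shift
  local out _id
  DATASOURCE_IDS=()
  if ! out=$("$@" 2>/dev/null); then
    _COCKPIT_FAILED=$((_COCKPIT_FAILED + 1))
    echo "WARN: listing data sources failed for region=${region}" >&2
    return 1
  fi
  while IFS= read -r _id; do
    [[ -n "$_id" ]] && DATASOURCE_IDS+=("$_id")
  done <<< "$out"
  if [[ ${#DATASOURCE_IDS[@]} -eq 0 ]]; then
    _COCKPIT_SKIPPED=$((_COCKPIT_SKIPPED + 1))   # legitimately empty region
  fi
  return 0
}

# Hypothetical usage:
#   list_datasources_checked "$region" get_log_datasource_ids "$SCW_PROJECT_ID" "$region"
```

With the two counters separated, the registration step can refuse to treat a region that errored as a working exporter.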
End-to-end onboarding for Scaleway → Datadog log forwarding. Single
bash script with no manual tile entry required.
Bootstrap
- Detects whether `scw` CLI is installed; offers to install via the
official Scaleway installer (Linux) or Homebrew / direct binary
download from GitHub releases (macOS, including Apple Silicon).
- Runs `scw init` if scw is installed but no credentials are configured.
Pre-flights (fail fast — no Scaleway resources created on failure)
- Verifies the Datadog API/App key has the `integrations_manage`
permission via a side-effect-free DELETE probe against a zero-UUID.
- When audit trail will provision an Instance, confirms at least one
Scaleway IAM SSH key is registered (Scaleway auto-installs them on
new Instances so the script can SSH in to deploy the collector).
Step 0 — IAM provisioning
- Creates a least-privilege `datadog-integration` IAM application with
policy rules scoped to the target project (ObservabilityFullAccess
+ AllProductsReadOnly) and org (AuditTrailReadOnly). Rules are set
via `PUT /iam/v1alpha1/rules` so re-runs actually refresh them
(Scaleway's policy PATCH silently drops the rules field).
- Generates an API key pair and switches subsequent calls to those
credentials. Stashes the original (owner-level) creds for end-of-run
IAM key cleanup, since the new app's policy lacks IAM write.
Part 1 — Cockpit native exports
- Creates Datadog log exporters for every Scaleway-managed Cockpit
data source across all supported regions. Regions are discovered
dynamically from the scw CLI so new regions are picked up
automatically. Idempotent per exporter (dedup-by-name).
Part 2 — Audit trail collector
- Gated by the `SCW_AUDIT_TRAIL_ENABLED=true` toggle from the
integration tile UI.
- Auto-provisions a small Scaleway Instance (DEV1-S, ~€6.34/mo)
tagged `datadog-audit-trail`, or honors a user-supplied
`SCW_INSTANCE_IP` for BYO use.
- Builds an OpenTelemetry Collector locally with the `scwaudittrail`
receiver, deploys the binary over SSH with a pinned host key,
installs a systemd unit.
- `--teardown` flag deletes only the auto-provisioned Instance (by
tag) along with its IP and volumes; never touches BYO Instances.
Part 3 — Datadog account registration
- POST or PATCH to `/api/v2/web-integrations/scaleway/accounts` with
the provisioned credentials. Paginated list to find an existing
account by name. Skipped if Cockpit failed to produce any working
exporters and audit trail didn't deploy, so a permission failure or
partial outage doesn't leave a dangling integration entry.
Flags / env vars
- `--dry-run`: print every API call (method, URL, body) without
executing; fake credentials accepted.
- `--teardown`: delete the auto-provisioned audit-trail Instance and
exit.
- `SCW_AUDIT_TRAIL_ENABLED`, `SCW_INSTANCE_IP`, `PROVISION_INSTANCE`,
`SCW_AUDIT_INSTANCE_{TYPE,ZONE,IMAGE}` for overrides.
Adds `@DataDog/emerging-cloud-integrations` as the CODEOWNER for
`scaleway/`.
Verified end-to-end against a real Scaleway organization and Datadog
account: provisioning ~24s to SSH-ready, idempotent re-runs reuse the
Instance by tag, teardown leaves user-owned resources untouched, and
audit trail events land in Datadog under `service:scaleway.audit-trail`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
b3dc966 to
8c6aedc
Compare
macOS ships bash 3.2.x by default. The only bash-4.2 dependency was `printf '%(...)T'` for log timestamps; swap to `date '+%H:%M:%S'` which works on bash 3.2. Drops the version gate, so macOS users can run the script without first installing a newer bash via Homebrew.
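For illustration, a timestamped log helper along these lines runs on bash 3.2 as well as newer versions. The `[HH:MM:SS] message` output format is an assumption, not necessarily what the script prints.

```bash
# Sketch: a log helper that avoids printf '%(...)T' (bash 4.2+) by calling
# date, so it also works on the bash 3.2 that macOS ships.
log() {
  printf '[%s] %s\n' "$(date '+%H:%M:%S')" "$*"
}
```

The cost is one `date` process per log line, which is negligible for a setup script.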
Summary
Adds `scaleway/log_forwarding/setup-logs.sh`, an end-to-end setup script that provisions a Scaleway project for Datadog log forwarding with no manual tile entry required. Pre-filled credentials and the audit-trail toggle state come from the integration tile UI (web-ui#306167).
Bootstrap
- `scw` CLI install (Linux only): if `scw` isn't on PATH, the script offers to install it via the official Scaleway install script. macOS users get a `brew install scw` hint. Skipped in `--dry-run`.
- `scw init` (when needed): if `scw` is installed but has no credentials, runs `scw init` interactively so the user is prompted for their Scaleway access/secret keys, with a pointer to https://console.scaleway.com/iam/api-keys.
Pre-flights (fail fast before any Scaleway-side work)
- `integrations_manage` permission: probes `GET /api/v2/web-integrations/scaleway/accounts` with the supplied `DD_API_KEY`/`DD_APP_KEY`. Bails on 401/403 with an actionable fix.
- [...] `SCW_INSTANCE_IP` is set (BYO) or audit trail is disabled.
Step 0 – IAM provisioning
Creates a least-privilege `datadog-integration` IAM application with an `ObservabilityFullAccess` policy scoped to the target project, generates an API key pair, and switches all subsequent calls to those credentials. Idempotent: reuses the app and policy if they already exist.
Part 1 – Cockpit native exports
Creates Datadog log exporters for all Scaleway-managed data sources across all supported regions. Supported regions are discovered dynamically from the `scw` CLI so the list stays in sync as Scaleway adds new ones. Idempotent per exporter.
Part 2 – Audit trail export
Gated by `SCW_AUDIT_TRAIL_ENABLED=true` (the UI toggle). The script:
- [...] `datadog-audit-trail` via `scw instance server create -w`. Idempotent: subsequent runs reuse the same Instance by tag.
- [...] `SCW_INSTANCE_IP` for customers who want to deploy the collector to an existing Instance they already operate (advanced/BYO).
- [...] `scwaudittrail` receiver and deploys it over SSH with a pinned host key.
- `--teardown` flag deletes only auto-provisioned Instances (filtered by tag); it never touches BYO Instances.
Part 3 – Datadog account registration
Calls `POST /api/v2/web-integrations/scaleway/accounts` to create (or `PATCH` to update) the Datadog integration account with the provisioned credentials. Skipped if Cockpit produced zero working exporters AND audit trail didn't deploy, so a permission failure or partial outage doesn't leave a dangling integration account record with no data flowing.
Flags & env vars
- `--dry-run`: prints every API call (method, URL, body) without executing; fake credentials are accepted.
- `--teardown`: deletes everything tagged `datadog-audit-trail` and exits.
- `SCW_AUDIT_TRAIL_ENABLED=true|false`: pre-filled by the tile toggle.
- `SCW_INSTANCE_IP`: BYO Instance escape hatch.
- `PROVISION_INSTANCE=auto|true|false`: controls the cost-confirmation prompt.
- `SCW_AUDIT_INSTANCE_{TYPE,ZONE,IMAGE}`: provisioning overrides.
Test plan
- `--dry-run` smoke test passes with fake credentials: `SCW_SECRET_KEY=x SCW_ACCESS_KEY=x SCW_ORGANIZATION_ID=x SCW_PROJECT_ID=x DD_API_KEY=x DD_APP_KEY=x DD_SITE=datadoghq.com SCW_AUDIT_TRAIL_ENABLED=false bash scaleway/log_forwarding/setup-logs.sh --dry-run`
- `--dry-run --teardown` smoke test passes
- `--teardown` cleaned up Instance + IP + volumes
- `bash -n` and `shellcheck -S warning` clean
Related
- [...] `instance_ip`): DataDog/dd-source#437350