feat(config): load identity blob from GCP Secret Manager#72

Merged: keanji-x merged 3 commits into aptos-node from feat/gcp-secret-manager-identity on Apr 24, 2026

keanji-x commented Apr 24, 2026

Summary

Adds a GCP Secret Manager source for the validator identity blob alongside the existing on-disk file source. Secret payload is the same YAML bytes as the on-disk IdentityBlob, so the ops change is just uploading the existing identity file to Secret Manager instead of mounting it on the VM.

Two parallel enum variants:

  • Identity::FromGcpSecret (network / validator-network identity)
  • InitialSafetyRulesConfig::FromGcpSecret (consensus identity incl. BLS key + overrides)

Both carry a projects/<P>/secrets/<S>/versions/<V> resource path (short form without /versions/<V> defaults to latest). Fetch happens once at startup through the same identity_blob() / identity_key() / peer_id() call sites, so downstream consensus/network code is untouched.
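The short-form rule above can be sketched as a small normalizer. This is an illustration only, not the PR's code: the function name `normalize_resource`, the `String` error type, and the exact validation rules are assumptions.

```rust
/// Hypothetical helper: expand the short form `projects/<P>/secrets/<S>`
/// to an explicit `.../versions/latest` and reject anything else.
/// The name and error type are assumptions, not taken from the PR.
pub fn normalize_resource(resource: &str) -> Result<String, String> {
    let parts: Vec<&str> = resource.trim().split('/').collect();
    match parts.as_slice() {
        // Short form: no version segment, so default to `latest`.
        ["projects", p, "secrets", s] if !p.is_empty() && !s.is_empty() => {
            Ok(format!("projects/{p}/secrets/{s}/versions/latest"))
        }
        // Full form: keep the version the operator asked for.
        ["projects", p, "secrets", s, "versions", v]
            if !p.is_empty() && !s.is_empty() && !v.is_empty() =>
        {
            Ok(format!("projects/{p}/secrets/{s}/versions/{v}"))
        }
        _ => Err(format!("invalid Secret Manager resource path: {resource:?}")),
    }
}
```

Under these assumptions, `projects/p/secrets/s` and `projects/p/secrets/s/versions/latest` normalize to the same string, while a pinned version such as `versions/3` is preserved.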

Pilot test (GCE VM, single-node cluster)

Verified end-to-end on gravity-mainnet-node-la-0:

  • Built gravity_node with --features gcp-secret-manager, deployed single-node cluster via cluster/ scripts
  • Uploaded a fresh identity.yaml to Secret Manager (gravity-sm-pilot-identity)
  • validator.yaml rendered with from_gcp_secret in all three identity slots (safety_rules, validator_network, full_node_networks)
  • Node loaded identity.yaml via HTTPS fetch from SM (GCP_ACCESS_TOKEN fallback since VM has no bound SA)
  • ValidatorSet account_address exactly matched the SM-uploaded identity
  • make start → block production reached epoch 2 round ~2800 (19 blocks in 10s)
  • Zero identity.yaml on disk in the node's config directory

Config

full_node_networks:
  - identity:
      type: from_gcp_secret
      resource: projects/my-proj/secrets/validator-identity/versions/latest

consensus:
  safety_rules:
    initial_safety_rules_config:
      from_gcp_secret:
        identity_blob_secret: projects/my-proj/secrets/validator-identity/versions/latest
        overriding_identity_secrets: []
        waypoint: { from_config: "0:abc..." }

Auth

Picked in order:

  1. GCP_ACCESS_TOKEN env var — escape hatch for non-GCE dev/CI (e.g. export GCP_ACCESS_TOKEN=$(gcloud auth print-access-token)).
  2. GCE metadata server — zero-config on GCE VMs/GKE pods, provided the bound service account has roles/secretmanager.secretAccessor and the VM was created with --scopes=cloud-platform.
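The selection order above could look roughly like the following. It is a sketch under stated assumptions: `TokenSource` and `pick_token_source` are hypothetical names, the environment lookup is injected so the logic can be exercised without touching the real environment, and treating an empty variable as unset is an assumption rather than something the PR states.

```rust
/// Hypothetical description of where the access token comes from.
#[derive(Debug, PartialEq, Eq)]
pub enum TokenSource {
    /// Token taken verbatim from the `GCP_ACCESS_TOKEN` env var.
    EnvVar(String),
    /// No usable env var; fall back to the GCE metadata server.
    MetadataServer,
}

/// Pick the token source in the documented order: env var first,
/// metadata server second. `lookup` stands in for `std::env::var`.
pub fn pick_token_source(lookup: impl Fn(&str) -> Option<String>) -> TokenSource {
    match lookup("GCP_ACCESS_TOKEN") {
        // Assumption: a blank value is treated the same as an unset one.
        Some(token) if !token.trim().is_empty() => {
            TokenSource::EnvVar(token.trim().to_string())
        }
        _ => TokenSource::MetadataServer,
    }
}
```

In a real binary the call would be `pick_token_source(|k| std::env::var(k).ok())`.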

Deliberately does not pull google-cloud-auth (async, drags tonic/tokio) into aptos-config — the crate today has no async deps. Reuses reqwest + base64 which are already workspace deps.

Hardening commits

  1. Cargo feature gcp-secret-manager (off by default). reqwest and base64 are optional = true in aptos-config, so binaries that only load identity from disk no longer transitively link the HTTP/TLS stack through aptos-config. A forwarding feature, gaptos/gcp-secret-manager, is exposed for downstream crates.
  2. Per-process fetch cache keyed by normalized resource path so NetworkConfig::identity_key + NetworkConfig::peer_id + safety_rules don't each issue their own metadata_token + secretmanager.access round-trip at startup.
  3. Resource name in load-failure messages. Bare .unwrap() becomes unwrap_or_else with a panic message that includes the GCP resource; safety_rules_config.rs adds .with_context on the two call sites.
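The per-process cache in item 2 can be sketched with the standard library alone. This is an illustration, assuming the key has already been normalized and that errors use `String`; the function name `fetch_cached` and the injected `fetch` closure are hypothetical, and the real code may structure locking differently.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-wide cache of fetched payloads, keyed by normalized resource path.
static CACHE: OnceLock<Mutex<HashMap<String, Vec<u8>>>> = OnceLock::new();

/// Return the cached payload for `normalized`, calling `fetch` only on a miss.
/// `fetch` stands in for the token + Secret Manager round-trip.
pub fn fetch_cached<F>(normalized: &str, fetch: F) -> Result<Vec<u8>, String>
where
    F: FnOnce(&str) -> Result<Vec<u8>, String>,
{
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut guard = cache
        .lock()
        .map_err(|_| "secret cache mutex poisoned".to_string())?;
    if let Some(bytes) = guard.get(normalized) {
        return Ok(bytes.clone());
    }
    // The lock is held across the fetch so concurrent callers do not
    // issue duplicate requests; failed fetches are not cached.
    let bytes = fetch(normalized)?;
    guard.insert(normalized.to_string(), bytes.clone());
    Ok(bytes)
}
```

Holding the mutex during the fetch is a deliberate simplification here: it serializes startup lookups, which is acceptable if the fetch happens only a handful of times per process.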

Scope boundaries

  • aptos-genesis/builder.rs still uses Identity::from_file — genesis/keygen writes the YAML locally, then ops uploads it to Secret Manager. No reason to read from GCP during genesis.
  • Does not replace IdentityBlob::to_file (still used by keygen).
  • Does not touch safety_rules OnDiskStorage (BLS consensus key still materializes to secure_storage.json at first boot). That's a separate concern to address via SecureBackend::Vault or a new GcpSecret backend — out of scope here.
  • Does not touch gravity_cli/signer GCP KMS signer — that's EVM secp256k1 signing, orthogonal to the identity blob (which carries x25519 / Ed25519 / BLS12-381).

Test plan

  • cargo check -p aptos-config (no feature) — 0 new warnings
  • cargo check -p aptos-config --features gcp-secret-manager — 0 new warnings
  • cargo check -p aptos-safety-rules — clean (with and without smoke-test feature)
  • cargo test -p aptos-config --lib gcp_secret — 3/3 pass (normalize_* unit tests)
  • End-to-end on GCE VM with single-node cluster producing blocks

Companion PRs (ordered merge)

  1. This PR (gravity-aptos)
  2. gravity-reth: bump gravity-api-types pin to match this PR's merge SHA — prevents dual api-types in downstream graph (PipeExecLayerApi impl ConfigStorage trait mismatch). Will be opened after this merges.
  3. gravity-sdk #688: bumps gaptos + greth revs, adds cluster.toml identity = { source = "gcp_secret" } support and deploy.sh rendering.

🤖 Generated with Claude Code

keanji-x and others added 3 commits April 24, 2026 10:59
Adds a parallel GCP-backed variant to both `Identity::FromFile` (network
identity) and `InitialSafetyRulesConfig::FromFile` (consensus identity).
Secret payload is the same YAML bytes as the on-disk IdentityBlob, so
the only operational change is uploading the existing identity file to
Secret Manager instead of mounting it on the VM.

Config YAML:

  identity:
    type: from_gcp_secret
    resource: projects/<P>/secrets/<S>/versions/latest

  initial_safety_rules_config:
    from_gcp_secret:
      identity_blob_secret: projects/<P>/secrets/<S>/versions/latest
      waypoint: ...

Auth: GCE metadata server (zero-config on GCE VMs with a bound SA that
has roles/secretmanager.secretAccessor and --scopes=cloud-platform), or
GCP_ACCESS_TOKEN env var for non-GCE dev/CI. Intentionally avoids
pulling the google-cloud-auth async stack into aptos-config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(config): gate gcp-secret-manager behind feature + cache + richer errors

Three small hardening improvements on top of the initial GCP Secret Manager
identity loader, all backwards-compatible with existing file-based configs.

1. Cargo feature `gcp-secret-manager` (off by default)

   `reqwest` and `base64` are now `optional = true` in `aptos-config`.
   Binaries that only load identity from disk no longer transitively link
   the HTTP/TLS stack through `aptos-config`.

   `gcp_secret::fetch_secret` still exists with the feature off; it returns
   an error pointing the operator at the missing feature flag. Enum variants
   (`Identity::FromGcpSecret`, `InitialSafetyRulesConfig::FromGcpSecret`) are
   intentionally kept unconditional so downstream `match` sites compile the
   same with or without the feature.

   A forwarding feature is exposed on the top-level `gaptos` crate:
   `gaptos/gcp-secret-manager` → `aptos-config/gcp-secret-manager`.

2. Per-process cache for fetched payloads

   `NetworkConfig::identity_key` and `NetworkConfig::peer_id` each call
   `IdentityBlob::from_gcp_secret` independently, and a validator runs both
   a validator_network and one-or-more full_node_networks. Without the
   cache, startup would issue N × (metadata_token + secretmanager.access)
   HTTP calls for the same payload.

   The cache lives in `gcp_secret::CACHE` (`OnceLock<Mutex<HashMap<_,_>>>`)
   and is keyed by normalized resource so `projects/p/secrets/s` and
   `projects/p/secrets/s/versions/latest` collapse to one entry.

3. Resource name included in load-failure messages

   `network_config.rs` changes the bare `.unwrap()` to `unwrap_or_else`
   with a panic message that includes the GCP resource. `safety_rules_config.rs`
   adds `.with_context(|| format!("... {resource}"))` on the two
   `IdentityBlob::from_gcp_secret` call sites. Failures now identify which
   secret path is broken instead of surfacing a bare anyhow Error.
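Item 1's "still exists with the feature off" behavior can be sketched with two `cfg`-gated definitions of the same function. This is a minimal sketch, not the PR's code: the `String` error type, the message wording, and the placeholder body of the feature-on variant are all assumptions.

```rust
/// Feature on: the real HTTP fetch would live here. This sketch only
/// returns a placeholder error so that it stays self-contained.
#[cfg(feature = "gcp-secret-manager")]
pub fn fetch_secret(resource: &str) -> Result<Vec<u8>, String> {
    Err(format!("HTTP fetch of {resource} is omitted from this sketch"))
}

/// Feature off: the symbol still exists, so call sites compile unchanged,
/// but the operator gets an error naming the missing feature flag.
#[cfg(not(feature = "gcp-secret-manager"))]
pub fn fetch_secret(resource: &str) -> Result<Vec<u8>, String> {
    Err(format!(
        "cannot load {resource}: built without the `gcp-secret-manager` \
         feature; rebuild with `--features gcp-secret-manager`"
    ))
}
```

Keeping the function and the enum variants unconditional means only the function body changes with the feature, which is why `match` sites downstream need no `cfg` attributes of their own.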

Verified:
- `cargo check -p aptos-config` (no feature) → 0 new warnings
- `cargo check -p aptos-config --features gcp-secret-manager` → 0 new warnings
- `cargo check -p aptos-safety-rules` → clean
- `cargo test -p aptos-config --lib gcp_secret` → 3/3 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`InitialSafetyRulesConfig::overriding_identity_blob_paths_mut` is a
smoke-test-only helper that returns `&mut Vec<PathBuf>`. The initial
commit added the `FromGcpSecret` variant but missed this match site
(it's behind `#[cfg(feature = "smoke-test")]` so `cargo check` without
the feature does not surface it).

With smoke tests only ever exercising the file-based identity path, the
new arm is an `unreachable!()` with an explanatory message.

Backwards-compatible: the method signature is unchanged; existing
smoke-test callers continue to use the `FromFile` arm.
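The shape of that fix can be illustrated with a trimmed-down enum. The type below is a hypothetical stand-in: the variant field names are invented for the sketch, and the real method sits behind `#[cfg(feature = "smoke-test")]`, which is left out here so the example compiles on its own.

```rust
use std::path::PathBuf;

/// Hypothetical, reduced stand-in for `InitialSafetyRulesConfig`.
pub enum SafetyRulesIdentitySketch {
    FromFile { overriding_identity_paths: Vec<PathBuf> },
    FromGcpSecret { overriding_identity_secrets: Vec<String> },
}

impl SafetyRulesIdentitySketch {
    /// Smoke-test helper: only the file-based variant has paths to mutate.
    pub fn overriding_identity_blob_paths_mut(&mut self) -> &mut Vec<PathBuf> {
        match self {
            Self::FromFile { overriding_identity_paths } => overriding_identity_paths,
            // The GCP variant stores resource names, not paths, so there is
            // nothing sensible to return; smoke tests never reach this arm.
            Self::FromGcpSecret { .. } => unreachable!(
                "smoke tests only use file-based identities; \
                 FromGcpSecret has no identity blob paths"
            ),
        }
    }
}
```

Because the return type is `&mut Vec<PathBuf>`, the new arm cannot return an empty placeholder without changing the signature, which is presumably why a panic with an explanatory message was chosen.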

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@keanji-x keanji-x merged commit e9544c8 into aptos-node Apr 24, 2026
84 of 94 checks passed
keanji-x added a commit to Galxe/gravity-reth that referenced this pull request Apr 24, 2026
Follows Galxe/gravity-aptos#72 which introduced Identity::FromGcpSecret
and InitialSafetyRulesConfig::FromGcpSecret on the aptos-node branch.

The gravity-sdk consumers import ConfigStorage from gaptos (which will
bump to this same aptos-node tip in the companion sdk PR). Without this
bump, greth ends up with api-types @ 1d1153b6fd while sdk's gaptos
would be at the post-merge aptos-node tip — cargo sees two distinct
api-types packages keyed by source rev, and the impl of
`ConfigStorage for PipeExecLayerApi` here no longer resolves against
sdk's imported ConfigStorage trait (fails with E0599
'fetch_config_bytes not found').

api-types contents are unchanged between 1d1153b6fd and e9544c8cb3 —
the three intermediate commits only touch aptos-config/config/ (adds
the GCP Secret Manager identity variants behind an off-by-default
cargo feature). This bump is a pure version alignment, no behavior
change in greth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
keanji-x added a commit to Galxe/gravity-reth that referenced this pull request Apr 25, 2026
Lchangliang pushed a commit that referenced this pull request Apr 25, 2026
* feat(config): load identity blob from GCP Secret Manager

* feat(config): gate gcp-secret-manager behind feature + cache + richer errors

* fix(config): cover FromGcpSecret in overriding_identity_blob_paths_mut

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>