Security: Replace long-lived google-credentials.json service-account keys with keyless authentication (Workload Identity / ADC) #6800

@thainguyensunya

Summary

All Omi backend services currently authenticate to GCP (Firestore, Firebase Auth, GCS, Pub/Sub, etc.) using a long-lived service-account JSON key — either shipped as a file (google-credentials.json) or injected as an env var (SERVICE_ACCOUNT_JSON). This is flagged by Google as one of the highest-risk credential patterns and should be replaced with keyless authentication (Application Default Credentials backed by GKE Workload Identity, Workload Identity Federation for Modal/CI, and user ADC for local dev).


Why this is a problem

Long-lived service-account keys have well-known drawbacks:

  • Never expire — if leaked (git history, image layer, log line, backup, laptop compromise), an attacker has persistent access until the key is manually rotated.
  • Hard to rotate — rotation requires coordinated redeploys across every service that embeds the key.
  • Blast radius — the same key is reused across multiple services, making least-privilege IAM and audit attribution difficult.
  • Baked into Docker images — in at least one service the key is COPY'd into the image at build time (see below), meaning anyone with pull access to the registry has the key.
  • Developer machines hold production-adjacent keys — the setup docs instruct developers to place a JSON key on disk.
  • Maps to OWASP Top 10 (2021) A02 (Cryptographic Failures) and A07 (Identification and Authentication Failures): the key is a secret stored at rest, and a leaked key hands an attacker a valid identity.
  • Google's own guidance: [Best practices for managing service account keys](https://cloud.google.com/iam/docs/best-practices-for-managing-service-account-keys) explicitly recommends avoiding keys in favor of Workload Identity / WIF.

Current state in the codebase

1. SERVICE_ACCOUNT_JSON env var carrying the full key

main.py

if os.environ.get('SERVICE_ACCOUNT_JSON'):
    service_account_info = json.loads(os.environ["SERVICE_ACCOUNT_JSON"])
    credentials = firebase_admin.credentials.Certificate(service_account_info)
    firebase_admin.initialize_app(credentials)
else:
    firebase_admin.initialize_app()

Same pattern in:

  • main.py
  • job.py

2. GOOGLE_APPLICATION_CREDENTIALS pointing at a JSON key file on disk

main.py

cred_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if cred_path:
    firebase_admin.initialize_app(credentials.Certificate(cred_path))
else:
    firebase_admin.initialize_app()

3. Key file copied directly into a container image

Dockerfile

# CI writes google-credentials.json to backend/ before build (gcp_backend_pusher.yml)
COPY backend/google-credentials.json ./google-credentials.json

4. Helm chart injecting the key path from a K8s Secret

prod_omi_backend_listen_values.yaml

- name: GOOGLE_APPLICATION_CREDENTIALS
  valueFrom:
    secretKeyRef:
      name: ...
      key: GOOGLE_APPLICATION_CREDENTIALS

5. Developer setup instructs copying a key onto local disk

Backend_Setup.mdx

cp ~/.config/gcloud/application_default_credentials.json ./google-credentials.json

6. Tests force a JSON key even though they call ApplicationDefault()

conftest.py

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = backend_dir + "/" + os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
...
cred = credentials.ApplicationDefault()
firebase_admin.initialize_app(cred)

Also affected: 001_enhanced_protection_default.py and all sibling migration scripts (002–006), plus backend/scripts/stt/* which prepend paths to GOOGLE_APPLICATION_CREDENTIALS.


Proposed fix — keyless authentication end-to-end

Target state: every service calls firebase_admin.initialize_app() / credentials.ApplicationDefault() with no JSON key anywhere. The environment supplies the identity.
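
As a minimal sketch of that target state, the startup code in a GKE service could look roughly like this. The helper names leftover_key_settings and init_firebase are hypothetical, and the fail-fast guard is an assumption added for illustration, not something the current code does. It assumes firebase_admin is installed.

```python
import os

# Env vars that carry or point at a service-account key today (see "Current state").
KEY_ENV_VARS = ("SERVICE_ACCOUNT_JSON", "GOOGLE_APPLICATION_CREDENTIALS")


def leftover_key_settings(environ=None):
    """Return the names of key-based settings that are still set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name in KEY_ENV_VARS if env.get(name)]


def init_firebase():
    """Initialize firebase_admin with Application Default Credentials only."""
    import firebase_admin  # imported lazily so the helper above has no dependencies

    leftovers = leftover_key_settings()
    if leftovers:
        # Hypothetical guard: fail fast instead of silently using a key.
        raise RuntimeError(f"key-based settings still present: {leftovers}")
    # No credential argument: firebase_admin falls back to ADC.
    return firebase_admin.initialize_app()
```

For Modal jobs that use a WIF credential configuration file, GOOGLE_APPLICATION_CREDENTIALS is legitimately set, so a guard like this would need to be relaxed there.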

GKE workloads (backend, pusher, diarizer, agent-proxy)

  • Create one GCP service account per workload, least-privilege IAM (e.g. backend-listen-sa, pusher-sa, agent-proxy-sa).

  • Enable GKE Workload Identity on the cluster (if not already).

  • Bind each K8s ServiceAccount to its GCP SA via roles/iam.workloadIdentityUser and annotate:

    serviceAccount:
      annotations:
        iam.gke.io/gcp-service-account: backend-listen-sa@<project>.iam.gserviceaccount.com
  • Remove GOOGLE_APPLICATION_CREDENTIALS, SERVICE_ACCOUNT_JSON, and any JSON-key mounts from:

    • backend/charts/backend-listen/*values.yaml
    • backend/charts/pusher/*values.yaml
    • backend/charts/diarizer/*values.yaml
    • backend/charts/agent-proxy/*values.yaml
  • Remove COPY backend/google-credentials.json from Dockerfile (and any other Dockerfiles that bake it in).

  • Simplify main.py in each service to just firebase_admin.initialize_app(); the google-auth library discovers the metadata-server identity automatically.
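
The binding step above is easy to get wrong because the IAM member string has a fixed shape. The sketch below builds that member, the ServiceAccount annotation, and the matching gcloud command. The function names and the example project, namespace, and ServiceAccount values are hypothetical; the actual namespaces and names used by the Omi charts are not stated in this issue.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """IAM member that identifies a Kubernetes ServiceAccount under GKE Workload Identity."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def gsa_annotation(gsa_name, project_id):
    """Annotation to place on the Kubernetes ServiceAccount."""
    email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {"iam.gke.io/gcp-service-account": email}


def binding_command(project_id, namespace, ksa_name, gsa_name):
    """Build the gcloud command that grants roles/iam.workloadIdentityUser."""
    email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    member = workload_identity_member(project_id, namespace, ksa_name)
    return [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding", email,
        "--role=roles/iam.workloadIdentityUser",
        f"--member={member}",
    ]


# Hypothetical values, for illustration only.
print(" ".join(binding_command("my-project", "prod", "backend-listen", "backend-listen-sa")))
```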

Modal workloads (job.py, vad)

Modal doesn't run on GCP, so metadata-server ADC isn't available. Preferred:

  • Workload Identity Federation with OIDC: configure a WIF pool that trusts Modal's OIDC issuer, then exchange the Modal-issued token for a short-lived GCP access token at startup.
  • Fallback: store the key in Modal Secrets (never in the image), scope the SA to only what the job needs, and enable periodic rotation.
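
For the preferred WIF path, google-auth can read an external_account credential configuration file that contains no secret, only pointers to where the OIDC token lives and which service account to impersonate. The sketch below builds such a file. How Modal exposes its OIDC token is not covered in this issue, so the sketch assumes the token has already been written to a file; the function names and all argument values are hypothetical.

```python
import json


def wif_credential_config(project_number, pool_id, provider_id, sa_email, token_file):
    """Build an external_account credential configuration for google-auth."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "token_url": "https://sts.googleapis.com/v1/token",
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{sa_email}:generateAccessToken"
        ),
        # Assumption: the workload has written its OIDC token to this file.
        "credential_source": {"file": token_file, "format": {"type": "text"}},
    }


def write_config(config, path):
    """Write the configuration as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return path
```

Pointing GOOGLE_APPLICATION_CREDENTIALS at the written file lets firebase_admin.initialize_app() keep working unchanged, since ADC accepts this file type.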

CI / GitHub Actions

  • Replace any JSON-key usage with google-github-actions/auth@v2 using workload_identity_provider + service_account.
  • Remove any step that writes google-credentials.json into the repo/build context (e.g. the one referenced in gcp_backend_pusher.yml).

Local development

Update Backend_Setup.mdx to recommend:

gcloud auth application-default login
# optionally impersonate a lower-privilege SA:
gcloud auth application-default login \
  --impersonate-service-account=dev-readonly@<project>.iam.gserviceaccount.com

and drop the instruction to copy the key to ./google-credentials.json. Ensure google-credentials.json stays in .gitignore and .dockerignore.

Tests

In conftest.py, stop pointing GOOGLE_APPLICATION_CREDENTIALS at a key file under the backend directory; let ADC discover the developer's gcloud login:

cred = credentials.ApplicationDefault()
firebase_admin.initialize_app(cred)

Migrations and backend/scripts/stt/* should follow the same pattern.
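
To keep tests and scripts from drifting back to key files, a small guard could inspect whatever GOOGLE_APPLICATION_CREDENTIALS points at. Service-account key files carry "type": "service_account", while the file produced by gcloud auth application-default login carries "type": "authorized_user". The helper names below are hypothetical, and wiring them into conftest.py is left as an assumption.

```python
import json
import os


def credential_file_kind(path):
    """Return the "type" field of a Google credential file, or None if unreadable."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh).get("type")
    except (OSError, ValueError, AttributeError):
        # Missing file, invalid JSON, or JSON that is not an object.
        return None


def uses_service_account_key(environ=None):
    """True if GOOGLE_APPLICATION_CREDENTIALS points at a long-lived key file."""
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    return bool(path) and credential_file_kind(path) == "service_account"
```

A test session could call uses_service_account_key() once at startup and abort with a message pointing to gcloud auth application-default login when it returns True.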


Org-level hardening (once migration is complete)

  • Enable Org Policies:
    • constraints/iam.disableServiceAccountKeyCreation
    • constraints/iam.disableServiceAccountKeyUpload
    • constraints/iam.serviceAccountKeyExpiryHours (belt-and-suspenders if any keys remain)
  • Delete all existing JSON keys for the affected service accounts after cutover.
  • Add a CI check that fails on any google-credentials.json or SERVICE_ACCOUNT_JSON reference in charts/, Dockerfiles, or .env examples.
  • Add Security Command Center / Cloud Asset Inventory alert on any new SA key creation.
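
The CI check in the list above could be as small as the following sketch. The script layout, the function names, and the exact rule for which files are scanned (anything under a charts directory, files named Dockerfile*, and files named .env*) are assumptions chosen to match the bullet, not an existing script in the repo.

```python
import os
import re

FORBIDDEN = re.compile(r"google-credentials\.json|SERVICE_ACCOUNT_JSON")


def is_checked(path):
    """Limit the scan to charts, Dockerfiles, and .env examples."""
    name = os.path.basename(path)
    parts = path.replace(os.sep, "/").split("/")
    return "charts" in parts or name.startswith("Dockerfile") or name.startswith(".env")


def find_violations(root):
    """Return (path, line_number, line) for every forbidden reference under root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            if not is_checked(path):
                continue
            with open(path, encoding="utf-8", errors="ignore") as fh:
                for number, line in enumerate(fh, start=1):
                    if FORBIDDEN.search(line):
                        hits.append((path, number, line.strip()))
    return hits


def main(argv):
    """Print violations and return a process exit code (1 if any were found)."""
    violations = find_violations(argv[1] if len(argv) > 1 else ".")
    for path, number, line in violations:
        print(f"{path}:{number}: {line}")
    return 1 if violations else 0
```

A workflow step would run it via sys.exit(main(sys.argv)). Files such as .gitignore and .dockerignore are deliberately outside the scan, since they are expected to keep listing google-credentials.json.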

Migration plan (low-risk order)

  • Inventory every GCP SA currently in use and map each one to its service and required IAM roles.
  • Create per-service GCP SAs with least-privilege roles.
  • Enable GKE Workload Identity on the target cluster.
  • Migrate agent-proxy first (lowest traffic), verify in prod.
  • Migrate diarizer, then pusher, then backend.
  • Migrate Modal jobs (notifications-job, vad) via WIF-OIDC.
  • Migrate GitHub Actions workflows to WIF.
  • Update Backend_Setup.mdx and conftest.py.
  • Rotate & delete all remaining JSON keys.
  • Enable org policies blocking new key creation.
  • Add CI guardrail grepping for forbidden patterns.

Acceptance criteria

  • No service reads SERVICE_ACCOUNT_JSON or a JSON key file at startup.
  • No Helm chart, Dockerfile, or CI workflow contains google-credentials.json or injects GOOGLE_APPLICATION_CREDENTIALS pointing at a key.
  • All existing service-account keys listed in GCP IAM for these workloads are deleted.
  • Org policy iam.disableServiceAccountKeyCreation is enforced.
  • Local dev and integration tests work with only gcloud auth application-default login.
  • CI guardrail rejects PRs reintroducing key-file patterns.
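
The key-deletion criterion can be verified mechanically. One possible sketch: feed the output of gcloud iam service-accounts keys list --iam-account=<sa-email> --format=json into a filter that keeps only user-managed keys. Google-managed keys (keyType SYSTEM_MANAGED) always exist and are not the target here. The function name is hypothetical.

```python
import json


def user_managed_keys(keys_json):
    """Return resource names of user-managed keys from `keys list --format=json` output."""
    return [
        key["name"]
        for key in json.loads(keys_json)
        if key.get("keyType") == "USER_MANAGED"
    ]
```

The criterion is met for a service account when this returns an empty list.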
