infra PR3: add Terraform env stacks, CI validation, and runbook docs #18
    module "secrets" {
      source = "../../modules/secrets"

      project_id                   = var.project_id
      secret_ids                   = var.secret_ids
      runner_service_account_email = module.runner.service_account_email
    }
Missing secrets module breaks init/validate
Both the dev and prod environment stacks reference ../../modules/secrets, but this module does not exist anywhere in the repository — not in the base branch (codex/infra-pr2-terraform-gcp-modules), not at HEAD, and not introduced in this PR. The base branch only contains network, runner_vm, and observability modules.
This will cause terraform init to fail with a "module not found" error, which means the CI workflow's "Terraform validate environments" step will also fail. The PR's own validation steps (terraform -chdir=infra/terraform/environments/dev init -backend=false && terraform -chdir=infra/terraform/environments/dev validate) cannot pass without it.
Either add the secrets module in this PR or remove the module "secrets" references from the environment stacks until the module is introduced.
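For the first option, a minimal sketch of what `infra/terraform/modules/secrets/main.tf` could contain is shown below. Only the three inputs and the `secret_ids` output name come from the environment stacks in this PR. The variable types, resource names, and the automatic replication policy are assumptions, not the author's intended module.

```hcl
# Hypothetical infra/terraform/modules/secrets/main.tf (sketch, not part of this PR).

variable "project_id" {
  description = "GCP project that owns the secrets"
  type        = string
}

variable "secret_ids" {
  description = "Secret Manager secret IDs to create (list type is an assumption)"
  type        = list(string)
}

variable "runner_service_account_email" {
  description = "Runner service account that is allowed to read the secrets"
  type        = string
}

# One secret container per ID; secret versions are added out of band.
resource "google_secret_manager_secret" "this" {
  for_each  = toset(var.secret_ids)
  project   = var.project_id
  secret_id = each.value

  replication {
    auto {} # google provider 5.x syntax; the replication policy is an assumption
  }
}

# Grant the runner service account read access to each secret.
resource "google_secret_manager_secret_iam_member" "runner_accessor" {
  for_each  = google_secret_manager_secret.this
  project   = var.project_id
  secret_id = each.value.secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.runner_service_account_email}"
}

output "secret_ids" {
  description = "IDs of the provisioned secrets"
  value       = [for s in google_secret_manager_secret.this : s.secret_id]
}
```

With a module of this shape in place, the `module "secrets"` blocks and the `module.secrets.secret_ids` outputs in both stacks would resolve during `terraform init` and `terraform validate`.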
Path: infra/terraform/environments/dev/main.tf
Line: 44-50

    terraform {
      required_version = ">= 1.5.0, < 2.0.0"

      required_providers {
        google = {
          source  = "hashicorp/google"
          version = "~> 5.0"
        }
      }
    }
Dev stack missing remote backend block
The prod environment declares backend "gcs" {} for remote state, but the dev environment has no backend block and will default to local state. This means the dev state file won't be shared or persisted across team members or CI runs.
If this is intentional (e.g., dev is always local-only), it may be worth adding a comment to document the choice. Otherwise, consider adding the same backend "gcs" {} block so that dev state can also be managed remotely with -backend-config flags, matching the pattern described in the runbook.
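If remote state is wanted for dev, the change could look like the sketch below. It assumes dev mirrors the empty `backend "gcs" {}` block that prod declares and receives its bucket and prefix through `-backend-config` at init time.

```hcl
# infra/terraform/environments/dev/main.tf (sketch)
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  # Partial backend configuration: bucket and prefix are supplied at init
  # time via -backend-config, so no environment-specific values live here.
  backend "gcs" {}

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
```

Because the validation commands in this PR run `init -backend=false`, adding the block should not require bucket access in CI.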
Path: infra/terraform/environments/dev/main.tf
Line: 1-10

    - name: Terraform validate modules
      run: |
        for module in network runner_vm secrets observability; do
          terraform -chdir=infra/terraform/modules/$module init -backend=false
          terraform -chdir=infra/terraform/modules/$module validate
        done
CI validates a secrets module that doesn't exist
The secrets module is listed in the loop (for module in network runner_vm secrets observability), but infra/terraform/modules/secrets/ does not exist in the repository. This will cause the "Terraform validate modules" step to fail with a directory-not-found error.
Remove secrets from this loop until the module is added, or add the module in this PR.
```suggestion
  run: |
    for module in network runner_vm observability; do
      terraform -chdir=infra/terraform/modules/$module init -backend=false
      terraform -chdir=infra/terraform/modules/$module validate
    done
```
Path: .github/workflows/terraform.yml
Line: 36-41

    - name: Terraform validate modules
      run: |
        for module in network runner_vm secrets observability; do
secrets module doesn't exist in infra/terraform/modules/ - this validation step will fail
```suggestion
    for module in network runner_vm observability; do
```
Path: .github/workflows/terraform.yml
Line: 38

    module "secrets" {
      source = "../../modules/secrets"

      project_id                   = var.project_id
      secret_ids                   = var.secret_ids
      runner_service_account_email = module.runner.service_account_email
../../modules/secrets doesn't exist - terraform init will fail. The base branch is missing this module despite the commit message mentioning it.
Path: infra/terraform/environments/dev/main.tf
Line: 44-49

    module "secrets" {
      source = "../../modules/secrets"

      project_id                   = var.project_id
      secret_ids                   = var.secret_ids
      runner_service_account_email = module.runner.service_account_email
    }
../../modules/secrets doesn't exist - terraform init will fail. Need to either add the secrets module or remove this block.
Path: infra/terraform/environments/prod/main.tf
Line: 49-55

      -backend-config=\"bucket=neural-tf-state-prod\" \
      -backend-config=\"prefix=neural/prod\"
Escaped quotes in bash command - these should be unescaped for copy-paste usability
```suggestion
  -backend-config="bucket=neural-tf-state-prod" \
  -backend-config="prefix=neural/prod"
```
Path: docs/workflows/terraform-runbook.mdx
Line: 33-34

    echo -n \"<KALSHI_API_KEY_ID>\" | gcloud secrets versions add kalshi-api-key-id --data-file=-
    echo -n \"<KALSHI_PRIVATE_KEY_PEM>\" | gcloud secrets versions add kalshi-private-key-pem --data-file=-
Escaped quotes in bash commands - remove backslashes for proper copy-paste
```suggestion
echo -n "<KALSHI_API_KEY_ID>" | gcloud secrets versions add kalshi-api-key-id --data-file=-
echo -n "<KALSHI_PRIVATE_KEY_PEM>" | gcloud secrets versions add kalshi-private-key-pem --data-file=-
```
Path: docs/workflows/terraform-runbook.mdx
Line: 62-63

    output "secret_ids" {
      value       = module.secrets.secret_ids
      description = "Provisioned Secret Manager IDs"
    }
references module.secrets.secret_ids but secrets module doesn't exist
Path: infra/terraform/environments/dev/outputs.tf
Line: 16-19

    output "secret_ids" {
      value       = module.secrets.secret_ids
      description = "Provisioned Secret Manager IDs"
    }
references module.secrets.secret_ids but secrets module doesn't exist
Path: infra/terraform/environments/prod/outputs.tf
Line: 16-19

    project_id = "my-gcp-project"
    region     = "us-central1"
    zone       = "us-central1-f"

    notification_channels = ["projects/my-gcp-project/notificationChannels/1234567890"]
prod example missing allow_ssh_cidrs and enable_alert_policy that are shown in dev example. Consider adding these for consistency:
```suggestion
project_id            = "my-gcp-project"
region                = "us-central1"
zone                  = "us-central1-f"
allow_ssh_cidrs       = ["35.235.240.0/20"] # IAP TCP tunnel range
enable_alert_policy   = true
notification_channels = ["projects/my-gcp-project/notificationChannels/1234567890"]
```
Path: infra/terraform/environments/prod/terraform.tfvars.example
Line: 1-5

    variable "allow_ssh_cidrs" {
      description = "CIDR blocks allowed for SSH"
      type        = list(string)
      default     = []
    }
dev defaults to no SSH access [] while prod defaults to IAP range ["35.235.240.0/20"]. Typically dev needs easier access for debugging. Consider aligning defaults or documenting the rationale.
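One way to align the defaults is sketched below, under the assumption that dev should accept the same IAP range that prod uses. The description text is illustrative rather than taken from the PR.

```hcl
# infra/terraform/environments/dev/variables.tf (sketch)
variable "allow_ssh_cidrs" {
  # 35.235.240.0/20 is Google's IAP TCP forwarding range, so SSH stays
  # reachable only through an IAP tunnel rather than from arbitrary addresses.
  description = "CIDR blocks allowed for SSH (defaults to the IAP TCP forwarding range)"
  type        = list(string)
  default     = ["35.235.240.0/20"]
}
```

If the empty default is deliberate, keeping `default = []` and stating the reason in the `description` would document the rationale instead.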
Path: infra/terraform/environments/dev/variables.tf
Line: 30-34

    output "secret_ids" {
      value       = module.secrets.secret_ids
      description = "Provisioned Secret Manager IDs"
    }
dev outputs missing log_metric_type that prod has - consider adding for consistency:
```suggestion
  description = "Provisioned Secret Manager IDs"
}
output "log_metric_type" {
  value       = module.observability.log_metric_type
  description = "Log metric used for error alerting"
}
```
Path: infra/terraform/environments/dev/outputs.tf
Line: 19

    @@ -1,3 +1,5 @@
    """Helpers for resolving Polymarket US auth credentials from env/files."""
polymarket docstrings and formatting changes are unrelated to the infrastructure PR scope - consider separating unrelated changes into their own PRs
Path: neural/auth/polymarket_us_env.py
Line: 1
…-pr3-terraform-envs-ci-docs
…-pr3-terraform-envs-ci-docs

    module "runner" {
      source = "../../modules/runner_vm"

      project_id           = var.project_id
      zone                 = var.zone
      instance_name        = "${var.stack_name}-runner"
      machine_type         = var.machine_type
      network_self_link    = module.network.network_self_link
      subnetwork_self_link = module.network.subnetwork_self_link

      startup_script = var.startup_script
      tags           = ["neural-runner", "env-dev"]
    }
Service account name collision across environments
Both dev and prod runner modules rely on the runner_vm module's default service_account_id = "neural-runner" (see infra/terraform/modules/runner_vm/variables.tf:41), since neither environment passes an explicit override. If both environments are deployed to the same GCP project, terraform apply on the second environment will fail with a "service account already exists" error because GCP service account IDs are project-unique.
Consider passing a per-environment service_account_id to disambiguate:
```suggestion
module "runner" {
  source = "../../modules/runner_vm"
  project_id           = var.project_id
  zone                 = var.zone
  instance_name        = "${var.stack_name}-runner"
  machine_type         = var.machine_type
  network_self_link    = module.network.network_self_link
  subnetwork_self_link = module.network.subnetwork_self_link
  service_account_id   = "${var.stack_name}-runner"
  startup_script = var.startup_script
  tags           = ["neural-runner", "env-dev"]
}
```
The same change is needed in infra/terraform/environments/prod/main.tf:35-47.
Path: infra/terraform/environments/dev/main.tf
Line: 35-47
…-pr3-terraform-envs-ci-docs
…-pr3-terraform-envs-ci-docs # Conflicts: # docs/basics/infrastructure.mdx # docs/mint.json
…-pr3-terraform-envs-ci-docs
Additional Comments (1)
Without these, developers risk committing:
|
Additional Comments (1)
The While the This directly contradicts the runbook's section 4 ("resolve secrets at startup using the runner service account"). Add The same change is needed in |
📚 Documentation Status
✅ Documentation changes detected
This comment is automatically generated by the documentation workflow.
Summary
- infra/terraform/environments/dev
- infra/terraform/environments/prod
- (workflows/terraform-runbook)
Notes
- codex/infra-pr2-terraform-gcp-modules.
Validation
- terraform -chdir=infra/terraform init -backend=false
- terraform -chdir=infra/terraform fmt -check -recursive
- terraform -chdir=infra/terraform validate
- terraform -chdir=infra/terraform/environments/dev init -backend=false && terraform -chdir=infra/terraform/environments/dev validate
- terraform -chdir=infra/terraform/environments/prod init -backend=false && terraform -chdir=infra/terraform/environments/prod validate
Greptile Summary
This PR establishes the foundational Infrastructure as Code (IaC) layer for Neural by adding complete Terraform environment stacks for dev and prod, CI validation, and operational documentation.
Major additions:
- infra/terraform/environments/{dev,prod} that compose network, runner VM, secrets, and observability modules
- (workflows/terraform-runbook) covering bootstrap, state management, secret injection, and troubleshooting
Environment configuration:
- Dev: e2-standard-2 machine, 10.30.0.0/24 subnet, alerting disabled by default
- Prod: e2-standard-4 machine, 10.40.0.0/24 subnet, alerting enabled, additional neural-runtime-env secret
- (35.235.240.0/20)
Note: The service account collision issue mentioned in previous comments remains: both environments use the default neural-runner service account ID. If deploying to the same GCP project, override service_account_id in one environment to avoid conflicts.
Confidence Score: 4/5
Important Files Changed
Flowchart
    %%{init: {'theme': 'neutral'}}%%
    flowchart TD
        A[Environment Stack<br/>dev/prod] --> B[Network Module]
        A --> C[Runner VM Module]
        A --> D[Secrets Module]
        A --> E[Observability Module]
        B --> F[VPC & Subnet<br/>10.30.0.0/24 dev<br/>10.40.0.0/24 prod]
        B --> G[Firewall Rules<br/>IAP SSH Access]
        C --> H[Compute Engine VM<br/>e2-standard-2 dev<br/>e2-standard-4 prod]
        C --> I[Service Account<br/>neural-runner]
        C --> J[Startup Script<br/>Docker Bootstrap]
        D --> K[Secret Manager<br/>kalshi-api-key-id<br/>kalshi-private-key-pem]
        D --> L[IAM Binding<br/>secretAccessor role]
        E --> M[Log Metrics<br/>Error Alerting]
        E --> N[Alert Policy<br/>enabled in prod]
        I -.grants access.-> L
        style A fill:#e1f5ff
        style I fill:#fff3cd
        style K fill:#d4edda
Last reviewed commit: 6e56b85