Merged
48 changes: 48 additions & 0 deletions .github/workflows/terraform.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
name: Terraform

on:
  pull_request:
    branches: [ main ]
    paths:
      - "infra/terraform/**"
      - ".github/workflows/terraform.yml"
  push:
    branches: [ main ]
    paths:
      - "infra/terraform/**"
      - ".github/workflows/terraform.yml"

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.5.7

      - name: Terraform fmt check
        run: terraform -chdir=infra/terraform fmt -check -recursive

      - name: Terraform init (root)
        run: terraform -chdir=infra/terraform init -backend=false

      - name: Terraform validate (root)
        run: terraform -chdir=infra/terraform validate

      - name: Terraform validate modules
        run: |
          for module in network runner_vm secrets observability; do

`secrets` module doesn't exist in `infra/terraform/modules/` - this validation step will fail

Suggested change
-          for module in network runner_vm secrets observability; do
+          for module in network runner_vm observability; do

            terraform -chdir=infra/terraform/modules/$module init -backend=false
            terraform -chdir=infra/terraform/modules/$module validate
          done
Comment on lines +36 to +41

CI validates a `secrets` module that doesn't exist

The `secrets` module is listed in the loop (`for module in network runner_vm secrets observability`), but `infra/terraform/modules/secrets/` does not exist in the repository. This will cause the "Terraform validate modules" step to fail with a directory-not-found error.

Remove `secrets` from this loop until the module is added, or add the module in this PR.

Suggested change
-      - name: Terraform validate modules
-        run: |
-          for module in network runner_vm secrets observability; do
-            terraform -chdir=infra/terraform/modules/$module init -backend=false
-            terraform -chdir=infra/terraform/modules/$module validate
-          done
+        run: |
+          for module in network runner_vm observability; do
+            terraform -chdir=infra/terraform/modules/$module init -backend=false
+            terraform -chdir=infra/terraform/modules/$module validate
+          done

      - name: Terraform validate environments
        run: |
          for env in dev prod; do
            terraform -chdir=infra/terraform/environments/$env init -backend=false
            terraform -chdir=infra/terraform/environments/$env validate
          done
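
As a hedged sketch of how the module loop in this workflow could guard against a directory that is not present (the `secrets` case flagged in review), the helper below filters a list of module names down to those that exist under a given root. The function name `existing_modules` is hypothetical and not part of this PR. Skipping a missing module can hide a real omission, so failing fast may still be the better choice for CI.

```shell
# Hypothetical helper (not part of this PR): print the module names whose
# directory exists under the given root, one per line. Names that do not
# resolve to a directory are reported on stderr and skipped.
existing_modules() {
  local root="$1"
  shift
  local module
  for module in "$@"; do
    if [ -d "$root/$module" ]; then
      printf '%s\n' "$module"
    else
      printf 'skipping %s: %s/%s not found\n' "$module" "$root" "$module" >&2
    fi
  done
}

# Possible use inside the workflow step (assumes terraform is on PATH):
#   for module in $(existing_modules infra/terraform/modules network runner_vm secrets observability); do
#     terraform -chdir=infra/terraform/modules/$module init -backend=false
#     terraform -chdir=infra/terraform/modules/$module validate
#   done
```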
5 changes: 3 additions & 2 deletions docs/architecture/start-here.mdx
@@ -48,7 +48,8 @@ Data flows top to bottom: you ingest markets (and optional context), transform t
1. **Bootstrap the environment** – follow `getting-started` to install, load credentials, and smoke test the APIs.
2. **Run the quickstart bot** – `workflows/build-first-bot` glues market fetch → signal → paper execution so you see the system end-to-end.
3. **Deep dive where needed** – branch into the stack-specific docs from the table above.
- 4. **Promote to production** – use `workflows/promotion-checklist` once your strategy, risk settings, and monitoring are dialed in.
+ 4. **Plan infrastructure topology** – use `basics/infrastructure` and `workflows/terraform-runbook` for OSS infra baseline and environment operations.
+ 5. **Promote to production** – use `workflows/promotion-checklist` once your strategy, risk settings, and monitoring are dialed in.

## Tips for exploring

@@ -63,4 +64,4 @@ Bookmark this page and `getting-started`; together they give you both the 10,000
- Get hands-on immediately: `getting-started`
- Review infrastructure dependencies: `basics/infrastructure`
- Jump to the quickstart bot: `workflows/build-first-bot`

- Manage IaC operations: `workflows/terraform-runbook`
12 changes: 11 additions & 1 deletion docs/basics/infrastructure.mdx
@@ -14,7 +14,16 @@ Summarize the external services Neural touches (REST, WebSocket, FIX), their lat
| FIX API | `fix.elections.kalshi.com:8228` | Ultra-low-latency order entry and execution reports | ✅ operational |
| WebSocket | `/trade-api/ws/v2` | Real-time market data stream | ⚠️ requires Kalshi approval |

- Latency reference: REST polling at 1s intervals, FIX round-trips ~5–10 ms, WebSocket delivers pushes \<100 ms once enabled.
+ Latency reference: REST polling at 1s intervals, FIX round-trips ~5–10 ms, WebSocket delivers pushes <100 ms once enabled.
+
+ ## Deployment split model
+
+ Neural infrastructure can be split cleanly by responsibility:
+
+ - **Open-source baseline**: Terraform modules for network, runner VM, secrets, and observability.
+ - **Private runtime**: Environment-specific deployment providers (for example, Daytona-based runtimes) that plug into the shared deployment interface.
+
+ This keeps infrastructure reproducible in OSS while preserving proprietary runtime orchestration logic in private repositories.

## Deployment runtime model

@@ -64,3 +73,4 @@ REST polling (baseline) ─┬─> Strategy / Aggregator ──> TradingClient
- Review execution options: `trading/overview`
- Plan deployment workflows: `workflows/promotion-checklist`
- Build custom runtime integrations: `workflows/deployment-providers`
- Operate Terraform environments: `workflows/terraform-runbook`
3 changes: 2 additions & 1 deletion docs/mint.json
@@ -107,7 +107,8 @@
"workflows/build-first-bot",
"workflows/promotion-checklist",
"workflows/data-pipeline",
- "workflows/deployment-providers"
+ "workflows/deployment-providers",
+ "workflows/terraform-runbook"
]
},
{
2 changes: 1 addition & 1 deletion docs/workflows/promotion-checklist.mdx
@@ -53,4 +53,4 @@ Consistently iterating through this loop keeps strategies resilient as Kalshi mi
- Wire monitoring pipelines: `workflows/data-pipeline`
- Review execution surfaces: `trading/trading-client`
- Keep iterating on strategy design: `analysis/strategy-foundations`

- Use Terraform operations runbook: `workflows/terraform-runbook`
91 changes: 91 additions & 0 deletions docs/workflows/terraform-runbook.mdx
@@ -0,0 +1,91 @@
---
title: 'Terraform Deployment Runbook'
description: 'Bootstrap, plan, apply, and troubleshoot the GCP Terraform environments for Neural.'
---

Use this runbook when operating the reference Terraform stack under `infra/terraform`.

## 1. Prerequisites

- Terraform `1.5.x`
- GCP project with billing enabled
- `gcloud` authenticated to the target project
- IAM permissions for VPC, Compute Engine, Secret Manager, and Monitoring resources

## 2. Bootstrap remote state

Create a dedicated state bucket (one-time):

```bash
gcloud storage buckets create gs://neural-tf-state-prod \
  --project <PROJECT_ID> \
  --location us-central1 \
  --uniform-bucket-level-access

gcloud storage buckets update gs://neural-tf-state-prod --versioning
```

Initialize the production environment with backend config:

```bash
cd infra/terraform/environments/prod
terraform init \
  -backend-config="bucket=neural-tf-state-prod" \
  -backend-config="prefix=neural/prod"
```

Repeat with a different bucket/prefix for `dev`.

## 3. Plan and apply

Create `terraform.tfvars` from `terraform.tfvars.example`, then run:

```bash
terraform fmt -check -recursive
terraform validate
terraform plan -out=tfplan
terraform apply tfplan
```

Recommended safety controls:

1. Keep `plan` output in PR artifacts before apply.
2. Require manual approval for `prod` applies.
3. Use separate service accounts for `dev` and `prod`.

## 4. Secret injection model

The reference stack provisions Secret Manager containers and grants runner access.
Add secret **versions** outside Terraform to avoid storing secret values in state:

```bash
echo -n "<KALSHI_API_KEY_ID>" | gcloud secrets versions add kalshi-api-key-id --data-file=-
echo -n "<KALSHI_PRIVATE_KEY_PEM>" | gcloud secrets versions add kalshi-private-key-pem --data-file=-
```

In runtime bootstrap scripts, resolve secrets at startup using the runner service account.

## 5. Destroy workflow

```bash
terraform plan -destroy -out=tfdestroy
terraform apply tfdestroy
```

Before destroy:

1. Drain/disable bot workloads.
2. Export required logs and runtime artifacts.
3. Confirm no shared resources are referenced by other environments.

## 6. Troubleshooting

- **`terraform init` backend errors**: verify bucket name, region, and IAM access.
- **Provider auth failures**: run `gcloud auth application-default login` or configure workload identity.
- **Secret access denied**: verify `roles/secretmanager.secretAccessor` on the runner service account.
- **No alert notifications**: ensure `notification_channels` are valid Monitoring channel resource IDs.

## Next

- Module and contract reference: `basics/infrastructure`
- Production promotion checklist: `workflows/promotion-checklist`
1 change: 1 addition & 0 deletions infra/terraform/README.md
@@ -22,6 +22,7 @@ Inputs:
- `subnet_cidr` (string)
- `enable_private_google_access` (bool, default `true`)
- `allow_ssh_cidrs` (list(string), default `[]`)
- `ssh_target_tags` (list(string), default `["neural-runner"]`)
- `internal_tcp_ports` (list(string), default `[]`)
- `internal_udp_ports` (list(string), default `[]`)

22 changes: 22 additions & 0 deletions infra/terraform/environments/dev/.terraform.lock.hcl

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

64 changes: 64 additions & 0 deletions infra/terraform/environments/dev/main.tf
@@ -0,0 +1,64 @@
terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }

  # Configure this with backend config flags or a backend.hcl file during init.
  # Example:
  # terraform init -backend-config="bucket=neural-tf-state-dev" -backend-config="prefix=neural/dev"
  backend "gcs" {}
}
Comment on lines +1 to +15

Dev stack missing remote backend block

The prod environment declares `backend "gcs" {}` for remote state, but the dev environment has no backend block and will default to local state. This means the dev state file won't be shared or persisted across team members or CI runs.

If this is intentional (e.g., dev is always local-only), it may be worth adding a comment to document the choice. Otherwise, consider adding the same `backend "gcs" {}` block so that dev state can also be managed remotely with `-backend-config` flags, matching the pattern described in the runbook.


provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

module "network" {
  source = "../../modules/network"

  project_id   = var.project_id
  region       = var.region
  network_name = "${var.stack_name}-vpc"
  subnet_name  = "${var.stack_name}-subnet"
  subnet_cidr  = var.subnet_cidr

  allow_ssh_cidrs = var.allow_ssh_cidrs
}

module "runner" {
  source = "../../modules/runner_vm"

  project_id           = var.project_id
  zone                 = var.zone
  instance_name        = "${var.stack_name}-runner"
  machine_type         = var.machine_type
  network_self_link    = module.network.network_self_link
  subnetwork_self_link = module.network.subnetwork_self_link

  startup_script = var.startup_script
  tags           = ["neural-runner", "env-dev"]
}
Comment on lines +35 to +47

Service account name collision across environments

Both dev and prod runner modules rely on the `runner_vm` module's default `service_account_id = "neural-runner"` (see `infra/terraform/modules/runner_vm/variables.tf:41`), since neither environment passes an explicit override. If both environments are deployed to the same GCP project, `terraform apply` on the second environment will fail with a "service account already exists" error because GCP service account IDs are project-unique.

Consider passing a per-environment `service_account_id` to disambiguate:

Suggested change
- module "runner" {
-   source = "../../modules/runner_vm"
-
-   project_id           = var.project_id
-   zone                 = var.zone
-   instance_name        = "${var.stack_name}-runner"
-   machine_type         = var.machine_type
-   network_self_link    = module.network.network_self_link
-   subnetwork_self_link = module.network.subnetwork_self_link
-
-   startup_script = var.startup_script
-   tags           = ["neural-runner", "env-dev"]
- }
+ module "runner" {
+   source = "../../modules/runner_vm"
+
+   project_id           = var.project_id
+   zone                 = var.zone
+   instance_name        = "${var.stack_name}-runner"
+   machine_type         = var.machine_type
+   network_self_link    = module.network.network_self_link
+   subnetwork_self_link = module.network.subnetwork_self_link
+   service_account_id   = "${var.stack_name}-runner"
+
+   startup_script = var.startup_script
+   tags           = ["neural-runner", "env-dev"]
+ }

The same change is needed in infra/terraform/environments/prod/main.tf:35-47.
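
A per-environment ID of the form `<stack_name>-runner` still has to satisfy GCP's service account ID format (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter). The following is a minimal sketch of a pre-apply check, assuming the ID is derived exactly as in the suggestion above. The function name `runner_sa_id` and the stack names in the usage comment are hypothetical and not taken from this PR.

```shell
# Hypothetical helper (not part of this PR): derive "<stack_name>-runner" and
# check it against the service account ID format before running terraform apply.
runner_sa_id() {
  local id="$1-runner"
  # First char is a lowercase letter, last char is a letter or digit,
  # total length between 6 and 30 characters.
  if printf '%s' "$id" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'; then
    printf '%s\n' "$id"
  else
    printf 'invalid service account ID: %s\n' "$id" >&2
    return 1
  fi
}

# Example (stack names are illustrative):
#   runner_sa_id neural-dev     prints neural-dev-runner
#   runner_sa_id neural-prod    prints neural-prod-runner
```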


module "secrets" {
  source = "../../modules/secrets"

  project_id                   = var.project_id
  secret_ids                   = var.secret_ids
  runner_service_account_email = module.runner.service_account_email
Comment on lines +49 to +54

`../../modules/secrets` doesn't exist - `terraform init` will fail. The base branch is missing this module despite the commit message mentioning it.

}
Comment on lines +49 to +55

Missing `secrets` module breaks init/validate

Both the dev and prod environment stacks reference `../../modules/secrets`, but this module does not exist anywhere in the repository: not in the base branch (`codex/infra-pr2-terraform-gcp-modules`), not at HEAD, and not introduced in this PR. The base branch only contains `network`, `runner_vm`, and `observability` modules.

This will cause `terraform init` to fail with a "module not found" error, which means the CI workflow's "Terraform validate environments" step will also fail. The PR's own validation steps (`terraform -chdir=infra/terraform/environments/dev init -backend=false && terraform -chdir=infra/terraform/environments/dev validate`) cannot pass without it.

Either add the `secrets` module in this PR or remove the `module "secrets"` references from the environment stacks until the module is introduced.


module "observability" {
  source = "../../modules/observability"

  project_id            = var.project_id
  instance_name         = module.runner.instance_name
  enable_alert_policy   = var.enable_alert_policy
  notification_channels = var.notification_channels
}
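
One way to catch an unresolved local module reference, such as the `../../modules/secrets` path flagged in the comments on this file, before `terraform init` runs is sketched below. It assumes module sources are relative paths written as `source = "../..."` in the top-level `.tf` files of an environment directory. The function name `missing_module_sources` is hypothetical and not part of this PR.

```shell
# Hypothetical helper (not part of this PR): list relative module sources
# declared in DIR/*.tf that do not resolve to an existing directory.
missing_module_sources() {
  local dir="$1"
  local src
  # Match lines such as: source = "../../modules/network"
  # Registry sources (for example "hashicorp/google") do not start with "." and are ignored.
  grep -hoE 'source[[:space:]]*=[[:space:]]*"\.\.?/[^"]+"' "$dir"/*.tf 2>/dev/null |
    sed -E 's/.*"([^"]+)"/\1/' |
    sort -u |
    while IFS= read -r src; do
      [ -d "$dir/$src" ] || printf '%s\n' "$src"
    done
}

# Example: prints one line per unresolved source, nothing if all resolve.
#   missing_module_sources infra/terraform/environments/dev
```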
24 changes: 24 additions & 0 deletions infra/terraform/environments/dev/outputs.tf
@@ -0,0 +1,24 @@
output "runner_instance_name" {
  value       = module.runner.instance_name
  description = "Compute instance name"
}

output "runner_external_ip" {
  value       = module.runner.instance_external_ip
  description = "External IP of runner instance"
}

output "runner_service_account_email" {
  value       = module.runner.service_account_email
  description = "Runner service account email"
}

output "secret_ids" {
  value       = module.secrets.secret_ids
  description = "Provisioned Secret Manager IDs"
}
Comment on lines +16 to +19

references `module.secrets.secret_ids` but `secrets` module doesn't exist


dev outputs missing `log_metric_type` that prod has - consider adding for consistency:

Suggested change
- }
+   description = "Provisioned Secret Manager IDs"
+ }
+
+ output "log_metric_type" {
+   value       = module.observability.log_metric_type
+   description = "Log metric used for error alerting"
+ }


output "log_metric_type" {
  value       = module.observability.log_metric_type
  description = "Log metric used for error alerting"
}
9 changes: 9 additions & 0 deletions infra/terraform/environments/dev/terraform.tfvars.example
@@ -0,0 +1,9 @@
project_id = "my-gcp-project"
region     = "us-central1"
zone       = "us-central1-a"

allow_ssh_cidrs = ["35.235.240.0/20"] # IAP TCP tunnel range

# Optional: enable alerting and wire notification channels
# enable_alert_policy = true
# notification_channels = ["projects/my-gcp-project/notificationChannels/1234567890"]