Simple GCP Infrastructure as Code for machine learning training workflows.
- VM (e2-medium) - AI/ML training compute instance
- GCS bucket - dataset storage and results
- Labels - resource organization and tracking
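The VM, bucket, and labels described above might look roughly like this in `main.tf` (resource names, zone, image, and label keys here are illustrative assumptions, not the repo's actual values):

```hcl
# Illustrative sketch only -- names, zone, and label keys are assumptions.
resource "google_compute_instance" "training_vm" {
  name         = "ml-training-vm"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  labels = {
    owner   = var.owner
    purpose = "ml-training"
  }
}

resource "google_storage_bucket" "datasets" {
  name     = "${var.project_id}-ml-datasets" # bucket names must be globally unique
  location = "US"

  labels = {
    owner   = var.owner
    purpose = "ml-training"
  }
}
```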
Prerequisites:
- Terraform ~> 1.7
- Google Cloud CLI
- GCP project with billing
Setup:

```shell
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
```

Run Workflow:

```powershell
.\run-workflow.ps1 -DurationHours 2 -ProjectId your-gcp-project
```

Notes:
- The launcher auto-sanitizes the `owner` label from your gcloud account (emails like `user@example.com` become `user-example-com`).
- If running Terraform manually, ensure `owner` in `terraform.tfvars` matches GCP label rules: lowercase letters, digits, `-` or `_`, max 63 chars.
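If you want to replicate that sanitization inside Terraform itself rather than relying on the launcher, a rough sketch using the built-in `lower`, `replace`, and `substr` functions could look like this (the variable name `owner_raw` is hypothetical):

```hcl
variable "owner_raw" {
  type    = string
  default = "user@example.com" # hypothetical input
}

locals {
  # Lowercase, replace any character GCP labels disallow with "-", cap at 63 chars.
  owner_label = substr(
    replace(lower(var.owner_raw), "/[^a-z0-9_-]/", "-"),
    0, 63
  )
  # "user@example.com" -> "user-example-com"
}
```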
State storage:
- Terraform state is stored remotely in GCS at `gs://firefly-tfstate-backend/vm/default.tfstate`.
- The launcher ensures the bucket exists and migrates any local state automatically.
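That state path corresponds to a standard `gcs` backend block; if you are wiring it up by hand instead of using the launcher, it would look like this (derived from the path above):

```hcl
terraform {
  backend "gcs" {
    bucket = "firefly-tfstate-backend"
    prefix = "vm" # state file lands at vm/default.tfstate in the bucket
  }
}
```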
- Deploy - VM + bucket created with proper labels
- Train - VM runs ML training workflow (~15 minutes)
- Monitor - Background monitoring for the remainder of the requested duration
- Cleanup - Auto-destroy after workflow completion
- `main.tf` - Core infrastructure
- `variables.tf` - Configuration options
- `outputs.tf` - Resource info
- `startup-script.sh` - AI training simulation
- `run-workflow.ps1` - One-command launcher
~$0.10 for 2-hour workflow (e2-medium + minimal storage)
If you prefer manual control:
```shell
# 1. Setup
terraform init -reconfigure -migrate-state   # uses the remote GCS backend
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your project_id (and optional labels). Ensure `owner` is label-compliant.

# 2. Deploy
terraform plan
terraform apply

# 3. Wait for training completion (2 hours)

# 4. Cleanup
terraform destroy
```

Perfect for: ML experimentation, model training, automated workflows, cost-effective compute 🤖