gpunow lets you quickly spin up ephemeral GPU VMs or clusters on GCP. It's great for one-off experiments, training runs,
or other GPU-heavy workloads.
# Make sure the just command runner is installed
$ just build
# Install into ~/.local/bin and configs into ~/.config/gpunow
$ ./bin/gpunow install
# Start a new shell, or exec bash/zsh so PATH is refreshed.
# Authenticate with GCP
$ gcloud auth application-default login
# Spin up a quick 3-node cluster
$ gpunow create my-cluster -n 3 --start
✓ Created cluster my-cluster (3 instances) in local state
# Preview estimated cost before creating resources
$ gpunow create my-cluster -n 3 --estimate-cost
# See status
$ gpunow status
# SSH to a specific instance
$ gpunow ssh my-cluster/0
my-cluster-0$
# By default, all instances terminate after 12 hours. To stop them manually
# and optionally delete all resources:
$ gpunow stop my-cluster [--delete] [--delete-disks]

Requirements:
- GCP credentials with Compute Engine permissions.
- Required APIs enabled for your project (see below).
- Go 1.25.6 (to build from source)
Required:
- compute.googleapis.com (Compute Engine API)
Required only when using gpunow create --estimate-cost:
- cloudbilling.googleapis.com (Cloud Billing Catalog API)
Enable with gcloud:
gcloud services enable compute.googleapis.com --project <your-project-id>
gcloud services enable cloudbilling.googleapis.com --project <your-project-id> # cost estimation only

Enable in Console:
- Open APIs & Services for your project.
- Enable Compute Engine API.
- If you use cost estimation, also enable Cloud Billing API.
Recommended (user credentials):
gcloud auth application-default login
gcloud auth application-default set-quota-project <your-project-id>

Build:
just build

Binary output: ./bin/gpunow
Install gpunow and initialize a home profile directory:
./bin/gpunow install

Cluster:
./bin/gpunow create my-cluster -n 3
./bin/gpunow start my-cluster
./bin/gpunow create my-cluster -n 3 --start
./bin/gpunow create my-cluster -n 3 --estimate-cost
./bin/gpunow create my-cluster -n 3 --estimate-cost --refresh
./bin/gpunow status my-cluster
./bin/gpunow update my-cluster --max-hours 24
./bin/gpunow stop my-cluster --delete
./bin/gpunow stop my-cluster --delete --keep-disks
./bin/gpunow stop my-cluster --delete --delete-disks

Reference a node using <cluster>/<index> or <cluster>-<index>:
./bin/gpunow ssh my-cluster/0
./bin/gpunow ssh my-cluster/2
./bin/gpunow ssh my-cluster-2

SSH and SCP:
./bin/gpunow ssh my-cluster/0
./bin/gpunow ssh my-cluster/2 -u mo -- nvidia-smi
./bin/gpunow scp ./local.txt my-cluster/2:/home/mo/
./bin/gpunow scp my-cluster/0:/home/mo/logs.txt ./
./bin/gpunow scp -r -P 2222 ./dir my-cluster/2:/home/mo/
./bin/gpunow scp -- -weird ./local.txt # use -- to separate flags from paths

State:
./bin/gpunow state
./bin/gpunow state raw

Status:
./bin/gpunow status
./bin/gpunow status sync

Configuration profiles live under profiles/<name> and contain:
- config.toml
- cloud-init.yaml
- setup.sh
- zshrc
The default profile is profiles/default. Use -p/--profile to select another profile:
./bin/gpunow create my-cluster -n 3 -p gpu-l4
./bin/gpunow start my-cluster -p gpu-l4

gpunow resolves its home directory in this order:
- GPUNOW_HOME, if set.
- ~/.config/gpunow.
Profiles are read from <home>/profiles, and state is written to <home>/state/state.json.
Use gpunow install to create ~/.config/gpunow/profiles/default.
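
The lookup order above can be mirrored in a few lines of shell, for example to find the state file from a script. This is a minimal sketch, not gpunow's own code: gpunow_home is a hypothetical helper, and treating an empty GPUNOW_HOME the same as an unset one is an assumption.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented lookup order; not part of gpunow.
gpunow_home() {
  # Assumption: an empty GPUNOW_HOME counts as "not set".
  if [ -n "${GPUNOW_HOME:-}" ]; then
    printf '%s\n' "$GPUNOW_HOME"
  else
    printf '%s\n' "$HOME/.config/gpunow"
  fi
}

home="$(gpunow_home)"
echo "profiles: $home/profiles"
echo "state:    $home/state/state.json"
```

Setting GPUNOW_HOME for a single command (for example GPUNOW_HOME=/tmp/gpunow-test) is one way to keep experimental profiles and state apart from the default home.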
Key settings in config.toml:
- Schema version (version)
- Project and zone
- Instance defaults (machine type, GPU type/count, max run hours)
- Network defaults and exposed ports
- Service account and scopes
- Disk image and size
- Optional hostname domain for FQDN hostnames (instance.hostname_domain)
- Optional SSH identity file (ssh.identity_file)
- Each cluster gets its own VPC and subnet.
- All cluster nodes get ephemeral public IPv4 addresses (destroyed with the VM).
- gpunow ssh and gpunow scp connect directly to each node.
- Firewall rules apply to all cluster nodes.
- Host-level ufw is enabled and allows SSH (22/tcp) by default.
- Instance lifecycle states are tracked in local state as: TERMINATED -> STARTING -> PROVISIONING -> READY -> TERMINATING -> TERMINATED.
- During first boot, cloud-init installs a local readiness sentinel on each VM: http://<instance-public-ip>:34223/ returns one of ready, running, or error.
- gpunow ensures both GCP firewall and host ufw allow 34223/tcp for readiness probes.
- gpunow start waits for sentinel ready before marking an instance READY.
- gpunow ssh checks instance lifecycle state and waits for READY when needed.
- Network defaults control additional allowed ports when configured.
- Hostnames: GCE requires a fully qualified domain name (FQDN) if you set instance.hostname_domain. Leave it empty to use the default internal DNS hostname derived from the instance name.
- gpunow create --estimate-cost estimates VM core/RAM, GPU, and boot disk pricing using the Cloud Billing Catalog API.
- Pricing data is cached at <home>/state/pricing-cache.json and reused automatically.
- Use --refresh with --estimate-cost to force re-download of pricing data.
- Estimates intentionally exclude egress, discounts/credits, taxes, and OS/license premiums.
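
The readiness sentinel described above can also be polled by hand, which may help when debugging a node that never reaches READY. The following is a rough sketch, not gpunow's implementation: probe and wait_ready are hypothetical helpers, the attempt count and delay are arbitrary defaults, and treating error as terminal is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of polling the readiness sentinel; not gpunow's own code.

# probe URL: print the sentinel's response body (empty if unreachable).
probe() {
  curl --silent --max-time 5 "$1" 2>/dev/null || true
}

# wait_ready IP [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 on "ready", 1 on "error", 2 if the attempts run out.
wait_ready() {
  ip=$1
  attempts=${2:-60}   # assumed default, not taken from gpunow
  delay=${3:-5}       # assumed default, not taken from gpunow
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(probe "http://$ip:34223/")
    case "$status" in
      ready) return 0 ;;
      error) return 1 ;;
      *) ;;  # "running" or no answer yet: keep polling
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 2
}

# Usage (203.0.113.10 is a placeholder address):
#   wait_ready 203.0.113.10 && echo "instance is ready"
```

Keeping the HTTP call in its own probe function makes the loop easy to exercise without a live VM.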
- go/: Go source code
- profiles/: profile templates
- VERSION: version string used at build time
- justfile: build/test helpers
just test
just fmt
just vet

Cut a release tag and trigger GitHub Actions:
just release [version]

If no version is provided, just release bumps the patch version from VERSION (strips -dev), commits the change, creates an annotated vX.Y.Z tag, and pushes the commit and tag to origin. GitHub Actions runs tests and publishes release artifacts for darwin/linux and amd64/arm64.
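
The patch bump described above can be illustrated in shell. This is only a sketch under stated assumptions: next_patch_version is a hypothetical helper, the sample version 0.4.2-dev is made up, and the order of operations (strip -dev first, then increment the patch number) is a guess at what the release recipe does, not its actual code.

```shell
#!/bin/sh
# Hypothetical sketch: derive the next release version from a VERSION string.
# Assumption: -dev is stripped first, then the patch number is incremented.
next_patch_version() {
  v=${1%-dev}          # drop a trailing -dev, if present
  major=${v%%.*}       # text before the first dot
  rest=${v#*.}         # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  printf '%s.%s.%s\n' "$major" "$minor" "$((patch + 1))"
}

next_patch_version "0.4.2-dev"   # prints 0.4.3
```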
Local release troubleshooting (same steps as GitHub Actions) can be run with:
just release-local [version]

This runs tests and builds release artifacts into dist/ for darwin/linux and amd64/arm64.
Service account alternative:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

Required permissions (at minimum):
- roles/compute.admin (instances, disks, networks, firewall rules)
- roles/iam.serviceAccountUser (attach the configured service account)
Troubleshooting invalid_grant:
- Re-run gcloud auth application-default login.
- If it persists, revoke and re-login:
gcloud auth application-default revoke
gcloud auth application-default login