Nebius provider: boot disk creation fails with BootCpuArchitecture is invalid #111

@RaleighSF

Summary

Creating any Nebius-backed workspace via Brev fails immediately with:

rpc error: code = InvalidArgument desc = BootDisk.BootCpuArchitecture is invalid (type: Error, retryable: true)

Reproduced consistently on 2026-04-20 for SKU gpu-h200-sxm.1gpu-16vcpu-200gb via both direct API POST and the production web console at brev.nvidia.com.

Root cause

v1/providers/nebius/instance.go::buildDiskCreateRequest constructs a compute.DiskSpec with Size, Type, and Source fields but does not set SourceImageCpuArchitecture.

Nebius's proto (nebius/api/nebius/compute/v1/disk.proto) declares:

enum SourceImageCPUArchitecture {
  SOURCE_IMAGE_CPU_UNSPECIFIED = 0;
  AMD64 = 1;
  ARM64 = 2;
}

SourceImageCPUArchitecture source_image_cpu_architecture = 9;

When the field is unset, it defaults to SOURCE_IMAGE_CPU_UNSPECIFIED, which Nebius rejects with the error above. Since buildDiskCreateRequest is shared across every Nebius SKU code path, this affects all Nebius-backed workspaces (H100 SXM, H200 SXM, 1-GPU and 8-GPU variants).
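The zero-value behavior described above can be illustrated with a minimal sketch. The types below are local stand-ins, not the generated Nebius SDK types: `DiskSpec` is trimmed to a few fields, and `validateDiskSpec` is a hypothetical function that only mimics the rejection observed from the server.

```go
package main

import (
	"errors"
	"fmt"
)

// SourceImageCPUArchitecture is a local stand-in for the generated enum.
// The numeric values mirror the proto quoted in this issue.
type SourceImageCPUArchitecture int32

const (
	SourceImageCPUUnspecified SourceImageCPUArchitecture = 0
	AMD64                     SourceImageCPUArchitecture = 1
	ARM64                     SourceImageCPUArchitecture = 2
)

// DiskSpec is a trimmed, hypothetical stand-in for compute.DiskSpec.
type DiskSpec struct {
	SizeGibibytes              int64
	Type                       string // placeholder; the real field is not a string
	SourceImageCpuArchitecture SourceImageCPUArchitecture
}

// validateDiskSpec mimics (as an assumption) the server-side check that
// rejects an unspecified architecture.
func validateDiskSpec(spec DiskSpec) error {
	if spec.SourceImageCpuArchitecture == SourceImageCPUUnspecified {
		return errors.New("BootDisk.BootCpuArchitecture is invalid")
	}
	return nil
}

func main() {
	// The architecture field is never assigned, so it keeps Go's zero value,
	// which corresponds to SOURCE_IMAGE_CPU_UNSPECIFIED.
	unset := DiskSpec{SizeGibibytes: 500, Type: "placeholder"}
	fmt.Println("unset spec:", validateDiskSpec(unset))

	// Setting the field explicitly passes the check.
	set := unset
	set.SourceImageCpuArchitecture = AMD64
	fmt.Println("amd64 spec:", validateDiskSpec(set))
}
```

The point is only that an enum field left out of a struct literal is indistinguishable from one set to `0`, which is why the omission goes unnoticed at compile time.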

Reproduction

  1. Create any Nebius workspace via brev.nvidia.com or via a direct POST to /api/organizations/{org}/workspaces.
  2. Workspace lands in status: FAILURE immediately with the error above in statusMessage.
  3. brev reset <workspace> returns rpc error: code = Internal desc = not implemented, so there is no client-side remediation path.

Payload captured from the web console. It is well-formed but still produces FAILURE, because the bug is server-side in the Brev→Nebius call, not in the payload sent from the client to Brev:

{
  "name": "cosmos-reason-lab",
  "workspaceGroupId": "brev-nebius-prod",
  "workspaceTemplateId": "4nbb4lg2s",
  "instanceType": "gpu-h200-sxm.1gpu-16vcpu-200gb",
  "diskStorage": "500Gi",
  "workspaceVersion": "v1",
  "vmBuild": {"forceJupyterInstall": true}
}
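For anyone reproducing via the direct POST route, the request can be sketched roughly as follows. The path and JSON field names come from this issue; the base URL, the bearer-token `Authorization` header, and the helper name `newCreateWorkspaceRequest` are assumptions for illustration and may not match how the Brev API actually authenticates.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type vmBuild struct {
	ForceJupyterInstall bool `json:"forceJupyterInstall"`
}

// createWorkspaceRequest mirrors the payload captured from the web console.
type createWorkspaceRequest struct {
	Name                string  `json:"name"`
	WorkspaceGroupID    string  `json:"workspaceGroupId"`
	WorkspaceTemplateID string  `json:"workspaceTemplateId"`
	InstanceType        string  `json:"instanceType"`
	DiskStorage         string  `json:"diskStorage"`
	WorkspaceVersion    string  `json:"workspaceVersion"`
	VMBuild             vmBuild `json:"vmBuild"`
}

// newCreateWorkspaceRequest builds (but does not send) the POST request.
// baseURL and the bearer-token header are assumptions, not confirmed API details.
func newCreateWorkspaceRequest(baseURL, orgID, token string, body createWorkspaceRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/api/organizations/%s/workspaces", baseURL, orgID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	body := createWorkspaceRequest{
		Name:                "cosmos-reason-lab",
		WorkspaceGroupID:    "brev-nebius-prod",
		WorkspaceTemplateID: "4nbb4lg2s",
		InstanceType:        "gpu-h200-sxm.1gpu-16vcpu-200gb",
		DiskStorage:         "500Gi",
		WorkspaceVersion:    "v1",
		VMBuild:             vmBuild{ForceJupyterInstall: true},
	}
	// Placeholder base URL and org ID; substitute real values before sending.
	req, err := newCreateWorkspaceRequest("https://brev.example.invalid", "your-org-id", "your-token", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` would send it. The sketch stops short of that so it can be inspected without credentials.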

Proposed fix

In buildDiskCreateRequest, set the CPU architecture on the disk spec before attaching the image source:

baseReq.Spec.SourceImageCpuArchitecture = compute.SourceImageCPUArchitecture_AMD64

For ARM images, derive the value from the image family name, or from an attrs.Architecture field if one is surfaced, rather than hardcoding AMD64.
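One way the derivation could look is sketched below. It rests on assumptions: the enum is a local stand-in for the generated type, `resolveArch` and `archFromImageFamily` are hypothetical helper names, the image family names in `main` are made up, and matching on "arm64"/"aarch64" substrings is a guess at the naming convention rather than something confirmed against the Nebius image catalog.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceImageCPUArchitecture is a local stand-in for the generated enum.
type SourceImageCPUArchitecture int32

const (
	SourceImageCPUUnspecified SourceImageCPUArchitecture = 0
	AMD64                     SourceImageCPUArchitecture = 1
	ARM64                     SourceImageCPUArchitecture = 2
)

// archFromImageFamily guesses the architecture from an image family name.
// Anything without an ARM marker falls back to AMD64.
func archFromImageFamily(family string) SourceImageCPUArchitecture {
	f := strings.ToLower(family)
	if strings.Contains(f, "arm64") || strings.Contains(f, "aarch64") {
		return ARM64
	}
	return AMD64
}

// resolveArch prefers an explicit architecture string (for example from an
// attrs.Architecture field, if one is surfaced) and otherwise falls back to
// the image family name.
func resolveArch(explicit, family string) SourceImageCPUArchitecture {
	switch strings.ToLower(explicit) {
	case "arm64", "aarch64":
		return ARM64
	case "amd64", "x86_64":
		return AMD64
	}
	return archFromImageFamily(family)
}

func main() {
	// Hypothetical family names, used only to exercise the helper.
	fmt.Println(resolveArch("", "ubuntu22.04-cuda12") == AMD64)
	fmt.Println(resolveArch("", "ubuntu22.04-arm64") == ARM64)
	fmt.Println(resolveArch("aarch64", "ubuntu22.04-cuda12") == ARM64)
}
```

Defaulting to AMD64 never sends the unspecified value, but it would silently mislabel an ARM image whose family name lacks a recognizable marker, so an explicit architecture attribute is the safer source where one exists.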

Impact

Blocks all Nebius-backed workspace creation for affected orgs. Nebius is the only Brev provider surfacing stop/start-capable H100/H200 SKUs in many orgs' catalogs (Shadeform SKUs are stoppable: false), so this effectively blocks modern-GPU demo usage requiring stop/start.

Environment

  • Brev CLI: v0.6.322 (latest)
  • Org IDs affected: confirmed on org-3BzqVpk4eldrvOy47zcCQOhCFHq
  • Failure timestamps: 2026-04-20 / 2026-04-21 UTC
  • SKUs tested: gpu-h200-sxm.1gpu-16vcpu-200gb

Related gaps

  • brev reset returns "not implemented" for the Nebius provider; worth tracking separately.
  • brevdev/brev-cli pkg/instancetypes/instancetypes.go last updated 2024-05-30; brev start --gpu rejects every modern SKU returned by brev search gpu (separate issue in brev-cli).
