fix(server): handle null spec.template in pool-mode BatchSandbox #910

Merged
Pangjiping merged 1 commit into alibaba:main from qingyuppp:fix/pool-platform-extract
May 18, 2026

@qingyuppp
Contributor

Problem

Creating a sandbox via pool mode (extensions.poolRef) returns HTTP 500 with {"code":"KUBERNETES::API_ERROR","message":"Failed to create sandbox: 'NoneType' object has no attribute 'get'"}, even though the underlying sandbox is created successfully and reaches Running state.

Pool-mode BatchSandbox CRs have spec.poolRef and spec.taskTemplate but no spec.template. Because the CRD declares template as an optional preserve-unknown-fields object, the Kubernetes API server returns the CR with spec.template: null (key present, value None).

In server/opensandbox_server/services/k8s/workload_mapper.py:85:

spec = workload.get("spec", {})
pod_spec = (
    spec.get("template", {}).get("spec")  # crashes when template is None
    or spec.get("podTemplate", {}).get("spec")
    or {}
)

The default {} is only returned when the key is absent — not when its value is None. So spec.get("template", {}) returns None and the next .get("spec") crashes.
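The difference between a missing key and a key whose value is None can be checked in isolation. The snippet below is a minimal sketch using a hand-written dict; the `my-pool` value is a placeholder, not taken from a real cluster.

```python
# A pool-mode spec as described above: the key is present, its value is None.
spec = {"poolRef": "my-pool", "template": None}  # "my-pool" is a placeholder

# The default argument is used only when the key is missing.
assert spec.get("missing", {}) == {}

# A present key with a None value is returned as-is; the default is ignored.
assert spec.get("template", {}) is None

# Chaining .get() on that None raises the AttributeError from the report.
try:
    spec.get("template", {}).get("spec")
except AttributeError as exc:
    error_message = str(exc)

# Falling back with `or` covers both the missing and the None case.
template = spec.get("template") or {}
```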

Impact

  • All pool-mode sandbox creations fail with HTTP 500 at the API layer.
  • The K8s-layer BatchSandbox is created and the Pod reaches Running, but clients see an error and receive no sandbox_id, so the resources leak unless they are cleaned up out-of-band.
  • Affects every caller of POST /sandboxes that uses extensions.poolRef (the documented way to use pools).

Reproduction:

  1. Create a Pool CR and wait for AVAILABLE >= 1.
  2. POST /sandboxes with extensions.poolRef=<pool-name>.
  3. Server returns HTTP 500; kubectl get batchsandbox shows the sandbox is actually ALLOCATED=1 READY=1.
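Step 2 could be scripted roughly as follows. This is only a sketch: the base URL, the `my-pool` name, and the assumption that a body containing nothing but `extensions.poolRef` is enough are all placeholders, and any authentication or other required fields are omitted. The request is built but not sent.

```python
import json
import urllib.request


def build_pool_sandbox_request(base_url: str, pool_name: str) -> urllib.request.Request:
    """Build (but do not send) a POST /sandboxes request that references a pool."""
    # Assumed body shape: only extensions.poolRef is known from the report.
    body = {"extensions": {"poolRef": pool_name}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/sandboxes",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and pool name.
request = build_pool_sandbox_request("http://localhost:8080", "my-pool")
# urllib.request.urlopen(request) would send it; before the fix the report
# describes an HTTP 500 response at this point.
```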

Fix

Treat null and missing the same when reading template / podTemplate. The same "or {}" fallback is already used correctly in the sibling function _build_sandbox_from_workload (workload_mapper.py:50).

spec = workload.get("spec") or {}
template = spec.get("template") or {}
pod_template = spec.get("podTemplate") or {}
pod_spec = (
    (template.get("spec") if isinstance(template, dict) else None)
    or (pod_template.get("spec") if isinstance(pod_template, dict) else None)
    or {}
)
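To try the fixed expression outside the server, it can be wrapped in a standalone function. The name `extract_pod_spec` is hypothetical and the real `_extract_platform_from_workload` does more than this; the sketch only covers the pod-spec lookup shown above.

```python
from typing import Any, Dict


def extract_pod_spec(workload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the pod spec from template or podTemplate, or {} if neither has one."""
    # `or {}` maps both a missing key and an explicit None to an empty dict.
    spec = workload.get("spec") or {}
    template = spec.get("template") or {}
    pod_template = spec.get("podTemplate") or {}
    # The isinstance checks guard against truthy non-dict values such as a string.
    return (
        (template.get("spec") if isinstance(template, dict) else None)
        or (pod_template.get("spec") if isinstance(pod_template, dict) else None)
        or {}
    )


# Pool mode: template is None, so the lookup falls through to {}.
pool_workload = {"spec": {"poolRef": "my-pool", "taskTemplate": {}, "template": None}}
pool_result = extract_pod_spec(pool_workload)

# Template mode: the pod spec is returned unchanged.
template_workload = {"spec": {"template": {"spec": {"containers": []}}}}
template_result = extract_pod_spec(template_workload)
```

The `or {}` fallback alone would still fail on a value like `"template": "oops"`, which is truthy but has no `.get()`; the `isinstance` guard is what covers that case.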

Added server/tests/k8s/test_workload_mapper.py with 6 cases:

  • pool-mode (null template) — the original bug
  • pool-mode (template key absent)
  • template-mode with full platform — happy path
  • podTemplate alias still works
  • null spec
  • empty workload
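For context, the sketch below runs the pre-fix expression against six hand-written inputs named after the cases above, to see which ones it could not handle. The inputs are assumptions modeled on the case names, not the fixtures from test_workload_mapper.py, and `old_extract_pod_spec` is a hypothetical wrapper around the original expression.

```python
from typing import Any, Dict, List


def old_extract_pod_spec(workload: Dict[str, Any]) -> Dict[str, Any]:
    """The pre-fix lookup, copied from the problem description."""
    spec = workload.get("spec", {})
    return (
        spec.get("template", {}).get("spec")
        or spec.get("podTemplate", {}).get("spec")
        or {}
    )


POD_SPEC = {"containers": [{"name": "main"}]}  # placeholder pod spec

CASES: Dict[str, Dict[str, Any]] = {
    "pool-mode (null template)": {
        "spec": {"poolRef": "my-pool", "taskTemplate": {}, "template": None}
    },
    "pool-mode (template key absent)": {
        "spec": {"poolRef": "my-pool", "taskTemplate": {}}
    },
    "template-mode": {"spec": {"template": {"spec": POD_SPEC}}},
    "podTemplate alias": {"spec": {"podTemplate": {"spec": POD_SPEC}}},
    "null spec": {"spec": None},
    "empty workload": {},
}


def crashing_cases(cases: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of the cases on which the old lookup raises AttributeError."""
    crashed = []
    for name, workload in cases.items():
        try:
            old_extract_pod_spec(workload)
        except AttributeError:
            crashed.append(name)
    return crashed
```

Under these assumed inputs, only the two cases with an explicit None (null template and null spec) raise; the remaining four already worked before the fix and act as regression guards.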

Testing

  • Unit tests
  • Manual verification — POST /sandboxes with extensions.poolRef now returns the expected CreateSandboxResponse with state: Running

Breaking Changes

  • None — purely defensive, only changes behavior in cases that previously crashed

Checklist

  • Linked Issue or clearly described motivation
  • Added/updated docs — N/A, bug fix only
  • Added/updated tests
  • Security impact considered — none
  • Backward compatibility considered — fully backwards compatible

In pool mode (extensions.poolRef), BatchSandbox CRs have spec.poolRef
and spec.taskTemplate but no spec.template. The K8s API returns the CR
with spec.template: null because the CRD declares it as an optional
preserve-unknown-fields object.

_extract_platform_from_workload did spec.get("template", {}).get("spec"),
which only returns {} when the key is absent — not when its value is
None — so the second .get() crashes with
'NoneType' object has no attribute 'get'.

The sandbox is actually created and reaches Running, but the server
fails to build the response and returns HTTP 500 to the client.

Fix: treat null and missing as the same case when reading template /
podTemplate. Same pattern is already used correctly in
_build_sandbox_from_workload (workload_mapper.py:50).

Add regression tests covering pool mode (null template, missing
template), template-mode platform extraction, podTemplate alias, null
spec, and empty workload.
Copilot AI review requested due to automatic review settings May 18, 2026 09:55
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes a server-side crash when pool-mode BatchSandbox workloads have spec.template: null, ensuring platform extraction is defensive and pool-mode sandbox creation no longer fails with HTTP 500.

Changes:

  • Hardened _extract_platform_from_workload to treat missing/None spec.template and spec.podTemplate as empty objects.
  • Added regression/unit tests covering null/missing template, null spec, and empty workload cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| server/opensandbox_server/services/k8s/workload_mapper.py | Makes platform extraction resilient to None template/podTemplate in dict workloads. |
| server/tests/k8s/test_workload_mapper.py | Adds regression tests for the spec.template: None crash and related edge cases. |

Comment thread server/opensandbox_server/services/k8s/workload_mapper.py
@Pangjiping added the bug and component/server labels May 18, 2026
@Pangjiping self-assigned this May 18, 2026
Collaborator

@Pangjiping left a comment

LGTM

@Pangjiping merged commit ef13d88 into alibaba:main May 18, 2026
17 checks passed
