[WAITING] chore: Upgrade to kubernetes v36 and fix ApiClient.__new__ incompatibility #260

Draft
morgan-wowk wants to merge 1 commit into kubernetes-pin-v35 from kubernetes-v36-compat

morgan-wowk commented May 23, 2026

Problem

Executions fail on kubernetes package v36.0.0 with the error 403 - User "system:anonymous".

Before merging

⚠️ Check the status of the bugs with v36 mentioned in the comments. Pin to the latest patch of v36 because we know v36.0.0 has breaking changes.

  1. Read the full breaking changes from the changelog when ready to upgrade to v36.
  2. Upgrade our code as necessary according to the breaking changes.

⚠️ Right now, no patch release exists beyond v36.0.0, and we should wait for one. A patch release might allow us to remove the manual changes to cloud_pipelines_backend/launchers/kubernetes_launchers.py.

⚠️ The current uv.lock in this PR is not meant to be the final version. We need to wait for the 7-day cooldown on package upgrades before we can reliably upgrade only kubernetes.

Changes

kubernetes v36.0.0 changed ApiClient.__deserialize_model() to access
self.configuration. Constructing ApiClient via __new__() bypasses __init__
and leaves configuration unset, causing AttributeError at deserialization time.

Replace ApiClient.__new__(ApiClient) with ApiClient() in both
_kubernetes_serialize and _kubernetes_deserialize so that __init__ runs
and configuration is properly initialized.


morgan-wowk commented May 23, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.


Local-dev RCA: Kubernetes Python Client v36 — Execution Failures

Date: 2026-05-22
Severity: Local only. All cache-miss pipeline executions failing.


Summary

All pipeline executions failed with 403 Forbidden errors immediately after a routine
dependency sync upgraded kubernetes (the Python client) from v35.x to v36.0.0.
A second, latent bug in the same upgrade caused an AttributeError crash when
deserializing Kubernetes API responses. Both failures were caused by breaking changes
introduced in the kubernetes-client/python v36.0.0 release without an official
migration guide. A pin to <36 was applied as an immediate fix; this branch contains a compatibility
workaround for v36 pending the upstream patch release.


Timeline

| Time (UTC) | Event |
| --- | --- |
| 2026-05-20 | kubernetes-client/python v36.0.0 published to PyPI |
| 2026-05-21 | Upstream fix PR #2585 merged (one day after the release) |
| 2026-05-22 ~17:08 | uv sync run after a routine submodule bump; venv upgraded to kubernetes v36.0.0 |
| 2026-05-22 ~17:15 | Execution attempts begin failing: 403 - User "system:anonymous" |
| 2026-05-22 | Root cause identified; exec-credential workaround applied to launcher |
| 2026-05-22 | AttributeError: 'ApiClient' object has no attribute 'configuration' surfaced as second failure mode |
| 2026-05-22 | Both workarounds applied and verified; pin-to-v35 and v36-compat branches created |

Root Cause

Break 1 — Authentication silently dropped (403 system:anonymous)

Source: The openapi-generator v6.6.0 upgrade bundled in the v36 release regenerated
the client code. The auth_settings() method on Configuration was changed to check
api_key['BearerToken'], but the config loaders (kube_config.py, incluster_config.py)
still write exec credential tokens to api_key['authorization'].

When update_params_for_auth() is called with auth_settings=['BearerToken'], it finds
no value and omits the Authorization header entirely. The API server sees an
unauthenticated request and returns 403 - User "system:anonymous".

This regression was silent — no exception, no warning, just all API calls
proceeding as anonymous.
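
The key mismatch described above can be illustrated with a toy model. This is a minimal sketch, not the library's code: `build_headers` and its `lookup_key` parameter are hypothetical names standing in for the client's auth lookup, and the token value is made up.

```python
# Toy model of the lookup mismatch; NOT the kubernetes client's implementation.
# build_headers and lookup_key are hypothetical names used for illustration.

def build_headers(api_key: dict[str, str], lookup_key: str) -> dict[str, str]:
    """Return request headers, adding Authorization only if lookup_key has a value."""
    headers: dict[str, str] = {}
    token = api_key.get(lookup_key)
    if token:
        headers["authorization"] = token
    # No exception and no warning when the key is missing: the header is just absent.
    return headers


# The config loaders store the credential under 'authorization'.
api_key = {"authorization": "Bearer example-token"}

v35_headers = build_headers(api_key, lookup_key="authorization")
v36_headers = build_headers(api_key, lookup_key="BearerToken")

print(v35_headers)  # {'authorization': 'Bearer example-token'}
print(v36_headers)  # {}
```

The second call returns an empty header set without raising, which matches the silent behavior observed: the request goes out unauthenticated and the API server answers with 403.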

Tracked upstream: kubernetes-client/python #2584
Fixed in: PR #2585 (merged 2026-05-21, one day after the release; not yet in a patch release)

Workaround applied to the launcher's create_gke_launcher() after
new_client_from_config_dict:

# Workaround: kubernetes Python v36 bug — exec credentials are stored under
# api_key['authorization'] but auth_settings() checks api_key['BearerToken'].
_auth = k8s_client.configuration.api_key.get('authorization', '')
if _auth:
    _token = _auth[len('Bearer '):] if _auth.startswith('Bearer ') else _auth
    k8s_client.configuration.api_key['BearerToken'] = _token
    k8s_client.configuration.api_key_prefix['BearerToken'] = 'Bearer'
    _orig_hook = k8s_client.configuration.refresh_api_key_hook
    if _orig_hook:
        def _hooked_refresh(cfg, _orig=_orig_hook):
            _orig(cfg)
            _t = cfg.api_key.get('authorization', '')
            cfg.api_key['BearerToken'] = _t[len('Bearer '):] if _t.startswith('Bearer ') else _t
        k8s_client.configuration.refresh_api_key_hook = _hooked_refresh

This workaround can be removed once PR #2585 ships in v36.0.1+.
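
To make the token-mirroring step unit-testable without a cluster, it could be factored into a small pure function that operates on plain dicts. This is only a sketch: `mirror_bearer_token` is a hypothetical helper name that does not exist in the launcher, and it reproduces the same prefix-stripping logic as the workaround above.

```python
# Hypothetical refactoring of the workaround's core step into a pure function.
# It takes the same two dicts that the workaround mutates on the configuration.

BEARER_PREFIX = "Bearer "


def mirror_bearer_token(api_key: dict[str, str], api_key_prefix: dict[str, str]) -> bool:
    """Copy the token from api_key['authorization'] to api_key['BearerToken'].

    Returns True if a token was mirrored, False if there was nothing to copy.
    """
    auth = api_key.get("authorization", "")
    if not auth:
        return False
    # Strip the scheme so the prefix entry can add it back exactly once.
    token = auth[len(BEARER_PREFIX):] if auth.startswith(BEARER_PREFIX) else auth
    api_key["BearerToken"] = token
    api_key_prefix["BearerToken"] = "Bearer"
    return True


keys = {"authorization": "Bearer example-token"}
prefixes: dict[str, str] = {}
mirrored = mirror_bearer_token(keys, prefixes)

empty_keys: dict[str, str] = {}
nothing_mirrored = mirror_bearer_token(empty_keys, {})
```

With this shape, both the initial copy and the refresh hook could call the same helper, and the behavior can be asserted with plain dicts.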


Break 2 — Deserialization crash (AttributeError: 'ApiClient' object has no attribute 'configuration')

Source: The same openapi-generator regeneration changed ApiClient.__init__ to
call Configuration.get_default_copy() instead of Configuration(). A helper in
kubernetes_launchers.py used ApiClient.__new__(ApiClient) to construct a
"shallow" client for serialization/deserialization purposes, bypassing __init__.
In v36, __deserialize_model accesses self.configuration unconditionally, which
is never set when __init__ is skipped.
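
The mechanism is ordinary Python behavior and can be reproduced without the kubernetes package. In the sketch below, `ToyClient` is a made-up stand-in for ApiClient; only the `__new__` versus `__init__` semantics are the point.

```python
# ToyClient is a hypothetical stand-in for ApiClient, used only to show
# what happens when an instance is created with __new__ and __init__ never runs.

class ToyClient:
    def __init__(self) -> None:
        # Instance attributes are only created when __init__ runs.
        self.configuration = {"host": "https://example.invalid"}

    def deserialize(self, data: dict) -> dict:
        # Reads configuration unconditionally, like the v36 behavior described above.
        _ = self.configuration
        return dict(data)


# Allocates the object but skips __init__, so 'configuration' is never set.
shallow = ToyClient.__new__(ToyClient)
try:
    shallow.deserialize({"kind": "Pod"})
    shallow_error = None
except AttributeError as exc:
    shallow_error = str(exc)

# Normal construction runs __init__, so deserialization works.
full = ToyClient()
result = full.deserialize({"kind": "Pod"})

print(shallow_error)  # 'ToyClient' object has no attribute 'configuration'
print(result)         # {'kind': 'Pod'}
```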

Fix: Replace ApiClient.__new__(ApiClient) with ApiClient() everywhere:

# Before (broken in v36):
def _kubernetes_deserialize(obj_dict: dict[str, Any], cls: typing.Type[_T]) -> _T:
    shallow_client = k8s_client_lib.ApiClient.__new__(k8s_client_lib.ApiClient)
    return shallow_client._ApiClient__deserialize(obj_dict, cls)

def _kubernetes_serialize(obj) -> dict[str, Any]:
    shallow_client = k8s_client_lib.ApiClient.__new__(k8s_client_lib.ApiClient)
    return shallow_client.sanitize_for_serialization(obj)

# After:
def _kubernetes_deserialize(obj_dict: dict[str, Any], cls: typing.Type[_T]) -> _T:
    client = k8s_client_lib.ApiClient()
    return client._ApiClient__deserialize(obj_dict, cls)

def _kubernetes_serialize(obj) -> dict[str, Any]:
    client = k8s_client_lib.ApiClient()
    return client.sanitize_for_serialization(obj)

Tracked upstream: kubernetes-client/python #2582


Contributing Factors

  1. No official migration guide. The v36.0.0 release notes describe the openapi-generator
    upgrade as a source of change but do not enumerate the specific auth or ApiClient
    constructor semantics that changed. Engineers had no way to anticipate these breaks
    from release notes alone.
  2. The auth failure was silent. No exception was raised — the client simply sent
    unauthenticated requests, making the failure look like a permissions or token
    expiry problem rather than a client bug. Initial debugging focused on token
    refresh and cluster context before the client itself was suspected.
  3. Upstream fix shipped after the release. PR #2585 was merged the day after
    v36.0.0 was published, meaning there was a window where the only released version
    was broken for exec credential users.

Fix

The kubernetes-v36-compat branch will become the active pin once upstream PR #2585
ships in a patch release and we confirm no further regressions.


Prevention

  1. Add a regression test for the auth path. A unit test that constructs a
    Configuration with a token in api_key['authorization'] and asserts that
    update_params_for_auth(['BearerToken']) produces a non-empty Authorization
    header would have caught Break 1 immediately.
  2. Wrap _kubernetes_serialize / _kubernetes_deserialize with a smoke test.
    A test that round-trips any V1Pod through both functions would have caught
    Break 2.
  3. Upper-bound major kubernetes client versions. The >=33.1.0 lower-bound-only
    constraint allowed a silent major version jump. >=33.1.0,<36 (or narrower) gives
    a forcing function to explicitly review breaking changes before upgrading.
  4. When reverting uv.lock, always re-run uv sync. A reverted lockfile leaves
    the venv and lockfile out of sync. The correct sequence after git checkout uv.lock
    is uv sync --frozen to bring the venv back in line with the now-reverted lockfile.
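
As an illustration of what the >=33.1.0,<36 bound in item 3 admits and rejects, here is a simplified check. It is a sketch under assumptions: `parse_version` and `within_bounds` are hypothetical helpers, they handle plain dotted numeric versions only, and they are not a substitute for the resolver's real version-specifier handling (pre-releases, for example, are not modeled).

```python
# Simplified model of a ">=33.1.0,<36" constraint for plain dotted versions.
# parse_version and within_bounds are hypothetical helpers for illustration.

def parse_version(version: str) -> tuple[int, ...]:
    """Turn '36.0.0' into (36, 0, 0). Only numeric dotted versions are supported."""
    return tuple(int(part) for part in version.split("."))


def within_bounds(version: str, lower: str = "33.1.0", upper_exclusive: str = "36") -> bool:
    """Return True if lower <= version < upper_exclusive, compared component-wise."""
    parsed = parse_version(version)
    return parse_version(lower) <= parsed < parse_version(upper_exclusive)


print(within_bounds("33.1.0"))  # True
print(within_bounds("35.0.0"))  # True
print(within_bounds("36.0.0"))  # False
print(within_bounds("33.0.9"))  # False
```

With the lower-bound-only constraint, 36.0.0 would have been accepted; with the upper bound, moving to v36 requires an explicit change to the constraint.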

morgan-wowk force-pushed the kubernetes-v36-compat branch from 0963050 to bca04f8 on May 23, 2026 02:23
morgan-wowk changed the title from "Upgrade to kubernetes v36 and fix ApiClient.__new__ incompatibility" to "[WAITING] chore: Upgrade to kubernetes v36 and fix ApiClient.__new__ incompatibility" on May 23, 2026