Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pod injection fails with Istio 1.22.0 but succeeds with Istio 1.21.2 #51071

Closed
2 tasks done
kaiburjack opened this issue May 15, 2024 · 10 comments · Fixed by #51228
Closed
2 tasks done

Pod injection fails with Istio 1.22.0 but succeeds with Istio 1.21.2 #51071

kaiburjack opened this issue May 15, 2024 · 10 comments · Fixed by #51228
Assignees

Comments

@kaiburjack
Copy link

kaiburjack commented May 15, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

After upgrading to Istio 1.22.0 (with the istiod Helm chart) from version 1.21.2, we observe that pod injection for gateways does not work anymore and is aborted with:

admission webhook "object.sidecar-injector.istio.io" denied the request: failed to run injection template: template: inject:21:29: executing "inject" at <.Values.global.proxy.image>: invalid value; expected string

We are overriding the .Values.global.hub to point to a GCP Artifact Registry Remote Repository (to offload Docker image downloading to a private GCP Artifact Registry instance). The concrete value is: europe-west3-docker.pkg.dev/our-gcp-project/docker-hub-remote/istio.
We are also specifying .Values.global.variant to distroless.

EDIT: The overriding of .Values.global.hub or .Values.global.variant does not seem to cause this. The same error(s) happen when not overriding these.

Also pod injection for sidecars fails with:

admission webhook "auto.sidecar-injector.istio.io" denied the request: failed to run injection template: failed applying injection overlay: unmarshal patched pod: json: cannot unmarshal string into Go struct field Probe.spec.containers.readinessProbe.failureThreshold of type int32

even though some containers do not have any readinessProbes defined while others have proper integer readinessProbes.failureThreshold defined.

This setup works with Istio 1.21.2, but fails with above mentioned error message at a new pod creation under Istio 1.22.0.

Istiod itself does start properly (also with the right image registry, repository and tag), but no workload pod can start due to failed injection.

I also reproduced this locally with a kind cluster (no special configuration overrides), just kind create cluster.
Then:

helm install istio-base istio/istio-base --version 1.21.2
helm install istiod istio/istiod --version 1.21.2
helm upgrade --install istio-base istio/istio-base --version 1.22.0
helm upgrade --install istiod istio/istiod --version 1.22.0

After I've done this, I could not create any pods anymore, resulting in the same errors mentioned above.

Also following the "in-place" Istio with Helm upgrade procedure yields the same errors: https://istio.io/latest/docs/setup/upgrade/helm/#in-place-upgrade

Final Analysis

The actual cause of the error messages was the following Helm value override:

pilot:
  cni: false

which should have been:

pilot:
  cni:
    enabled: false

(see comments below for full story)

Version

1.22.0

Additional Information

No response

@howardjohn
Copy link
Member

Someone reported the failureThreshold issue on 1.20 the other day: https://istio.slack.com/archives/C37A4KAAD/p1715623761368409

EDIT: The overriding of .Values.global.hub or .Values.global.variant does not seem to cause this. The same error(s) happen when not overriding these.

Can you maybe show your full values.yaml or the istio-sidecar-injector configmap? That seems odd that with defaults it would fail; surely we would have caught that.

I'll try to repro with the hub set as well

@kaiburjack
Copy link
Author

Thanks for the response!
I am honing in on the actual issue (or rather the value override that caused it, I think).
That means, the following works perfectly without any errors (when no Helm values are overridden):

kind create cluster
helm install istio-base istio/base --version 1.21.2 --namespace istio-system --create-namespace --skip-crds --set base.enableCRDTemplates=true
helm install istiod istio/istiod --version 1.21.2 --namespace istio-system --create-namespace 
helm upgrade --install istio-base istio/base --version 1.22.0 -n istio-system --skip-crds --set base.enableCRDTemplates=true
helm upgrade --install istiod istio/istiod --version 1.22.0 -n istio-system
kubectl run nginx --image=nginx --restart=Never

So it must be some overridden value that makes the upgrade fail. I'll do a bit of binary searching and report back which value is responsible.

@kaiburjack
Copy link
Author

kaiburjack commented May 15, 2024

It's definitely caused by some Helm value override in this (which we had been using successfully so far with Istio 1.21.2):

sidecarInjectorWebhook:
  enableNamespacesByDefault: true
meshConfig:
  accessLogFormat: |
    {
      "authority": "%REQ(:AUTHORITY)%",
      "bytes_received": "%BYTES_RECEIVED%",
      "bytes_sent": "%BYTES_SENT%",
      "downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
      "downstream_peer_cert_v_end": "%DOWNSTREAM_PEER_CERT_V_END%",
      "downstream_peer_cert_v_start": "%DOWNSTREAM_PEER_CERT_V_START%",
      "downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
      "downstream_tls_cipher": "%DOWNSTREAM_TLS_CIPHER%",
      "downstream_tls_version": "%DOWNSTREAM_TLS_VERSION%",
      "x_envoy_external_address": "%REQ(X-ENVOY-EXTERNAL-ADDRESS)%",
      "duration": "%DURATION%",
      "hostname": "%HOSTNAME%",
      "istio_policy_status": "%DYNAMIC_METADATA(istio.mixer:status)%",
      "method": "%REQ(:METHOD)%",
      "path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
      "protocol": "%PROTOCOL%",
      "request_duration": "%REQUEST_DURATION%",
      "request_id": "%REQ(X-REQUEST-ID)%",
      "requested_server_name": "%REQUESTED_SERVER_NAME%",
      "status": "%RESPONSE_CODE%",
      "status_details": "%RESPONSE_CODE_DETAILS%",
      "response_duration": "%RESPONSE_DURATION%",
      "response_tx_duration": "%RESPONSE_TX_DURATION%",
      "response_flags": "%RESPONSE_FLAGS%",
      "route_name": "%ROUTE_NAME%",
      "start_time": "%START_TIME%",
      "upstream_cluster": "%UPSTREAM_CLUSTER%",
      "upstream_host": "%UPSTREAM_HOST%",
      "upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
      "upstream_service_time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
      "upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
      "upstream_request_attempt_count": "%UPSTREAM_REQUEST_ATTEMPT_COUNT%",
      "user_agent": "%REQ(USER-AGENT)%",
      "x_forwarded_for": "%REQ(X-FORWARDED-FOR)%",
      "x_forwarded_host": "%REQ(X-FORWARDED-HOST)%",
      "x_forwarded_proto": "%REQ(X-FORWARDED-PROTO)%",
      "x_forwarded_port": "%REQ(X-FORWARDED-PORT)%",
      "x_b3_traceid": "%REQ(X-B3-TRACEID)%",
      "x_b3_spanid": "%REQ(X-B3-SPANID)%",
      "x_b3_sampled": "%REQ(X-B3-SAMPLED)%",
      "x_b3_parentspanid": "%REQ(X-B3-PARENTSPANID)%",
      "traceparent": "%REQ(TRACEPARENT)%",
      "tracestate": "%REQ(TRACESTATE)%",
      "request_cache_control": "%REQ(CACHE-CONTROL)%",
      "response_cache_control": "%RESP(CACHE-CONTROL)%",
      "etag": "%RESP(ETAG)%",
      "expires": "%RESP(EXPIRES)%",
      "age": "%RESP(AGE)%",
      "if_match": "%REQ(IF-MATCH)%",
      "if_none_match": "%REQ(IF-NONE-MATCH)%",
      "if_modified_since": "%REQ(IF-MODIFIED-SINCE)%",
      "if_unmodified_since": "%REQ(IF-UNMODIFIED-SINCE)%",
      "if_range": "%REQ(IF-RANGE)%",
      "vary": "%RESP(VARY)%",
      "date": "%RESP(DATE)%",
      "last_modified": "%RESP(LAST-MODIFIED)%",
      "x_varnish": "%RESP(X-VARNISH)%",
      "x_cache": "%RESP(X-CACHE)%",
      "cache_status": "%RESP(CACHE-STATUS)%",
      "is_lighthouse": "%REQ(IS-LIGHTHOUSE)%",
      "referer": "%REQ(REFERER)%",
      "accept_encoding": "%REQ(ACCEPT-ENCODING)%",
      "content_encoding": "%RESP(CONTENT-ENCODING)%",
      "content_length": "%RESP(CONTENT-LENGTH)%",
      "content_type": "%RESP(CONTENT-TYPE)%",
      "x_robots_tag": "%RESP(X-ROBOTS-TAG)%",
      "range": "%REQ(RANGE)%",
      "accept_ranges": "%RESP(ACCEPT-RANGES)%",
      "content_range": "%RESP(CONTENT-RANGE)%",
      "request_connection": "%REQ(CONNECTION)%",
      "response_connection": "%RESP(CONNECTION)%",
      "origin": "%REQ(ORIGIN)%",
      "access_control_request_headers": "%REQ(ACCESS-CONTROL-REQUEST-HEADERS)%",
      "access_control_request_method": "%REQ(ACCESS-CONTROL-REQUEST-METHOD)%",
      "access_control_allow_origin": "%RESP(ACCESS-CONTROL-ALLOW-ORIGIN)%",
      "access_control_allow_methods": "%RESP(ACCESS-CONTROL-ALLOW-METHODS)%",
      "access_control_allow_headers": "%RESP(ACCESS-CONTROL-ALLOW-HEADERS)%",
      "access_control_max_age": "%RESP(ACCESS-CONTROL-MAX-AGE)%"
    }
  enablePrometheusMerge: false
  accessLogEncoding: JSON
  accessLogFile: "/dev/stdout"
  disableEnvoyListenerLog: false
  enableTracing: false
  dnsRefreshRate: 300s
  defaultConfig:
    proxyMetadata:
      # Search for EXIT_ON_ZERO_ACTIVE_CONNECTIONS on:
      # https://istio.io/latest/docs/reference/commands/pilot-agent/#envvars
      # Also, see: https://github.com/istio/istio/issues/34855
      EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
    # The duration that Envoy will wait for draining existing connections
    # and completing existing requests after receiving a SIGTERM or SIGINT
    # before shutting down.
    # See: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#ProxyConfig
    terminationDrainDuration: 15s
global:
  variant: distroless # <- use smaller and safer distroless images!
  logAsJson: true
  defaultPodDisruptionBudget:
    enabled: true
  proxy:
    # Delay application startup until the Istio proxy sidecar is ready.
    # This avoids unnecessary errors and container restarts when outbound
    # network connectivity has not been established by the Istio proxy sidecar.
    # See: https://github.com/istio/istio/pull/24737
    holdApplicationUntilProxyStarts: true
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: "2"
        memory: 512Mi
pilot:
  autoscaleEnabled: true
  autoscaleMin: 4 # <- for higher availability in case of a node shutdown
  autoscaleMax: 6
  topologySpreadConstraints:
    # Spread across nodes
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: istiod
          istio: pilot
    # Spread across zones
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: istiod
          istio: pilot
  resources:
    requests:
      cpu: 200m
      memory: 256Mi

With this values.yaml applied to the istiod install (and upgrade), the following kubectl pod creation fails with a (new) interesting error:

Error from server: admission webhook "auto.sidecar-injector.istio.io" denied the request: failed to run injection template: template: inject:241:14: executing "inject" at <.InboundTrafficPolicyMode>: can't evaluate field InboundTrafficPolicyMode in type *inject.SidecarTemplateData

I tested this with:

kind create cluster
helm install istio-base istio/base --version 1.21.2 --namespace istio-system --create-namespace --skip-crds --set base.enableCRDTemplates=true
helm install istiod istio/istiod --version 1.21.2 --namespace istio-system --create-namespace -f values-istiod.yaml
helm upgrade --install istio-base istio/base --version 1.22.0 -n istio-system --skip-crds --set base.enableCRDTemplates=true
helm upgrade --install istiod istio/istiod --version 1.22.0 -n istio-system -f values-istiod.yaml
kubectl run nginx --image=nginx --restart=Never

@howardjohn
Copy link
Member

Missing InboundTrafficPolicyMode sounds like a mismatch between the istiod image version and the istio-sidecar-injector contents

@kaiburjack
Copy link
Author

Okay, I am sorry, now I cannot reproduce my last case anymore, myself.
I need to investigate more and find a 100% reproducible example first.

@kaiburjack
Copy link
Author

kaiburjack commented May 15, 2024

Okay, I got it. It is this Helm values override:

istio_cni:
  enabled: false
pilot:
  cni: false # <-- new in 1.22.0 (according to release notes)

The istio_cni.enabled: false was used to explicitly disable the usage of istio-cni (which we did not install). We were using (and still planning to continue using) the istio-init init container, instead of CNI with the istio-validator init container.

With this Helm override (which was also in place) we get this error now consistently for new pods:

Error from server: admission webhook "auto.sidecar-injector.istio.io" denied the request: failed to run injection template: failed applying injection overlay: unmarshal patched pod: json: cannot unmarshal string into Go struct field Probe.spec.containers.readinessProbe.failureThreshold of type int32

@kaiburjack
Copy link
Author

kaiburjack commented May 15, 2024

Oh... and it was at this moment, that I realized, that it probably should have been:

pilot:
  cni:
    enabled: false

and not just:

pilot:
  cni: false

EDIT: I just made a huge mistake when following the upgrade notes: https://istio.io/latest/news/releases/1.22.x/announcing-1.22/change-notes/#deprecation-notices (those are clear, but I followed them wrong!)

@kaiburjack
Copy link
Author

Yepp... That was it... oh my god... :)
Sorry, for all the confusion.

@howardjohn
Copy link
Member

This seems like a terrible error message for that mistake. I am going to re-open this to see if we can improve.

Appreciate all the details here!

@howardjohn howardjohn reopened this May 15, 2024
@howardjohn
Copy link
Member

Oof.

2024-05-24T00:20:42.473431Z     error   failed to process webhook config: could not parse configuration values: json: cannot unmarshal bool into Go value of type map[string]json.RawMessage

then this failure.

Similar problem as #48991

@howardjohn howardjohn self-assigned this May 24, 2024
howardjohn added a commit to howardjohn/istio that referenced this issue May 24, 2024
howardjohn added a commit to howardjohn/istio that referenced this issue May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants