
Unstable OpenShift builds after bumping Kubernetes Client to 6.4 #2024

Closed
2 tasks
manusa opened this issue Jan 26, 2023 · 12 comments · Fixed by fabric8io/kubernetes-client#4899
Labels
bug Something isn't working


@manusa
Member

manusa commented Jan 26, 2023

Describe the bug

After the recent upgrade to Kubernetes Client 6.4.0 (#1988) we're getting random errors when using the OpenShift Build goal/task:

Failed to execute the build: Unable to build the image using the OpenShift build service: Can't instantiate binary build, due to error reading/writing stream. Can be caused if the output stream was closed by the server.

Eclipse JKube version

1.11-SNAPSHOT

Component

OpenShift Maven Plugin

Steps to reproduce

Perform OpenShift builds with the SNAPSHOT version, especially for big images.

Expected behavior

Builds should complete successfully.

Tasks

@manusa manusa added the bug Something isn't working label Jan 26, 2023
@manusa manusa self-assigned this Jan 26, 2023
@manusa
Member Author

manusa commented Jan 26, 2023

/cc @shawkins for awareness

@shawkins

Is this the debug issue?

@manusa
Member Author

manusa commented Jan 26, 2023

I'm out of context now, can't recall what the debug issue is. If you mean the one about the logs, no it's not the same.

In this one we're getting some sort of timeouts when sending the binary file to OpenShift. I'm preparing a demo for tomorrow that requires other things from our SNAPSHOT and couldn't further investigate the issue 😓

I'll probably spend some time on this on Monday.

@manusa
Member Author

manusa commented Feb 21, 2023

The TimeoutException stacktrace:

io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:542)
	at io.fabric8.openshift.client.dsl.internal.build.BuildConfigOperationsImpl.submitToApiServer(BuildConfigOperationsImpl.java:267)
	at io.fabric8.openshift.client.dsl.internal.build.BuildConfigOperationsImpl.fromFile(BuildConfigOperationsImpl.java:165)
	at io.fabric8.openshift.client.dsl.internal.build.BuildConfigOperationsImpl.fromFile(BuildConfigOperationsImpl.java:63)
	at io.fabric8.openshift.BuildIT.buildImage(BuildIT.java:118)
...
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:519)
	... 125 more

@manusa
Member Author

manusa commented Feb 21, 2023

It seems that the OperationSupport#waitForResult call is using the global config instead of the one provided by the context at BuildConfigOperationsImpl#submitToApiServer.

Probably introduced by fabric8io/kubernetes-client#4678
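A minimal JDK-only sketch of the suspected behavior (the names `waitForResult` and the timeout values are illustrative, not the actual fabric8 internals): a central blocking helper that applies a timeout taken from the global config reproduces the `TimeoutException` from the stack trace above whenever the operation (here, a slow binary-build upload) outlives that timeout, even though the operation itself asked to wait indefinitely.

```java
import java.util.concurrent.*;

public class WaitForResultSketch {

    // Sketch of the suspected bug: 0 conventionally means "wait indefinitely",
    // but the caller may pass a bounded timeout from the *global* config instead.
    static <T> T waitForResult(CompletableFuture<T> future, long timeoutMillis)
            throws Exception {
        if (timeoutMillis <= 0) {
            return future.get();                                      // indefinite wait
        }
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);      // bounded wait
    }

    static CompletableFuture<String> slowUpload(ExecutorService pool) {
        // Simulates a long binary-build upload (300 ms here).
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return "build-complete";
        }, pool);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Using the per-operation timeout of 0 succeeds...
        System.out.println(waitForResult(slowUpload(pool), 0));

        // ...but blocking with a short "global" timeout instead reproduces
        // the TimeoutException seen in the stack trace above.
        try {
            waitForResult(slowUpload(pool), 100);
        } catch (TimeoutException e) {
            System.out.println("TimeoutException, as in the report");
        }
        pool.shutdown();
    }
}
```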

@shawkins

@manusa there are too many paths for timeout handling... There was a change to OperationSupport to enforce the readTimeout client side, as OkHttp in particular was prone to enqueuing a request and not enforcing the timeout while enqueued. Rather than updating the HttpClient to expose its config, given the other changes we've made, it would make more sense to move this enforcement out of OperationSupport and into the standard HttpClient logic handling sendAsync.
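A JDK-only sketch of what moving the enforcement into the shared sendAsync path could look like (the `sendAsync` name and structure here are illustrative assumptions, not the actual fabric8 API): the future returned by the HTTP layer is capped with the request's own readTimeout via `CompletableFuture.orTimeout`, which fires even while the request is still enqueued, so callers like `waitForResult` no longer need to consult any config themselves.

```java
import java.util.concurrent.*;

public class SendAsyncTimeoutSketch {

    // Hypothetical sendAsync wrapper: applies the per-request readTimeout
    // at the HTTP-client layer. 0 means no client-side timeout.
    static <T> CompletableFuture<T> sendAsync(CompletableFuture<T> rawResponse,
                                              long readTimeoutMillis) {
        if (readTimeoutMillis <= 0) {
            return rawResponse;
        }
        // orTimeout completes the future exceptionally with TimeoutException
        // once the deadline passes, regardless of the request's state.
        return rawResponse.orTimeout(readTimeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // A response future that never completes, simulating a stalled request.
        CompletableFuture<String> slow = new CompletableFuture<>();
        sendAsync(slow, 50)
            .exceptionally(t -> "failed: " + t.getClass().getSimpleName())
            .thenAccept(System.out::println)
            .join();
        // prints: failed: TimeoutException
    }
}
```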

@manusa
Member Author

manusa commented Feb 21, 2023

I'm working on the fix; it really doesn't make sense that submitToApiServer calls the waitForResult method, since it should wait indefinitely.
I'll be sending a PR tomorrow morning (CET).

@shawkins

shawkins commented Feb 21, 2023

I'm working on the fix; it really doesn't make sense that submitToApiServer calls the waitForResult method, since it should wait indefinitely.

It's no different than the other paths that call sendAsync - it's a blocking operation that is expected to complete within some timeout. The issue is that waitForResult is referencing the default timeout value - and that's a problem we could have with any of the other code paths (none of which are currently affected) if we don't move (or remove) the OperationSupport enforcement. You could make the case for removal if we believe that the changes to the okhttp threading defaults are sufficient for addressing this and that it's unlikely to be a problem with the other httpclients.

@manusa
Member Author

manusa commented Feb 21, 2023

it's a blocking operation that is expected to complete within some timeout

I'm not sure there is a specific timeout value in this case. What should we default this value to? This is an OpenShift S2i build, I'm not aware of any restrictions in terms of timeout for these scenarios. And I would rather not put myself in the position of imposing one arbitrary value. IMO the timeout should happen server-side.

@shawkins

I'm not sure there is a specific timeout value in this case.

It defaults to 0, but is settable by the user here https://github.com/fabric8io/kubernetes-client/blob/1d3e263cc6216f4a3adad6b677539186179f41c2/openshift-client/src/main/java/io/fabric8/openshift/client/dsl/internal/build/BuildConfigOperationsImpl.java#L244

It's then set on the httpclient https://github.com/fabric8io/kubernetes-client/blob/1d3e263cc6216f4a3adad6b677539186179f41c2/openshift-client/src/main/java/io/fabric8/openshift/client/dsl/internal/build/BuildConfigOperationsImpl.java#L260 and in the following line.

Setting the readTimeout overlaps with the behavior of RequestConfig.getRequestTimeout.

IMO the timeout should happen server-side.

Just to make sure we're on the same page: in all the places where we manipulate the HttpClient read/write timeouts, the expectation is that they will be enforced by the HttpClient. In particular, the write timeout is not enforced by the JDK client; it is enforced as the sum of the read and write timeouts by Jetty (which may confuse the meaning of either being a 0/indefinite value), and as an idle write timeout on Vert.x (which seems incorrect at first glance). Given the state of things, I'd definitely be in favor of getting rid of this write timeout from the HttpClient. I'm not sure how consistently the readTimeout is treated - we know it won't be enforced by OkHttp while the request is enqueued.

@manusa
Member Author

manusa commented Feb 22, 2023

It defaults to 0, but is settable by the user here https://github.com/fabric8io/kubernetes-client/blob/1d3e263cc6216f4a3adad6b677539186179f41c2/openshift-client/src/main/java/io/fabric8/openshift/client/dsl/internal/build/BuildConfigOperationsImpl.java#L244

OK, I thought that these values were not applicable to this scenario. Then waitForResult needs some refactoring and should probably be moved elsewhere. I'll update my changes.

IMO the timeout should happen server-side.

This statement refers exclusively to the S2I binary build scenario, more specifically to the case where the user doesn't provide a timeout. Meaning that our client-side read/write timeouts should default to 0 (run indefinitely).

Setting the readTimeout overlaps with the behavior of RequestConfig.getRequestTimeout.

I'm not even sure what requestTimeout stands for. I understand connection timeout and read timeout, and I can try to understand write timeout. So possible interpretations are: a) the sum of the values (IMO wrong), b) the minimum of connect+read and connect+write, c) something else.

I think that, to make things consistent, RequestConfig should admit customized values for connect and read (and maybe write, if we keep this setting), instead of a totally new field.
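A small sketch of that suggestion (hypothetical types, not the real fabric8 RequestConfig): per-request overrides for the existing connect and read timeouts, rather than a separate requestTimeout field, let a binary S2I build opt out of the client-side read timeout while keeping the connect default.

```java
public class RequestConfigSketch {

    // Hypothetical per-request timeout holder; 0 means "wait indefinitely".
    record RequestTimeouts(long connectMillis, long readMillis) {
        RequestTimeouts withReadMillis(long readMillis) {
            return new RequestTimeouts(connectMillis, readMillis);
        }
    }

    public static void main(String[] args) {
        RequestTimeouts defaults = new RequestTimeouts(10_000, 30_000);
        // A binary S2I build opts out of the client-side read timeout.
        RequestTimeouts binaryBuild = defaults.withReadMillis(0);
        System.out.println(binaryBuild.readMillis());    // read timeout overridden to 0
        System.out.println(binaryBuild.connectMillis()); // connect default kept
    }
}
```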

Given the state of things, I'd definitely be in favor of getting rid of this write timeout from the HttpClient.

+1

@shawkins

OK, I thought that these values were not applicable to this scenario. Then waitForResult needs some refactoring and should probably be moved elsewhere. I'll update my changes.

Prior to the optional enforcement of the timeout, waitForResult was simply the central method for making calls blocking, so that the same exception handling did not end up in every blocking location.

I'm not even sure what requestTimeout stands for.

Prior to adding the waitForResult check, which more closely matches the full request cycle, it was treated as the default read timeout.
