Improve pipe status when the pod is full of exceptions #4977

Closed
matzew opened this issue Dec 7, 2023 · 6 comments · Fixed by #5020
Labels
area/observability Logging, monitoring and tracing

@matzew
Member

matzew commented Dec 7, 2023

Requirement

As a developer, I want k get pipes.camel.apache.org to show whether things are OK.

Problem

I have a broken Kamelet in a pipe; see apache/camel-kamelets#1785 for details.

The pod for that aws-s3-source-pipe is not working, since it is full of exceptions.

Now, checking the pipe status:

k get pipes.camel.apache.org 
NAME                 PHASE   REPLICAS
aws-s3-source-pipe   Ready   1

In fact the pipe is not ready, since its pod is throwing exceptions. It should be set to ready:false, with a reason, like other kube resources do.

Some text from the exception could be used as the reason.

But to learn more about the error I have to check the pod's log; I cannot see it at a kube-native level (e.g. via CRs).

Proposal

Reflect the exception: do not set the status to READY, but report that the pipe is NOT ready, with the exception details as the reason.
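To make the request concrete, here is a minimal Python sketch of how a client could read such a status. It assumes the Pipe exposes the usual Kubernetes condition layout (status.conditions entries with type, status, reason and message). The ready_summary helper and the sample values, including the message text, are hypothetical and not taken from Camel K.

```python
from typing import Any


def ready_summary(resource: dict[str, Any]) -> tuple[bool, str]:
    """Return (is_ready, explanation) based on the resource's Ready condition."""
    conditions = resource.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == "Ready":
            # Kubernetes conditions carry their status as the strings "True"/"False"/"Unknown".
            is_ready = cond.get("status") == "True"
            parts = (cond.get("reason", ""), cond.get("message", ""))
            return is_ready, ": ".join(p for p in parts if p)
    # No Ready condition at all is treated as not ready.
    return False, "no Ready condition reported"


# Hypothetical status of the kind the proposal asks for (values are made up).
pipe = {
    "metadata": {"name": "aws-s3-source-pipe"},
    "status": {
        "conditions": [
            {
                "type": "Ready",
                "status": "False",
                "reason": "Error",
                "message": "pod is in CrashLoopBackOff",
            }
        ]
    },
}

print(ready_summary(pipe))
```

With a status like this, the failure is visible from the custom resource itself, without reading the pod log.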

Open questions

No response

matzew added the kind/feature label Dec 7, 2023
@squakez
Contributor

squakez commented Dec 7, 2023

I think the error is cascaded from the Integration (the custom resource in charge of running the application) to the Pipe. We'll have a look to make sure this is still the case. Thanks for reporting.

squakez added the kind/bug and area/kamelets labels and removed the kind/feature label Dec 7, 2023
@squakez
Contributor

squakez commented Jan 4, 2024

I had a look and this is what is happening. The Pipe creates an Integration, which eventually spins up a Pod. If the Pod fails for certain reasons, it is restarted:

k get pods
NAME                                  READY   STATUS             RESTARTS      AGE
aws-s3-source-pipe-7568b69c9b-q9gzx   0/1     CrashLoopBackOff   6 (42s ago)   6m31s

So the Integration, and therefore the Pipe, spends a very short period in the Running state, likely a few seconds while the Pod is trying to restart, but eventually turns to the correct error state:

$ k get pipe -w
NAME                 PHASE   REPLICAS
aws-s3-source-pipe   Error   1
aws-s3-source-pipe   Ready   1
aws-s3-source-pipe   Error   1
aws-s3-source-pipe   Ready   1
aws-s3-source-pipe   Error   1
aws-s3-source-pipe   Ready   1
aws-s3-source-pipe   Error   1
aws-s3-source-pipe   Ready   1
aws-s3-source-pipe   Error   1

I think this is the expected behavior, in the sense that the Integration is running successfully until the Pod crashes. I am not sure whether we can change this behavior, or whether that would make sense, considering that the Pipe and Integration eventually set their status correctly. Wdyt @lburgazzoli?

squakez added the area/observability label and removed the kind/bug and area/kamelets labels Jan 4, 2024
squakez added a commit to squakez/camel-k that referenced this issue Jan 4, 2024
When the user uses a startup probe, the Integration won't turn as running until the condition is reached

Closes apache#4977
@squakez
Contributor

squakez commented Jan 4, 2024

I applied a fix that should close this issue. However, it only works when the user defines a readiness probe (via the health trait, for instance). The Integration (hence the Pipe) won't be moved to the running phase until the Pod ready condition is reached.
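The difference between the flapping seen earlier and the readiness-gated behavior can be illustrated with a toy model. This is only a sketch under stated assumptions, not Camel K's actual controller code: PodObservation, naive_phase, gated_phase and the "NotReady" label are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PodObservation:
    container_running: bool  # the container process has started
    ready: bool              # the Pod's Ready condition is True (probe passed)


def naive_phase(obs: PodObservation) -> str:
    # Treats a started container as success, regardless of readiness.
    return "Ready" if obs.container_running else "Error"


def gated_phase(obs: PodObservation) -> str:
    # Reports Ready only once the Pod's Ready condition has been reached.
    if obs.ready:
        return "Ready"
    return "NotReady" if obs.container_running else "Error"


# A crash loop: the container starts, never passes its probe, then crashes, repeatedly.
crash_loop = [
    PodObservation(container_running=True, ready=False),
    PodObservation(container_running=False, ready=False),
    PodObservation(container_running=True, ready=False),
    PodObservation(container_running=False, ready=False),
]

print([naive_phase(o) for o in crash_loop])
print([gated_phase(o) for o in crash_loop])
```

In this model the naive mapping alternates between Ready and Error, while the gated mapping never reports Ready for a Pod that never becomes ready. The gating depends on a probe being defined, which matches the caveat above.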

squakez added a commit to squakez/camel-k that referenced this issue Jan 5, 2024
When the user uses a startup probe, the Integration won't turn as running until the condition is reached

Closes apache#4977
squakez added a commit to squakez/camel-k that referenced this issue Jan 8, 2024
When the user uses a startup probe, the Integration won't turn as running until the condition is reached

Closes apache#4977
squakez added this to the 2.3.0 milestone Jan 8, 2024
@lburgazzoli
Contributor

> I think this is the expected behavior, in the sense that the Integration is running successfully until the Pod crashes. I am not sure whether we can change this behavior, or whether that would make sense, considering that the Pipe and Integration eventually set their status correctly. Wdyt @lburgazzoli?

I think it all depends on the reason the Pod fails, e.g.:

  • in the past the pod kept failing until some of the Knative env vars were injected; in that case we should probably improve the handling
  • if the pod crashes because of, e.g., a missing required parameter, then of course there is nothing we can really do (except maybe enabling route supervising by default for pipes)

What was the reason for this Ready -> Error loop?

@squakez
Contributor

squakez commented Jan 8, 2024

@lburgazzoli no, it was more of a general problem with the order in which we were monitoring the Integrations. I found the root cause and applied a fix, which is under review right now. Thanks.

@lburgazzoli
Contributor

Ah OK, I'm going through my backlog of notifications, so I'm not fully up to date yet :)

squakez added a commit that referenced this issue Jan 8, 2024
When the user uses a startup probe, the Integration won't turn as running until the condition is reached

Closes #4977