Flaky scale integration test or scale intermittent issue #2250

Closed
squakez opened this issue Apr 30, 2021 · 3 comments · Fixed by #2251
Labels
area/continuous integration Related to CI and automated testing

Comments

@squakez
Contributor

squakez commented Apr 30, 2021

--- FAIL: TestIntegrationScale/Scale_integration_with_Camel_K_client (60.05s)

I've run into this problem many times recently (last time here). The test sometimes fails for no apparent reason, so either the test is flaky or there is a real problem in the scaling. Either way, I think we should fix it.

@astefanutti
Member

astefanutti commented Apr 30, 2021

It seems the scaling works as expected, as one of the pods is marked as deleted after the down-scaling:

DeletionTimestamp: {
  Time: 2021-04-30T07:33:16Z,
},

But the pod is not actually deleted. Either it's just the timeout that's too short (60s), or there is an issue during the graceful shutdown of the Integration pod:

 ContainerStatuses: [
    {
        Name: "integration",
        State: {
            Waiting: {
                Reason: "ContainerCreating",
                Message: "",
            },
            Running: nil,
            Terminated: nil,
        },
        LastTerminationState: {
            Waiting: nil,
            Running: nil,
            Terminated: {
                ExitCode: 137,
                Signal: 0,
                Reason: "ContainerStatusUnknown",
                Message: "The container could not be located when the pod was deleted.  The container used to be Running",
                StartedAt: {
                    Time: 0001-01-01T00:00:00Z,
                },
                FinishedAt: {
                    Time: 0001-01-01T00:00:00Z,
                },
                ContainerID: "",
            },
        },
        Ready: false,
        RestartCount: 0,
        Image: "kind-registry:5000/test-93060b55-d350-424a-940b-ac238a56a23e/camel-k-kit-c25r4v0p0t9ikun92a80@sha256:be19aebd01eb8c08805670ff92d44f1b7c74915d4d94846d1314a01a3223999d",
        ImageID: "",
        ContainerID: "",
        Started: false,
    },
],

@astefanutti
Member

It seems related to kubernetes/kubernetes#97288.

@astefanutti
Member

If the assumption is correct, the fix is provided by kubernetes/kubernetes#97980, which has been cherry-picked into 1.20.x with kubernetes/kubernetes#97998.

Let me try bumping the Kubernetes version used in CI to 1.20.6 ...
