[FLINK-23418][e2e] Increase the timeout to make kubernetes application ha test more stable #16602

wangyang0918 · 2021-07-27T06:04:24Z

What is the purpose of the change

The test test_kubernetes_application_ha.sh can fail because the 30s timeout expires while it waits for the log line "Restoring job 00000000000000000000000000000000 from Checkpoint". I did not find any exceptions or potential bugs in Kubernetes HA or the recovery process; the JobManager simply seems to start and recover slowly. This PR increases the timeout to 120s to make the test more stable.
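For readers unfamiliar with this kind of check, the sketch below shows the general shape of "wait for a log line, give up after a timeout". It is only an illustration: the function name wait_for_log_line and its parameters are hypothetical, it reads a local file rather than pod logs, and it is not the helper used by the Flink e2e scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not Flink's own code).
# Polls a log file once per second until the given fixed string appears
# or the timeout in seconds elapses.
# Returns 0 if the line was found, 1 if the timeout expired first.
wait_for_log_line() {
    local log_file="$1"
    local pattern="$2"
    local timeout_secs="${3:-120}"   # default mirrors the 120s chosen in this PR
    local deadline=$(( SECONDS + timeout_secs ))

    while true; do
        # -F: treat the pattern as a fixed string, -q: no output, status only
        if grep -qF -- "$pattern" "$log_file" 2>/dev/null; then
            return 0
        fi
        # Check the deadline after at least one lookup, so a line that is
        # already present is found even with a very short timeout.
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 1
    done
}
```

With a 30s budget, a JobManager that needs longer to start and recover would make such a check return 1 even though recovery eventually succeeds; raising the third argument to 120 only gives it more time and does not change what is being verified.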

Brief change log

Increase the timeout of the Kubernetes application HA test from 30s to 120s to make the test more stable

Verifying this change

Already covered by test_kubernetes_application_ha.sh

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)


flinkbot · 2021-07-27T06:08:03Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 827ad0e (Sat Aug 28 12:23:45 UTC 2021)

Warnings:

No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❓ 3. Needs [attention] from.
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

flinkbot · 2021-07-27T06:21:23Z

CI report:

827ad0e Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run travis re-run the last Travis build
@flinkbot run azure re-run the last Azure build

wangyang0918 · 2021-07-27T08:06:20Z

@KarmaGYZ Could you please have a look at this simple change?

KarmaGYZ

LGTM

…n ha test more stable This closes apache#16602.

[FLINK-23418][e2e] Increase the timeout to make kubernetes applicatio…

827ad0e

…n ha test more stable

rmetzger added the review=description? label Jul 27, 2021

rmetzger added component=Deployment/Kubernetes component=Runtime/Coordination labels Jul 27, 2021

KarmaGYZ approved these changes Jul 27, 2021


wangyang0918 mentioned this pull request Jul 28, 2021

[BP-1.13][FLINK-23418][e2e] Increase the timeout to make kubernetes application ha test more stable #16613

Closed

wangyang0918 closed this in b296738 Jul 28, 2021

hhkkxxx133 pushed a commit to hhkkxxx133/flink that referenced this pull request Aug 25, 2021

[FLINK-23418][e2e] Increase the timeout to make kubernetes applicatio…

cadd8a6

…n ha test more stable This closes apache#16602.
