New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-16423][e2e] Introduce timeouts for HA tests #11831
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,6 +20,12 @@ | |
source "$(dirname "$0")"/common.sh | ||
source "$(dirname "$0")"/common_ha.sh | ||
|
||
# | ||
# NOTE: This script requires at least Bash version >= 4. Mac OS in 2020 still ships 3.x | ||
# | ||
|
||
TEST_TIMEOUT_SECONDS=540 | ||
|
||
TEST_PROGRAM_JAR=${END_TO_END_DIR}/flink-datastream-allround-test/target/DataStreamAllroundTestProgram.jar | ||
|
||
function ha_cleanup() { | ||
|
@@ -100,4 +106,21 @@ STATE_BACKEND_FILE_ASYNC=${2:-true} | |
STATE_BACKEND_ROCKS_INCREMENTAL=${3:-false} | ||
ZOOKEEPER_VERSION=${4:-3.4} | ||
|
||
run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} | ||
function kill_test_watchdog() { | ||
local watchdog_pid=`cat $TEST_DATA_DIR/job_watchdog.pid` | ||
echo "Stopping job timeout watchdog (with pid=$watchdog_pid)" | ||
kill $watchdog_pid | ||
} | ||
on_exit kill_test_watchdog | ||
|
||
( | ||
cmdpid=$BASHPID; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, these changes require bash 4. Bash on mac is pretty outdated sadly. I think
https://bashwizard.com/shells-subshells-pids/ But I will add a warning to the script, that is requires at least bash 4 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, you are right. It's in another sub-process. I come up with a work-around way. It probably works that executing the watchdog in original main process and executing the testing scripts in sub-process. We can get the sub-process pid in main process. But I guess it's not a big deal, as MacOS is not a main scenario. I hope this PR could be merged asap. I would help a lot. |
||
(sleep $TEST_TIMEOUT_SECONDS; # set a timeout of 10 minutes for this test | ||
echo "Test (pid: $cmdpid) did not finish after $TEST_TIMEOUT_SECONDS seconds." | ||
echo "Printing Flink logs and killing it:" | ||
cat ${FLINK_DIR}/log/* | ||
kill "$cmdpid") & watchdog_pid=$! | ||
echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid | ||
run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,11 +20,16 @@ | |
source "$(dirname "$0")"/common.sh | ||
source "$(dirname "$0")"/common_ha.sh | ||
|
||
TEST_TIMEOUT_SECONDS=540 | ||
TEST_PROGRAM_JAR_NAME=DataStreamAllroundTestProgram.jar | ||
TEST_PROGRAM_JAR=${END_TO_END_DIR}/flink-datastream-allround-test/target/${TEST_PROGRAM_JAR_NAME} | ||
FLINK_LIB_DIR=${FLINK_DIR}/lib | ||
JOB_ID="00000000000000000000000000000000" | ||
|
||
# | ||
# NOTE: This script requires at least Bash version >= 4. Mac OS in 2020 still ships 3.x | ||
# | ||
|
||
function ha_cleanup() { | ||
stop_watchdogs | ||
kill_all 'StandaloneJobClusterEntryPoint' | ||
|
@@ -148,4 +153,23 @@ STATE_BACKEND_TYPE=${1:-file} | |
STATE_BACKEND_FILE_ASYNC=${2:-true} | ||
STATE_BACKEND_ROCKS_INCREMENTAL=${3:-false} | ||
|
||
run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} | ||
function kill_test_watchdog() { | ||
local watchdog_pid=`cat $TEST_DATA_DIR/job_watchdog.pid` | ||
echo "Stopping job timeout watchdog (with pid=$watchdog_pid)" | ||
kill $watchdog_pid | ||
} | ||
on_exit kill_test_watchdog | ||
|
||
( | ||
cmdpid=$BASHPID; | ||
(sleep $TEST_TIMEOUT_SECONDS; # set a timeout of 10 minutes for this test | ||
echo "Test (pid: $cmdpid) did not finish after $TEST_TIMEOUT_SECONDS seconds." | ||
echo "Printing Flink logs and killing it:" | ||
cat ${FLINK_DIR}/log/* | ||
kill "$cmdpid") & watchdog_pid=$! | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another curiosity, is there any magic of inline There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good question. I don't know if this has to be in the same line or not. Probably not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's OK for me :) |
||
echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid | ||
|
||
run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} | ||
) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a curiosity, what's the reason of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
JM_WATCHDOG_PID
is used in thestop_watchdogs
function, which is called from different bash-processes.Because of that, if the variable is set in one process, it's not available in the other one.
That's why I decided to store the PID in a file.