-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-9257][E2E Tests] Fix wrong "All tests pass" message #6053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-9257][E2E Tests] Fix wrong "All tests pass" message #6053
Conversation
8f404f9
to
5e2b8f5
Compare
@zentol Maybe you could have a look at this? |
I'll take a look tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @florianschmidt1994.
I've left some comments. Would be great if @zentol also takes a look.
exit 1 | ||
fi | ||
|
||
source "$(dirname "$0")"/test-scripts/common.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can now be removed?
|
||
check_logs_for_errors | ||
check_logs_for_exceptions | ||
check_logs_for_non_empty_out_files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move all these methods:
start_timer
end_timer
check_logs_for_errors
check_logs_for_exceptions
check_logs_for_non_empty_out_files
to test-runner-common.sh
since that's the only place they are used anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been discussing about changing the semantics at some point to leave it up to each individual test case to check the logs for errors and drop it from the test runner, maybe even with a whitelist / blacklist approach of expected exceptions. If we want to go that way I'd say leave it in common.sh
We could also say we're probably gonna stick with the current approach for a while, then I'd say let's move them to test-runner-common.sh
kill ${watchdog_pid} 2> /dev/null | ||
wait ${watchdog_pid} 2> /dev/null | ||
# | ||
cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test_local_recovery_and_scheduling
test currently bundles several executions of the test (e.g. with different state backend configurations) in a single run of the test script. That's why it required this cleanup within the test itself.
How would the change of this PR affect this?
In general, should we also restructure e2e tests so that each execution configuration variant should be executed with the test-runner-cleanup#run_test
method (instead of cleaning up itself in-between executions)?
AFAIK, only the test_local_recovery_and_scheduling
does this at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be a concern anymore with the new changed where each configuration of test_local_recovery_and_scheduling
is its own test-case?
run_test "Streaming SQL end-to-end test" "$END_TO_END_DIR/test-scripts/test_streaming_sql.sh" | ||
run_test "Streaming bucketing end-to-end test" "$END_TO_END_DIR/test-scripts/test_streaming_bucketing.sh" | ||
run_test "Stateful stream job upgrade end-to-end test" "$END_TO_END_DIR/test-scripts/test_stateful_stream_job_upgrade.sh 2 4" | ||
run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test currently performs log verifications and cleanups within a single execution of the test script, since it specifies multiple executions with different state backend configurations.
Should we break this up, so that each configuration variant is explicitly executed by the run_test
method (like what we currently do with the savepoint / externalized checkpoint tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to having each execution as a separate test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR introduces a new way to track failure states (test_has_errors
variable initialized as 1, check for empty string), I'd prefer sticking to the existing pattern (EXIT_CODE
variable initialized as 0, check for inequality to 0) for consistency.
f7f6471
to
4b19431
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found 1 small thing. Could you rebase the PR (feel free to squash everything beforehand)?
fi | ||
|
||
source "$(dirname "$0")"/test-scripts/common.sh | ||
source "$(dirname "$0")"/test-scripts/test-runner-common.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use END_TO_END_DIR
instead
Oh you already rebased it in the mean-time. neat. |
run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks false true" | ||
run_test "Local recovery and sticky scheduling end-to-end test" "$END_TO_END_DIR/test-scripts/test_local_recovery_and_scheduling.sh 4 10 rocks true true" | ||
|
||
run_test "Quickstarts nightly end-to-end test" "$END_TO_END_DIR/test-scripts/test_quickstarts.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be here, subsumed by the java/scala quickstart calls
incremental checkpoints: ${incremental} | ||
kill JVM: ${kill_jvm}" | ||
|
||
TEST_PROGRAM_JAR=$TEST_INFRA_DIR/../../flink-end-to-end-tests/flink-local-recovery-and-allocation-test/target/StickyAllocationAndLocalRecoveryTestJob.jar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use END_TO_END_DIR
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only works if we export END_TO_END_DIR in run_nightly/precommit_tests.sh, but I'll just add that there as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well we already did export it there, but you removed it :P
cleanup | ||
|
||
if [[ ${exit_code} == 0 ]]; then | ||
if [[ ! "$PASS" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we aren't using pass anymore?
|
||
cleanup | ||
|
||
if [[ ${exit_code} == 0 ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I mistaken or didn't you already change this to EXIT_CODE
? Did maybe something go wrong during the rebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh damn. Something might have gone wrong here...
I'll look into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that is still in there b.c. I left the check_logs_for_non_empty_out_files
etc. untouched, which again use the PASS
thing to signal whether or not a test case should fail.
I'll go ahead and change this to our new convention as well.
At least it's nothing went wrong during the rebase 😅
b62e14c
to
a114667
Compare
Looks good to me, let's see what travis says. |
@zentol Looks like travis likes it 🙂 |
Travis logs excerpt looks good. Follow up commits looks good. |
merging. |
What is the purpose of the change
Fix a wrongly printed "All tests PASS" message when they actually don't"
Previously the behaviour was like this:
With this PR the whole test-runner is restructured so that
run_test
methodPASS
now do so by exiting with non-zero exit codecheck_result_hash
exits with 1 instead of modifyingPASS
Additionally this PR
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation