[SPARK-39622][SQL][TESTS] Add additional error message matching for SPARK-7837 Do not close output writer twice when commitTask() fail
#37245
Conversation
SPARK-7837 Do not close output writer twice when commitTask() fail in ParquetIOSuite
cc @HyukjinKwon
also cc @srowen @dongjoon-hyun
Awesome!
c97a71a merges SPARK-39831 to pass GA
+1, LGTM code-wise.
For the title, the current one is proper as a JIRA title, but a little insufficient as a PR title. Could you revise the PR title a little more toward the proposed solution (the newly handled error message)?
Changed the title to "Add additional error message matching for SPARK-7837 Do not close output writer twice when commitTask() fail in ParquetIOSuite". @dongjoon-hyun Do you think this is OK?
Thank you. Looks much better!
This test code change adds additional assertions.
Thanks @LuciferYang for dealing with this!
Thanks all ~
…oot cause

### What changes were proposed in this pull request?
This PR follows #37245: the StageFailed event should be attached with the root cause.

### Why are the changes needed?
**It may be a good way for users to know the reason of failure.** By carefully investigating the issue https://issues.apache.org/jira/browse/SPARK-39622, I found the root cause of the test failure: StageFailed does not attach the failure reason from the executor. When OutputCommitCoordinator executes `taskCompleted`, the `reason` is ignored.

Scenario 1: receive TaskSetFailed (Success)
> InsertIntoHadoopFsRelationCommand > FileFormatWriter.write > _**handleTaskSetFailed**_ (**attaches root cause**) > abortStage > failJobAndIndependentStages > SparkListenerJobEnd

Scenario 2: receive StageFailed (Fail)
> InsertIntoHadoopFsRelationCommand > FileFormatWriter.write > _**handleStageFailed**_ (**does not attach root cause**) > abortStage > failJobAndIndependentStages > SparkListenerJobEnd

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manually ran the UT & passed GitHub Actions.

Closes #37292 from panbingkun/SPARK-39868.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
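The fix described above can be sketched in plain Scala. This is a simplified model, not actual Spark code: the type and field names below are illustrative only, standing in for the real scheduler events.

```scala
// Simplified model of the fix (illustrative names, not real Spark classes):
// before the change, the stage-failure path dropped the executor's failure
// reason; after it, the reason travels with the event to the job-end listener.
sealed trait SchedulerEvent
case class StageFailedNoCause(stageId: Int) extends SchedulerEvent
case class StageFailedWithCause(stageId: Int, reason: String) extends SchedulerEvent

def jobEndMessage(event: SchedulerEvent): String = event match {
  case StageFailedNoCause(id)        => s"Stage $id failed"        // user sees no root cause
  case StageFailedWithCause(id, why) => s"Stage $id failed: $why"  // root cause attached
}
```

The design point is simply that the `reason` received in `taskCompleted` must be carried on the event rather than ignored, so that it survives through `abortStage` to the listener.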
What changes were proposed in this pull request?
This PR adds additional assertions to fix the flaky test "SPARK-7837 Do not close output writer twice when commitTask() fail" in ParquetIOSuite.

Why are the changes needed?
Before SPARK-39195, the test "SPARK-7837 Do not close output writer twice when commitTask() fail" only handled the TaskSetFailed event, because maxTaskFailures is 1 in local mode. After SPARK-39195, in OutputCommitCoordinator#taskCompleted, the handling of `stageState.authorizedCommitters(partition) == taskId` changed from debug logging to posting a StageFailed event. The flaky test may therefore handle either the TaskSetFailed event or the StageFailed event, and the execution order of the two events is uncertain, so there may be the following two kinds of logs:

- Scenario 1 (Success)
- Scenario 2 (Failed)
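Because either event may surface first, the workaround amounts to accepting a match on either failure path. A minimal sketch of the idea in plain Scala — the message fragments below are placeholders for illustration, not the actual Spark error strings:

```scala
// Sketch of the workaround: since either TaskSetFailed or StageFailed may
// win the race, the assertion accepts a match on either expected message.
// Both fragments are placeholders, not real Spark error text.
val expectedFragments = Seq(
  "task set failed", // placeholder for the TaskSetFailed-path message
  "stage failed"     // placeholder for the StageFailed-path message
)

def matchesEitherPath(errorMessage: String): Boolean =
  expectedFragments.exists(errorMessage.toLowerCase.contains)
```

This mirrors the PR title's "additional error message matching": the test passes if the captured failure matches at least one of the expected messages.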
If maxTaskFailures were changed to >= 2, Scenario 2 would appear consistently, but the flaky test runs in local mode where maxTaskFailures is always 1, so this PR just adds additional assertions as a workaround.

Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GitHub Actions
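As context for the maxTaskFailures point above: in Spark, local mode can be given a failure count through the master URL `local[N, F]` (N worker threads, F task failures before giving up). A sketch only, not part of this PR — the actual test keeps the default local master, where maxTaskFailures is 1:

```scala
// Sketch: with maxFailures >= 2 in local mode, Scenario 2 would occur
// consistently. App name here is hypothetical.
val spark = org.apache.spark.sql.SparkSession.builder()
  .master("local[2, 2]") // 2 threads, up to 2 task failures
  .appName("SPARK-39622-repro")
  .getOrCreate()
```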