Support for Flink's SpeculativeExecution in batch execution mode - Backport of PR #10548 #10776

venkata91 · 2024-07-24T17:38:12Z

Summary

Add support for Flink's Speculative Execution in batch execution mode

Details

Backporting from 1.19 to 1.18 required manual conflict resolution, but only in the tests. The changes are as follows:

- Instead of getRuntimeContext().getTaskInfo().getIndexOfThisSubtask() and getRuntimeContext().getTaskInfo().getAttemptNumber(), use getRuntimeContext().getIndexOfThisSubtask() and getRuntimeContext().getAttemptNumber().
- Add a dropDatabase() helper method in TestBase, which required adding DEFAULT_CATALOG_NAME as a constant in FlinkCatalogFactory.
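
As a rough illustration of the first point, the difference in call shape can be sketched as below. The nested interfaces (TaskInfoLike, Context119Like, Context118Like) and the describe helpers are hypothetical stand-ins that only copy the accessor names quoted above; they are not Flink's real RuntimeContext or TaskInfo types, and this is not the test code from the PR.

```java
/**
 * Sketch only: the nested interfaces are hypothetical stand-ins that copy the
 * accessor names quoted in the PR description. They are not Flink's real types.
 */
public class SubtaskAccessorSketch {

    /** Stand-in for the object returned by getTaskInfo() in the 1.19 test code. */
    public interface TaskInfoLike {
        int getIndexOfThisSubtask();
        int getAttemptNumber();
    }

    /** Stand-in for the 1.19-style context: values are reached through getTaskInfo(). */
    public interface Context119Like {
        TaskInfoLike getTaskInfo();
    }

    /** Stand-in for the 1.17/1.18-style context: values sit directly on the context. */
    public interface Context118Like {
        int getIndexOfThisSubtask();
        int getAttemptNumber();
    }

    /** Call shape used on the 1.19 side. */
    public static String describe119(Context119Like ctx) {
        return "subtask=" + ctx.getTaskInfo().getIndexOfThisSubtask()
            + " attempt=" + ctx.getTaskInfo().getAttemptNumber();
    }

    /** Call shape used in the 1.17 and 1.18 backports. */
    public static String describe118(Context118Like ctx) {
        return "subtask=" + ctx.getIndexOfThisSubtask()
            + " attempt=" + ctx.getAttemptNumber();
    }

    public static void main(String[] args) {
        TaskInfoLike info = new TaskInfoLike() {
            @Override public int getIndexOfThisSubtask() { return 0; }
            @Override public int getAttemptNumber() { return 1; }
        };
        // Context119Like has a single abstract method, so a lambda is enough here.
        Context119Like newer = () -> info;
        Context118Like older = new Context118Like() {
            @Override public int getIndexOfThisSubtask() { return 0; }
            @Override public int getAttemptNumber() { return 1; }
        };
        // Both calls report the same subtask index and attempt number.
        System.out.println(describe119(newer));
        System.out.println(describe118(older));
    }
}
```

Only the access path differs between the two shapes; the values read (subtask index and attempt number) are the same, which is presumably why the conflict could be resolved by hand in the tests.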

Backport from 1.18 to 1.17 went through cleanly.

Testing

Added an integration test to verify the expected speculative execution behavior with IcebergSource

venkata91 · 2024-07-24T19:45:01Z

cc @pvary for review.

pvary · 2024-07-24T19:46:39Z

The 1st point is fine.
@rodmeneses: How did we handle the drop database stuff in previous Flink releases?

Thanks, Peter

rodmeneses · 2024-07-24T21:21:50Z

> The 1st point is fine. @rodmeneses: How did we handle the drop database stuff in previous Flink releases?
>
> Thanks, Peter

We didn't have dropDatabase on 1.17 and 1.18. Only dropCatalog existed there.

pvary · 2024-07-26T13:14:23Z

@rodmeneses: IIRC we faced an issue in the tests when we were introducing Flink 1.19. This was solved by adding the dropDatabase to the tests. We found that it would be good to have a better cleanup method for every test that uses FlinkSQL, but decided against adding that along with the 1.19 PR. This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

If my recollection above is correct, then we have 2 tasks ahead of us:

1. This PR doesn't need to add the dropDatabase related changes to the backports
2. We still need to come up with a better cleanup for the SQL tests

@rodmeneses: Could you please confirm?

Thanks,
Peter

rodmeneses · 2024-07-26T16:50:54Z

> @rodmeneses: IIRC we faced an issue in the tests when we were introducing Flink 1.19. This was solved by adding the dropDatabase to the tests. We found that it would be good to have a better cleanup method for every test that uses FlinkSQL, but decided against adding that along with the 1.19 PR. This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.
>
> If my recollection above is correct, then we have 2 tasks ahead of us:
>
> 1. This PR doesn't need to add the dropDatabase related changes to the backports
> 2. We still need to come up with a better cleanup for the SQL tests
>
> @rodmeneses: Could you please confirm?
>
> Thanks, Peter

That is correct, @pvary. We discussed having more thorough cleanup logic for all FlinkSQL-related unit tests, but I didn't have the time, nor did I want to keep adding complexity to my previous PR.

venkata91 · 2024-07-26T17:29:18Z

> This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.

@pvary Can you clarify what you mean by the "inconsistencies" here?

pvary · 2024-07-29T07:38:16Z

> > This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.
>
> @pvary Can you clarify what you mean by the "inconsistencies" here?

Differences would have been a better word.

Based on the discussion above: @venkata91: Could you please remove the dropDatabase from the tests for now? Would the tests work without it?

Thanks,
Peter

venkata91 · 2024-07-29T20:27:51Z

> > > This basically resulted in "inconsistencies" between the 1.17-1.18/1.19 tests.
> >
> > @pvary Can you clarify what you mean by the "inconsistencies" here?
>
> Differences would have been a better word.
>
> Based on the discussion above: @venkata91: Could you please remove the dropDatabase from the tests for now? Would the tests work without it?
>
> Thanks, Peter

Okay, I removed dropDatabase from TestBase for 1.17 and 1.18; the tests now drop the database as part of after().

venkata91 · 2024-07-30T21:20:40Z

@pvary Gentle ping

pvary · 2024-07-31T10:31:30Z

@venkata91: Could you please check the failure?
Otherwise LGTM

venkata91 · 2024-07-31T16:28:01Z

> @venkata91: Could you please check the failure? Otherwise LGTM

@pvary I looked at it earlier, and it seems to be unrelated. The test failure is in org.apache.iceberg.flink.TestFlinkAnonymousTable, and the test passed when I ran it locally. Could it be a transient test failure?

venkata91 · 2024-07-31T16:32:58Z

@pvary Btw, I merged the latest changes from main into my branch to trigger the tests again. If it is a transient issue, this should hopefully resolve it.

venkata91 · 2024-07-31T17:38:02Z

> @pvary Btw, I merged the latest changes from main into my branch to trigger the tests again. If it is a transient issue, this should hopefully resolve it.

@pvary Looks like all the checks have passed now.

pvary · 2024-08-01T07:23:16Z

Merged to main.
Thanks @venkata91 for the backport!

venkata91 added 2 commits July 24, 2024 10:23

Support for Flink's SpeculativeExecution in batch execution mode -

67dc060

backport from 1.19 with some changes

Support for Flink's SpeculativeExecution in batch execution mode - Ba…

84ef71f

…ckport of PR apache#10548

github-actions bot added the flink label Jul 24, 2024

Address review comments to remove dropDatabase

3c4c34a

venkata91 force-pushed the vsowrira/spec-exec-1.18 branch from cf6e44d to 3c4c34a Compare July 29, 2024 20:27

Merge branch 'main' into vsowrira/spec-exec-1.18

c20048b

pvary approved these changes Aug 1, 2024

pvary merged commit 84c9125 into apache:main Aug 1, 2024
24 checks passed
