Handle invalid rows in the Storage Api sink #17423

Merged: reuvenlax merged 1 commit into apache:master on May 27, 2022

reuvenlax
Contributor

No description provided.

@codecov

codecov bot commented Apr 21, 2022

Codecov Report

Merging #17423 (0c9cf43) into master (1dfab62) will increase coverage by 0.01%.
The diff coverage is n/a.

❗ Current head 0c9cf43 differs from pull request most recent head c4af119. Consider uploading reports for the commit c4af119 to get more accurate results

@@            Coverage Diff             @@
##           master   #17423      +/-   ##
==========================================
+ Coverage   73.98%   74.00%   +0.01%     
==========================================
  Files         696      696              
  Lines       91851    91851              
==========================================
+ Hits        67958    67975      +17     
+ Misses      22644    22627      -17     
  Partials     1249     1249              
Flag Coverage Δ
go 50.45% <0.00%> (ø)
python 83.75% <0.00%> (+0.02%) ⬆️

Impacted Files Coverage Δ
sdks/python/apache_beam/runners/direct/executor.py 96.46% <0.00%> (-0.55%) ⬇️
sdks/python/apache_beam/transforms/core.py 92.30% <0.00%> (ø)
...ks/python/apache_beam/runners/worker/sdk_worker.py 89.09% <0.00%> (+0.15%) ⬆️
...hon/apache_beam/runners/worker/bundle_processor.py 93.55% <0.00%> (+0.24%) ⬆️
...ks/python/apache_beam/runners/worker/data_plane.py 88.13% <0.00%> (+0.56%) ⬆️
...eam/runners/portability/fn_api_runner/execution.py 93.08% <0.00%> (+0.64%) ⬆️
...hon/apache_beam/runners/direct/test_stream_impl.py 94.02% <0.00%> (+0.74%) ⬆️
...che_beam/runners/interactive/interactive_runner.py 90.64% <0.00%> (+1.43%) ⬆️
sdks/python/apache_beam/utils/interactive_utils.py 97.56% <0.00%> (+2.43%) ⬆️
.../python/apache_beam/testing/test_stream_service.py 92.85% <0.00%> (+4.76%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@aaltay
Member

aaltay commented Apr 28, 2022

@reuvenlax - Is this ready for a review?

@reuvenlax
Contributor Author

@aaltay This depends on #17404 being merged first

@reuvenlax
Contributor Author

@aaltay This is now ready for review. Who would be the best reviewer for this?

@reuvenlax
Contributor Author

friendly ping

Contributor

@chamikaramj chamikaramj left a comment

Thanks.

    StorageApiWritePayload payload = messageConverter.toMessage(element.getValue());
    o.get(successfulWritesTag).output(KV.of(element.getKey(), payload));
  } catch (TableRowToStorageApiProto.SchemaConversionException e) {
    TableRow tableRow = messageConverter.toTableRow(element.getValue());

I'm a bit worried about pushing every message that hits an exception handler to a DLQ.

(1) This could result in errors from downstream fused steps being sent to the DLQ instead of being retried.
(2) Messages sent to a DLQ in an unintended way may be perceived as data loss by a user of the I/O connector.

I think we should build a retry policy around this (or use the existing BQ retry policy) so that users explicitly mark messages that should be sent to a DLQ.

WDYT?
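
The concern above can be illustrated with a minimal sketch, assuming plain Java with no Beam dependency: only a conversion failure diverts a row to the dead-letter output, while any other exception propagates so the runner can retry it. `DeadLetterSketch`, `Routed`, `route`, and `parseRow` are hypothetical names, and the nested `SchemaConversionException` is only a stand-in for `TableRowToStorageApiProto.SchemaConversionException`, not the real class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class DeadLetterSketch {
  /** Hypothetical stand-in for TableRowToStorageApiProto.SchemaConversionException. */
  static class SchemaConversionException extends RuntimeException {
    SchemaConversionException(String message) {
      super(message);
    }
  }

  /** Holds the two outputs: converted rows and rows diverted to the dead-letter list. */
  static class Routed<T> {
    final List<T> converted = new ArrayList<>();
    final List<String> deadLetter = new ArrayList<>();
  }

  /** Converts each row; only SchemaConversionException sends a row to the dead-letter list. */
  static <T> Routed<T> route(List<String> rows, Function<String, T> converter) {
    Routed<T> out = new Routed<>();
    for (String row : rows) {
      try {
        out.converted.add(converter.apply(row));
      } catch (SchemaConversionException e) {
        // Narrow catch: any other exception type is not handled here and propagates.
        out.deadLetter.add(row);
      }
    }
    return out;
  }

  /** Hypothetical converter: a row is valid only if it parses as an integer. */
  static Integer parseRow(String row) {
    try {
      return Integer.valueOf(row);
    } catch (NumberFormatException e) {
      throw new SchemaConversionException("not an integer: " + row);
    }
  }

  public static void main(String[] args) {
    Routed<Integer> routed = route(Arrays.asList("1", "x", "3"), DeadLetterSketch::parseRow);
    System.out.println(routed.converted + " " + routed.deadLetter); // prints "[1, 3] [x]"
  }
}
```

Because the catch clause names a single exception type, a failure thrown by unrelated code inside the `try` is not silently diverted, which is the behavior point (1) asks for.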

@reuvenlax
Contributor Author

reuvenlax commented May 20, 2022 via email

@chamikaramj
Contributor

chamikaramj commented May 20, 2022

Can we build a retry policy that only includes SchemaConversionException by default?

And can we expose this through the API, similar to the existing failedInsertRetryPolicy?

public Write<T> withFailedInsertRetryPolicy(InsertRetryPolicy retryPolicy) {
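
A rough sketch of what such a configurable policy could look like, written as plain Java with no Beam dependency. Everything here is hypothetical: `DeadLetterPolicy`, `schemaConversionOnly`, `withDeadLetterPolicy`, and `handleFailure` are illustrative names, `Write` is a toy class rather than `BigQueryIO.Write`, and nothing in this thread indicates that the merged change exposes an API of this shape.

```java
public class DeadLetterPolicySketch {
  /** Hypothetical stand-in for TableRowToStorageApiProto.SchemaConversionException. */
  static class SchemaConversionException extends RuntimeException {
    SchemaConversionException(String message) {
      super(message);
    }
  }

  /** Hypothetical policy deciding which failures go to the dead-letter queue. */
  interface DeadLetterPolicy {
    boolean shouldDeadLetter(Throwable failure);

    /** Default suggested in the review: only schema conversion failures are diverted. */
    static DeadLetterPolicy schemaConversionOnly() {
      return failure -> failure instanceof SchemaConversionException;
    }
  }

  /** Toy builder, loosely modeled on the withFailedInsertRetryPolicy style of setter. */
  static class Write {
    private DeadLetterPolicy policy = DeadLetterPolicy.schemaConversionOnly();

    Write withDeadLetterPolicy(DeadLetterPolicy policy) {
      this.policy = policy;
      return this;
    }

    /** Returns true if the failure is diverted; otherwise rethrows it so it can be retried. */
    boolean handleFailure(RuntimeException failure) {
      if (policy.shouldDeadLetter(failure)) {
        return true;
      }
      throw failure;
    }
  }

  public static void main(String[] args) {
    Write write = new Write();
    System.out.println(write.handleFailure(new SchemaConversionException("bad field"))); // prints "true"
  }
}
```

With the default policy, a user would have to opt in explicitly, for example `new Write().withDeadLetterPolicy(failure -> true)`, before any other failure type is diverted instead of retried.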

@reuvenlax
Contributor Author

reuvenlax commented May 21, 2022 via email

@chamikaramj
Contributor

Ok. Thanks for clarifying. LGTM.

+1 for making SchemaConversionException package private.

@reuvenlax
Contributor Author

Run Java PreCommit

@reuvenlax
Contributor Author

Run Kotlin_Examples PreCommit

@reuvenlax
Contributor Author

Run Java_Examples_Dataflow PreCommit

@reuvenlax reuvenlax merged commit 25039a8 into apache:master May 27, 2022
@algirdas-k

This seems to be a step toward solving BEAM-13158. Maybe the issue and this PR should be linked? Will there be other PRs regarding row-level error handling via the Storage API?

@reuvenlax
Contributor Author

reuvenlax commented Jun 2, 2022 via email

4 participants