Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-14036] Read Configuration for Pub/Sub SchemaTransform #17730

Conversation

damondouglas
Copy link
Contributor

@damondouglas damondouglas commented May 22, 2022

This PR is the first of four planned PRs. It contributes to BEAM-14036 by implementing a configuration for reading from Pub/Sub. There are no dependencies on this PR.

I would like to request the following to review this PR.
R: @angoenka
R: @chamikaramj


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@asf-ci
Copy link

asf-ci commented May 22, 2022

Can one of the admins verify this patch?

2 similar comments
@asf-ci
Copy link

asf-ci commented May 22, 2022

Can one of the admins verify this patch?

@asf-ci
Copy link

asf-ci commented May 22, 2022

Can one of the admins verify this patch?

@damondouglas
Copy link
Contributor Author

Run Java PreCommit

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

* letter queue topic string.
*/
@Nullable
public abstract String getDeadLetterQueue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls change to getDeadLetterTopic to be consistent with the Pub/Sub read transform.

public Read<T> withDeadLetterTopic(String deadLetterTopic) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chamikaramj (cc: @angoenka ) Thank you again for reviewing. May we consider leaving the name getDeadLetterQueue?

In a previous use of AutoValueSchema with AutoValue in a different project, I observed it needed the getters to be named as get in order for the serialization to work. For example, I had a property called fooName. When I named the getter fooName(), the return value of fooName() was null when invoked in the context of a DoFn. However, when I changed the getter to getFooName() the return value was what I expected. I am not sure if my observation is still valid.

Adding to supporting get instead of with, the design goals of the configuration class are to hold data needed by its corresponding SchemaProvider. The method is not doing any action implied by the with preposition. The getter is simply getting data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I think the confusion here is that this property does not correspond to that PubSubIO.getDeadLetterTopic I referenced by maps to the dlq property below.

Is that correct ?

* {@link org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers}.
*/
@Nullable
public abstract String getFormat();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see format, protoClass, thriftClass attributes in the original Read config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These again map to PubSubSchemaIO not the original PubSubIO. Is that plan here to make Pub/Sub SchemaTransform just consistent with existing Pub/Sub SchemaIO ? I'm not sure.

@TheNeuralBit @dpcollins-google WDYT ? Should we just be consistent with Pub/Sub SchemaIO implementation here or should SchemaTransform be a different design ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These again map to PubSubSchemaIO not the original PubSubIO. Is that plan here to make Pub/Sub SchemaTransform just consistent with existing Pub/Sub SchemaIO ? I'm not sure.

@TheNeuralBit @dpcollins-google WDYT ? Should we just be consistent with Pub/Sub SchemaIO implementation here or should SchemaTransform be a different design ?

The plan and replicate like-for-like SchemaIO questions are critical and blocking design decisions that relates to this thread as well. I will hold off on any changes to this PR until we get the feedback needed. Thank you again.

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #17730 (408664b) into master (0c9cf43) will increase coverage by 0.01%.
The diff coverage is n/a.

❗ Current head 408664b differs from pull request most recent head 3099d60. Consider uploading reports for the commit 3099d60 to get more accurate results

@@            Coverage Diff             @@
##           master   #17730      +/-   ##
==========================================
+ Coverage   74.00%   74.01%   +0.01%     
==========================================
  Files         696      698       +2     
  Lines       91851    92224     +373     
==========================================
+ Hits        67975    68263     +288     
- Misses      22627    22710      +83     
- Partials     1249     1251       +2     
Flag Coverage Δ
go 50.93% <ø> (+0.48%) ⬆️
python 83.60% <ø> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ache_beam/runners/dataflow/dataflow_job_service.py 50.00% <0.00%> (-12.17%) ⬇️
sdks/go/pkg/beam/io/textio/textio.go 55.15% <0.00%> (-10.42%) ⬇️
...s/interactive/dataproc/dataproc_cluster_manager.py 77.41% <0.00%> (-6.80%) ⬇️
.../apache_beam/runners/interactive/dataproc/types.py 96.55% <0.00%> (-3.45%) ⬇️
sdks/python/apache_beam/dataframe/io.py 88.78% <0.00%> (-3.26%) ⬇️
sdks/go/pkg/beam/pardo.go 47.41% <0.00%> (-3.03%) ⬇️
...eam/runners/portability/fn_api_runner/fn_runner.py 87.51% <0.00%> (-2.50%) ⬇️
...nners/portability/fn_api_runner/worker_handlers.py 77.89% <0.00%> (-1.45%) ⬇️
sdks/python/apache_beam/internal/dill_pickler.py 84.89% <0.00%> (-1.44%) ⬇️
...ks/go/pkg/beam/runners/dataflow/dataflowlib/job.go 21.55% <0.00%> (-1.30%) ⬇️
... and 45 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c9cf43...3099d60. Read the comment docs.

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

* letter queue topic string.
*/
@Nullable
public abstract String getDeadLetterQueue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I think the confusion here is that this property does not correspond to that PubSubIO.getDeadLetterTopic I referenced by maps to the dlq property below.

Is that correct ?

* {@link org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers}.
*/
@Nullable
public abstract String getFormat();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These again map to PubSubSchemaIO not the original PubSubIO. Is that plan here to make Pub/Sub SchemaTransform just consistent with existing Pub/Sub SchemaIO ? I'm not sure.

@TheNeuralBit @dpcollins-google WDYT ? Should we just be consistent with Pub/Sub SchemaIO implementation here or should SchemaTransform be a different design ?

@damondouglas
Copy link
Contributor Author

@chamikaramj , @TheNeuralBit , @dpcollins-google , @pabloem If I don't see any activity on this PR in the next couple days, I will go ahead and close it to keep the PR space clean in this project.

@chamikaramj
Copy link
Contributor

I think we can just be compatible with existing SchemaIO config for now and update in the future if needed.

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@chamikaramj chamikaramj merged commit 6cfcc9d into apache:master Jun 14, 2022
bullet03 pushed a commit to akvelon/beam that referenced this pull request Jun 20, 2022
…7730)

* Read Configuration for Pub/Sub SchemaTransform

* Add idAttribute to Read Configuration

* Add Experimental annotation/remove SuppressWarning
@damondouglas damondouglas deleted the BEAM-14036-PubSubSchemaIOProviderReadConfig branch June 30, 2022 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants