[BEAM-12769] Adds support for expanding a Java cross-language transform using the class name and builder methods #15343

Merged: 11 commits merged into apache:master on Sep 4, 2021

chamikaramj
Contributor

This adds proto and Java updates.
Python updates to follow.

Please see here for the design: https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing


@chamikaramj
Contributor Author

R: @robertwb

@codecov

codecov bot commented Aug 17, 2021

Codecov Report

Merging #15343 (8c75b93) into master (bd3649e) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #15343      +/-   ##
==========================================
- Coverage   83.77%   83.75%   -0.03%     
==========================================
  Files         442      443       +1     
  Lines       60050    60077      +27     
==========================================
+ Hits        50308    50318      +10     
- Misses       9742     9759      +17     
Impacted Files Coverage Δ
...hon/apache_beam/runners/direct/test_stream_impl.py 94.02% <0.00%> (-2.24%) ⬇️
.../python/apache_beam/transforms/periodicsequence.py 96.72% <0.00%> (-1.64%) ⬇️
sdks/python/apache_beam/io/source_test_utils.py 88.47% <0.00%> (-1.39%) ⬇️
sdks/python/apache_beam/io/localfilesystem.py 91.47% <0.00%> (-0.78%) ⬇️
...apache_beam/runners/dataflow/internal/apiclient.py 76.46% <0.00%> (-0.34%) ⬇️
sdks/python/apache_beam/transforms/util.py 95.81% <0.00%> (-0.17%) ⬇️
setup.py 0.00% <0.00%> (ø)
...he_beam/portability/api/external_transforms_pb2.py 100.00% <0.00%> (ø)
...am/portability/api/external_transforms_pb2_urns.py 0.00% <0.00%> (ø)
...hon/apache_beam/runners/worker/bundle_processor.py 93.64% <0.00%> (+0.12%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chamikaramj
Contributor Author

cc: @lukecwik

@chamikaramj
Contributor Author

Added the following based on the discussions in the doc and on the mailing list.

  • Added a YAML-based allowlist for classes/methods.
  • Added two annotations so that transform authors can customize constructor/builder method names offered.
  • Automatically simplifies the common builder pattern "withXyz" by also allowing the format "xyz".
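
The "withXyz"/"xyz" rule in the last item can be illustrated with a small plain-Java sketch. This is not the code from this PR: `BuilderMethodNames` and `candidates` are hypothetical names, and the order in which the candidate names are tried is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the Beam API. */
final class BuilderMethodNames {
  private BuilderMethodNames() {}

  /** Returns the Java method names to try for a requested builder name, in order. */
  static List<String> candidates(String requested) {
    List<String> names = new ArrayList<>();
    // The name exactly as given is always tried first.
    names.add(requested);
    // A name such as "withXyz" already follows the builder pattern.
    boolean alreadyWithStyle =
        requested.length() > 4
            && requested.startsWith("with")
            && Character.isUpperCase(requested.charAt(4));
    if (!requested.isEmpty() && !alreadyWithStyle) {
      // "xyz" also maps to "withXyz".
      names.add("with" + Character.toUpperCase(requested.charAt(0)) + requested.substring(1));
    }
    return names;
  }
}
```

With this sketch, a request for "xyz" yields the candidates "xyz" and "withXyz", while a request for "withXyz" is left unchanged.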

PTAL

@chamikaramj
Contributor Author

Friendly ping :)

All tests pass now.

@chamikaramj
Contributor Author

R: @ihji as well

Member

@lukecwik lukecwik left a comment

Does this work with complex types for method parameters like List and/or MyCustomUserType that can be loaded via Schema?

Contributor Author

@chamikaramj chamikaramj left a comment

Thanks. PTAL.

@chamikaramj
Contributor Author

Regarding the "complex types" question: yes, this is expected to work for all types that can be represented by a Beam Row+schema.

model/pipeline/src/main/proto/external_transforms.proto (outdated, resolved)
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

public interface ExpansionServiceOptions extends PipelineOptions {
Member

Please add an interface comment.

Contributor Author

Done.


void setJavaClassLookupAllowlistFile(String file);

class JavaClassLookupAllowListFactory implements DefaultValueFactory<AllowList> {
Member

Please add a class comment, e.g.:

Loads the allow list from {@link #getJavaClassLookupAllowlistFile}, defaulting to an empty AllowList.

Contributor Author

Done.

"intField1"));
payloadBuilder.addBuilderMethods(builderMethodBuilder);

testClassLookupExpansionRequestConstruction(payloadBuilder.build());
Member

Check that strField1, strField2 and intField1 were set.

Contributor Author

Done.

"intField1"));
payloadBuilder.addBuilderMethods(builderMethodBuilder);

testClassLookupExpansionRequestConstruction(payloadBuilder.build());
Member

Check that strField1, strField2 and intField1 were set.

Contributor Author

Done.

"intField1"));
payloadBuilder.addBuilderMethods(builderMethodBuilder);

testClassLookupExpansionRequestConstruction(payloadBuilder.build());
Member

Check that strField1, strField2 and intField1 were set.

Contributor Author

Done.

"intField1"));
payloadBuilder.addBuilderMethods(builderMethodBuilder);

testClassLookupExpansionRequestConstruction(payloadBuilder.build());
Member

Check that strField1, abc and xyz were set.

Contributor Author

Done.


public static class DummyTransform extends PTransform<PBegin, PCollection<String>> {

String strField1;
Member

Add tests for a wrapper type (e.g. a Double field), a complex type (not a String, primitive, or wrapper), a list of a simple type, and a list of a complex type.

Contributor Author

Thanks. I realized that we do not support some types yet. I added support for arrays.

Now we support:

  • Java primitive types and Strings.
  • Java types that can be represented by a Beam schema (if a schema has not
    been registered, the service will try to generate one using
    'JavaFieldSchema').
  • Arrays of supported types.

We do not support collections as top-level parameters yet. This can be added in the future.

I clarified what's supported in the spec and added tests for complex types and arrays.

Contributor Author

Added support for lists as well.

Contributor Author

@chamikaramj chamikaramj left a comment

Thanks. PTAL.

public static final String ALLOW_LIST_VERSION = "v1";

public JavaClassLookupTransformProvider(AllowList allowList) {
if (!allowList.getVersion().equals(ALLOW_LIST_VERSION)) {
Contributor Author

This will allow us to modify the format of the allow-list in the future while supporting old versions.
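
A minimal plain-Java sketch of the version check quoted above, to show how a versioned allow list can be rejected early. The class names `VersionedProviderSketch` and its nested `AllowList` are stand-ins invented for this illustration, and the exception type is an assumption; only the constant `ALLOW_LIST_VERSION = "v1"` and the `getVersion()` comparison come from the quoted diff.

```java
/** Illustrative stand-in for a provider that accepts only a known allow-list format. */
final class VersionedProviderSketch {
  static final String ALLOW_LIST_VERSION = "v1";

  /** Stand-in for the allow list loaded from a file; only the version is modeled here. */
  static final class AllowList {
    private final String version;

    AllowList(String version) {
      this.version = version;
    }

    String getVersion() {
      return version;
    }
  }

  private final AllowList allowList;

  VersionedProviderSketch(AllowList allowList) {
    // Fail fast on an unknown format instead of misreading its contents later.
    if (!ALLOW_LIST_VERSION.equals(allowList.getVersion())) {
      throw new IllegalArgumentException(
          "Unsupported allow list version "
              + allowList.getVersion()
              + "; expected "
              + ALLOW_LIST_VERSION);
    }
    this.allowList = allowList;
  }

  AllowList getAllowList() {
    return allowList;
  }
}
```

If a later format is introduced, a check like this is the natural place to branch on the version and keep reading older files.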


if (matchingMethods.size() != 1) {
throw new RuntimeException(
"Expected to find exact one matching method in transform "
Contributor Author

Done.
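
The "exactly one matching method" rule in the quoted snippet can be sketched with plain reflection. This is an illustration under assumptions, not the PR's implementation: `MethodLookupSketch` and `findUnique` are hypothetical names, and matching on method name plus parameter count is a simplification chosen for the example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that refuses to guess between overloads. */
final class MethodLookupSketch {
  private MethodLookupSketch() {}

  /** Returns the single public method with the given name and parameter count. */
  static Method findUnique(Class<?> clazz, String name, int parameterCount) {
    List<Method> matchingMethods = new ArrayList<>();
    for (Method method : clazz.getMethods()) {
      if (method.getName().equals(name) && method.getParameterCount() == parameterCount) {
        matchingMethods.add(method);
      }
    }
    // Zero matches means the request is wrong; several matches means it is ambiguous.
    if (matchingMethods.size() != 1) {
      throw new RuntimeException(
          "Expected to find exactly one matching method in "
              + clazz.getName()
              + " for name "
              + name
              + " but found "
              + matchingMethods.size());
    }
    return matchingMethods.get(0);
  }
}
```

For example, `findUnique(String.class, "isEmpty", 0)` succeeds, while `findUnique(String.class, "valueOf", 1)` fails because `String` has several one-argument `valueOf` overloads.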

.build(),
"strField1"));

testClassLookupExpansionRequestConstruction(payloadBuilder.build());
Contributor Author

Done.

…and each builder method.

Updated the implementation accordingly and added additional tests.
Contributor Author

@chamikaramj chamikaramj left a comment

As discussed offline, I simplified the proto and the implementation by removing the "Parameter" message and defining a single schema/payload for the constructor and each builder method.

This allows us to support more types in a generic way.

Added tests for complex types, arrays and lists.

PTAL. Thanks.
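
The idea of one set of ordered values for the constructor and one per builder method can be sketched with plain reflection, without Beam, protos, or Row encoding. Everything here is hypothetical and simplified: `ClassLookupSketch`, `Call`, and the toy `Greeting` class are invented for the example, the "constructor" is assumed to be a static factory method, and methods are matched by name and argument count only.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;

/** Illustrative only: builds an object from a class name, a constructor call and builder calls. */
final class ClassLookupSketch {
  private ClassLookupSketch() {}

  /** One method call: a method name plus its argument values in parameter order. */
  static final class Call {
    final String name;
    final Object[] args;

    Call(String name, Object... args) {
      this.name = name;
      this.args = args;
    }
  }

  /** Toy target class with a static factory method and immutable "with" builder methods. */
  public static final class Greeting {
    private final String text;
    private final int repeat;

    private Greeting(String text, int repeat) {
      this.text = text;
      this.repeat = repeat;
    }

    public static Greeting create(String text) {
      return new Greeting(text, 1);
    }

    public Greeting withSuffix(String suffix) {
      return new Greeting(text + suffix, repeat);
    }

    public Greeting withRepeat(int repeat) {
      return new Greeting(text, repeat);
    }

    public String render() {
      StringBuilder result = new StringBuilder();
      for (int i = 0; i < repeat; i++) {
        result.append(text);
      }
      return result.toString();
    }
  }

  /** Calls the static constructor method, then applies each builder method in the given order. */
  static Object build(String className, Call constructor, List<Call> builderMethods) {
    try {
      Class<?> clazz = Class.forName(className);
      Object current = invoke(clazz, null, constructor);
      for (Call call : builderMethods) {
        // Each builder method returns the object that the next call is applied to.
        current = invoke(clazz, current, call);
      }
      return current;
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("Could not build " + className, e);
    }
  }

  private static Object invoke(Class<?> clazz, Object target, Call call)
      throws ReflectiveOperationException {
    boolean wantStatic = (target == null);
    for (Method method : clazz.getMethods()) {
      if (method.getName().equals(call.name)
          && method.getParameterCount() == call.args.length
          && Modifier.isStatic(method.getModifiers()) == wantStatic) {
        return method.invoke(target, call.args);
      }
    }
    throw new NoSuchMethodException(call.name);
  }
}
```

Because builder calls are applied in order, `create("ab")` followed by `withSuffix("!")` and `withRepeat(2)` produces a `Greeting` whose `render()` returns "ab!ab!".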


@chamikaramj
Contributor Author

Run Java PreCommit

Member

@lukecwik lukecwik left a comment

LGTM,

mostly comment changes and nits

great work on expanding the set of types and simplifying down to one schema + row per method

// Name of the builder method
string name = 1;

// A schema that describes the parameter of the constructor or the constructor
Member

Suggested change
- // A schema that describes the parameter of the constructor or the constructor
+ // A schema that describes the parameters of the constructor or the constructor's

Contributor Author

Outdated now.

Comment on lines 53 to 54
// The top level fields of the schema represents the parameters in order.
// Top level field names map to parameter names to use.
Member

Can you disambiguate when we choose names over field ordering?

Contributor Author

We check the number of fields as well as the names of the fields in order.

// Top level field names map to parameter names to use.
Schema schema = 2;

// A payload that maps to the provided builder method schema.
Member

We should state that we expect beam:coder:row:v1 encoding of the schema.

Contributor Author

Done.

// Top level field names map to parameter names to use.
Schema constructor_schema = 4;

// A payload that maps to the provided constructor schema.
Member

We should state that we expect beam:coder:row:v1 encoding of the schema.

Contributor Author

Done.

// transform object is constructed.
// Given builder methods will be invoked in order when constructing the
// transform objects.
repeated BuilderMethod builder_methods = 3;
Member

nit: re-order to place builder methods at the bottom so that constructor fields are grouped together.

Contributor Author

Done.

// Top level field names map to parameter names to use.
Schema schema = 2;

// A payload that maps to the provided builder method schema.
Member

We should state that we expect beam:coder:row:v1 encoding of the schema.

Contributor Author

Done.

Comment on lines 77 to 78
// The top level fields of the schema represent the parameters in order.
// Top level field names map to parameter names to use.
Member

Suggested change
- // The top level fields of the schema represent the parameters in order.
- // Top level field names map to parameter names to use.
+ // The top level fields of the schema represent the method parameters in order.
+ // If able, top level field names are also verified against the method parameters for a match.

"Mapping names" sounds like we will use the names, not the field order, to map parameters. We seem to use the names only to validate that the correct parameter is being used.

Contributor Author

Done.
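
The distinction drawn in this thread (order decides which value goes to which parameter, names are only verified when available) can be sketched in plain Java. This is a hedged illustration rather than the PR's code: `ParameterNameCheckSketch`, `Toy`, `firstMethodNamed` and `matches` are hypothetical names. Note that Java only exposes real parameter names through reflection when the class was compiled with `javac -parameters`, which is one reading of the "if able" wording in the suggested comment.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.List;

/** Illustrative check of schema field names against method parameters, in order. */
final class ParameterNameCheckSketch {
  private ParameterNameCheckSketch() {}

  /** Toy class with a single builder-style method to check against. */
  public static final class Toy {
    public Toy withTopic(String topic) {
      return this;
    }
  }

  /** Returns the first public method with the given name (helper for the example only). */
  static Method firstMethodNamed(Class<?> clazz, String name) {
    for (Method method : clazz.getMethods()) {
      if (method.getName().equals(name)) {
        return method;
      }
    }
    throw new IllegalArgumentException("No method named " + name + " in " + clazz.getName());
  }

  /**
   * Returns true if the field count equals the parameter count and, wherever real parameter
   * names are available, the field name at each position equals the parameter name.
   */
  static boolean matches(Method method, List<String> fieldNames) {
    Parameter[] parameters = method.getParameters();
    if (parameters.length != fieldNames.size()) {
      return false;
    }
    for (int i = 0; i < parameters.length; i++) {
      // Without `javac -parameters` the names are synthetic (arg0, arg1, ...), so skip the check.
      if (parameters[i].isNamePresent()
          && !parameters[i].getName().equals(fieldNames.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch the position of a field always determines the parameter it feeds; a name can only cause a rejection, never a reordering.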

Comment on lines 94 to 95
// The top level fields of the schema represents the parameters in order.
// Top level field names map to parameter names to use.
Member

Suggested change
- // The top level fields of the schema represents the parameters in order.
- // Top level field names map to parameter names to use.
+ // The top level fields of the schema represent the method parameters in order.
+ // If able, top level field names are also verified against the method parameters for a match.

Ditto here as well.

Contributor Author

Done.

Contributor Author

@chamikaramj chamikaramj left a comment

Thanks!

@chamikaramj
Contributor Author

Run XVR_Dataflow PostCommit

@chamikaramj
Contributor Author

Run XVR_Flink PostCommit

@chamikaramj
Contributor Author

Run Java PreCommit

@chamikaramj
Contributor Author

Run Java_Examples_Dataflow PreCommit

@chamikaramj
Contributor Author

Run Java PreCommit

@chamikaramj
Contributor Author

PreCommit failures are unrelated but trying again.

@chamikaramj
Contributor Author

Run Java PreCommit

@chamikaramj chamikaramj merged commit 1455c54 into apache:master Sep 4, 2021
dpcollins-google pushed a commit to dpcollins-google/beam that referenced this pull request Sep 16, 2021
…rm using the class name and builder methods (apache#15343)

* Adds support for expanding a Java cross-language transform using the class name and builder methods

* Adds an allowlist and adds support for annotations

* Fix tests

* Address CheckerFramework errors

* Adds license

* Addresses reviewer comments.

* Apply suggestions from code review

Co-authored-by: Lukasz Cwik <lcwik@google.com>

* Addresses reviewer comments.

* Updated the proto to include a single schema/payload for constructor and each builder method.
Updated the implementation accordingly and added additional tests.

* Some doc updates and few other minor updates.

* Addressing reviewer comments

Co-authored-by: Lukasz Cwik <lcwik@google.com>
calvinleungyk pushed a commit to calvinleungyk/beam that referenced this pull request Sep 22, 2021