Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12145] Normalize and size-limit transform Ids #14500

Merged
merged 1 commit into from Apr 10, 2021

Conversation

chamikaramj
Copy link
Contributor

@chamikaramj chamikaramj commented Apr 9, 2021

Transform IDs generated by the SDK may be used by runners to identify pipeline steps hence its preferable to keep them normalized, size limited, and unique.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK ULR Dataflow Flink Samza Spark Twister2
Go Build Status --- Build Status Build Status --- Build Status ---
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status --- Build Status Build Status --- Build Status ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@chamikaramj
Copy link
Contributor Author

Run Java PostCommit

@@ -191,6 +193,10 @@ private String getApplicationName(AppliedPTransform<?, ?, ?> appliedPTransform)
if (name.isEmpty()) {
name = "unnamed-ptransform";
}
// Normalize, trim, and uniqify.
int maxNameLength = 100;
name = Normalizer.normalize(name, Form.NFC).replaceAll("[^A-Za-z0-9-_]", "-");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document the required regex in the pipeline proto. Is there validation logic like there is in Python? That would be a good place to check it, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to the proto recommending alphanumeric characters, '_', and '-' for transform IDs.
Also added a test.

Corresponding Python changes are in #14502.

Copy link
Member

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but also let's make sure this is part of spec before merging.

@codecov
Copy link

codecov bot commented Apr 10, 2021

Codecov Report

Merging #14500 (f693052) into master (a48abeb) will increase coverage by 0.00%.
The diff coverage is n/a.

❗ Current head f693052 differs from pull request most recent head 30afbcb. Consider uploading reports for the commit 30afbcb to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master   #14500   +/-   ##
=======================================
  Coverage   83.49%   83.50%           
=======================================
  Files         447      447           
  Lines       58904    58904           
=======================================
+ Hits        49183    49185    +2     
+ Misses       9721     9719    -2     
Impacted Files Coverage Δ
...srcs/sdks/python/apache_beam/typehints/__init__.py
...ache_beam/testing/benchmarks/chicago_taxi/setup.py
...on/apache_beam/runners/portability/spark_runner.py
...he_beam/runners/interactive/pipeline_instrument.py
.../build/srcs/sdks/python/apache_beam/tools/utils.py
...am/portability/api/standard_window_fns_pb2_grpc.py
..._beam/runners/interactive/user_pipeline_tracker.py
.../examples/snippets/transforms/elementwise/pardo.py
...uild/srcs/sdks/python/apache_beam/metrics/cells.py
...ython/apache_beam/io/external/generate_sequence.py
... and 884 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a48abeb...30afbcb. Read the comment docs.

@chamikaramj chamikaramj force-pushed the normalize_java_transform_id branch 2 times, most recently from 2d0b60c to bee1a14 Compare April 10, 2021 07:34
@chamikaramj
Copy link
Contributor Author

Run Python PreCommit

@chamikaramj
Copy link
Contributor Author

Run Java_Examples_Dataflow PreCommit

@chamikaramj
Copy link
Contributor Author

Run Java PostCommit

@chamikaramj
Copy link
Contributor Author

Run Java PostCommit

@chamikaramj
Copy link
Contributor Author

Run Python PreCommit

@chamikaramj chamikaramj merged commit 9fd54be into apache:master Apr 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants