Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-1432] Inject number of Shards as a Side Input to Write #1941

Closed
wants to merge 6 commits into from

Conversation

tgroh
Copy link
Member

@tgroh tgroh commented Feb 7, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

This permits users to pass a PTransform from the input elements to the
number of shards instead of requiring a constant amount. This enables
sharding to be determined based on the input data rather than a constant
value.

This can also convert the DirectRunner to inject a custom sharding
strategy in its override, rather than relying on bundling behavior.

This permits users to pass a PTransform from the input elements to the
number of shards instead of requiring a constant amount. This enables
sharding to be determined based on the input data rather than a constant
value.

Tests still should be written. This can also convert the DirectRunner to
inject a custom sharding strategy in its override, rather than relying
on custom bundling.
@tgroh
Copy link
Member Author

tgroh commented Feb 7, 2017

Tests still need to be written. @dhalperi @reuvenlax you two are likely interested.

@asfbot
Copy link

asfbot commented Feb 7, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7168/
--none--

Copy link
Contributor

@dhalperi dhalperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easy to extend to support ValueProvider for numShards.

return numShards;
.addIfNotNull(
DisplayData.item("sharding", getSharding() == null ? null : getSharding().getClass()))
.include("sharding", getSharding());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent seems off, add a label.

Doesn't .include do L122 already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned this up. include does indeed do that, but also doesn't permit null (and getSharding can of course be null)

this.sink = sink;
this.numShards = numShards;
this.computeNumShards = computeNumShards;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent off?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -96,11 +95,11 @@
*/
public static class Bound<T> extends PTransform<PCollection<T>, PDone> {
private final Sink<T> sink;
private int numShards;
private final PTransform<PCollection<T>, PCollectionView<Integer>> computeNumShards;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nullable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -140,6 +130,10 @@ public int getNumShards() {
return sink;
}

public PTransform<PCollection<T>, PCollectionView<Integer>> getSharding() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nullable, javadoc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@Override
public void populateDisplayData(DisplayData.Builder builder) {
builder.add(DisplayData.item("numShards", numShards));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with label

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@@ -254,14 +254,11 @@ public void testWriteWithSessions() {
public void testBuildWrite() {
Sink<String> sink = new TestSink() {};
Write.Bound<String> write = Write.to(sink).withNumShards(3);
assertEquals(3, write.getNumShards());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* Returns a new {@link Write.Bound} that will write to the current {@link Sink} with
* runner-determined sharding.
*/
public Bound<T> withoutSharding() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withDynamicSharding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withRunnerDeterminedSharding is more precise (as DynamicSharding can be user-determined)

@tgroh tgroh changed the title Inject number of Shards as a Side Input to Write [BEAM-1432] Inject number of Shards as a Side Input to Write Feb 8, 2017
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.007%) to 69.598% when pulling a33c457 on tgroh:write_inject_sharding into 8c1a577 on apache:master.

@asfbot
Copy link

asfbot commented Feb 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7182/
--none--

@asfbot
Copy link

asfbot commented Feb 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7250/
--none--

@asfbot
Copy link

asfbot commented Feb 9, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7251/
--none--

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 69.697% when pulling 41071dd on tgroh:write_inject_sharding into 2eeeaa2 on apache:master.

@tgroh
Copy link
Member Author

tgroh commented Feb 10, 2017

R: @dhalperi

I have also added the ValueProvider alternative

Copy link
Contributor

@dhalperi dhalperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essentially LGTM, minor comments.

If you're happy with the validation improvements, go ahead and self-merge. If you want me to TAL, let me know.

@@ -151,7 +160,42 @@ public int getNumShards() {
* runner-controlled sharding.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistent language? (runner-determined/controlled)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like determined. Done.

if (numShards == null) {
minShardsNeeded = 1;
} else {
minShardsNeeded = c.sideInput(numShards);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate this value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@Override
public void populateDisplayData(DisplayData.Builder builder) {
builder.add(DisplayData.item("numShards", numShards).withLabel("ConstantShards"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fixed Number of Shards" to match the existing label?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

public Integer apply(T input) {
@ProcessElement
public void processElement(ProcessContext context) {
Integer shardCount = context.sideInput(numShards);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate this number.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.5%) to 69.192% when pulling 2a642f5 on tgroh:write_inject_sharding into 2eeeaa2 on apache:master.

@asfbot
Copy link

asfbot commented Feb 17, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7551/

Failed Tests: 0


--none--

@dhalperi
Copy link
Contributor

(please self-merge at will)

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.5%) to 69.189% when pulling 321b095 on tgroh:write_inject_sharding into 2eeeaa2 on apache:master.

@asfbot
Copy link

asfbot commented Feb 17, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/7563/
--none--

@asfgit asfgit closed this in 918c7fa Feb 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants