Skip to content

[BEAM-4546] Multi level combine#5639

Merged
robertwb merged 2 commits intoapache:masterfrom
robertwb:multi-level-combine
Jun 14, 2018
Merged

[BEAM-4546] Multi level combine#5639
robertwb merged 2 commits intoapache:masterfrom
robertwb:multi-level-combine

Conversation

@robertwb
Copy link
Contributor


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

@robertwb robertwb force-pushed the multi-level-combine branch from 256a0c4 to c5763b4 Compare June 13, 2018 21:13
@robertwb
Copy link
Contributor Author

R: @katsiapis

Returns:
A PObject holding the result of the combine operation.
"""
def with_hot_key_fanout(self, fanout):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about how this will be used, and what will it do? Especially what is expected from fanout fn? (Would it be a better to rename fanout to fanoutfn?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. See below about fanoutfn.

def process(self, element):
key, value = element
fanout = fanout_fn(key)
if not fanout or fanout is 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be invalid to return None from fanout_fn?

Also a high level question, do we expect users to have an idea about what keys would be hot to a point of how much fanout they will want per key? Would it be simpler to just accept an fanout integer that will apply to all keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to require an integer.

There are definitely cases where some keys are much more frequent than others, and can be detected. (E.g. word distributions in natural languages.) However, this also accepts a plain integer, which I expect will be use most of the time, which is why I simply called it fanout.

Copy link
Member

@aaltay aaltay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

# Boolean indicates this is not an accumulator.
yield pvalue.TaggedOutput('cold', (key, (False, value)))
else:
self.counter += 1 # Round-robin should be more even than random.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; "should be more" more what?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More even. But this is ambitious with "even than." I'll reword.

Note that a fanout greater than 1 requires the data to be sent through
two GroupByKeys, and a high fanout can also result in more shuffle data
due to less per-bundle combining. Setting the fanout for a key at 1 or less
places values on the "cold key" path that skip the intermeidate level of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; intermeidate -> intermediate

robertwb added 2 commits June 13, 2018 17:41
This is needed to avoid double-counting for multiply-firing
triggers in accumulation mode.
@robertwb robertwb force-pushed the multi-level-combine branch from 8c6416b to c51b18b Compare June 14, 2018 00:41
@robertwb
Copy link
Contributor Author

Jenkins: please run Python PreCommit

@katsiapis
Copy link

R: @xinzha623

Xin, could you and Foo review this? It will be useful for TFMA implementation.

@robertwb robertwb merged commit 65bb1fe into apache:master Jun 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants