[BEAM-9935] Respect allowed split points in Python. #11653
Conversation
|
R: @lukecwik |
| keep_of_element_remainder = keep / (1 - current_element_progress) | ||
| # If it's less than what's left of the current element, | ||
| # try splitting at the current element. | ||
| if (keep_of_element_remainder < 1 and is_valid_split_point(index) and |
The definition of allowed_split_points is too vague when there are multiple active elements; we need to scope it down to mean the set of allowed first_residual_element indices.
Agreed. I've added clarification to the proto.
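To make the scoped-down definition concrete, here is a minimal sketch rather than the PR's actual code. The helper name `is_allowed_split` and its parameters are hypothetical. It assumes only what the thread and the proto comment state: allowed_split_points is the set of permitted first_residual_element indices, and an empty set means there are no constraints.

```python
def is_allowed_split(first_residual_element, allowed_split_points=()):
    """Return True if a split whose residual starts at this index is permitted.

    Hypothetical helper: an empty collection means "no constraints".
    """
    return (not allowed_split_points
            or first_residual_element in allowed_split_points)


def last_primary_for(first_residual_element):
    """The last primary element immediately precedes the first residual one."""
    return first_residual_element - 1


allowed = (2, 3, 6)
print(is_allowed_split(3, allowed))   # index 3 is listed
print(is_allowed_split(4, allowed))   # index 4 is not listed
print(is_allowed_split(4))            # no constraints given
```

Under this reading, a split at element boundaries is fully described by one index, since last_primary_element is always first_residual_element - 1.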
| # pylint: disable=round-builtin | ||
| stop_index = index + max(1, int(round(current_element_progress + keep))) | ||
| if allowed_split_points and stop_index not in allowed_split_points: | ||
| allowed_split_points = sorted(allowed_split_points) |
We should make it an error to have duplicate split points in allowed_split_points.
I agree that there's never a reason to have duplicates, but that wouldn't impact this code here (and I don't think it should result in failure).
| else: | ||
| prev = allowed_split_points[closest - 1] | ||
| next = allowed_split_points[closest] | ||
| if index < prev and stop_index - prev < next - stop_index: |
I think you should comment that you're choosing the closer of the two points here.
Done.
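The "closer of the two points" choice can be sketched as a standalone function. This is an illustration under assumptions, not the merged implementation: the function name `snap_to_allowed` is hypothetical, `closest` is assumed to come from `bisect.bisect`, and the two boundary branches (before the first point, after the last) are guesses, since the diff fragments above do not show them. Only the `prev`/`next` comparison mirrors the quoted lines.

```python
import bisect


def snap_to_allowed(index, stop_index, allowed_split_points):
    """Move stop_index onto an allowed split point (hypothetical helper)."""
    if not allowed_split_points or stop_index in allowed_split_points:
        return stop_index  # Unconstrained, or already allowed.
    points = sorted(allowed_split_points)
    closest = bisect.bisect(points, stop_index)
    if closest == 0:
        return points[0]   # Assumed: fall forward to the first point.
    if closest == len(points):
        return points[-1]  # Assumed: fall back to the last point.
    prev, nxt = points[closest - 1], points[closest]
    # Choose the closer of the two neighbours. prev only qualifies if it
    # lies past the current index; ties go to nxt.
    if index < prev and stop_index - prev < nxt - stop_index:
        return prev
    return nxt


print(snap_to_allowed(0, 4, (2, 3, 6)))        # prev=3 is closer than next=6
print(snap_to_allowed(0, 5, (2, 3, 6)))        # next=6 is closer
print(snap_to_allowed(3, 4, (2, 3, 6)))        # prev=3 is not past index 3
print(snap_to_allowed(0, 4, (2, 3, 3, 6, 6)))  # duplicates in the input
```

Because the lookup only runs when stop_index is not itself an allowed point, the bisect position always falls between two distinct values, so duplicate entries do not change which neighbours are compared.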
| self.assertEqual(self.split(2, 0, 0.5, 16), simple_split(9)) | ||
| self.assertEqual(self.split(6, 0, 0.5, 16), simple_split(11)) | ||
| def test_split_with_element_progres(self): |
| def test_split_with_element_progres(self): | |
| def test_split_with_element_progress(self): |
Done.
| self.assertEqual( | ||
| self.split(0, 0, 0.25, 16, allowed=(2, 3, 6)), simple_split(3)) | ||
| self.assertEqual( |
It would be good to either add comments or break out the tests to separate methods to describe the different scenarios such as round to closest.
I kept them in the same methods, because it's easier to understand the values relative to the prior examples.
| self.assertEqual( | ||
| self.sdf_split(0, 0, 0.12, 4), (-1, 'Primary(0.5)', 'Residual(0.5)', 1)) |
| self.assertEqual( | |
| self.sdf_split(0, 0, 0.12, 4), (-1, 'Primary(0.5)', 'Residual(0.5)', 1)) | |
| self.assertEqual( | |
| self.sdf_split(0, 0, 0.125, 4), (-1, 'Primary(0.5)', 'Residual(0.5)', 1)) |
I had this originally, but it bumped the formatting, and .12 was close enough. I can change this back.
I found it confusing that the rounding dropped the fractional part when I was running through the scenarios.
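The difference between 0.12 and 0.125 is easy to check by hand. The sketch below is only arithmetic under a stated assumption: that the amount of work to keep is the remaining element count times the requested fraction, and that the test arguments are (index, current element progress, fraction, buffer size). The helper name `keep_of_remainder` is hypothetical and is not taken from the PR.

```python
def keep_of_remainder(index, current_element_progress, fraction, stop):
    """Assumed formula: elements of work to keep = remaining work * fraction."""
    return (stop - index - current_element_progress) * fraction


exact = keep_of_remainder(0, 0, 0.125, 4)   # 4 * 0.125
approx = keep_of_remainder(0, 0, 0.12, 4)   # 4 * 0.12

print(exact)             # exactly half of the current element
print(approx)            # slightly less than half
print(round(approx, 1))  # agrees with 0.5 only after rounding to one decimal
```

With 0.125 the expected 0.5 falls out directly, whereas 0.12 reaches it only through rounding, which is the source of the confusion described above.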
| // empty, there are no constraints on where to split. | ||
| // Specifically, the first_residual_element of a split result must be an | ||
| // allowed split point, and the last_primary_element must immediately | ||
| // preceded an allowed split point. |
| // preceded an allowed split point. | |
| // precede an allowed split point. |
|
Run Java PreCommit |
|
Ack. |
|
Just wanna confirm: the allowed_split_points dictate splits at element boundaries only, right? |
|
The allowed split points tell you what is the valid set of return values for |