Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Revert "Revert "[BEAM-5299] Define max timestamp for global w… #7012

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
6 participants
@HuangLED
Copy link
Contributor

HuangLED commented Nov 12, 2018

…indow in proto (#6381)"""

This reverts commit a9723d9.

This PR also breaks internal tests. Rollback to unblock other changes.


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- --- --- --- ---
Java Build Status Build Status Build Status Build Status Build Status Build Status Build Status Build Status
Python Build Status --- Build Status
Build Status
Build Status --- --- ---
@HuangLED

This comment has been minimized.

Copy link
Contributor Author

HuangLED commented Nov 12, 2018

R: @chamikaramj @mxm
cc: @lukecwik @boyuanzz

Chamikaramj also mentioned there could be a way to make this change with less effects on postcommit integration tests (correct me if I misunderstood).

@HuangLED HuangLED closed this Nov 12, 2018

@HuangLED

This comment has been minimized.

Copy link
Contributor Author

HuangLED commented Nov 12, 2018

rollback this PR will also require rollbacking internal import CL. Trying out rebuilding image instead.

@tweise

This comment has been minimized.

Copy link
Contributor

tweise commented Nov 12, 2018

@HuangLED I have a general question on why such "Revert" PRs surface out of nowhere? I don't seem to find any email on the dev@ list about this. Which "internal" tests are breaking? Why do we need to revert changes in Beam that have been discussed at length? @lukecwik @chamikaramj @mxm

@HuangLED

This comment has been minimized.

Copy link
Contributor Author

HuangLED commented Nov 13, 2018

@HuangLED I have a general question on why such "Revert" PRs surface out of nowhere? I don't seem to find any email on the dev@ list about this. Which "internal" tests are breaking? Why do we need to revert changes in Beam that have been discussed at length? @lukecwik @chamikaramj @mxm

Sorry for the confusion. internal testing refers to GCP dataflow service internal testing.

Initially thought a rollback was the best option to not block other people. While creating this PR though, I was also chatting with several folks regarding which way would be the best fix. We ended up doing an internal java container release, followed by #7014. Now the breaking test issue is gone.

We thus then closed this PR, before putting more information into it. Please kindly let me know what else can done to clean up better after wards, in addition to closing PR, in such a case.

Thanks!

@tweise

This comment has been minimized.

Copy link
Contributor

tweise commented Nov 13, 2018

@HuangLED I have a general question on why such "Revert" PRs surface out of nowhere? I don't seem to find any email on the dev@ list about this. Which "internal" tests are breaking? Why do we need to revert changes in Beam that have been discussed at length? @lukecwik @chamikaramj @mxm

Sorry for the confusion. internal testing refers to GCP dataflow service internal testing.

There shouldn't be any confusion. Contributing policies and how we work as community are documented here: https://beam.apache.org/contribute/

You won't find a mention of special provision for internal Dataflow testing and more or less under the radar slamming reverts into the Beam repository. This has happened repeatedly in the past and it is time we do better. In the future, please discuss on the email list prior to opening a PR. Thanks!

@chamikaramj

This comment has been minimized.

Copy link
Contributor

chamikaramj commented Nov 13, 2018

Agree that tests that warrant a rollback of a Beam PR should be available for Beam community to observe/update.

In this case, the issue was Dataflow worker container not being built for the updated code. This was built and Beam was updated to use the new worker container (#7014). So we do not need to add a new test.

@aaltay

This comment has been minimized.

Copy link
Contributor

aaltay commented Nov 13, 2018

@tweise I think any PR breaking any runner should be reverted. Otherwise it will block the next release one way or another. Contribution guide does not clearly explain what issues needs to be discussed in dev@ list or not. That explains the confusion.

Are you proposing a change to discuss all PR reverts to start on the mailing list first? Could you clarify?

(cc: @kennknowles)

@tweise

This comment has been minimized.

Copy link
Contributor

tweise commented Nov 13, 2018

Yes, I think that PR reverts that originate from "internal" issues, need to be discussed. Situations that warrant a revert are documented here: https://beam.apache.org/contribute/postcommits-policies/

@tweise I think any PR breaking any runner should be reverted.

That doesn't make sense. Anyone could find a problem in any runner outside of the project and cite that as reason to revert properly reviewed PRs that passed post-commit. I wonder how that would be received if it wasn't Dataflow, but any other runner..

@aaltay

This comment has been minimized.

Copy link
Contributor

aaltay commented Nov 13, 2018

The internal issue could have been better phrased as dataflow runner fails to run with this change. I agree on that.

Post commit policies have content about how to handle post commit failures. It does not explain on all conditions about how and why a PR revert should happen.

If you think that discussing reverts on dev@ list would make sense, I would suggest adding that to the documents. It is not clear to me, it is unlikely to be clear to new comer.

That doesn't make sense. Anyone could find a problem in any runner outside of the project and cite that as reason to revert properly reviewed PRs that passed post-commit. I wonder how that would be received if it wasn't Dataflow, but any other runner..

If there is a user reported issue about a runner not working, I think it makes sense to look at it. This is true for any runner. And in the past we did these investigations at release time for runners other dataflow runner. (example 2.7 release and investigations related to the spark runner.)

The question is, if we know that there is an issue, do we want to wait until a later time (release time) to address it?

One clear improvement here would be to, with any such errors to augment Beam tests to provide better coverage.

@mxm

This comment has been minimized.

Copy link
Contributor

mxm commented Nov 13, 2018

There is nothing wrong reverting this if it happens to break a critical Runner. What was initially missing here, is an explanation, why this is necessary and, more importantly, how it could be mitigated in the future. We track such information in JIRA issues or directly in GitHub, to document and share knowledge.

When I merged the initial PR, I wasn't aware of the Dataflow image and how regularly it builds. Now, I will be more careful and, perhaps, we can work on a solution in which we can test more easily if a change would break Dataflow.

@aaltay

This comment has been minimized.

Copy link
Contributor

aaltay commented Nov 13, 2018

@mxm I completely agree with you. A JIRA with explanation of what is breaking, how to test it and what can be changed would have made this easier. I also agree that, as long as it is possible for each such breaking case there needs to be a new test that can catch such issues earlier.

@kennknowles

This comment has been minimized.

Copy link
Member

kennknowles commented Nov 15, 2018

Thank you for tagging me here. This is an important issue.

Our rollback first policy does not apply to a change that passes all of Beam's postcommit tests. It does apply to Beam's postcommit suites for just one runner. The purpose of rapid rollback is foremost to restore test signal and not to disrupt the work of other contributors.

But...

  • We have at least three examples of runners where there are probably tests outside the Beam repo: Dataflow, Samza runner, and IBM Streams.
  • We also may have users that try running their production loads against Beam master branch to learn early whether the next release will break them.

We should also respect these other sources of information, balancing haste with fairness and transparency. If nothing public and owned/blessed by Beam is broken, then you need to explain the issue and justify it publicly. Certainly you should contact the author(s) and reviewer(s), which you did here anyhow.

In this case, it sounds like a postcommit did fail, so it is OK to rollback with only that as the explanation. You should include a Jira with pointer to the failed run, and also just link it on the PR. So that's missing in this particular case, and the PR description refers to internal tests so it is confusing.

I agree with @tweise that we need to draw an extremely sharp distinction between Beam's own tests and whatever testing a user or vendor might do on their own.

I suggest that dev@ is the best place to continue this if a deeper discussion is desired.

@tweise

This comment has been minimized.

Copy link
Contributor

tweise commented Nov 15, 2018

@kennknowles this is actually a case where the same contribution had already been reverted in green post commit status and this PR was another attempt at it, which I guess gave it the deserved attention. However, the general issue is really overdue to discuss and I was already planning to take it to the ML.

@aaltay let's not compare the type of under the radar revert actions on master we discuss here with release process. Release process is an open and very explicit opportunity for identifying problems downstream before we commit to a release. Problems reported may or may not result in reverts, but in any case they would have been discussed and received the necessary feedback.

@aaltay

This comment has been minimized.

Copy link
Contributor

aaltay commented Nov 17, 2018

Thank you for moving the discussion to the mailing list. We can continue there.

I am not comparing this process with release process. I am highlighting the fact that if we break things, sooner or later we will notice them. It will be more difficult address them if we wait longer.

It is also worth clarifying PRs are not under the radar revert actions. All relevant parties were tagged. Also all PRs generate emails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.