[BEAM-4087] implement configurable dependency versions #8255
Conversation
@iemejia Is this something you could work with? I would also like to get some feedback from @lukecwik (and @kennknowles if possible).
This definitely looks prettier than what we have, but I am not sure it fits the use case I want to tackle. For modules with provided dependencies, e.g. Spark, Hadoop, Kafka, it is the users who provide the dependency, but it can be broken in Beam, so I would like to detect this in advance: if someone changes something, let's say in KafkaIO, I want to run extra tests with the provided dependencies at different versions. So I would like to be able to define something like this, and then the tests would be run each time with the corresponding versions. We could eventually have this triggered only in CI, to avoid adding all that extra weight to the build, and triggered depending on the changed paths.
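A per-module version matrix like the one described might be declared roughly as follows in a Gradle build script. This is only a sketch of the idea; none of these names exist in the actual PR:

```groovy
// Hypothetical sketch (not the actual PR API): a per-module matrix of
// provided-dependency versions that the IO tests should be run against.
ext.providedVersions = [
  kafka: ['0.10.2.2', '0.11.0.3', '1.0.2', '2.0.1'],
  spark: ['2.3.3', '2.4.0'],
]
// CI could iterate this matrix, running e.g. the KafkaIO tests once per
// listed version, triggered only when the corresponding paths change.
```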
Run Java PreCommit
1 similar comment
Run Java PreCommit
Thanks @iemejia for looking into this. Of course, this does not fix https://issues.apache.org/jira/browse/BEAM-7003, but I consider this change a kind of prerequisite. What we are now able to do is, e.g., loop over dependency versions from the command line. That takes about 3 minutes on my machine and shows we are not compatible with versions less than 0.11. So I'd say this does add some value right now, although it does not provide full integration. We could implement a similar loop on Jenkins to get this onto CI right away. For the build system integration I did not look any further yet. I expect this change will help with implementing it, but we might need to modify things again. As dependencies are declared at project level, we would currently need to mess with testRuntimeClasspath, and, of course, fork a new JVM per test anyway. This is doable, but we might be better off with the variants feature of Gradle 5.3, which seems to support such use cases. This change here was easy and quickly implemented, so I decided to give it a try to support the discussion of whether this configurability is something we would like to have. Variants go even further, and I unfortunately do not currently have a clue about their implications for our build.
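The loop described here might look something like the following. The module path and the Gradle property name are assumptions for illustration, not the PR's actual flags:

```shell
# Illustrative only: run the Kafka IO tests once per candidate version,
# passing the version under test as a Gradle project property.
for v in 0.10.2.2 0.11.0.3 1.0.2 2.0.1; do
  ./gradlew :sdks:java:io:kafka:test -Pkafka.version="$v" || echo "tests failed with kafka ${v}"
done
```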
Rethinking: the suggestion above (i.e. looping) is probably not what we really want to test. We likely want to assemble with the default version and only run the tests with those different versions, as this is probably closest to what happens when a user changes a provided dependency. That currently seems difficult together with the shadow configuration, as it impacts compile as well. I need to do some reading here. Of course, a hackish solution is always possible...
After some classpath hacking the results are similar: tests fail on versions < 0.11. I did not investigate whether that's a test implementation issue or whether the IO itself isn't compatible.
Run Java PreCommit
2 similar comments
Run Java PreCommit
Run Java PreCommit
lukecwik
left a comment
Won't this change allow for subprojects to increase the diamond dependency problem for users when these are released as Maven artifacts?
One of the benefits of using a common shared list of dependencies is that we effectively test these subprojects with a known good set of library versions. This increases the interoperability of Beam subprojects when released as Maven artifacts. Getting more and more stuff into the shared list is a good thing for our users.
For BEAM-4087, you could create multiple kafka configurations (like testRuntimeKafka0_11_0) and run tests over each configuration. This is effectively what we do with runners and examples/java here
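A sketch of this suggestion, per-version test configurations, might look like the following. The configuration names and Kafka versions are illustrative, not taken from the actual Beam build:

```groovy
// Hypothetical: one test configuration per supported Kafka version, each
// forcing that version at dependency resolution time.
def kafkaVersions = ['0.11.0.3', '1.0.2', '2.0.1']

kafkaVersions.each { ver ->
  def confName = 'testRuntimeKafka' + ver.replace('.', '_')
  configurations.create(confName) {
    extendsFrom configurations.testRuntimeClasspath
    resolutionStrategy { force "org.apache.kafka:kafka-clients:${ver}" }
  }
  // Register a Test task that runs against that configuration's classpath.
  tasks.register('test' + confName.capitalize(), Test) {
    testClassesDirs = sourceSets.test.output.classesDirs
    classpath = sourceSets.test.output + sourceSets.main.output + configurations[confName]
  }
}
```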
Thanks @lukecwik for looking into this and for your feedback. Of course, these are very valid concerns.
Well, probably yes; at least in the sense of making it easier for a Beam developer to mess things up. What this change does first is expose the configured versions so they are readable by Beam (sub)projects. This seems to be required; at least some workarounds are already implemented to get at this information. One might argue that in some of the examples given it would be better to include those other libs directly in the list of shared libraries, but I am not sure that would solve all the issues which require reading those 'shared versions'. Do you see any reason not to expose this information at least read-only?

Now to the other changes.

Overwriting common shared versions. I assume that's what worries you most. And indeed, if subprojects start randomly changing versions just because some dev feels like 'having a newer version is better', we are in deep trouble. So we should be very clear that you should not overwrite a common shared version, unless you have a very good reason. I am unsure whether [1] qualifies, but let's construct a hypothetical example. Assume we have n IOs, with A_IO working only with somelib < X.Y, whereas B_IO works only with somelib >= X.Y. All other IOs are compatible with any version of somelib. Obviously A_IO and B_IO will never work together. We could now pin a version globally and drop support for either A_IO or B_IO, pin no version at all, or allow overwriting the version for e.g. A_IO. I believe the third option is the most flexible from the user's perspective. Or consider an IO growing old and getting less support for its dependencies: do we really want to be forced to either drop support for that IO, or keep the version of some shared transitive dependency pinned to an old version, only because newer versions stopped working with that IO?
During implementation I was also concerned about that 'easy overwrite' and considered implementing something like throwing an exception depending on a flag.

Adding to the shared version list. This, of course, only adds on one project, so it could be considered useless. On the other hand, we might later establish some checks which help detect version issues. It is also likely that we continue later by exposing java.library and the corresponding version enforcements as well.

Passing versions via CLI. This is mainly sugar. Usually we do not want to use it (although something like this is already implemented, see the example above). It might be helpful for easily checking whether a different dependency version works with Beam and might be upgraded (without editing BeamModulePlugin). Also, which I would consider helpful, it enables a user to rebuild and test an artefact with different dependencies. Most likely the user has external constraints which make it impossible to run with the dependencies chosen by Beam (e.g. a different Spark cluster version, another IO backend version, etc.), and having that possibility to check might be helpful.
Yes. This PR does not intend to change this. Unfortunately, the user is still forced to solve diamond dependency problems in her own project, as Beam's enforcement is of course not active there. But at least she can trust Beam to be tested and interoperable, if she is able to pin her project to Beam's transitive dependency versions (which might or might not be possible).
Is this really the case? Wouldn't that mean that I either have to create a separate project which does not use BeamModulePlugin, as the plugin enforces versions on all configurations [2], or remove the dependency I want to change from the common shared list? And a 'solution' like [3] to get that running also seems awkward (and obviously a maintenance burden; the netty version has already been updated in Beam's shared list). Note that this is 'working' only because netty-all is not part of the common libraries. Honestly, I am unsure whether the proposed solution on BEAM-7003 [4] is any better.
[1] https://lists.apache.org/thread.html/4de4f53a62e862f249ccb27001fa41a38db478cf1c3fe9d15522c2f5@%3Cdev.beam.apache.org%3E
beam/examples/java/build.gradle Lines 82 to 84 in b419ac4
[4] https://issues.apache.org/jira/browse/BEAM-7003?focusedCommentId=16815436&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16815436
No objections to read-only. What use cases did you find that need to know the version?
I would suspect that we could do any of the following:
The person will have to look into BeamModulePlugin to find the magic string for the version override. Editing the version number in the file seems easier than passing it on the command line once I have already had to look for it in BeamModulePlugin.
We do release a BOM to help users with some parts of the diamond dependency problem. We could add the shared list to this BOM which would help them with diamond dependency issues since they would be getting the versions of libraries that we are suggesting they use.
Two alternatives could be to:
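Separately, the BOM idea mentioned above could look roughly like this from a user's build script. The artifact coordinates here are assumptions for illustration, not necessarily Beam's actual BOM:

```groovy
// User-side sketch of consuming a shared-version BOM.
dependencies {
  // Import the shared version list as a platform (coordinates assumed):
  implementation platform('org.apache.beam:beam-sdks-java-bom:2.x.y')
  // Versions can then be omitted; they come from the BOM:
  implementation 'org.apache.beam:beam-sdks-java-core'
  implementation 'org.apache.kafka:kafka-clients'
}
```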
Thanks, @lukecwik, for your explanations. Unfortunately, I have difficulty following your argument.
I already tried to reference those in the initial description of this PR, e.g. creating a dependency just to get at the version.
Isn't this the 'pin no version at all' option? The drawback here would be that we not only need to respecify the version in all affected leaf projects, we also lose the version forcing done by BeamModulePlugin. Without it, Gradle's default resolution strategy of 'use latest' kicks in, which will let us end up with possibly lots of different dependency versions across our modules, thereby reintroducing the diamond dependency problem we initially wanted to minimise by putting this dependency on the list. Isn't that worse than changing the version only in a specific module, thereby restricting diamond issues to the case where that module is in use?
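For illustration, Gradle's default conflict resolution ("newest wins") is what the forcing guards against. A minimal sketch, with made-up coordinates:

```groovy
// Without an explicit force, Gradle resolves a version conflict by picking
// the highest requested version.
dependencies {
  implementation 'com.example:lib-a:1.0'  // say this pulls in somelib:1.2
  implementation 'com.example:lib-b:1.0'  // say this pulls in somelib:2.0
}
// somelib would resolve to 2.0 here. BeamModulePlugin avoids that drift by
// forcing the shared versions, roughly like:
configurations.all {
  resolutionStrategy.force 'com.example:somelib:1.2'
}
```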
Sure, we can always drop an IO implementation. But what are those users supposed to do whose business relies on that particular module? Should they fork? Stop upgrading? Or hope they will be able to use the old-version IO module in conjunction with e.g. a newer version of core?
It seems I was not clear here. I did not mean the Beam IO module getting outdated, but the corresponding backend, which we access through some client lib that has the corresponding transitive dependency. So we would need to take over that (external) client lib. That might be doable, but I certainly prefer not to be forced down that route.
Not necessarily. While true currently, after turning those magic strings into 'public API' they should certainly be properly documented. By the way, I actually believe they should already be documented in some way or other, as those fixed versions constitute important information not only for the devs, but also for the user. We might consider implementing some reporting task here, or some such.
Although that might look easier, it is most likely not something you want to do. First, editing BeamModulePlugin (or any build.gradle) might end up as an accidental commit in some unrelated PR, so I usually discourage people from doing that. Second, editing a required plugin implies a full rebuild of the repository. As I most likely want to do a full build after changing a common library version, this can have a heavy impact: e.g. on my machine, changing the Spark version from 2.4.0 to 2.4.2 by editing the plugin takes about 30 minutes to execute, while passing the version on the command line takes only 90 seconds. Since you will most likely also revert the change and build again, those numbers even double.
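The command-line variant being compared here might look like this. The property name and module path are assumptions for illustration, not the PR's actual interface:

```shell
# Illustrative: try a different Spark version without editing
# BeamModulePlugin (and without the full rebuild that editing it implies).
./gradlew :runners:spark:test -Pspark.version=2.4.2
```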
Sure, this might help the user. Most likely she is not only using Beam's dependencies, though, so some resolution outside of Beam's scope has to be done anyway. And currently I do not understand all the implications if we decide to implement this.
Isn't that equivalent to the proposal on BEAM-7003?
Currently, I am unsure how to resolve this conversation. From my point of view, we should probably step back and discuss the validity of BEAM-4087 again, especially in the context of BEAM-7003 and its outlined solution. True, this PR implements more than requested there. To recap:

Expose versions to be readable. This is done because some hacks in build files seem to indicate it might be helpful (examples above).

Enable overwriting. This is mainly influenced by @kennknowles' suggestion on the mailing list (or rather, my understanding thereof) regarding the handling of dependencies and the resulting discussion [1], and a proposed solution for how to handle such cases.

Adding a CLI parameter. That's actually what the ticket requests, and it was available with Maven. Which of course does not imply it is something that needs to be implemented. Anyway, there is already something like this in place on subprojects [2], so it might indeed be helpful.

Adding project-specific dependency versions to the map. Instead of defining them locally in the build script, they are exposed, for whatever that might be useful.

[1] https://lists.apache.org/thread.html/4de4f53a62e862f249ccb27001fa41a38db478cf1c3fe9d15522c2f5@%3Cdev.beam.apache.org%3E
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
@adude3141 sorry for having deprioritized this subject; I really do care about it, there was just too much ongoing and I could not look at it. Do you think we should keep this open, take the approach you mention in the other PR, or some other new idea?
@iemejia I guess the approach on the other ticket is still valid and is not really impacted by the changes here. I'd need to have a look at it again, but IIRC it was just a matter of extracting that into a reusable module/plugin/whatever-you-call-it...
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
This PR introduces configurable dependency versions. [1]
Note: this implies a semantic change of the current dependency definitions. Whereas they were previously local to BeamModulePlugin and not even accessible, it is now possible to
This change might be arguable, but there are already some 'quirks' established for
so that might be a sign that we should enable developers to work more easily with these deps.
[1]
beam/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
Lines 55 to 82 in d3deeab
[2] https://github.com/apache/beam/blob/master/runners/flink/job-server/flink_job_server.gradle#L79
[3] https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/build.gradle#L64
[4] https://github.com/apache/beam/blob/master/runners/samza/build.gradle#L58
[5] https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/build.gradle#L24-L27
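As a rough sketch of what the capabilities described above could look like from a module's build script (the identifiers below are assumptions for illustration, not necessarily the PR's actual API):

```groovy
// 1. Read a shared version exposed by BeamModulePlugin (key name assumed):
def kafkaDep = library.java.kafka_clients  // e.g. 'org.apache.kafka:kafka-clients:1.0.0'

// 2. Override it for this project only (use sparingly, see discussion):
// library.java.kafka_clients = 'org.apache.kafka:kafka-clients:0.11.0.3'

// 3. Override from the command line, e.g. ./gradlew test -Pkafka.version=0.11.0.3:
def kafkaVersion = project.findProperty('kafka.version') ?: '1.0.0'
```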
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
Choose reviewer(s) and mention them in a comment (R: @username). Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.