JFYI: tiny forked adam-core "0.20.0" release #1139

Closed
ryan-williams opened this Issue Aug 29, 2016 · 12 comments

Comments

Projects
None yet
3 participants
@ryan-williams
Member

ryan-williams commented Aug 29, 2016

I'd been building and making available ADAM JARs from various HEAD SHAs for my lab over the last few months, since I've needed some of our scripts to pick up post-0.19.0 fixes (like the bash-arg-parsing fixes for #972 and #1132).

That was fine, but even more recently I needed to give pageant an ADAM Maven coordinate to depend on with some of the new APIs, so after weighing some options I went ahead and made a small fork of adam-core here https://github.com/hammerlab/adam/tree/adam-core and released it to Maven as org.hammerlab.adam:adam-core_2.10:0.20.0.

A couple of interesting things happened along the way:

  • I made adam-core its own standalone project, removed the other modules, and put it in the top of the repo.
  • I did a few other POM hygiene improvements that I'd like to upstream here when I have a moment.

I don't plan to touch that fork any more, if I can help it, just wanted to mention this here and point out the repo in case anyone is curious to inspect some of the Maven-futzing involved with pulling the core module out to stand on its own.

Feel free to close this issue whenever.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 29, 2016

Member

Thanks for the heads up, @ryan-williams!

I did a few other POM hygiene improvements that I'd like to upstream here when I have a moment.

That'd be fantastic!

I made adam-core its own standalone project, removed the other modules, and put it in the top of the repo.

OOC, what are the benefits from your side to having the different modules split out? From my side, the submodules make some usage hairy (specifically, pushing to spark-packages) but makes it easier to build the whole project.

Member

fnothaft commented Aug 29, 2016

Thanks for the heads up, @ryan-williams!

I did a few other POM hygiene improvements that I'd like to upstream here when I have a moment.

That'd be fantastic!

I made adam-core its own standalone project, removed the other modules, and put it in the top of the repo.

OOC, what are the benefits from your side to having the different modules split out? From my side, the submodules make some usage hairy (specifically, pushing to spark-packages) but makes it easier to build the whole project.

@ryan-williams

This comment has been minimized.

Show comment
Hide comment
@ryan-williams

ryan-williams Aug 31, 2016

Member

K, just sent my changes in #1142.

OOC, what are the benefits from your side to having the different modules split out? From my side, the submodules make some usage hairy (specifically, pushing to spark-packages) but makes it easier to build the whole project.

I didn't start with a goal of "having a repo that is just adam-core", but given that all I wanted to fork/deploy was adam-core, reducing the project to that one module seemed like the sane thing to do.

Overall I am pretty unconvinced that Maven's module support adds more than it costs over, say, separate repos for each module (and maybe shared parent POMs to reduce repeated boilerplate? Seems like those would logically go together though actually investigating the latter is still on my TODO list), so this was also an experiment in me getting a little more hands-on experience with what the relationship between a module and the parent POM actually is; it is not intended to imply that I think ADAM should do things one way or another.

Member

ryan-williams commented Aug 31, 2016

K, just sent my changes in #1142.

OOC, what are the benefits from your side to having the different modules split out? From my side, the submodules make some usage hairy (specifically, pushing to spark-packages) but makes it easier to build the whole project.

I didn't start with a goal of "having a repo that is just adam-core", but given that all I wanted to fork/deploy was adam-core, reducing the project to that one module seemed like the sane thing to do.

Overall I am pretty unconvinced that Maven's module support adds more than it costs over, say, separate repos for each module (and maybe shared parent POMs to reduce repeated boilerplate? Seems like those would logically go together though actually investigating the latter is still on my TODO list), so this was also an experiment in me getting a little more hands-on experience with what the relationship between a module and the parent POM actually is; it is not intended to imply that I think ADAM should do things one way or another.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Aug 31, 2016

Member

From experience, separate repositories with shared parent POMs makes life difficult.

I'm personally much more in favor of one big repo and many Maven modules. It comes down to what should all be released at the same time (same repo) and what should be in the same artifact (same Maven module).

I don't think it makes sense for your code to depend on command line classes or Java APIs if it only needs adam-core. That's why we split those into separate Maven modules.

Member

heuermh commented Aug 31, 2016

From experience, separate repositories with shared parent POMs makes life difficult.

I'm personally much more in favor of one big repo and many Maven modules. It comes down to what should all be released at the same time (same repo) and what should be in the same artifact (same Maven module).

I don't think it makes sense for your code to depend on command line classes or Java APIs if it only needs adam-core. That's why we split those into separate Maven modules.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 2, 2016

Member

In any case, what do we need to do to get you back depending on our tree?

I'm afraid since the fork you may be missing out on some of the pain we've been introducing on HEAD towards the 0.20.0 release, so we're missing out on the push-back when we change things too much.

Member

heuermh commented Sep 2, 2016

In any case, what do we need to do to get you back depending on our tree?

I'm afraid since the fork you may be missing out on some of the pain we've been introducing on HEAD towards the 0.20.0 release, so we're missing out on the push-back when we change things too much.

@ryan-williams

This comment has been minimized.

Show comment
Hide comment
@ryan-williams

ryan-williams Oct 6, 2016

Member

Sorry, I'd missed your subsequent comments here @heuermh; lots of response below, as well as new info since our talk yesterday towards the end.

What should be in the same repo as what else?

It comes down to what should all be released at the same time (same repo)

Sure, maybe the simplest thing to say here is that I'm not convinced that the 4 adam-* modules are logically so tightly coupled to one another that they need be released together, only together, and always with the same version numbers as one another, and that changes to any of them need to trigger the continuous-builds of all of them, and that the general development pattern in each should entail building/testing all of them while iterating, etc. etc., all of which the current setup requires or encourages.

The larger BDG repo-constellation has many modules with their own versioning schemes and release-cadences (across utils, bdg-formats, adam, quinine, avocado, etc.), partly dependent and partly independent of one another's, and some of the modules therein are arguably more closely coupled to others that happen to live in other repos than they are to some of the modules in their own repo, so decoupling the ADAM modules further is not that radical an idea; a fair number of things use adam-core but not the other adam-* modules, and we already live in a world where important pieces of the overall picture are (rightly) decomposed into separate repos/build-targets.

While we're on this topic, I dumped all the intra-{bdgenomics,hammerlab} dependencies I could find into http://mdaines.github.io/viz.js/, one node per maven-module…

Here they are colored by which repo they live in (click for larger versions!):

and here they are clustered by repo:

I won't claim that this resolves much, though the first one does especially illustrate, to me, the central importance of adam-core and utils-misc to the exclusion of other things that they happen to share github repos with (also bdg-formats, but it's already its own repo / build/release-target).

Smaller build targets == better

I don't think it makes sense for your code to depend on command line classes or Java APIs if it only needs adam-core. That's why we split those into separate Maven modules.

Yea, no disagreements here! Smaller build targets are almost always better in my book, and from where we are currently I also feel that smaller and more decoupled {release units, continuous-build units, version-control units} would be better, which is the previous and more contentious point.

Another way to think about it is that in general it shouldn't be up to core-library maintainers/contributors to upgrade every downstream dependency every time the core library is updated; that work should fall on the downstream-dependency maintainer when they want to upgrade to the newer version of the core library.

As the ecosystem of things depending on ADAM (specifically adam-core) expands, this becomes increasingly important, and the other adam-* modules start to seem like just 3 of the many libraries that depend on adam-core; however, we have maintained this setup where to do anything to adam-core, you have to {build, run all tests for, upgrade / fix brokenness in, and simultaneously release} these 3 other libraries that depend on adam-core but which provide nothing to someone who just wants to add/use adam-core functionality.

Fork-divergence danger

I'm afraid since the fork you may be missing out on some of the pain we've been introducing on HEAD towards the 0.20.0 release, so we're missing out on the push-back when we change things too much.

Yea this is a fair concern; as I mentioned yesterday, I'm recently optimistic that even rebasing/merging in upstream changes to an adam-core-only fork will not be too painful, and in fact I've just done that and pushed org.hammerlab.adam:adam-core_2.*:0.20.1 to Maven central (2.11 version, 2.10 version).

Those releases merge in the latest HEAD as well as #1142, which isn't quite merged in yet, plus a tiny commit to un-package-private SAMRecordConverter, which I mentioned yesterday that I wanted in order to make a small downstream bam-filtering binary.

I've got it set up over there so that the release process is:

# Starting from POM version 0.20.1-SNAPSHOT and Scala 2.11
mvn versions:set -DgenerateBackupPoms=false -DnewVersion="0.20.1"

# Release 2.11 version
mvn deploy -Prelease -DskipTests
git commit -am "bump to 0.20.1"
git tag 0.20.1_2.11

# Release 2.10 version
scripts/move_to_scala_2.10.sh
mvn deploy -Prelease -DskipTests
git commit -am "0.20.1_2.10"
git tag 0.20.1_2.10

# Go back to pre-releases commit, set to new SNAPSHOT version, and push
git reset --hard HEAD@{2}
mvn versions:set -DgenerateBackupPoms=false -DnewVersion="0.20.2-SNAPSHOT"
git commit -am "bump to 0.20.2-SNAPSHOT"
git push upstream HEAD:master
git push --tags upstream

No fussing with the Sonatype staging repo in the web browser!

(The above leaves out the spark2 artifacts, but those would work pretty much the same way.)

So anyway, that's my thinking on arranging more granular library-units that can be managed both independently and dependently more easily than is the case today, but the nice thing is that if you don't agree or we don't have time to move in this direction, it's not too hard for us to each continue having what we need, and benefitting from improvements in both directions (cf. #1142 upstreaming changes from the "fork").

Interested in all thoughts on this!

Member

ryan-williams commented Oct 6, 2016

Sorry, I'd missed your subsequent comments here @heuermh; lots of response below, as well as new info since our talk yesterday towards the end.

What should be in the same repo as what else?

It comes down to what should all be released at the same time (same repo)

Sure, maybe the simplest thing to say here is that I'm not convinced that the 4 adam-* modules are logically so tightly coupled to one another that they need be released together, only together, and always with the same version numbers as one another, and that changes to any of them need to trigger the continuous-builds of all of them, and that the general development pattern in each should entail building/testing all of them while iterating, etc. etc., all of which the current setup requires or encourages.

The larger BDG repo-constellation has many modules with their own versioning schemes and release-cadences (across utils, bdg-formats, adam, quinine, avocado, etc.), partly dependent and partly independent of one another's, and some of the modules therein are arguably more closely coupled to others that happen to live in other repos than they are to some of the modules in their own repo, so decoupling the ADAM modules further is not that radical an idea; a fair number of things use adam-core but not the other adam-* modules, and we already live in a world where important pieces of the overall picture are (rightly) decomposed into separate repos/build-targets.

While we're on this topic, I dumped all the intra-{bdgenomics,hammerlab} dependencies I could find into http://mdaines.github.io/viz.js/, one node per maven-module…

Here they are colored by which repo they live in (click for larger versions!):

and here they are clustered by repo:

I won't claim that this resolves much, though the first one does especially illustrate, to me, the central importance of adam-core and utils-misc to the exclusion of other things that they happen to share github repos with (also bdg-formats, but it's already its own repo / build/release-target).

Smaller build targets == better

I don't think it makes sense for your code to depend on command line classes or Java APIs if it only needs adam-core. That's why we split those into separate Maven modules.

Yea, no disagreements here! Smaller build targets are almost always better in my book, and from where we are currently I also feel that smaller and more decoupled {release units, continuous-build units, version-control units} would be better, which is the previous and more contentious point.

Another way to think about it is that in general it shouldn't be up to core-library maintainers/contributors to upgrade every downstream dependency every time the core library is updated; that work should fall on the downstream-dependency maintainer when they want to upgrade to the newer version of the core library.

As the ecosystem of things depending on ADAM (specifically adam-core) expands, this becomes increasingly important, and the other adam-* modules start to seem like just 3 of the many libraries that depend on adam-core; however, we have maintained this setup where to do anything to adam-core, you have to {build, run all tests for, upgrade / fix brokenness in, and simultaneously release} these 3 other libraries that depend on adam-core but which provide nothing to someone who just wants to add/use adam-core functionality.

Fork-divergence danger

I'm afraid since the fork you may be missing out on some of the pain we've been introducing on HEAD towards the 0.20.0 release, so we're missing out on the push-back when we change things too much.

Yea this is a fair concern; as I mentioned yesterday, I'm recently optimistic that even rebasing/merging in upstream changes to an adam-core-only fork will not be too painful, and in fact I've just done that and pushed org.hammerlab.adam:adam-core_2.*:0.20.1 to Maven central (2.11 version, 2.10 version).

Those releases merge in the latest HEAD as well as #1142, which isn't quite merged in yet, plus a tiny commit to un-package-private SAMRecordConverter, which I mentioned yesterday that I wanted in order to make a small downstream bam-filtering binary.

I've got it set up over there so that the release process is:

# Starting from POM version 0.20.1-SNAPSHOT and Scala 2.11
mvn versions:set -DgenerateBackupPoms=false -DnewVersion="0.20.1"

# Release 2.11 version
mvn deploy -Prelease -DskipTests
git commit -am "bump to 0.20.1"
git tag 0.20.1_2.11

# Release 2.10 version
scripts/move_to_scala_2.10.sh
mvn deploy -Prelease -DskipTests
git commit -am "0.20.1_2.10"
git tag 0.20.1_2.10

# Go back to pre-releases commit, set to new SNAPSHOT version, and push
git reset --hard HEAD@{2}
mvn versions:set -DgenerateBackupPoms=false -DnewVersion="0.20.2-SNAPSHOT"
git commit -am "bump to 0.20.2-SNAPSHOT"
git push upstream HEAD:master
git push --tags upstream

No fussing with the Sonatype staging repo in the web browser!

(The above leaves out the spark2 artifacts, but those would work pretty much the same way.)

So anyway, that's my thinking on arranging more granular library-units that can be managed both independently and dependently more easily than is the case today, but the nice thing is that if you don't agree or we don't have time to move in this direction, it's not too hard for us to each continue having what we need, and benefitting from improvements in both directions (cf. #1142 upstreaming changes from the "fork").

Interested in all thoughts on this!

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Oct 6, 2016

Member

If release cadence is the major problem you're facing right now, then I would encourage you to participate in discussions around issue triage and the release roadmap

#1048
#1088
#1089

Without hearing any dissenting opinion @fnothaft and I have kept pushing the 0.20.0 release along to capture more of the major refactoring, so that 0.20.0 → 0.21.0 would be relatively minor, and quickly lead to 1.0.

We could modify this plan to cut 0.20.0 soon (perhaps now, or after #1114 is merged), and bump the rest to 0.21 and a new 0.22 milestone. Alternatively, we could cherry pick commits from HEAD to the 0.19.0 maintenance branch to cut a 0.19.1 release (I suggested this after the Spark 2.0 fixes, but heard no opinions either way).

I think of a hard fork as more a tool of last resort. We see you and others from the Hammer lab as Very Important Customers of the work we do, and would very much like to help get you unforked. :)

Member

heuermh commented Oct 6, 2016

If release cadence is the major problem you're facing right now, then I would encourage you to participate in discussions around issue triage and the release roadmap

#1048
#1088
#1089

Without hearing any dissenting opinion @fnothaft and I have kept pushing the 0.20.0 release along to capture more of the major refactoring, so that 0.20.0 → 0.21.0 would be relatively minor, and quickly lead to 1.0.

We could modify this plan to cut 0.20.0 soon (perhaps now, or after #1114 is merged), and bump the rest to 0.21 and a new 0.22 milestone. Alternatively, we could cherry pick commits from HEAD to the 0.19.0 maintenance branch to cut a 0.19.1 release (I suggested this after the Spark 2.0 fixes, but heard no opinions either way).

I think of a hard fork as more a tool of last resort. We see you and others from the Hammer lab as Very Important Customers of the work we do, and would very much like to help get you unforked. :)

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 6, 2016

Member

We see you and others from the Hammer lab as Very Important Customers of the work we do, and would very much like to help get you unforked. :)

I see you more as part of what we do, but I digress.

That flowchart you put together (of all the project dependencies) is amazing; thank you for putting that together and for writing out such a thoughtful post. I am absolutely completely slammed this week, but let me try to get some intelligent thoughts written down over the weekend or next week. I think that one question that is missing here is what is the entrypoint for the average ADAM user? If the entrypoint is the CLI—after all, this is bioinformatics—then having adam-{core,apis,cli} bundled together makes sense. If the entrypoint is the core libraries, then splitting things out is the correct approach.

Personally, I think the entrypoint should be the core libraries with a bare minimum CLI for importing/exporting data.

Anyways, let me chew on this a bit more, and I'll send some thoughts back here soon.

Member

fnothaft commented Oct 6, 2016

We see you and others from the Hammer lab as Very Important Customers of the work we do, and would very much like to help get you unforked. :)

I see you more as part of what we do, but I digress.

That flowchart you put together (of all the project dependencies) is amazing; thank you for putting that together and for writing out such a thoughtful post. I am absolutely completely slammed this week, but let me try to get some intelligent thoughts written down over the weekend or next week. I think that one question that is missing here is what is the entrypoint for the average ADAM user? If the entrypoint is the CLI—after all, this is bioinformatics—then having adam-{core,apis,cli} bundled together makes sense. If the entrypoint is the core libraries, then splitting things out is the correct approach.

Personally, I think the entrypoint should be the core libraries with a bare minimum CLI for importing/exporting data.

Anyways, let me chew on this a bit more, and I'll send some thoughts back here soon.

@fnothaft fnothaft referenced this issue Oct 7, 2016

Closed

Release ADAM version 0.20.0 #1048

47 of 61 tasks complete
@ryan-williams

This comment has been minimized.

Show comment
Hide comment
@ryan-williams

ryan-williams Oct 7, 2016

Member

Cool, yea let me narrow the scope of this thread a bit after having maybe over-expanded it:

My explicit message here is that I prototyped a way for external libraries to use ADAM-core functionality more recent than 0.19.0 (February 2016), org.hammerlab.adam:adam-core_2.1[01]:0.20.1, that offers a couple of different tradeoffs from depending on org.bdgenomics.adam:adam-core_2.1[01]:0.19.1-SNAPSHOT.

From that work came a bunch of reflections and patches; the former are above and the latter are in #1142, ready to be merged.

I've intended the word "fork" here only in the weak sense in which all contributions I've ever made to ADAM have been pushed to, and then merged upstream from, my "fork": https://github.com/ryan-williams/adam; that's just rote/idiomatic GitHub convention. There is no "hard fork" here in any sense in which that phrase is typically used.

I believe in and am invested in the ADAM project and I appreciate the work y'all have been doing to project-manage and evolve it for months while I've been working on other stuff! I'm hoping to have cycles soon to help with some of those things, but what I have at the moment is the perspective of an ADAM user taking a fresh look at it after some months ~away. and I see a pretty simple decision-tree from:

  • almost 8mos (and substantial rewrites of codebase/APIs) without a patch release,
  • none of us really using code (or even the same JVM version) that is currently the most recent official offering to the world

  • let's think about our priors around the optimal balance of project cohesion vs. agility/iteration,
  • maybe Doing A Release is on too much of a pedestal,
  • maybe too many build/release/SCM things are conflated/coupled and have become unwieldy.

So I have serialized some of those ideas to a GitHub comment above just because they exist and are pretty obviously relevant to our current situation; we can keep them in mind or not as project-management decisions continue to be made. Thanks for humoring/reading/responding, and helping us get some mileage out of this "discussion" tag 😄 !

Member

ryan-williams commented Oct 7, 2016

Cool, yea let me narrow the scope of this thread a bit after having maybe over-expanded it:

My explicit message here is that I prototyped a way for external libraries to use ADAM-core functionality more recent than 0.19.0 (February 2016), org.hammerlab.adam:adam-core_2.1[01]:0.20.1, that offers a couple of different tradeoffs from depending on org.bdgenomics.adam:adam-core_2.1[01]:0.19.1-SNAPSHOT.

From that work came a bunch of reflections and patches; the former are above and the latter are in #1142, ready to be merged.

I've intended the word "fork" here only in the weak sense in which all contributions I've ever made to ADAM have been pushed to, and then merged upstream from, my "fork": https://github.com/ryan-williams/adam; that's just rote/idiomatic GitHub convention. There is no "hard fork" here in any sense in which that phrase is typically used.

I believe in and am invested in the ADAM project and I appreciate the work y'all have been doing to project-manage and evolve it for months while I've been working on other stuff! I'm hoping to have cycles soon to help with some of those things, but what I have at the moment is the perspective of an ADAM user taking a fresh look at it after some months ~away. and I see a pretty simple decision-tree from:

  • almost 8mos (and substantial rewrites of codebase/APIs) without a patch release,
  • none of us really using code (or even the same JVM version) that is currently the most recent official offering to the world

  • let's think about our priors around the optimal balance of project cohesion vs. agility/iteration,
  • maybe Doing A Release is on too much of a pedestal,
  • maybe too many build/release/SCM things are conflated/coupled and have become unwieldy.

So I have serialized some of those ideas to a GitHub comment above just because they exist and are pretty obviously relevant to our current situation; we can keep them in mind or not as project-management decisions continue to be made. Thanks for humoring/reading/responding, and helping us get some mileage out of this "discussion" tag 😄 !

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Oct 18, 2016

Member

Version 0.20.0 has been released and is available on Maven Central.

I've updated the release issues for 0.21 and 1.0 and added a new one for 0.22 (#1210).

I've also created Github projects for 0.21 and 0.22 at
https://github.com/bigdatagenomics/adam/projects

Hopefully these will help support the discussion on triaging issues, deciding what goes in a release, and when a release should be made. Let me know what you think!

Member

heuermh commented Oct 18, 2016

Version 0.20.0 has been released and is available on Maven Central.

I've updated the release issues for 0.21 and 1.0 and added a new one for 0.22 (#1210).

I've also created Github projects for 0.21 and 0.22 at
https://github.com/bigdatagenomics/adam/projects

Hopefully these will help support the discussion on triaging issues, deciding what goes in a release, and when a release should be made. Let me know what you think!

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Nov 8, 2016

Member

@ryan-williams just wanted to confirm, after the 0.20.0 ADAM release, you are off of your local fork, right? If so, we can close this ticket.

Member

fnothaft commented Nov 8, 2016

@ryan-williams just wanted to confirm, after the 0.20.0 ADAM release, you are off of your local fork, right? If so, we can close this ticket.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Dec 9, 2016

Member

@ryan-williams bump for confirmation. If you are still on your local fork, and there's anything we need to do before 0.21.0 to help, please let us know.

It would be fun to meet up sometime and try to work through some of these build and packaging issues. As nice as working on Github is, occasionally there are issues that require a few people sitting in the same room. Perhaps we can talk offline about travel opportunities.

Member

heuermh commented Dec 9, 2016

@ryan-williams bump for confirmation. If you are still on your local fork, and there's anything we need to do before 0.21.0 to help, please let us know.

It would be fun to meet up sometime and try to work through some of these build and packaging issues. As nice as working on Github is, occasionally there are issues that require a few people sitting in the same room. Perhaps we can talk offline about travel opportunities.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Mar 3, 2017

Member

Closing this, as I think we've discussed this pretty extensively in the past and there isn't anything currently actionable here. Please reopen if needed @ryan-williams.

Member

fnothaft commented Mar 3, 2017

Closing this, as I think we've discussed this pretty extensively in the past and there isn't anything currently actionable here. Please reopen if needed @ryan-williams.

@fnothaft fnothaft closed this Mar 3, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment