More frequent major releases for arrow-rs #1120

Closed
alamb opened this issue Jan 1, 2022 · 22 comments · Fixed by #1143
Labels
development-process Related to development process of arrow-rs enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Contributor

alamb commented Jan 1, 2022

TL;DR: I propose doing major releases for arrow-rs more frequently (up to every other week) directly from master, breaking the correspondence with the main arrow release.

For example, the release cadence might look like

  • 2022-01-01: 7.0.0
  • 2022-01-15: 7.1.0 (no backwards incompatible changes)
  • 2022-02-01: 8.0.0 (new backwards incompatible change)
  • 2022-02-15: 9.0.0 (new backwards incompatible change)
  • 2022-03-01: 9.1.0 etc
  • 2022-03-15: 9.2.0
  • 2022-04-01: 9.3.0
  • 2022-04-15: 10.0.0
    ...

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The rust ecosystem as a whole is "fast changing". This means:

  1. Many pieces of the ecosystem change frequently in ways that require downstream updates (e.g. new clippy lints released in stable versions of rust that require code fixes to pass CI)
  2. Many crate authors don't spend effort maintaining stable releases, but instead fix issues by pushing new releases

This means the ability to quickly upgrade to new downstream libraries is critical to help fix issues like #1101

With a few notable exceptions (e.g. tokio) most of the rust ecosystem pushes new major releases frequently (monthly if not more so), often via 0.X releases.

The current release cadence and versioning scheme of arrow-rs inherits from the C/C++ and python ecosystems with a major release every 3 months while maintaining a backwards compatible branch.

I propose moving the rust implementation in closer alignment with the rest of the rust ecosystem with more frequent major version releases.

Describe the solution you'd like

Continue to release every 2 weeks, but release all new versions directly from the master branch, picking the new version number based on the changes in the crate (a new major version if there were semver-incompatible changes, a minor version if not).

Pros:

  1. Same number of releases as today (1 every other week)
  2. Less branch maintenance overhead (selfishly good for me)
  3. Major changes to dependencies (e.g. upgrade to newer versions of prost) available sooner

Cons:

  1. Downstream crates would likely have to do major updates more frequently (cargo update doesn't pick up new major versions)
  2. Rust versions would no longer necessarily align with the C++/etc mono repo release version

Much of the rust ecosystem, as described above, is used to the "do major updates frequently" mindset, and tools such as dependabot reduce the effort required to do so, so I believe the first con is manageable.
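
To illustrate the first con: a plain version requirement in Cargo.toml such as `arrow = "7.0.0"` is treated as a caret requirement, i.e. `>=7.0.0, <8.0.0`, which is why `cargo update` never moves to 8.0.0 on its own. The sketch below models that rule for requirements with a major version of at least 1. The function name `caret_matches` is hypothetical and is not Cargo's implementation; pre-release versions are ignored.

```rust
/// Returns true if `candidate` satisfies the caret requirement `^req`.
/// Assumption: this sketch only covers requirements with major >= 1
/// (Cargo applies different rules to 0.x versions).
pub fn caret_matches(req: (u64, u64, u64), candidate: (u64, u64, u64)) -> bool {
    assert!(req.0 >= 1, "this sketch only covers requirements with major >= 1");
    // Same major version, and not older than the requested version.
    // Tuples compare lexicographically: major, then minor, then patch.
    candidate.0 == req.0 && candidate >= req
}
```

With a requirement of `(7, 0, 0)`, a candidate `(7, 1, 0)` matches but `(8, 0, 0)` does not, so a downstream crate has to edit its manifest to pick up each new major release.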

Describe alternatives you've considered

Option 1: "major release on demand", where we release when "enough" changes have built up to do a regular release.

Option 2: keep the same structure (a master and active_release branch); do a major release every month instead of every 3 months, and do a minor release every other week.

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Jan 1, 2022
@tustvold
Contributor

tustvold commented Jan 1, 2022

often via 0.X releases

I think this touches on something key. In my opinion, the crates in this repository are still somewhat beta if not alpha software, with major APIs still getting fleshed out. As such it would perhaps be more expected for it to be 0.x software with the accompanying frequency of breaking releases. I accept the version ship has sailed, but I think it makes sense to pretend it hasn't and adopt a more rapid breaking release cadence as suggested 👍

FWIW similar arguments apply to many of the arrow-rs dependencies. It isn't so much that the ecosystem itself is "fast changing", just a lot of the ecosystem isn't yet v1 😆

@paddyhoran
Contributor

Also another pro is that we can hopefully bring arrow2 back into the fold (though that would need confirmation/additional work). As I remember it, @jorgecarleitao's main sticking point was that the Rust version of arrow was sticking with the C++/Python versioning scheme.

@alamb
Contributor Author

alamb commented Jan 4, 2022

In conclusion, from my perspective maintaining an active release branch with stable patches and major releases every 3 months doesn't provide enough value for the cost

The cost is borne both by me actually creating the releases as well as users who are slowed down picking up updates for dependent libraries (the latest version of tonic, for example) or waiting for their changes to be released.

So unless there is significant pushback (ideally along with a volunteer to help maintain a stable branch) I will plan to start doing releases for arrow-rs directly from master as proposed on [1], starting with arrow-rs 7.0 (will make a candidate later this week and hope to release early next).

I think the only downside to this will be the oft discussed "arrow will end up with versions 22.0.0 rather than version 0.22.0" which may lead to confusion about its relative stability. We can add some documentation about this and adjust if it causes too much confusion.

@alamb
Contributor Author

alamb commented Jan 4, 2022

TLDR is I plan to release arrow-rs 7.0 as normally scheduled at the end of this week, and then after that only release from master every other week (rather than maintaining a more stable active_release branch)

So the first time something different will happen is in 2.5 weeks time

@liukun4515
Contributor

If the master has the API-change commit after the last release, how to handle this?
@alamb

@tustvold
Contributor

tustvold commented Jan 5, 2022

I wonder if we might further simplify matters by releasing weekly and incrementing the version based on if there are any new breaking changes on master since the last release. The major benefit would be this could be completely automated, potentially using existing tooling.

In the past I've used this approach for binaries, in particular using goreleaser, but I see no obvious reason something similar couldn't be done here...

Edit: a somewhat related question is must all the crates in this repo advance their version numbers in lockstep? Should a breaking release to parquet mandate a breaking semver bump to arrow?

@alamb
Contributor Author

alamb commented Jan 5, 2022

If the master has the API-change commit after the last release, how to handle this?

@liukun4515 the proposal is that the following release would have a new major (rather than minor) version

So for example, if there was an API change on Jan 12 (after 7.0.0 was released) the next release after Jan 12 would become 8.0.0 (rather than 7.1.0)
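
The rule described here can be sketched in a few lines. This is only an illustration: the `Version` type and `next_release` function are hypothetical, not part of arrow-rs or its release scripts, and the sketch assumes someone has already decided whether master contains a breaking change since the last release.

```rust
/// A minimal semantic version triple (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Pick the version of the next release cut from master.
/// Assumption: `has_breaking_change` is determined by a person reviewing
/// the changes merged since the last release.
pub fn next_release(current: Version, has_breaking_change: bool) -> Version {
    if has_breaking_change {
        // Any API change since the last release forces a new major version.
        Version { major: current.major + 1, minor: 0, patch: 0 }
    } else {
        // Otherwise the release is a backwards compatible minor version.
        Version { major: current.major, minor: current.minor + 1, patch: 0 }
    }
}
```

Starting from 7.0.0, a release containing an API change becomes 8.0.0, while a release without one becomes 7.1.0.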

@alamb
Contributor Author

alamb commented Jan 5, 2022

I wonder if we might further simplify matters by releasing weekly and incrementing the version based on if there are any new breaking changes on master since the last release.

The major thing that prevents more frequent and/or automated releases is the release voting process. I often have to chase (via slack, etc) to get 3 PMC members to approve a release;

In terms of automation, sqlparser-rs uses cargo release -- https://github.com/sqlparser-rs/sqlparser-rs/blob/main/docs/releasing.md and it is pretty sweet.

Edit: a somewhat related question is must all the crates in this repo advance their version numbers in lockstep? Should a breaking release to parquet mandate a breaking semver bump to arrow?

I don't know of any technical reason they need to advance their numbers in lockstep -- it is a convenience so that we (mostly so I) don't have to scrutinize the changes to each crate and determine if/when a new version is warranted on each release

@liukun4515
Contributor

Waiting for the release of 7.0.0!

@liukun4515
Contributor

I wonder if we might further simplify matters by releasing weekly and incrementing the version based on if there are any new breaking changes on master since the last release.

The major thing that prevents more frequent and/or automated releases is the release voting process. I often have to chase (via slack, etc) to get 3 PMC members to approve a release;

Maybe we need more active PMC for arrow in the rust ecosystem.

@alamb
Contributor Author

alamb commented Jan 6, 2022

Maybe we need more active PMC for arrow in the rust ecosystem.

Yeah -- I don't really know how to improve this. We do have several representatives of the Rust implementation on the PMC now (@jorgecarleitao @nevi-me @andygrove and @Dandandan) but their time is limited (as all of ours is)

I do wonder how much of the issue is the current process, which suggests running an automated script. I am not convinced this extra level of "quality assurance" is a real value add -- the real value add to me is having more than one person look at the release's content and say that the "content seems reasonable."

Are there any other thoughts on this thread?

@houqp
Member

houqp commented Jan 7, 2022

I don't have anything to add other than that I fully support this move :) It will certainly reduce @alamb's maintenance burden and make it easier for downstream to get access to breaking changes 👍

@alamb
Contributor Author

alamb commented Jan 13, 2022

Given there are no more comments I'll proceed with this proposal 👍

@tafia

tafia commented Feb 17, 2022

I know this is an old issue but just giving my 2 cents.

While I definitely agree with working out of master directly, bumping the major version so frequently is not super convenient.

Much of the rust ecosystem, as described above, is used to the "do major updates frequently" mindset

I am not sure where this is coming from. From my experience there are many important crates which are trying hard not to release new major versions unless necessary (rocket, actix-web, chrono, tokio, log for the most important ones).

As explained in a reddit post, I have a hard time keeping up with the major version changes. Updating one library is easy; updating all its reverse dependencies is much harder.

Anyway, thanks a lot for the hard work!

@alamb
Contributor Author

alamb commented Feb 17, 2022

Thank you for your feedback @tafia -- it is nice to hear from someone who valued the incremental updates!

@xxchan
Contributor

xxchan commented Jul 3, 2023

I think the only downside to this will be the oft discussed "arrow will end up with versions 22.0.0 rather than version 0.22.0" which may lead to confusion about its relative stability.

So this means arrow-rs is currently in a 0.x state, right? I think it's better if a relatively stable API can be guaranteed when we reach 1.x. (Do you have any idea about a timeline?)

Much of the rust ecosystem, as described above, is used to the "do major updates frequently" mindset

I am not sure where this is coming from. From my experience there are many important crates which are trying hard not to release new major versions unless necessary (rocket, actix-web, chrono, tokio, log for the most important ones).

+1. arrow-rs seems to be one of the crates which do major updates most frequently. But from a positive perspective, maybe this reflects its complexity and active deployment. 😄

However, the drawbacks are also clear, especially when arrow-rs is depended on by libraries instead of end-user applications.

  • The end-user app will depend on multiple versions of arrow-rs. This will slow down compilation and increase code size.
  • More importantly, different major versions' types are incompatible. So if a library uses arrow-rs's types in its public API (e.g., datafusion, and I think arrow-rs has a larger need for this than other crates, as it's more like a common language for libraries to talk with each other, instead of only implementation details), then it will force the application (and other libraries exposing arrow-rs) to use the same version. There are workarounds like https://github.com/icelake-io/icelake/pull/65/files, but I think that's hacky and not sustainable.
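
The type incompatibility can be sketched without pulling in two real crate versions. In the example below, the modules `arrow_v1` and `arrow_v2` are hypothetical stand-ins for two major versions of the same crate that Cargo has compiled side by side; the types are structurally identical, yet the compiler treats them as unrelated. This is an illustration of the general problem, not the workaround used in the linked PR.

```rust
// Stand-ins for two major versions of one crate (hypothetical names).
pub mod arrow_v1 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Int32Array(pub Vec<i32>);
}

pub mod arrow_v2 {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Int32Array(pub Vec<i32>);
}

/// A library function whose public API exposes the "version 1" type.
pub fn sum(values: &arrow_v1::Int32Array) -> i64 {
    values.0.iter().map(|&v| i64::from(v)).sum()
}

/// The explicit bridge an application needs if it holds the "version 2" type.
pub fn v2_to_v1(values: arrow_v2::Int32Array) -> arrow_v1::Int32Array {
    arrow_v1::Int32Array(values.0)
}

// The following would be rejected with a "mismatched types" error,
// even though both structs have the same name and layout:
// sum(&arrow_v2::Int32Array(vec![1, 2, 3]));
```

An application holding an `arrow_v2::Int32Array` has to call `v2_to_v1` before it can use `sum`, and every library boundary that exposes a different major version needs a similar conversion.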

@xxchan
Contributor

xxchan commented Jul 3, 2023

After reading the thread a few times, the main need seems to be "release from main and don't maintain stable branch", which makes a lot of sense to me, instead of "frequent major release". 🤔 The latter happened just because arrow-rs isn't stable enough yet. Maybe we can just be more conservative about breaking changes after "1.0", e.g., only merge them in a batch before a major release.

@tustvold
Contributor

tustvold commented Jul 3, 2023

release from main and don't maintain stable branch

Yes this is the primary motivation, we aren't setting out to cut frequent breaking releases 😅 . FWIW most breaking changes are fairly innocuous, e.g. adding Send bounds to a trait object, or returning Result instead of panicking, however, semver doesn't really give us a good way to convey this nuance. Whilst we could defer these upgrades, even the current two week lag to get changes into DataFusion causes people frustration.
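
As a rough sketch of why "returning Result instead of panicking" counts as a breaking change even when it is innocuous: the function and caller names below are hypothetical and not taken from arrow-rs, and `_v1`/`_v2` stand for the same function before and after the change.

```rust
use std::num::ParseIntError;

/// "Before": panics on malformed input.
pub fn parse_len_v1(s: &str) -> usize {
    s.parse::<usize>().expect("invalid length")
}

/// "After": reports the error to the caller instead of panicking.
pub fn parse_len_v2(s: &str) -> Result<usize, ParseIntError> {
    s.parse::<usize>()
}

/// A downstream caller written against the "before" signature.
pub fn double_v1(s: &str) -> usize {
    parse_len_v1(s) * 2
}

/// The same caller after the upgrade: it must now handle the Result,
/// so the old source no longer compiles unchanged.
pub fn double_v2(s: &str) -> Result<usize, ParseIntError> {
    Ok(parse_len_v2(s)? * 2)
}
```

The edit needed downstream is small, but because existing callers stop compiling, semver still requires a major version bump.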

Do you have any idea about a timeline?
Maybe we can just be more conservative about breaking changes

I don't have a timeline, but the pace of breaking changes is noticeably slowing down. There are some breaking changes in the pipeline concerning scalar representation #1047 but I hope that following that we will be in a better place to maintain API stability.

@xxchan
Contributor

xxchan commented Jul 3, 2023

On a side note, I think the fact that rust ecosystem is more willing to use 3rd party dependencies and update them, and crates are more often released actually requires libraries to obey semver more strictly and do major updates less often (because Cargo has a nice solution for diamond dependencies ). Otherwise it's actually discouraging users to use/update it and the situation will fallback to C++/Python's.

@alamb
Contributor Author

alamb commented Jul 3, 2023

I predict this crate will adopt a more measured pace of versions once the pace of development slows down (and most of what goes in is bug fixes). Interestingly we haven't gotten there yet even after 3-4 years of development.

I am feeling good that in the next 6 months or so we'll start seeing releases that do not need a major version bump each time.

@alamb
Contributor Author

alamb commented Jul 3, 2023

On a side note, I think the fact that rust ecosystem is more willing to use 3rd party dependencies and update them, and crates are more often released actually requires libraries to obey semver more strictly and do major updates less often (because Cargo has a nice solution for diamond dependencies ). Otherwise it's actually discouraging users to use/update it and the situation will fallback to C++/Python's.

I agree -- and furthermore given the nice tooling in Cargo (and dependabot) the versions keep updating even though it does take ongoing work


7 participants