Release changelogs #199
A) I really like the idea of an organized changelog, and I agree that we can use PR labels to achieve that in a way that does not significantly increase the burden on maintainers. B) I don't understand how the release cadence affects our ability to fix regressions. In my mind, we agreed to just release where we are at every 2 weeks. With that approach, if there is a regression and it takes a month to fix, then that regression would just be in two releases. I know that is not exactly what we have been doing, but it feels like a fine approach to me. If people want to be able to pin to a release that they know is a pretty good one, that takes us back to the conversation about "blessed" releases.
I believe two weeks is not enough time to find and fix regressions. Due to stability problems in dask/distributed, we recently had to delay multiple releases because we didn't manage to fix them in the given time.
I wanted to add some examples from how other Python projects handle characterizing release notes. Hope it helps.
We do this in LightGBM using the release-drafter bot.
Release-drafter automatically creates a draft release (only visible to maintainers), and then you have an opportunity to manually edit it before publishing it as an official release.
My personal opinion: asking individual contributors to update a changelog file adds friction for not a lot of benefit. This is what … For a sort of middle ground between manual and automatic, …
That approach is nice because it doesn't result in merge conflicts, but it's still a burden on maintainers to remind contributors to add such a file.
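For reference, the file-per-PR approach (similar in spirit to towncrier-style news fragments) can be sketched in a few lines of Python. This is a minimal sketch, not an existing dask tool: the `<pr-number>.<category>.md` naming scheme, the category names, and the function names are all hypothetical assumptions.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical categories and their section titles, in rendering order.
SECTIONS = {
    "feature": "New Features",
    "bugfix": "Bug Fixes",
    "deprecation": "Deprecations",
}


def render_fragments(fragments: dict[str, str]) -> str:
    """Render changelog sections from {filename: content} news fragments.

    Assumes each PR adds one file named ``<pr-number>.<category>.md`` whose
    content is a one-line, user-facing description of the change.
    """
    grouped = defaultdict(list)
    for name, content in fragments.items():
        parts = name.split(".")
        if len(parts) != 3 or parts[2] != "md":
            continue  # skip files that don't follow the naming scheme
        pr_number, category, _ = parts
        if not pr_number.isdigit() or category not in SECTIONS:
            continue
        grouped[category].append((int(pr_number), content.strip()))

    lines = []
    for category, title in SECTIONS.items():
        if category not in grouped:
            continue  # omit empty sections
        lines += [title, "-" * len(title)]
        # Within a section, order entries by PR number.
        lines += [f"- {text} (#{num})" for num, text in sorted(grouped[category])]
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"


def collect_fragments(fragment_dir: Path) -> str:
    """Read every fragment file in ``fragment_dir`` and render it."""
    return render_fragments(
        {p.name: p.read_text() for p in fragment_dir.iterdir() if p.is_file()}
    )
```

Because every PR only adds its own file, two PRs never edit the same lines, which is why this avoids merge conflicts. The remaining cost is the one mentioned above: someone has to remind contributors to add the file.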
I like the above idea about the release-drafter bot. However, so far we're not actually using GitHub's release feature, but I guess this is not a deal breaker. Either we start using it or we simply use the bot to generate the markdown. IIUC, this would mostly be based on labels, so we should start labeling everything properly either way.
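If we go the label route, the rendering step itself is small. A minimal sketch, assuming the merged PRs have already been fetched (e.g. from the GitHub API) as dicts with `number`, `title` and `labels` keys; the label names, section titles, and `render_changelog` are hypothetical, not labels or tooling dask currently uses:

```python
# Hypothetical label -> section mapping; the first matching label wins.
LABEL_SECTIONS = [
    ("enhancement", "Enhancements"),
    ("bug", "Bug Fixes"),
    ("documentation", "Documentation"),
]
SKIP_LABELS = {"maintenance"}  # e.g. drop CI/tooling PRs from the changelog


def render_changelog(prs: list[dict]) -> str:
    """Group merged PRs into markdown changelog sections by label."""
    sections = {title: [] for _, title in LABEL_SECTIONS}
    sections["Other"] = []  # catch-all so unlabeled PRs are not lost
    for pr in prs:
        labels = set(pr["labels"])
        if labels & SKIP_LABELS:
            continue
        for label, title in LABEL_SECTIONS:
            if label in labels:
                break
        else:
            title = "Other"  # no known label matched
        sections[title].append(f"- {pr['title']} (#{pr['number']})")

    out = []
    for title, entries in sections.items():
        if entries:
            out.extend([f"### {title}", *entries, ""])
    return "\n".join(out).rstrip("\n")
```

Putting unlabeled PRs under "Other" instead of dropping them means a missing label shows up as a visibly misplaced entry rather than a silently missing one.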
Yes, I pointed out pandas because they are using this and I am familiar with the process. It is quite tedious and the results are OK, definitely better than what dask has to offer right now. However, the merge conflicts are very annoying, especially with long CI build times.
On the flip side, releasing less frequently could mean more regressions accumulate in …
On the topic of automating release notes:
We've got a release scheduled for tomorrow (xref #209). I propose we start improving the format of our changelogs by introducing the following sections for each release:
Thoughts on these categories? For tomorrow, I'll just sort commits into the above categories manually.
👋 Hello from a longtime … @jrbourbeau, instead of "Highlights" being a bulleted list of PRs, I'd like to suggest doing something like what XGBoost does for its releases. Their release notes typically include a few paragraphs summarizing the content of the release, followed by bulleted lists linking to specific issues / PRs and arranged under headers. For example: https://github.com/dmlc/xgboost/releases/tag/v1.5.0 That summary is typically written for the audience of "people installing and using XGBoost", and answers the question "why should I care about this release?". I've really appreciated that, because sometimes one feature like "better support for categorical data" is actually a composite of many separate PRs.
As a counterpoint, one advantage of bullet points is that they are quicker to read for people with limited time.
I'd like to revive this conversation. We've come a long way and have been using the current process (see above, #199 (comment)) for a rather long time now. I think that the changelog has improved but is still not where it should be. A couple of notes about the current state:
Even I, as an expert, sometimes struggle to parse our changelog to find where the notable changes are buried. I suspect that most end users no longer even bother reading the changelog. On top of this, I don't think it is very obvious for users whether a change in … I would like to have a way to …
I agree that we need a better way of filtering out irrelevant entries (and of ordering things better). The changelog currently contains a lot of things that most folks don't care about. A quick description of how the pandas process works: the changelog in pandas is updated by the PR author. This means that the author of a PR creates a whatsnew entry in one of the available categories. PRs are only added if they change something user-visible, so maintenance PRs and similar changes won't show up in the changelog. This has a couple of advantages:
How does the pandas release process handle conflicts between adjacent changelog entries added by different PRs?
Good question. We have a script, run as part of pre-commit, that sorts entries alphabetically within each category. It does not remove all conflicts, but we have rarely seen conflicts since we introduced it.
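A minimal sketch of such a sorting step (not the actual pandas script), assuming a plain-text whatsnew file where every entry is a single line starting with `- `, and headings or blank lines separate the categories. `sort_entries` and `main` are hypothetical names:

```python
from pathlib import Path


def sort_entries(text: str) -> str:
    """Sort each contiguous run of "- " bullet lines case-insensitively.

    Headings and blank lines act as boundaries, so entries are only
    reordered within their own category. Assumes one line per entry.
    """
    out, run = [], []
    for line in text.splitlines():
        if line.startswith("- "):
            run.append(line)
            continue
        out.extend(sorted(run, key=str.lower))  # flush the finished run
        run = []
        out.append(line)
    out.extend(sorted(run, key=str.lower))  # flush a run at end of file
    return "\n".join(out) + "\n"


def main(paths: list[str]) -> int:
    """pre-commit style entry point: rewrite files, return 1 if any changed."""
    changed = 0
    for name in paths:
        path = Path(name)
        original = path.read_text()
        updated = sort_entries(original)
        if updated != original:
            path.write_text(updated)
            changed = 1
    return changed
```

Registered as a local pre-commit hook, `main` would receive the staged whatsnew files as arguments; returning 1 after rewriting a file makes the hook fail so the sorted file gets re-staged. Sorting helps because two PRs no longer both append to the end of the same section, although two entries that happen to sort next to each other can still conflict.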
I don't think we'll get around the manual effort here, and considering that the release manager (mostly but not exclusively @jrbourbeau) isn't always aware of all changes, I'm +1 for a pandas-like approach. Still open to other proposals, of course.
I took the changelog of the distributed 2021.10.0 and 2021.09.0 releases and tried to classify the changes into a few categories.
The 2021.09.0 changelog had 22 changes, of which 14 (that's >60%) are maintenance-related fixes as far as I can tell. Only very few user-relevant fixes are in this release.
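The ">60%" figure is 14/22. A tiny helper makes it easy to compute the same breakdown for other releases once their entries are categorized. A sketch only: `category_shares` is a hypothetical name, and the 8 non-maintenance entries are lumped together as a placeholder `"other"` category here rather than broken down.

```python
from collections import Counter


def category_shares(categories: list[str]) -> dict[str, float]:
    """Return each category's fraction of all changelog entries."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero for an empty changelog
    return {name: count / total for name, count in counts.items()}


# distributed 2021.09.0 as classified above: 14 of 22 entries are maintenance.
shares = category_shares(["maintenance"] * 14 + ["other"] * 8)
print(f"{shares['maintenance']:.1%}")  # prints 63.6%
```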
I see two issues
Changelog categorized distributed 2021.09.0
Moving on to the 2021.10.0 release, which was actually delayed for stability reasons, the summary looks like this:
This feels like a much more meaningful release. However, the changelog is a bit of a mess, and I have likely misclassified a lot.
Changelog categorized distributed 2021.10.0
Original changelogs, see http://distributed.dask.org/en/stable/changelog.html#id2
I didn't go through the dask/dask changelog but it is similarly unorganized and I am struggling to decipher what is what and what would be relevant for me if I was a user.
I would like to propose
A) We start categorizing our changelog in this or a similar way. I believe we could easily do this via a semi-automated process using labels. We might even consider dropping the entire maintenance section.
Alternatively, we could maintain a manual changelog, e.g. similar to pandas. That's more effort and, in my experience, a frequent source of merge conflicts, but it would also allow us to provide more context for each change.
Either way, I believe a more organized changelog would provide a lot of value for users.
B) This will be a lengthier discussion, but I would like to start a conversation about our release cadence. In particular, with stability problems all over the place, the two-week cadence sometimes feels a bit rushed for fixing known regressions and properly testing the fixes.
Edit: Let's move discussion around B to #200