Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Organization of a hot freezing day #6954

Closed
herbelin opened this issue Mar 9, 2018 · 50 comments
Closed

Organization of a hot freezing day #6954

herbelin opened this issue Mar 9, 2018 · 50 comments
Labels
food for thought Issue that could be closed but would still be worth thinking about. kind: meta About the process of developing Coq.

Comments

@herbelin
Copy link
Member

herbelin commented Mar 9, 2018

This collects opinions about how to organize a freeze with less stress and frustration (discussion started at #6247, skipping a post judged with a 馃憥 ).

From @herbelin:

I'm trying to understand how to organize things in a way which takes into account the complexity of the convergence process of a PR, which takes into account the personality of the submitters, which takes into account the time needed for reviews, the time needed for benchmarking and this is why I'm suggesting that it is not about one deadline where all of completing, reviewing, testing, documenting, freezing arrives at the same time, but a sequence of deadlines which gives enough time to each phase (thus making easier the task of @maximedenes, who unfortunately takes the rants of old geezers, and this is why I'm talking about defining what is the expected contract).

The synergy around Coq is wonderful, and everyone around is admirative of that, thanks to all the contributions of everyone, starting with @mattam82. My point is only about how to improve things further as one of the contributor (and incidentally how to avoid the stress accompanying a freeze).

@herbelin
Copy link
Member Author

herbelin commented Mar 9, 2018

From @RalfJung (which received a 鉂わ笍 at #6247).

(I think this is horribly off-topic here. I also assume this discussion has happened previously, but I wasn't there, so please keep that in mind.^^ But I got asked, and I anyways have a hard time not to raise my opinion, so here you go. I hope this will be my only comment on this matter.)

Specifically:

Don't you think the community would be grateful to us to give as much importance to features than to strict deadlines, and to adopt a well-balanced compromise?

I'm sad when a PR misses the deadline, and this one here we actually want to use in Iris. I'd love to see this PR in 8.8. But the day of the freeze is not the day to change the process because work didn't get done in time. (And just to be clear, I'm also not blaming you; there's tons of good reasons for work to get delayed.)

Personally, I am very grateful for the regular on-time releases, and I think strict deadlines are a necessary requirement for that. By having a regular schedule with a fairly high frequency, it is not a total catastrophe if a PR misses one release -- the next release will happen 6 months later, and -- most importantly -- we can rely on it actually happening on time. The "softer" model you advocate for subverts this concept, because the "softer" a process is, the less reliable it is. If the release is delayed for the benefit of one PR, then everyone else who got their PR merged in time has to wait longer for their feature to be released. That's not very well-balanced.

This is not to say there cannot be exceptions, e.g. for important fixes that may have to do refactoring that goes beyond the usual kind of bugfix. But this PR has been open for more than 3 months, and some design questions are still open. I assume you have some kind of process to decide about exceptions, and I assume it at least requires some form of consensus in the team, and if I were on the team (which I obviously am not), I'd say this PR has too many open questions and too far-reaching consequences for such a last-minute merge.

As much as feature-based releases (and you are proposing to do some kind of feature-based reasoning here) seem intuitively "right", I personally came to the conclusion that this is a fallacy. Linux, Firefox, Rust -- the big healthy active projects all use a time-based release schedule to ensure that features and, even more importantly, bugfixes reach their users in due time. I can feel your frustration, but I think a "softer" process not only significantly reduces the reliability of the process, it also puts undue amounts of (social!) pressure on the poor release manager who has to make the judgment call. Clear rules that everyone can agree to and accept help avoid conflicts. The more ambiguous these rules, the less useful they are. (Man, I sound really German, do I? ;)

(This actually only gets better with a higher frequency. Rust uses a 6 weeks cycle, and there is no freeze -- instead, they follow the train model: When a release happens, the current master is flagged as beta, and for the next cycle, i.e., the next 6 weeks, people can test the beta. Only bugfixes are backported to beta. So whenever something is released, it has seen 6 weeks of beta before that. Nobody mourns if a PR misses a Rust release, it's just a 6-week difference so who cares. But that process may require more man-power than Coq has at hand. But with a fully automated release process, maybe Coq could do something like 3 or 4 months, hopefully mitigating discussions such as this?)

@herbelin
Copy link
Member Author

herbelin commented Mar 9, 2018

From @maximedenes:

Rust uses a 6 weeks cycle, and there is no freeze -- instead, they follow the train model: When a release happens, the current master is flagged as beta, and for the next cycle, i.e., the next 6 weeks, people can test the beta. Only bugfixes are backported to beta. So whenever something is released, it has seen 6 weeks of beta before that. Nobody mourns if a PR misses a Rust release, it's just a 6-week difference so who cares.

I find this inspiring :) I'm curious: how often do they break compatibility compared to us?

@herbelin
Copy link
Member Author

herbelin commented Mar 9, 2018

From @Zimmi48:

I have several ideas around that, one of them being several deadlines like you just said and I mentioned in another thread. But I'd like to discuss this in a specific on-topic issue, because otherwise, all this discussion is just going to be lost.

@Zimmi48 Zimmi48 added the kind: meta About the process of developing Coq. label Mar 9, 2018
@herbelin
Copy link
Member Author

herbelin commented Mar 9, 2018

@RalfJung: very interesting post. Indeed an even shorter release schedule might be a solution to stress and frustration, and even maybe to features for those like me who work in parallel on several topics: making steps every 6 weeks and missing 2 or 3 deadlines shall be better than making steps every 6 months and missing 1 year and a half 馃槃.

@Zimmi48: waiting for your proposal.

@herbelin
Copy link
Member Author

herbelin commented Mar 9, 2018

Can someone (@Zimmi48?) comment about what is expected between the beta and final release?

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 9, 2018

I've got several ideas and they are not yet fully thought through so I'll try to explain them. But first let me answer your post right above (I also did in #873 but I'll keep only the generic part here):

The period of the beta is for bug fixes (and documentation improvements). However, we like as much as possible to avoid what we call "dangerous" bug fixes. Moreover clean-ups, and new features are strictly forbidden. Of course the category of "dangerous" bug fixes is not well defined but it includes very large diffs, changes to the behavior of tactics, or the pretyper.
If we discover an important bug after the beta release, of course we fix it nonetheless, even if the fix is "dangerous" but then we are forced to do a new beta and this poses the threat of making the final release late.

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 9, 2018

I've stayed at work later than I was planning to while not having the time to do what I was planning to do today. So I'm sorry but my answer to this (much welcome) thread will have to wait possibly a full week (possibly less if I can't manage to stay away from GitHub during my holidays).

@mattam82
Copy link
Member

mattam82 commented Mar 9, 2018

Just to be clear, of course I'm not terribly happy that namedobls and unifall-infra did not make it but I feel entirely responsible for that, as I prioritized doing more interesting research instead and did neither foresee the time needed to get those in mergeable state or acted preemptively to discover these issues by e.g. splitting unifall-infra early on so that people could review more comfortably. I think deadlines should stay strict (each time we tried to be soft things were actually delayed and surprises happened AFAIR). I like the idea of a kind of 2-phase process (for "substantial" changes), if we can agree to that and implement that, something akin to a rebuttal+(limitied time) shepherding period before the freeze. In retrospect, I had no luck with the calendar because early january I thought I would have this whole week and another one to finish this which in the end were eaten up by paper writing and a trip to Chile I couldn't move later. The lesson for me is that I will never again submit a big PR made of independent commits even though I actually developed them in the course of a (personal) coding sprint, and rather split them for easier integration/review/etc... To me this seems to be the one thing that slowed everything down in this case.

@herbelin
Copy link
Member Author

herbelin commented Mar 9, 2018

@mattam82: well, yes. It is indeed also sad for namedobls. One of the reasons I'm focussing on unifall is that I suspended some work waiting first for the integration of unifall and now I don't know what to do. Should I still wait for it, or should I proceed independently of unifall?

About numeral notations, I'm also particularly attached to it as this is something that I was looking forward for now maybe more than 10 years.

But this is not exclusive. I shall not pass the whole roadmap in review but the native integers is for instance something which I feel will be a key feature for many applications.

I don't know what conclusions to draw from that. There are so many cool things to do in Coq, so much willingness to contribute. How to organize ourselves? Are we too few?

@Zimmi48 is right that I probably did not communicate enough about my projects. As soon as works are dependent on other works, we should not forget to tell, even maybe if for telling that we are blocked. And conversely, we should probably not hesitate to ask news from what others do.

Yes, even shorter cycles, and 2-phase freezing process might already be good directions. Thanks for your message, @mattam82.

@ejgallego
Copy link
Member

ejgallego commented Mar 9, 2018

Yes, even shorter cycles, and 2-phase freezing process might already be good directions. Thanks for your message, @mattam82.

Indeed @Zimmi48 's idea of having a few more deadlines for merging looks very good. IMHO that should help. Coq could maybe use a 4 months release cycle? Indeed many people is moving to a mark closer to the 100 days deadline, so that could be the right spot as to relieve the psychological anxiety of missing the deadline.

Going to more practical matters, for now a critical TODO in order to improve the merging process is to have a better / faster CI system. It is very hard to have any development momentum with the current unbearable slow CI, so I suggest we all put some effort on improving this (cc: #6919 )

@herbelin
Copy link
Member Author

Another point...

I think it would help that @Zimmi48 and @maximedenes explain in more details the "contract" they have in mind, in terms of schedule and benchmarking (how many PRs they expect to merge on the last day, do they expect all known new bugs to be fixed or reverted, do they expect one, or two, beta, how do they plan to deal with unusual situations, ...), in terms of what they expect from others for a smooth freeze (what availability to pings, what availability of others for reviews, plans from developers about which PRs they target for a release and at which time, ...), in terms of what they expect to be thanked for (for their time, for the respect of schedule, for the respect of roadmap, ...), or any other relevant matter which would make their work more fluid.

Everyone of us has his/her own schedule and, obviously, lack of communication from developers created this time tensions, unexpected extra last-minute work outside of planned hours of work, etc.

It seems to me that we now started to have enough experience to be able to clarify a bit in advance the expectation of everyone of us in this (traditionally stressful but maybe less in the future) event which is a release freeze.

@RalfJung
Copy link
Contributor

respect of roadmap

If I may give my input on this, I think maybe it could help to decouple the roadmap from the release process. I may sound like a broken record for bringing up Rust times and times again, but I think there's a lot to be learned from how they are handling releases, so please forgive me. ;)

So, what I mean by decoupling is that the job of the release manager is to make a stable release out of the set of PRs that are done and merged at the agreed-upon time of the freeze. That requires lots of testing and communication and repository management and whatnot. However, I see the completion of a roadmap item as a separate job. With time-based releases, the release manager is not at all concerned with "which features do we want in this release". The release consists of whatever is ready at the given time. Rust's roadmaps don't even specify releases!

Then, for each individual PR, there will be someone assigned who takes responsibility to get that PR along -- the assignee in GitHub. Think of them as the "shepherd" for this PR. That person would aid the PR author in getting the PR in a mergeable state and ultimately makes the final decision for merging the PR. But this is a job completely separate from release management! PRs only ever get merged into master anyway, while (I assume) the new release gets its own branch on the day of the freeze, so merging something into master will never affect the current beta stabilization process. Therefore, in principle, shepherding should happen without taking freeze days into consideration at all. Either the PR is ready and should get merged, or it is not and it shouldn't get merged prematurely.

Now, I am an outsider to the Coq development team, so maybe what I am saying makes very little sense in this context -- in that case, I apologize. I gather that the jobs of release manager and PR shepherd are done by the same people here (Rust has plenty of people doing PR shepherding and only one release manager, AFAIK). Still, maybe it makes sense to consider this two different hats that the same people are wearing at different times?

If this comment is not helpful, please ignore it. :)

@maximedenes
Copy link
Member

maximedenes commented Mar 10, 2018

Now, I am an outsider to the Coq development team, so maybe what I am saying makes very little sense in this context

Not only it does make sense, but you basically describe our current model.

-- in that case, I apologize. I gather that the jobs of release manager and PR shepherd are done by the same people here

No, Th茅o (for minor versions) and I (for major versions) are the release managers. Shepherding is done by the core developers.

In this cycle, I made the mistake of trying to gather the objectives developers had for this release (in terms of developments they were planning to complete), to help them to organize themselves. But in most cases I either got very imprecise feedback, or completely unrealistic schedules (with some exceptions, notably @gares and @Zimmi48 did a very good job at keeping an up-to-date schedule). This got me convinced that indeed, we should have roadmaps that are independent of versions, like Rust.

@herbelin
Copy link
Member Author

I find @RalfJung's comments on Rust very interesting and I'd be interested about knowing what others think about it.

I'm opening a parenthesis about what @maximedenes is saying:

But in most cases I either got very imprecise feedback, or completely unrealistic schedules (with some exceptions, notably @gares and @Zimmi48 did a very good job at keeping an up-to-date schedule).

I don't know what you call "imprecise feedback, or completely unrealistic schedules" and I'd be happy to know in which category you are putting me!

As I already said in other occasions, I'm generally working on different projects (Coq or not Coq) by successive waves. I'm someone who works better with deadlines, using the deadlines as a scheduler of the different activities I'm working on in parallel (not saying that it is a good way to work, just saying that this is my way of working, and other people may have different ways). Thus, if a roadmap is not bound to a deadline, this shall be more difficult for my way of working. I'm not saying that a roadmap has necessarily to be bound to the same deadline as the freeze properly speaking (and this is why I like the idea of a release process in several stages), but I need (social-level) deadlines, whatever form they take.

As far as my "roadmap" and PRs are concerned, I'm not fully satisfied but not unsatisfied either. It would be long to do a complete evaluation, but I could conclude several of the PRs which were pending for some times in personal branches, which seems good to me. But the release schedule was not clear (and this is one of the reasons I'm pushing for a clearer schedule). All PRs were supposed to be acceptable up to the date of the release but in practice this was not true. Look at #1003, #6247, #6826, #6833, #6835, #6836, #6844, #6845, #6866 which were ready on time regarding the code but for which a process of reviewing satisfying everyone could not be done on time. There is an interest of developers in reviewing the work of others which is absolutely great and which works well when done "in real time", but which does not work well in some other cases, leading to stalling PRs, and when a PR is stalling what to do? To give a 2-weeks delay before pinging? And what if nothing happens? We shall not stay behind the keyboard waiting for an action of others. We shall start to concentrate on something else and the 2-weeks stalling shall become a 2-months or even 6-months stalling, because it would not even be worth to do a ping, because we moved our concentration to something else and we cannot switch context so easily from one activity to another. After the 6 months, a new freezing date arrives about which we are ready to reallocate some concentration on a definite period of time. By good luck, this finds completion for the release. By bad luck, this may be postponed again.

I could continue for long but shall stop here for now. At least I want to say that the opportunities to improve our global organization seems huge to me. And, back to @maximedenes, could you give an example of what you would have expected from me which you did not get? (Note in passing that I went very happy when I saw you asking developers about their plans: this is something which I think is crucial and that you are in good position for doing it, even if the conclusion might be that roadmaps "pings" should better be disconnected from freezing dates).

[Side question: how do Rust developers ensure reactivity of reviewing and absence of stalling? Natural selection? Number of people motivated in a PR not stalling? But sometimes, there are interesting though complex or original works developed by one person and the person is alone to bear his/her ideas? What to do in these cases?]

@RalfJung
Copy link
Contributor

how do Rust developers ensure reactivity of reviewing and absence of stalling? Natural selection? Number of people motivated in a PR not stalling? But sometimes, there are interesting though complex or original works developed by one person and the person is alone to bear his/her ideas? What to do in these cases?

I think it's a combination of regular triage (every open PR is regularly checked by the triage team to see if the labels ("waiting on review", "waiting on author") still make sense; if the PR is blocked on someone they ping them), and one particular person (the shepherd) being responsible, which avoids the usual problem of responsibility dissolving in a group.

However, looking at some of the PRs you mentioned I get the feeling that Coq PRs (e.g. #1003) span much further into language design than Rust PRs, which are pretty much only about implementation. Design decisions are handled in the RFC process, which is separate from code contributions.

@ejgallego
Copy link
Member

Yeah so in the Rust model the psychological pressure of missing a deadline is very low, that is good. As we have discussed in the WG kernel does around 100 days. IMHO for Coq anything less than 120 days seems a bit too much, thus I do like @Zimmi48 's idea of alpha releases.

@RalfJung , FYI the current model has been mainly inspired from the GCC and Kernel release models [there may even exist some slides at that purpose], with some key differences as we don't treat regressions and maintainers in the same way.

@herbelin
Copy link
Member Author

herbelin commented Mar 11, 2018

Thinking aloud: if we focus on the freezing process alone, besides the miscommunication around #873 (apparently initiated by my inaction, sorry, which led to speculations of @Zimmi48 and @maximedenes on my intents), it went particularly well with a high professionalism of @Zimmi48 and @maximedenes with respect to their objectives.

What did/does not work well, as discussed above, is to make a freeze coincide with, let's call it a "feature marathon". This is not surprising since this is the way most of us are/were used to work, and, even in the mind of @Zimmi48 and @maximedenes, when they asked about whether the freezing date was ok for developers, the asking was (I guess) implicitly about whether this was ok for them as a feature freeze date, and thus as a date to complete features.

So, bouncing on the ideas above of separating features and freeze, what about organizing say once a month or once every two months a "feature marathon". We decide the dates with a poll, like for working groups (though taking care of keeping an at-least 2 weeks distance from a freezing date). The marathon lasts 2 or 3 or 4 days. All features should be published as PRs at latest on the eve of the marathon, and during the marathon itself, we review, we make us available for the questions which are asked to us, we improve our PRs according to the discussion, and, on the last day before the marathon ends, we summarize the status of all PRs discussed during the marathon so that it is clear what everyone has to do in the next following weeks and that the PRs do not stall. In short, the rough idea is that the marathon that some of us experienced in the last few days is turned into an official event disconnected from freezing. So, less pressure on all sides.

This is not without relations with the PRs sessions at working groups, but done online rather than physically, so that the process is lighter, more easily combinable with our other duties, with smaller groups being able to interact in parallel, open to more contributors, not constrained to residences easily reachable from Paris. The kind of discussions we have at WG PRs sessions could actually be also integrated to the marathon with say a 2-hours video conference at the end of each day to discuss points about which a consensus seems difficult to find (as in the mli/ml and intf cases), or to explain more broadly the issues with a new feature (as e.g. with printing scopes or with intros '(x,y) as @maximedenes was asking).

@maximedenes
Copy link
Member

First, a side note for #873: in order to avoid having to backport the integration of the new documentation, I merged the revert of #873 before branching. I'm ready to help rebasing a PR that puts it back with the fix, in view of 8.9. The easiest way is to revert the revert in that PR (to be integrated after branching).

I suggest we discuss the length of the release cycle during a meeting (will send an e-mail about it).

I'm not sure I understand this whole idea about marathons. As much as I like sport, I don't think it's a good way to organize our work. In software industry, companies often need to deliver specific features at a specific time, for a client. So they need deadlines to make sure the process converges in time, and reaches its objectives. We don't have these constraints. We agreed to be strict on the release dates, because it makes the life of our users simpler (and ultimately, the one of developers as well). But we can move features freely. Each developer is responsible for organizing their own schedule, of course. Is there anything wrong with that?

As for the design discussions (what is typically needed for printing scopes or intros '(x,y) and many other changes), we are definitely lacking some process. But I don't think one can complain that discussion didn't happen when the topic wasn't even put on the agenda of a working group. On the longer term, though, I would indeed like to have something more flexible, like ad-hoc groups of developers working on a common task and reporting to the WG.

@gares
Copy link
Member

gares commented Mar 12, 2018

Whatever proposal we are going to debate, I'd like the following points be made explicit in the proposals, to ease discussion:

  • which exact problem does the proposal address, which aspect does it improve
  • pros/cons of the proposal for the users (what do they gain, what do they lose)
  • pros/cons of the proposal for the developers
  • pros/cons of the proposal for the software itself (as a product, eg quality, maintainability)
  • orthogonality w.r.t. the release process (does it have/has-to-have an impact on the release)

@herbelin
Copy link
Member Author

@maximedenes: Thanks for your message, but I'm afraid I have to say that I don't see in your message an answer to my questions.

Starting with #873, I'm one of the main persons involved with it and it would seem natural that my opinion is taken into account. I call this a miscommunication. How do you call it? Are you willing to repair this miscommunication?

About a "marathon" (or whatever name it is called), I was just trying to find solutions. You seem to start from the point of view that everything did well for the deadline. I'm saying that it did not. I appreciate the time, the reliability, the preparation you made about this release, talking about it long in advance, spending a long day on last Friday from early morning to late in the night, trying to be nice and diplomatic with those who were ranting, and all this was wonderful, and I really feel extremely grateful about it. I'm however observing that several features which should have been there are not. For 8.7, I had already asked you whether it would not be more reasonable to have a two-step deadline and you said no. Here again, the purpose of the deadline was unclear. I mean, I think I understand your point of view. You see the deadline from a release point of view and from your point of view it was clear. You make a personal evaluation of what you think is ready and this is your definition. But, then, obviously, this clashes with seeing the deadline as ready to be integrated from the point of view of the developer, like several of us have in mind. This is this mismatch that I, @RalfJung and @mattam82 are trying to address with a two-step deadline. A deadline in your sense, and a deadline in the sense of the work of developers. A deadline for developer works, without the pressure of doing a review in urgency, without the urgent pressure of convincing others of the interest of the proposition, or the urgent pressure of fixing a bug (+ a way to address the issue of stalling PRs), without either having to wait weeks if not months to get feedback, without the frustration that a work is abnormally blocked by lack of a good organization.

Now, I'm sorry if I use a tone which apparently makes you react as if you were attacked. I mean, some of my comments on last Thursday about what seemed to me a rigor which was not appropriate to the reality of the situation, these comments, were, I agree, provocative. I had hopes that my provocation would trigger a discussion but it did not and this is why I'm trying now to trigger the discussion in a different way. I'm sorry about having been provocative. I know your proud and I'd be willing to please it and to make you happy in general, but a necessary condition for that is also that my own words are respected. Is that clear?

In particular, I'm fine if you want to be inflexible on a deadline but this is one more reason to have different deadlines. One which is inflexible and good for you. And others which are more loose and which are taking into account the reality of how an implementation process works, with sometimes unexpected technical issues, with sometimes unexpected long discussions, with sometimes unexpected long pings.

Well, there are a lot more to say, but for the time being, I would like that we could focus on really discussing contents rather than playing a school playground game 脿 la Trump-Kim. Again, I'd be happy to contribute to make your work more fluid, and to accept that you need inflexible deadlines in exchange of the services that you give to the team (see also my question above asking you to clarify the "contract" you would like to have with the team), but I can be supportive of the way of working which is good for you only to the condition that I could feel my own words, my own way of working and my own objectives being respected.

@maximedenes
Copy link
Member

maximedenes commented Mar 12, 2018

I'm not sure why you say I feel attacked, I certainly don't.

Starting with #873, I'm one of the main persons involved with it and it would seem natural that my opinion is taken into account. I call this a miscommunication. How do you call it?

Indeed, I call it miscommunication. Look at the thread #873. I wrote the WG conclusion there, never got a reply. So of course I inferred there was no issue. Maybe I miscommunicated with my crystal ball?

Are you willing to repair this miscommunication?

Yes, that's why I wrote I'd be happy to help preparing an 8.9 PR.

You make a personal evaluation of what you think is ready and this is your definition. But, then, obviously, this clashes with seeing the deadline as ready to be integrated from the point of view of the developer, like several of us have in mind.

Interesting, I wasn't aware of this mismatch. So what PRs were ready according to your definition and not integrated?

several features which should have been there are not

I'm not sure what "should have been there" means, but again, can you give pointers? I would be curious to know what responsibility I play in each of these not being there.

@gares gares added the food for thought Issue that could be closed but would still be worth thinking about. label Mar 12, 2018
@gares
Copy link
Member

gares commented Mar 12, 2018

The #873 affair is pretty simple:

  1. @herbelin submits a PR for unfinished work (the commit message says explicitly there are open problems, eg the problem I later reported as Regression in notation printing聽#6777 )
  2. @ejgallego does not spot the problem(s) in the review
  3. our tests are to weak to detect the problems
  4. the PR is merged
  5. the problems are not fixed (in 2 months)
  6. the merge is undone
  7. endless, unpolite, discussions, complaints... the usual waste of energy
  8. a fix is proposed

In my opinion:

  1. should never happen, if the author of the PR is aware of problems should label the PR as such and the PR should not be merged
  2. can happen, that is life
  3. should be fixed
  4. can happen, it is a mistake, but hey we have revert exactly for fixing that
  5. can happen, we are all busy
  6. the discussion at the WG about open issues results in the decision to revert: it is the only viable solution since there is no fix and the regression is too important (a blocker)
  7. I see better ways of using our energy, let's go back to work and discuss how to smooth our procedures using more a efficient media (and with a precise agenda, see my points above)
  8. wonderful, if the fix turns out to be OK we can merge back the PR + the fix

@herbelin
Copy link
Member Author

@gares: thanks for this view at the situation. I had not seen like this and this is why I found very strange to discover at some time in the discussion a sentence "WG discussion: we agreed to revert this temporarily" without me having been contacted to get my opinion.

[In parentheses, I still think however that it would have been normal in such a situation to have contacted the author about the result of the discussion. Is there really someone to disagree with this "policy"?]

Back to the discussion, I don't think @ejgallego is to blame. I already mentioned it a couple of times, but I see a problem of chicken and egg in this kind of story. New features need experimentation. How to experiment if not by beta-testing the feature at some time. Consider e.g. custom entries, consider e.g. the new apply and rewrite of Matthieu, consider the new introduction patterns. What are the concrete plans to beta-test them? By chance we have @JasonGross who gives a lot of his time for experimenting and giving feedback. But besides @JasonGross (and @JasonGross is not testing everything), who to sollicitate?

So, for instance, taking a concrete problem, what should be done for testing custom entries?

@herbelin
Copy link
Member Author

@maximedenes:

So what PRs were ready according to your definition and not integrated?

I gave a list of examples above. Please don't argue on the meaning of "ready" because this is precisely the point. So let me continue first with what comes next.

We need, sorry for the pompous wording, a "feature manager". We need someone who knows about what is going on in general, who actively goes to people to ask them about their projects, who encourages them to develop these projects, who encourages them to interact with others having similar projects. We need this "feature manager" to be in close interaction with who plays a role of release manager so that they can coordinate together and evaluate the time needed for a feature to be reviewed, tested, updated after the developer him/herself is satisfied by the code, so that, say, if the evaluation of the time needed for reviewing is one week, the feature manager coordinates with both the developer and with who prepares the release to get the feature released with reasonably high probabilty in the next version.

Currently, we all partially play a bit this role informally, developers and users, in particular when we are specially interested in an ongoing work. @maximedenes is actually playing this role when he is collecting data for the roadmap. @maximedenes has also partially played this role recently when he ping-ed people a couple of weeks ago to inquire about the status of their PRs. @maximedenes would tell better himself whether he would like to play more this role, but my feeling is that @maximedenes prefers to consider that developers have to manage their schedule on their own and that he is not here to make an evaluation with everyone of us about what should be the optimal deadlines, feature by feature, planned PR by planned PR, so that the estimated process of completion of the PR after the developers has released his code leads to have the feature reasonably integrated in the next release.

This being said, I can now answer this comment from @maximedenes:

I'm not sure what "should have been there" means, but again, can you give pointers? I would be curious to know what responsibility I play in each of these not being there.

Whether your responsibility is engaged in these features having missed 8.8 depends then on how much you would like to play the role of "feature manager".

In any case, I strongly believe that we need a "feature manager". @mattam82, what do you think?

@maximedenes
Copy link
Member

I think I need more concrete examples. What should the "feature manager" have done about unifall, for example?

@maximedenes
Copy link
Member

We need this "feature manager" to be in close interaction with who plays a role of release manager so that they can coordinate together and evaluate the time needed for a feature to be reviewed, tested, updated after the developer him/herself is satisfied by the code, so that, say, if the evaluation of the time needed for reviewing is one week, the feature manager coordinates with both the developer and with who prepares the release to get the feature released with reasonably high probabilty in the next version.

Does this sentence mean that the "feature manager" can delay the release?

@gares
Copy link
Member

gares commented Mar 13, 2018

without me having been contacted to get my opinion.

[In parentheses, I still think however that it would have been normal in such a situation to have contacted the author about the result of the discussion. Is there really someone to disagree with this "policy"?]

False. The author was made aware of the decision: #873 (comment) . The revert happened a week later, the day of the freeze.

@herbelin
Copy link
Member Author

Does this sentence mean that the "feature manager" can delay the release?

I let it open, but before all, it depends about what you mean by "can delay"? If you are asking whether a "feature manager" has authority to delay a release, this is not what I meant (i.e. the "feature manager" has no more authority than anyone else, i.e., in my view, the objective for him/her, like for others, is to eventually arrive at situations which are good for everyone).

I think I need more concrete examples. What should the "feature manager" have done about unifall, for example?

For unifall, I think that both you and me ping-ed @mattam82 enough often so that we somehow made what we could do about it, so I would say that a large part of the responsibility is the one of @mattam82, but I feel responsible too, even more that @mattam82 told me that he was ok if I wanted to help him.

I felt a bit concerned when I read the suggestion to split the unifall 1 PR into 15 pieces. @mattam82 seemed to think that it would be feasible, but my naive feeling is that it would be difficult to have 15 independent pieces (though I did not study the question in details), that the pieces would remain made the ones on top of the others and that reviewing 15 pieces, possibly nested, would not be easier than rewieving unifall 1 as a whole (but I may be wrong and I'm ready to do the experience of such a split).

About unifall 1, somehow, I'd rather like to ask you and @ejgallego and @Zimmi48, and whoever interested about more details, what are your expectations with it? What would you like to review more precisely? What would you like to learn about it? What is missing for a merge according to you?

So, the unifall question is a difficult one. I know what I would have recommended myself (a merge, trusting the work of @mattam82, applying fixes afterwards whenever needed, depending on remarks from others, depending on bugs discovered). I don't however know what a "feature manager" should have done in the absolute.

In particular, I'm very concerned in general by the demands for reviewing and by how to answer at best these demands. (And this goes beyond the case unifall. It happened with #982, it happens with custom entries and I don't know how to satisfy all requirements here.)

@gares: By contacted the author I meant something like "the discussion at the working group are converging towards a revert of the PR for this or that reason: do you support the WG discussions that a revert would be a good decision?". But this is not what happened. I saw messages saying "we agreed to revert this temporarily" or "will be fixed by reverting 873" but I saw nowhere that the "we" or the "will" was asking about my opinion. Maybe it is just for you a question of vocabulary, but even if only a difference of vocabulary, I think it matters in terms of quality of interaction between people to adopt inclusive attitudes and to request all opinions of people who are concerned. Don't you share what I'm saying?

@ejgallego
Copy link
Member

The less managers the better :D :D

  • For custom entries there is the hook question, in particular @ppedrot 's comment about the hooks that should be solved before merge, otherwise seems ready.
  • For unifall, I seem to recall it was not passing the test suite or something, otherwise it seemed ready.
  • Merging is still too slow but what can we do? We just lack the resources to make it faster.

8.9+alpha1 should hopefully happen in 1 month so I hope we can clean the queue up a bit.

@maximedenes
Copy link
Member

8.9+alpha1 should hopefully happen in 1 month

What do you mean by that?

@ejgallego
Copy link
Member

What do you mean by that?

I mean that 8.9+alpha1 should hopefully happen in 1 month.

@maximedenes
Copy link
Member

maximedenes commented Mar 13, 2018

I mean that 8.9+alpha1 should hopefully happen in 1 month.

I don't understand what that means, sorry. Would you mind explaining a bit?

@ejgallego
Copy link
Member

ejgallego commented Mar 13, 2018

I don't understand what that means, sorry.

I meant that hopefully, as suggested by Th茅o, we can have an intra-release milestone "8.9+alpha1" in around 30-40 days. Sorry I had assumed people around here had the right context from the Gitter discussion a couple of days ago.

@maximedenes
Copy link
Member

maximedenes commented Mar 13, 2018

I'm really bored by all this aggressive communication. [@ejgallego edited his message above since then]

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 20, 2018

What should happen at the time of the freeze

I think it would help that @Zimmi48 and @maximedenes explain in more details the "contract" they have in mind

The ideal situation would be a freeze day where no more PRs than usual are merged and preferably not big ones.

do they expect all known new bugs to be fixed or reverted

Oh no, that wouldn't be possible. But when a new issue (specifically a regression) is considered bad enough to be a blocker, then it should either be fixed ASAP or the problematic PR reverted.

in terms of what they expect from others for a smooth freeze (what availability to pings, what availability of others for reviews, plans from developers about which PRs they target for a release and at which time, ...)

Ideally no particular expectations for others, except to not rush PRs in at the last minute. In particular, the PR author should not expect a review to happen on the freeze date. If a PR is ready on the freeze date but not before, it is expected that the PR can still miss the freeze. That's why I had written down "pull requests must be fully reviewed by this date, no exception will be made" and not "pull requests must be fully ready by this date" (if you see the difference).

Coordinating the development

I found @RalfJung suggestion of disconnecting roadmaps and releases very good.

What did/does not work well, as discussed above, is to make a freeze coincide with, let's call it a "feature marathon".

Indeed, that's a bad habit we should divert from.

So, bouncing on the ideas above of separating features and freeze, what about organizing say once a month or once every two months a "feature marathon".

That's a very common idea that is used in many start-ups and it is called a "coding sprint". However, I don't think it is appropriate in our context because most of the developers are researchers and each has their own schedule. It is IMHO a process that is suited only for full-time developers.

I need (social-level) deadlines, whatever form they take

I think that this is a need that several developers have and we have to take into account (not in the RM process but in the dev management and organization process), even if some people think that it is a bad way of organizing oneself. However, there's no real need for deadlines for all PRs to be the same. In particular, what about setting a deadline with the author for all PRs that are in progress? If no progress is made and the deadline passes, the PR is closed (with appropriate friendly reminders beforehand).

The other idea was intra-release deadlines (the so-called 8.9+alpha1, etc). It scares @maximedenes because it looks like more work for the RM. We should make sure that it is not the case: the idea is just to spread the deadlines (to avoid a last minute rush, and to give deadlines to whoever needs them). Having 8.9+alpha1 in just one month is too soon IMHO though because we cannot attach any meaning to this date.

when a PR is stalling what to do?

If each PR has an assignee, the assignee should know what is blocking the PR (is it the author? is it a reviewer?) and require an ETA from this person. If the PR is still blocked after the ETA, the assignee should take the decision (dismiss the request for changes, close the PR for lack of progress, close the PR for impossibility to reach a consensus...). In particular, we should be less scared of closing PRs. PRs are not for TODO lists like issues can be, they are for integrating code, and code rots.

Note in passing that I went very happy when I saw you asking developers about their plans: this is something which I think is crucial and that you are in good position for doing it, even if the conclusion might be that roadmaps "pings" should better be disconnected from freezing dates

Not only should it be disconnected from freeze dates but RM and dev coordination should be two separate roles. It hasn't been that clear in the recent months but it should definitely be separate.

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 20, 2018

How to experiment

New features need experimentation. How to experiment if not by beta-testing the feature at some time. Consider e.g. custom entries, consider e.g. the new apply and rewrite of Matthieu, consider the new introduction patterns.

It depends on the degree of stability that we expect to have in releases, and in the master branch. Some, limited, unstability in the master branch is OK, and that's why it is important to merge dangerous features early in a release cycle. However, regressions should be addressed quickly. The rest of the time, it would be better if experimentation for new features could be done in a way that is not threatening stable users: e.g. with a plugin (even if it is in the main code base), an experimental option, a module, etc.

That's BTW what @mattam82 is proposing for Unifall.

About specific PRs

The notation PR that was reverted

I had not seen like this and this is why I found very strange to discover at some time in the discussion a sentence "WG discussion: we agreed to revert this temporarily" without me having been contacted to get my opinion.

I saw nowhere that the "we" or the "will" was asking about my opinion

It was assumed that you would react if you disagreed, but maybe this should have been made clearer indeed.

About unifall-infra

@herbelin

About unifall 1, somehow, I'd rather like to ask you and @ejgallego and @Zimmi48, and whoever interested about more details, what are your expectations with it? What would you like to review more precisely? What would you like to learn about it? What is missing for a merge according to you?

I did a detailed review because nobody had done it before so I will refer you to it and I would like my comments to be addressed before merging. In particular, I said that I didn't review the biggest commit and this one would have to be reviewed specifically by someone competent. Furthermore, I did not suggest to split it into 15 parts, but rather 3 or 4 parts. If @mattam82 thinks 15 parts are better, that is also OK with me.

@herbelin

trusting the work of @mattam82

You trust too much. In particular, @mattam82 said to me that he didn't want to be trusted, he wanted someone to check his work carefully.

@ejgallego

For unifall, I seem to recall it was not passing the test suite or something, otherwise it seemed ready.

Completely wrong. It was passing the test-suite. I blocked it after reviewing it because noone else had done it before.

@mattam82

The lesson for me is that I will never again submit a big PR made of independent commits even though I actually developed them in the course of a (personal) coding sprint, and rather split them for easier integration/review/etc...

Big PRs are indeed bad because they are slow to reach completion, slow to review, and whenever they finally get merged, we often realize that we have been less careful when reviewing and bad things have slipped in that would have never been accepted if they had been part of smaller PRs.

@ejgallego
Copy link
Member

I agree with everything you wrote which seems to me pretty well motivated @Zimmi48 , I only have a small comment:

In particular, we should be less scared of closing PRs

Yeah, we should. On the other hand I am keeping some PRs open for a long time as a reminded "I am working on this feature", however the thing was harder than expected but I'd like still to keep communicating my intentions, make the code available for collaboration.

I agree that using PRs for this purpose is not ideal, but I am not finding a sensible alternative. On the other hand, I think that, for example, in Linux, they keep long running branches as for this particular purpose and they are in a sense a kind of "PR", advertised but not yet ready to merge.

So maybe the question is "how to advertise a branch", without asking a direct merge.

@RalfJung
Copy link
Contributor

So maybe the question is "how to advertise a branch", without asking a direct merge.

GitLab has WIP merge requests for this. In GitHub, this could be implemented as a simple convention: If the PR title starts with "WIP" (or maybe some label), then it is NOT ready to be reviewed. If someone is looking for PRs to review/merge, they should ignore such PRs.

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 20, 2018

There is the "needs: progress" label for this already. It's not necessarily a problem to keep a PR opened for several months as long as there is clear communication on the intents (and an ETA). The problem is with the PRs that stay open without anyone knowing what the author's intentions are.

@ejgallego
Copy link
Member

ejgallego commented Mar 20, 2018

I guess these should be cleaned up indeed, in the same way we do bug squash parties, we could certainly include PRs in the janitoring.

@herbelin
Copy link
Member Author

The problem is with the PRs that stay open without anyone knowing what the author's intentions are.

Asking the intentions of the authors may then help :)

@ejgallego
Copy link
Member

Asking the intentions of the authors may then help :)

Even better would be to be sure we can use tags, projects, and milestones in such a way that intentions are fully captured by meta-data.

@herbelin
Copy link
Member Author

The notation PR that was reverted

It was assumed that you would react if you disagreed, but maybe this should have been made clearer indeed.

I did react by cautiously asking about more details and by trying to understand why the opportunity of a fix has not been considered in the discussion.

That's why I had written down "pull requests must be fully reviewed by this date, no exception will be made" and not "pull requests must be fully ready by this date" (if you see the difference).

Not only should it be disconnected from freeze dates but RM and dev coordination should be two separate roles. It hasn't been that clear in the recent months but it should definitely be separate.

Well, I think the misunderstandings are now resolved. PR convergence and releasing are different processes with different objectives, and I shall stop thinking at a release date minus 7 days as a PR-convergence date.

About reviewing

You trust too much. In particular, @mattam82 said to me that he didn't want to be trusted, he wanted someone to check his work carefully.

I must confess that, at the current time, my ideas are pretty confused about what to do with "reviewing" in general. True reviewing can require hours, so, I don't know how, with the small team which we are, how not to eventually trust, even if I agree that this also puts somehow pressure on the author.

Maybe a compromise is that authors tell the parts of code they don't feel fully comfortable with, so that reviewers can concentrate their energy on these parts.

Let me roughly try to distinguish different aspects of reviewing:

  1. pointing at obvious typos, at suspectful parts which could be typos, checking uniformity of style
  2. clarity and covering of documentation
  3. testing the PR
  4. checking covering of regression tests,
  5. checking if invariants are respected, checking side conditions, checking if methodology is consistent enough with other parts of the code
  6. getting the global picture at the same level of understanding as the author; evaluate whether the solution adopted is optimal
  7. relevance of the PR: what are the advantages and drawbacks, what does it answer?

I must confess that points 3 to 6 seem to me pretty time consuming even if some of us excel in 5 (e.g. @SkySkimmer and @silene impressed me several times in how quickly they pinpointed problems in some of my PRs). Some close users are also very good at 3...

Let me digress. At IRIF, one of our colleagues developed and implemented a network protocol called Babel. This protocol is validated by the IETF international organization for internet and they have precise procedures, such as (from memory) having two teams independently developing an implementation and documenting the protocol so that they can compare where there are ambiguities and how they are resolved by each party, and then a third independent team do a third implementation by just reading the documentation so as to check if it is clear and unambiguous enough.

With enough resources, we could certainly follow processes like this for a software like Coq (in the sense of being two at spending the same amount of time on a same feature). But we have small resources and more good ideas to explore than human resources to implement them. So, I don't know what to do.

It seems to me that I try more and more to answer the explicit requests for reviewing that I receive, but there is also a balance to find between developing and reviewing, and a balance to find also between the different aspects of reviewing listed above. Is the general opinion that we are missing reviewing in general, or that the current practice is reasonable? If the opinion is that we are missing reviewing, about which aspects? And with which average time spent on reviewing a PR? And how could I improve my practice on my side?

A special comment on the kernel. With @mattam82, @ppedrot, @maximedenes, @SkySkimmer, @barras, @amintimany, I feel that we are enough to test-stress both the implementation and the foundations to ensure at the end a relative confidence. But I may be wrong, and, I'm ready to let my position evolve in other directions... Somehow, I feel a bit guilty not to do it but I should probably try to submit a funding proposal to get more human resources which we could then allocate in directions in which we are not enough (as e.g. re-starting a bootstrapping project for the kernel).

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 21, 2018

About what should a review be, I'll quote @gares who wrote something that I fully agree with:

I think PR reviews are complementary, as in disjoint union, with tests.
I check if the code is sound (wrt the patch, a lot of context is missing).
Travis checks if issuing "Load foo" raises an anomaly, diverges, or seems to work. The test suite today passes even if I remove the grammar rule for Load, or implement it with assert false.

So basically, most of the time, reviewers do not need to test the code. But what they need to do is ensure that when they see something that could look like a change of semantics, a relevant test is here (or added) to check the semantics. Our test-suite is lacking a lot today but we can improve it along the way.


Somehow, I feel a bit guilty not to do it but I should probably try to submit a funding proposal to get more human resources which we could then allocate in directions in which we are not enough (as e.g. re-starting a bootstrapping project for the kernel).

Please don't hire yet another engineer on a short-term contract given the experiences of the past. However, if this funding is geared at hiring PhD students and post-docs and coordinating between various groups of researchers that would work on the project, then very good! (Since you take examples from IRIF, a good inspiration here could be the CoLiS project).

@herbelin
Copy link
Member Author

About what should a review be ...

May I see point 3 of my list as the closest from what @gares is saying?

More generally, what about defining more precisely what a review is in the context of contributing to Coq so that we all speak the same language (and have less misunderstandings).

Just looking in GitHub, I see for instance the sentence "Reviews allow for discussion of proposed changes and help ensure that the changes meet the repository's contributing guidelines and other quality standards".

@Zimmi48
Copy link
Member

Zimmi48 commented Mar 21, 2018

Yes, I've wanted to write such guidelines for quite some time. Now may be a good time (although I don't volunteer to do it right now, so it could either be someone else writing the first draft, or me at a later time).

@ejgallego
Copy link
Member

We have a new merge policy and a way faster CI in place; it feels to me that the time to resolve this issue is coming close; I guess the next freeze will simply be a branching of the current master status. The issue of whether to do more frequent releases is a valid one IMO, but I guess we can live with 2 releases a year until the release process is almost automatic.

We certainly need more developers / maintainers to ensure steady PR progress but that seems like an orthongonal issue to me.

@ejgallego
Copy link
Member

It feels like this issue has also reached it useful lifetime, unless somebody comments, should close after the summer.

@ejgallego
Copy link
Member

Shall close soon unless someone objects.

@ejgallego
Copy link
Member

Well, time to close this interesting discussion has come.

It seems to me that the most sensible approach is not to have a freeze by itself, but a cutoff date for tagging towards the milestone, then when such list is clean [either by merging or delaying] the beta can be released.

That should work well for people IMO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
food for thought Issue that could be closed but would still be worth thinking about. kind: meta About the process of developing Coq.
Projects
None yet
Development

No branches or pull requests

7 participants