Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conditional CI pipeline + migrate some jobs from travis to gh #4431

Merged
merged 11 commits into from
Oct 5, 2022

Conversation

mbien
Copy link
Member

@mbien mbien commented Jul 23, 2022

Now that we have the basic CI features implemented (cancellable, restartable, junit-reports #4328 and a simple pipeline #3785), we can migrate some travis jobs to github.

I don't think we can migrate everything, since apache has a limited number of job runners which are shared between all apache projects - using them all every time someone opens a PR wouldn't be polite :)

so far migrated:
sig test, java hints, javafx, enterprise, java modules, profiler, apisupport, extide, platform modules, webcommon, rat, lib validation

  • travis: 31 -> 14 jobs (~12.5h -> ~5h)
  • gh: 12 (fixed) -> 25 (conditional) (~4h -> ~3h - 8h)
    (counts individual matrix axis as job)

The good news is that a lot of time is saved simply by having the jobs as part of the pipeline, the initial work is now paying off, travis did build in every job.
Another way to save CI time is by triggering long running jobs/steps only with certain PR labels, see below.

other changes:

  • travis uses the travis_retry command on many unreliable tests. Gh doesn't have that so I had to find a workaround with the help of a little script. The summary will now also show warnings:
    retry

  • we don't have to hide logs anymore since there doesn't seem to be a log limit. Additionally: logs are split by step so which makes inspection easier

  • some steps or jobs are now skipped based on selected PR labels

    • ci:no-build for disabling the build entirely
    • ci:all-tests for running all tests
    • Java, JavaScript, Platform and PHP unlock some long running steps/jobs in their respective category
    • Ant, Gradle, Maven or MX trigger the build tools job
    • JavaDoc and API Change run javadoc jobs
    • all jobs/steps run if the workflow was not started by a PR sync, e.g during merges
    • the full text search allows to quickly check which labels influence CI, e.g by entering "ci java" or just "ci".
      ci-labels
      tooltips will show the description too:
      image
  • merged sig tests, rat, lib val into a single job for all paperwork. The job will also print the commit header so that reviewers can check if it looks ok

  • merged extide, ant, gradle, maven and mx into a cross-cluster build-tools job

  • moved some steps around, unreliable should run first for fast fails

@mbien mbien added the CI Continuous Integration label Jul 23, 2022
@neilcsmith-net
Copy link
Member

I don't think we can migrate everything since apache has a limited number of job runners which are shared between all apache projects - using them all every time someone opens a PR wouldn't be polite :)

No, but the situation with Travis isn't much different. What about taking this as a chance to try some of the heavier tests on different triggers? A daily basis on certain branches, or on PR merges?

@mbien
Copy link
Member Author

mbien commented Jul 23, 2022

I don't think we can migrate everything since apache has a limited number of job runners which are shared between all apache projects - using them all every time someone opens a PR wouldn't be polite :)

No, but the situation with Travis isn't much different. What about taking this as a chance to try some of the heavier tests on different triggers? A daily basis on certain branches, or on PR merges?

sure thats all no problem in principle. The main issue is to actually split the tests up and decide what should run when. Triggering based on code paths, or even PR labels is also an option, but this has been discarded every time we had the discussion. PHP for example almost became conditionally triggered as first experiment.

But i do agree, this project runs probably ~20h of testing per sync (travis alone is 13h). Thats simply too much. (edit, the pipeline will reduce this a bit since each job won't need to build everything)

@neilcsmith-net
Copy link
Member

sure thats all no problem in principle. The main issue is to actually split the tests up and decide what should run when. Triggering based on code paths, or even PR labels is also an option, but this has been discarded every time we had the discussion.

Yes, I know! 🤯 I actually meant that as a hint that you might want to propose a suitable split, whatever makes sense to you. Then anyone who wants to -1 it can actually do the work to propose how to move things around to keep our CI usage at a workable (and polite to the rest of ASF) level. 😄

@mbien mbien force-pushed the ci-migrate-jobs-1 branch 3 times, most recently from 3557820 to 3e20d40 Compare July 23, 2022 19:32
@mbien
Copy link
Member Author

mbien commented Jul 23, 2022

@neilcsmith-net looks promising so far since some jobs on travis did spend most of the time building. When run as part of the pipeline the time goes down to a few minutes.

I don't really know which tests need a display. I might turn the xserver on for all tests on linux to keep things simple (done).

observations so far:

@mbien mbien added the PHP [ci] enable extra PHP tests (php/php.editor) label Jul 24, 2022
@mbien mbien force-pushed the ci-migrate-jobs-1 branch 3 times, most recently from 6202af8 to 07632f6 Compare July 24, 2022 21:14
@mbien mbien removed the PHP [ci] enable extra PHP tests (php/php.editor) label Jul 24, 2022
.github/workflows/main.yml Outdated Show resolved Hide resolved
@mbien mbien force-pushed the ci-migrate-jobs-1 branch 4 times, most recently from 4f3fb89 to 848b9a4 Compare July 25, 2022 16:33
.github/workflows/main.yml Outdated Show resolved Hide resolved
@mbien mbien force-pushed the ci-migrate-jobs-1 branch 5 times, most recently from 855c54c to 1f8e34d Compare July 27, 2022 10:13
@mbien mbien added ci:no-build [ci] disable CI pipeline and removed ci:no-build [ci] disable CI pipeline labels Jul 27, 2022
@mbien mbien force-pushed the ci-migrate-jobs-1 branch 2 times, most recently from 0f6e37b to 6523be4 Compare July 27, 2022 17:29
@mbien
Copy link
Member Author

mbien commented Oct 2, 2022

update: workflow_dispatch can only run on regular branches (e.g master). You can't select PR branches in the UI. So i can skip testing the rest.

@neilcsmith-net
Copy link
Member

neilcsmith-net commented Oct 2, 2022

Does re-run all jobs not work after relabelling then?

unfortunately not, that would have been perfect.

OK, second thought. Does changing the pull request event types to include labeled work? https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request

@mbien
Copy link
Member Author

mbien commented Oct 2, 2022

OK, second thought. Does changing the pull request event types to include labeled work? https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request

this would trigger a new run every time a label is added which is probably too much since we have also the do-not-merge label etc

@mbien
Copy link
Member Author

mbien commented Oct 2, 2022

but what if we abuse reopened for this?

there is also "unlocked" which we will probably also rarely use

@neilcsmith-net
Copy link
Member

this would trigger a new run every time a label is added which is probably too much since we have also the do-not-merge label etc

Given this enables many of the tests to eventually not run without required labels, I'm not sure that's a huge issue in practice, and it's certainly more obvious. If the label trigger works, IMO we could also consider going back to minimal tests by default from the start.

@mbien
Copy link
Member Author

mbien commented Oct 3, 2022

this would trigger a new run every time a label is added which is probably too much since we have also the do-not-merge label etc

Given this enables many of the tests to eventually not run without required labels, I'm not sure that's a huge issue in practice, and it's certainly more obvious. If the label trigger works, IMO we could also consider going back to minimal tests by default from the start.

when I read the doc this indeed seemed to be the obvious way how to solve this issue. But this dissipated quickly:

  • labeled events are fired for each label. So if you put 3 labels on a pr it will start 3 workflows + the open event which makes 4. We have the concurrency rule which would shut down all workflows except the newest, but this would still create a lot of noise on the /actions page
  • we use do-not-merge/stale/bugfix etc labels for meta-info. This can become a problem when someone accidentally resets a build someone else is looking at
  • i sometimes go around and add labels to PRs so that they will go the right category on the feature list. Since not all labels are used as conditions, this would make me fixup less since it would cause unnecessary builds just for cosmetics.
  • when someone would bulk label a bunch of PRs at once via the /pulls page it would simply cause too much building at once IMO

how about using the unlocked event as "secret" trigger for the workflow? I just tested it in a sandbox and it worked well

  • I don't think it sends mails (I am not sure, difficult to test with just one account, in fact, let me test this by locking this PR, you can tell me if you got a mail :)), close/reopen does
  • it is less invasive than 'reopened'
  • maybe one day gh will support better ways of doing this. e.g via adding a custom button to the checks or restarting workflows with a fresh context optionally

I assume that most devs will try to label PRs correctly right away or will sync PRs several times anyway during review which gives enough opportunities to update labels. For new contributors and the event of a PR which does not need any changes, reviewers could use this semi-secret trick to trigger a new build after labeling the PR.

@apache apache locked and limited conversation to collaborators Oct 3, 2022
@apache apache unlocked this conversation Oct 3, 2022
@neilcsmith-net
Copy link
Member

No email that I can see. All the options seem pretty ugly.

Is the actual added label in the event? And if so, easy to filter by prefix ci:?

Use gh cli to read the PR labels directly rather than from the event?

Use a comment trigger and filter by text (eg. <RETEST THE DAMN THING>)???

Yep, all ugly. From my perspective, go with what you think makes most sense initially. Can always be tweaked later. Anyway, you're in charge of explaining any secret triggers! 😄

 - extracted global opts
 - enabled xserver for all linux test jobs
 - security (removed git permissions just to be sure)
 - added path to commit validation tests to mac job
 - added retry script for repeating unreliable steps
 - 19 ea -> 19 GA
 - other misc cleanup
 - sig test
 - java hints
 - javafx
 - enterprise
 - conserve resources by skipping long running CI steps
 - ci:no-build for disabling the build entirely
 - ci:all-tests for running all tests
 - category labels: Java, JavaScript, JavaDoc, Platform and PHP for running
   additional jobs to the initial set
 - if the workflow is not triggered by a PR, *everything* will run

note:
  gh does not support env vars in job conditions which leads to
  a bit more duplicated code. Env vars in step conditions work fine.

add manual trigger to start a new workflow with updated label info

 - there is currently no good way to implement this so we use a little trick
 - the PR "unlock" event will be repurposed to trigger a new run
 - this trigger is hopefully rarely needed, but its good to have one

updated PR template with added info to label PRs before creating them.

note:
 gh does not support env vars in job conditions which leads to
 a bit more duplicated code. Env vars in step conditions work fine.
 - triggered by labels: Ant, Gradle, Maven or MX
 - merged extide job into new build-tools job
 - set continue-on-error for unreliable java.mx.project test
@mbien mbien removed the ci:all-tests [ci] enable all tests label Oct 4, 2022
Comment on lines +23 to +24
# unlocked event is used as super secret restart button
types: [opened, synchronize, unlocked]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Lock Conversation" button is on the right panel on PRs. Unlock will start a fresh run without having to sync. I don't think we will need this very often, but its good to have that option.

We are using some fairly new github features, until recently you couldn't restart individual jobs at all. I am sure there will be better ways to implement this one day.

@mbien
Copy link
Member Author

mbien commented Oct 4, 2022

ready to integrate from my side. Added the "unlocked" trigger and put a info line into the PR template.

I think its better to integrate it as is without squashing everything. It makes it easier to check if for example there was a mistake the migration of certain jobs without having to go through everything.

Copy link
Contributor

@matthiasblaesing matthiasblaesing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only eyeballed this and indeed the commits make sense (also to keep them separate). I think we need to "test" this in reality and see how it goes. I like the option to lock/unlock to trigger a build. I'll feel like a "supervillain" when I use the "super secret" restart button.

Copy link
Contributor

@vieiro vieiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI all,
Sorry for a late reply.
I'm afraid I don't fully understand all this, but AFAIU this saves time and energy, to +1 from me.
Since this affects how people approve PRs, maybe we want to add some instructions elsewhere (Confluence?).

@mbien
Copy link
Member Author

mbien commented Oct 5, 2022

  • updated the label descriptions for labels which influence CI. Description text has been prefixed with [ci] which allows filtering.
  • Added a screenshot with an example to the PR.

full list:
https://github.com/apache/netbeans/labels?q=ci

@vieiro I think it is going to be fairly intuitive. Even if you don't label anything there will be still about 3h of CI time spent (maybe we can reduce this further in future).
This won't be a fine grained configuration, Existing PRs which update maven features are already labeled with "Maven" right now. Once this is merged, the same label will also toggle maven related tests which are aggregated in a build tools job.

@mbien
Copy link
Member Author

mbien commented Oct 5, 2022

merging. Going to send the [NOTICE] mail to the dev list too as @neilcsmith-net advised.

@mbien mbien merged commit 7e27290 into apache:master Oct 5, 2022
@mbien mbien mentioned this pull request Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants