wilzbach
Contributor

@wilzbach wilzbach commented Jan 6, 2018

A start to move tests from Travis to Semaphore.
It's amazingly fast:

image

Notes:

  • For now I didn't move the 32-bit runs as they aren't significantly faster on SemaphoreCI and I don't know how much workload can be handled with two workers
  • Both CIs use the same script, so that updates will be easier
  • (edited) We plan to move away from Travis step-by-step as it has become rather unreliable and slow

Semaphore is for now configured as follows:

image

(The if [ -f semaphoreci.sh ... check is done so that other PRs don't fail for now.)
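
For illustration, a guard of this kind might look like the sketch below. The helper name run_if_present and the way the script is invoked are assumptions made for the example. The actual command in the Semaphore configuration is elided above and is not reproduced here.

```shell
# Sketch only: run_if_present is a hypothetical helper, not the real
# Semaphore configuration. It runs a CI script only when the checked-out
# branch actually contains it, so older PR branches are skipped, not failed.
run_if_present() {
    script="$1"
    if [ -f "$script" ]; then
        sh "$script"
    else
        echo "skipping: $script not found on this branch"
    fi
}
```

Calling run_if_present semaphoreci.sh from the repository root would then be a no-op (apart from the message) on branches that predate the script.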

@wilzbach wilzbach requested a review from MartinNowak as a code owner January 6, 2018 21:14
@dlang-bot
Contributor

Thanks for your pull request, @wilzbach!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@wilzbach
Contributor Author

wilzbach commented Jan 6, 2018

BTW it's really fast:
Compare a successful Travis PR build (as Travis is still on pending here):

https://travis-ci.org/dlang/dmd/builds/325836886

with this build:

https://semaphoreci.com/cybershadow/dmd/branches/pull-request-7617/builds/1

@wilzbach wilzbach force-pushed the semaphore branch 6 times, most recently from a3edfbf to db71432 Compare January 6, 2018 22:00
@ibuclaw
Member

ibuclaw commented Jan 6, 2018

@wilzbach - Have a badge:

Badge

Or a shield. :-P

Shield

@ibuclaw
Member

ibuclaw commented Jan 6, 2018

Hmm, looks like Jenkins is building a version of Ocean that doesn't have the contract fix.

@wilzbach wilzbach changed the title Move tests from Travis to SemaphoreCi Move tests from Travis to SemaphoreCI Jan 6, 2018
@wilzbach
Contributor Author

wilzbach commented Jan 6, 2018

@wilzbach - Have a badge:

My idea was to

image

This looks a bit boring:

image

@wilzbach
Contributor Author

wilzbach commented Jan 6, 2018

Hmm, looks like Jenkins is building a version of Ocean that doesn't have the contract fix.

Yeah I observed this on a couple of other PRs today too
However, I don't know much about the problem, except that rebasing usually fixes the problem.
I did open an issue though: dlang/ci#112

@wilzbach
Contributor Author

wilzbach commented Jan 7, 2018

Hmm, looks like Jenkins is building a version of Ocean that doesn't have the contract fix.

Hmm, Jenkins does check out the version with the fix (see dlang/ci#108):

 > git rev-parse v4.0.0-alpha.4^{commit} # timeout=10
Checking out Revision 0cf1550792992f4d868e0ae7fe67b4025790c40d (v4.0.0-alpha.4)

@ibuclaw
Member

ibuclaw commented Jan 7, 2018

All I can see is that the change was pushed to the 3.5 branch only
https://github.com/sociomantic-tsunami/ocean/branches

@ibuclaw
Member

ibuclaw commented Jan 7, 2018

As far as I can tell then, all PRs should fail unless ocean 4.x is fixed or we temporarily just go ahead and merge anyway.

@wilzbach
Contributor Author

wilzbach commented Jan 7, 2018

As far as I can tell then, all PRs should fail unless ocean 4.x is fixed or we temporarily just go ahead and merge anyway.

Let's see whether dlang/ci#115 works

@dlang-bot dlang-bot merged commit 750400c into dlang:master Jan 7, 2018
@wilzbach wilzbach deleted the semaphore branch January 8, 2018 04:22
@MartinNowak
Member

It's amazingly fast:

Well, I guess the main reason for being faster is that they're still new and thus burn more money to grow their user-base.
There aren't many differences between those platforms, and we're heavily exhausting our free usage tier. We should prolly reduce what we run on free CI platforms instead of constantly swapping them.
The more important thing in the medium term is to clean up this CI list and make it easier to digest. With 10 different checks we're rather diluting than enriching the important feedback.
In the long run I'd say we have @braddr's auto-tester, with his great effort at maintaining a sizeable number of dedicated servers https://auto-tester.puremagic.com/hosts/, https://ci.dlang.org for dub project testing and maybe smaller tasks, and then some linting et al. on free CI systems.

@wilzbach
Contributor Author

With 10 different checks we're rather diluting than enriching the important feedback.

We currently display the following three CodeCov checks:

  • codecov/changes - unexpected changes (there are still some bits in the DMD backend that depend on global memory state, but it's typically unrelated to a PR, very hard to fix and we always set it to "green" anyways atm)
  • codecov/project - the overall coverage change. Typically this isn't interesting either, as it is again mostly due to random coverage changes in the backend. If tests get removed, it makes sense to look at this option, but luckily that doesn't happen frequently, and even if it does, it's still just one click away
  • codecov/patch - the only coverage option I look at

@ibuclaw
Member

ibuclaw commented Jan 15, 2018

Well, I guess the main reason for being faster is that they're still new and thus burn more money to grow their user-base.

They were founded only one year after Travis. So that's no excuse. For sure I think they've not really marketed themselves well unlike Travis.

I picked up semaphore for gdc a few years ago as it was the only platform that could build and run all tests in 25-30 minutes flat. They seem to have cut that down to 15-20 minutes since then.

@MartinNowak
Member

Right, we can get the total coverage by clicking details on coverage/patch, so we should only keep coverage/patch.
Certainly looking at https://github.com/dlang/dmd/pulls?page=1&q=is%3Apr+is%3Aclosed shows how bad and flaky our tests are atm.

@wilzbach
Contributor Author

Right, we can get the total coverage by clicking details on coverage/patch, so we should only keep coverage/patch.

Already on it: #7711
Total coverage is also displayed in the top bar if the CodeCov extension is installed:

image

shows how bad and flaky our tests are atm.

To be fair, most of these checks have 9/10 because Travis, even reduced to a sole OSX job, still takes ages to run and is the only CI which isn't enforced yet. I would be happy to get rid of Travis.
All other CIs have been set to enforced for a couple of days, and so far the experiment has worked quite well.
We run into a couple of issues with dlang/ci sometimes though, e.g. dlang/ci#112, dlang/ci#118, dlang/ci#120

@MartinNowak
Member

They were founded only one year after Travis. So that's no excuse. For sure I think they've not really marketed themselves well unlike Travis.

I picked up semaphore for gdc a few years ago as it was the only platform that could build and run all tests in 25-30 minutes flat. They seem to have cut that down to 15-20 minutes since then.

It's nice that they run on dedicated servers, and maybe that works out better business-wise than using expensive on-demand cloud instances. They still have a very limited free-tier, like everyone else, and that likely won't suffice at peak times. So switching only mitigates the problem so much. For the short term we could prolly spread our repos among CIs, but in the long-run I'd rather want to move things back into one or two CI systems per repo.

@MartinNowak
Member

To be clear, this might be a good move, so thanks for taking the initiative @wilzbach. It's just that it also causes churn and doesn't seem like a long-term solution, hence my reservation.

@wilzbach
Contributor Author

It's just that it also causes churn

Understood. Of course, having dedicated machines on which we run the bootstrap builds would be a lot better, but the reason for the switch to Semaphore was that the performance and failures of Travis jobs became unbearable in the last weeks.
And I'm really happy how stable & fast Semaphore is (getting it to this, of course, required a bit of dog-fooding).
Here's the dashboard of all open PRs that Semaphore has seen so far:

image
https://semaphoreci.com/wilzbach/dmd-2

The only failing PR build is a true positive - it really has an error:

image

And due to its fast run time, a contributor now sometimes gets their first feedback from SemaphoreCI.

doesn't seem like a long-term solution

It was never intended to be one.
At least ci.sh got more generic (it's used both on Travis and Semaphore), so that if we ever use dedicated machines for the bootstrapping, this script is all they should need to run.
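
A script shared between providers can branch on the environment variables each CI exports. A minimal sketch, assuming Travis sets TRAVIS=true and Semaphore sets SEMAPHORE=true (as their documentation describes). detect_ci is a hypothetical helper for illustration and is not taken from dmd's ci.sh:

```shell
# Sketch only: detect_ci is a hypothetical helper, not part of dmd's ci.sh.
# Assumption: Travis exports TRAVIS=true and Semaphore exports SEMAPHORE=true.
detect_ci() {
    if [ "${TRAVIS:-}" = "true" ]; then
        echo travis
    elif [ "${SEMAPHORE:-}" = "true" ]; then
        echo semaphore
    else
        # Neither variable is set, e.g. a developer box or a dedicated machine.
        echo local
    fi
}
```

A dedicated machine that sets neither variable falls through to the local branch, so the same script could run there without provider-specific setup.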

@wilzbach
Contributor Author

For reference, here's Travis before #7653:

image
image

@jacob-carlborg
Contributor

In the long run I'd say we have @braddr's auto-tester, with his great effort at maintaining a sizeable number of dedicated servers https://auto-tester.puremagic.com/hosts/, https://ci.dlang.org for dub project testing and maybe smaller tasks, and then some linting et al. on free CI systems.

@MartinNowak one problem with the current setup of the auto-tester is that it's only one person that is in control of it. That person becomes a bottleneck for changes, it's also a problem if the person disappears.

@MartinNowak
Member

@MartinNowak one problem with the current setup of the auto-tester is that it's only one person that is in control of it. That person becomes a bottleneck for changes, it's also a problem if the person disappears.

That's understood and we could use ci.dlang.org to step-in in case of urgency, but as we knew when we started, Jenkins is a PITA and has a horrible footprint, so that would be an unwelcome move and definitely incur a lot more maintenance overhead than the auto-tester.
If we can move to a better basis in the future, that would be a discussion ground with @braddr, but atm. the auto-tester is better suited than available alternatives.

That person becomes a bottleneck for changes

I doubt that this is such a big problem, we have dedicated targets in the Makefiles that allow for smaller changes, and for bigger environment/dependency updates it's actually good to have a gate-keeper.

@jacob-carlborg
Contributor

That's understood and we could use ci.dlang.org to step-in in case of urgency, but as we knew when we started, Jenkins is a PITA and has a horrible footprint, so that would be an unwelcome move and definitely incur a lot more maintenance overhead than the auto-tester.

It's not so much about which software is running as about who can make changes.

I doubt that this is such a big problem, we have dedicated targets in the Makefiles that allow for smaller changes, and for bigger environment/dependency updates it's actually good to have a gate-keeper.

We already have gatekeepers for all projects.

All I'm saying is that every time I suggested changing something, like upgrading a compiler, I got the answer that it's too difficult and too much work because it's only one person in control. Not that we need to stay compatible with the existing compiler.
