CI woes #27260

Closed
KristofferC opened this issue May 25, 2018 · 19 comments
Labels
domain:ci Continuous integration status:priority This should be addressed urgently

Comments

@KristofferC
Sponsor Member

KristofferC commented May 25, 2018

CI is in a sad state: many jobs have to be rerun, which lengthens the queues, and it is hard to make solid releases with flaky CI. This issue collects the different CI problems:

Travis

Building sysimg hangs

1376.504621 DelimitedFiles  ────────  7.522365
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated
94.756777 lock.jl
994.803373 threads.jl
997.187098 weak
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

The build has been terminated

Example logs: https://travis-ci.org/JuliaLang/julia/jobs/383489913, https://travis-ci.org/JuliaLang/julia/jobs/383452631

Hangs in other places:

970.597421  ccache g++ -m64 -pipe -fPIC -fno-rtti -pedantic -D_FILE_OFFSET_BITS=64   -O0 -ggdb2 -DJL_DEBUG_BUILD -fstack-protector-all -I/home/travis/build/JuliaLang/julia/src -I/home/travis/build/JuliaLang/julia/src -I/home/travis/build/JuliaLang/julia/src/support -I/home/travis/build/JuliaLan

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received

Might be the same issue as the previous one. It is odd that it freezes in the middle of writing a word...

Example log: https://travis-ci.org/JuliaLang/julia/jobs/383458103

AppVeyor

Hitting 2hr time limits

Happens frequently.

Top 15 test groups in terms of time spent (seconds):

LinearAlgebra/triangular     (18) | 478,70
subarray                      (3) | 267,89
loading                      (10) | 234,47
Distributed                   (1) | 225,84
SparseArrays/sparsevector    (17) | 220,87
cmdlineargs                  (14) | 188,56
bitarray                      (8) | 181,23
SparseArrays/sparse          (14) | 179,24
LinearAlgebra/dense          (19) | 163,49
SparseArrays/higherorderfns  (16) | 133,00
LinearAlgebra/diagonal       (21) | 120,63
arrayops                      (5) | 117,70
LinearAlgebra/lu             (24) | 116,70
LinearAlgebra/qr             (19) | 107,32
LinearAlgebra/cholesky       (23) | 103,69
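
To see where the time in the table above goes, the rows can be summed and grouped by top-level package. This is only a rough sketch: it assumes the number in parentheses is the id of the worker that ran the group, that the comma is a decimal separator, and the helper names (`parse_timings`, `seconds_by_package`) are made up for illustration. The sum is worker time, not wall-clock time, since groups run on several workers in parallel.

```python
from collections import defaultdict

# The table above, copied verbatim (comma is the decimal separator).
TABLE = """\
LinearAlgebra/triangular     (18) | 478,70
subarray                      (3) | 267,89
loading                      (10) | 234,47
Distributed                   (1) | 225,84
SparseArrays/sparsevector    (17) | 220,87
cmdlineargs                  (14) | 188,56
bitarray                      (8) | 181,23
SparseArrays/sparse          (14) | 179,24
LinearAlgebra/dense          (19) | 163,49
SparseArrays/higherorderfns  (16) | 133,00
LinearAlgebra/diagonal       (21) | 120,63
arrayops                      (5) | 117,70
LinearAlgebra/lu             (24) | 116,70
LinearAlgebra/qr             (19) | 107,32
LinearAlgebra/cholesky       (23) | 103,69
"""


def parse_timings(text):
    """Return a list of (test group name, seconds) pairs."""
    rows = []
    for line in text.splitlines():
        if "|" not in line:
            continue
        left, right = line.split("|")
        name = left.split("(")[0].strip()          # drop the "(worker id)" part
        seconds = float(right.strip().replace(",", "."))
        rows.append((name, seconds))
    return rows


def seconds_by_package(rows):
    """Sum seconds per top-level package, e.g. 'LinearAlgebra/lu' -> 'LinearAlgebra'."""
    totals = defaultdict(float)
    for name, seconds in rows:
        totals[name.split("/")[0]] += seconds
    return dict(totals)


rows = parse_timings(TABLE)
total = sum(seconds for _, seconds in rows)
by_pkg = seconds_by_package(rows)

print(f"top 15 total: {total:.2f} s ({total / 60:.1f} min of worker time)")
for pkg, seconds in sorted(by_pkg.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{pkg:15s} {seconds:8.2f} s  {100 * seconds / total:5.1f}%")
```

Grouping this way makes it easier to judge which stdlib test suites would be the first candidates for trimming.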

Time to build sysimg:

Non-debug:

Sysimage built. Summary:
Total ─────── 329.318508 seconds 
Base: ───────  82.369702 seconds 25.0122%
Stdlibs: ──── 207.425770 seconds 62.9864%
Precompile: ─  39.519371 seconds 12.0003%

Debug:

Total ─────── 640.483594 seconds 
Base: ─────── 145.372932 seconds 22.6974%
Stdlibs: ──── 406.893176 seconds 63.5291%
Precompile: ─  88.214408 seconds 13.7731%

Example log: https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.27099/job/fevqdpy21ka8btux
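
As a quick check on the sysimg numbers above, the arithmetic can be reproduced from the two summaries. This is a small sketch using only the figures quoted in the logs; the dictionary and function names are illustrative and not part of any build script.

```python
# Sysimg build times in seconds, copied from the two summaries above.
NON_DEBUG = {"Total": 329.318508, "Base": 82.369702,
             "Stdlibs": 207.425770, "Precompile": 39.519371}
DEBUG = {"Total": 640.483594, "Base": 145.372932,
         "Stdlibs": 406.893176, "Precompile": 88.214408}


def shares(timings):
    """Percentage of the total spent in each phase."""
    return {phase: 100 * seconds / timings["Total"]
            for phase, seconds in timings.items() if phase != "Total"}


# How much slower the debug sysimg is than the non-debug one.
slowdown = DEBUG["Total"] / NON_DEBUG["Total"]
# Time spent on both sysimg builds in one CI job.
combined_minutes = (DEBUG["Total"] + NON_DEBUG["Total"]) / 60
# Time that would be saved by skipping the debug sysimg alone.
debug_minutes = DEBUG["Total"] / 60

print(f"debug / non-debug: {slowdown:.2f}x")
print(f"both sysimgs:      {combined_minutes:.1f} min")
print(f"debug sysimg only: {debug_minutes:.1f} min")
for phase, pct in shares(DEBUG).items():
    print(f"  debug {phase:11s} {pct:7.4f}%")
```

In both builds the stdlibs account for a bit under two thirds of the sysimg time, and the debug sysimg takes roughly twice as long as the non-debug one.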

CircleCI

Fails getting gfortran

E: Unable to locate package g++-4.8-multilib
E: Couldn't find any package by glob 'g++-4.8-multilib'
E: Couldn't find any package by regex 'g++-4.8-multilib'
E: Unable to locate package gfortran-4.8-multilib
E: Couldn't find any package by glob 'gfortran-4.8-multilib'
E: Couldn't find any package by regex 'gfortran-4.8-multilib'
Exited with code 100

Example log: https://circleci.com/gh/JuliaLang/julia/25927?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Perhaps fixed by #27257.

FreeBSD

Seems fairly solid

@KristofferC KristofferC added status:priority This should be addressed urgently domain:ci Continuous integration labels May 25, 2018
@ViralBShah
Member

On AppVeyor, it takes roughly 35 minutes to build the sysimg, and the rest of the time goes to running tests. Turning off the debug build might shave off 10 minutes. We could probably run a smaller test suite. Sharding across multiple AppVeyor jobs would probably be more complex and slow the queue further overall.

@iblislin
Member

For the record, FreeBSD CI ran into some issues like #23143 and randomly freezing in kernel stress testing.

@ViralBShah
Member

Do we really start 32 workers on appveyor in tests? Might it be oversubscribing things, or is that by design?

@StefanKarpinski
Sponsor Member

The Travis change happened suddenly, with no corresponding change on our end. Did they maybe water down their VMs again? It's happened several times before, with a similar effect every time.

@KristofferC
Sponsor Member Author

KristofferC commented May 25, 2018

> Do we really start 32 workers on appveyor in tests? Might it be oversubscribing things, or is that by design?

Not at once. When a worker uses too much RSS memory, it exits and a new worker is started. 32 is just the total number of workers that were started over the course of all the tests.
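
The difference between "workers started in total" and "workers running at once" can be illustrated with a toy simulation. This is a hypothetical sketch, not the actual test runner: the function name, the scheduling rule (least-loaded worker takes the next job), and all memory numbers are assumptions chosen only to show the recycling idea.

```python
def simulate(job_rss_mb, concurrency, max_rss_mb):
    """Simulate a fixed-size worker pool that recycles bloated workers.

    job_rss_mb  -- memory each job adds to the worker that runs it (made-up numbers)
    concurrency -- how many workers are alive at any one time
    max_rss_mb  -- a worker exceeding this is retired and replaced by a fresh one

    Returns (total workers ever started, peak number alive at once).
    """
    pool = {wid: 0 for wid in range(1, concurrency + 1)}  # worker id -> RSS
    started = concurrency
    peak = len(pool)
    for growth in job_rss_mb:
        # The least-loaded worker takes the next job (ties go to the lower id).
        wid = min(pool, key=lambda w: (pool[w], w))
        pool[wid] += growth
        if pool[wid] > max_rss_mb:
            # Retire the bloated worker and start a replacement with a new id.
            del pool[wid]
            started += 1
            pool[started] = 0
        peak = max(peak, len(pool))
    return started, peak


started, peak = simulate(
    job_rss_mb=[400, 300, 900, 200, 700, 500, 100, 800],
    concurrency=2,
    max_rss_mb=1000,
)
print(f"workers started in total: {started}, alive at once: {peak}")
```

The highest worker id seen in the logs therefore grows with the number of restarts, while the number of workers competing for the machine stays fixed.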

@ViralBShah
Member

@iblis17 I am just curious what it takes to reproduce your setup on a linux box? Also, perhaps on a windows box?

@ViralBShah
Member

I merged #27257. Hopefully that should get circle back in business.

@JeffBezanson
Sponsor Member

I think we can turn off the debug builds. Or, we could just build libjulia-debug to make sure the debug build works, but not build the system image in debug mode (since it takes a while and is not really different from the release build).

@ViralBShah
Member

ViralBShah commented May 25, 2018

Trying to do the simplest thing in #27263 for Appveyor. This will disable the full debug build. Let's see if it helps.

@ViralBShah
Member

Perhaps they are throttling us on travis so that we migrate to the new thing?

https://blog.travis-ci.com/2018-05-02-open-source-projects-on-travis-ci-com-with-github-apps

@iblislin
Member

iblislin commented May 25, 2018

@ViralBShah It's just a normal BuildBot setup. I don't think my setup is different from https://build.julialang.org/.

I only spend effort on daily maintenance. First, I check for zombie/frozen processes and kill them manually (to release memory). I'm not sure why some processes cannot be killed by BuildBot.
Second, I browse the build history and re-run any false-negative builds (e.g. the frozen ones) that I find.
I do this work almost every day.

@Keno
Member

Keno commented May 25, 2018

At least for Linux, setting up a BuildBot would be fairly trivial (and we likely have the capacity). Mac is probably more challenging.

@ViralBShah
Member

AppVeyor now has an increased capacity of 10 concurrent workers, with a time allocation of 3 hours.

@KristofferC
Sponsor Member Author

Great, only Travis left to figure out then!

@ViralBShah
Member

Travis wrote back saying that they can only help us early next week (which may or may not even be Monday).

@ViralBShah
Member

Also, the debug build takes 20 minutes. Is it really worth building?

@martinholters
Member

Looking at the history of AV, the last successful run was on 23da960, 6 days ago. It took about 2 hours. Many builds were canceled just after that to have CI time for the alpha release. But every build that ran and didn't fail for other reasons timed out after 3 hours. What's happened there?

@fredrikekre
Member

I think that is #27274

@martinholters
Member

Especially #27274 (comment), yes. Thanks for the pointer.


8 participants