
DISPATCH-1739 Add GitHub Action w/ sharding and bubblewrap #809

Closed

Conversation

jiridanek
Contributor

This is a CI job triggered by pushes and pull requests sent to GitHub.

Output artifacts include parts of the workspace captured before running tests; this is
what the machines running the test shards get. At the end, each shard outputs
an XML CTest report.

Note that ctest -j2 results in test log lines being mixed up in stdout.
Refer to the output XML report, or grep for lines starting with the test
number, to get understandable logs.

  • Using Python 3.6, as this is the version that works on all platforms tested.
  • Using four shards, as adding more does not help much (a sketch of how the sharding can be wired up follows this list).
  • Using ctest -j2, mostly just to exercise bubblewrap. Bwrap is more useful on Travis.
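A minimal sketch of how this kind of sharded run can be expressed in a workflow. The job layout, names, and the `-I Start,,Stride` striding below are illustrative assumptions, not the exact contents of this PR's build.yaml; striding is just one common way to split a ctest run across shards.

```yaml
# Hypothetical sketch only; the real .github/workflows/build.yaml may differ.
name: Build
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]        # four shards, each runs every 4th test
    steps:
      - uses: actions/checkout@v2
      # ... build/restore steps elided; assume the compiled tree is in ./build ...
      - name: Run this test shard
        run: |
          set -o pipefail
          cd build
          # -I Start,,Stride runs tests Start, Start+Stride, Start+2*Stride, ...
          # -T Test also writes the XML CTest report under Testing/.
          ctest -j2 -I ${{ matrix.shard }},,4 --output-on-failure -T Test 2>&1 | tee ctest-console.log
          # ctest prefixes each output line with the test number, so one test's log
          # can be untangled from the -j2 interleaving with e.g.: grep '^53:' ctest-console.log
      - uses: actions/upload-artifact@v2
        if: ${{ ! cancelled() }}
        with:
          name: Test_Results_shard_${{ matrix.shard }}
          path: build/Testing/**/Test.xml
```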

@jiridanek jiridanek self-assigned this Aug 2, 2020
@jiridanek
Contributor Author

One issue I know of: for some reason, Dispatch skips the SASL tests, even though I believe I installed everything that's needed:

53: test_ssl_sasl_client_invalid (system_tests_ssl.RouterTestSslClient) ... skipped 'Cyrus library not available. skipping test'
53: test_ssl_sasl_client_valid (system_tests_ssl.RouterTestSslClient) ... skipped 'Cyrus library not available. skipping test'

I also couldn't find a way to reuse, in this workflow, the artifacts that the Proton pipeline is building.

CC @gemmellr

@gemmellr
Member

gemmellr commented Aug 4, 2020

Note that ctest -j2 results in test log lines being mixed up in stdout.
Refer to the output XML report, or grep for lines starting with the test
number, to get understandable logs.

Or not use -j2? How much of a difference does it make? You commented later that it's mainly for exercising bwrap and is more useful on Travis.

Using four shards, as adding more does not help much.

Presumably due to all the setup costs relative to the time spent on tests, or the fact that one shard is clearly longer than the others and maybe still ends up that way?

Also, did you try fewer than 4? I see one of them has a test run around twice as long as most of the others, so unless all the long-running tests are perfectly spaced to fall in that shard, it is possible that having fewer shards would distribute the test time more evenly, so that the run isn't much slower overall (if at all; it could even be faster if the tests group better), while giving fewer separate output areas to look at. (I'm not personally a huge fan of the sharding: it significantly complicates the build config and results, and may point to some inefficient tests if it seems to be needed in the first place.)

I also couldn't find a way to reuse, in this workflow, the artifacts that the Proton pipeline is building.

I think that's fine, possibly even preferable. It's not that lengthy a build, and we will likely eventually want different versions, as e.g. the Travis tests use.

@jiridanek
Contributor Author

jiridanek commented Aug 4, 2020

Or not use -j2?

Probably. I'll leave -DUSE_BWRAP=ON and remove the -j2. On Travis it cuts the total time in half; here, most of the time savings comes from the sharding.
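For concreteness, the configure-and-test step might then look roughly like this; apart from the -DUSE_BWRAP=ON option discussed above, the commands and paths are illustrative assumptions:

```yaml
      - name: Configure, build, and test (bwrap on, serial ctest)
        run: |
          cmake -S . -B build -DUSE_BWRAP=ON   # keep the bubblewrap sandboxing
          cmake --build build -j "$(nproc)"
          cd build
          ctest --output-on-failure            # no -j2, so the logs stay readable
```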

Also, did you try fewer than 4?

Yes, there is about a one-minute improvement in total time between 3 and 4 shards. I did not try 2.

The Python tox test is the lengthiest one; it essentially reruns the other tests on py27 and then py36. I was thinking of giving it a shard of its own, but that seemed overly elaborate.

.github/workflows/build.yaml (outdated review thread, resolved)
.github/workflows/build.yaml (outdated review thread, resolved)
.github/workflows/build.yaml (outdated review thread, resolved)
@jiridanek jiridanek force-pushed the jd_2020_08_github_action_build branch from 26b6fea to 637b79a on August 4, 2020 19:08
@gemmellr
Member

gemmellr commented Aug 5, 2020

The test times look a fair bit longer in the latest version; presumably that's as much or more due to removing -j2? Did you also try doing that while leaving it with 4 shards first, to see the difference each change made? Maybe using 4 (or more :o) shards would be better if not using the -j2.

(I dislike the complexity sharding adds, but if we have it then it should at least result in much faster runs to offset that. Right now it's probably barely halving the test time, so the overall job still looks well over half the time of a basic singular run.)

@jiridanek
Contributor Author

jiridanek commented Aug 5, 2020

I am now thinking of these opportunities for speedup:

  • fix the Proton build to not build test and example programs (either there isn't an option for it, or it does not work; not sure now)
  • set the Dispatch build not to build the console, as that takes about 2 minutes, caching npm does not help much, and it is already tested on Travis and in the Console GH Action
  • run the Action with all reasonable combinations of shard count and -j, to see what the run times are
  • shard the tests smarter, based on knowledge of their past durations, with some provision not to forget running newly added tests

Having a target for the test runtime would help. I am now thinking of aiming for under 10 minutes for the ASan-instrumented run.

Using larger -j values is risky as it increases the probability of spurious failures in the tests, caused by timeouts.

Long term, it would help to have more efficient tests. A lot of what's there now is something a QE team would traditionally write: orchestrated end-to-end system tests, except that they are better integrated with the build process, and therefore somewhat easier to debug than a stand-alone test suite (possibly running on multiple machines or Docker containers) would be.

edit: I like splitting the compile and test phases into separate GHA jobs. I think the complexity of that is outweighed by the clear separation of the two phases, the knowledge of precisely what is needed to compile and which files are needed to run the tests, and the ability to fetch the workspace, replicate the environment (there is a PPA repository with Python 3.6 for Focal), and run whatever tests I want on my machine from sources compiled in Travis.
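A rough sketch of the compile/test split described above, with the built workspace handed between jobs as an artifact. Job names, the tarball, and the file selection are illustrative assumptions; the shard handling is the same idea as in the earlier sketch:

```yaml
jobs:
  compile:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build
        run: |
          cmake -S . -B build
          cmake --build build -j "$(nproc)"
      - name: Pack the parts of the workspace the tests need
        run: tar -czf workspace.tar.gz build   # hypothetical file selection
      - uses: actions/upload-artifact@v2
        with:
          name: compiled_workspace
          path: workspace.tar.gz

  test:
    needs: compile
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2]
    steps:
      - uses: actions/checkout@v2              # sources are still needed by the system tests
      - uses: actions/download-artifact@v2
        with:
          name: compiled_workspace
      - name: Unpack the workspace and run this shard
        run: |
          tar -xzf workspace.tar.gz
          cd build
          ctest -j2 -I ${{ matrix.shard }},,2 --output-on-failure
```

The same artifact can also be downloaded from the run's page in the GitHub UI to rerun selected tests on a developer machine, which is the workflow described above.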

@gemmellr
Member

gemmellr commented Aug 5, 2020

fix the Proton build to not build test and example programs (either there isn't an option for it, or it does not work; not sure now)

Seems reasonable.

set the Dispatch build not to build the console, as that takes about 2 minutes, caching npm does not help much, and it is already tested on Travis and in the Console GH Action

Yep, I had just noticed the main build was caching there, and was actually thinking of suggesting that maybe the workflows should be combined. Alternatively, ensuring the build isn't duplicated would make sense if it's that slow.

shard the tests smarter, based on knowledge of their past durations, with some provision not to forget running newly added tests

I'd try to avoid the CI being overly smart here; it tends to lead to hassle or test holes later. I think looking at separating the overly long-running tests up in the build itself, or finding whether they are burning unnecessary time and eliminating [some of] it, such that there is a more uniform spread in general, might make the most sense.

Having a target for the test runtime would help. I am now thinking of aiming for under 10 minutes for the ASan-instrumented run.

Hard to say. In part it depends on how often folks are going to use it and actually look at it. Thus far I think most folks rely on their local testing and don't much care about the CI, though maybe that's because it is slower. It might help if the main developers actually chipped in here.

Using larger -j values is risky as it increases the probability of spurious failures in the tests, caused by timeouts.

Yeah, I'd avoid those personally, simply due to making the simple logs unreadable :)

edit: I like splitting the compile and test phases into separate GHA jobs. I think the complexity of that is outweighed by the clear separation of the two phases, the knowledge of precisely what is needed to compile and which files are needed to run the tests, and the ability to fetch the workspace, replicate the environment (there is a PPA repository with Python 3.6 for Focal), and run whatever tests I want on my machine from sources compiled in Travis.

Fair enough. I lean towards simpler CI that's easier to maintain and more like what people using the source do. I'll admit that working on mostly Java stuff means I have things easier, generally having little need to consider replicating environments or using compiled-elsewhere bits.

Bubblewrap (https://github.com/containers/bubblewrap) is
an unprivileged sandbox. It allows running tests in parallel,
without any port clashes with already running servers.
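For context, this is roughly the kind of isolation bwrap provides. The exact wrapper that -DUSE_BWRAP=ON generates is not shown in this thread, so the invocation and the test name below are only an illustration:

```yaml
      - name: Run one test inside an unprivileged bubblewrap sandbox (illustration)
        run: |
          # --unshare-net gives the command its own network namespace, so the ports
          # its routers listen on cannot clash with servers already running on the
          # host or started by other, parallel tests. (A real wrapper may also need
          # to bring the loopback interface up inside that namespace.)
          bwrap --dev-bind / / --unshare-net --die-with-parent \
                ctest -R system_tests_one_router --output-on-failure
```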
@jiridanek jiridanek force-pushed the jd_2020_08_github_action_build branch from 637b79a to 27102c0 on August 5, 2020 22:34
@jiridanek
Contributor Author

jiridanek commented Aug 5, 2020

Caching npm is not helpful, but caching compiles with ccache is. The Build/Compile step in this PR takes under 2 minutes when the cache is ready.

I've checked a few combinations of shard count and thread count. I certainly don't want to go higher than -j2, due to spurious failures on timeouts that happen often with more threads. Keeping the shard count low is good for ergonomics, to keep the number of PR jobs small. So I am going with two shards and two threads in the end. It runs for about 12 minutes (build plus the slower of the shards), which should be OK for now. The equivalent job on Travis takes 30 minutes.

Skipping Proton tests/examples won't help much when ccache is already deployed. Skipping the console build for the ASan GH Action is helpful; it saves almost two minutes of compile time. Partitioning the tests into shards in an intelligent manner (using knowledge of their past durations) would help a lot, but that adds complications. Maybe next time.
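A sketch of the ccache wiring being referred to; the 400 MB limit matches the commit message below, while the cache action, key, and CMake launcher variables are illustrative assumptions about how it can be hooked up:

```yaml
      - name: Cache the ccache directory between runs
        uses: actions/cache@v2
        with:
          path: ~/.ccache
          key: ccache-${{ matrix.os }}-${{ matrix.buildType }}-${{ github.sha }}
          restore-keys: ccache-${{ matrix.os }}-${{ matrix.buildType }}-
      - name: Configure with ccache as the compiler launcher
        run: |
          ccache --max-size=400M   # 400 MB limit, to avoid purges during the build
          cmake -S . -B build \
            -DCMAKE_C_COMPILER_LAUNCHER=ccache \
            -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
      - name: Build (fast once the cache is warm)
        run: cmake --build build -j "$(nproc)"
```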

This is a CI job triggered by pushes and pull requests sent to GitHub.

Output artifacts include parts of the workspace captured before running tests; this is
what the machines running the test shards get. At the end, each shard outputs
an XML CTest report.

Note that `ctest -j2` results in test log lines being mixed up in stdout.
Refer to the output XML report, or grep for lines starting with the test
number, to get understandable logs.

* Using Python 3.6 as this is the version that works on all platforms tested.
* Using two shards with -j2, which should balance speed and convenience.
* Using ccache w/ 400MB size limit, to avoid purges during the build.
@jiridanek
Contributor Author

Table of test durations for combinations of shard count and -j count. Red numbers mean that at least one shard had at least one failed test.

[attached image: table of test durations]

Member

@gemmellr gemmellr left a comment


Glad to see you found newlines (between steps), much nicer :)

uses: actions/upload-artifact@v2
if: ${{ ! cancelled() }}
with:
name: Test_Results_${{matrix.os}}_${{matrix.buildType}}_${{matrix.shard}}
Member


If there are to be different runtimeCheck values, these will clash. If there aren't, maybe drop the related matrix element.

uses: actions/upload-artifact@v2
if: failure()
with:
name: cores_${{matrix.os}}_${{matrix.buildType}}_${{matrix.shard}}
Member


Ditto.

@asfgit asfgit closed this in 16ac2c5 Aug 6, 2020
@gemmellr
Member

gemmellr commented Aug 6, 2020

I just added the runtimeCheck value to the names, consistent with the workspace filename used, and pushed the change.
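Presumably the uploaded artifact names then ended up along these lines; the runtimeCheck addition is inferred from the comment above, and the path is a hypothetical stand-in since the original step's path is not quoted in this thread:

```yaml
      - uses: actions/upload-artifact@v2
        if: ${{ ! cancelled() }}
        with:
          name: Test_Results_${{matrix.os}}_${{matrix.buildType}}_${{matrix.runtimeCheck}}_${{matrix.shard}}
          path: build/Testing/**/Test.xml   # hypothetical path
```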

@jiridanek jiridanek deleted the jd_2020_08_github_action_build branch August 6, 2020 14:13
@jiridanek
Contributor Author

The GH Action on master is done, while Travis has not even started the rat check ;) Beating Travis at speed is not much of an accomplishment, but I'm happy with the result anyway.

@gemmellr
Member

gemmellr commented Aug 6, 2020

Give it time ;)

Everyone will flock to GHA, the Apache org might not have a shared quota big enough to keep huge projects from causing queues for the rest, and then Travis/other could become more responsive. Repeat cycle.

It happens every time something becomes the shiny new CI env, hehe. The ASF Jenkins servers long used to be uselessly overloaded; now they easily beat Travis/AppVeyor most of the time, as enough bigger jobs moved away from Jenkins to those services.

Hence I like having CI jobs in all the envs. Invariably, at any given point, at least one of them is backlogged, down, or walking wounded, but the others are in a better state.
