run ballista integration test in CI #688
The integration test took 28m to run, compared to 14m for our main Rust test job. I think we can cut the build time by 10m if we avoid rebuilding the base image on every run. I am thinking we could host the pre-built base image in the public GitHub Docker registry. Thoughts?
The build log is a bit unreadable in its current form. Could we trim the output? I went through the build log and I am not sure having a pre-built image helps:
I suggest that we trim the time by caching the cargo build, which has no dependency on Docker or a Docker registry.
You are right: the base image build took around 5 mins, then 20 mins for the ballista build and 4 mins for running the integration tests. I will look into optimizing the ballista build step.
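A rough tally of the phase timings quoted above (approximate figures read from the build log, so they do not sum exactly to the reported 28m) suggests why skipping the base image build alone would not close the gap with the 14m main Rust test job:

$$
\begin{aligned}
t_{\text{total}} &\approx \underbrace{5}_{\text{base image}} + \underbrace{20}_{\text{ballista build}} + \underbrace{4}_{\text{tests}} = 29\ \text{min} \quad (\text{reported: } 28\ \text{min}) \\
t_{\text{total}} - t_{\text{base}} &\approx 29 - 5 = 24\ \text{min} > 14\ \text{min}
\end{aligned}
$$

The ballista build step accounts for roughly 20/29 ≈ 69% of the job, which points to caching the cargo build as the more effective target.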
Thanks @houqp, it is great to see this being worked on 🚀
It turns out the release build makes a big difference in test run time: in debug mode, the job runtime went from 28m to almost 1 hour.
The integration tests are crazy slow because the distributed query execution in Ballista is fundamentally broken (see #707), and fragments of the query are executed multiple times. I am hoping to have this all fixed in the next few weeks (I only have time at weekends to work on this).
Finally got cargo caching working within Docker builds; this was way more complicated than I initially expected... The ballista integration test run now completes roughly 1 minute faster than our full Rust workspace test suite when there is a cache hit. I think we should be good for now. As a bonus, subsequent ballista Docker builds should be a lot faster on local machines as well if the buildx cache is enabled.
Thanks @houqp for persevering with this! I will try to review it in the next day or two.
Converting this PR back to draft mode, since I noticed buildx just released a native GitHub Actions cache backend that we can leverage to keep the layer cache size from growing unbounded. I will give https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md#cache-backend-api a try this weekend.
Marking PRs that haven't had activity in over a month as 'stale-pr' to help me filter the list. Please remove the label or let me know if "stale" is not the correct designation.
Closing a seemingly stale PR -- please reopen if that was a mistake.
Which issue does this PR close?
Closes apache/datafusion-ballista#24
Rationale for this change
Avoid ballista integration test regressions automatically.
What changes are included in this PR?
New CI job and updated integration test script to make it runnable within CI.
Merged the ballista base Docker build into the main Docker build so we can leverage the buildx cache for the app image build. On top of that, we don't get much value out of a dedicated base image if we are not publishing it to a public registry.
reference: automata-network/automata#11