

@Suryansh-Dey

Which issue does this PR close?

None. An improvement.

Rationale for this change

Several GitHub Actions workflows were missing rust-cache, causing redundant recompilation of dependencies and tool installations on every run. This increases CI time and resource usage unnecessarily.

What changes are included in this PR?

Added Swatinem/rust-cache to the following workflows:

| Workflow | Jobs modified | What's cached |
| --- | --- | --- |
| dev.yml | license-header-check, typos | hawkeye, typos-cli tool installations |
| docs.yaml | build-docs | cargo-depgraph installation |
| docs_pr.yaml | linux-test-doc-build | cargo-depgraph installation |
| extended.yml | linux-build-lib, linux-test-extended, hash-collisions, sqllogictest-sqlite | Build artifacts and dependencies |

Notes:

  • Existing cargo clean steps in extended.yml are preserved to prevent disk space exhaustion on standard GitHub runners
  • Cache is only saved on main branch to avoid polluting the cache with PR-specific builds
  • Used cache-targets: false for tool-only caching jobs to minimize cache size
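A minimal sketch of how the two caching modes from the notes might look in a workflow. The job names, step names, and `run` commands are illustrative assumptions, not copied from dev.yml or extended.yml. Only `cache-targets` and `save-if` are taken to be real inputs of `Swatinem/rust-cache@v2`. The tool-only job skips `target/` and relies on the action's cached `~/.cargo` to avoid reinstalling the tool, while the build job keeps the default target caching and still runs `cargo clean` afterwards.

```yaml
# Hypothetical sketch; job layout and commands are assumptions for illustration.
name: rust-cache-sketch
on: [push, pull_request]

jobs:
  typos:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Swatinem/rust-cache@v2
        with:
          # Tool-only job: nothing in target/ is worth keeping
          cache-targets: false
          # Write the cache only from main; PR runs restore but never save
          save-if: ${{ github.ref == 'refs/heads/main' }}
      - name: Install typos-cli
        run: cargo install typos-cli
      - name: Check spelling
        run: typos

  linux-build-lib:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Swatinem/rust-cache@v2
        with:
          # cache-targets left at its default (true) so compiled deps are kept
          save-if: ${{ github.ref == 'refs/heads/main' }}
      - name: Build
        run: cargo build --workspace
      - name: Free disk space
        # Preserved from the existing workflow to avoid running out of disk
        run: cargo clean
```

Because rust-cache saves in a post-job step, a `cargo clean` earlier in the same job may leave little of `target/` to save, so that interaction is worth checking in the actual CI runs.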

Are these changes tested?

  • Validated YAML syntax of all modified files
  • Changes follow existing patterns in rust.yml which already uses rust-cache

Are there any user-facing changes?

No. This is a CI-only improvement that should reduce workflow execution time.

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Feb 6, 2026
@alamb
Contributor

alamb commented Feb 7, 2026

Thank you @Suryansh-Dey

Can you please measure the impact this change has on build times (with links to the jobs where you measured)?

We have found in the past that these caches actually slow things down.

@Suryansh-Dey
Author

Suryansh-Dey commented Feb 7, 2026

Here is the comparison:

Without cache

[Screenshot: without_cache]

source
You can see the dependency cache step here because my commit added the cache, but this was its first run. It was therefore a cache miss, so this run is effectively without cache.

With cache

[Screenshot: with_cache]

source
As you can see, with the cache the dependencies were never recompiled. Restoring the cache takes less than 15 seconds and saves about 5 minutes of compilation, so it should help in every workflow where it is added.

Contributor

@alamb alamb left a comment


Thank you @Suryansh-Dey

@Omega359 I think you may have removed some of these caches last time -- do you remember anything else we should look at as part of this change?

@alamb
Contributor

alamb commented Feb 8, 2026

Thanks for the analysis @Suryansh-Dey -- did you compare the timing of this branch with the timing of what happens on main (without any caching)? I think that is probably the most important comparison.

I looked at the most recent build job from main (without any caching), and it also takes about 2 minutes.

https://github.com/apache/datafusion/actions/runs/21791124499/job/62870501521

[Screenshot 2026-02-08 at 7:22:17 AM]

However, that job may have run on a different runner, so I started the tests on this PR so we can measure again.

@Suryansh-Dey
Author

Actually, the job you linked comes from rust.yml, which already has the cache; you can see the Rust dependency cache step there. It was added in an earlier PR:

95e583f9 Dmitrii Blaginin (2025-07-12 16:38):
Improve Ci cache (#16709)

But the caches were missing in extended.yml. I guess I used a bad example.

@Omega359
Contributor

Omega359 commented Feb 8, 2026

> Thank you @Suryansh-Dey
>
> @Omega359 I think you may have removed some of these caches last time -- do you remember anything else we should look at as part of this change?

CI would run out of disk space, IIRC.

@Omega359
Contributor

Omega359 commented Feb 8, 2026

I went through some of the previous PRs related to this. It's a mixed bag: sccache was in for a bit but was yanked, I think because of a security concern. Rust cache was in too, but at the time I think we had issues with disk space, and overall I don't think it helped much, since the cache hit rate wasn't great because GitHub runners have so little disk space.

@Suryansh-Dey
Author

Closing for now until I find something more effective.


Labels

development-process Related to development process of DataFusion

Projects

None yet


3 participants