[SPARK-46668][DOCS] Parallelize Sphinx build of Python API docs #44680

Closed

@nchammas nchammas commented Jan 11, 2024

What changes were proposed in this pull request?

Upgrade to Sphinx 4.5.0, which is the latest in the 4.x line and includes the fix for parallel builds on macOS.

Enable parallel Sphinx workers to build the Python API docs.

I experimented with a few different values for the worker count, and `auto` seems to work best. Configuring 4 workers yields about the same improvement as `auto`, which suggests that parallelization beyond that point is ineffective, possibly due to some form of resource contention. I left it as `auto` since that adapts to the machine and should work better across varied environments.
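
As a minimal sketch of what enabling parallel workers looks like, the snippet below assembles a `sphinx-build` command line with the `-j` option, which accepts either an integer worker count or `auto` (number of CPUs). The helper name `sphinx_build_command` and the paths `source` and `build/html` are hypothetical. This is not the change made in this PR, only an illustration of the flag.

```python
import os


def sphinx_build_command(source_dir, output_dir, jobs="auto"):
    """Return an argv list for an HTML build with parallel workers.

    `jobs` is passed straight to sphinx-build's -j option: an integer
    worker count, or "auto" to let Sphinx pick based on the CPU count.
    """
    if jobs != "auto" and int(jobs) < 1:
        raise ValueError("jobs must be 'auto' or a positive integer")
    return ["sphinx-build", "-b", "html", "-j", str(jobs), source_dir, output_dir]


# Hypothetical paths, for illustration only.
auto_cmd = sphinx_build_command("source", "build/html")
four_cmd = sphinx_build_command("source", "build/html", jobs=4)

print(" ".join(auto_cmd))
print(" ".join(four_cmd))
print("CPUs reported by os.cpu_count():", os.cpu_count())
```

Comparing a fixed value such as 4 against `auto` on the same machine is one way to repeat the experiment described above.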

On my 16-core Intel workstation, the runtime of `make html` was cut by ~60%.

# `make html` @ master
real    43m51.167s
user    41m43.526s
sys     0m39.651s

# `make html` with parallel workers
real    17m8.424s
user    174m42.051s
sys     5m8.824s
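
The ~60% figure can be checked directly from the two `time` outputs above. The sketch below parses the stamps and derives the wall-clock reduction, the speedup, and the average number of cores kept busy. The helper `to_seconds` and the variable names are illustrative, not part of the PR.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '43m51.167s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the two runs quoted above.
master = {"real": "43m51.167s", "user": "41m43.526s", "sys": "0m39.651s"}
parallel = {"real": "17m8.424s", "user": "174m42.051s", "sys": "5m8.824s"}

m = {key: to_seconds(value) for key, value in master.items()}
p = {key: to_seconds(value) for key, value in parallel.items()}

reduction = 1 - p["real"] / m["real"]            # fraction of wall time saved
speedup = m["real"] / p["real"]                  # wall-clock speedup factor
busy_cores = (p["user"] + p["sys"]) / p["real"]  # average cores kept busy
cpu_ratio = (p["user"] + p["sys"]) / (m["user"] + m["sys"])  # total CPU time growth

print(f"wall time reduction: {reduction:.1%}")
print(f"speedup: {speedup:.2f}x")
print(f"average busy cores (parallel run): {busy_cores:.1f}")
print(f"total CPU time vs. master: {cpu_ratio:.1f}x")
```

Wall-clock time drops by about 61% (roughly a 2.6x speedup), while total CPU time grows a bit over fourfold. That gap may be consistent with the observation that extra workers stop helping past a certain point.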

Here on CI, the "Run documentation build" step also seems to have improved, from the usual ~50 minutes down to ~31 minutes.

Why are the changes needed?

This saves developer time and CI time.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

I manually built and reviewed the docs using:

SKIP_SCALADOC=1 SKIP_SQLDOC=1 SKIP_RDOC=1 time bundle exec jekyll build
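
For scripting the same check, here is a hedged sketch that applies the three skip flags and times the build from Python. The names `SKIP_FLAGS`, `build_environment`, and `timed_docs_build` are hypothetical, the `cwd="docs"` default is an assumption about where the command is run, and Bundler and Jekyll must already be installed. The functions are only defined here, so nothing is built until `timed_docs_build()` is called.

```python
import os
import subprocess
import time

# Environment overrides and command taken from the line above.
SKIP_FLAGS = {"SKIP_SCALADOC": "1", "SKIP_SQLDOC": "1", "SKIP_RDOC": "1"}
BUILD_COMMAND = ["bundle", "exec", "jekyll", "build"]


def build_environment(base=None):
    """Return a copy of the environment with the skip flags applied."""
    env = dict(os.environ if base is None else base)
    env.update(SKIP_FLAGS)
    return env


def timed_docs_build(cwd="docs"):
    """Run the docs build and return (exit code, wall-clock seconds).

    `cwd` is an assumption: the directory the jekyll command is run from.
    """
    start = time.perf_counter()
    result = subprocess.run(BUILD_COMMAND, env=build_environment(), cwd=cwd)
    return result.returncode, time.perf_counter() - start
```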

Was this patch authored or co-authored using generative AI tooling?

No.

@nchammas

cc @itholic and @HyukjinKwon. This PR builds on #44012.

@itholic itholic left a comment

LGTM once CI passes.

Nice fix.

@HyukjinKwon

Merged to master.

@nchammas nchammas deleted the SPARK-46668-parallel-sphinx branch January 11, 2024 17:05