[SPARK-46764][DOCS] Reorganize script to build API docs #44791
Conversation
In the future, I think it would be appropriate to convert this script into Python, as Ruby is not a focus of our community. (I don't use Ruby much myself, either.) But organizing the code into clear functions is something we'd want to do regardless of the language.
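To illustrate the idea, here is a rough sketch of what a function-organized Python port of this step might look like. This is hypothetical: `print_header` and `SPARK_PROJECT_ROOT` mirror names from the Ruby plugin, but the structure and fallback to `SPARK_HOME` are assumptions, not the actual script.

```python
import os
import subprocess

# Assumed project root; the real script derives this from its own location.
SPARK_PROJECT_ROOT = os.environ.get("SPARK_HOME", os.getcwd())


def print_header(text: str) -> None:
    """Print a visually distinct section header, like the Ruby helper."""
    print("=" * len(text))
    print(text)
    print("=" * len(text))


def build_scala_and_java_docs() -> None:
    """Run sbt unidoc from the project root to generate Scala/Java API docs."""
    print_header("Building Scala and Java API docs.")
    subprocess.run(
        ["build/sbt", "-Pkinesis-asl", "clean", "compile", "unidoc"],
        cwd=SPARK_PROJECT_ROOT,
        check=True,
    )
```

Each doc type (Scala/Java, Python, R, SQL) would get its own function like this, with a small driver at the bottom deciding which ones to invoke based on the `SKIP_*` environment variables.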
print_header "Building Scala and Java API docs."
cd(SPARK_PROJECT_ROOT)

command = "build/sbt -Pkinesis-asl clean compile unidoc"
When we build both the Scala docs as well as either the Python or SQL docs, we end up building Spark twice. In a follow-up improvement, I think it would make sense to build Spark just once if any of the Scala, Python, or SQL docs are requested:
./build/sbt -Pkinesis-asl -Phive clean package
Then, if Scala docs are specifically also requested, we just build the unidoc:
./build/sbt -Pkinesis-asl -Phive unidoc
This should save us around 2 minutes on the complete documentation build.
For now, I'm leaving it like this since that's the current behavior.
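The follow-up described above could be sketched as a small helper that computes the required sbt invocations, packaging Spark at most once. The helper name and flag arguments are hypothetical, for illustration only:

```python
def build_spark_and_docs(build_scala: bool, build_python: bool,
                         build_sql: bool) -> list:
    """Return the sbt commands needed for the requested docs,
    packaging Spark at most once."""
    commands = []
    if build_scala or build_python or build_sql:
        # One shared package step covers all downstream doc builds.
        commands.append(
            ["build/sbt", "-Pkinesis-asl", "-Phive", "clean", "package"])
    if build_scala:
        # unidoc reuses the classes compiled by the package step above.
        commands.append(["build/sbt", "-Pkinesis-asl", "-Phive", "unidoc"])
    return commands
```

With this shape, requesting Python or SQL docs alone triggers a single `package`, and requesting Scala docs adds only the `unidoc` step on top of it.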
I'm also not sure we need the kinesis-asl profile here, as it seems the Kinesis docs are built with or without it. But I'd leave that as-is for now since it's not that much noise and I'm not 100% sure it's useless.
It was added in #2239.
We should probably change:
dev/change-scala-version.sh:echo "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
dev/change-scala-version.sh: sed_i '/\-Pscala-'$TO_VERSION'/!s:build/sbt:build/sbt \-Pscala\-'$TO_VERSION':' "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
dev/change-scala-version.sh: sed_i 's:build/sbt \-Pscala\-'$FROM_VERSION':build/sbt:' "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
dev/change-scala-version.sh:sed_i 's/scala\-'$FROM_VERSION'/scala\-'$TO_VERSION'/' "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
together.
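For clarity, the key `sed_i` rule above adds `-Pscala-$TO_VERSION` to `build/sbt` invocations that don't already carry it. A simplified Python illustration of that substitution (not the actual shell logic, and ignoring the in-place file editing):

```python
def add_scala_profile(line: str, to_version: str) -> str:
    """Mimic the sed_i rule: append -Pscala-<version> to the first
    build/sbt occurrence on lines that don't already have it."""
    if "-Pscala-" + to_version in line:
        # The sed address /-Pscala-$TO_VERSION/! skips such lines.
        return line
    return line.replace("build/sbt",
                        "build/sbt -Pscala-" + to_version, 1)
```

The companion rule reverses this by replacing `build/sbt -Pscala-$FROM_VERSION` with plain `build/sbt`, so the pair must stay consistent with how `copy_api_dirs.rb` spells its sbt commands.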
Oh, good catch. Will update.
Fixed and updated the PR description.
So is the change basically refactoring?
Yes. The goal is to preserve current behavior while making the code easier to maintain.
@HyukjinKwon - I think this PR is good to go. Let me know if there is anything more you'd like me to do here.
Merged to master.
### What changes were proposed in this pull request?

As [suggested here][1], this change improves the documentation build so that it builds Spark at most one time, regardless of what API docs are requested in the build.

[1]: #44791 (comment)

### Why are the changes needed?

There is no need to build Spark multiple times when generating docs. In particular, building Scala and Python docs, or Scala and SQL docs, causes Spark to be built twice. Fixing this problem saves us a couple of minutes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I built the docs as follows on `master` as well as on this branch:

```sh
time SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundle exec jekyll build
```

The time results before and after this change are as follows:

```
before
------
real    6m48.815s
user    23m17.943s
sys     1m29.578s

after
-----
real    4m10.672s
user    14m10.130s
sys     1m0.773s
```

That's a savings of about 2.5 minutes. Additionally, I diffed the generated `_site/` dir across `master` and this branch and confirmed they are essentially identical except for some general SQL examples files.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44865 from nchammas/SPARK-46825-jekyll-build-spark-once.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
We are abusing Jekyll's plugin system to set flags for what API docs to build. This change maintains this overall status quo but makes several organizational improvements to the script.
The behavior of the documentation build otherwise remains unchanged.
Why are the changes needed?
This should make maintaining this part of the doc building workflow easier.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
In several ways:

- I built the docs with `SKIP_RDOC=1`: build-docs.log.zip
- I built the docs on `master` and diffed the resulting `_site/` directory against the one output by this branch. The `_site/` directories are identical except for minor differences in some general SQL function example files.
- `./dev/change-scala-version.sh 2.13` runs successfully (though I didn't try to run the modified scripts after that).

Was this patch authored or co-authored using generative AI tooling?
No.