[SPARK-46764][DOCS] Reorganize script to build API docs #44791

nchammas · 2024-01-19T02:05:55Z

What changes were proposed in this pull request?

We are abusing Jekyll's plugin system to set flags for what API docs to build. This change maintains this overall status quo but makes the following improvements:

- Rename the plugin file to be in line with its actual purpose.
- Organize the code into functions so it's easier to follow and understand.
- Print section headers that are easier to find in the output when building the docs.

The behavior of the documentation build otherwise remains unchanged.
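To illustrate the "section headers" idea, here is a minimal sketch of what a `print_header` helper could look like. The name `print_header` appears in the diff, but this body, the `format_header` helper, and the banner width are assumptions for illustration, not the code in this PR.

```ruby
# Hypothetical sketch: the real helper in build_api_docs.rb may differ.
# Build a banner that stands out in long build logs.
def format_header(text, width: 80)
  banner = "=" * width
  "\n#{banner}\n#{text}\n#{banner}\n"
end

# Print the banner so each build phase is easy to find in the output.
def print_header(text)
  puts format_header(text)
end

print_header "Building Scala and Java API docs."
```

Keeping the formatting in a separate pure function makes it possible to check the banner without capturing standard output.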

Why are the changes needed?

This should make maintaining this part of the doc building workflow easier.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

In several ways:

- I built the docs with various skip flags set and confirmed the build succeeds. I also manually browsed through some of the Scala, Java, Python, and SQL API docs.
  - The only docs I didn't test building were the R docs, because I do not have R installed locally.
  - Here is the build output on my machine for SKIP_RDOC=1: build-docs.log.zip
  - There are some errors output when building the Java unidoc, but these are present already in master.
- I built the docs against master and diffed the resulting _site/ directory against the one output by this branch.
  - The _site/ directories are identical except for minor differences in some general SQL function example files.
  - Here is the diff: site.diff.zip
- I also confirmed that ./dev/change-scala-version.sh 2.13 runs successfully (though I didn't try to run the modified scripts after that).
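The `_site/` comparison described above can be approximated with a short script. This is only a sketch of one way to do it, not the method used to produce site.diff.zip; the function names `tree_digests` and `compare_trees` are made up for this example.

```ruby
require "digest"
require "find"
require "pathname"

# Map each regular file under root to a SHA-256 digest of its contents,
# keyed by its path relative to root.
def tree_digests(root)
  base = Pathname.new(root)
  digests = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    relative = Pathname.new(path).relative_path_from(base).to_s
    digests[relative] = Digest::SHA256.file(path).hexdigest
  end
  digests
end

# Report files present on only one side and files whose contents differ.
def compare_trees(left_root, right_root)
  left = tree_digests(left_root)
  right = tree_digests(right_root)
  {
    only_left: (left.keys - right.keys).sort,
    only_right: (right.keys - left.keys).sort,
    changed: (left.keys & right.keys).reject { |k| left[k] == right[k] }.sort,
  }
end

# Usage (directory names are placeholders): ruby compare_sites.rb site-master site-branch
if ARGV.length == 2
  compare_trees(ARGV[0], ARGV[1]).each do |kind, files|
    puts "#{kind}: #{files.length}"
    files.each { |f| puts "  #{f}" }
  end
end
```

A digest comparison only says which files differ; a textual diff is still needed to see that the differences are limited to the SQL function example files.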

Was this patch authored or co-authored using generative AI tooling?

No.

nchammas · 2024-01-19T02:08:24Z

In the future, I think it would be appropriate to convert this script into Python, as Ruby is not a focus of our community. (I don't use Ruby much myself, either.) But organizing the code into clear functions is something we'd want to do regardless of the language.

nchammas · 2024-01-19T15:48:14Z

docs/_plugins/build_api_docs.rb

+  print_header "Building Scala and Java API docs."
+  cd(SPARK_PROJECT_ROOT)
+
+  command = "build/sbt -Pkinesis-asl clean compile unidoc"


When we build the Scala docs together with either the Python or SQL docs, we end up building Spark twice. In a follow-up improvement, I think it would make sense to build Spark just once if any of the Scala, Python, or SQL docs are requested:

./build/sbt -Pkinesis-asl -Phive clean package

Then, if Scala docs are specifically also requested, we just build the unidoc:

./build/sbt -Pkinesis-asl -Phive unidoc

This should save us around 2 minutes on the complete documentation build.

For now, I'm leaving it like this since that's the current behavior.

I'm also not sure we need the kinesis-asl profile here, as it seems the Kinesis docs are built with or without it. But I'd leave that as-is for now since it's not that much noise and I'm not 100% sure it's useless.

It was added in #2239.
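The build-once idea from this comment can be sketched as a small planning function. This is a hypothetical illustration, not code from this PR or its follow-up: `plan_sbt_commands` is an invented name, and the `SKIP_SCALADOC` and `SKIP_SQLDOC` variables are assumed by analogy with the `SKIP_RDOC` and `SKIP_PYTHONDOC` flags mentioned in this thread.

```ruby
# Hypothetical planner: given which API docs are requested, return the sbt
# commands to run so that Spark is built at most once.
def plan_sbt_commands(scala:, python:, sql:)
  commands = []
  # One shared build if any of the docs that need a built Spark are requested.
  commands << "./build/sbt -Pkinesis-asl -Phive clean package" if scala || python || sql
  # The unidoc step is only needed for the Scala and Java API docs.
  commands << "./build/sbt -Pkinesis-asl -Phive unidoc" if scala
  commands
end

# Assumed flag names; a doc set is requested unless its skip flag is "1".
requested = {
  scala: ENV["SKIP_SCALADOC"] != "1",
  python: ENV["SKIP_PYTHONDOC"] != "1",
  sql: ENV["SKIP_SQLDOC"] != "1",
}

# Only print the plan here; nothing is executed.
plan_sbt_commands(**requested).each { |command| puts command }
```

The point of separating planning from execution is that the "build once" rule becomes a two-line condition that is easy to read and test.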

HyukjinKwon · 2024-01-22T00:52:50Z

docs/_plugins/build_api_docs.rb

@@ -0,0 +1,205 @@
+#


We should probably change:

dev/change-scala-version.sh:echo "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
dev/change-scala-version.sh:  sed_i '/\-Pscala-'$TO_VERSION'/!s:build/sbt:build/sbt \-Pscala\-'$TO_VERSION':' "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
dev/change-scala-version.sh:  sed_i 's:build/sbt \-Pscala\-'$FROM_VERSION':build/sbt:' "$BASEDIR/docs/_plugins/copy_api_dirs.rb"
dev/change-scala-version.sh:sed_i 's/scala\-'$FROM_VERSION'/scala\-'$TO_VERSION'/' "$BASEDIR/docs/_plugins/copy_api_dirs.rb"

together ..

Oh, good catch. Will update.

Fixed and updated the PR description.
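For readers unfamiliar with the sed expressions quoted in this thread, here is a rough Ruby equivalent of the first one, which shows why the script needs the plugin's correct file name: it rewrites the `build/sbt` invocation inside that file. The function name `add_scala_profile` is invented for this sketch, and it matches the profile as a literal string, whereas sed treats the `.` in the version as a regex wildcard.

```ruby
# Illustrative equivalent of:
#   sed '/-Pscala-<to>/!s:build/sbt:build/sbt -Pscala-<to>:'
# On lines that do not already mention the target profile, rewrite the first
# "build/sbt" to "build/sbt -Pscala-<to_version>".
def add_scala_profile(text, to_version)
  profile = "-Pscala-#{to_version}"
  text.lines.map do |line|
    line.include?(profile) ? line : line.sub("build/sbt", "build/sbt #{profile}")
  end.join
end

before = %(command = "build/sbt -Pkinesis-asl clean compile unidoc"\n)
puts add_scala_profile(before, "2.13")
```

Because lines that already contain the profile are skipped, applying the rewrite twice leaves the file unchanged.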

HyukjinKwon · 2024-01-22T00:53:32Z

docs/_plugins/build_api_docs.rb

@@ -0,0 +1,205 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more


So is the change basically refactoring?

Yes. The goal is to preserve current behavior while making the code easier to maintain.

nchammas · 2024-01-23T16:39:15Z

I diffed the built _site/ directory across master and this branch. They are identical except for some minor differences in some generated files. I've updated the PR description accordingly with the details.

@HyukjinKwon - I think this PR is good to go. Let me know if there is anything more you'd like me to do here.

HyukjinKwon · 2024-01-23T23:58:40Z

Merged to master.

### What changes were proposed in this pull request?

As [suggested here][1], this change improves the documentation build so that it builds Spark at most one time, regardless of what API docs are requested in the build.

[1]: #44791 (comment)

### Why are the changes needed?

There is no need to build Spark multiple times when generating docs. In particular, building Scala and Python docs, or Scala and SQL docs, causes Spark to be built twice. Fixing this problem saves us a couple of minutes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I built the docs as follows on `master` as well as on this branch:

```sh
time SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundle exec jekyll build
```

The time results before and after this change are as follows:

```
before
------
real    6m48.815s
user    23m17.943s
sys     1m29.578s

after
-----
real    4m10.672s
user    14m10.130s
sys     1m0.773s
```

That's a savings of about 2.5 minutes.

Additionally, I diffed the generated `_site/` dir across `master` and this branch and confirmed they are essentially identical except for some general SQL examples files.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44865 from nchammas/SPARK-46825-jekyll-build-spark-once.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
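As a quick check of the quoted savings, the wall-clock ("real") timings above can be converted to seconds and subtracted. This sketch only redoes the arithmetic on the two numbers reported in the commit message; `parse_time` is a helper written for this example.

```ruby
# Convert a `time`-style duration such as "6m48.815s" into seconds.
def parse_time(text)
  match = text.match(/\A(\d+)m(\d+(?:\.\d+)?)s\z/)
  raise ArgumentError, "unrecognized duration: #{text}" unless match
  match[1].to_i * 60 + match[2].to_f
end

before = parse_time("6m48.815s")  # real time on master
after = parse_time("4m10.672s")   # real time on the follow-up branch
saved = before - after

puts format("saved %.1f s (%.1f min), %.0f%% less wall-clock time",
            saved, saved / 60, 100 * saved / before)
# prints "saved 158.1 s (2.6 min), 39% less wall-clock time"
```

The difference is roughly 158 seconds, which is consistent with the "about 2.5 minutes" figure stated in the commit message.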

reorganize script to build api docs

6370d1c

github-actions bot added the DOCS label Jan 19, 2024

nchammas commented Jan 19, 2024

HyukjinKwon reviewed Jan 22, 2024

nchammas added 2 commits January 21, 2024 20:13

Merge branch 'master' into create-api-docs

3bf2643

update references to Ruby plugin file

04122fa

github-actions bot added the BUILD label Jan 22, 2024

Merge branch 'master' into create-api-docs

3e5130d

HyukjinKwon approved these changes Jan 23, 2024

HyukjinKwon closed this in d1fbc4c Jan 23, 2024

nchammas deleted the create-api-docs branch January 24, 2024 00:13

nchammas mentioned this pull request Jan 24, 2024

[SPARK-46825][DOCS] Build Spark only once when building docs #44865

Closed
