[SPARK-46825][DOCS] Build Spark only once when building docs
### What changes were proposed in this pull request?

As [suggested here][1], this change improves the documentation build so that it builds Spark at most once, regardless of which API docs are requested in the build.

[1]: #44791 (comment)

### Why are the changes needed?

There is no need to build Spark multiple times when generating docs. In particular, building Scala and Python docs, or Scala and SQL docs, causes Spark to be built twice.

Fixing this problem saves us a couple of minutes.
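
The build-once behavior described above can be illustrated with a minimal sketch. It is not the plugin code itself: the names `build_if_necessary`, `$package_is_built`, and `$build_count` are hypothetical, and a counter stands in for the expensive sbt invocation.

```ruby
# Hypothetical stand-ins: a counter replaces the real (slow) package build.
$build_count = 0
$package_is_built = false

# Run the expensive build only if no earlier step has already done so.
def build_if_necessary
  return if $package_is_built

  $build_count += 1 # the real build command would run here
  $package_is_built = true
end

# Each docs step declares its dependency on the build by calling the guard.
def build_scala_docs
  build_if_necessary
  :scala_docs
end

def build_sql_docs
  build_if_necessary
  :sql_docs
end

build_scala_docs
build_sql_docs
puts "builds run: #{$build_count}" # prints "builds run: 1"
```

Because every docs step goes through the same guard, the order in which the steps run does not matter; whichever runs first pays for the build.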

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I built the docs as follows on `master` as well as on this branch:

```sh
time SKIP_RDOC=1 SKIP_PYTHONDOC=1 bundle exec jekyll build
```

The time results before and after this change are as follows:

```
before
------
real    6m48.815s
user    23m17.943s
sys     1m29.578s

after
-----
real    4m10.672s
user    14m10.130s
sys     1m0.773s
```

That's a savings of about 2.6 minutes of wall-clock time, or roughly 39%.
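
For reference, the wall-clock savings can be recomputed from the `real` timings above. This is only a small arithmetic sketch; the helper `to_seconds` is a hypothetical name and is not part of the docs build.

```ruby
# Convert a `time`-style duration such as "6m48.815s" to seconds.
def to_seconds(duration)
  match = duration.match(/\A(\d+)m([\d.]+)s\z/)
  raise ArgumentError, "unrecognized duration: #{duration}" unless match

  match[1].to_i * 60 + match[2].to_f
end

before = to_seconds("6m48.815s") # `real` time on master
after = to_seconds("4m10.672s")  # `real` time on this branch
saved = before - after

# prints "saved 158.1 s (39% of the original wall-clock time)"
puts format("saved %.1f s (%.0f%% of the original wall-clock time)", saved, 100 * saved / before)
```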

Additionally, I diffed the generated `_site/` dir across `master` and this branch and confirmed they are essentially identical except for some general SQL examples files.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44865 from nchammas/SPARK-46825-jekyll-build-spark-once.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
nchammas authored and HyukjinKwon committed Jan 24, 2024
1 parent ee6ed43 commit 7004dd9
Showing 1 changed file with 19 additions and 17 deletions.
36 changes: 19 additions & 17 deletions docs/_plugins/build_api_docs.rb
@@ -33,12 +33,27 @@ def print_header(text)
   puts banner_bar
 end

+def build_spark_if_necessary
+  if $spark_package_is_built
+    return
+  end
+
+  print_header "Building Spark."
+  cd(SPARK_PROJECT_ROOT)
+  command = "build/sbt -Phive -Pkinesis-asl clean package"
+  puts "Running '#{command}'; this may take a few minutes..."
+  system(command) || raise("Failed to build Spark")
+  $spark_package_is_built = true
+end
+
 def build_scala_and_java_docs
+  build_spark_if_necessary
+
   print_header "Building Scala and Java API docs."
   cd(SPARK_PROJECT_ROOT)

-  command = "build/sbt -Pkinesis-asl clean compile unidoc"
-  puts "Running '#{command}'; this may take a few minutes..."
+  command = "build/sbt -Pkinesis-asl unidoc"
+  puts "Running '#{command}'..."
   system(command) || raise("Unidoc generation failed")

   puts "Moving back into docs dir."
@@ -124,19 +139,8 @@ def build_scala_and_java_docs
   File.open(css_file, 'a') { |f| f.write("\n" + css.join()) }
 end

-def build_spark_package
-  print_header "Building Spark package."
-  cd(SPARK_PROJECT_ROOT)
-  command = "build/sbt clean package -Phive"
-  puts "Running '#{command}'; this may take a few minutes..."
-  system(command) || raise("Failed to build Spark")
-  $spark_package_is_built = true
-end
-
 def build_python_docs
-  if !$spark_package_is_built
-    build_spark_package
-  end
+  build_spark_if_necessary

   print_header "Building Python API docs."
   cd("#{SPARK_PROJECT_ROOT}/python/docs")
@@ -168,9 +172,7 @@ def build_r_docs
 end

 def build_sql_docs
-  if !$spark_package_is_built
-    build_spark_package
-  end
+  build_spark_if_necessary

   print_header "Building SQL API docs."
   cd("#{SPARK_PROJECT_ROOT}/sql")
