Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-13793][docs] build each language in a separate subprocess #9489

Merged
merged 3 commits into from Feb 13, 2020

Conversation

NicoK
Copy link
Contributor

@NicoK NicoK commented Aug 20, 2019

What is the purpose of the change

When not serving docs and just building them, e.g. via ./build_docs.sh, we can spawn sub-processes for each language build instead of running them single-threaded (jekyll currently does not support parallel builds).

before: ~37s
after: ~20s

Brief change log

Building on top of #9487, this PR adds:

  • change build_docs.sh to support parallel builds and merge resulting files together
    (admittedly this seems a bit hacky)
  • enhance build_docs.sh: show usage if wrong option is given
  • exclude build scripts from generated contents

Verifying this change

I verified the changes in the generated HTML pages (nothing changed except for the build scripts which I removed from the generated pages via a hotfix).

@flinkbot
Copy link
Collaborator

flinkbot commented Aug 20, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit a687283 (Wed Dec 04 15:08:41 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Aug 20, 2019

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@NicoK
Copy link
Contributor Author

NicoK commented Nov 14, 2019

I just rebased this PR now that we haveJekyll 4 changes in from #9487.

@NicoK NicoK requested a review from zentol November 28, 2019 10:36
@zentol zentol self-assigned this Dec 12, 2019
@@ -72,6 +72,8 @@ while getopts "piez" opt; do
z)
JEKYLL_CONFIG="--config _config.yml,_config_dev_zh.yml"
;;
*) echo "usage: $0 [-e|-z] [-i|-p]" >&2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is different

@@ -17,3 +17,6 @@

exclude:
- "*.zh.md"
- "content"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these required for correctness?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not excluded, they may end up in the generated folder (as yet another subdirectory) again, e.g. when the zh version builds, if would not see the content_en as a target dir (and exclude it)

@@ -78,4 +80,28 @@ while getopts "piez" opt; do
done

# use 'bundle exec' to insert the local Ruby dependencies
bundle exec jekyll ${JEKYLL_CMD} ${JEKYLL_CONFIG} --source "${DOCS_SRC}" --destination "${DOCS_DST}"

if [ "${JEKYLL_CMD}" = "build" ] && [ -z "${JEKYLL_CONFIG}" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we modified the e/z cases above to instead set DOC_LANGUAGES to en/zh we wouldn't need this condition nor else branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a branch for the DOC_LANGUAGES (to get all languages if none of the e/z options was used) but simpler anyway; I'll give it a try

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, the branch is still always needed since there is no parallel serve...so not sure whether this change makes sense

@zentol
Copy link
Contributor

zentol commented Feb 6, 2020

@NicoK ping

Copy link
Contributor

@zentol zentol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if we started jekyll with --serve for the en documentation, and fire up a background process that builds and copies the chinese docs into DOCS_DST?


# run processes and store pids
echo "Spawning parallel builds for ${DOC_LANGUAGES}..."
pids=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my god is this actually valid in bash? It looks soo much like an incomplete statement.

Copy link
Contributor Author

@NicoK NicoK Feb 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is ;)
but I can also replace it with pids=""

@NicoK
Copy link
Contributor Author

NicoK commented Feb 7, 2020

Serving en and building zh in the background would be a nice trick but the result depends on the timing of when the zh build finishes. If it is first, then the en build will remove the contents directory and replace it with its own (thus removing the zh files) :(

Verified by running this:

set -x
(set -x; bundle exec jekyll build --baseurl= --config _config.yml,_config_dev_zh.yml --source "`pwd`" --destination "`pwd`/content_zh" && mkdir -p "`pwd`/content" && cp -aln "`pwd`/content_zh/." "`pwd`/content" && rm -rf "`pwd`/content_zh") &
bundle exec jekyll serve --baseurl= --watch --incremental --config _config.yml,_config_dev_en.yml --source "`pwd`" --destination "`pwd`/content"

@NicoK
Copy link
Contributor Author

NicoK commented Feb 7, 2020

Ideally, of course, these tricks would not be necessary and jekyll would just use more system resources, i.e. do a parallel build on its own. That would give an even higher speed-up but is, unfortunately, not available, afaik

When not serving docs and just building them, e.g. via `./build_docs.sh`, we
spawn sub-processes for each language build instead of running them
single-threaded (jekyll currently does not support parallel builds).
@NicoK
Copy link
Contributor Author

NicoK commented Feb 7, 2020

FYI: I rebased onto latest master again and verified that the result from building the old way vs. this PR is the same

@NicoK NicoK merged commit fcb443a into apache:master Feb 13, 2020
@NicoK NicoK deleted the f13793 branch February 13, 2020 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants