Skip to content

Conversation

@MrBago
Copy link
Contributor

@MrBago MrBago commented Nov 30, 2018

This PR adds doc building to the release script. The script will checking the release tag and build the docs in docker. It will then push the docs branch, eg "gh-pages-1.4.0" upstream & replace the "gh-pages" branch.

I opted for replacing the gh-pages branch instead of creating a PR because this type of PR will inherently have conflicts.

# build the API docs
docker run --rm -v "$(pwd):/mnt/sparkdl" databricks/sparkdl-docs dev/build-docs-in-docker.sh
docker run --rm -v "$(pwd):/mnt/sparkdl" databricks/sparkdl-docs \
bash -i -c dev/build-docs-in-docker.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we make a separate PR for this fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can if you think we should, it's tiny so I was just folding it in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is tiny but it is a separate fix and it doesn't need to wait for review of the release script to get in. Let's keep it with this PR but create separate PRs in the future to keep each minimal.

Copy link
Contributor

@mengxr mengxr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We need to modify "docs/_config.yml" to update release version there or change "docs/_plugins/copy_api_dirs.rb" to get the version automatically.
  • It would be nice to allow doc build/release independently. One package release must be paired with doc release. However, we might need to do small doc fixes after release.

Both could be done in follow-up PRs.

dev/release.py Outdated
WORKING_BRANCH = "PREP_RELEASE_%s"
DATABRICKS_REMOTE = "git@github.com:databricks/spark-deep-learning.git"
PUBLISH_MODES = {"local": "publishLocal", "m2": "publishM2", "spark-package": "spDist"}
PUBLISH_MODES = {"local": "publishLocal", "m2": "publishM2", "spark-package-local": "spDist",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"spark-package-local", "spark-package-publish" not used

dev/release.py Outdated
DATABRICKS_REMOTE = "git@github.com:databricks/spark-deep-learning.git"
PUBLISH_MODES = {"local": "publishLocal", "m2": "publishM2", "spark-package": "spDist"}
PUBLISH_MODES = {"local": "publishLocal", "m2": "publishM2", "spark-package-local": "spDist",
"spark-package-publish": "sbPublish"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spPublish? If the feature is not implemented, please remove it and keep this PR minimal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't quite follow, I think we need spPublish publishing to spark-packages, I can drop spDist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was "sbPublish" instead of "spPublish". My ask is mainly why it shows up in a PR that is titled "doc build"?

dev/release.py Outdated
if working_branch in check_output(["git", "branch"]):
prominentPrint(
"Working branch %s already exists, please delete it and try again." % working_branch)
gh_pages_branch = GH_PAGES_BRANCH % release_version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • rename it to "gh_pages_release_branch" or "gh_pages_working_branch".
  • This is just a working branch. I would append a timestamp at the end to avoid conflict if I need to rerun the script. (Same for working_branch.

dev/release.py Outdated
"Working branch %s already exists, please delete it and try again." % working_branch)
gh_pages_branch = GH_PAGES_BRANCH % release_version
local_branches = check_output(["git", "branch"])
target_branches = [gh_pages_branch, working_branch, "gh-pages"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is common to have "gh-pages" as a local branch. The script doesn't depend on this local branch.

dev/release.py Outdated
gh_pages_branch = GH_PAGES_BRANCH % release_version
local_branches = check_output(["git", "branch"])
target_branches = [gh_pages_branch, working_branch, "gh-pages"]
conflict_branches = list(filter(lambda a: a in local_branches, target_branches))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is flaky. I have a local branch called "gh-pages-bak".

dev/release.py Outdated
prominentPrint(
"Working branch %s already exists, please delete it and try again." % working_branch)
gh_pages_branch = GH_PAGES_BRANCH % release_version
local_branches = check_output(["git", "branch"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all "check_output`, we need ".decode()" in py3. If the script only works for py2, put an assertion at the top.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the script, it works with python 2 & 3 now.

msg = "Would you like to push docs branch to remote, %s, and update gh-pages branch? (y/n)"
msg %= git_remote
if verify(msg, interactive):
check_call(["git", "push", git_remote, gh_pages_branch])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to push the working branch? The next command force update the doc branch already. Is it for a quick revert back to this version in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya this is for quick revert in the future, but actually I think it should be a tag not a branch. I've made that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with using a tag is that it shows up in the "releases" section of the github ui. I'll work on this tomorrow.

WORKING_BRANCH = "PREP_RELEASE_%s"
DATABRICKS_REMOTE = "git@github.com:databricks/spark-deep-learning.git"
PUBLISH_MODES = {"local": "publishLocal", "m2": "publishM2", "spark-package": "spDist"}
PUBLISH_MODES = {"local": "publishLocal", "m2": "publishM2", "spark-package-publish": "spPublish"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't know why the changes are relevant to "build and push docs". If they are necessary to include, it is worth an inline GitHub comment here. If they are irrelevant, we should create a separate PR.

Keeping PRs minimal is to help both PR review and merge. For example, I spent ~10 mins yesterday trying to understand how it is relevant to doc build but got no clue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't related to docs, I should have commented sooner.

I realized while working on docs that we'll need this publish method available to use this for an actual release. I can split this into another PR if you think we should.

dev/release.py Outdated
if working_branch in check_output(["git", "branch"]):
prominentPrint(
"Working branch %s already exists, please delete it and try again." % working_branch)
working_branch = WORKING_BRANCH % (release_version, time.hour, time.minute, time.second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is missing date. It might be easier convert the timestamp into a single string first and include it in the template here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use a custom format for the datetime because the default has characters that cannot be used in a git branch name, I'll update the custom format to include the date.

dev/release.py Outdated
prominentPrint(
"Working branch %s already exists, please delete it and try again." % working_branch)
working_branch = WORKING_BRANCH % (release_version, time.hour, time.minute, time.second)
gh_pages_branch = WORKING_DOCS_BRANCH % (release_version, time.hour, time.minute, time.second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@MrBago
Copy link
Contributor Author

MrBago commented Dec 1, 2018

@mengxr I think this is ready for another review, I'm also working on a PR to update the _config.yml file as part of the release. I tried changing docs/_plugins/copy_api_dirs.rb to use version.sbt instead of the config file, but it seems that jekyll reads the config file too, so that only updated some of the version references but not all.

@mengxr
Copy link
Contributor

mengxr commented Dec 1, 2018

The doc release script looks good. I'm merging this PR in. However, there might be an issue with doc build. I cannot find the public APIs in the generated Python API html page. Could you see if this happens on your laptop?

@mengxr mengxr merged commit c176516 into databricks:master Dec 1, 2018
@mengxr
Copy link
Contributor

mengxr commented Dec 1, 2018

This is the error I hit:

/mnt/sparkdl/python/docs/index.rst:15: WARNING: toctree contains reference to nonexisting document u' sparkdl.graph'
/mnt/sparkdl/python/docs/index.rst:15: WARNING: toctree contains reference to nonexisting document u' sparkdl.image'
/mnt/sparkdl/python/docs/index.rst:15: WARNING: toctree contains reference to nonexisting document u' sparkdl.transformers'
/mnt/sparkdl/python/docs/index.rst:15: WARNING: toctree contains reference to nonexisting document u' sparkdl.udf'
/mnt/sparkdl/python/docs/index.rst:15: WARNING: toctree contains reference to nonexisting document u' sparkdl.utils'
WARNING: autodoc: failed to import module u'sparkdl'; the following exception was raised:
No module named tensorframes
WARNING: autodoc: failed to import module u'graph.input' from module u'sparkdl'; the following exception was raised:
No module named tensorframes
WARNING: autodoc: failed to import module u'graph.utils' from module u'sparkdl'; the following exception was raised:
No module named tensorframes
WARNING: autodoc: failed to import module u'image.imageIO' from module u'sparkdl'; the following exception was raised:
No module named tensorframes
WARNING: autodoc: failed to import module u'transformers.utils' from module u'sparkdl'; the following exception was raised:
No module named tensorframes
WARNING: autodoc: failed to import module u'udf.keras_image_model' from module u'sparkdl'; the following exception was raised:
No module named tensorframes
WARNING: autodoc: failed to import module u'utils.keras_model' from module u'sparkdl'; the following exception was raised:
No module named tensorframes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants