ARROW-1299: [Docs] Publish nightly docs build from crossbow #173
jorisvandenbossche
left a comment
@kszucs @kou what do you think of this approach?
I wanted to use https://github.com/dsaltares/fetch-gh-release-asset to download the docs from the crossbow release, but since INFRA doesn't allow such third-party actions, I copied that in for now.
git add docs/dev --all
git commit -m "Updating dev docs (build ${DATE})"
echo "git push"
git push
I think this push is not working yet, because pushing from an action with the default token is disallowed by default (to avoid an infinite loop of actions re-triggering themselves), according to https://stackoverflow.com/a/58393457/653364 (which suggests using a custom personal access token instead of the default one).
I can maybe also use a similar approach as in deploy.yml
There was a problem hiding this comment.
I think that the information is old. It will work.
I have a job that pushes a commit to gh-pages on every push: https://github.com/ruby/csv/blob/master/.github/workflows/test.yml#L96-L135
And it works.
There was a problem hiding this comment.
OK, then this should probably work indeed (once it is merged and running on master?)
cd asf-site
echo "$(git log -1)"
git config user.name "$(git log -1 --pretty=format:%an)"
git config user.email "$(git log -1 --pretty=format:%ae)"
This sets the commit author to the last author on the asf-site branch (so a random committer). Do we care about who exactly is the commit author for this? (it will be the last one of us that committed something to the site)
I first did
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
to have the commit author be a more anonymous bot, but then that user does not have push rights? (or is there a way to work around this?)
There was a problem hiding this comment.
We can use the following:
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
I already did I think? (unless I am misunderstanding)
@jorisvandenbossche thanks for working on this! How about replacing the fetch release asset with an archery subcommand? It would be fairly easy using either github3.py or PyGithub.
kou
left a comment
Concern: apache/arrow-site repository size will be increased steadily. We may need to remove the previous commit to update dev docs (only if the previous commit is an update dev docs commit) before we update dev docs.
git config user.name "$(git log -1 --pretty=format:%an)"
git config user.email "$(git log -1 --pretty=format:%ae)"
mkdir -p docs/dev
tar -xvzf ../docs.tar.gz -C docs/dev --strip-components=1
Could you replace docs/dev/c_glib/index.html with docs/c_glib/index.html after tar -x?
We want to use https://github.com/apache/arrow-site/blob/master/_docs/c_glib/index.md for c_glib/index.html.
See also: https://github.com/apache/arrow/blob/master/dev/release/post-09-docs.sh#L59
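One way to read the request above, as a minimal sketch: after extracting the tarball, copy the site-rendered docs/c_glib/index.html over the extracted docs/dev/c_glib/index.html. The function name restore_c_glib_index is hypothetical and not part of this PR, and the sketch assumes it runs from the root of the asf-site checkout; the release script linked above may handle this differently.

```shell
# Hypothetical helper (not part of the PR): overwrite the extracted dev
# c_glib landing page with the one already rendered by the site.
# Assumes the current directory is the root of the asf-site checkout.
restore_c_glib_index() {
  src="docs/c_glib/index.html"
  dest="docs/dev/c_glib/index.html"
  # Do nothing when either side is missing, so the step never fails the build
  if [ -f "${src}" ] && [ -d "docs/dev/c_glib" ]; then
    cp "${src}" "${dest}"
  fi
}
```

It would be called right after the tar -xvzf step and before git add, so that the replaced file is part of the same commit.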
push:
pull_request:
No, that's just temporary for testing on this PR. After that only the cron job will be sufficient
There was a problem hiding this comment.
OK.
Could you remove push and pull_request before we merge this?
We can keep pull_request if we use the forked repository's gh-pages branch instead of apache/arrow-site's asf-site branch like our existing .github/workflows/deploy.yml does.
There was a problem hiding this comment.
Yes, will certainly update this before merging
There was a problem hiding this comment.
We can keep pull_request if we use the forked repository's gh-pages branch instead of apache/arrow-site's asf-site branch like our existing .github/workflows/deploy.yml does.
OK, I see how this is done in deploy.yml. Now, for the docs I think that's probably less relevant, as it is not content from a PR that is being built (as is the case for the main site here), but rather just fetched from elsewhere. But we can always add it later if it would be useful to have.
I thought you were suggesting to implement this in archery, but it seems it actually already exists! (the archery crossbow |
b5fb4dd to
b336bc0
Compare
b336bc0 to
caaa0e8
Compare
Downloading the artifacts with archery is working now! (much nicer)
I was suggesting that; I forgot that we already have both upload and download of artifacts implemented :D
@jorisvandenbossche @kszucs What do you think about this?
Yes, this is indeed a problem (I raised a similar concern when adding the multiple versions of the docs, as that also steadily increases the repo size). For the dev docs specifically, we of course only need to have the latest version and can thus overwrite / clean-up the git history to avoid increasing the repo size. The options I was thinking about:
Your idea of removing the previous commit to update dev docs (only if the previous commit is an update dev docs commit) could also be an option, but that will miss some of those commits, and also requires force pushing. I was a bit hesitant to take any option that requires a force push (so didn't yet add anything in this PR, leaving the manual clean-up from time to time), but maybe that's not actually a problem? |
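The "remove the previous commit only if it is a dev docs update" idea could be sketched roughly as below. This is an illustration under stated assumptions, not what this PR implements: the function name drop_previous_dev_docs_commit is hypothetical, and the "Updating dev docs" prefix is taken from the commit message used in the workflow step above.

```shell
# Hypothetical helper (not part of the PR): if the tip commit is itself a
# dev docs update, step back one commit so the next update replaces it
# instead of stacking on top of it.
# Returns 0 when history was rewritten (a force push is then needed),
# and 1 when nothing was changed.
drop_previous_dev_docs_commit() {
  prefix="${1:-Updating dev docs}"
  subject="$(git log -1 --pretty=format:%s)"
  case "${subject}" in
    "${prefix}"*) ;;       # tip is a dev docs update: continue
    *) return 1 ;;         # anything else: leave history alone
  esac
  # A root commit has no parent to fall back to
  git rev-parse --verify --quiet HEAD~1 >/dev/null || return 1
  # --soft moves HEAD only; the index and working tree are left as they are
  git reset --quiet --soft HEAD~1
}
```

The workflow would call this before git add and git commit, and use a force push (for example git push --force-with-lease) only when the function returned 0. As noted above, this only catches consecutive dev docs commits, so any that have other commits on top of them stay in the history.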
I updated this to now also use Any more thoughts about the comment above (how to handle the increasing size / is it OK to force push from the action?) Also, would you be fine with going forward with this PR, and give this a try in practice? We can still update it to handle git branch cleaning to reduce history size later in a follow-up, I think. |
I think force pushing would be sufficient.
Agree, though the build is actually failing due to missing permissions. |
According to the comment of @kou above (#173 (comment)), this should work (once merged in master?) |
kou
left a comment
though the build is actually failing due to missing permissions.
According to the comment of @kou above (#173 (comment)), this should work (once merged in master?) (but honestly no idea if it will actually work)
Yes. For security reasons, a pull request should not be able to change the original repository.
It did not fail on the forked repository:
- https://github.com/jorisvandenbossche/arrow-site/runs/4773741252?check_suite_focus=true
- https://github.com/jorisvandenbossche/arrow-site/commits/asf-site
Also, would you be fine with going forward with this PR, and give this a try in practice? We can still update it to handle git branch cleaning to reduce history size later in a follow-up, I think.
OK.
- name: Fetch Crossbow branches
  run: |
    cd crossbow
    git fetch origin +refs/heads/*:refs/remotes/origin/*
Can we use fetch-depth: 0 here?
https://github.com/actions/checkout/#Fetch-all-history-for-all-tags-and-branches
| - name: Fetch Crossbow branches | |
| run: | | |
| cd crossbow | |
| git fetch origin +refs/heads/*:refs/remotes/origin/* | |
| fetch-depth: 0 |
Fixed a few last things (the date in the commit message, also ensured we don't commit the ignored With that, I am going to merge this, so we can see how this goes in master and further improve if needed. One thing I already noticed is that, depending on the exact time of day, it can happen that the latest nightly build tag is already found but still building, so the download doesn't do anything. If that turns out to be a problem, we could change the cron job to run twice a day, so that at least one of the two runs will download the latest docs.