Skip to content

ARROW-1299: [Docs] Publish nightly docs build from crossbow#173

Merged
jorisvandenbossche merged 26 commits into
apache:masterfrom
jorisvandenbossche:ARROW-1299-host-dev-docs
Jan 12, 2022
Merged

ARROW-1299: [Docs] Publish nightly docs build from crossbow#173
jorisvandenbossche merged 26 commits into
apache:masterfrom
jorisvandenbossche:ARROW-1299-host-dev-docs

Conversation

@jorisvandenbossche
Copy link
Copy Markdown
Member

No description provided.

@github-actions
Copy link
Copy Markdown

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-1299-host-dev-docs branch 2 times, most recently from 589b928 to 2196d2d Compare December 15, 2021 15:04
Copy link
Copy Markdown
Member Author

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kszucs @kou what do you think of this approach?

I wanted to used https://github.com/dsaltares/fetch-gh-release-asset to download the docs from the crossbow release, but since INFRA doesn't allow such third-party actions, I copied that in for now.

Comment thread .github/workflows/devdocs.yml Outdated
git add docs/dev --all
git commit -m "Updating dev docs (build ${DATE})"
echo "git push"
git push
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this push is not yet working because that is disallowed by default from pushes (to avoid an infinite loop of actions re-triggering themselves), according to https://stackoverflow.com/a/58393457/653364 (which suggests you need to use a custom personal access token instead of the default one).
I can maybe also use a similar approach as in deploy.yml

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the information is old. It will work.

I have a job that pushes a commit to gh-pages for all push: https://github.com/ruby/csv/blob/master/.github/workflows/test.yml#L96-L135
And it works.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then this should probably work indeed (once it is merged and running on master?)

Comment thread .github/workflows/devdocs.yml Outdated
cd asf-site
echo "$(git log -1)"
git config user.name "$(git log -1 --pretty=format:%an)"
git config user.email "$(git log -1 --pretty=format:%ae)"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets the commit author to the last author on the asf-site branch (so a random committer). Do we care about who exactly is the commit author for this? (it will be the last one of us that committed something to the site)

I first did

          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

to have the commit author be a more anonymous bot, but then that user does not have push rights? (or is there a way to workaround this?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the following:

            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Could you try this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already did I think? (unless I am misunderstanding)

@kszucs
Copy link
Copy Markdown
Member

kszucs commented Dec 15, 2021

@jorisvandenbossche thanks for working on this!

How about replacing the fetch release asset with an archery subcommand? It whould be fairly easy using either github3.py or PyGithub.

Copy link
Copy Markdown
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern: apache/arrow-site repository size will be increased steadily. We may need to remove the previous commit to update dev docs (only if the previous commit is an update dev docs commit) before we update dev docs.

Comment thread .github/workflows/devdocs.yml Outdated
git config user.name "$(git log -1 --pretty=format:%an)"
git config user.email "$(git log -1 --pretty=format:%ae)"
mkdir -p docs/dev
tar -xvzf ../docs.tar.gz -C docs/dev --strip-components=1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you replace docs/dev/c_glib/index.html with docs/c_glib/index.html after tar -x?
We want to use https://github.com/apache/arrow-site/blob/master/_docs/c_glib/index.md for c_glib/index.html.

See also: https://github.com/apache/arrow/blob/master/dev/release/post-09-docs.sh#L59

Comment thread .github/workflows/devdocs.yml Outdated
Comment on lines +21 to +22
push:
pull_request:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these events?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's just temporary for testing on this PR. After that only the cron job will be sufficient

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.
Could you remove push and pull_request before we merge this?

We can keep pull_request if we use the forked repository's gh-pages branch instead of apache/arrow-site's asf-site branch like our existing .github/workflows/deploy.yml does.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will certainly update this before merging

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep pull_request if we use the forked repository's gh-pages branch instead of apache/arrow-site's asf-site branch like our existing .github/workflows/deploy.yml does.

OK, I see how this is done in deploy.yml. Now, for the docs I think that's probably less relevant, as it is not content from a PR that is being built (as is the case for the main site here), but rather just fetched from elsewhere. But we can always add it later if it would be useful to have.

Comment thread .github/workflows/devdocs.yml Outdated
cd asf-site
echo "$(git log -1)"
git config user.name "$(git log -1 --pretty=format:%an)"
git config user.email "$(git log -1 --pretty=format:%ae)"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the following:

            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"

git add docs/dev --all
git commit -m "Updating dev docs (build ${DATE})"
echo "git push"
git push
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the information is old. It will work.

I have a job that pushes a commit to gh-pages for all push: https://github.com/ruby/csv/blob/master/.github/workflows/test.yml#L96-L135
And it works.

@jorisvandenbossche
Copy link
Copy Markdown
Member Author

How about replacing the fetch release asset with an archery subcommand? It whould be fairly easy using either github3.py or PyGithub.

I thought you were suggesting to implement this in archery, but it seems it actually already exists! (the archery crossbow download-artifacts and latest-prefix options, testing them out now)

@jorisvandenbossche jorisvandenbossche force-pushed the ARROW-1299-host-dev-docs branch 2 times, most recently from b5fb4dd to b336bc0 Compare December 16, 2021 15:23
@jorisvandenbossche
Copy link
Copy Markdown
Member Author

Downloading the artifacts with archery is working now! (much nicer)
Querying the latest available nightly build number with archery will require apache/arrow#11978

@kszucs
Copy link
Copy Markdown
Member

kszucs commented Dec 16, 2021

I thought you were suggesting to implement this in archery, but it seems it actually already exists!

I were suggesting that, I forgot that we have both upload and download artifacts already implemented :D

Comment thread .github/workflows/devdocs.yml Outdated
@kou
Copy link
Copy Markdown
Member

kou commented Dec 17, 2021

Concern: apache/arrow-site repository size will be increased steadily. We may need to remove the previous commit to update dev docs (only if the previous commit is an update dev docs commit) before we update dev docs.

@jorisvandenbossche @kszucs What do you think about this?

@jorisvandenbossche
Copy link
Copy Markdown
Member Author

jorisvandenbossche commented Dec 23, 2021

Concern: apache/arrow-site repository size will be increased steadily. We may need to remove the previous commit to update dev docs (only if the previous commit is an update dev docs commit) before we update dev docs.

Yes, this is indeed a problem (I raised a similar concern when adding the multiple versions of the docs, as that also steadily increases the repo size). For the dev docs specifically, we of course only need to have the latest version and can thus overwrite / clean-up the git history to avoid increasing the repo size.

The options I was thinking about:

  • In the action, remove the history of just the dev docs directory with something like https://github.com/newren/git-filter-repo/. This rewrites history and thus requires force pushing from inside the action.
  • In the action, we could also completely re-initialize the asf-site branch from the current content + updated dev docs (in the idea that we mostly care about history in the mater branch, and less so in the asf-site branch?). But similarly as above, this requires force pushing from inside the action (so probably not much advantage for this option compared to the one above)
  • Keep the action as is (additional commit, so no need to force push), but from time to time clean-up the asf-site branch and force push manually (we could have a similar script to remove the history in the dev docs directory)

Your idea of removing the previous commit to update dev docs (only if the previous commit is an update dev docs commit) could also be an option, but that will miss some of those commits, and also requires force pushing.

I was a bit hesitant to take any option that requires a force push (so didn't yet add anything in this PR, leaving the manual clean-up from time to time), but maybe that's not actually a problem?

@jorisvandenbossche
Copy link
Copy Markdown
Member Author

I updated this to now also use archery to get the tag of the latest doc build, which seems to work.

Any more thoughts about the comment above (how to handle the increasing size / is it OK to force push from the action?)

Also, would you be fine with going forward with this PR, and give this a try in practice? We can still update it to handle git branch cleaning to reduce history size later in a follow-up, I think.

@kszucs
Copy link
Copy Markdown
Member

kszucs commented Jan 11, 2022

I updated this to now also use archery to get the tag of the latest doc build, which seems to work.

Any more thoughts about the comment above (how to handle the increasing size / is it OK to force push from the action?)

I think force pushing would be sufficient.

Also, would you be fine with going forward with this PR, and give this a try in practice? We can still update it to handle git branch cleaning to reduce history size later in a follow-up, I think.

Agree, though the build is actually failing due to missing permissions.

@jorisvandenbossche
Copy link
Copy Markdown
Member Author

though the build is actually failing due to missing permissions.

According to the comment of @kou above (#173 (comment)), this should work (once merged in master?)
(but honestly no idea if it will actually work)

Copy link
Copy Markdown
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though the build is actually failing due to missing permissions.

According to the comment of @kou above (#173 (comment)), this should work (once merged in master?) (but honestly no idea if it will actually work)

Yes. Pull request should not change the original repository for security reason.

It's not failed on the fork repository:

Also, would you be fine with going forward with this PR, and give this a try in practice? We can still update it to handle git branch cleaning to reduce history size later in a follow-up, I think.

OK.

Comment thread .github/workflows/devdocs.yml Outdated
Comment on lines +43 to +47

- name: Fetch Crossbow branches
run: |
cd crossbow
git fetch origin +refs/heads/*:refs/remotes/origin/*
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use fetch-depth: 0 here?

https://github.com/actions/checkout/#Fetch-all-history-for-all-tags-and-branches

Suggested change
- name: Fetch Crossbow branches
run: |
cd crossbow
git fetch origin +refs/heads/*:refs/remotes/origin/*
fetch-depth: 0

Comment thread .github/workflows/devdocs.yml Outdated
Comment on lines +21 to +22
push:
pull_request:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.
Could you remove push and pull_request before we merge this?

We can keep pull_request if we use the forked repository's gh-pages branch instead of apache/arrow-site's asf-site branch like our existing .github/workflows/deploy.yml does.

Comment thread .github/workflows/devdocs.yml Outdated
Comment thread .github/workflows/devdocs.yml Outdated
@jorisvandenbossche
Copy link
Copy Markdown
Member Author

Fixed a few last things (the date in the commit message, also ensured we don't commit the ignored .doctrees), verified that this still works on my fork, and then as last commit removed the push and pull_request triggers.

With that, I am going to merge this, so we can see how this goes in master and further improve if needed.
Thanks both for the feedback!

One thing I already noticed is that, depending on the exact time of the day, it can happen that the latest nightly build tag is already found but still building, and so the download doesn't do anything. If that turns out to be a problem, we could change the cron job to run twice a day, so that at least one of both will download the latest docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants