ARROW-16901: [R][CI] Prune R nightly builds (#13453)
This PR adds pruning to the nightly R upload; 14 versions are kept by default.

I have removed the `burnett01/rsync-deployments` action because using Docker for this was unnecessary and the action can only upload to a remote. The new manual version also uses host key checking, for which I created `secrets.NIGHTLIES_RSYNC_HOST_KEY`. The secret should contain the output of `ssh-keyscan -H nightlies.apache.org 2> /dev/null` and needs to be added to apache/arrow before this can run. This way we no longer depend on the action and its associated Dockerfile (`drinternet/rsync`).

We might want to refactor this into a local action for use with all nightly upload jobs.

The pruning is not very efficient, as we download the whole nightly repository (on a cache miss). This could be avoided for the libarrow files, which could possibly be deleted via ssh instead, but we need to download all R packages because `tools::write_PACKAGES` needs access to each archive.
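
As a rough illustration of the "keep the newest N files per directory" idea behind the pruning, here is a minimal sketch. The function name `prune_dir` and its arguments are hypothetical and not part of this PR; the workflow itself uses a similar `ls -t | tail | xargs rm` pipeline on each leaf directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): keep only the $2 newest
# arrow* files in directory $1 and delete the rest.
prune_dir() {
  local dir="$1" keep="$2"
  # ls -t lists newest first; tail -n +N starts printing at line N,
  # so the first $keep entries are skipped and everything older is removed.
  # Assumes file names without newlines, as is the case for package archives.
  ls -t "$dir"/arrow* 2>/dev/null | tail -n +"$((keep + 1))" | while IFS= read -r f; do
    rm -- "$f"
  done
}
```

Calling `prune_dir repo/src/contrib 14` would then leave at most 14 matching files in that directory. The path is only an example.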

Authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
assignUser committed Jul 6, 2022
1 parent 6c4261e commit 804c08c
Showing 4 changed files with 285 additions and 14 deletions.
70 changes: 70 additions & 0 deletions .github/actions/sync-nightlies/README.md
@@ -0,0 +1,70 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
--->

# Sync Nightlies
This action can be used to sync directories from/to [nightlies.apache.org] with
rsync. It requires the correct secrets to be in place as described
[below](#usage).
Currently this action is intended to sync the *contents* of `local_path` to
`remote_path` (or vice versa), so a slash will be appended to the source path.
Uploading single files or dirs is not possible directly but only by wrapping
them in an additional directory.
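
The effect of the appended slash can be sketched as follows. The helper `rsync_endpoints` and the example host and paths are assumptions made for illustration; the snippet only prints the source and destination that would be handed to rsync and does not run rsync.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): print "<source> <dest>" for rsync.
# The trailing slash on the source makes rsync copy the directory's
# contents instead of creating the directory itself inside the destination.
rsync_endpoints() {
  local upload="$1" local_path="$2" remote="$3"   # remote is user@host:path
  if [ "$upload" = "true" ]; then
    printf '%s/ %s\n' "$local_path" "$remote"
  else
    printf '%s/ %s\n' "$remote" "$local_path"
  fi
}

# Upload and download directions with placeholder values
rsync_endpoints true repo user@example.org:/nightlies/arrow/r
rsync_endpoints false repo user@example.org:/nightlies/arrow/r
```

To upload a single file under this scheme, it would first be placed in a wrapper directory and that directory passed as `local_path`.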

## Inputs
- `upload` Set to `true` to upload from `local_path` to `remote_path`
- `switches` See rsync --help for available switches.
- `local_path` The relative local path within $GITHUB_WORKSPACE
- `remote_path` The remote path incl. sub dirs e.g. {{secrets.path}}/arrow/r.
- `remote_host` The remote host.
- `remote_port` The remote port.
- `remote_user` The remote user.
- `remote_key` The remote ssh key.
- `remote_host_key` The host key for StrictHostKeyChecking.

## Usage
The secrets have to be set by INFRA, except `secrets.NIGHTLIES_RSYNC_HOST_KEY`
which should contain the result of `ssh-keyscan -H nightlies.apache.org 2>
/dev/null`. This example requires apache/arrow to be checked out in `arrow`.

```yaml
- name: Sync from Remote
  uses: ./arrow/.github/actions/sync-nightlies
  with:
    switches: -avzh --update --delete --progress
    local_path: repo
    remote_path: ${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r
    remote_host: ${{ secrets.NIGHTLIES_RSYNC_HOST }}
    remote_port: ${{ secrets.NIGHTLIES_RSYNC_PORT }}
    remote_user: ${{ secrets.NIGHTLIES_RSYNC_USER }}
    remote_key: ${{ secrets.NIGHTLIES_RSYNC_KEY }}
    remote_host_key: ${{ secrets.NIGHTLIES_RSYNC_HOST_KEY }}

- name: Sync to Remote
  uses: ./arrow/.github/actions/sync-nightlies
  with:
    upload: true
    switches: -avzh --update --delete --progress
    local_path: repo
    remote_path: ${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r
    remote_host: ${{ secrets.NIGHTLIES_RSYNC_HOST }}
    remote_port: ${{ secrets.NIGHTLIES_RSYNC_PORT }}
    remote_user: ${{ secrets.NIGHTLIES_RSYNC_USER }}
    remote_key: ${{ secrets.NIGHTLIES_RSYNC_KEY }}
    remote_host_key: ${{ secrets.NIGHTLIES_RSYNC_HOST_KEY }}
```
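
How the host key secret could feed into the ssh command used by rsync is sketched below. The helper name `ssh_command_for`, the placeholder key and the temporary directory are assumptions for illustration. The sketch also sets `StrictHostKeyChecking=yes` explicitly, which is a choice made here and not necessarily what the action does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): store a pre-fetched host key and
# print an ssh command line that trusts only the keys in that file.
ssh_command_for() {
  local host_key="$1" port="$2" ssh_dir="$3"
  mkdir -p "$ssh_dir"
  chmod go-rwx "$ssh_dir"
  # The key is expected in known_hosts format, e.g. the output of ssh-keyscan -H
  printf '%s\n' "$host_key" >> "$ssh_dir/known_hosts"
  printf 'ssh -o StrictHostKeyChecking=yes -o UserKnownHostsFile=%s -p %s\n' \
    "$ssh_dir/known_hosts" "$port"
}

# Example with a placeholder key line (not a real host key)
RSH="$(ssh_command_for "example.org ssh-ed25519 AAAA_placeholder" 22 "$(mktemp -d)/.ssh")"
echo "$RSH"
```

The printed string is the kind of value that can be passed to rsync through `--rsh`, so that a connection to a host with an unexpected key fails instead of prompting.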
95 changes: 95 additions & 0 deletions .github/actions/sync-nightlies/action.yml
@@ -0,0 +1,95 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: 'Sync Nightlies'
description: 'Sync files to and from nightlies.apache.org'
inputs:
  upload:
    description: 'Sync from local to remote'
    default: false
    required: false
  switches:
    description: 'see rsync --help'
    required: true
  local_path:
    description: 'The relative local path within $GITHUB_WORKSPACE'
    required: true
  remote_path:
    description: 'The remote path incl. sub dirs e.g. {{secrets.path}}/arrow/r'
    required: true
  remote_host:
    description: 'The remote host'
    required: true
  remote_port:
    description: 'The remote port'
    required: false
    default: 22
  remote_user:
    description: 'The remote user'
    required: true
  remote_key:
    description: 'The remote key'
    required: true
  remote_host_key:
    description: 'The host key for StrictHostKeyChecking'
    required: true

runs:
  using: "composite"
  steps:
    - name: Sync files
      shell: bash
      env:
        SWITCHES: "${{ inputs.switches }}"
        LOCAL_PATH: "${{ github.workspace }}/${{ inputs.local_path }}"
        SSH_KEY: "${{ inputs.remote_key }}"
        PORT: "${{ inputs.remote_port }}"
        USER: "${{ inputs.remote_user }}"
        HOST: "${{ inputs.remote_host }}"
        HOST_KEY: "${{ inputs.remote_host_key }}"
        REMOTE_PATH: "${{ inputs.remote_path }}"
      run: |
        # Make SSH key available and add remote to known hosts
        eval "$(ssh-agent)" > /dev/null
        echo "$SSH_KEY" | tr -d '\r' | ssh-add - >/dev/null
        mkdir -p .ssh
        chmod go-rwx .ssh
        echo "$HOST_KEY" >> .ssh/known_hosts
        # strict errors
        set -eu
        # We have to use a custom RSH to supply the port
        RSH="ssh -o UserKnownHostsFile=.ssh/known_hosts -p $PORT"
        DSN="$USER@$HOST"
        # It is important to append '/' to the source path otherwise
        # the entire source dir will be created as a sub dir in the destination
        if [ "${{ inputs.upload }}" = true ]
        then
          SOURCE=$LOCAL_PATH/
          DEST=$DSN:$REMOTE_PATH
        else
          SOURCE=$DSN:$REMOTE_PATH/
          DEST=$LOCAL_PATH
        fi
        rsync $SWITCHES --rsh="$RSH" $SOURCE $DEST
75 changes: 61 additions & 14 deletions .github/workflows/r_nightly.yml
@@ -17,17 +17,25 @@

name: Upload R Nightly builds
# This workflow downloads the (nightly) binaries created in crossbow and uploads them
# to nightlies.apache.org. Due to authorization requirements, this upload can't be done

# to nightlies.apache.org. Due to authorization requirements, this upload can't be done
# from the crossbow repository.

# This removes all permissions from the token
permissions:
contents: none

on:
workflow_dispatch:
inputs:
prefix:
description: Job prefix to use.
required: false
default: ''
keep:
description: Number of versions to keep.
required: false
default: 14

schedule:
#Crossbow packaging runs at 0 8 * * *
- cron: '0 14 * * *'
@@ -78,10 +86,28 @@ jobs:
echo "No files found. Stopping upload."
exit 1
fi
- name: Cache Repo
uses: actions/cache@v3
with:
path: repo
key: r-nightly-${{ github.run_id }}
restore-keys: r-nightly-
- name: Sync from Remote
uses: ./arrow/.github/actions/sync-nightlies
with:
switches: -avzh --update --delete --progress
local_path: repo
remote_path: ${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r
remote_host: ${{ secrets.NIGHTLIES_RSYNC_HOST }}
remote_port: ${{ secrets.NIGHTLIES_RSYNC_PORT }}
remote_user: ${{ secrets.NIGHTLIES_RSYNC_USER }}
remote_key: ${{ secrets.NIGHTLIES_RSYNC_KEY }}
remote_host_key: ${{ secrets.NIGHTLIES_RSYNC_HOST_KEY }}
- run: tree repo
- name: Build Repository
shell: Rscript {0}
run: |
# folder that we rsync to nightlies.apache.org
# folder that we sync to nightlies.apache.org
repo_root <- "repo"
# The binaries are in a nested dir
# so we need to find the correct path.
@@ -101,18 +127,37 @@ jobs:
# strip superfluous nested dirs
new_paths <- sub(art_path, ".", new_paths)
dirs <- dirname(new_paths)
dir_result <- sapply(dirs, dir.create, recursive = TRUE)
if (!all(dir_result)) {
stop("There was an issue while creating the folders!")
}
sapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE)
copy_result <- file.copy(current_path, new_paths)
# overwrite allows us to "force push" a new version with the same name
copy_result <- file.copy(current_path, new_paths, overwrite = TRUE)
if (!all(copy_result)) {
stop("There was an issue while copying the files!")
}
- name: Prune Repository
shell: bash
env:
KEEP: ${{ github.event.inputs.keep || 14 }}
run: |
prune() {
# list files | retain $KEEP newest files | delete everything else
ls -t $1/arrow* | tail -n +$((KEEP + 1)) | xargs --no-run-if-empty rm
}
# find leaf sub dirs
repo_dirs=$(find repo -type d -links 2)
# We want to retain $keep (14) versions of each pkg/lib so we call
# prune on each leaf dir and not on repo/.
for dir in ${repo_dirs[@]}; do
prune $dir
done
- name: Update Repository Index
shell: Rscript {0}
run: |
# folder that we sync to nightlies.apache.org
repo_root <- "repo"
tools::write_PACKAGES(file.path(repo_root, "src/contrib"), type = "source", verbose = TRUE)
repo_dirs <- list.dirs(repo_root)
@@ -125,14 +170,16 @@ jobs:
tools::write_PACKAGES(dir, type = ifelse(on_win, "win.binary", "mac.binary"), verbose = TRUE )
}
- name: Show repo contents
run: ls -R repo
- name: Upload Files
uses: burnett01/rsync-deployments@5.2
run: tree repo
- name: Sync to Remote
uses: ./arrow/.github/actions/sync-nightlies
with:
switches: -avzr
path: repo/*
upload: true
switches: -avzh --update --delete --progress
local_path: repo
remote_path: ${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r
remote_host: ${{ secrets.NIGHTLIES_RSYNC_HOST }}
remote_port: ${{ secrets.NIGHTLIES_RSYNC_PORT }}
remote_user: ${{ secrets.NIGHTLIES_RSYNC_USER }}
remote_key: ${{ secrets.NIGHTLIES_RSYNC_KEY }}
remote_host_key: ${{ secrets.NIGHTLIES_RSYNC_HOST_KEY }}
59 changes: 59 additions & 0 deletions LICENSE.txt
@@ -2331,3 +2331,62 @@ The file dev/tasks/r/github.packages.yml contains code from
https://github.com/ursa-labs/arrow-r-nightly

which is made available under the Apache License 2.0.

--------------------------------------------------------------------------------
.github/actions/sync-nightlies/action.yml (some portions)

Some portions of this file are derived from code from

https://github.com/JoshPiper/rsync-docker

which is made available under the MIT license

Copyright (c) 2020 Joshua Piper

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
.github/actions/sync-nightlies/action.yml (some portions)

Some portions of this file are derived from code from

https://github.com/burnett01/rsync-deployments

which is made available under the MIT license

Copyright (c) 2019-2022 Contention
Copyright (c) 2019-2022 Burnett01

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
