Sync Raven testdata to Thredds for Raven tutorial notebooks #72

Merged · 26 commits · Oct 15, 2020 · diff shows changes from 19 commits

Commits
80e6f60
deploy-data: initial version to deploy data from multiple git repos
tlvu Oct 7, 2020
b4b8cb5
deploy-data: detect typo in the keys in config file
tlvu Oct 7, 2020
f16b2c9
deploy-data: make branch field in config file optional
tlvu Oct 7, 2020
9f0eed0
deploy-data: log remote on fetch for future debugging
tlvu Oct 7, 2020
162b5f0
deploy-data: add log output redirection for non-interactive use
tlvu Oct 7, 2020
c8a236a
deploy-data: ensure config yml file is provided and is abs path for d…
tlvu Oct 7, 2020
e6ecc54
deploy-data: use rsync from a docker to be able to run inside another…
tlvu Oct 7, 2020
10485c0
deploy-data: avoid name clash with other docker run of yq
tlvu Oct 7, 2020
cf78f31
deploy-data: rsync content below source_dir only, not including sourc…
tlvu Oct 8, 2020
e1e677b
deploy-data: ensure rsync has write access to the parent folder
tlvu Oct 8, 2020
eb95daf
scheduler: new optional cronjob to deploy Raven testdata to Thredds
tlvu Oct 8, 2020
50a1953
deploy raven testdata cronjob: yml config file path has to exist on h…
tlvu Oct 8, 2020
4e73a85
deploy-data: expose rsync options for include/exclude filter rules
tlvu Oct 8, 2020
0f23b8c
deploy raven testdata cronjob: allow to customize log file name and l…
tlvu Oct 8, 2020
fe41c84
deploy-data: add documentation
tlvu Oct 9, 2020
9d3dd19
env.local: reference optional cronjob to auto deploy Raven testdata t…
tlvu Oct 9, 2020
8936199
README: reference Ouranos specific override repo to demo infra-as-code
tlvu Oct 9, 2020
259f228
deploy-data: add more include and very basic remap example
tlvu Oct 14, 2020
a3937a4
deploy-data: allow checkout cache config to be set in config file
tlvu Oct 14, 2020
f80a614
deploy raven testdata cronjob: allow to provide alternate config file
tlvu Oct 14, 2020
8d76526
deploy-data: support git clone over ssh for private repos
tlvu Oct 14, 2020
91a583b
deploy raven testdata cronjob: support git clone over ssh for private…
tlvu Oct 15, 2020
5e79750
deploy-data: additional error checking for git clone and fetch that r…
tlvu Oct 15, 2020
40081c7
deploy raven testdata cronjob: change checkout cache to under /data f…
tlvu Oct 15, 2020
6fab60a
deploy-data: fix error checking always exit even if no error
tlvu Oct 15, 2020
60b817f
deploy-data: really fix error checking always exit even if no error
tlvu Oct 15, 2020
5 changes: 4 additions & 1 deletion birdhouse/README.md
@@ -25,7 +25,10 @@ for your organization. For an example of possible override, see how the [emu
service](optional-components/emu/docker-compose-extra.yml)
([README](optional-components/README.md)) can be optionally added to the
deployment via the [override
mechanism](https://docs.docker.com/compose/extends/).
mechanism](https://docs.docker.com/compose/extends/). Ouranos-specific
overrides can be found in the
[birdhouse-deploy-ouranos](https://github.com/bird-house/birdhouse-deploy-ouranos)
repo.

The automatic deployment is able to handle multiple repos, so it will trigger if
this repo or your private-personalized-config repo changes, giving you
53 changes: 53 additions & 0 deletions birdhouse/components/scheduler/deploy_raven_testdata_to_thredds.env
@@ -0,0 +1,53 @@
##############################################################################
# Configuration vars, set in env.local before sourcing this file.
# This job assumes the "scheduler" component is enabled.
##############################################################################

# Cronjob schedule to trigger deployment attempt.
if [ -z "$DEPLOY_RAVEN_TESTDATA_SCHEDULE" ]; then
    DEPLOY_RAVEN_TESTDATA_SCHEDULE="*/30 * * * *"  # UTC
fi

# Location of the local git clone cache, to save the bandwidth and time of
# always re-cloning from scratch.
if [ -z "$DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE" ]; then
    DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE="/tmp/deploy_raven_testdata_to_thredds_checkout_cache"
fi

# Log file location. The default location under /var/log/PAVICS/ has built-in logrotate.
if [ -z "$DEPLOY_RAVEN_TESTDATA_LOGFILE" ]; then
    DEPLOY_RAVEN_TESTDATA_LOGFILE="/var/log/PAVICS/deploy_raven_testdata_to_thredds.log"
fi

##############################################################################
# End configuration vars
##############################################################################


if [ -z "`echo "$AUTODEPLOY_EXTRA_SCHEDULER_JOBS" | grep deploy_raven_testdata_to_thredds`" ]; then

    # Add the job only if not already added (the config is read twice during
    # the autodeploy process).

    LOGFILE_DIRNAME="`dirname "$DEPLOY_RAVEN_TESTDATA_LOGFILE"`"

    export AUTODEPLOY_EXTRA_SCHEDULER_JOBS="
$AUTODEPLOY_EXTRA_SCHEDULER_JOBS

- name: deploy_raven_testdata_to_thredds
  comment: Auto-deploy Raven testdata to Thredds for Raven tutorial notebooks.
  schedule: '$DEPLOY_RAVEN_TESTDATA_SCHEDULE'
  command: '/deploy-data ${COMPOSE_DIR}/deployment/deploy-data-raven-testdata-to-thredds.yml'
  dockerargs: >-
    --rm --name deploy_raven_testdata_to_thredds
    --volume /var/run/docker.sock:/var/run/docker.sock:ro
    --volume ${COMPOSE_DIR}/deployment/deploy-data:/deploy-data:ro
    --volume ${COMPOSE_DIR}/deployment/deploy-data-raven-testdata-to-thredds.yml:${COMPOSE_DIR}/deployment/deploy-data-raven-testdata-to-thredds.yml:ro
    --volume ${DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE}:${DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE}:rw
    --volume ${LOGFILE_DIRNAME}:${LOGFILE_DIRNAME}:rw
    --env DEPLOY_DATA_CHECKOUT_CACHE=${DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE}
    --env DEPLOY_DATA_LOGFILE=${DEPLOY_RAVEN_TESTDATA_LOGFILE}
  image: 'docker:19.03.6-git'
"

fi
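
All three DEPLOY_RAVEN_TESTDATA_* variables above are overridable; as a minimal sketch (values hypothetical), env.local could set them before this file is sourced:

    export DEPLOY_RAVEN_TESTDATA_SCHEDULE="0 */6 * * *"  # every 6 hours instead of every 30 minutes
    export DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE="/data/deploy_raven_testdata_checkout_cache"
    export DEPLOY_RAVEN_TESTDATA_LOGFILE="/var/log/PAVICS/deploy_raven_testdata_custom.log"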
151 changes: 151 additions & 0 deletions birdhouse/deployment/deploy-data
@@ -0,0 +1,151 @@
#!/bin/sh
# Deploy data from git repo(s) to local folder(s).
#
# See sample input config in deploy-data.config.sample.yml for how to specify
# which git repo(s), which git branch for each repo, which sub-folder(s) to
# sync to which local folder(s) and rsync extra options for each sub-folder.
#
# The git repo clones are cached for faster subsequent runs, and rsync is used
# to only modify files that actually changed, to keep the file tree in sync and
# to have include/exclude filter rules. None of these capabilities are
# available with a regular 'cp'.
#
# Docker images are used for yq (the YAML parser) and rsync, so this script has
# very few install dependencies (only docker and git need to be installed
# locally) and can run inside a very minimalistic image (the 'docker' Docker
# image).
#
# Setting the environment variable DEPLOY_DATA_LOGFILE='/path/to/logfile.log'
# will redirect all STDOUT and STDERR to that logfile, so this script will be
# completely silent on the console.
#
# Other self-explanatory environment variables: DEPLOY_DATA_CHECKOUT_CACHE,
# DEPLOY_DATA_YQ_IMAGE, DEPLOY_DATA_RSYNC_IMAGE.
#

if [ ! -z "$DEPLOY_DATA_LOGFILE" ]; then
    exec >>$DEPLOY_DATA_LOGFILE 2>&1
fi


cleanup_on_exit() {
    set +x
    echo "
datadeploy finished START_TIME=$START_TIME
datadeploy finished END_TIME=`date -Isecond`"
}

trap cleanup_on_exit EXIT


if [ -z "$DEPLOY_DATA_CHECKOUT_CACHE" ]; then
DEPLOY_DATA_CHECKOUT_CACHE="/tmp/deploy-data-clone-cache"
fi

if [ -z "$DEPLOY_DATA_YQ_IMAGE" ]; then
DEPLOY_DATA_YQ_IMAGE="mikefarah/yq:3.3.4"
fi

if [ -z "$DEPLOY_DATA_RSYNC_IMAGE" ]; then
DEPLOY_DATA_RSYNC_IMAGE="eeacms/rsync:2.3"
fi

CONFIG_YML="$1"
if [ -z "$CONFIG_YML" ]; then
echo "ERROR: missing config.yml file" 1>&2
exit 2
else
shift
# Docker volume mount requires absolute path.
CONFIG_YML="`realpath "$CONFIG_YML"`"
fi


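# Run yq from a docker image so the host only needs docker installed (no local
# yq install); the config file is volume-mounted read-only so the containerized
# yq can read it.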
yq() {
    docker run --rm --name deploy_data_yq -v $CONFIG_YML:$CONFIG_YML:ro $DEPLOY_DATA_YQ_IMAGE yq "$@"
}

# An empty value could mean a typo in the keys of the config file.
ensure_not_empty() {
    if [ -z "$*" ]; then
        echo "ERROR: value empty" 1>&2
        exit 1
    fi
}


START_TIME="`date -Isecond`"
echo "==========
datadeploy START_TIME=$START_TIME"

set -x

GIT_REPO_URLS="`yq r -p v $CONFIG_YML deploy\[*\].repo_url`"
ensure_not_empty "$GIT_REPO_URLS"
REPO_NUM=0

for GIT_REPO_URL in $GIT_REPO_URLS; do

    GIT_BRANCH="`yq r -p v $CONFIG_YML --defaultValue origin/master deploy\[$REPO_NUM\].branch`"
    ensure_not_empty "$GIT_BRANCH"
    GIT_CHECKOUT_NAME="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].checkout_name`"
    ensure_not_empty "$GIT_CHECKOUT_NAME"

    CHECKOUT_CACHE="`yq r -p v $CONFIG_YML config.checkout_cache`"
    if [ -z "$CHECKOUT_CACHE" ]; then
        CHECKOUT_CACHE="$DEPLOY_DATA_CHECKOUT_CACHE"
    fi
    CLONE_DEST="$CHECKOUT_CACHE/$GIT_CHECKOUT_NAME"
    if [ ! -d "$CLONE_DEST" ]; then
        echo "checkout repo '$GIT_REPO_URL' on branch '$GIT_BRANCH' to '$CLONE_DEST'"
        git clone $GIT_REPO_URL $CLONE_DEST
        cd $CLONE_DEST
        git checkout $GIT_BRANCH
    else
        echo "refresh repo '$CLONE_DEST' on branch '$GIT_BRANCH'"
        cd $CLONE_DEST
        git remote -v  # log remote, should match GIT_REPO_URL
        git clean -fdx  # force, recurse into dirs, also remove ignored and untracked files
        git fetch --prune --all
        git checkout --force $GIT_BRANCH  # force checkout to throw away local changes
    fi

    SRC_DIRS="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].dir_maps\[*\].source_dir`"
    ensure_not_empty "$SRC_DIRS"
    DIR_NUM=0

    for SRC_DIR in $SRC_DIRS; do
        DEST_DIR="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].dir_maps\[$DIR_NUM\].dest_dir`"
        ensure_not_empty "$DEST_DIR"
        RSYNC_EXTRA_OPTS="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].dir_maps\[$DIR_NUM\].rsync_extra_opts`"

        echo "sync '$SRC_DIR' to '$DEST_DIR'"
        DEST_DIR_PARENT="`dirname "$DEST_DIR"`"
        SRC_DIR_ABS_PATH="`pwd`/$SRC_DIR"
        USER_ID="`id -u`"
        GROUP_ID="`id -g`"

        # Ensure DEST_DIR_PARENT is created using the current USER_ID/GROUP_ID
        # so the next rsync has proper write access.
        mkdir -p "$DEST_DIR_PARENT"

        # Rsync with --checksum to only update files that changed.
        docker run --rm --name deploy_data_rsync \
            --volume $SRC_DIR_ABS_PATH:$SRC_DIR_ABS_PATH:ro \
            --volume $DEST_DIR_PARENT:$DEST_DIR_PARENT:rw \
            --user $USER_ID:$GROUP_ID \
            --entrypoint /usr/bin/rsync \
            $DEPLOY_DATA_RSYNC_IMAGE \
            --recursive --links --checksum --delete \
            --itemize-changes --human-readable --verbose \
            --prune-empty-dirs $RSYNC_EXTRA_OPTS \
            $SRC_DIR_ABS_PATH/ $DEST_DIR

        DIR_NUM=`expr $DIR_NUM + 1`
    done

    REPO_NUM=`expr $REPO_NUM + 1`

done


# vi: tabstop=8 expandtab shiftwidth=4 softtabstop=4
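
To exercise the script manually outside the cronjob, an invocation could look like the following sketch (paths hypothetical; the config file argument is the only required one, and docker plus git must be installed locally):

    DEPLOY_DATA_CHECKOUT_CACHE=/tmp/deploy-data-clone-cache \
    DEPLOY_DATA_LOGFILE=/var/log/PAVICS/deploy-data.log \
        ./deploy-data /path/to/deploy-data.config.yml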
11 changes: 11 additions & 0 deletions birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml
@@ -0,0 +1,11 @@
deploy:
  - repo_url: https://github.com/Ouranosinc/raven
    # optional, default "origin/master"
    # branch:
    checkout_name: raven
    dir_maps:
      # rsync content below source_dir into dest_dir
      - source_dir: tests/testdata
        dest_dir: /data/datasets/testdata/raven
        # only sync .nc files
        rsync_extra_opts: --include=*/ --include=*.nc --exclude=*
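
The rsync filter rules above are order-sensitive: --include=*/ keeps directory recursion going, --include=*.nc keeps the NetCDF files, and the final --exclude=* drops everything else. A dry run with the same flags the script uses can preview the effect before relying on the cronjob (paths hypothetical):

    rsync --recursive --links --checksum --delete --dry-run --itemize-changes \
        --include=*/ --include=*.nc --exclude=* \
        /path/to/raven-checkout/tests/testdata/ /tmp/preview-raven-testdata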
55 changes: 55 additions & 0 deletions birdhouse/deployment/deploy-data.config.sample.yml
@@ -0,0 +1,55 @@
# Sample config file for deploy-data script.
#
# Multiple git repos are supported. For each repo, multiple mappings between a
# source dir and a destination dir are supported. For each mapping, extra rsync
# options can be provided to include/exclude a subset of files to keep in sync.

config:
  # optional, default "/tmp/deploy-data-clone-cache"
  # can also be set by env var DEPLOY_DATA_CHECKOUT_CACHE
  # the setting in this config file has precedence over the env var
  # checkout_cache:

deploy:
  - repo_url: https://github.com/Ouranosinc/jenkins-master
    # optional, default "origin/master"
    # branch:
    checkout_name: jenkins-master
    dir_maps:
      # rsync content below source_dir into dest_dir
      - source_dir: initial-jenkins-plugins-suggestion
        dest_dir: /tmp/deploy-data-test-deploy/jenkins-plugins
        # optional, useful for include/exclude filter rules
        # rsync_extra_opts:

  - repo_url: https://github.com/Ouranosinc/jenkins-config
    branch: origin/master
    checkout_name: jenkins-config
    dir_maps:
      - source_dir: canarie-presentation/
        dest_dir: /tmp/deploy-data-test-deploy/canarie
        # sync only .txt, .html and .gif files; other pre-existing files in
        # dest_dir are left alone, unless they have the same extensions.
        rsync_extra_opts: --include=*/ --include=*.txt --include=*.html --include=*.gif --exclude=*
      - source_dir: jcasc
        # remap dir jcasc inside previous dir canarie, without conflicting with
        # the previous canarie sync. This works because there are no .txt,
        # .html or .gif files in jcasc.
        dest_dir: /tmp/deploy-data-test-deploy/canarie/jcasc
        rsync_extra_opts:

  - repo_url: https://github.com/Ouranosinc/pavics-sdi
    # branch:
    checkout_name: pavics-sdi
    dir_maps:
      # sync only 2 sub-dirs and .rst files under source/
      - source_dir: docs/
        dest_dir: /tmp/deploy-data-test-deploy/pavics-sdi
        rsync_extra_opts: --include=*/ --include=source/tutorials/** --include=source/processes/** --include=source/*.rst --exclude=*
      # sync only .yml files at the root of the checkout
      - source_dir: .
        dest_dir: /tmp/deploy-data-test-deploy/pavics-sdi
        rsync_extra_opts: --include=/ --include=*.yml --exclude=*
      # move dir 'notebooks' one level higher in the hierarchy
      - source_dir: docs/source
        dest_dir: /tmp/deploy-data-test-deploy/pavics-sdi
        rsync_extra_opts: --include=*/ --include=notebooks/** --exclude=*
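
Since every dest_dir in this sample points under /tmp, the sample can be tried safely as-is; assuming docker and git are installed, from the birdhouse/deployment/ directory:

    ./deploy-data deploy-data.config.sample.yml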
12 changes: 12 additions & 0 deletions birdhouse/env.local.example
@@ -147,7 +147,19 @@ export POSTGRES_MAGPIE_PASSWORD=postgres-qwerty
# See the job for additional possible configurations. The "scheduler"
# component needs to be enabled for this pre-configured job to work.
#
#if [ -f "/<absolute path>/components/scheduler/renew_letsencrypt_ssl_cert_extra_job.env" ]; then
#. /<absolute path>/components/scheduler/renew_letsencrypt_ssl_cert_extra_job.env
#fi
#
# Load pre-configured cronjob to automatically deploy Raven testdata to Thredds
# for Raven tutorial notebooks.
#
# See the job for additional possible configurations. The "scheduler"
# component needs to be enabled for this pre-configured job to work.
#
#if [ -f "/<absolute path>/components/scheduler/deploy_raven_testdata_to_thredds.env" ]; then
#. /<absolute path>/components/scheduler/deploy_raven_testdata_to_thredds.env
#fi

# Public (on the internet) fully qualified domain name of this Pavics
# installation. This is optional, so it defaults to the same internal PAVICS_FQDN if
6 changes: 5 additions & 1 deletion birdhouse/vagrant-utils/configure-pavics.sh
@@ -30,7 +30,11 @@ RENEW_LETSENCRYPT_SSL_SCHEDULE="22 9 * * *" # UTC
# This repo will be volume-mounted at /vagrant so it can not go higher.
RENEW_LETSENCRYPT_SSL_NUM_PARENTS_MOUNT="/"

. $PWD/components/scheduler/renew_letsencrypt_ssl_cert_extra_job.env
# Only source the file if it exists. This keeps the config file
# backward-compatible with older versions of the repo where the .env file does
# not exist yet.
if [ -f "$PWD/components/scheduler/renew_letsencrypt_ssl_cert_extra_job.env" ]; then
    . $PWD/components/scheduler/renew_letsencrypt_ssl_cert_extra_job.env
fi
EOF
elif [ -n "$KITENAME" -a -n "$KITESUBDOMAIN" ]; then
cat <<EOF >> env.local