
Sync Raven testdata to Thredds for Raven tutorial notebooks #72

Merged
tlvu merged 26 commits into master from sync-raven-testdata-to-thredds Oct 15, 2020

tlvu (Collaborator) commented Oct 9, 2020

Part of Ouranosinc/raven#185

Leverage the cron daemon of the scheduler component to sync Raven testdata to Thredds for the Raven tutorial notebooks.

The pre-configured cronjob is activated via env.local, as usual for infra-as-code.

The new generic deploy-data script can clone any number of git repos and sync any number of folders from each repo to any number of local folders, with the ability to cherry-pick only the few files needed. Raven testdata contains many types of files, but only the .nc files need to be synced to Thredds; filtering avoids polluting the Thredds storage under /data/datasets/testdata/raven.
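For illustration, here is a minimal Python sketch of the kind of config the script consumes and how it could expand into individual sync jobs. The keys `config.git_ssh_identity_file`, `deploy[].repo_url`, `deploy[].branch` (default `origin/master`) and `deploy[].checkout_name` appear in the `yq` queries of the traces further down this page; the per-folder keys `dir_maps`, `source_dir`, `dest_dir` and `include`, and the function `expand_jobs`, are hypothetical names used only in this sketch. The YAML is written as the equivalent Python dict to keep the example dependency-free.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

DEFAULT_BRANCH = "origin/master"  # default passed to yq --defaultValue in the trace

# Equivalent of a YAML config file, written as a Python dict.
# "dir_maps", "source_dir", "dest_dir" and "include" are hypothetical keys.
SAMPLE_CONFIG = {
    "deploy": [
        {
            "repo_url": "https://github.com/Ouranosinc/raven",
            "checkout_name": "raven",
            "dir_maps": [
                {
                    "source_dir": "tests/testdata",
                    "dest_dir": "/data/datasets/testdata/raven",
                    "include": ["*.nc"],
                },
            ],
        },
    ],
}


@dataclass(frozen=True)
class SyncJob:
    repo_url: str
    branch: str
    source: PurePosixPath  # folder inside the local checkout
    dest: PurePosixPath    # local destination folder
    include: tuple         # file patterns to cherry-pick


def expand_jobs(config, cache_dir):
    """Expand every repo and every folder mapping into one SyncJob each."""
    jobs = []
    for repo in config["deploy"]:
        # Required per-repo values, like the ensure_not_empty checks in the trace.
        for field in ("repo_url", "checkout_name"):
            if not repo.get(field):
                raise ValueError(f"missing required key {field!r}")
        checkout = PurePosixPath(cache_dir) / repo["checkout_name"]
        branch = repo.get("branch") or DEFAULT_BRANCH
        for dir_map in repo.get("dir_maps", []):
            jobs.append(SyncJob(
                repo_url=repo["repo_url"],
                branch=branch,
                source=checkout / dir_map["source_dir"],
                dest=PurePosixPath(dir_map["dest_dir"]),
                include=tuple(dir_map.get("include", ["*"])),
            ))
    return jobs
```

With the sample config and the cache directory /data/deploy_data_cache/deploy_raven_testdata_to_thredds (the one used in the later logs), this yields a single job syncing /data/deploy_data_cache/deploy_raven_testdata_to_thredds/raven/tests/testdata to /data/datasets/testdata/raven on branch origin/master. Deploying another repo only means adding another entry under `deploy`.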

Limitations of the first version of this deploy-data script:

  • Does not handle re-organizing the file layout: for now this is a pure sync with very limited rsync filtering (the tutorial notebooks deploy from multiple repos and will need the file layout re-organized).
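The "very limited rsync filtering" is the filter chain visible in the sync logs below: `--include=*/ --include=*.nc --exclude=*` plus `--prune-empty-dirs`. rsync applies the first rule that matches, matches a pattern without a slash against the last path component, treats a trailing slash as "directories only", and transfers anything no rule matches. The Python sketch below emulates only that simple subset of rsync's rules (it is not rsync's full matching algorithm; `is_included` and `select_files` are hypothetical helpers) to show which files the chain keeps.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Filter chain from the deploy logs, in order: first match wins.
RAVEN_FILTERS = [("+", "*/"), ("+", "*.nc"), ("-", "*")]


def is_included(path, is_dir, filters):
    """Emulate rsync's first-match-wins include/exclude for simple patterns.

    Only slash-free patterns (optionally ending in "/" for directories)
    are supported; they are matched against the last path component.
    """
    name = PurePosixPath(path).name
    for action, pattern in filters:
        dir_only = pattern.endswith("/")
        if dir_only and not is_dir:
            continue
        if fnmatchcase(name, pattern.rstrip("/")):
            return action == "+"
    return True  # rsync transfers anything no rule matches


def select_files(paths, filters):
    """Return the regular files (from a list of file paths) that would be synced.

    Every ancestor directory must be included too, because rsync never
    descends into an excluded directory. Only files are returned, so a
    directory without any selected file is not created, as with
    --prune-empty-dirs.
    """
    selected = []
    for path in paths:
        dirs_ok = all(
            is_included(str(parent), True, filters)
            for parent in PurePosixPath(path).parents
            if str(parent) != "."
        )
        if dirs_ok and is_included(path, False, filters):
            selected.append(path)
    return sorted(selected)
```

For example, given the hypothetical files gr4j_cemaneige/pr.nc and gr4j_cemaneige/notes.txt, only gr4j_cemaneige/pr.nc is selected: the `*/` rule lets rsync walk into every directory, and the final `*` exclude drops everything that is not `.nc`.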

So the script has room to grow. I see it as a generic solution to the recurring problem "take files from various git repos and deploy them somewhere automatically". If we need to deploy another repo, just write a new config file instead of writing the same boilerplate code again.

Minor unrelated changes in this PR:

  • Update the README to reference the new birdhouse-deploy-ouranos repo.
  • Make sourcing the various pre-configured cronjobs backward-compatible with older versions of the repo where those cronjobs did not exist yet.

tlvu added 17 commits October 7, 2020 10:22
Code is working against included deploy-data.config.sample.yml.
…ost due to volume mount

Fix the following error:

+ docker run --rm --name deploy_data_yq -v /deploy-data-raven-testdata-to-thredds.yml:/deploy-data-raven-testdata-to-thredds.yml:ro mikefarah/yq:3.3.4 yq r -p v /deploy-data-raven-testdata-to-thredds.yml '[*].repo_url'
Error: yaml: input error: read /deploy-data-raven-testdata-to-thredds.yml: is a directory
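This error is typical of Docker's `-v` bind mount: when the host path does not exist, Docker creates an empty directory at that path, so `yq` then reads a directory. Here the host side of the mount is /deploy-data-raven-testdata-to-thredds.yml at the filesystem root, whereas the later trace mounts the file by its full path under /vagrant/birdhouse/deployment. A defensive pattern, sketched in Python with a hypothetical helper `yq_read_cmd`, is to resolve the config path to an absolute path and check that it is a regular file before building the `docker run` command; the image tag and the `yq r -p v` arguments are copied from the trace.

```python
import os

YQ_IMAGE = "mikefarah/yq:3.3.4"  # image used in the trace


def yq_read_cmd(config_file, expression, default=None):
    """Build the `docker run ... yq r -p v` argv for reading one expression.

    The path is made absolute and must be an existing regular file: with
    `-v`, Docker would otherwise create a directory at a missing host path.
    """
    path = os.path.abspath(config_file)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"config file not found: {path}")
    cmd = ["docker", "run", "--rm", "--name", "deploy_data_yq",
           "-v", f"{path}:{path}:ro", YQ_IMAGE, "yq", "r", "-p", "v", path]
    if default is not None:
        cmd += ["--defaultValue", default]
    cmd.append(expression)
    return cmd
```

Failing early with a clear "config file not found" message is easier to diagnose than the "is a directory" error above, and it avoids leaving a stray directory behind on the host.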
Raven testdata sync log:

+ docker run --rm --name deploy_data_rsync --volume /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven/tests/testdata:/tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven/tests/testdata:ro --volume /data/datasets/testdata:/data/datasets/testdata:rw --user 0:0 --entrypoint /usr/bin/rsync eeacms/rsync:2.3 --recursive --links --checksum --delete --itemize-changes --human-readable --verbose --prune-empty-dirs '--include=*/' '--include=*.nc' '--exclude=*' /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven/tests/testdata/ /data/datasets/testdata/raven
building file list ... done
created directory /data/datasets/testdata/raven
cd+++++++++ ./
cd+++++++++ XSS_forecast_data/
>f+++++++++ XSS_forecast_data/XSS_fcst_det.nc
>f+++++++++ XSS_forecast_data/XSS_fcst_ens.nc
>f+++++++++ XSS_forecast_data/XSS_obs.nc
cd+++++++++ cmip5/
>f+++++++++ cmip5/tas_Amon_CanESM2_rcp85_r1i1p1_200601-210012_subset.nc
cd+++++++++ gr4j_cemaneige/
>f+++++++++ gr4j_cemaneige/evap.nc
>f+++++++++ gr4j_cemaneige/pr.nc
>f+++++++++ gr4j_cemaneige/tas.nc
cd+++++++++ hydro_simulations/
>f+++++++++ hydro_simulations/raven-gr4j-cemaneige-sim_gr4jcn-0_Hydrographs.nc
>f+++++++++ hydro_simulations/raven-gr4j-cemaneige-sim_hmets-0_Hydrographs.nc
cd+++++++++ input2d/
>f+++++++++ input2d/input2d.nc
cd+++++++++ ostrich-gr4j-cemaneige/
>f+++++++++ ostrich-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ ostrich-hbv-ec/
>f+++++++++ ostrich-hbv-ec/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ ostrich-hmets/
>f+++++++++ ostrich-hmets/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ ostrich-mohyse/
>f+++++++++ ostrich-mohyse/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ raven-gr4j-cemaneige/
>f+++++++++ raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc
>f+++++++++ raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily_2d.nc
>f+++++++++ raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily_3d.nc
cd+++++++++ ts_stats_outputs/
>f+++++++++ ts_stats_outputs/out.nc

sent 18.72M bytes  received 446 bytes  37.44M bytes/sec
total size is 18.71M  speedup is 1.00

Test deleting, modifying and adding a new .nc file:

+ docker run --rm --name deploy_data_rsync --volume /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven/tests/testdata:/tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven/tests/testdata:ro --volume /data/datasets/testdata:/data/datasets/testdata:rw --user 0:0 --entrypoint /usr/bin/rsync eeacms/rsync:2.3 --recursive --links --checksum --delete --itemize-changes --human-readable --verbose --prune-empty-dirs '--include=*/' '--include=*.nc' '--exclude=*' /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven/tests/testdata/ /data/datasets/testdata/raven
building file list ... done
*deleting   gr4j_cemaneige/evap.nc
>fcsT...... gr4j_cemaneige/pr.nc
>f+++++++++ gr4j_cemaneige/toto.nc
tlvu changed the title from "Sync Raven testdata to Thredds" to "Sync Raven testdata to Thredds for Raven tutorial notebooks" Oct 9, 2020

huard (Collaborator) left a comment

May want a second review from someone more competent in bash.

matprov (Collaborator) left a comment

LGTM

Other configs need to be set before the config file is parsed, so they cannot be set in the config file.
… repos

+ docker run --rm --name deploy_data_yq -v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:/vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:ro mikefarah/yq:3.3.4 yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml config.git_ssh_identity_file
+ GIT_SSH_IDENTITY_FILE=
+ '[' -z  ]
+ GIT_SSH_IDENTITY_FILE=/home/vagrant/.ssh/id_rsa_git_ssh_read_only
+ '[' '!' -z /home/vagrant/.ssh/id_rsa_git_ssh_read_only ]
+ export 'GIT_SSH_COMMAND=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentityFile=/home/vagrant/.ssh/id_rsa_git_ssh_read_only'
+ yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml 'deploy[*].repo_url'
+ docker run --rm --name deploy_data_yq -v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:/vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:ro mikefarah/yq:3.3.4 yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml 'deploy[*].repo_url'
+ GIT_REPO_URLS=git@github.com:Ouranosinc/raven.git
+ ensure_not_empty git@github.com:Ouranosinc/raven.git
+ '[' -z git@github.com:Ouranosinc/raven.git ]
+ REPO_NUM=0
+ yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml --defaultValue origin/master 'deploy[0].branch'
+ docker run --rm --name deploy_data_yq -v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:/vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:ro mikefarah/yq:3.3.4 yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml --defaultValue origin/master 'deploy[0].branch'
+ GIT_BRANCH=origin/master
+ ensure_not_empty origin/master
+ '[' -z origin/master ]
+ yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml 'deploy[0].checkout_name'
+ docker run --rm --name deploy_data_yq -v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:/vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml:ro mikefarah/yq:3.3.4 yq r -p v /vagrant/birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml 'deploy[0].checkout_name'
+ GIT_CHECKOUT_NAME=raven
+ ensure_not_empty raven
+ '[' -z raven ]
+ CLONE_DEST=/tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven
+ '[' '!' -d /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven ]
+ echo 'checkout repo '"'"'git@github.com:Ouranosinc/raven.git'"'"' on branch '"'"'origin/master'"'"' to '"'"'/tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven'"'"
checkout repo 'git@github.com:Ouranosinc/raven.git' on branch 'origin/master' to '/tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven'
+ git clone git@github.com:Ouranosinc/raven.git /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven
Cloning into '/tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven'...
Warning: Permanently added 'github.com,140.82.114.4' (RSA) to the list of known hosts.
Updating files: 100% (383/383), done.
+ cd /tmp/deploy_raven_testdata_to_thredds_checkout_cache/raven
+ git checkout origin/master
Note: switching to 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 58e8978 Merge pull request #279 from Ouranosinc/climatologyESP
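The trace above shows the per-repo logic: read `config.git_ssh_identity_file`, fall back to a default identity file when it is empty (the default presumably comes from the environment), export `GIT_SSH_COMMAND` with that key, then clone into `CLONE_DEST` if it does not exist yet and check out the configured branch; the next trace shows an existing checkout being refreshed with `git fetch --prune --all`. The Python sketch below restates that decision logic as a list of commands. The function names are hypothetical, and running `git checkout` after the fetch for an existing clone is an assumption.

```python
import os


def git_ssh_command(config_identity, default_identity):
    """Return the GIT_SSH_COMMAND value seen in the trace, or None without a key."""
    identity = config_identity or default_identity  # empty config value -> default
    if not identity:
        return None
    return ("ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
            f" -o IdentityFile={identity}")


def plan_checkout(repo_url, branch, clone_dest):
    """List the git commands needed to get `branch` checked out in clone_dest."""
    steps = []
    if not os.path.isdir(clone_dest):
        steps.append(["git", "clone", repo_url, clone_dest])
    else:
        steps.append(["git", "fetch", "--prune", "--all"])  # run inside clone_dest
    steps.append(["git", "checkout", branch])  # run inside clone_dest
    return steps
```

Checking out `origin/master` (a remote-tracking ref) rather than a local branch is what produces the "detached HEAD" advice in the log; it also means a later fetch plus checkout moves the working copy to the latest remote commit without any merge.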
…equire credentials

+ git fetch --prune --all
Fetching origin
Host key verification failed.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
+ echo 'git fetch failed'
git fetch failed
+ exit 1
…or persistence

/tmp will disappear on reboot.
+ git clone https://github.com/Ouranosinc/raven /data/deploy_data_cache/deploy_raven_testdata_to_thredds/raven
Cloning into '/data/deploy_data_cache/deploy_raven_testdata_to_thredds/raven'...
Updating files: 100% (383/383), done.
+ exit 1
+ git clone https://github.com/Ouranosinc/raven /data/deploy_data_cache/deploy_raven_testdata_to_thredds/raven
Cloning into '/data/deploy_data_cache/deploy_raven_testdata_to_thredds/raven'...
Updating files: 100% (383/383), done.
+ exit 1
tlvu merged commit 5ba68a0 into master Oct 15, 2020
tlvu deleted the sync-raven-testdata-to-thredds branch October 15, 2020 02:45

tlvu (Collaborator, Author) commented Oct 15, 2020

Tagged 1.11.4.

Autodeployed to prod:

triggerdeploy finished START_TIME=2020-10-15T05:07:03+0000
triggerdeploy finished   END_TIME=2020-10-15T05:08:45+0000

The following Raven testdata are now available at https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/testdata/raven/catalog.html:

created directory /data/datasets/testdata/raven
cd+++++++++ ./
cd+++++++++ XSS_forecast_data/
>f+++++++++ XSS_forecast_data/XSS_fcst_det.nc
>f+++++++++ XSS_forecast_data/XSS_fcst_ens.nc
>f+++++++++ XSS_forecast_data/XSS_obs.nc
cd+++++++++ cmip5/
>f+++++++++ cmip5/tas_Amon_CanESM2_rcp85_r1i1p1_200601-210012_subset.nc
cd+++++++++ gr4j_cemaneige/
>f+++++++++ gr4j_cemaneige/evap.nc
>f+++++++++ gr4j_cemaneige/pr.nc
>f+++++++++ gr4j_cemaneige/tas.nc
cd+++++++++ hydro_simulations/
>f+++++++++ hydro_simulations/raven-gr4j-cemaneige-sim_gr4jcn-0_Hydrographs.nc
>f+++++++++ hydro_simulations/raven-gr4j-cemaneige-sim_hmets-0_Hydrographs.nc
cd+++++++++ input2d/
>f+++++++++ input2d/input2d.nc
cd+++++++++ ostrich-gr4j-cemaneige/
>f+++++++++ ostrich-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ ostrich-hbv-ec/
>f+++++++++ ostrich-hbv-ec/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ ostrich-hmets/
>f+++++++++ ostrich-hmets/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ ostrich-mohyse/
>f+++++++++ ostrich-mohyse/Salmon-River-Near-Prince-George_meteo_daily.nc
cd+++++++++ raven-gr4j-cemaneige/
>f+++++++++ raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc
>f+++++++++ raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily_2d.nc
>f+++++++++ raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily_3d.nc
cd+++++++++ ts_stats_outputs/
>f+++++++++ ts_stats_outputs/out.nc

sent 18.72M bytes  received 446 bytes  37.44M bytes/sec
total size is 18.71M  speedup is 1.00
+ expr 0 + 1
+ DIR_NUM=1
+ expr 0 + 1
+ REPO_NUM=1
+ cleanup_on_exit
+ set +x

datadeploy finished START_TIME=2020-10-15T05:30:02+0000
datadeploy finished   END_TIME=2020-10-15T05:30:59+0000

tlvu added a commit that referenced this pull request Dec 17, 2020
…ctions-for-deploy-data

Add ability to execute post actions for deploy-data script.

Script `deploy-data` was previously introduced in PR #72 to deploy any files from any git repos to the local host it runs on.

Now it gains the ability to run commands from the git repo it just pulled.

Being able to run commands opens new possibilities:
* post-processing after the files from the git repo are deployed (e.g. advanced file re-mapping)
* execute up-to-date scripts from git repos (PR bird-house/birdhouse-deploy-ouranos#2)

Combining `deploy-data` with the `scheduler` component gives cronjobs a way to always automatically execute the most up-to-date version of any script from any git repo.
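A minimal sketch of the post-action idea, assuming each repo entry carries a list of shell commands that run from the root of the freshly pulled checkout, so the cronjob always executes the repo's latest scripts. The key names `post_actions` and `action` and the function `run_post_actions` are assumptions for this example, not confirmed by this page.

```python
import subprocess


def run_post_actions(repo_config, checkout_dir):
    """Run each post action with the checkout as working directory.

    `post_actions` / `action` are hypothetical config keys. check=True stops
    at the first failing command, similar to `set -e` in a shell script.
    """
    for step in repo_config.get("post_actions", []):
        subprocess.run(step["action"], shell=True, cwd=checkout_dir, check=True)
```

Running the commands only after the pull means a fix pushed to the git repo is picked up by the very next cron run, with no redeploy of the host.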

5 participants