Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove WPS name in output paths #116

Merged
merged 4 commits into from Feb 22, 2023
Merged

Remove WPS name in output paths #116

merged 4 commits into from Feb 22, 2023

Conversation

aulemahal
Copy link
Collaborator

Remove WPS name from output url for notebook testing

This replaces bird-house/finch#273 as a solution for the change in output paths in wps outputs.

The previous solution was breaking finch's CI. pyWPS wants "outputpath" and "outputurl" to match. This means that if "outputurl" has two levels, "outputpath" must have the same second level. But, when developing and testing (both locally and on the CI) "outputpath" is not explicitly set, thus is the default, thus does not match "outputurl"'s second level. The previous solution was patching the notebooks directly when generating them, but nothing of the sort was done in the test suite, which then failed.

The proper way to fix this in the testing is this sanitizing file, thus I am suggesting to take care of the path issue here directly, and only here.

The related rollback in finch will be done in bird-house/finch#272.

Remove WPS name from output url for notebook testing
@aulemahal aulemahal requested a review from tlvu February 22, 2023 16:59
@aulemahal aulemahal changed the title rm wps name in output Remove WPS name in output paths Feb 22, 2023
@tlvu
Copy link
Contributor

tlvu commented Feb 22, 2023

How about the reversed: /wpsoutputsfinch (one level) replace to /wpsoutputs/finch 2 levels to match production.

So the one level will be transparent for dev local env.

This is so we can refresh the notebook using the production Jupyter env also, in addition to using make.

This means we have to be able to set outputurl by make start which I am not sure which config file it is reading right now. If this can not be done then your option is the only working one.

Also have to handle raven as well, in addition to finch and weaver.

I can work on this. But I'll need your help to test the dev workflow.

@aulemahal
Copy link
Collaborator Author

If the outputurl has only 1 level, I think we do not need to change the outputpath. So your solution works here also, but I don't see the benefits as it's the same number of file changes?

I am assuming the production notebook tests are passing through this sanitizing file too? Then is there a reason for the pushed notebooks to fit the production one exactly?

So for example:

This proposition necessitates cleaning up finch's Makefile and this PR.

This proposition necessitates modifying finch's config and Makefile slightly and an modified version of this PR.

En somme, both proposition are equivalent to me. Unless we want the notebooks to have realistic (but not exact) paths in their output. But I don't understand why this would be important.

@tlvu
Copy link
Contributor

tlvu commented Feb 22, 2023

Your proposition

* Notebook on finch's docs,  with sed :  http://pavics.ouranos.ca/wpsoutputsfinch/RANDOM/out.nc

The sed from make should still produce /wpsoutputs/finch/RANDOM/out.nc

  * Sanitized: http://WPS_URL/wpsoutputs/PLACEHOLDER/out.nc

All Sanitized will be /wpsoutputs/finch/PLACEHOLDER/out.nc

This proposition necessitates modifying finch's config and Makefile slightly and an modified version of this PR.

Agreed

En somme, both proposition are equivalent to me. Unless we want the notebooks to have realistic (but not exact) paths in their output. But I don't understand why this would be important.

Agreed, the difference is for simple output refresh case I can take directly the output produced by Jenkins nightly and commit them straight. With your proposition, to refresh, I always have to make develop and make start and make refresh-notebooks.

@aulemahal
Copy link
Collaborator Author

Haha, I'm still confused. Why couldn't you commit the output of jenkins directly with my proposition? If they passed jenkins, they would pass the CI tests, no?

But also, I'll go with the proposition that demands the less job in any case.

@tlvu
Copy link
Contributor

tlvu commented Feb 22, 2023

But also, I'll go with the proposition that demands the less job in any case.

Exactly. I think I'll just actually try your PR instead of trying to interpret it. Maybe it handles my usecase to straight commit from Jenkins output already so that will be the less work solution :D

Slash was only included in the weaver case.
Copy link
Contributor

@tlvu tlvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good for me, works both on Jenkins and locally in finch repo (make test-notebooks) with the notebook taken straight from Jenkins output.

However, please also handle all the other birds and remove the trailing slash (/wpsoutputs/finch/ should become /wpsoutputs/) removing finch/ including the ending slash.

Thanks for doing it.

@tlvu
Copy link
Contributor

tlvu commented Feb 22, 2023

The related rollback in finch will be done in bird-house/finch#272.

There is no rollback required. Leave the sed that append finch/ to wpsouputs/ since it does not break anything and any output refresh directly from Jenkins output will contain that finch/ anyways. Keep it there for consistency when doing make refresh-notebooks.

@@ -71,17 +71,17 @@ replace: 100% Done | 1.0s

[travis-ci_wps_output_url]
# output_netcdf='http://localhost:5000/outputs/50c0a3f8-67c7-11ea-9e2d-b06ebf31cced/frost-days_SRES-A2-experiment_20460101-20650101.nc
regex: http://localhost:\d+/outputs/(?:(?:finch/)|(?:weaver/))?
regex: http://localhost:\d+/outputs/(?:[a-z-_]*/)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh dash being a special char, I think it might need to be escaped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is pure Python regex, I think it doesn't need to be escaped, but it's less confusing if it is!

@@ -71,17 +71,17 @@ replace: 100% Done | 1.0s

[travis-ci_wps_output_url]
# output_netcdf='http://localhost:5000/outputs/50c0a3f8-67c7-11ea-9e2d-b06ebf31cced/frost-days_SRES-A2-experiment_20460101-20650101.nc
regex: http://localhost:\d+/outputs/(?:[a-z-_]*/)?
regex: http://localhost:\d+/outputs/(?:[a-z\-\_]*/)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, you over do it :D underscore is not a special char but I think there is no harm so we can leave it like that.

@aulemahal aulemahal merged commit dca3028 into master Feb 22, 2023
@aulemahal aulemahal deleted the upd-sanitize branch February 22, 2023 22:42
aulemahal added a commit to bird-house/finch that referenced this pull request Jun 2, 2023
…ndling (#272)

## Overview

This PR fixes #270 
Changes:

* Update to xclim 0.40.0.
- `fit` and `stats` are now `discharge_distribution_fit` and
`discharge_stats`.
- The two above and `freq_analysis` now take `discharge` as input (not
`da`).

* Passing an empty string to `ensemble_percentiles` in ensemble
processes will return the merged un-reduced ensemble. The different
members are listed along the `realization` coordinates through raw names
allowing for basic distinction between the input members.
* No more configuration handled by cli arguments to `finch start`.
Instead of having 2 default config (`default.cfg` and those arguments),
everything is handled by `default.cfg`. This removes the need for the
templates folder and `jinja2`.
* No more replacement of WPS urls in the notebooks. Before, we were
transforming the `localhost:5000` urls to `pavics.ouranos.ca` but
changes in the folder hierarchy on the production server have made this
method a bit awkward.

## Related Issue / Discussion
Ouranosinc/PAVICS-e2e-workflow-tests#116

## Additional Information
The easy solution for populating the `realization` coordinate is to use
the filenames. Those are constructed from the source, so there might be
much more information than necessary. Also, for the multi-scenario case,
I had to explicitly remove the "rcpXX" string from the name to allow
correct concatenation.

A cleaner solution would be to propagate what our `Dataset` class has
parsed from the filename, but that would demand either clean attributes
all along or to pass filenames along the computation in a more
structured manner.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants