
Commit

Update xclim - Ensembles without percentiles - Clean up dev config handling (#272)

## Overview

This PR fixes #270 
Changes:

* Update to xclim 0.40.0.
- `fit` and `stats` are now `discharge_distribution_fit` and
`discharge_stats`.
- The two above and `freq_analysis` now take `discharge` as input (not
`da`).

* Passing an empty string to `ensemble_percentiles` in ensemble
processes now returns the merged, un-reduced ensemble. The individual
members are listed along the `realization` coordinate, using their raw
file names, which allows a basic distinction between the input members.
* `finch start` no longer handles configuration through CLI arguments.
Instead of two layers of defaults (`default.cfg` plus those arguments),
everything is handled by `default.cfg`. This removes the need for the
templates folder and the `jinja2` dependency.
* WPS URLs are no longer replaced in the notebooks. Previously, we
transformed the `localhost:5000` URLs into `pavics.ouranos.ca` ones, but
changes in the folder hierarchy on the production server have made this
method awkward.
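The new percentile input handling can be sketched roughly as follows. This is a minimal sketch of the parsing logic, not the actual finch code; `parse_percentiles` is a hypothetical helper name:

```python
def parse_percentiles(percentiles_string: str) -> list:
    """Turn the WPS `ensemble_percentiles` string into a list of integers.

    An empty string (or the literal "None") disables the reduction: the
    process then returns the merged, un-reduced ensemble, with members
    along the `realization` coordinate, instead of computing percentiles.
    """
    if not percentiles_string or percentiles_string == "None":
        return []
    return [int(p.strip()) for p in percentiles_string.split(",")]


# The default "10,50,90" yields the usual three percentiles,
# while "" signals the pass-through behaviour.
print(parse_percentiles("10,50,90"))  # [10, 50, 90]
print(parse_percentiles(""))          # []
```

Downstream, an empty list simply skips the `ensemble_percentiles` call and passes the merged ensemble through unchanged, which mirrors the `if percentiles: ... else: ...` branch added in `make_ensemble`.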

## Related Issue / Discussion
Ouranosinc/PAVICS-e2e-workflow-tests#116

## Additional Information
The easy solution for populating the `realization` coordinate is to use
the filenames. Since those are constructed from the source, they may
carry much more information than necessary. Also, for the multi-scenario
case, I had to explicitly remove the "rcpXX" string from the names to
allow correct concatenation.

A cleaner solution would be to propagate what our `Dataset` class has
parsed from the filename, but that would require either clean attributes
throughout or passing the filenames along the computation in a more
structured manner.
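The filename-based approach can be illustrated like this. The file names below are hypothetical examples, and the helper names are mine; the actual labels depend entirely on the source data:

```python
from pathlib import Path


def realization_labels(files):
    """Use the file stem (name without suffix) as the member label,
    mirroring `realizations=[file.stem for file in files]`."""
    return [Path(f).stem for f in files]


def strip_scenario(labels, scenario):
    """Remove the scenario token so member labels match across scenarios,
    allowing concatenation along a new "scenario" dimension."""
    return [lab.replace(scenario, "") for lab in labels]


rcp45 = realization_labels(["tx_mean_MIROC5_rcp45_r1i1p1.nc"])
rcp85 = realization_labels(["tx_mean_MIROC5_rcp85_r1i1p1.nc"])

# Before stripping, the labels differ by scenario and cannot be aligned;
# after stripping, they are identical, so concatenation works.
assert rcp45 != rcp85
assert strip_scenario(rcp45, "rcp45") == strip_scenario(rcp85, "rcp85")
```

Note the plain `str.replace` leaves a double underscore behind (`tx_mean_MIROC5__r1i1p1`), which is harmless for alignment but shows how much source-derived clutter the labels can carry.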
aulemahal committed Jun 2, 2023
2 parents dd35019 + e38c68f commit 1f2dcca
Showing 19 changed files with 367 additions and 2,353 deletions.
5 changes: 4 additions & 1 deletion CHANGES.rst
@@ -2,10 +2,13 @@ Changes
*******

TBD (unreleased)
===================
================
* Fixed `iter_local` when depth > 0 to avoid all files being considered twice
* Revised documentation configuration on ReadTheDocs to leverage Anaconda (Mambaforge)
* Minor adjustments to dependency configurations
* Update to xclim 0.40.0.
* Passing an empty string to `ensemble_percentiles` in ensemble processes will return the merged, un-reduced ensemble. The individual members are listed along the `realization` coordinate, using their raw names, allowing basic distinction between the input members.
* Removed configuration elements handling from `finch start`. One can still pass custom config files, but all configuration defaults are handled by `finch/default.cfg` and the WSGI function. `jinja2` is not a dependency anymore.

0.10.0 (2022-11-04)
===================
7 changes: 1 addition & 6 deletions Makefile
@@ -4,11 +4,6 @@ APP_NAME := finch

WPS_URL = http://localhost:5000

# Used in target refresh-notebooks to make it looks like the notebooks have
# been refreshed from the production server below instead of from the local dev
# instance so the notebooks can also be used as tutorial notebooks.
OUTPUT_URL = https://pavics.ouranos.ca/wpsoutputs/finch

SANITIZE_FILE := https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/raw/master/notebooks/output-sanitize.cfg

# end of configuration
@@ -136,7 +131,7 @@ lint:
.PHONY: refresh-notebooks
refresh-notebooks:
@echo "Refresh all notebook outputs under docs/source/notebooks"
@bash -c 'for nb in $(CURDIR)/docs/source/notebooks/*.ipynb; do WPS_URL="$(WPS_URL)" jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=60 --output "$$nb" "$$nb"; sed -i "s@$(WPS_URL)/outputs/@$(OUTPUT_URL)/@g" "$$nb"; done; cd $(APP_ROOT)'
@bash -c 'for nb in $(CURDIR)/docs/source/notebooks/*.ipynb; do WPS_URL="$(WPS_URL)" jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=60 --output "$$nb" "$$nb"; done; cd $(APP_ROOT)'

## Sphinx targets

138 changes: 103 additions & 35 deletions docs/source/notebooks/dap_subset.ipynb

Large diffs are not rendered by default.

277 changes: 120 additions & 157 deletions docs/source/notebooks/finch-usage.ipynb

Large diffs are not rendered by default.

2,018 changes: 0 additions & 2,018 deletions docs/source/notebooks/subset.ipynb

This file was deleted.

2 changes: 1 addition & 1 deletion environment-docs.yml
@@ -12,4 +12,4 @@ dependencies:
- sphinx >=4.0
- sphinxcontrib-bibtex
- unidecode
- xclim =0.38 # remember to match xclim version in requirements_docs.txt as well
- xclim =0.40 # remember to match xclim version in requirements_docs.txt as well
9 changes: 4 additions & 5 deletions environment.yml
@@ -10,12 +10,11 @@ dependencies:
- dask
- distributed
- geopandas
- jinja2
- h5netcdf
- netcdf4
- numpy
- pandas
- parse
- pint <0.20 # xclim < 0.39 is broken by pint >= 0.20
- psutil
- python-slugify
- pywps >=4.5.1
@@ -25,6 +24,6 @@ dependencies:
- sentry-sdk
- siphon
- unidecode
- xarray >=0.18.2
- xclim =0.38 # remember to match xclim version in requirements_docs.txt as well
- xesmf >=0.6.2
- xarray >=2023.01.0
- xclim =0.40 # remember to match xclim version in requirements_docs.txt as well
- xesmf >=0.7
85 changes: 3 additions & 82 deletions finch/cli.py
@@ -11,7 +11,6 @@

import click
import psutil
from jinja2 import Environment, PackageLoader
from pywps import configuration

from . import wsgi
@@ -20,17 +19,6 @@

CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])

template_env = Environment(loader=PackageLoader("finch", "templates"), autoescape=True)


def write_user_config(**kwargs): # noqa: D103
config_templ = template_env.get_template("pywps.cfg")
rendered_config = config_templ.render(**kwargs)
config_file = os.path.abspath(os.path.join(os.path.curdir, ".custom.cfg"))
with open(config_file, "w") as fp:
fp.write(rendered_config)
return config_file


def get_host(): # noqa: D103
url = configuration.get_config_value("server", "url")
@@ -126,89 +114,22 @@ def stop():
"--bind-host",
"-b",
metavar="IP-ADDRESS",
default="127.0.0.1",
help="IP address used to bind service.",
)
@click.option("--daemon", "-d", is_flag=True, help="run in daemon mode.")
@click.option(
"--hostname",
metavar="HOSTNAME",
default="localhost",
help="hostname in PyWPS configuration.",
)
@click.option(
"--port", metavar="PORT", default="5000", help="port in PyWPS configuration."
)
@click.option(
"--maxsingleinputsize",
default="200mb",
help="maxsingleinputsize in PyWPS configuration.",
)
@click.option(
"--maxprocesses",
metavar="INT",
default="10",
help="maxprocesses in PyWPS configuration.",
)
@click.option(
"--parallelprocesses",
metavar="INT",
default="2",
help="parallelprocesses in PyWPS configuration.",
)
@click.option(
"--log-level",
metavar="LEVEL",
default="INFO",
help="log level in PyWPS configuration.",
)
@click.option(
"--log-file",
metavar="PATH",
default="pywps.log",
help="log file in PyWPS configuration.",
)
@click.option(
"--database",
default="sqlite:///pywps-logs.sqlite",
help="database in PyWPS configuration",
)
def start(
config,
bind_host,
daemon,
hostname,
port,
maxsingleinputsize,
maxprocesses,
parallelprocesses,
log_level,
log_file,
database,
):
def start(config, bind_host, daemon):
"""Start PyWPS service.
This service is by default available at http://localhost:5000/wps
The default configuration is from finch/default.cfg
"""
if os.path.exists(PID_FILE):
click.echo(f'PID file exists: "{PID_FILE}". Service still running?')
os._exit(0)
cfgfiles = []
cfgfiles.append(
write_user_config(
wps_hostname=hostname,
wps_port=port,
wps_maxsingleinputsize=maxsingleinputsize,
wps_maxprocesses=maxprocesses,
wps_parallelprocesses=parallelprocesses,
wps_log_level=log_level,
wps_log_file=log_file,
wps_database=database,
)
)
if config:
cfgfiles.append(config)
app = wsgi.create_app(cfgfiles)
app = wsgi.create_app(cfgfiles) # Will add default.cfg to the config
# let's start the service ...
# See:
# * https://github.com/geopython/pywps-flask/blob/master/demo.py
2 changes: 1 addition & 1 deletion finch/default.cfg
@@ -8,7 +8,7 @@ provider_url=https://finch.readthedocs.org/en/latest/

[server]
url = http://localhost:5000/wps
outputurl = http://localhost:5000/outputs
outputurl = http://localhost:5000/outputs/
allowedinputpaths = /
maxsingleinputsize = 200mb
maxprocesses = 10
7 changes: 6 additions & 1 deletion finch/processes/__init__.py
@@ -37,7 +37,12 @@ def filter_func(elem):
ind.realm in realms
and ind.identifier is not None
and name not in exclude
and ind.identifier.upper() == ind._registry_id # official indicator
and (
ind.identifier.upper() == ind._registry_id # official indicator
or ind._registry_id.startswith(
"xclim.core.indicator"
) # oops. Bug for discharge_distribution_fit in xclim 0.40
)
)

out = dict(filter(filter_func, xclim_registry.items()))
28 changes: 23 additions & 5 deletions finch/processes/ensemble_utils.py
@@ -12,7 +12,7 @@
import xarray as xr
from pandas.api.types import is_numeric_dtype
from parse import parse
from pywps import FORMATS, ComplexInput, Process, configuration
from pywps import FORMATS, ComplexInput, Process
from pywps.app.exceptions import ProcessError
from pywps.exceptions import InvalidParameterValue
from siphon.catalog import TDSCatalog
@@ -348,7 +348,9 @@ def make_file_groups(files_list: List[Path], variables: set) -> List[Dict[str, P
def make_ensemble(
files: List[Path], percentiles: List[int], average_dims: Optional[Tuple[str]] = None
) -> None: # noqa: D103
ensemble = ensembles.create_ensemble(files)
ensemble = ensembles.create_ensemble(
files, realizations=[file.stem for file in files]
)
# make sure we have data starting in 1950
ensemble = ensemble.sel(time=(ensemble.time.dt.year >= 1950))

@@ -361,7 +363,12 @@ def make_ensemble(
if average_dims is not None:
ensemble = ensemble.mean(dim=average_dims)

ensemble_percentiles = ensembles.ensemble_percentiles(ensemble, values=percentiles)
if percentiles:
ensemble_percentiles = ensembles.ensemble_percentiles(
ensemble, values=percentiles
)
else:
ensemble_percentiles = ensemble

# Doy data converted previously is converted back.
for v in ensemble_percentiles.data_vars:
@@ -374,7 +381,7 @@
# a best effort at working around what looks like a bug in either xclim or xarray.
# The xarray documentation mentions: 'this method can be necessary when working
# with many file objects on disk.'
ensemble_percentiles.load()
# ensemble_percentiles.load()

return ensemble_percentiles

@@ -516,7 +523,11 @@ def ensemble_common_handler(
if not convert_to_csv:
del process.status_percentage_steps["convert_to_csv"]
percentiles_string = request.inputs["ensemble_percentiles"][0].data
ensemble_percentiles = [int(p.strip()) for p in percentiles_string.split(",")]
ensemble_percentiles = (
[int(p.strip()) for p in percentiles_string.split(",")]
if percentiles_string != "None"
else []
)

write_log(
process,
@@ -618,6 +629,13 @@

process.set_workdir(str(base_work_dir))

if "realization" in ensembles[0].dims and len(scenarios) > 1:
# For non-reducing calls with multiple scenarios, remove the scenario information from the member name.
for scen, ds in zip(scenarios, ensembles):
ds["realization"] = [
real.replace(scen, "") for real in ds.realization.values
]

ensemble = xr.concat(
ensembles, dim=xr.DataArray(scenarios, dims=("scenario",), name="scenario")
)
2 changes: 1 addition & 1 deletion finch/processes/wps_base.py
@@ -154,7 +154,7 @@ def convert_xclim_inputs_to_pywps(
# Only for generic types
data_types = {
InputKind.BOOL: "boolean",
InputKind.QUANTITY_STR: "string",
InputKind.QUANTIFIED: "string",
InputKind.NUMBER: "integer",
InputKind.NUMBER_SEQUENCE: "integer",
InputKind.STRING: "string",
4 changes: 3 additions & 1 deletion finch/processes/wpsio.py
@@ -218,7 +218,9 @@ def get_ensemble_inputs(novar=False):
"Ensemble percentiles",
abstract=(
"Ensemble percentiles to calculate for input climate simulations. "
"Accepts a comma separated list of integers."
"Accepts a comma separated list of integers. An empty string will "
"disable the ensemble reduction and the output will have all members "
"along the 'realization' dimension, using the input filenames as coordinates."
),
data_type="string",
default="10,50,90",
27 changes: 0 additions & 27 deletions finch/templates/pywps.cfg

This file was deleted.

4 changes: 1 addition & 3 deletions requirements.txt
@@ -1,13 +1,11 @@
cftime
pywps>=4.5.1
xclim==0.38
pint<=0.19.2 # xclim < 0.39 breaks with pint 0.20
xclim==0.40
xarray>=0.18.2
pandas
xesmf>=0.6.2
clisops>=0.9.3
geopandas
jinja2
click
psutil
bottleneck
1 change: 0 additions & 1 deletion requirements_dev.txt
@@ -14,6 +14,5 @@ jupyter_client
# Changing dependencies above this comment will create merge conflicts when updating the cookiecutter template with cruft. Add extra requirements below this line.
birdhouse-birdy>=0.8.1
geojson
h5netcdf
matplotlib
pre-commit
3 changes: 1 addition & 2 deletions requirements_docs.txt
@@ -2,8 +2,7 @@ pywps>=4.5.1
sphinx>=4.0
nbsphinx
ipython
xclim==0.38
pint<0.20 # xclim < 0.39 breaks with pint 0.20
xclim==0.40
matplotlib
birdhouse-birdy>=0.8.1
unidecode
