
Commit

Update xclim - Ensembles without percentiles - Clean up dev config handling (#272)

## Overview

This PR fixes #270 
Changes:

* Update to xclim 0.40.0.
- `fit` and `stats` are now `discharge_distribution_fit` and
`discharge_stats`.
- The two above and `freq_analysis` now take `discharge` as input (not
`da`).

* Passing an empty string to `ensemble_percentiles` in ensemble
processes now returns the merged, un-reduced ensemble. The individual
members are listed along the `realization` coordinate, using their raw
file names, which allows a basic distinction between the input members.
* `finch start` no longer handles configuration through CLI arguments.
Instead of two layers of defaults (`default.cfg` plus those arguments),
everything is handled by `default.cfg`. This removes the need for the
templates folder and the `jinja2` dependency.
* WPS URLs are no longer replaced in the notebooks. Previously, we
transformed the `localhost:5000` URLs into `pavics.ouranos.ca` ones, but
changes in the folder hierarchy on the production server have made this
method awkward.
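The new percentile input handling can be sketched roughly as follows. This is a minimal sketch of the parsing logic, not the actual finch code; `parse_percentiles` is a hypothetical helper name:

```python
def parse_percentiles(percentiles_string: str) -> list:
    """Turn the WPS `ensemble_percentiles` string into a list of integers.

    An empty string (or the literal "None") disables the reduction: the
    process then returns the merged, un-reduced ensemble, with members
    along the `realization` coordinate, instead of computing percentiles.
    """
    if not percentiles_string or percentiles_string == "None":
        return []
    return [int(p.strip()) for p in percentiles_string.split(",")]


# The default "10,50,90" yields the usual three percentiles,
# while "" signals the pass-through behaviour.
print(parse_percentiles("10,50,90"))  # [10, 50, 90]
print(parse_percentiles(""))          # []
```

Downstream, an empty list simply skips the `ensemble_percentiles` call and passes the merged ensemble through unchanged, which mirrors the `if percentiles: ... else: ...` branch added in `make_ensemble`.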

## Related Issue / Discussion
Ouranosinc/PAVICS-e2e-workflow-tests#116

## Additional Information
The easy solution for populating the `realization` coordinate is to use
the filenames. Since those are constructed from the source, they may
carry much more information than necessary. Also, for the multi-scenario
case, I had to explicitly remove the "rcpXX" string from the names to
allow correct concatenation.

A cleaner solution would be to propagate what our `Dataset` class has
parsed from the filename, but that would require either clean attributes
throughout or passing the filenames along the computation in a more
structured manner.
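The filename-based approach can be illustrated like this. The file names below are hypothetical examples, and the helper names are mine; the actual labels depend entirely on the source data:

```python
from pathlib import Path


def realization_labels(files):
    """Use the file stem (name without suffix) as the member label,
    mirroring `realizations=[file.stem for file in files]`."""
    return [Path(f).stem for f in files]


def strip_scenario(labels, scenario):
    """Remove the scenario token so member labels match across scenarios,
    allowing concatenation along a new "scenario" dimension."""
    return [lab.replace(scenario, "") for lab in labels]


rcp45 = realization_labels(["tx_mean_MIROC5_rcp45_r1i1p1.nc"])
rcp85 = realization_labels(["tx_mean_MIROC5_rcp85_r1i1p1.nc"])

# Before stripping, the labels differ by scenario and cannot be aligned;
# after stripping, they are identical, so concatenation works.
assert rcp45 != rcp85
assert strip_scenario(rcp45, "rcp45") == strip_scenario(rcp85, "rcp85")
```

Note the plain `str.replace` leaves a double underscore behind (`tx_mean_MIROC5__r1i1p1`), which is harmless for alignment but shows how much source-derived clutter the labels can carry.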
aulemahal committed Jun 2, 2023
2 parents dd35019 + e38c68f commit 1f2dcca
Showing 19 changed files with 367 additions and 2,353 deletions.
5 changes: 4 additions & 1 deletion CHANGES.rst
@@ -2,10 +2,13 @@ Changes
*******

TBD (unreleased)
===================
================
* Fixed `iter_local` when depth > 0 to avoid all files being considered twice
* Revised documentation configuration on ReadTheDocs to leverage Anaconda (Mambaforge)
* Minor adjustments to dependency configurations
* Update to xclim 0.40.0.
* Passing an empty string to `ensemble_percentiles` in ensemble processes will return the merged, un-reduced ensemble. The individual members are listed along the `realization` coordinate, using their raw names, allowing basic distinction between the input members.
* Removed configuration elements handling from `finch start`. One can still pass custom config files, but all configuration defaults are handled by `finch/default.cfg` and the WSGI function. `jinja2` is not a dependency anymore.

0.10.0 (2022-11-04)
===================
7 changes: 1 addition & 6 deletions Makefile
@@ -4,11 +4,6 @@ APP_NAME := finch

WPS_URL = http://localhost:5000

# Used in target refresh-notebooks to make it looks like the notebooks have
# been refreshed from the production server below instead of from the local dev
# instance so the notebooks can also be used as tutorial notebooks.
OUTPUT_URL = https://pavics.ouranos.ca/wpsoutputs/finch

SANITIZE_FILE := https://github.com/Ouranosinc/PAVICS-e2e-workflow-tests/raw/master/notebooks/output-sanitize.cfg

# end of configuration
@@ -136,7 +131,7 @@ lint:
.PHONY: refresh-notebooks
refresh-notebooks:
@echo "Refresh all notebook outputs under docs/source/notebooks"
@bash -c 'for nb in $(CURDIR)/docs/source/notebooks/*.ipynb; do WPS_URL="$(WPS_URL)" jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=60 --output "$$nb" "$$nb"; sed -i "s@$(WPS_URL)/outputs/@$(OUTPUT_URL)/@g" "$$nb"; done; cd $(APP_ROOT)'
@bash -c 'for nb in $(CURDIR)/docs/source/notebooks/*.ipynb; do WPS_URL="$(WPS_URL)" jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=60 --output "$$nb" "$$nb"; done; cd $(APP_ROOT)'

## Sphinx targets

138 changes: 103 additions & 35 deletions docs/source/notebooks/dap_subset.ipynb

Large diffs are not rendered by default.

277 changes: 120 additions & 157 deletions docs/source/notebooks/finch-usage.ipynb

Large diffs are not rendered by default.

2,018 changes: 0 additions & 2,018 deletions docs/source/notebooks/subset.ipynb

This file was deleted.

2 changes: 1 addition & 1 deletion environment-docs.yml
@@ -12,4 +12,4 @@ dependencies:
- sphinx >=4.0
- sphinxcontrib-bibtex
- unidecode
- xclim =0.38 # remember to match xclim version in requirements_docs.txt as well
- xclim =0.40 # remember to match xclim version in requirements_docs.txt as well
9 changes: 4 additions & 5 deletions environment.yml
@@ -10,12 +10,11 @@ dependencies:
- dask
- distributed
- geopandas
- jinja2
- h5netcdf
- netcdf4
- numpy
- pandas
- parse
- pint <0.20 # xclim < 0.39 is broken by pint >= 0.20
- psutil
- python-slugify
- pywps >=4.5.1
@@ -25,6 +24,6 @@ dependencies:
- sentry-sdk
- siphon
- unidecode
- xarray >=0.18.2
- xclim =0.38 # remember to match xclim version in requirements_docs.txt as well
- xesmf >=0.6.2
- xarray >=2023.01.0
- xclim =0.40 # remember to match xclim version in requirements_docs.txt as well
- xesmf >=0.7
85 changes: 3 additions & 82 deletions finch/cli.py
@@ -11,7 +11,6 @@

import click
import psutil
from jinja2 import Environment, PackageLoader
from pywps import configuration

from . import wsgi
@@ -20,17 +19,6 @@

CONTEXT_SETTINGS = dict(help_option_names=["-h", "--help"])

template_env = Environment(loader=PackageLoader("finch", "templates"), autoescape=True)


def write_user_config(**kwargs): # noqa: D103
config_templ = template_env.get_template("pywps.cfg")
rendered_config = config_templ.render(**kwargs)
config_file = os.path.abspath(os.path.join(os.path.curdir, ".custom.cfg"))
with open(config_file, "w") as fp:
fp.write(rendered_config)
return config_file


def get_host(): # noqa: D103
url = configuration.get_config_value("server", "url")
@@ -126,89 +114,22 @@ def stop():
"--bind-host",
"-b",
metavar="IP-ADDRESS",
default="127.0.0.1",
help="IP address used to bind service.",
)
@click.option("--daemon", "-d", is_flag=True, help="run in daemon mode.")
@click.option(
"--hostname",
metavar="HOSTNAME",
default="localhost",
help="hostname in PyWPS configuration.",
)
@click.option(
"--port", metavar="PORT", default="5000", help="port in PyWPS configuration."
)
@click.option(
"--maxsingleinputsize",
default="200mb",
help="maxsingleinputsize in PyWPS configuration.",
)
@click.option(
"--maxprocesses",
metavar="INT",
default="10",
help="maxprocesses in PyWPS configuration.",
)
@click.option(
"--parallelprocesses",
metavar="INT",
default="2",
help="parallelprocesses in PyWPS configuration.",
)
@click.option(
"--log-level",
metavar="LEVEL",
default="INFO",
help="log level in PyWPS configuration.",
)
@click.option(
"--log-file",
metavar="PATH",
default="pywps.log",
help="log file in PyWPS configuration.",
)
@click.option(
"--database",
default="sqlite:///pywps-logs.sqlite",
help="database in PyWPS configuration",
)
def start(
config,
bind_host,
daemon,
hostname,
port,
maxsingleinputsize,
maxprocesses,
parallelprocesses,
log_level,
log_file,
database,
):
def start(config, bind_host, daemon):
"""Start PyWPS service.
This service is by default available at http://localhost:5000/wps
The default configuration is from finch/default.cfg
"""
if os.path.exists(PID_FILE):
click.echo(f'PID file exists: "{PID_FILE}". Service still running?')
os._exit(0)
cfgfiles = []
cfgfiles.append(
write_user_config(
wps_hostname=hostname,
wps_port=port,
wps_maxsingleinputsize=maxsingleinputsize,
wps_maxprocesses=maxprocesses,
wps_parallelprocesses=parallelprocesses,
wps_log_level=log_level,
wps_log_file=log_file,
wps_database=database,
)
)
if config:
cfgfiles.append(config)
app = wsgi.create_app(cfgfiles)
app = wsgi.create_app(cfgfiles) # Will add default.cfg to the config
# let's start the service ...
# See:
# * https://github.com/geopython/pywps-flask/blob/master/demo.py
2 changes: 1 addition & 1 deletion finch/default.cfg
@@ -8,7 +8,7 @@ provider_url=https://finch.readthedocs.org/en/latest/

[server]
url = http://localhost:5000/wps
outputurl = http://localhost:5000/outputs
outputurl = http://localhost:5000/outputs/
allowedinputpaths = /
maxsingleinputsize = 200mb
maxprocesses = 10
7 changes: 6 additions & 1 deletion finch/processes/__init__.py
@@ -37,7 +37,12 @@ def filter_func(elem):
ind.realm in realms
and ind.identifier is not None
and name not in exclude
and ind.identifier.upper() == ind._registry_id # official indicator
and (
ind.identifier.upper() == ind._registry_id # official indicator
or ind._registry_id.startswith(
"xclim.core.indicator"
) # oops. Bug for discharge_distribution_fit in xclim 0.40
)
)

out = dict(filter(filter_func, xclim_registry.items()))
28 changes: 23 additions & 5 deletions finch/processes/ensemble_utils.py
@@ -12,7 +12,7 @@
import xarray as xr
from pandas.api.types import is_numeric_dtype
from parse import parse
from pywps import FORMATS, ComplexInput, Process, configuration
from pywps import FORMATS, ComplexInput, Process
from pywps.app.exceptions import ProcessError
from pywps.exceptions import InvalidParameterValue
from siphon.catalog import TDSCatalog
@@ -348,7 +348,9 @@ def make_file_groups(files_list: List[Path], variables: set) -> List[Dict[str, P
def make_ensemble(
files: List[Path], percentiles: List[int], average_dims: Optional[Tuple[str]] = None
) -> None: # noqa: D103
ensemble = ensembles.create_ensemble(files)
ensemble = ensembles.create_ensemble(
files, realizations=[file.stem for file in files]
)
# make sure we have data starting in 1950
ensemble = ensemble.sel(time=(ensemble.time.dt.year >= 1950))

@@ -361,7 +363,12 @@ def make_ensemble(
if average_dims is not None:
ensemble = ensemble.mean(dim=average_dims)

ensemble_percentiles = ensembles.ensemble_percentiles(ensemble, values=percentiles)
if percentiles:
ensemble_percentiles = ensembles.ensemble_percentiles(
ensemble, values=percentiles
)
else:
ensemble_percentiles = ensemble

# Doy data converted previously is converted back.
for v in ensemble_percentiles.data_vars:
@@ -374,7 +381,7 @@
# a best effort at working around what looks like a bug in either xclim or xarray.
# The xarray documentation mentions: 'this method can be necessary when working
# with many file objects on disk.'
ensemble_percentiles.load()
# ensemble_percentiles.load()

return ensemble_percentiles

@@ -516,7 +523,11 @@ def ensemble_common_handler(
if not convert_to_csv:
del process.status_percentage_steps["convert_to_csv"]
percentiles_string = request.inputs["ensemble_percentiles"][0].data
ensemble_percentiles = [int(p.strip()) for p in percentiles_string.split(",")]
ensemble_percentiles = (
[int(p.strip()) for p in percentiles_string.split(",")]
if percentiles_string != "None"
else []
)

write_log(
process,
@@ -618,6 +629,13 @@

process.set_workdir(str(base_work_dir))

if "realization" in ensembles[0].dims and len(scenarios) > 1:
# For non-reducing calls with multiple scenarios, remove the scenario information from the member name.
for scen, ds in zip(scenarios, ensembles):
ds["realization"] = [
real.replace(scen, "") for real in ds.realization.values
]

ensemble = xr.concat(
ensembles, dim=xr.DataArray(scenarios, dims=("scenario",), name="scenario")
)
2 changes: 1 addition & 1 deletion finch/processes/wps_base.py
@@ -154,7 +154,7 @@ def convert_xclim_inputs_to_pywps(
# Only for generic types
data_types = {
InputKind.BOOL: "boolean",
InputKind.QUANTITY_STR: "string",
InputKind.QUANTIFIED: "string",
InputKind.NUMBER: "integer",
InputKind.NUMBER_SEQUENCE: "integer",
InputKind.STRING: "string",
4 changes: 3 additions & 1 deletion finch/processes/wpsio.py
@@ -218,7 +218,9 @@ def get_ensemble_inputs(novar=False):
"Ensemble percentiles",
abstract=(
"Ensemble percentiles to calculate for input climate simulations. "
"Accepts a comma separated list of integers."
"Accepts a comma separated list of integers. An empty string will "
"disable the ensemble reduction and the output will have all members "
"along the 'realization' dimension, using the input filenames as coordinates."
),
data_type="string",
default="10,50,90",
27 changes: 0 additions & 27 deletions finch/templates/pywps.cfg

This file was deleted.

4 changes: 1 addition & 3 deletions requirements.txt
@@ -1,13 +1,11 @@
cftime
pywps>=4.5.1
xclim==0.38
pint<=0.19.2 # xclim < 0.39 breaks with pint 0.20
xclim==0.40
xarray>=0.18.2
pandas
xesmf>=0.6.2
clisops>=0.9.3
geopandas
jinja2
click
psutil
bottleneck
1 change: 0 additions & 1 deletion requirements_dev.txt
@@ -14,6 +14,5 @@ jupyter_client
# Changing dependencies above this comment will create merge conflicts when updating the cookiecutter template with cruft. Add extra requirements below this line.
birdhouse-birdy>=0.8.1
geojson
h5netcdf
matplotlib
pre-commit
3 changes: 1 addition & 2 deletions requirements_docs.txt
@@ -2,8 +2,7 @@ pywps>=4.5.1
sphinx>=4.0
nbsphinx
ipython
xclim==0.38
pint<0.20 # xclim < 0.39 breaks with pint 0.20
xclim==0.40
matplotlib
birdhouse-birdy>=0.8.1
unidecode
