YAML notebook #191

Merged
18 commits · merged on Apr 26, 2023
Changes from 8 commits
2 changes: 2 additions & 0 deletions HISTORY.rst
@@ -25,6 +25,7 @@ New features and enhancements
* New masking feature in ``extract_dataset``. (:issue:`180`, :pull:`182`).
* New function ``xs.spatial.subset`` to replace ``xs.extract.clisops_subset`` and add method "sel". (:issue:`180`, :pull:`182`).
* Add long_name attribute to diagnostics. (:pull:`189`).
* Added a new YAML-centric notebook (:issue:`8`, :pull:`191`).

Breaking changes
^^^^^^^^^^^^^^^^
@@ -56,6 +57,7 @@ Internal changes
* The top-level Makefile now includes a `linkcheck` recipe, and the ReadTheDocs configuration no longer reinstalls the `llvmlite` compiler library. (:pull:`173`).
* The checkups on coverage and duplicates can now be skipped in `subset_file_coverage`. (:pull:`170`).
* Changed the `ProjectCatalog` docstrings to make it more obvious that it needs to be created empty. (:issue:`99`, :pull:`184`).
* Added ``parse_config`` to the functions in ``xscen.spatial`` and to ``reduce_ensemble``. (:pull:`191`).

v0.5.0 (2023-02-28)
-------------------
1 change: 1 addition & 0 deletions docs/index.rst
@@ -33,6 +33,7 @@ Features
notebooks/3_diagnostics
notebooks/4_ensemble_reduction
notebooks/5_warminglevels
notebooks/6_config
columns
api
contributing
320 changes: 320 additions & 0 deletions docs/notebooks/6_config.ipynb
@@ -0,0 +1,320 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "3c366641",
"metadata": {},
"source": [
"# YAML usage\n",
"\n",
"<div class=\"alert alert-info\"> <b>NOTE:</b> This tutorial is mostly xscen-specific and thus will not cover more advanced YAML functionalities such as anchors. More information on those can be found <a href=\"https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/\">here</a>, while <a href=\"https://github.com/Ouranosinc/xscen/blob/main/templates/1-basic_workflow_with_config/config1.yml\">this template</a> makes ample use of them. </div>\n",
"\n",
"While parameters can be explicitly given to functions, most support the use of YAML configuration files to automatically pass arguments. This tutorial goes over the basic principles of writing and preparing configuration files, and provides a few examples.\n",
"\n",
"An `xscen` function supports YAML parametrisation if it is wrapped by the `parse_config` decorator in the code. Currently supported functions are:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e6662ca",
"metadata": {},
"outputs": [],
"source": [
"from xscen.config import get_configurable\n",
"\n",
"list(get_configurable().keys())"
]
},
{
"cell_type": "markdown",
"id": "1fdc6de2-2b19-4452-823e-158f690e9ff3",
"metadata": {},
"source": [
"## Loading an existing YAML config file\n",
"\n",
"YAML files are read using `xscen.load_config`. Any number of files can be passed, and they will be merged into a single Python dictionary accessed through `xscen.CONFIG`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9bd25e3b-07a7-48d1-8fc3-755641f835eb",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import xscen as xs\n",
"from xscen import CONFIG"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2d9b141-1768-4783-aec1-8016a1201ff8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Load configuration\n",
"xs.load_config(\n",
" str(\n",
" Path().absolute().parent.parent\n",
" / \"templates\"\n",
" / \"1-basic_workflow_with_config\"\n",
" / \"config1.yml\"\n",
" ),\n",
" # str(Path().absolute().parent.parent / \"templates\" / \"1-basic_workflow_with_config\" / \"paths1_example.yml\") We can't actually load this file due to the fake paths, but this would be the format\n",
")\n",
"\n",
"# Display the dictionary keys\n",
"print(CONFIG.keys())"
]
},
{
"cell_type": "markdown",
"id": "77d673f4-9659-4aa4-935a-32020385d4ee",
"metadata": {
"tags": []
},
"source": [
"`xscen.CONFIG` behaves like a Python dictionary, but has a custom `__getitem__` that returns a `deepcopy` of the requested item. As such, it is effectively immutable, which makes it reliable and robust."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1f292ec-d9f3-433a-9ec2-3cea84faf8dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# A normal python dictionary is mutable, but a CONFIG dictionary is not.\n",
"pydict = dict(CONFIG[\"project\"])\n",
"print(CONFIG[\"project\"][\"id\"], \", \", pydict[\"id\"])\n",
"pydict2 = pydict\n",
"pydict2[\"id\"] = \"modified id\"\n",
"print(CONFIG[\"project\"][\"id\"], \", \", pydict[\"id\"], \", \", pydict2[\"id\"])\n",
"pydict3 = pydict2\n",
"pydict3[\"id\"] = \"even more modified id\"\n",
"print(\n",
" CONFIG[\"project\"][\"id\"],\n",
" \", \",\n",
" pydict[\"id\"],\n",
" \", \",\n",
" pydict2[\"id\"],\n",
" \", \",\n",
" pydict3[\"id\"],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a565f3a1-530e-45f1-96af-e6a80fe23ca8",
"metadata": {},
"source": [
"## Building a YAML config file\n",
"### Generic arguments\n",
"\n",
"Since `CONFIG` is a Python dictionary, anything can be written in it if it is deemed useful for the execution of the script. A good practice, as seen in [this template's config1.yml](https://github.com/Ouranosinc/xscen/tree/main/templates/1-basic_workflow_with_config/config1.yml), is, for example, to use the YAML file to provide a list of tasks to be accomplished, give a general description of the project, or provide a dask configuration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75bf160f-c256-4893-b147-90a864832731",
"metadata": {},
"outputs": [],
"source": [
"print(CONFIG[\"tasks\"])\n",
"print(CONFIG[\"project\"])\n",
"print(CONFIG[\"regrid\"][\"dask\"])"
]
},
{
"cell_type": "markdown",
"id": "8c4a85bc-7a8a-40b6-af6c-45d0b5b9b289",
"metadata": {},
"source": [
"These entries are not linked to any function and will not be used automatically by `xscen`, but they can be referred to during the execution of the script. Below is an example where `tasks` determines which tasks to accomplish and which to skip. Many such examples can be seen throughout [the provided templates](https://github.com/Ouranosinc/xscen/tree/main/templates)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae19691c-e7d8-4747-bca1-acb5b042b4cf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"if \"extract\" in CONFIG[\"tasks\"]:\n",
" print(\"This will start the extraction process.\")\n",
"\n",
"if \"figures\" in CONFIG[\"tasks\"]:\n",
" print(\n",
" \"This would start creating figures, but it will be skipped since it is not in the list of tasks.\"\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "c8d0d86f-12f7-483a-8a85-3102e4faf8bc",
"metadata": {},
"source": [
"### Function-specific parameters\n",
"\n",
"In addition to generic arguments, a major convenience of YAML files is that parameters can be automatically fed to functions if they are wrapped by `@parse_config` (see above for the list of currently supported functions). The following format must be used exactly:\n",
"\n",
"```yaml\n",
"module:\n",
" function:\n",
" argument:\n",
"```\n",
"\n",
"The most up-to-date list of modules can be found [here](https://xscen.readthedocs.io/en/latest/modules.html), as well as at the start of this tutorial. A simple example would be as follows:\n",
"```yaml\n",
"regrid:\n",
" regrid_dataset:\n",
" regridder_kwargs:\n",
" method: bilinear\n",
" extrap_method: inverse_dist\n",
" reuse_weights: False\n",
"```\n",
"\n",
"Some functions have arguments in the form of lists and dictionaries. These are also supported:\n",
"```yaml\n",
"extract:\n",
" search_data_catalogs:\n",
" variables_and_freqs:\n",
" tasmax: D\n",
" tasmin: D\n",
" pr: D\n",
" dtr: D\n",
" allow_resampling: False\n",
" allow_conversion: True\n",
" periods: ['1991', '2020']\n",
" other_search_criteria:\n",
" source:\n",
" \"ERA5-Land\"\n",
"```"
]
},
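The `module: function: argument:` lookup described above can be sketched with a small stand-alone decorator. This is a simplified illustration, not xscen's actual `parse_config` implementation; the `CONFIG` dictionary and the lookup strategy below are assumptions made for the example:

```python
import functools

# Stand-in for the dictionary that xs.load_config would build from YAML.
CONFIG = {"regrid": {"regrid_dataset": {"method": "bilinear"}}}


def parse_config(func):
    """Fill in missing keyword arguments from the CONFIG sections."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Look the function up under any top-level module section.
        for section in CONFIG.values():
            if isinstance(section, dict) and func.__name__ in section:
                for name, value in section[func.__name__].items():
                    # Explicit call-site arguments win over the YAML values.
                    kwargs.setdefault(name, value)
        return func(*args, **kwargs)

    return wrapper


@parse_config
def regrid_dataset(ds, method="conservative"):
    return f"regridding {ds} with {method}"


print(regrid_dataset("my_dataset"))  # method comes from CONFIG: bilinear
```

Calling `regrid_dataset("my_dataset", method="patch")` would still use the explicit argument, mirroring the behaviour of configurable functions where call-site values take precedence.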
{
"cell_type": "code",
"execution_count": null,
"id": "a59bcf70-9371-4fa4-ad1e-bd4f98dd9d12",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Note that the YAML used here is more complex and separates tasks between 'reconstruction' and 'simulation', which would break the automatic passing of arguments.\n",
"print(\n",
" CONFIG[\"extract\"][\"reconstruction\"][\"search_data_catalogs\"][\"variables_and_freqs\"]\n",
") # Dictionary\n",
"print(CONFIG[\"extract\"][\"reconstruction\"][\"search_data_catalogs\"][\"periods\"]) # List"
]
},
{
"cell_type": "markdown",
"id": "85b36803-29b8-4993-add3-3c69e4e4a750",
"metadata": {},
"source": [
"Let's test that it is working, using `climatological_mean`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71287ea3-a7b6-41b6-91f4-e36644d463cc",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# We should obtain 30-year means separated in 10-year intervals.\n",
"CONFIG[\"aggregate\"][\"climatological_mean\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13e7468e-c6f6-4ae8-b204-76b3b4c49000",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Create a dummy dataset\n",
"import pandas as pd\n",
"import xarray as xr\n",
"\n",
"time = pd.date_range(\"1951-01-01\", \"2100-01-01\", freq=\"AS-JAN\")\n",
"da = xr.DataArray([0] * len(time), coords={\"time\": time})\n",
"da.name = \"test\"\n",
"ds = da.to_dataset()\n",
"\n",
"# Call climatological_mean using no argument other than what's in CONFIG\n",
"print(xs.climatological_mean(ds))"
]
},
{
"cell_type": "markdown",
"id": "a19640b8-987d-4272-81f3-b25c9ba42bf2",
"metadata": {},
"source": [
"### Managing paths\n",
"\n",
"As a final note, YAML files are a good way to privately provide paths to a script without having to explicitly write them in the code. [An example is provided here](https://github.com/Ouranosinc/xscen/blob/main/templates/1-basic_workflow_with_config/paths1_example.yml). As stated earlier, `xs.load_config` will merge the provided YAML files into a single dictionary, so the separation is seamless once the script is running.\n",
"\n",
"As an added protection, if the script is to be hosted on GitHub, `paths.yml` (or whatever it is called) can then be added to the `.gitignore`."
]
},
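The merging of a main configuration with a private paths file can be sketched as a recursive dictionary update. This is a simplified illustration, not xscen's actual merging code, and the dictionaries below are invented stand-ins for a parsed `config1.yml` and `paths1_example.yml`:

```python
def deep_merge(base, extra):
    """Return a new dict where nested keys from `extra` override `base`."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Invented example contents for a public config and a private paths file.
config = {"project": {"id": "demo"}, "tasks": ["extract"]}
paths = {"project": {"path": "/private/out"}, "paths": {"catalog": "/private/catalog.json"}}

merged = deep_merge(config, paths)
print(merged["project"])  # {'id': 'demo', 'path': '/private/out'}
```

Once merged, the script accesses everything through a single dictionary and does not need to know which file each key came from.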
{
"cell_type": "code",
"execution_count": null,
"id": "ec183acb",
"metadata": {},
"outputs": [],
"source": [
"getattr(xs.catalog.ProjectCatalog, \"configurable\", False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
10 changes: 5 additions & 5 deletions templates/1-basic_workflow_with_config/config1.yml
@@ -432,15 +432,15 @@ logging: # general logging args
class : logging.StreamHandler
formatter: default
level : INFO
file:
class: logging.FileHandler
formatter: default
level : DEBUG
# file:
# class: logging.FileHandler
# formatter: default
# level : DEBUG
loggers:
xscen:
propagate: False
level: INFO
handlers: [file, console]
handlers: [console] # [file, console] could also be used to write the log to a file


to_dataset_dict: # parameters to open datasets