Merged
24 changes: 23 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1 +1,23 @@
# compressor
# ClimateBenchPress

This repository contains the main functionality for the ClimateBenchPress compression benchmark.

## Getting Started

This project uses the uv package manager to handle dependencies. If you don't already have it installed, follow the instructions at <https://docs.astral.sh/uv/getting-started/installation/>.

Next, clone this repository and, from within the project directory, install the dependencies with:
```bash
uv sync
uv pip install -e "."
```

### Downloading the Data

Download the necessary data by following the instructions at <https://github.com/ClimateBenchPress/data-loader>.

## Funding

ClimateBenchPress has been developed as part of [Embed2Scale](https://embed2scale.eu/) and [ESiWACE3](https://www.esiwace.eu/).

Funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) under grant agreement No 101093054 and EU’s Horizon Europe program under grant agreement number 101131841. This work also received funding from [UK Research and Innovation (UKRI)](https://www.ukri.org/).
28 changes: 27 additions & 1 deletion docs/index.md
@@ -1 +1,27 @@
# compressor
# ClimateBenchPress

This repository contains the main functionality for the ClimateBenchPress compression benchmark.

## Getting Started

This project uses the uv package manager to handle dependencies. If you don't already have it installed, follow the instructions at <https://docs.astral.sh/uv/getting-started/installation/>.

Next, clone this repository and, from within the project directory, install the dependencies with:
```bash
uv sync
uv pip install -e "."
```

### Downloading the Data

Download the necessary data by following the instructions at <https://github.com/ClimateBenchPress/data-loader>.

## Using the Benchmark

For further details on how to run the benchmark evaluation code, see the tutorial in `run_benchmark.md`.

## Funding

ClimateBenchPress has been developed as part of [Embed2Scale](https://embed2scale.eu/) and [ESiWACE3](https://www.esiwace.eu/).

Funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) under grant agreement No 101093054 and EU’s Horizon Europe program under grant agreement number 101131841. This work also received funding from [UK Research and Innovation (UKRI)](https://www.ukri.org/).
85 changes: 85 additions & 0 deletions docs/run_benchmark.md
@@ -0,0 +1,85 @@
# Evaluating the Benchmark Results

Before evaluating the benchmark, make sure the input data is available at `path/to/data-loader/datasets`. Download it with the [data loader](https://github.com/ClimateBenchPress/data-loader).

At a high level, the benchmark evaluation pipeline proceeds in the following steps:

1. Compute three error bound levels for each input dataset.
2. Compress each input dataset with all the benchmark compressors for all three error bounds.
3. Compute compressor performance metrics for evaluation purposes.
4. Optional: Create summary plots of the benchmark results.

We will now go through each of these steps in more detail. As you work through these steps, the pipeline will progressively populate the directories `datasets-error-bounds`, `compressed-datasets`, `metrics`, and `plots`.
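
The steps can also be driven from a small script. The sketch below only assembles the command lines documented in the following sections, in pipeline order, without running them. The helper `pipeline_commands` and its `with_plots` switch are illustrative names and not part of the package.

```python
import shlex

SCRIPTS = "climatebenchpress.compressor.scripts"
PLOTTING = "climatebenchpress.compressor.plotting"


def pipeline_commands(basepath: str, with_plots: bool = False) -> list[list[str]]:
    """Return the documented benchmark commands in pipeline order (hypothetical helper)."""
    base_arg = f"--data-loader-basepath={basepath}"
    prefix = ["uv", "run", "python", "-m"]
    commands = [
        prefix + [f"{SCRIPTS}.create_error_bounds", base_arg],  # step 1
        prefix + [f"{SCRIPTS}.compress", base_arg],  # step 2
        prefix + [f"{SCRIPTS}.compute_metrics", base_arg],  # step 3
        prefix + [f"{SCRIPTS}.concatenate_metrics"],  # step 3: merge into one CSV
    ]
    if with_plots:
        # Step 4 is optional, so it is only appended on request.
        commands.append(prefix + [f"{PLOTTING}.plot_metrics", base_arg])
    return commands


if __name__ == "__main__":
    for command in pipeline_commands("path/to/data-loader", with_plots=True):
        print(shlex.join(command))
```

Each printed line is one shell command; running them top to bottom reproduces the order of the steps above.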

## Create Error Bounds

Begin by creating the error bounds for each dataset using the following command:
```bash
uv run python -m climatebenchpress.compressor.scripts.create_error_bounds \
--data-loader-basepath=path/to/data-loader
```
This step creates three error bounds for each variable in the datasets and stores the information in the `datasets-error-bounds` directory.

## Compress Input Datasets

Next, compress all the input datasets by running:
```bash
uv run python -m climatebenchpress.compressor.scripts.compress \
--data-loader-basepath=path/to/data-loader
```
This command will populate the `compressed-datasets` directory with the following structure:
```
compressed-datasets/
    dataset1/
        {var_name}-{err_bound_type}={low_err_bound_val}_{var_name2}-{err_bound_type2}={low_err_bound_val2}/
            compressor1/
                decompressed.zarr
                measurements.json
            compressor2/
                ...
        {var_name}-{err_bound_type}={mid_err_bound_val}_{var_name2}-{err_bound_type2}={mid_err_bound_val2}/
            ...
        {var_name}-{err_bound_type}={high_err_bound_val}_{var_name2}-{err_bound_type2}={high_err_bound_val2}/
            ...
    dataset2/
        ...
    ...
```
For each dataset, the results for the three different error bounds are stored in different directories. The `var_name` indicates the variable(s) in the dataset that are being compressed, while `err_bound_type` will be either `abs_error` or `rel_error`.
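
To inspect the results programmatically, the layout can be walked with `pathlib`. This is a minimal sketch that assumes the three-level nesting shown above (dataset, error bound, compressor); `Run` and `iter_runs` are hypothetical names, not part of the package.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Run(NamedTuple):
    """One compressor run, identified by its position in the directory tree."""

    dataset: str
    error_bound: str
    compressor: str
    measurements: Path


def iter_runs(root: Path) -> Iterator[Run]:
    """Yield one Run per measurements.json found three levels below root."""
    for path in sorted(root.glob("*/*/*/measurements.json")):
        compressor_dir = path.parent
        error_bound_dir = compressor_dir.parent
        yield Run(
            dataset=error_bound_dir.parent.name,
            error_bound=error_bound_dir.name,
            compressor=compressor_dir.name,
            measurements=path,
        )


if __name__ == "__main__":
    for run in iter_runs(Path("compressed-datasets")):
        print(run.dataset, run.error_bound, run.compressor)
```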

You can use additional arguments to control which compressors and datasets are processed: `--exclude-compressor` and `--exclude-dataset` to avoid using certain compressors and datasets, or `--include-compressor` and `--include-dataset` to only use selected compressors on selected datasets.
For example, the command
```bash
uv run python -m climatebenchpress.compressor.scripts.compress \
--data-loader-basepath=path/to/data-loader \
--include-compressor sz3 jpeg2000 \
--include-dataset era5
```
compresses the era5 data with the compressors SZ3 and JPEG2000.
These arguments are particularly useful if you wish to parallelize the benchmark evaluation using tools such as `xargs`.
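
As a rough illustration of that parallelization, the sketch below prints one `compress` command per compressor and dataset pair, so that each line is an independent job for a tool such as `xargs`. The helper `compress_commands` is hypothetical; only the module path and the flags come from the commands above.

```python
import shlex
from itertools import product

COMPRESS_MODULE = "climatebenchpress.compressor.scripts.compress"


def compress_commands(
    basepath: str, compressors: list[str], datasets: list[str]
) -> list[list[str]]:
    """Build one compress invocation per (dataset, compressor) pair."""
    return [
        [
            "uv", "run", "python", "-m", COMPRESS_MODULE,
            f"--data-loader-basepath={basepath}",
            "--include-compressor", compressor,
            "--include-dataset", dataset,
        ]
        for dataset, compressor in product(datasets, compressors)
    ]


if __name__ == "__main__":
    jobs = compress_commands("path/to/data-loader", ["sz3", "jpeg2000"], ["era5"])
    for job in jobs:
        # One shell-quoted command per line.
        print(shlex.join(job))
```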

## Compute Metrics

After compression, evaluate compression metrics on the compressed datasets using:
```bash
uv run python -m climatebenchpress.compressor.scripts.compute_metrics \
--data-loader-basepath=path/to/data-loader
```
The same filtering options as in the compression step are available here: `--exclude-compressor`, `--exclude-dataset`, `--include-compressor`, and `--include-dataset`.

Once the metrics are computed, combine all the metrics into a single CSV file by running:
```bash
uv run python -m climatebenchpress.compressor.scripts.concatenate_metrics
```
This will create the `metrics/all_results.csv` file, which contains all the results.
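
The combined file can then be read with any CSV tool. The sketch below groups its rows by one column using only the standard library. The column names in `all_results.csv` are not documented here, so `"compressor"` in the usage example is an assumption; replace it with a column that actually appears in the file's header. `group_rows` is a hypothetical helper.

```python
import csv
from collections import defaultdict


def group_rows(csv_path: str, column: str) -> dict[str, list[dict[str, str]]]:
    """Group the rows of a CSV file by the value found in `column`."""
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row[column]].append(row)
    return dict(groups)


if __name__ == "__main__":
    # "compressor" is an assumed column name; check the CSV header first.
    for key, rows in group_rows("metrics/all_results.csv", "compressor").items():
        print(key, len(rows))
```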

## Optional: Create Plots

Finally, generate visualization plots with the following command:
```bash
uv run python -m climatebenchpress.compressor.plotting.plot_metrics \
--data-loader-basepath=path/to/data-loader
```
This will create plots in the `plots` directory. By default, this assumes access to a LaTeX compiler. If you do not have one on your system, you can add the `--avoid-latex` flag to this command.

Note that the full plotting process can take a long time because it generates an individual plot for each combination of error bound, compressor, and dataset. To skip the individual plots for certain datasets, use the `--exclude-dataset` command line option.
12 changes: 5 additions & 7 deletions mkdocs.yml
@@ -26,6 +26,8 @@ theme:

nav:
- Home: index.md
- Tutorials:
- Run the benchmark: run_benchmark.md
Collaborator

Is this file still missing?

Member Author

Oops, yes, it's added now!

- Links:
- GitHub: https://github.com/ClimateBenchPress/compressor/
- PyPI: https://pypi.org/project/climatebenchpress-compressor/
@@ -40,7 +42,7 @@ plugins:
source_dirs:
- nav_heading: [Documentation]
base: src
ignore: ["cf.py"]
ignore: ["cf.py", "monitor.py", "variable_plotters.py", "error_dist_plotter.py", "__init__.py"]
- mkdocstrings:
enable_inventory: true
handlers:
@@ -49,7 +51,7 @@ plugins:
docstring_section_style: list
docstring_style: numpy
show_if_no_docstring: true
filters: ["!^_$", "!^_[^_]", "!^__", "__init__", "__call__", "!^cf$"]
filters: ["!^_$", "!^_[^_]", "!^__", "__init__", "__call__", "!^cf$", "!^parser$", "!^args$"]
members_order: source
group_by_category: false
show_source: false
@@ -59,11 +61,7 @@ plugins:
show_root_toc_entry: false
merge_init_into_class: true
annotations_path: source
summary:
attributes: false
classes: true
functions: true
modules: true
summary: false
inventories:
- https://docs.python.org/3.12/objects.inv
- https://numpy.org/doc/2.2/objects.inv
30 changes: 28 additions & 2 deletions src/climatebenchpress/compressor/compressors/abc.py
@@ -3,8 +3,8 @@
from abc import ABC, abstractmethod
from collections import defaultdict
from collections.abc import Mapping
from functools import partial
from dataclasses import dataclass
from functools import partial
from types import MappingProxyType
from typing import Callable, Optional

@@ -19,12 +19,35 @@

@dataclass
class NamedPerVariableCodec:
"""Dataclass representing a codec for one dataset and compressor.

Attributes
----------
name : str
Name of the error bound used to create the codecs, a combination of variable
names and error bounds.
codecs : dict[VariableName, Callable[[], Codec]]
Dictionary mapping variable names to codec constructors.
"""

name: ErrorBoundName
codecs: dict[VariableName, Callable[[], Codec]]


@dataclass
class ErrorBound:
"""Dataclass representing an error bound for a variable.

Can only have one of `abs_error` or `rel_error` set, not both.

Attributes
----------
abs_error : Optional[float]
Absolute error bound for the variable.
rel_error : Optional[float]
Relative error bound for the variable.
"""

abs_error: Optional[float] = None
rel_error: Optional[float] = None

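A minimal sketch of how the "not both" constraint from the `ErrorBound` docstring could be enforced, assuming a `__post_init__` check. The hunk above does not show how the repository enforces it, so `ErrorBoundSketch` is a hypothetical stand-in rather than the actual class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorBoundSketch:
    """Hypothetical stand-in for ErrorBound with an explicit validity check."""

    abs_error: Optional[float] = None
    rel_error: Optional[float] = None

    def __post_init__(self) -> None:
        # Reject the one combination the docstring rules out: both bounds set.
        if self.abs_error is not None and self.rel_error is not None:
            raise ValueError("set only one of abs_error or rel_error, not both")


if __name__ == "__main__":
    print(ErrorBoundSketch(abs_error=0.1))
```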
@@ -52,7 +75,8 @@ class VariantErrorBoundPerVariable:


class Compressor(ABC):
# Abstract interface, must be implemented by subclasses
"""Abstract base class for compressors."""

name: str
description: str

@@ -65,6 +89,7 @@ def abs_bound_codec(
data_min: Optional[float] = None,
data_max: Optional[float] = None,
) -> Codec:
"""Create a codec with an absolute error bound."""
pass

@staticmethod
@@ -76,6 +101,7 @@ def rel_bound_codec(
data_min: Optional[float] = None,
data_max: Optional[float] = None,
) -> Codec:
"""Create a codec with a relative error bound."""
pass

@classmethod
7 changes: 7 additions & 0 deletions src/climatebenchpress/compressor/compressors/bitround.py
@@ -9,6 +9,13 @@


class BitRound(Compressor):
"""Bit Rounding compressor.

This compressor applies bit rounding to the data, which reduces the precision of the data
while preserving its overall structure. It then applies the Zstandard lossless codec
for further compression.
"""

name = "bitround"
description = "Bit Rounding"

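For readers unfamiliar with bit rounding, here is a simplified sketch of the idea for a single float32 value: keep a fixed number of mantissa bits and round away the rest. It uses round-half-up on the raw bit pattern and handles finite values only; it is an illustration, not the codec this class wraps, and `bitround_float32` is a hypothetical name.

```python
import struct


def bitround_float32(value: float, keep_bits: int) -> float:
    """Round a finite float32 value to `keep_bits` explicit mantissa bits."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    # Reinterpret the float32 as its 32-bit integer pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    drop = 23 - keep_bits  # number of trailing mantissa bits to discard
    if drop > 0:
        half = 1 << (drop - 1)
        mask = 0xFFFFFFFF ^ ((1 << drop) - 1)
        # Adding half an interval before masking rounds to the nearest
        # representable value; a carry moves into the exponent bits.
        bits = (bits + half) & mask
    (rounded,) = struct.unpack("<f", struct.pack("<I", bits))
    return rounded


if __name__ == "__main__":
    print(bitround_float32(3.14159, 4))
```

The zeroed trailing bits are what make the rounded data more compressible for a subsequent lossless codec.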
6 changes: 6 additions & 0 deletions src/climatebenchpress/compressor/compressors/bitround_pco.py
@@ -10,6 +10,12 @@


class BitRoundPco(Compressor):
"""Bit Rounding + PCodec compressor.

This compressor first applies bit rounding to the data, which reduces the precision of the data
while preserving its overall structure. After that, it uses PCodec for further compression.
"""

name = "bitround-pco"
description = "Bit Rounding + PCodec"

15 changes: 15 additions & 0 deletions src/climatebenchpress/compressor/compressors/jpeg2000.py
@@ -12,6 +12,21 @@


class Jpeg2000(Compressor):
"""JPEG2000 compressor.

Note that JPEG2000 does not guarantee pointwise error bounds. It only controls the average
error through a target peak signal-to-noise ratio (PSNR). We convert
the absolute error bound to a PSNR value using the formula:
```
PSNR = 20 * (log10(data_range) - log10(error_bound))
```
where `data_range = max(data) - min(data)`.

Additionally, JPEG2000 expects integer data, not floating point, so we linearly quantize the
data into integers in the range 0 to 2**25 - 1, where 2**25 - 1 is the maximum integer
value accepted by JPEG2000.
"""

name = "jpeg2000"
description = "JPEG 2000"

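The two conversions described in the `Jpeg2000` docstring can be sketched in plain Python. The PSNR formula is taken from the docstring; the quantization shown here is one straightforward linear mapping onto 0 to 2**25 - 1 and is an assumption, since the hunk does not include the implementation. All function names are hypothetical.

```python
import math

MAX_LEVEL = 2**25 - 1  # largest integer level, per the docstring


def abs_error_to_psnr(data_min: float, data_max: float, error_bound: float) -> float:
    """PSNR = 20 * (log10(data_range) - log10(error_bound))."""
    data_range = data_max - data_min
    if data_range <= 0 or error_bound <= 0:
        raise ValueError("data range and error bound must be positive")
    return 20 * (math.log10(data_range) - math.log10(error_bound))


def quantize(value: float, data_min: float, data_max: float) -> int:
    """Map a value in [data_min, data_max] linearly onto 0..MAX_LEVEL."""
    fraction = (value - data_min) / (data_max - data_min)
    return round(fraction * MAX_LEVEL)


def dequantize(level: int, data_min: float, data_max: float) -> float:
    """Inverse of quantize, up to half a quantization step."""
    return data_min + level / MAX_LEVEL * (data_max - data_min)


if __name__ == "__main__":
    print(abs_error_to_psnr(0.0, 100.0, 1.0))
    print(quantize(12.345, 0.0, 100.0))
```

Because the PSNR target constrains the mean squared error over the whole array, individual values can still deviate by more than the requested absolute bound.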
2 changes: 2 additions & 0 deletions src/climatebenchpress/compressor/compressors/sperr.py
@@ -6,6 +6,8 @@


class Sperr(Compressor):
"""SPERR compressor."""

name = "sperr"
description = "SPERR"

6 changes: 6 additions & 0 deletions src/climatebenchpress/compressor/compressors/stochround.py
@@ -9,6 +9,12 @@


class StochRound(Compressor):
"""Stochastic Rounding compressor.

This compressor first applies stochastic rounding to the data, which adds noise to the data
while rounding it. After that, it uses Zstandard for further compression.
"""

name = "stochround"
description = "Stochastic Rounding"

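To illustrate what "adds noise to the data while rounding it" means, the sketch below rounds a value to a multiple of `step`, choosing the upper neighbour with probability equal to the fractional position between the two neighbours. This keeps the rounding unbiased on average. It is a generic illustration of stochastic rounding, not the codec used by this class, and `stochastic_round` is a hypothetical name.

```python
import math
import random


def stochastic_round(value: float, step: float, rng: random.Random) -> float:
    """Round `value` to a multiple of `step`, up or down at random."""
    scaled = value / step
    lower = math.floor(scaled)
    fraction = scaled - lower  # position between the two neighbours, in [0, 1)
    # Round up with probability `fraction`, so the expected result is `value`.
    round_up = rng.random() < fraction
    return (lower + (1 if round_up else 0)) * step


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10_000)]
    print(sum(samples) / len(samples))
```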
Original file line number Diff line number Diff line change
@@ -9,6 +9,12 @@


class StochRoundPco(Compressor):
"""Stochastic Rounding + PCodec compressor.

This compressor first applies stochastic rounding to the data, which adds noise to the data
while rounding it. After that, it uses PCodec for further compression.
"""

name = "stochround-pco"
description = "Stochastic Rounding + PCodec"

2 changes: 2 additions & 0 deletions src/climatebenchpress/compressor/compressors/sz3.py
@@ -6,6 +6,8 @@


class Sz3(Compressor):
"""SZ3 compressor."""

name = "sz3"
description = "SZ3"

2 changes: 2 additions & 0 deletions src/climatebenchpress/compressor/compressors/tthresh.py
@@ -6,6 +6,8 @@


class Tthresh(Compressor):
"""Tthresh compressor."""

name = "tthresh"
description = "tthresh"

2 changes: 2 additions & 0 deletions src/climatebenchpress/compressor/compressors/zfp.py
@@ -6,6 +6,8 @@


class Zfp(Compressor):
"""ZFP compressor."""

name = "zfp"
description = "ZFP"

6 changes: 6 additions & 0 deletions src/climatebenchpress/compressor/compressors/zfp_round.py
@@ -6,6 +6,12 @@


class ZfpRound(Compressor):
"""ZFP-ROUND compressor.

This is an adjusted version of the ZFP compressor with an improved rounding mechanism
for the transform coefficients.
"""

name = "zfp-round"
description = "ZFP-ROUND"

2 changes: 2 additions & 0 deletions src/climatebenchpress/compressor/metrics/abc.py
@@ -6,6 +6,8 @@


class Metric(ABC):
"""Base class for metrics."""

@abstractmethod
def __call__(self, x: xr.DataArray, y: xr.DataArray) -> float:
"""