# `opinf.pre`

```{eval-rst}
.. automodule:: opinf.pre

.. currentmodule:: opinf.pre

**Functions**

.. autosummary::
   :toctree: _autosummaries
   :nosignatures:

   shift
   scale

**Classes**

.. autosummary::
   :toctree: _autosummaries
   :nosignatures:

   ShiftScaleTransformer
   TransformerMulti
   TransformerTemplate
```

## Preprocessing Data

Raw dynamical systems data often need to be lightly preprocessed before use in Operator Inference.
The tools in this module enable centering/shifting and scaling/nondimensionalization of snapshot data after lifting (when applicable) and prior to dimensionality reduction.

:::{admonition} Notation
:class: note

On this page,
- $\q \in \RR^n$ denotes the unprocessed state variable for which we have $k$ snapshots $\q_0,\ldots,\q_{k-1}\in\RR^n$,
- $\q'\in\RR^n$ denotes the state variable after being shifted (centered), and
- $\q''\in\RR^n$ denotes the state variable after being shifted _and_ scaled (non-dimensionalized).

The tools demonstrated here define a mapping $\mathcal{T}:\RR^n\to\RR^n$ with $\q'' = \mathcal{T}(\q)$.
:::

::::{admonition} Example Data
:class: tip

The examples on this page use data from the combustion problem described in {cite}`swischuk2020combustion`.
The data consists of nine variables recorded at 100 points in time.

:::{dropdown} State Variables

- Pressure $p$
- $x$-velocity $v_{x}$
- $y$-velocity $v_{y}$
- Temperature $T$
- Specific volume (inverse density) $\xi = 1/\rho$
- Chemical species molar concentrations for CH$_{4}$, O$_{2}$, CO$_{2}$, and H$_{2}$O.

The dimension of the spatial discretization in the full example in {cite}`swischuk2020combustion` is $38{,}523$ per variable, so $n = 9 \times 38{,}523 = 346{,}707$.
Here we have downsampled the state dimension to $535$ for each variable for demonstration purposes, i.e., $n = 9 \times 535 = 4{,}815$.
:::

You can [download the data here](https://github.com/Willcox-Research-Group/rom-operator-inference-Python3/raw/data/pre_example.npy) to repeat the experiments.
::::
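The stacked layout behind $n = 9 \times 535$ can be sketched with synthetic data. We assume, based on how the cells below pull out pressure with `np.split(...)[0]` and water with `np.split(...)[-1]`, that each variable occupies a contiguous block of rows in the order listed above. The random matrix is only a stand-in for the real snapshots.

```python
import numpy as np

num_variables = 9  # p, v_x, v_y, T, xi, and four species concentrations
nx = 535           # spatial degrees of freedom per variable (downsampled)
k = 100            # number of snapshots (columns)

n = num_variables * nx
print(n)  # 4815

# Synthetic stand-in for the snapshot matrix (random values, NOT the real data).
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, k))

# Assumed layout: each variable is a contiguous block of nx rows.
blocks = np.split(Q, num_variables, axis=0)
pressure_block = blocks[0]  # rows 0, ..., nx - 1
water_block = blocks[-1]    # last nx rows
print(pressure_block.shape)  # (535, 100)
```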

```python
import numpy as np
import matplotlib.pyplot as plt

import opinf

opinf.utils.mpl_config()
```

```python
# Load the example snapshot data.
snapshots = np.load("pre_example.npy")

snapshots.shape
```

## Shifting / Centering

A common first preprocessing step is to shift the training snapshots by some reference snapshot $\bar{\q}\in\RR^n$, i.e.,

$$
    \q' = \q - \bar{\q}.
$$

For example, the reference snapshot could be chosen to be the average of the training snapshots:

$$
    \bar{\q}
    := \frac{1}{k}\sum_{j=0}^{k-1}\q_{j}.
$$

In this case, the transformed snapshots $\q_j' = \q_j - \bar{\q}$ are centered around $\0$.
This type of transformation can be accomplished using a {class}`ShiftScaleTransformer` with `centering=True` or the {func}`shift()` function.
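For intuition, mean-centering amounts to a few lines of NumPy. This is a sketch on synthetic data that illustrates the formulas above; it is not meant to reflect how `ShiftScaleTransformer` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic snapshot matrix: n = 50 state entries, k = 20 snapshots (columns).
Q = 1e5 + rng.standard_normal((50, 20))

# Reference snapshot: the average of the training snapshots (mean across columns).
qbar = Q.mean(axis=1)

# Shift every snapshot; broadcasting subtracts qbar from each column.
Q_shifted = Q - qbar[:, np.newaxis]

# The shifted snapshots average to the zero vector, up to round-off.
print(np.abs(Q_shifted.mean(axis=1)).max())

# The shift is undone by adding the reference snapshot back.
Q_recovered = Q_shifted + qbar[:, np.newaxis]
```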

```python
# Extract the pressure variable from the snapshot data.
pressure = np.split(snapshots, 9, axis=0)[0]

# Initialize a ShiftScaleTransformer for centering the pressure variable.
pressure_transformer = opinf.pre.ShiftScaleTransformer(
    centering=True,
    verbose=True,
)

# Shift the pressure snapshots by the average pressure snapshot.
pressure_shifted = pressure_transformer.fit_transform(pressure)
```

```python
# Average pressure value.
np.mean(pressure)
```

```python
# Average shifted pressure value (zero, up to floating-point round-off).
np.mean(pressure_shifted)
```

```python
# Plot the distribution of the entries of the raw and processed states.
fig, axes = plt.subplots(1, 2, sharey=True)
axes[0].hist(pressure.flatten(), bins=40)
axes[1].hist(pressure_shifted.flatten(), bins=40)
axes[0].set_ylabel("Frequency")
axes[0].set_xlabel("Pressure")
axes[1].set_xlabel("Shifted pressure")
fig.tight_layout()
plt.show()
```

::::{admonition} Shifting Affects Model Form
:class: important

Introducing a shift can cause a structural change in the governing dynamics.
When shifting state variables, the structure of a reduced-order model should be determined based on the dynamics of the shifted variable, not the original variable.

:::{dropdown} Example 1: Linear System

Consider the linear system

$$
\begin{align*}
    \ddt\q(t) = \A\q(t).
\end{align*}
$$

The dynamics of the shifted variable $\q'(t) = \q(t) - \bar{\q}$ are given by

$$
\begin{align*}
    \ddt\q'(t)
    = \ddt[\q(t) - \bar{\q}]
    = \ddt\q(t)
    = \A\q(t)
    = \A[\bar{\q} + \q'(t)]
    = \A\bar{\q} + \A\q'(t),
\end{align*}
$$

which has a new constant term $\A\bar{\q}$ in addition to a linear term $\A\q'(t)$.
If the variable $\q$ is used for Operator Inference, the reduced-order model should take on the linear form $\ddt\qhat(t) = \Ahat\qhat(t)$, while if $\q'$ is the state variable, the reduced-order model should be $\ddt\qhat(t) = \chat + \Ahat\qhat(t)$.
:::

:::{dropdown} Example 2: Quadratic System

Consider the purely quadratic system

$$
\begin{align*}
    \ddt\q(t) = \H[\q(t)\otimes\q(t)],
\end{align*}
$$

where $\otimes$ denotes the [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product).
An appropriate reduced-order model for this system is also quadratic, $\ddt\qhat(t) = \Hhat[\qhat(t)\otimes\qhat(t)]$.
However, the dynamics of the shifted variable $\q'(t) = \q(t) - \bar{\q}$ include lower-order terms:

$$
\begin{align*}
    \ddt\q'(t)
    &= \ddt[\q(t) - \bar{\q}]
    \\
    &= \H[\q(t)\otimes\q(t)]
    \\
    &= \H[(\bar{\q} + \q'(t))\otimes(\bar{\q} + \q'(t))]
    \\
    &= \H[\bar{\q}\otimes\bar{\q}]
    + \H[\bar{\q}\otimes\q'(t)] + \H[\q'(t)\otimes\bar{\q}]
    + \H[\q'(t)\otimes\q'(t)].
\end{align*}
$$

The term $\H[\bar{\q}\otimes\bar{\q}]$ is constant, and the terms $\H[\bar{\q}\otimes\q'(t)] + \H[\q'(t)\otimes\bar{\q}]$ are linear in $\q'(t)$; hence an appropriate reduced-order model for $\q'(t)$ has the fully quadratic form $\ddt\qhat(t) = \chat + \Ahat\qhat(t) + \Hhat[\qhat(t)\otimes\qhat(t)]$.
:::
::::
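Both expansions can be checked numerically with random operators. The sketch below also builds the matrix that represents the linear term of Example 2, using the identities $(\bar{\q}\otimes\I)\q' = \bar{\q}\otimes\q'$ and $(\I\otimes\bar{\q})\q' = \q'\otimes\bar{\q}$ (with $\bar{\q}$ treated as an $n\times 1$ column). The names `A_eff` and so on are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))        # linear operator
H = rng.standard_normal((n, n**2))     # quadratic operator acting on q ⊗ q
qbar = rng.standard_normal(n)          # reference (shift) state
qprime = rng.standard_normal(n)        # shifted state q'
q = qbar + qprime                      # original state

# Example 1: A q = A qbar + A q' (constant term plus linear term).
lhs_linear = A @ q
rhs_linear = A @ qbar + A @ qprime

# Example 2: H[q ⊗ q] splits into constant, linear, and quadratic parts in q'.
const = H @ np.kron(qbar, qbar)
linear = H @ np.kron(qbar, qprime) + H @ np.kron(qprime, qbar)
quadratic = H @ np.kron(qprime, qprime)
lhs_quad = H @ np.kron(q, q)
rhs_quad = const + linear + quadratic

# The linear part is a matrix acting on q'.
col = qbar[:, np.newaxis]
A_eff = H @ (np.kron(col, np.eye(n)) + np.kron(np.eye(n), col))

print(np.allclose(lhs_linear, rhs_linear), np.allclose(lhs_quad, rhs_quad))
print(np.allclose(A_eff @ qprime, linear))
```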

## Scaling / Non-dimensionalization

Many engineering problems feature multiple variables with ranges across different scales.
For such cases, it is often beneficial to scale the variables to similar ranges so that one variable does not overwhelm the others in the operator learning.
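One way to see the imbalance: least-squares fits and POD bases computed from stacked snapshots are driven by the squared norm of the data, so a variable with much larger entries dominates. The sketch below uses two synthetic variables with made-up magnitudes (not the combustion data) and compares each variable's share of the squared Frobenius norm before and after scaling each one by its own largest absolute entry.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 50
# Two synthetic variables with very different magnitudes.
var1 = 1e6 * rng.uniform(-1, 1, size=(100, k))   # pressure-like scale
var2 = 1e-2 * rng.uniform(-1, 1, size=(100, k))  # concentration-like scale


def energy_fractions(blocks):
    """Fraction of the total squared Frobenius norm contributed by each block."""
    energies = np.array([np.sum(b**2) for b in blocks])
    return energies / energies.sum()


before = energy_fractions([var1, var2])
print(before)  # var1 carries essentially all of the energy

# Scale each variable by its own maximum absolute entry.
after = energy_fractions([b / np.abs(b).max() for b in (var1, var2)])
print(after)   # comparable contributions
```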

A simple scaling is given by

$$
    \q'' = \frac{1}{\alpha}\q',
$$

where $\alpha$ is chosen by examining the range of the training data.
For example, after centering the data, scaling by the largest absolute entry maps the data into $[-1, 1]$:

$$
    \q''
    = \frac{1}{\alpha}\big(\q - \bar{\q}\big)
    = \frac{1}{\alpha}\q',
    \qquad
    \alpha = \max_{i,j}|\tilde{q}_{ij}'|
$$

where $\tilde{q}_{ij}'$ is the $i$th entry of $\q_j' = \q_j - \bar{\q}$.
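A direct NumPy version of this center-then-scale map and its inverse $\q = \alpha\q'' + \bar{\q}$ is sketched below on synthetic data. It appears to correspond to the `scaling="maxabs"` option combined with `centering=True` used later on this page, but that correspondence is an assumption; consult the {class}`ShiftScaleTransformer` documentation for the exact definition.

```python
import numpy as np

rng = np.random.default_rng(4)
Q = 300.0 + 50.0 * rng.standard_normal((40, 25))  # synthetic snapshots

# Center by the average snapshot, then divide by the largest absolute entry.
qbar = Q.mean(axis=1, keepdims=True)
Q_centered = Q - qbar
alpha = np.abs(Q_centered).max()
Q_scaled = Q_centered / alpha

# All entries lie in [-1, 1], and at least one has magnitude 1.
print(Q_scaled.min(), Q_scaled.max())

# Inverse map: q = alpha * q'' + qbar.
Q_recovered = alpha * Q_scaled + qbar
```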

The `scaling` argument of the {class}`ShiftScaleTransformer` determines the type of scaling transformation; see also {func}`scale()`.
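Another common choice is a min-max scaling, which affinely maps the interval $[\min_{i,j} q_{ij}, \max_{i,j} q_{ij}]$ onto a target interval $[a, b]$. The helpers `minmax_scale()` and `minmax_unscale()` below are hypothetical and use the global minimum and maximum over all entries; we do not claim they reproduce the internals of {func}`scale()`.

```python
import numpy as np


def minmax_scale(states, scale_to):
    """Affinely map [states.min(), states.max()] onto scale_to = (a, b)."""
    a, b = scale_to
    smin, smax = states.min(), states.max()
    scaled = a + (b - a) * (states - smin) / (smax - smin)
    return scaled, (smin, smax)


def minmax_unscale(scaled, scale_to, scale_from):
    """Invert minmax_scale() given the original (min, max) pair."""
    a, b = scale_to
    smin, smax = scale_from
    return smin + (smax - smin) * (scaled - a) / (b - a)


rng = np.random.default_rng(5)
Q = rng.normal(loc=0.0, scale=2e4, size=(30, 10))  # synthetic snapshots

Q_scaled, scale_from = minmax_scale(Q, scale_to=(0, 1e-2))
print(Q_scaled.min(), Q_scaled.max())  # endpoints of the target interval

Q_back = minmax_unscale(Q_scaled, (0, 1e-2), scale_from)
```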

```python
# Extract the H2O molar concentration.
water = np.split(snapshots, 9, axis=0)[-1]

# Compare the scales of the variables.
print(
    "Pressure range (raw):",
    f"[{pressure.min():.2e}, {pressure.max():.2e}]",
    sep="\t\t",
)
print(
    "Pressure range (shifted):",
    f"[{pressure_shifted.min():.2e}, {pressure_shifted.max():.2e}]",
    sep="\t",
)
print(
    "Water range:",
    f"[{water.min():.2e}, {water.max():.2e}]",
    sep="\t\t\t",
)
```

```python
# Apply a min-max scaling to [0, 0.01] on the shifted pressure snapshots.
pressure_scaled, pscale1, pscale2 = opinf.pre.scale(
    pressure_shifted,
    scale_to=(0, 1e-2),
)
```

```python
# Compare the scales of the variables.
print(
    "Pressure range (raw):",
    f"[{pressure.min():.2e}, {pressure.max():.2e}]",
    sep="\t\t",
)
print(
    "Pressure range (shifted):",
    f"[{pressure_shifted.min():.2e}, {pressure_shifted.max():.2e}]",
    sep="\t",
)
print(
    "Pressure range (scaled):",
    f"[{pressure_scaled.min():.2e}, {pressure_scaled.max():.2e}]",
    sep="\t",
)
print(
    "Water range:",
    f"[{water.min():.2e}, {water.max():.2e}]",
    sep="\t\t\t",
)
```

:::{note}
Choosing an advantageous preprocessing strategy is highly problem dependent, and the tools in this module are not the only ways to preprocess snapshot data.
See, for example, {cite}`issan2023shifted` for a compelling application of Operator Inference to solar wind streams in which preprocessing plays a vital role.
:::

## Multivariable Data

For systems where the full state consists of several variables (pressure, velocity, temperature, etc.), it may not be appropriate to apply the same scaling to each variable.
The {class}`TransformerMulti` class joins individual transformers together to handle multi-state data.

Below, we construct the following transformation for the nine state variables.
- Pressure: center, then scale to $[-1, 1]$.
- $x$-velocity: scale to $[-1, 1]$.
- $y$-velocity: scale to $[-1, 1]$.
- Temperature: center, then scale to $[-1, 1]$.
- Specific volume: scale to $[0, 1]$.
- Chemical species: scale to $[0, 1]$.
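The recipe above can be mimicked in plain NumPy by splitting the stacked state into per-variable blocks, processing each block with its own rule, and stacking the results again. This is only a sketch on synthetic data: `maxabs()`, `minmax01()`, and `extract_variable()` are hypothetical helpers, not the {class}`TransformerMulti` API, and we assume the variables are stored as contiguous row blocks.

```python
import numpy as np


def maxabs(block, center):
    """Optionally center by the mean snapshot, then scale into [-1, 1]."""
    if center:
        block = block - block.mean(axis=1, keepdims=True)
    return block / np.abs(block).max()


def minmax01(block):
    """Scale the block's global range onto [0, 1]."""
    return (block - block.min()) / (block.max() - block.min())


names = ["pressure", "x-velocity", "y-velocity", "temperature",
         "specific volume", "methane", "oxygen", "carbon dioxide", "water"]
recipes = {
    "pressure": lambda b: maxabs(b, center=True),
    "x-velocity": lambda b: maxabs(b, center=False),
    "y-velocity": lambda b: maxabs(b, center=False),
    "temperature": lambda b: maxabs(b, center=True),
}  # all other variables fall back to min-max scaling onto [0, 1]

nx, k = 20, 15
rng = np.random.default_rng(6)
Q = rng.uniform(1, 10, size=(len(names) * nx, k))  # synthetic stacked snapshots

processed = [recipes.get(name, minmax01)(block)
             for name, block in zip(names, np.split(Q, len(names), axis=0))]
Q_processed = np.concatenate(processed, axis=0)


def extract_variable(name, states):
    """Return the rows of a stacked state that belong to one variable."""
    i = names.index(name)
    return states[i * nx:(i + 1) * nx]


oxygen = extract_variable("oxygen", Q_processed)
print(Q_processed.shape, oxygen.shape)  # (180, 15) (20, 15)
```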

```python
combustion_transformer = opinf.pre.TransformerMulti(
    transformers=[
        opinf.pre.ShiftScaleTransformer(
            name="pressure", centering=True, scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="x-velocity", scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="y-velocity", scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="temperature", centering=True, scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="specific volume", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="methane", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="oxygen", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="carbon dioxide", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="water", scaling="minmax", verbose=True
        ),
    ]
)

snapshots_preprocessed = combustion_transformer.fit_transform(snapshots)
```

```python
# Extract a single variable from the processed snapshots.
oxygen_processed = combustion_transformer.get_var(
    "oxygen",
    snapshots_preprocessed,
)

oxygen_processed.shape
```

## Custom Transformers

New transformers can be defined by inheriting from the {class}`TransformerTemplate`.
Once implemented, the `verify()` method may be used to test for consistency between the required methods.

```python
class MyTransformer(opinf.pre.TransformerTemplate):
    """Custom pre-processing transformation."""

    def __init__(self, hyperparameters, name=None):
        """Set any transformation hyperparameters.
        If there are no hyperparameters, __init__() may be omitted.
        """
        super().__init__(name)
        # Process/store 'hyperparameters' here.

    # Required methods --------------------------------------------------------
    def fit_transform(self, states, inplace=False):
        """Learn and apply the transformation."""
        # Set self.state_dimension in this method, e.g.,
        self.state_dimension = len(states)
        raise NotImplementedError

    def transform(self, states, inplace=False):
        """Apply the learned transformation."""
        raise NotImplementedError

    def inverse_transform(self, states_transformed, inplace=False, locs=None):
        """Apply the inverse of the learned transformation."""
        raise NotImplementedError

    # Optional methods --------------------------------------------------------
    # These may be deleted if not implemented.
    def transform_ddts(self, ddts, inplace=False):
        """Apply the learned transformation to snapshot time derivatives."""
        return NotImplemented

    def save(self, savefile, overwrite=False):
        """Save the transformer to an HDF5 file."""
        return NotImplemented

    @classmethod
    def load(cls, loadfile):
        """Load a transformer from an HDF5 file."""
        return NotImplemented
```

See the {class}`TransformerTemplate` page for details on the arguments for each method.
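As a concrete illustration, the hypothetical `LogShiftTransformer` below fills in the three required methods with a log transform followed by centering, which may suit strictly positive quantities such as concentrations. To keep the sketch self-contained it does not inherit from {class}`TransformerTemplate`; in practice it would, and it would call `super().__init__(name)`. We assume that `locs` indexes the rows of the full state that `states_transformed` contains.

```python
import numpy as np


class LogShiftTransformer:
    """Hypothetical transformer: q -> log(q) - mean(log(q)).

    Mirrors the required method signatures of the template above; assumes
    strictly positive states.
    """

    def __init__(self, name=None):
        self.name = name
        self.state_dimension = None
        self.reference_ = None

    @staticmethod
    def _broadcast(ref, states):
        """Reshape the reference so it broadcasts against 1D or 2D states."""
        return ref if states.ndim == 1 else ref[:, np.newaxis]

    def fit_transform(self, states, inplace=False):
        """Learn the mean of log(states), then apply the transformation."""
        self.state_dimension = states.shape[0]
        self.reference_ = np.mean(np.log(states), axis=1)
        return self.transform(states, inplace=inplace)

    def transform(self, states, inplace=False):
        """Apply the learned transformation."""
        out = states if inplace else np.array(states, dtype=float)
        np.log(out, out=out)
        out -= self._broadcast(self.reference_, out)
        return out

    def inverse_transform(self, states_transformed, inplace=False, locs=None):
        """Apply the inverse transformation, optionally on a subset of rows."""
        ref = self.reference_ if locs is None else self.reference_[locs]
        if inplace:
            out = states_transformed
        else:
            out = np.array(states_transformed, dtype=float)
        out += self._broadcast(ref, out)
        np.exp(out, out=out)
        return out


rng = np.random.default_rng(7)
Q = rng.uniform(0.1, 5.0, size=(12, 8))  # synthetic positive snapshots

tf = LogShiftTransformer(name="species")
Q_t = tf.fit_transform(Q)
Q_back = tf.inverse_transform(Q_t)
print(np.allclose(Q_back, Q))  # True

# Reconstruct only rows 0 and 3 of the state.
partial = tf.inverse_transform(Q_t[[0, 3]], locs=[0, 3])
```

Returning copies by default and modifying the input only when `inplace=True` follows the `inplace` keyword that appears in the template's method signatures.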

:::{admonition} Developer Note
:class: note

In order for a custom transformer to interface correctly with {class}`TransformerMulti`, the `save()` and `load()` methods should be implemented using {func}`opinf.utils.hdf5_savehandle()` and {func}`opinf.utils.hdf5_loadhandle()`, respectively.
:::