<div style="background-color: #ccffcc; padding: 10px;">
    <h1> Tutorial 2 </h1>
    <h2> Introduction to Modal Decomposition </h2>
</div>

# Overview

This Jupyter notebook demonstrates how modal decomposition methods can be used for flow feature extraction in fluid mechanics datasets. Modal decomposition techniques, such as Proper Orthogonal Decomposition (POD) and Dynamic Mode Decomposition (DMD), help identify coherent structures in fluid flows, providing a useful dimension reduction for subsequent reduced order modelling. The example application focuses on the classic problem of fluid flow past a cylinder, showing how these methods can simplify complex flow fields into a manageable number of modes. This dimension reduction enables efficient and accurate reduced order modelling using Sparse Identification of Nonlinear Dynamics (SINDy).

## Recommended reading

* [Reduced order modelling](https://uk.mathworks.com/discovery/reduced-order-modeling.html)
* [Dynamic Mode Decomposition (DMD) of numerical and experimental data](https://doi.org/10.1017/S0022112010001217)
* [Proper Orthogonal Decomposition (POD) MIT notes](http://web.mit.edu/6.242/www/images/lec6_6242_2004.pdf)

<hr>

<div style="background-color: #e6ccff; padding: 10px;">

<h1> Machine Learning Theory </h1>

# Modal Decomposition

## The problem

Flow feature extraction in fluid mechanics datasets involves identifying and characterizing significant patterns and structures within fluid flow data. This process is helpful for understanding complex flow behaviors, such as turbulence, vortex dynamics, and boundary layer interactions. By extracting these features, researchers can gain insights into the underlying physics of fluid flows and improve predictive models.

Modal decomposition methods, such as Proper Orthogonal Decomposition (POD) and Dynamic Mode Decomposition (DMD), are powerful tools for flow feature extraction. These methods decompose complex flow fields into a set of modes, each representing a distinct flow feature; POD modes are mutually orthogonal, whereas DMD modes generally are not. By analyzing these modes, researchers can isolate and study specific flow phenomena, leading to a deeper understanding of fluid dynamics and more efficient data analysis.

## Popular modal decomposition methods

* Singular Value Decomposition (SVD): A fundamental linear algebra technique used to decompose a matrix into its singular values and vectors, often used in various modal analysis methods.
* Proper Orthogonal Decomposition (POD): Also known as Principal Component Analysis (PCA) in statistics, POD identifies the most energetic modes in a flow field.
* Dynamic Mode Decomposition (DMD): A method that decomposes complex systems into modes with specific temporal behaviors, useful for analyzing dynamic features in fluid flows.
* Fourier Decomposition: Decomposes a signal into its constituent frequencies, often used for periodic or quasi-periodic flows.
* Wavelet Decomposition: Provides a time-frequency representation of a signal, useful for analyzing transient and multi-scale phenomena in fluid flows.

</div>

<div style="background-color: #cce5ff; padding: 10px;">

# Python

## [SciPy](https://scipy.org/)

SciPy is a widely used open-source library for scientific and technical computing in Python. It builds on the NumPy array object and provides a large collection of algorithms and functions for numerical integration, optimization, signal processing, linear algebra, and more. SciPy enables users to perform complex scientific computations with ease and efficiency. With its intuitive Python interface, SciPy is accessible for beginners, yet it also offers advanced capabilities for experienced programmers. SciPy is compatible with various platforms, from personal computers to high-performance computing environments.

## [PySINDy](https://github.com/dynamicslab/pysindy)

PySINDy is a sparse regression package with several implementations of the Sparse Identification of Nonlinear Dynamical systems (SINDy) method introduced in Brunton et al. (2016a). These include the unified optimization approach of Champion et al. (2020), SINDy with control from Brunton et al. (2016b), Trapping SINDy from Kaptanoglu et al. (2021), SINDy-PI from Kaheman et al. (2020), and PDE-FIND from Rudy et al. (2017). A comprehensive literature review is given in de Silva et al. (2020) and Kaptanoglu, de Silva et al. (2021).

## Further reading

If you want to run this notebook locally or on a remote service:

* [running Jupyter notebooks](https://jupyter.readthedocs.io/en/latest/running.html)
* [installing the required Python environments](https://github.com/cemac/LIFD_ENV_ML_NOTEBOOKS/blob/main/howtorun.md)
* [running the Jupyter notebooks locally](https://github.com/cemac/LIFD_ENV_ML_NOTEBOOKS/blob/main/jupyter_notebooks.md)

</div>

<hr>

<div style="background-color: #ffffcc; padding: 10px;">
    
<h1> Requirements </h1>

This notebook should run with the following requirements satisfied.

<h2> Python Packages: </h2>

* numpy
* scipy
* matplotlib
* notebook
* pysindy
* scikit-learn

<h2> Data Requirements</h2>

Required data from the fluid dynamics simulations are already included in the repository as `.npz` files.

</div>

**Contents:**

1. [Overview and machine-learning theory](#Overview)
2. [Singular Value Decomposition (SVD)](#Part-1:-SVD)
3. [Proper Orthogonal Decomposition (POD)](#Part-2:-POD)
4. [Dynamic Mode Decomposition (DMD)](#Part-3:-DMD)

<div style="background-color: #cce5ff; padding: 10px;">

## Import modules

First we will import all the modules needed during this tutorial.

</div>

### Note for Colab users

If you are using Google Colab to run this notebook, you will need to download an additional module now by uncommenting and running the next code cell.

In [None]:
# !wget https://raw.githubusercontent.com/cemac/LIFD_ModalDecomposition/refs/heads/main/helper_functions.py

Let's import all the libraries we need. This may take a few seconds, depending on the speed of your filesystem.

In [None]:
import numpy as np
import scipy.sparse as sp
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image
from helper_functions import download_data

## Part 1: SVD

Let's start by reviewing the Singular Value Decomposition (SVD). The SVD is a powerful linear algebra technique that decomposes a matrix $\mathbf{A}$ as $\mathbf{A}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$, where $H$ denotes the Hermitian transpose. The column vectors of $\mathbf{U}$ and $\mathbf{V}$ are known as the left and right singular vectors, respectively. Denoting these column vectors as $\mathbf{u}_j$ and $\mathbf{v}_j$, we can also write the singular triplet $(\mathbf{u}_j, \mathbf{v}_j, \sigma_j)$, where $\sigma_j$ is $\boldsymbol{\Sigma}_{jj}$. The matrices $\mathbf{U}$ and $\mathbf{V}$ are unitary, meaning that $\{\mathbf{u}_j\}_{j=0}$ and $\{\mathbf{v}_j\}_{j=0}$ form orthonormal bases.

By definition we then have that $\mathbf{A}\mathbf{v}_j=\sigma_j\mathbf{u}_j$. Expanding any vector in the basis $\{\mathbf{v}_j\}$, the action of $\mathbf{A}$ on it is a sum of the vectors $\mathbf{u}_j$ weighted by $\sigma_j$, so it can be approximated well by a handful of the $\mathbf{u}_j$ provided the singular values decay quickly. In other words, with the ordering $\sigma_0\geq\sigma_1\geq...$, if $\sigma_0\gg\sigma_1\gg...$ then keeping only the first few singular triplets gives a good low-rank approximation to $\mathbf{A}$.
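
As a quick numerical check of the relations above, here is a minimal sketch using NumPy's dense `np.linalg.svd` on a small random matrix. The matrix, its size, and the truncation rank `r` are arbitrary choices made for illustration. It verifies $\mathbf{A}\mathbf{v}_j=\sigma_j\mathbf{u}_j$ and shows that the spectral-norm error of a rank-`r` truncation equals the first discarded singular value.

```python
import numpy as np

# Small random test matrix (purely illustrative)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Thin SVD: A = U @ diag(s) @ VH, with s sorted in descending order
U, s, VH = np.linalg.svd(A, full_matrices=False)

# Check the defining relation A v_j = sigma_j u_j for every triplet.
# Row j of VH is the conjugate transpose of the right singular vector v_j.
for j in range(len(s)):
    assert np.allclose(A @ VH[j].conj(), s[j] * U[:, j])

# Rank-r truncation: keep only the r largest singular triplets
r = 2
A_r = U[:, :r] @ np.diag(s[:r]) @ VH[:r]

# The spectral-norm error equals the first discarded singular value
error = np.linalg.norm(A - A_r, ord=2)
print(error, s[r])
```

The faster the singular values decay, the smaller this error is for a given `r`, which is what makes a low-rank approximation worthwhile.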

To illustrate this, let's consider compressing an image.

We first load an image of a flower and rescale the integer red, green, and blue (RGB) data to floats between zero and one.

In [None]:
# Load a flower image, and rescale the RGB channels to lie within [0, 1]
flower = np.float32(load_sample_image('flower.jpg')/255)
channels = ['red', 'green', 'blue']

In [None]:
def plot_image(image: np.ndarray, ax=None, title=None):
    """
    Plots an image from RGB data.
    
    Parameters
    ----------  
    image: array with shape (number of pixels in y-direction,
                             number of pixels in x-direction,
                             channels).
    ax: axis to plot the image in.
    title: title for the plot.
    """
    if ax is None:
        fig, ax = plt.subplots(1)
    ax.imshow(image)
    ax.axis('off')
    ax.set_title(title)

In [None]:
plot_image(flower, title='Original image')

In [None]:
print('Shape of our image:', flower.shape)
# float32 uses 4 bytes per value; nbytes is the total number of bytes in the array
print('Memory of flower: %.2f MB ' % (flower.nbytes/1024/1024))

Stored as 32-bit floats, this image takes $427\times640\times3\times4~\textrm{bytes}\approx 3.13~\mathrm{MB}$. Let's see if we can compress the image by retaining a low-rank approximation where only some of the singular values are kept.

In [None]:
def image_reduced(image, rank=None):
    """
    Return a low-rank approximation for an image.

    Parameters
    ----------
    image: array with shape (number of pixels in y-direction,
                             number of pixels in x-direction,
                             number of channels).
    rank: How many dominant singular values to keep (default: one fewer
          than the smaller image dimension, the most svds can return).

    Returns
    -------
    reduced_image: A dictionary with size, red, green, and blue keys. The
                   entry of size contains the shape of the image.
                   Each channel (red, green, blue) entry is a tuple 
                   (U, Sigma, VH), such that our low rank approximation 
                   for that channel is U@Sigma@VH.
    """
    reduced_image = {'size': image.shape}
    if rank is None:
        rank = np.min(image.shape[:2]) - 1
    # Loop over RGB channels
    for i, channel in enumerate(channels):
        # Decompose the image passed in, not the global flower array
        U, S, VH = sp.linalg.svds(image[:, :, i], k=rank)
        reduced_image[channel] = (U, S, VH)
    return reduced_image

In [None]:
def reconstruct_image(low_rank_image):
    """
    Reconstructs the image from the low rank approximation.

    Parameters
    ----------
    low_rank_image: Low rank approximation obtained from image_reduced.
    """
    # Reconstruct image
    reconstructed_image = np.empty(shape=low_rank_image['size'], dtype=np.float32)
    for i, channel in enumerate(channels):
        (U, S, VH) = low_rank_image[channel]
        reconstructed_image[:, :, i] = U @ np.diag(S) @ VH
    return reconstructed_image

In [None]:
ranks = [1, 10, 40, 100]
reduced_images = []
for rank in ranks:
    reduced_images.append(image_reduced(flower, rank=rank))
fig, ax = plt.subplots(2, 2)
for i, rank in enumerate(ranks):
    image = reconstruct_image(reduced_images[i])
    plot_image(image, ax=ax.flatten()[i], title=r'Rank=%d' % rank)

From the above images we see that increasing the rank improves our low-rank approximation. Recall that the full rank of the original image is 427. Despite this, even a rank-40 approximation is pretty good, albeit with some artifacts. At rank 100 the image is indistinguishable from the original (at least to me). Stored as 32-bit floats (4 bytes each), a rank-100 approximation needs about 1.22 MB, compared with about 3.13 MB ($427\times640\times3$ values) for the original image.

In [None]:
def memory_of_approximation(r):
    """
    Returns the memory requirement for a rank-r approximation.
    
    Parameters
    ----------
    r: rank of approximation.

    Returns
    -------
    mem: Memory in MB needed to store the approximation.
    """
    # Have r floats, r vectors of size 427, r vectors of size 640 and 3 channels,
    # each stored as a 4-byte float32
    mem = 3*((r*427) + (r*640) + r)*4/1024/1024
    return mem

print('Memory of rank 100 approximation %.2f MB ' % (memory_of_approximation(100)))

A more systematic way to gauge what rank is needed, rather than judging by eye, is to look at the singular values. We can do this by computing a (nearly) full-rank decomposition, and then adding up the singular values for each channel and normalising.

In [None]:
flower_rank_full = image_reduced(flower)

In [None]:
# Let's look at the singular values
svd_sum = 0
for channel in channels:
    (U, S, VH) = flower_rank_full[channel]
    plt.semilogy(S[::-1]/np.max(S), color=channel)
    svd_sum += S[::-1]
# Normalise svd_sum
svd_sum /= np.max(svd_sum)
plt.semilogy(svd_sum, color='black')
plt.grid(which='both')

In [None]:
rank_99 = np.argmax(svd_sum < 0.01)
print(rank_99)

We see that the singular values drop off quickly, indicating that a low-rank approximation will be possible. A good rule of thumb is to set the rank such that the singular values have dropped to below 1% of their maximum value. For the flower this is obtained at rank 65.
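
The rule of thumb above can be wrapped in a small helper. The sketch below is illustrative only: the function name `rank_for_threshold` and the synthetic singular values are assumptions, not part of the notebook. It also guards against the case where no singular value falls below the threshold, in which case a bare `np.argmax` on the boolean mask would return 0.

```python
import numpy as np

def rank_for_threshold(singular_values, threshold=0.01):
    """Index of the first normalised singular value below `threshold`.

    Hypothetical helper: keeping this many singular values retains every
    value that is at least `threshold` times the largest one.
    """
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]  # descending
    normalised = s / s[0]
    below = normalised < threshold
    # If nothing falls below the threshold, keep every singular value
    return int(np.argmax(below)) if below.any() else len(s)

# Synthetic, geometrically decaying singular values: sigma_j = 0.5**j
s_demo = 0.5 ** np.arange(12)
rank_1_percent = rank_for_threshold(s_demo)
print(rank_1_percent)
```

Sorting inside the helper means it can be fed the output of `svds` directly, whatever order the singular values are returned in.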

### Exercise
Explore low-rank approximations for another image, e.g. `china.jpg`, which is also available through `load_sample_image` in the scikit-learn library.

## Part 2: POD

So far we've seen that the SVD is able to capture the essence of a complicated dataset through reducing it to a low-rank approximation. Let's now consider the application of the SVD to fluid mechanics datasets. To this end, let's consider the classic example of flow past a cylinder (at Reynolds number 100). The dataset is stored on Hugging Face and can be downloaded with the `download_data` helper function. First, let's examine the data.

In [None]:
# Load data
download_data()
data = np.load('data/cylinder_flow_data.npz')

xu = data['xu']
yu = data['yu']
u = data['u']

xv = data['xv']
yv = data['yv']
v = data['v']
t = data['t']
lift  = data['lift']
lift_time  = data['lift_time']

In [None]:
plt.plot(lift_time[1:], lift[1:])
plt.xlabel(r'$t$')
plt.ylabel(r'Lift')

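Based on the lift time series plotted above, let's truncate our flow data to the vortex street, which starts around $t=100$. The next cell keeps the snapshots from $t=300$ onwards, well after shedding has started.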

In [None]:
t_start = np.argmin(np.abs(t-300))
u = u[t_start:]
print(u.shape)

In [None]:
plt.contourf(xu, yu, u[-1], levels=40, cmap='plasma')
plt.xlim([-1, 8])
plt.ylim([-2, 2])
plt.gca().set_aspect(True)
plt.colorbar()

In [None]:
u_bar = np.mean(u, axis=0)

In [None]:
plt.contourf(xu, yu, u_bar, levels=40, cmap='plasma')
plt.xlim([-1, 8])
plt.ylim([-2, 2])
plt.gca().set_aspect(True)
plt.colorbar()
plt.title('Mean flow')

In [None]:
def POD(X, weight=None):
    """
    Computes the POD using the method of snapshots.
    
    Parameters
    ----------
    X: Snapshot matrix. Can be multidimensional, but time must be the last axis.
    weight: Weight matrix for weighting the snapshots (currently unused).
    
    Returns
    -------
    eigval: eigenvalues corresponding to the pod_modes, largest first.
    pod_modes: Array of POD modes with shape (mode_index,) + spatial shape.
    """
    # Store the spatial shape
    orig_shape = X.shape[:-1]
    if X.ndim != 2:
        # Must flatten spatial dimensions before forming the covariance matrix
        X_snaps = X.reshape((np.prod(orig_shape), -1))
    else:
        X_snaps = X
    # Form the (time x time) covariance matrix
    C = X_snaps.T @ X_snaps
    # For real snapshot data C is symmetric, so use the symmetric
    # eigensolver, which returns real eigenvalues
    eigval, eigvec = sp.linalg.eigsh(C, k=24)
    # Sort so that the most energetic mode comes first
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Reconstruct the POD modes
    pod_modes = X_snaps @ eigvec
    # Make mode_index the first dimension
    pod_modes = pod_modes.T
    # Unflatten the spatial dimension
    pod_modes = pod_modes.reshape((-1,) + orig_shape)
    return eigval, pod_modes
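
To connect the method of snapshots back to Part 1, the following sketch compares it with a direct SVD on a synthetic snapshot matrix. The random data, the sizes, and the number of modes `k` are assumptions made for illustration and are not the cylinder dataset. It checks that the eigenvalues of $\mathbf{X}^T\mathbf{X}$ are the squared singular values of $\mathbf{X}$, and that $\mathbf{X}$ times each eigenvector gives the corresponding left singular vector scaled by $\sigma_j$, up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
n_space, n_time = 200, 30
# Synthetic snapshot matrix (space, time): five dominant structures plus weak noise
X = rng.standard_normal((n_space, 5)) @ rng.standard_normal((5, n_time))
X += 0.01 * rng.standard_normal((n_space, n_time))

# Method of snapshots: eigen-decomposition of the small (time x time) matrix
C = X.T @ X
eigval, eigvec = np.linalg.eigh(C)      # eigh returns ascending order
order = np.argsort(eigval)[::-1]        # re-order to descending
eigval, eigvec = eigval[order], eigvec[:, order]

# Direct thin SVD of the snapshot matrix
U, s, VH = np.linalg.svd(X, full_matrices=False)

k = 5
# Eigenvalues of X^T X are the squared singular values of X
eigs_match = np.allclose(eigval[:k], s[:k] ** 2)

# X @ eigvec gives the left singular vectors scaled by sigma_j (up to sign)
modes = X @ eigvec[:, :k] / np.sqrt(eigval[:k])
overlap = np.abs(np.sum(modes * U[:, :k], axis=0))
modes_match = np.allclose(overlap, 1.0)
print(eigs_match, modes_match)
```

The method of snapshots is attractive when there are far fewer snapshots than spatial points, because the eigenproblem is only as large as the number of snapshots.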

In [None]:
POD_data = (u-u_bar).transpose((1, 2, 0))

In [None]:
eigval, pod_modes = POD(POD_data)

In [None]:
print(pod_modes.shape)

In [None]:
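# Normalised eigenvalue magnitudes (np.abs also copes with complex-typed output)
plt.semilogy(np.abs(eigval)/np.max(np.abs(eigval)), 'x')
plt.xlabel('Mode index')
plt.ylabel('Normalised eigenvalue')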

In [None]:
plt.contourf(xu, yu, pod_modes[0].real, levels=40, cmap='bwr')
plt.xlim([-1, 8])
plt.ylim([-2, 2])
plt.gca().set_aspect(True)
plt.colorbar()
plt.title('POD mode 0')

## Part 3: DMD

In [None]:
DMD_data = u.transpose(1, 2, 0)

In [None]:
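def DMD(snapshots, rank=10):
    """
    Performs DMD on snapshot data, where time is the last axis
    and snapshots are separated by a constant time interval.

    Parameters
    ----------
    snapshots: Snapshot matrix. Can be multidimensional, but time must be the last axis.
    rank: Number of singular values kept in the truncated SVD (default: 10).
    
    Returns
    -------
    eigval: eigenvalues corresponding to the DMD_modes.
    DMD_modes: Array of DMD modes with shape (mode_index,) + spatial shape.
    """
    orig_shape = snapshots.shape[:-1]
    if snapshots.ndim != 2:
        # Flatten the spatial dimensions so that each column is one snapshot
        snapshots_flattened = snapshots.reshape((np.prod(orig_shape), -1))
    else:
        snapshots_flattened = snapshots
    # X holds snapshots 0..N-2 and Y the same snapshots shifted one step forward
    X = snapshots_flattened[:,:-1]
    Y = snapshots_flattened[:,1:]
    U, S, VH = sp.linalg.svds(X, k=rank)
    # Project the linear map from X to Y onto the leading singular vectors
    Abar = U.conj().T @ Y @ VH.conj().T @ np.diag(1/S)
    # Abar is a small dense (rank x rank) matrix, so use a dense eigensolver
    eigval, eigvec = np.linalg.eig(Abar)
    DMD_modes = Y @ VH.conj().T @ np.diag(1/S) @ eigvec
    # Make mode_index the first dimension and unflatten the spatial dimensions
    DMD_modes = DMD_modes.T
    DMD_modes = DMD_modes.reshape((-1,) + orig_shape)
    return eigval, DMD_modes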
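
To see what the eigenvalues returned by a DMD of this form mean, here is a self-contained sketch on synthetic data with a known answer. Everything in it is an assumption for illustration: the time step `dt`, the growth rate, the frequency, and the random embedding. A decaying oscillation is embedded in a 50-dimensional field, and a rank-2 DMD (using a dense SVD rather than `svds`) recovers its growth rate and angular frequency from the logarithm of the eigenvalues divided by `dt`.

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 0.1
growth, omega = -0.05, 2.0            # assumed growth rate and angular frequency
rho = np.exp(growth * dt)
# 2x2 map that rotates by omega*dt and scales by rho at every time step
A_small = rho * np.array([[np.cos(omega * dt), -np.sin(omega * dt)],
                          [np.sin(omega * dt),  np.cos(omega * dt)]])

# Advance the latent state and embed it in a 50-dimensional "spatial" field
n_space, n_time = 50, 100
z = np.empty((2, n_time))
z[:, 0] = [1.0, 0.0]
for k in range(n_time - 1):
    z[:, k + 1] = A_small @ z[:, k]
snapshots = rng.standard_normal((n_space, 2)) @ z

# DMD with a rank-2 truncated SVD of the first n_time - 1 snapshots
X, Y = snapshots[:, :-1], snapshots[:, 1:]
U, s, VH = np.linalg.svd(X, full_matrices=False)
U, s, VH = U[:, :2], s[:2], VH[:2]
Abar = U.conj().T @ Y @ VH.conj().T @ np.diag(1 / s)
eigval, eigvec = np.linalg.eig(Abar)

# Continuous-time exponents: real part = growth rate, imaginary part = frequency
mu = np.log(eigval) / dt
print(mu)
```

Eigenvalues inside the unit circle correspond to decaying modes (negative real part of `mu`), and eigenvalues on the unit circle to purely oscillatory ones.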

In [None]:
e, m = DMD(POD_data)

In [None]:
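fig, ax = plt.subplots(1, 2)
theta = np.linspace(0, 2*np.pi, 100)
# Eigenvalues in the complex plane, with the unit circle for reference
ax[0].plot(e.real, e.imag, 'x')
ax[0].plot(np.cos(theta), np.sin(theta))
ax[0].set_xlim([-1, 1])
ax[0].set_ylim([-1, 1])
ax[0].set_aspect('equal')
ax[0].set_xlabel(r'Re($\lambda$)')
ax[0].set_ylabel(r'Im($\lambda$)')

# Logarithm of the eigenvalues: phase (horizontal) against growth (vertical) per snapshot
mu = np.log(e)
ax[1].plot(mu.imag, mu.real, 'x')
ax[1].set_ylim([-0.1, 0.1])
ax[1].set_xlabel(r'Im($\log\lambda$)')
ax[1].set_ylabel(r'Re($\log\lambda$)')
fig.tight_layout()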

In [None]:
plt.contourf(xu, yu, m[-1].real, levels=40, cmap='bwr')
plt.xlim([-1, 8])
plt.ylim([-2, 2])
plt.gca().set_aspect(True)
plt.colorbar()
plt.title('DMD mode (real part)')

## References

   de Silva, Brian M., Kathleen Champion, Markus Quade,
   Jean-Christophe Loiseau, J. Nathan Kutz, and Steven L. Brunton.
   *PySINDy: a Python package for the sparse identification of
   nonlinear dynamics from data.* arXiv preprint arXiv:2004.08424 (2020)
   [arXiv](https://arxiv.org/abs/2004.08424)

   Kaptanoglu, Alan A., Brian M. de Silva, Urban Fasel, Kadierdan Kaheman, Andy J. Goldschmidt,
   Jared L. Callaham, Charles B. Delahunt, Zachary G. Nicolaou, Kathleen Champion,
   Jean-Christophe Loiseau, J. Nathan Kutz, and Steven L. Brunton.
   *PySINDy: A comprehensive Python package for robust sparse system identification.*
   arXiv preprint arXiv:2111.08481 (2021).
   [arXiv](https://arxiv.org/abs/2111.08481)

   Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz.
   *Discovering governing equations from data by sparse identification
   of nonlinear dynamical systems.* Proceedings of the National
   Academy of Sciences 113.15 (2016): 3932-3937.
   [DOI](http://dx.doi.org/10.1073/pnas.1517384113)

   Champion, K., Zheng, P., Aravkin, A. Y., Brunton, S. L., & Kutz, J. N. (2020).
   *A unified sparse optimization framework to learn parsimonious physics-informed
   models from data.* IEEE Access, 8, 169259-169271.
   [DOI](https://doi.org/10.1109/ACCESS.2020.3023625)

   Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz.
   *Sparse identification of nonlinear dynamics with control (SINDYc).*
   IFAC-PapersOnLine 49.18 (2016): 710-715.
   [DOI](https://doi.org/10.1016/j.ifacol.2016.10.249)

   Kaheman, K., Kutz, J. N., & Brunton, S. L. (2020).
   *SINDy-PI: a robust algorithm for parallel implicit sparse identification
   of nonlinear dynamics.* Proceedings of the Royal Society A, 476(2242), 20200279.
   [DOI](https://doi.org/10.1098/rspa.2020.0279)

   Kaptanoglu, A. A., Callaham, J. L., Aravkin, A., Hansen, C. J., & Brunton, S. L. (2021).
   *Promoting global stability in data-driven models of quadratic nonlinear dynamics.*
   Physical Review Fluids, 6(9), 094401.
   [DOI](https://doi.org/10.1103/PhysRevFluids.6.094401)