<img src="assets/header_notebook.jpg" />
<hr style="color:#5A7D9F;">
<p align="center">
    <b style="font-size:2vw; color:#5A7D9F; font-weight:bold;">
    <center>Ocean subgrid parameterizations in an idealized model using machine learning</center>
    </b>
</p>
<hr style="color:#5A7D9F;">

In [None]:
# -----------------
#     Libraries
# -----------------
#
# --------- Standard ---------
import os
import sys
import json
import glob
import math
import torch
import random
import fsspec
import matplotlib
import fourierflow
import numpy                   as np
import pandas                  as pd
import xarray                  as xr
import seaborn                 as sns
import matplotlib.pyplot       as plt
import matplotlib.gridspec     as gridspec

from matplotlib.colorbar     import Colorbar
from argparse                import ArgumentParser
from scipy.stats             import gaussian_kde
from torch.utils.tensorboard import SummaryWriter

# --------- PYQG ---------
import pyqg
import pyqg.diagnostic_tools
from   pyqg.diagnostic_tools import calc_ispec         as _calc_ispec
import pyqg_parameterization_benchmarks.coarsening_ops as coarsening

calc_ispec = lambda *args, **kwargs: _calc_ispec(*args, averaging = False, truncate = False, **kwargs)

# --------- PYQG Benchmark ---------
from pyqg_parameterization_benchmarks.utils             import *
from pyqg_parameterization_benchmarks.utils_TFE         import *
from pyqg_parameterization_benchmarks.plots_TFE         import *
from pyqg_parameterization_benchmarks.configurations    import *
from pyqg_parameterization_benchmarks.online_metrics    import diagnostic_differences, diagnostic_similarities
from pyqg_parameterization_benchmarks.neural_networks   import NN_Parameterization, NN_Parameterization_Handler
from pyqg_parameterization_benchmarks.nn_analytical     import BackscatterBiharmonic, Smagorinsky, HybridSymbolic
from pyqg_parameterization_benchmarks.nn_kaskade        import Kaskade
from pyqg_parameterization_benchmarks.nn_fcnn           import FullyCNN
from pyqg_parameterization_benchmarks.nn_fno            import FourierNO
from pyqg_parameterization_benchmarks.nn_ffno           import FactorizedFNO
from pyqg_parameterization_benchmarks.nn_unet           import UNet

# --------- Jupyter ---------
%matplotlib inline
plt.rcParams.update({'font.size': 13})

# Making sure modules are reloaded when modified
%reload_ext autoreload
%autoreload 2

# Moving to correct folder
%cd ../src/pyqg_parameterization_benchmarks/

<hr style="color:#5A7D9F;">
<p align="center">
    <b style="font-size:1.5vw; color:#5A7D9F;">
    <center>PYQG - Generating & Saving dataset</center>
    </b>
</p>
<hr style="color:#5A7D9F;">

<p align="justify">
    In this section, one will be able to generate:
</p>

- A **high resolution** (= HR) simulation from a quasi-geostrophic model (PYQG).

- A **low resolution** (= LR) simulation.

- An **augmented low resolution** (= ALR) simulation.
        
Furthermore, one will be able to:
        
- Observe the **state** and **subgrid variables** associated with the dataset in the corresponding folder.

- **Save** the datasets on the hard drive. 

<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Simulation type</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr>
			<td style="width: 15%;" align="center"><b>INDEX</b></td>
			<td style="width: 13%;" align="center">0</td>
			<td style="width: 13%;" align="center">1</td>
			<td style="width: 15%;" align="center">2</td>
			<td style="width: 15%;" align="center">3</td>
			<td style="width: 16%;" align="center">4</td>
			<td style="width: 18%;" align="center">5</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>TYPE</b></td>
			<td style="width: 13%;" align="center">Eddies</td>
			<td style="width: 13%;" align="center">Jets  </td>
			<td style="width: 15%;" align="center">Eddies (Debug)</td>
			<td style="width: 15%;" align="center">Jets   (Debug)</td>
			<td style="width: 16%;" align="center">Eddies (Random)</td>
			<td style="width: 18%;" align="center">Jets (Random)</td>
		</tr>
	</tbody>
</table>

<br>

<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Parameters</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr style="height: 21px;">
			<td style="width: 16%;" align="center"><b>PARAMETERS</b></td>
			<td style="width: 12%;" align="center">nx</td>
			<td style="width: 10%;" align="center">dt</td>
			<td style="width: 12%;" align="center">tmax</td>
			<td style="width: 12%;" align="center">tavestart</td>
			<td style="width: 12%;" align="center">rek</td>
			<td style="width: 12%;" align="center">Δ</td>
			<td style="width: 14%;" align="center">β</td>
		</tr>
		<tr style="height: 21.5px;">
			<td style="width: 16%;" align="center"><b>DESCRIPTION</b></td>
			<td style="width: 12%;" align="center">Number of real space grid points in the x direction</td>
			<td style="width: 10%;" align="center">Numerical timestep (in hours)</td>
			<td style="width: 12%;" align="center">Total time of integration (in years)</td>
			<td style="width: 12%;" align="center">Start time for averaging (in years)</td>
			<td style="width: 12%;" align="center">Linear drag in lower layer (in s<sup>-1</sup>)</td>
			<td style="width: 12%;" align="center">Layer thickness ratio (H1/H2)</td>
			<td style="width: 14%;" align="center">Gradient of the Coriolis parameter (in m<sup>-1</sup> s<sup>-1</sup>)</td>
		</tr>
		<tr style="height: 21.5px;">
			<td style="width: 16%;" align="center"><b>EDDIES</b></td>
			<td style="width: 12%;" align="center">256</td>
			<td style="width: 10%;" align="center">1</td>
			<td style="width: 12%;" align="center">10</td>
			<td style="width: 12%;" align="center">5</td>
			<td style="width: 12%;" align="center">5.789e-7</td>
			<td style="width: 12%;" align="center">0.25</td>
			<td style="width: 14%;" align="center">1.5 * 1e-11</td>
		</tr>
		<tr style="height: 21px;">
			<td style="width: 16%;" align="center"><b>JETS</b></td>
			<td style="width: 12%;" align="center">256</td>
			<td style="width: 10%;" align="center">1</td>
			<td style="width: 12%;" align="center">10</td>
			<td style="width: 12%;" align="center">5</td>
			<td style="width: 12%;" align="center">7e-08</td>
			<td style="width: 12%;" align="center">0.1</td>
			<td style="width: 14%;" align="center">1e-11</td>
		</tr>
		<tr style="height: 21.5px;">
			<td style="width: 16%;" align="center"><b>EDDIES (Debug)</b></td>
			<td style="width: 12%;" align="center">256</td>
			<td style="width: 10%;" align="center">1</td>
			<td style="width: 12%;" align="center">2</td>
			<td style="width: 12%;" align="center">1</td>
			<td style="width: 12%;" align="center">5.789e-7</td>
			<td style="width: 12%;" align="center">0.25</td>
			<td style="width: 14%;" align="center">1.5 * 1e-11</td>
		</tr>
		<tr style="height: 21px;">
			<td style="width: 16%;" align="center"><b>JETS (Debug)</b></td>
			<td style="width: 12%;" align="center">256</td>
			<td style="width: 10%;" align="center">1</td>
			<td style="width: 12%;" align="center">2</td>
			<td style="width: 12%;" align="center">1</td>
			<td style="width: 12%;" align="center">7e-08</td>
			<td style="width: 12%;" align="center">0.1</td>
			<td style="width: 14%;" align="center">1e-11</td>
		</tr>
		<tr style="height: 21.5px;">
			<td style="width: 16%;" align="center"><b>EDDIES (Random)</b></td>
			<td style="width: 12%;" align="center">256</td>
			<td style="width: 10%;" align="center">1</td>
			<td style="width: 12%;" align="center">10</td>
			<td style="width: 12%;" align="center">5</td>
			<td style="width: 12%;" align="center">[5.7, 5.9] * 1e-7</td>
			<td style="width: 12%;" align="center">0.25</td>
			<td style="width: 14%;" align="center">[1.45, 1.55] * 1e-11</td>
		</tr>
		<tr style="height: 21px;">
			<td style="width: 16%;" align="center"><b>JETS (Random)</b></td>
			<td style="width: 12%;" align="center">256</td>
			<td style="width: 10%;" align="center">1</td>
			<td style="width: 12%;" align="center">10</td>
			<td style="width: 12%;" align="center">5</td>
			<td style="width: 12%;" align="center">[6.9, 7.1] * 1e-8</td>
			<td style="width: 12%;" align="center">0.1</td>
			<td style="width: 14%;" align="center">[0.95, 1.05] * 1e-11</td>
		</tr>
	</tbody>
</table>
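
The rows of the table above can be read as keyword arguments for the model. As a minimal sketch, assuming the simulation is ultimately built with `pyqg.QGModel` (which takes `nx`, `dt`, `tmax`, `tavestart`, `rek`, `delta` and `beta` in SI units), the hypothetical helper `make_kwargs` below converts hours and years to seconds and draws the randomized values from the listed ranges. The `SIMULATION_TYPES` dictionary, the 360-day year and the uniform sampling are assumptions made for illustration; they are not taken from `generate_dataset.py`.

```python
import random

HOUR = 60 * 60            # seconds in one hour
YEAR = 360 * 24 * HOUR    # assumption: a 360-day model year

# Hypothetical transcription of the table; tmax and tavestart are in years.
# A (low, high) tuple marks a value drawn at random for the "Random" types.
SIMULATION_TYPES = {
    0: dict(tmax=10, tavestart=5, rek=5.789e-7, delta=0.25, beta=1.5e-11),
    1: dict(tmax=10, tavestart=5, rek=7e-8, delta=0.1, beta=1e-11),
    2: dict(tmax=2, tavestart=1, rek=5.789e-7, delta=0.25, beta=1.5e-11),
    3: dict(tmax=2, tavestart=1, rek=7e-8, delta=0.1, beta=1e-11),
    4: dict(tmax=10, tavestart=5, rek=(5.7e-7, 5.9e-7), delta=0.25,
            beta=(1.45e-11, 1.55e-11)),
    5: dict(tmax=10, tavestart=5, rek=(6.9e-8, 7.1e-8), delta=0.1,
            beta=(0.95e-11, 1.05e-11)),
}


def make_kwargs(index, nx=256, dt_hours=1, rng=None):
    """Return model keyword arguments (SI units) for one simulation type."""
    if rng is None:
        rng = random.Random()
    row = SIMULATION_TYPES[index]

    def draw(value):
        # Fixed values pass through; ranges are sampled uniformly (assumption).
        return rng.uniform(*value) if isinstance(value, tuple) else value

    return dict(
        nx=nx,
        dt=dt_hours * HOUR,
        tmax=row["tmax"] * YEAR,
        tavestart=row["tavestart"] * YEAR,
        rek=draw(row["rek"]),
        delta=row["delta"],
        beta=draw(row["beta"]),
    )


kwargs = make_kwargs(1)
print(kwargs)
# With pyqg installed, the model could then be created with:
# model = pyqg.QGModel(**kwargs)
```

Passing a seeded `random.Random` instance as `rng` makes the randomized types reproducible.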

<br>

<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Coarsening operators</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr>
			<td style="width: 10%;" align="center"><b>OPERATOR</b></td>
			<td style="width: 15%;" align="center">&nbsp;1</td>
			<td style="width: 15%;" align="center">&nbsp;2</td>
			<td style="width: 17%;" align="center">3</td>
		</tr>
		<tr>
			<td style="width: 10%;" align="center"><b>DESCRIPTION</b></td>
			<td style="width: 15%;" align="center">Spectral Truncation, Sharp Filter</td>
			<td style="width: 15%;" align="center">Spectral Truncation, Gaussian Filter</td>
			<td style="width: 17%;" align="center">GCM Filter, Averaging and Coarsening</td>
		</tr>
	</tbody>
</table>
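
Operators 1 and 2 both rely on spectral truncation and differ in the filter that goes with it. As a rough illustration, the sketch below evaluates two textbook transfer functions at a given wavenumber: a sharp cutoff and a Gaussian of width `filter_width`. The function names, the cutoff `k_cut`, the domain size and the constant 24 (the usual convention for a Gaussian filter in large-eddy simulation) are assumptions for illustration; the `coarsening_ops` module may define its filters differently.

```python
import math


def sharp_filter(k, k_cut):
    """Keep modes up to the cutoff wavenumber and remove everything above it."""
    return 1.0 if abs(k) <= k_cut else 0.0


def gaussian_filter(k, filter_width):
    """Damp modes smoothly with exp(-k^2 * filter_width^2 / 24)."""
    return math.exp(-(k ** 2) * filter_width ** 2 / 24.0)


# Illustrative values only: a 1000 km periodic domain coarsened to 64 points.
domain_length = 1.0e6
nx_low = 64
dx_low = domain_length / nx_low
k_cut = math.pi / dx_low          # Nyquist wavenumber of the coarse grid

for mode in (1, 16, 31, 48):
    k = 2 * math.pi * mode / domain_length
    print(mode, sharp_filter(k, k_cut), round(gaussian_filter(k, 2 * dx_low), 3))
```

The sharp filter leaves resolved modes untouched, whereas the Gaussian filter already attenuates modes well below the cutoff.
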
<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Documentation</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr>
			<td style="width: 15%;" align="center"><b>save_folder</b></td>
			<td style="width: 13%;" align="center">Name of the folder used to save the datasets</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>nb_threads</b></td>
			<td style="width: 13%;" align="center">Number of threads used to run the simulation</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>simulation_type</b></td>
			<td style="width: 13%;" align="center">Type of simulation used to generate the dataset</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>memory</b></td>
			<td style="width: 13%;" align="center">Total amount of memory allocated [GB] (used as a safety check)</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>skipped_time</b></td>
			<td style="width: 13%;" align="center">Time [year] at which the sampling of the simulation starts</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>save_high_res</b></td>
			<td style="width: 13%;" align="center">Whether the whole high resolution simulation is saved or only its last sample (to save memory)</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>operator_cf</b></td>
			<td style="width: 13%;" align="center">Coarsening and filtering operator applied on the high resolution simulation</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>target_sample_size</b></td>
			<td style="width: 13%;" align="center">Number of samples expected to be in the datasets (nb_sample >= target_sample_size)</td>
		</tr>
	</tbody>
</table>

In [None]:
# --------------------------- GENERATING DATASET -------------------------------
%run generate_dataset.py --save_folder            NOTEBOOK_DATA                                                                                                                                   \
                         --simulation_type                    1                                                                                                                                   \
                         --target_sample_size                 5                                                                                                                                   \
                         --operator_cf                        1                                                                                                                                   \
                         --skipped_time                       1                                                                                                                                   \
                         --nb_threads                         2                                                                                                                                   \
                         --memory                            32                                                                                                                                   \
                         --save_high_res                   True
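
The flags above are ordinary command-line arguments. One point worth care is `--save_high_res True`: with `argparse`, declaring the flag with `type=bool` would treat any non-empty string, including `False`, as true. The sketch below shows one way such flags could be declared, using a hypothetical `str2bool` helper. The parser, its defaults and its `choices` are assumptions made for this example and are not taken from `generate_dataset.py`.

```python
from argparse import ArgumentParser, ArgumentTypeError


def str2bool(text):
    """Convert 'True'/'False'-like strings to a real boolean."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = ArgumentParser(description="Sketch of a dataset-generation CLI")
    parser.add_argument("--save_folder", type=str, required=True)
    parser.add_argument("--simulation_type", type=int, choices=range(6), default=0)
    parser.add_argument("--target_sample_size", type=int, default=1000)
    parser.add_argument("--operator_cf", type=int, choices=(1, 2, 3), default=1)
    parser.add_argument("--skipped_time", type=float, default=0.5)
    parser.add_argument("--nb_threads", type=int, default=1)
    parser.add_argument("--memory", type=int, default=8)
    parser.add_argument("--save_high_res", type=str2bool, default=False)
    return parser


# Same values as in the cell above, passed as a list instead of via %run.
args = build_parser().parse_args(
    "--save_folder NOTEBOOK_DATA --simulation_type 1 --target_sample_size 5 "
    "--operator_cf 1 --skipped_time 1 --nb_threads 2 --memory 32 "
    "--save_high_res True".split()
)
print(args.save_folder, args.simulation_type, args.save_high_res)
```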

[comment]: <> (Section)
<hr style="color:#5A7D9F;">
<p align="center">
    <b style="font-size:1.5vw; color:#5A7D9F;">
    <center>Ocean subgrid parameterization - Learning</center>
    </b>
</p>
<hr style="color:#5A7D9F;">

<p align="justify">
    In this section, one will be able to:
</p>
 
- **Create** and **train** a new parameterization using the datasets created previously.

<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Documentation</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr>
			<td style="width: 15%;" align="center"><b>folder_training</b></td>
			<td style="width: 13%;" align="center">Folder used to load data as training data</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>folder_validation</b></td>
			<td style="width: 13%;" align="center">Folder used to load data as validation data</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>save_directory</b></td>
			<td style="width: 13%;" align="center">Folder used to save the trained parameterization</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>inputs</b></td>
			<td style="width: 13%;" align="center">Type of inputs given to the parameterization for training</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>targets</b></td>
			<td style="width: 13%;" align="center">Parameterization output</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>learning_rate </b></td>
			<td style="width: 13%;" align="center">The learning rate value used as starting point during training</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>batch_size</b></td>
			<td style="width: 13%;" align="center">Number of samples passed forward before back-propagation is performed</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>optimizer</b></td>
			<td style="width: 13%;" align="center">The optimizer used during training</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>scheduler</b></td>
			<td style="width: 13%;" align="center">The scheduler used during training</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>configuration</b></td>
			<td style="width: 13%;" align="center">Choose the default configuration or one of many in configurations.py</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>num_epochs</b></td>
			<td style="width: 13%;" align="center">Number of epochs over which the parameterization is trained</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>zero_mean</b></td>
			<td style="width: 13%;" align="center">Type of pre-processing made on the datasets</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>padding</b></td>
			<td style="width: 13%;" align="center">Type of padding used by the parameterization</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>memory</b></td>
			<td style="width: 13%;" align="center">Total amount of memory allocated [GB] (used as a safety check)</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>param_type</b></td>
			<td style="width: 13%;" align="center">Choose the type of parameterization used to learn closure</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>sim_type</b></td>
			<td style="width: 13%;" align="center">Type of fluid simulation studied (used to order tensorboard folders)</td>
		</tr>
	</tbody>
</table>
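
The `padding` option deserves a short illustration. pyqg integrates the equations on a doubly periodic domain, so `circular` padding lets a convolution near the boundary see the values that physically neighbour it on the opposite side. The pure-Python sketch below pads a small 2-D field this way; `circular_pad` is a hypothetical helper written for this example, not the routine used by the parameterizations.

```python
def circular_pad(field, width):
    """Pad a 2-D list of lists by `width` cells on every side, wrapping around."""
    ny, nx = len(field), len(field[0])
    return [
        [field[(j - width) % ny][(i - width) % nx] for i in range(nx + 2 * width)]
        for j in range(ny + 2 * width)
    ]


field = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

padded = circular_pad(field, 1)
for row in padded:
    print(row)
```

The first padded row repeats the last row of the original field and the first padded column repeats its last column, which is what periodicity requires.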

In [None]:
# --------------------------------- EDDIES -------------------------------------
%run train_parameterization.py --folder_training        NOTEBOOK_EDDIES_TRAINING                                                                                                         \
                               --folder_validation    NOTEBOOK_EDDIES_VALIDATION                                                                                                         \
                               --param_name                                 FFNO                                                                                                         \
                               --inputs                                    q u v                                                                                                         \
                               --targets                                q_fluxes                                                                                                         \
                               --learning_rate                             0.001                                                                                                         \
                               --batch_size                                   32                                                                                                         \
                               --optimizer                                  adam                                                                                                         \
                               --scheduler                              constant                                                                                                         \
                               --configuration                           default                                                                                                         \
                               --num_epochs                                   20                                                                                                         \
                               --zero_mean                                 False                                                                                                         \
                               --padding                                circular                                                                                                         \
                               --memory                                       32                                                                                                         \
                               --param_type                      NOTEBOOK_EDDIES                                                                                                         \
                               --sim_type                        NOTEBOOK_EDDIES

In [None]:
# ---------------------------------- JETS --------------------------------------
%run train_parameterization.py --folder_training          NOTEBOOK_JETS_TRAINING                                                                                                         \
                               --folder_validation      NOTEBOOK_JETS_VALIDATION                                                                                                         \
                               --param_name                                 FFNO                                                                                                         \
                               --inputs                                    q u v                                                                                                         \
                               --targets                                q_fluxes                                                                                                         \
                               --learning_rate                             0.001                                                                                                         \
                               --batch_size                                   32                                                                                                         \
                               --optimizer                                  adam                                                                                                         \
                               --scheduler                              constant                                                                                                         \
                               --configuration                           default                                                                                                         \
                               --num_epochs                                   20                                                                                                         \
                               --zero_mean                                 False                                                                                                         \
                               --padding                                circular                                                                                                         \
                               --memory                                       32                                                                                                         \
                               --param_type                        NOTEBOOK_JETS                                                                                                         \
                               --sim_type                          NOTEBOOK_JETS

In [None]:
# ---------------------------------- FULL --------------------------------------
%run train_parameterization.py --folder_training          NOTEBOOK_FULL_TRAINING                                                                                                         \
                               --folder_validation               FULL_VALIDATION                                                                                                         \
                               --param_name                                 FFNO                                                                                                         \
                               --inputs                                    q u v                                                                                                         \
                               --targets                                q_fluxes                                                                                                         \
                               --learning_rate                             0.001                                                                                                         \
                               --batch_size                                   32                                                                                                         \
                               --optimizer                                  adam                                                                                                         \
                               --scheduler                              constant                                                                                                        \
                               --configuration                           default                                                                                                         \
                               --num_epochs                                   20                                                                                                         \
                               --zero_mean                                 False                                                                                                         \
                               --padding                                circular                                                                                                         \
                               --memory                                       32                                                                                                         \
                               --param_type                        NOTEBOOK_FULL                                                                                                         \
                               --sim_type                          NOTEBOOK_FULL

[comment]: <> (Section)
<hr style="color:#5A7D9F;">
<p align="center">
    <b style="font-size:1.5vw; color:#5A7D9F;">
    <center>Ocean subgrid parameterization - Testing (Offline)</center>
    </b>
</p>
<hr style="color:#5A7D9F;">

<p align="justify">
    In this section, one will be able to:
</p>
 
- **Load** a trained FCNN parameterization;

- Evaluate its **offline performance** on a test set, i.e. its ability to **predict** the **subgrid forcing terms** accurately.

<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Documentation</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr>
			<td style="width: 15%;" align="center"><b>folder_offline</b></td>
			<td style="width: 13%;" align="center">Folders used to load data as offline test data</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>folder_models</b></td>
			<td style="width: 13%;" align="center">Folder (inside the model folder) used to load all the different models to be tested</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>memory</b></td>
			<td style="width: 13%;" align="center">Total amount of memory allocated [GB] (used as a safety check)</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>type_sim</b></td>
			<td style="width: 13%;" align="center">Type of fluid simulation studied (used to order tensorboard folders)</td>
		</tr>
	</tbody>
</table>
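
Offline skill is typically summarised by comparing the predicted subgrid forcing with the true one, value by value. The sketch below computes two common scores on flat lists of numbers: the coefficient of determination R² and the Pearson correlation. That `offline.py` reports exactly these scores is an assumption, and `r_squared` and `pearson` are hypothetical helper names with made-up sample values.

```python
import math


def r_squared(true, pred):
    """Coefficient of determination: 1 - residual variance / total variance."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def pearson(true, pred):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_true = sum(true) / len(true)
    mean_pred = sum(pred) / len(pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(true, pred))
    var_true = sum((t - mean_true) ** 2 for t in true)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_true * var_pred)


# Made-up values standing in for flattened subgrid forcing fields.
true_forcing = [0.0, 1.0, 2.0, 3.0]
pred_forcing = [0.1, 0.9, 2.2, 2.8]

print(r_squared(true_forcing, pred_forcing))
print(pearson(true_forcing, pred_forcing))
```

The two scores are complementary: a prediction with a constant bias still has a correlation of 1, while its R² drops below 1.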

In [None]:
# ----------------------------- EDDIES --------------------------------
%run offline.py --folder_offline              NOTEBOOK_EDDIES_OFFLINE \
                --folder_models                                  hope \
                --memory                                           32 \
                --type_sim                                     EDDIES

In [None]:
# ------------------------------ JETS ---------------------------------
%run offline.py --folder_offline                         JETS_OFFLINE \
                --folder_models      PHASE_6_SENTITIVITY_ARCHITECTURE \
                --memory                                           32 \
                --type_sim                                       JETS

In [None]:
# ------------------------------ FULL ---------------------------------
%run offline.py --folder_offline                NOTEBOOK_FULL_OFFLINE \
                --folder_models                        __P5__ONLINE__ \
                --memory                                           32 \
                --type_sim                                       FULL

[comment]: <> (Section)
<hr style="color:#5A7D9F;">
<p align="center">
    <b style="font-size:1.5vw; color:#5A7D9F;">
    <center>Ocean subgrid parameterization - Testing (Online)</center>
    </b>
</p>
<hr style="color:#5A7D9F;">

<p align="justify">
    In this section, one will be able to:
</p>
 
- **Load** a trained FCNN parameterization;

- Evaluate its **online performance** on a test set, i.e. its ability to **perform** a **meaningful** and **accurate** simulation.

<hr style="color:#5A7D9F; width: 100%;" align="left">
<p align="center">
	<b style="font-size:1vw;">
	<center>Documentation</center>
	</b>
</p>
<hr style="color:#5A7D9F; width: 100%;" align="left">
<table style="width: 100%;" border="1">
	<tbody>
		<tr>
			<td style="width: 15%;" align="center"><b>folder_online</b></td>
			<td style="width: 13%;" align="center">Folders used to load data as online test data</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>folder_models</b></td>
			<td style="width: 13%;" align="center">Folder (inside the model folder) used to load all the different models to be tested</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>memory</b></td>
			<td style="width: 13%;" align="center">Total amount of memory allocated [GB] (used as a safety check)</td>
		</tr>
		<tr>
			<td style="width: 15%;" align="center"><b>type_sim</b></td>
			<td style="width: 13%;" align="center">Type of fluid simulation studied (used to order tensorboard folders)</td>
		</tr>
	</tbody>
</table>
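
Online skill compares diagnostics of a parameterized low-resolution run with those of the high-resolution reference. One way to condense such a comparison into a single number, sketched below under stated assumptions, is to normalise the distance to the reference by the distance of the unparameterized low-resolution run. A score of 1 then means the parameterized run matches the reference, 0 means no improvement, and negative values mean it does worse. The root-mean-square distance, the function names and the sample curves are illustrative; they are not the definitions found in `online_metrics`.

```python
import math


def rmse(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def similarity(param, low_res, high_res, distance=rmse):
    """Assumed score: 1 - d(param, high_res) / d(low_res, high_res)."""
    return 1.0 - distance(param, high_res) / distance(low_res, high_res)


# Made-up diagnostic curves, e.g. one spectrum value per wavenumber bin.
high_res = [4.0, 3.0, 2.0, 1.0]
low_res = [3.0, 2.0, 1.0, 0.0]
param = [3.5, 2.5, 1.5, 0.5]

print(similarity(param, low_res, high_res))
```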

In [None]:
# ----------------------------- EDDIES --------------------------------
%run online.py --folder_online                          EDDIES_ONLINE \
               --folder_models                         __P5__ONLINE__ \
               --memory                                            32 \
               --type_sim                                      EDDIES

In [None]:
# ----------------------------- JETS ----------------------------------
%run online.py --folder_online                            JETS_ONLINE \
               --folder_models                        __P4__FFNO_FULL \
               --memory                                            32 \
               --type_sim                                        JETS