# Introduction

This notebook is a worked example of running the code for the data challenge, with explanations. Each section describes one step in detail. The functions used come from the ``` ./lib ``` folder.

The structure of the code is:
1. Data preprocessing:
    - Removing NANs in the data
    - Standardization
2. Training: 
    - Dimension reduction with autoencoder
    - GAN/CNN for super-resolution (SR)
    
## Code structure:
- Data preprocessing:
    * ``` preprocess.py ``` provides the functions needed for data preprocessing
- GAN for super-resolution
    * ``` models.py ``` provides the models needed in GAN
    * ``` GAN_class.py ``` contains the class defined for full GAN training

# Code for running

## Import libraries

In [6]:
%reset -f
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
import xarray as xr
import h5py

import matplotlib.pyplot as plt
import sys
sys.path.append("./lib")
from preprocess import *
from models import *
from GAN_class import *

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Data preprocessing

The function ``` data_preprocess ``` takes the list of data set paths and a dictionary of the parameters required for preprocessing.

The parameters are explained below:

1. For NAN removal:
    - ``` "nan_dim_along" ```: the dimension along which NAN data are removed
    - ``` "nan_data_irrelevant" ```: the data to exclude when detecting NANs
    - ``` "output_folder" ```: folder for data output
    - ``` "file_format" ```: the file name format used for saving NAN-removed data
2. For standardization:
    - ``` "stat_dim" ```: the dimension along which statistics (mean, stddev, etc.) are computed
    - ``` "std_file_format" ```: the file name format used for the standardized h5 data file
    - ``` "std_data_format" ```: the data name format used for the standardized data inside the h5 file
    - ``` "std_data_list" ```: list of data to be standardized
    - ``` "std_dataset_list" ```: list of data sets to be standardized
    - ``` "num_error_tolerance" ```: numerical tolerance for passing the standardization test
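The two preprocessing steps described above can be sketched on a synthetic array. This is a minimal illustration, not the implementation in ``` preprocess.py ```: the helper names `remove_nan_along_time` and `standardize_along_time` are hypothetical, axis 0 is assumed to be the time dimension, and the final check only mimics the role of ``` "num_error_tolerance" ```.

```python
import numpy as np


def remove_nan_along_time(data):
    """Drop every time index (axis 0) at which any value is NaN."""
    # Flatten all non-time axes, then flag time steps containing a NaN.
    nan_in_step = np.isnan(data).reshape(data.shape[0], -1).any(axis=1)
    return data[~nan_in_step]


def standardize_along_time(data, tolerance=1e-5):
    """Standardize along axis 0 and verify the result within `tolerance`."""
    mean = data.mean(axis=0, keepdims=True)
    stddev = data.std(axis=0, keepdims=True)
    standardized = (data - mean) / stddev
    # Self-check: the standardized data should have mean ~0 and stddev ~1.
    assert np.all(np.abs(standardized.mean(axis=0)) < tolerance)
    assert np.all(np.abs(standardized.std(axis=0) - 1.0) < tolerance)
    return standardized, mean, stddev


# Synthetic field with shape (time, y, x); values are made up for illustration.
rng = np.random.default_rng(0)
u = rng.normal(loc=3.0, scale=2.0, size=(100, 4, 4))
u[7, 0, 0] = np.nan    # inject NaNs at two different time steps
u[42, 2, 1] = np.nan

u_clean = remove_nan_along_time(u)
u_std, u_mean, u_stddev = standardize_along_time(u_clean)
print(u_clean.shape)
```

Keeping the mean and stddev alongside the standardized data makes it possible to map model output back to physical units later.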

In [9]:
# Set up the preprocessing parameters
output_folder = "../data/preprocessed/"
file_format = "%s"
std_file_format = "np_gan_standard"
std_data_format = "np_%s"
parameters = {"nan_dim_along": "time", "nan_data_irrelevant": "absolute_height",
              "output_folder": output_folder, "file_format": file_format,
              "stat_dim": "time",
              "std_file_format": std_file_format, "std_data_format": std_data_format,
              "std_data_list": ["u", "v"],
              "std_dataset_list": ["perdigao_low_res_1H_2020", "perdigao_high_res_1H_2020"],
              "num_error_tolerance": 1e-5}
list_of_data_set_path = ['../data/perdigao_era5_2020.nc',
                         '../data/perdigao_low_res_1H_2020.nc',
                         '../data/perdigao_high_res_1H_2020.nc']
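As a rough illustration of how the two format strings might expand, the sketch below rebuilds the data names that the loader prints for ``` np_gan_standard.h5 ```. The suffixes `mean`, `std` and `stddev` are taken from that printed list; the way they are combined here is an assumption, not code from ``` preprocess.py ```.

```python
std_file_format = "np_gan_standard"
std_data_format = "np_%s"
std_dataset_list = ["perdigao_low_res_1H_2020", "perdigao_high_res_1H_2020"]

# Assumption: a single HDF5 file named after std_file_format.
h5_file_name = std_file_format + ".h5"

# Assumption: one entry per data set and suffix, as seen in the loader output
# (std appears to be the standardized data, stddev the standard deviation).
suffixes = ["mean", "std", "stddev"]
data_names = sorted(
    (std_data_format % name) + "_" + suffix
    for name in std_dataset_list
    for suffix in suffixes
)

print(h5_file_name)
for data_name in data_names:
    print(data_name)
```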

In [10]:
# Run the preprocessing
data_preprocess(list_of_data_set_path, parameters)

Creating DataSets by loading files:['../data/perdigao_era5_2020.nc', '../data/perdigao_low_res_1H_2020.nc', '../data/perdigao_high_res_1H_2020.nc']
Removing NAN indices along dimension:time
Searching NAN in DataSets: dict_keys(['perdigao_era5_2020', 'perdigao_low_res_1H_2020', 'perdigao_high_res_1H_2020'])...
Checking nan pattern of variable: u100  for DataSet: perdigao_era5_2020
Total number of NAN: (0,), along (0,) time indicies
Checking nan pattern of variable: v100  for DataSet: perdigao_era5_2020
Total number of NAN: (0,), along (0,) time indicies
Checking nan pattern of variable: t2m  for DataSet: perdigao_era5_2020
Total number of NAN: (0,), along (0,) time indicies
Checking nan pattern of variable: i10fg  for DataSet: perdigao_era5_2020
Total number of NAN: (0,), along (0,) time indicies
NAN pattern along dimension: time, is CONSISTENT for all other coords, with absolute_height excluded
Checking nan pattern of variable: std  for DataSet: perdigao_low_res_1H_2020
Total number of

### Example for loading data

In [11]:
output_folder = "../data/preprocessed/"
file_format = "np_gan_standard"
xy_keyword_dict = {"x":"low", "y":"high"}
data_xy = get_data_xy_from_h5(output_folder, file_format, xy_keyword_dict, exclude_list = ["stddev", "mean"])

Data in file ../data/preprocessed/np_gan_standard.h5 are: 
 ['np_perdigao_high_res_1H_2020_mean', 'np_perdigao_high_res_1H_2020_std', 'np_perdigao_high_res_1H_2020_stddev', 'np_perdigao_low_res_1H_2020_mean', 'np_perdigao_low_res_1H_2020_std', 'np_perdigao_low_res_1H_2020_stddev']
Examining data np_perdigao_high_res_1H_2020_mean
Examining data np_perdigao_high_res_1H_2020_std
Examining data np_perdigao_high_res_1H_2020_stddev
Examining data np_perdigao_low_res_1H_2020_mean
Examining data np_perdigao_low_res_1H_2020_std
Loading data np_perdigao_low_res_1H_2020_std from file ../data/preprocessed/np_gan_standard.h5
Data in file ../data/preprocessed/np_gan_standard.h5 are: 
 ['np_perdigao_high_res_1H_2020_mean', 'np_perdigao_high_res_1H_2020_std', 'np_perdigao_high_res_1H_2020_stddev', 'np_perdigao_low_res_1H_2020_mean', 'np_perdigao_low_res_1H_2020_std', 'np_perdigao_low_res_1H_2020_stddev']
Examining data np_perdigao_high_res_1H_2020_mean
Examining data np_perdigao_high_res_1H_2020_std
L

### Key parameters
The parameters for GAN are explained below:
1. For training, ``` parameters["train"] ```:
    - ``` "learning_rate_g" ```: the learning rate for the generator
    - ``` "learning_rate_d" ```: the learning rate for the discriminator
    - ``` "beta_1", "beta_2", "epsilon", "batch_size"```: optimizer hyperparameters (the names match those of the Adam optimizer) and the batch size
    - ``` "n_epochs_pretrain" ```: number of epochs for pretraining
    - ``` "n_epochs_GAN" ```: number of epochs for full GAN
2. For data IO and manipulation, ``` parameters["data"] ```:
    - ``` "output_folder" ```: the output folder of the data preprocess, used as input for loading data
    - ``` "file_format" ```: the file name format of the standardized h5 data file
    - ``` "xy_keyword_dict" ```: the keywords used to detect and categorize x and y among the standardized h5 data sets
    - ``` "xy_exclude_list" ```: list of data set types to skip when they appear in a data set name, e.g. stddev and mean if only the standardized data are needed
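The role of ``` "xy_keyword_dict" ``` and ``` "xy_exclude_list" ``` can be sketched with plain substring matching on the data names listed in the loader output. The function `categorize_names` is hypothetical and only illustrates the idea; the actual logic lives in ``` get_data_xy_from_h5 ``` and may differ.

```python
def categorize_names(names, xy_keyword_dict, xy_exclude_list):
    """Group data names into x and y by keyword, skipping excluded types."""
    categorized = {key: [] for key in xy_keyword_dict}
    for name in names:
        # Skip names containing any excluded type, e.g. "stddev" or "mean".
        if any(excluded in name for excluded in xy_exclude_list):
            continue
        # Assign the name to every key whose keyword appears in it.
        for key, keyword in xy_keyword_dict.items():
            if keyword in name:
                categorized[key].append(name)
    return categorized


names = [
    "np_perdigao_high_res_1H_2020_mean",
    "np_perdigao_high_res_1H_2020_std",
    "np_perdigao_high_res_1H_2020_stddev",
    "np_perdigao_low_res_1H_2020_mean",
    "np_perdigao_low_res_1H_2020_std",
    "np_perdigao_low_res_1H_2020_stddev",
]
selected = categorize_names(names, {"x": "low", "y": "high"}, ["stddev", "mean"])
print(selected)
```

With these settings only the two `_std` entries survive: the low-resolution one becomes x and the high-resolution one becomes y.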

In [13]:
parameters = dict()

parameters["train"] = {"learning_rate_g": 1e-4, 
                       "learning_rate_d": 1e-4,
                       "beta_1": 0.9,
                       "beta_2": 0.999,
                       "epsilon": 1e-08,
                       "batch_size": 128,
                       "alpha_advers": 1e-3,
                       "n_epochs_pretrain":1, 
                       "n_epochs_GAN":1}
parameters["data"] = {'output_folder': "../data/preprocessed/",
                      'file_format': "np_gan_standard",
                      'xy_keyword_dict': {"x":"low", "y":"high"},
                      'xy_exclude_list': ["stddev", "mean"]}

### Step by step running example

In [14]:
# Create a GAN model object with doing_pretrain=True
model = AWWSM4_SR_GAN(parameters=parameters, is_GAN=True, doing_pretrain=True)
# Load and split the data according to parameters["data"]
model.load_data()
model.split_data()
# Pretrain the generator and save it
model.pretrain()  # uses the default number of epochs (20)
model.save_gen_model("./temp_v0_gen.h5")
# OPTIONAL: create a second model that loads the pretrained weights and can continue pretraining
model2 = AWWSM4_SR_GAN(parameters=parameters, is_GAN=True, doing_pretrain=True)
model2.load_gen_model("./temp_v0_gen.h5")
model2.load_data()
model2.split_data()
# Switch the first model out of pretraining mode and train the full GAN
model.reset_working_mode(doing_pretrain=False)
model.train_GAN(epochs=10)

Model: "model_4"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_5 (InputLayer)           [(None, 96, 96, 2)]  0           []                               
                                                                                                  
 conv2d_transpose_72 (Conv2DTra  (None, 96, 96, 64)  1216        ['input_5[0][0]']                
 nspose)                                                                                          
                                                                                                  
 activation_36 (Activation)     (None, 96, 96, 64)   0           ['conv2d_transpose_72[0][0]']    
                                                                                                  
 conv2d_transpose_73 (Conv2DTra  (None, 96, 96, 64)  36928       ['activation_36[0][0]']    

2022-08-25 08:12:33.231599: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8302


Epoch generator training loss = 0.494695, val loss = 0.107219
Epoch took 28.75 seconds

Epoch: 2
Epoch generator training loss = 0.085219, val loss = 0.071352
Epoch took 18.50 seconds

Epoch: 3
Epoch generator training loss = 0.062746, val loss = 0.056159
Epoch took 18.51 seconds

Epoch: 4
Epoch generator training loss = 0.050780, val loss = 0.046858
Epoch took 18.52 seconds

Epoch: 5
Epoch generator training loss = 0.043160, val loss = 0.040646
Epoch took 18.50 seconds

Epoch: 6
Epoch generator training loss = 0.038076, val loss = 0.036467
Epoch took 18.55 seconds

Epoch: 7
Epoch generator training loss = 0.034599, val loss = 0.033492
Epoch took 18.50 seconds

Epoch: 8
Epoch generator training loss = 0.032145, val loss = 0.031380
Epoch took 18.51 seconds

Epoch: 9
Epoch generator training loss = 0.030259, val loss = 0.029816
Epoch took 18.52 seconds

Epoch: 10
Epoch generator training loss = 0.028777, val loss = 0.028529
Epoch took 18.51 seconds

Epoch: 11
Epoch generator training los

AttributeError: 'AWWSM4_SR_GAN' object has no attribute 'set_work_mode'

In [15]:
model.reset_working_mode(doing_pretrain=False)
model.train_GAN(epochs=10)

Reset the working mode!
Working mode before reset:
Current working mode is:
	 auto_run: False
	 is_GAN: True
	 doing_pretrain: True
	 loading_pretrain: False
Working mode after reset:
Current working mode is:
	 auto_run: False
	 is_GAN: True
	 doing_pretrain: False
	 loading_pretrain: True
Training network ...
Epoch: 1
Using d_loss_ideal_range:[0.45, 0.65], g_reflect_max: 20, d_reflect_max: 20
Using d_loss_ideal_range:[0.45, 0.65], g_reflect_max: 20, d_reflect_max: 20
Using d_loss_ideal_range:[0.45, 0.65], g_reflect_max: 20, d_reflect_max: 20
Epoch generator loss = 0.023027, discriminator loss = 0.519160, g_count = 268, d_count = 58
Epoch val: g_loss = 0.021964, d_loss = 0.442368, content_loss = 0.021083, advers_loss = 0.880422
Epoch took 471.49 seconds

Epoch: 2
Epoch generator loss = 0.020886, discriminator loss = 0.431552, g_count = 625, d_count = 40
Epoch val: g_loss = 0.020655, d_loss = 0.390905, content_loss = 0.019417, advers_loss = 1.238017
Epoch took 217.97 seconds

Epoch: 3
E

(<keras.engine.functional.Functional at 0x7fc6dc5a1910>,
 <keras.engine.functional.Functional at 0x7fc6dc55c220>)
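The validation lines in the GAN training log are numerically consistent with a generator loss of the form `g_loss = content_loss + alpha_advers * advers_loss`, using ``` "alpha_advers": 1e-3 ``` from ``` parameters["train"] ```. This relation is inferred from the printed numbers only and is not confirmed by the library code; the check below simply recomputes it for the two complete epochs in the log.

```python
alpha_advers = 1e-3  # value set in parameters["train"]

# Validation values copied from the log for epochs 1 and 2.
val_log = [
    {"g_loss": 0.021964, "content_loss": 0.021083, "advers_loss": 0.880422},
    {"g_loss": 0.020655, "content_loss": 0.019417, "advers_loss": 1.238017},
]

residuals = []
for epoch, row in enumerate(val_log, start=1):
    # Assumed combination: content loss plus weighted adversarial loss.
    combined = row["content_loss"] + alpha_advers * row["advers_loss"]
    residuals.append(abs(combined - row["g_loss"]))
    print(f"epoch {epoch}: combined = {combined:.6f}, "
          f"logged g_loss = {row['g_loss']:.6f}")

# The logged values are rounded to six decimals, so allow a small margin.
assert all(residual < 2e-6 for residual in residuals)
```

Under this reading, the adversarial term contributes only a few percent of the generator loss at this weight, so the content loss dominates.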