tgb - 12/20/2021 - The goal is to derive the climate-invariant dataset directly with the custom data generator. This avoids inconsistencies in how the relative humidity, plume buoyancy, and scaled latent heat flux rescalings are formulated. The resulting dataset can then be used for the causal discovery project led by Nando Iglesias.

# Imports

In [1]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="2" # must be set before anything initializes the GPUs

from cbrain.climate_invariant import *
from cbrain.climate_invariant_utils import *

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
for device in physical_devices: # with CUDA_VISIBLE_DEVICES="2", only one GPU is listed
    tf.config.experimental.set_memory_growth(device, True)

/nfspool-0/home/tbeucler/CBRAIN-CAM/notebooks/tbeucler_devlog


In [2]:
path_data = '/DFS-L/DATA/pritchard/tbeucler/SPCAM/SPCAM_PHYS/'

# Define data generators

## Below is how we would build a standard or "brute-force" data generator `train_gen_BF`.

1. We would first specify the input variables `in_vars`, the output variables `out_vars`, and the path of the training set `path_train`. 

In [3]:
in_vars = ['QBP','TBP','PS','SOLIN','SHFLX','LHFLX'] # We take the large-scale climate state as inputs
out_vars = ['PHQ','TPHYSTND','FSNT','FSNS','FLNT','FLNS', 'PRECT'] # and we output the response of clouds/storms to these climate conditions
#path_train = path_data + 'Aqua_0K_withVBP/2021_09_02_TRAIN_For_Nando.nc'
path_train = path_data + '2022_01_10_TRAIN_For_Nando_t-dt.nc'

2. To make sure all outputs have the same units (in our case W/m$^2$), we multiply the raw outputs by the appropriate physical constants, stored in a dictionary called `scale_dict`.

In [7]:
import pickle
with open(path_data+'CIML_Zenodo/009_Wm2_scaling.pkl','rb') as f:
    scale_dict = pickle.load(f)
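To illustrate what such conversion factors can look like, the sketch below builds per-level factors under stated assumptions: moisture tendencies (kg/kg/s) are multiplied by $L_v \Delta p / g$, temperature tendencies (K/s) by $c_p \Delta p / g$, and precipitation (m/s) by $\rho_w L_v$, while radiative fluxes are already in W/m$^2$. These factors are our assumption and are not read from `009_Wm2_scaling.pkl`; the helper `wm2_scale_factors`, the constant values, and the layer thicknesses are hypothetical.

```python
import numpy as np

# Approximate physical constants in SI units (assumed values)
G = 9.81        # gravitational acceleration [m/s^2]
L_V = 2.501e6   # latent heat of vaporization [J/kg]
C_P = 1004.0    # specific heat of dry air at constant pressure [J/kg/K]
RHO_W = 1000.0  # density of liquid water [kg/m^3]

def wm2_scale_factors(dp):
    """Hypothetical helper: factors converting raw outputs to W/m^2.

    dp: pressure thickness of each vertical level [Pa], shape (n_lev,).
    """
    return {
        'PHQ': L_V * dp / G,       # kg/kg/s -> W/m^2 on each level
        'TPHYSTND': C_P * dp / G,  # K/s -> W/m^2 on each level
        'FSNT': 1.0, 'FSNS': 1.0,  # radiative fluxes are already in W/m^2
        'FLNT': 1.0, 'FLNS': 1.0,
        'PRECT': RHO_W * L_V,      # m/s of liquid water -> W/m^2
    }

# Toy example: three levels, each 100 hPa thick
factors = wm2_scale_factors(np.full(3, 1e4))
heating_rate = np.array([1e-5, 2e-5, 0.0])   # K/s
print(factors['TPHYSTND'] * heating_rate)    # W/m^2 on each level
```

Expressing every output in W/m$^2$ puts all terms of the loss on a common energy scale, so no single output dominates the mean squared error because of its units.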

3. We scale the inputs to [-1,1] by subtracting their mean and then dividing them by their range. The means and ranges are stored in the normalization file at `path_input_norm`.

In [5]:
path_input_norm = path_data + '2022_01_10_Norm_Outputs_t-dt.nc'
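A minimal NumPy sketch of this transform follows. It assumes that `('mean', 'maxrs')` subtracts the mean and then divides by the larger of the range (max - min) and the per-variable standard deviation `std_by_var`; this is our reading of `cbrain` and is not verified here. The helper `normalize_inputs` is hypothetical. The last step inverts the transform, which is the same operation used later in this notebook to recover inputs in physical units.

```python
import numpy as np

def normalize_inputs(x, mean, vmin, vmax, std_by_var):
    """Hypothetical sketch of a ('mean', 'maxrs') input transform.

    x has shape (n_samples, n_features); the statistics have shape (n_features,).
    """
    sub = mean
    # Assumed 'maxrs' rule: divide by the larger of the range and the std
    div = np.maximum(vmax - vmin, std_by_var)
    return (x - sub) / div, sub, div

# Toy data: two features with very different magnitudes
x = np.array([[1e-3, 300.0], [3e-3, 280.0], [2e-3, 290.0]])
x_norm, sub, div = normalize_inputs(
    x, mean=x.mean(axis=0), vmin=x.min(axis=0), vmax=x.max(axis=0),
    std_by_var=x.std(axis=0))

# Undo the transform to recover physical units
x_back = x_norm * div + sub
print(np.abs(x_norm).max(), np.allclose(x_back, x))
```

Taking the maximum with the standard deviation guards against dividing by a near-zero range for variables that barely vary.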

4. We are now ready to build our first data generator!

In [6]:
N_batch = 8192

In [7]:
train_gen_BF = DataGeneratorCI(
    data_fn = path_train,
    input_vars = in_vars,
    output_vars = out_vars,
    norm_fn = path_input_norm,
    input_transform = ('mean', 'maxrs'),
    output_transform = scale_dict,
    shuffle = False,
    batch_size=N_batch
)

In [8]:
train_gen_BF[50][0].shape

(8192, 64)

In [9]:
train_gen_BF[50][1].shape

(8192, 65)
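These shapes follow from the variable lists: each profile variable contributes one column per vertical level, and each scalar contributes one column. The sketch below assumes 30 vertical levels, which matches the 30 repeated labels per profile variable in the training file's `var_names`; the helper `n_columns` is hypothetical.

```python
N_LEV = 30  # assumed number of vertical levels per profile variable
PROFILES = {'QBP', 'TBP', 'PHQ', 'TPHYSTND'}

def n_columns(var_list, n_lev=N_LEV):
    """Count columns: profile variables give n_lev columns, scalars give one."""
    return sum(n_lev if v in PROFILES else 1 for v in var_list)

example_in = ['QBP', 'TBP', 'PS', 'SOLIN', 'SHFLX', 'LHFLX']
example_out = ['PHQ', 'TPHYSTND', 'FSNT', 'FSNS', 'FLNT', 'FLNS', 'PRECT']
print(n_columns(example_in), n_columns(example_out))  # 64 65
```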

## Now, we would like to build a "climate-invariant" data generator `train_gen_CI`, which requires a few more steps

### First, we have to create one standard generator per input rescaling. This will help us renormalize the inputs to [-1,1] every time we feed them to the neural network. 

1. First, let's define the paths to the normalization files for the three input rescalings:
relative humidity `RH`, plume buoyancy `BMSE`, and normalized latent heat flux `LHF_nsDELQ`.

In [10]:
path_norm_RH = path_data + '2021_02_01_NORM_O3_RH_small.nc'
path_norm_BMSE = path_data + '2021_06_16_NORM_BMSE_small.nc'
path_norm_LHF_nsDELQ = path_data + '2021_02_01_NORM_O3_LHF_nsDELQ_small.nc'

2. We can now define one data generator per input rescaling

In [11]:
def train_gen_rescaling(input_rescaling):
    return DataGeneratorCI(
        data_fn = path_train,
        input_vars = input_rescaling,
        output_vars = out_vars,
        norm_fn = path_input_norm,
        input_transform = ('mean', 'maxrs'),
        output_transform = scale_dict)

In [12]:
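# Note: all three calls use the same in_vars and path_input_norm, so the three generators
# share identical normalization factors (input_transform.sub and input_transform.div).
# The path_norm_* files are only used further below, to build the new normalization file.
train_gen_RH = train_gen_rescaling(in_vars)
train_gen_BMSE = train_gen_rescaling(in_vars)
train_gen_LHF_nsDELQ = train_gen_rescaling(in_vars)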

### Then, the normalization factors of these generators can be combined to form a "climate-invariant" data generator `train_gen_CI`

In [13]:
train_gen_CI = DataGeneratorCI(
    data_fn = path_train,
    input_vars = in_vars,
    output_vars = out_vars,
    norm_fn = path_input_norm,
    input_transform = ('mean','maxrs'),
    output_transform = scale_dict,
    shuffle = False,
    batch_size=N_batch,
    Qscaling = 'RH',
    Tscaling = 'BMSE',
    LHFscaling = 'LHF_nsDELQ',
    hyam=hyam, hybm=hybm, # Arrays to define mid-levels of hybrid vertical coordinate
    inp_sub_Qscaling=train_gen_RH.input_transform.sub, # What to subtract from RH inputs
    inp_div_Qscaling=train_gen_RH.input_transform.div, # What to divide RH inputs by
    inp_sub_Tscaling=train_gen_BMSE.input_transform.sub,
    inp_div_Tscaling=train_gen_BMSE.input_transform.div,
    inp_sub_LHFscaling=train_gen_LHF_nsDELQ.input_transform.sub,
    inp_div_LHFscaling=train_gen_LHF_nsDELQ.input_transform.div
)
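The `hyam` and `hybm` arrays let the generator reconstruct pressure on each model level, which the RH rescaling needs. The sketch below assumes the standard CAM hybrid formula $p_k = \mathrm{hyam}_k P_0 + \mathrm{hybm}_k p_s$ with $P_0 = 10^5$ Pa, and the Bolton (1980) saturation vapor pressure over liquid water only. `cbrain`'s actual RH formulation may differ (for instance in its treatment of ice), and the helper names and toy coefficients are hypothetical.

```python
import numpy as np

P0 = 1e5     # reference pressure [Pa] (CAM convention)
EPS = 0.622  # ratio of gas constants R_d / R_v

def mid_level_pressure(hyam, hybm, ps):
    """p_k = hyam_k * P0 + hybm_k * ps; ps has shape (n_samples,)."""
    return hyam[None, :] * P0 + hybm[None, :] * ps[:, None]

def saturation_vapor_pressure(T):
    """Bolton (1980) formula over liquid water [Pa]; T in K."""
    Tc = T - 273.15
    return 611.2 * np.exp(17.67 * Tc / (Tc + 243.5))

def relative_humidity(q, T, p):
    """RH = e / e_s, with vapor pressure e from specific humidity q [kg/kg]."""
    e = q * p / (EPS + (1.0 - EPS) * q)
    return e / saturation_vapor_pressure(T)

# Toy column with two levels (hypothetical coefficients)
hyam = np.array([0.1, 0.0])
hybm = np.array([0.0, 0.95])
ps = np.array([1e5])
p = mid_level_pressure(hyam, hybm, ps)  # roughly [[1e4, 9.5e4]] Pa
q = np.array([[1e-5, 1e-2]])
T = np.array([[220.0, 295.0]])
print(p, relative_humidity(q, T, p))
```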

# Regenerate the scaled dataset

## Create new training file

In [14]:
#path_train = path_data + 'Aqua_0K_withVBP/2021_09_02_TRAIN_For_Nando.nc'
path_train = path_data + '2022_01_10_TRAIN_For_Nando_t-dt.nc'

In [15]:
train_raw = xr.open_dataset(path_train)

### var_names

In [16]:
train_raw['var_names'].values

array(['QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP',
       'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP',
       'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP', 'QBP',
       'QBP', 'QBP', 'QBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP',
       'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP',
       'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP',
       'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'TBP', 'VBP', 'VBP', 'VBP',
       'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP',
       'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP',
       'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP', 'VBP',
       'PS', 'SOLIN', 'SHFLX', 'LHFLX', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ',
       'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ',
       'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ',
       'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'TPHYSTND',
       'TPHYSTND'

In [17]:
train_raw_CI = train_raw.copy()

In [18]:
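var_names_CI = train_raw_CI['var_names'].values.copy() # copy so that train_raw's labels are not modified in place
for i in range(60):
    if i<30: var_names_CI[i] = 'RH'
    else: var_names_CI[i] = 'BMSE'

In [19]:
var_names_CI[93] = 'LHF_nsDELQ'

In [20]:
train_raw_CI = train_raw_CI.assign_coords({'var_names':var_names_CI}) # assign_coords returns a new Dataset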

### vars

In [21]:
train_gen_CI.output_transform.scale.shape

(65,)

In [22]:
train_gen_CI[0][0].shape

(8192, 64)

In [23]:
train_raw_CI['vars'][0].values.shape

(224,)

In [24]:
train_gen_BF_0_pu = (train_gen_BF[0][0]*train_gen_BF.input_transform.div+train_gen_BF.input_transform.sub)

In [25]:
train_gen_BF_0_pu[1000][:30]

array([1.7381838e-06, 1.6579336e-06, 1.4306893e-06, 1.4215012e-06,
       1.2831629e-06, 1.2795708e-06, 1.2820553e-06, 1.2995041e-06,
       1.3052462e-06, 1.3062929e-06, 1.3133019e-06, 1.4116081e-06,
       2.0213665e-06, 3.4514924e-06, 6.4445849e-06, 1.3533645e-05,
       2.5911606e-05, 5.6728924e-05, 1.2745283e-04, 2.6673404e-04,
       4.5801210e-04, 9.5409341e-04, 1.2293411e-03, 1.8398142e-03,
       2.1690321e-03, 2.4146475e-03, 2.6721214e-03, 2.8284965e-03,
       2.8885789e-03, 3.1328364e-03], dtype=float32)

In [26]:
train_raw_CI['vars'].shape

(47177728, 224)

In [27]:
train_raw_CI['vars'][1000][:30].values

array([1.7381838e-06, 1.6579336e-06, 1.4306893e-06, 1.4215012e-06,
       1.2831629e-06, 1.2795708e-06, 1.2820553e-06, 1.2995041e-06,
       1.3052462e-06, 1.3062929e-06, 1.3133019e-06, 1.4116080e-06,
       2.0213670e-06, 3.4514912e-06, 6.4445876e-06, 1.3533647e-05,
       2.5911615e-05, 5.6728903e-05, 1.2745282e-04, 2.6673407e-04,
       4.5801213e-04, 9.5409347e-04, 1.2293413e-03, 1.8398144e-03,
       2.1690319e-03, 2.4146475e-03, 2.6721214e-03, 2.8284967e-03,
       2.8885792e-03, 3.1328364e-03], dtype=float32)
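The two printouts above agree to float32 round-off (about seven significant digits), which confirms that un-normalizing the generator output recovers the raw inputs. A helper such as the hypothetical `max_relative_error` below turns this visual check into a single number for a whole batch. Synthetic profiles stand in for `train_gen_BF_0_pu` and `train_raw_CI['vars']`, so the numbers are illustrative only.

```python
import numpy as np

def max_relative_error(a, b, floor=1e-30):
    """Largest elementwise |a - b| / |b|, with a floor to avoid division by zero."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.max(np.abs(a - b) / np.maximum(np.abs(b), floor))

rng = np.random.default_rng(0)
# Synthetic stand-in for a batch of humidity profiles: magnitude varies by level
scale = np.logspace(-6, -2.5, 30)
raw = (scale * rng.uniform(0.5, 1.5, size=(8192, 30))).astype(np.float32)

# Normalize and un-normalize in float32, mimicking the generator round trip
sub = raw.mean(axis=0)
div = raw.max(axis=0) - raw.min(axis=0)
roundtrip = ((raw - sub) / div) * div + sub

err = max_relative_error(roundtrip, raw)
print(f'max relative error: {err:.1e}')
```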

In [28]:
train_gen_CI_0_pu = (train_gen_CI[0][0]*train_gen_CI.input_transform.div+train_gen_CI.input_transform.sub)

In [29]:
train_gen_CI_0_pu[1000][:30]

array([1.9735626e-05, 1.3552811e-04, 1.5667443e-03, 2.3940048e-04,
       7.2050310e-04, 1.4135052e-03, 2.1991380e-03, 2.9280463e-03,
       4.2576068e-03, 5.6687603e-03, 7.3259524e-03, 1.0606945e-02,
       2.1620281e-02, 4.9715336e-02, 1.1197579e-01, 2.3079984e-01,
       3.2419598e-01, 4.1257945e-01, 4.6153700e-01, 4.8926058e-01,
       5.1830405e-01, 7.3986965e-01, 6.8840361e-01, 8.7931424e-01,
       9.3197358e-01, 9.1838229e-01, 9.1030759e-01, 8.6289239e-01,
       8.0696636e-01, 8.0320734e-01], dtype=float32)

In [30]:
new_values = np.zeros(train_raw_CI['vars'].shape, dtype=train_raw_CI['vars'].dtype) # keep float32; np.zeros defaults to float64

In [31]:
new_values.shape

(47177728, 224)

In [32]:
train_raw['var_names'].values[94:]

array(['PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ',
       'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ',
       'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ', 'PHQ',
       'PHQ', 'PHQ', 'PHQ', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND',
       'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND',
       'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND',
       'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND',
       'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND',
       'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND', 'TPHYSTND',
       'TPHYSTND', 'TPHYSTND', 'FSNT', 'FSNS', 'FLNT', 'FLNS', 'PRECT',
       'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt',
       'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt',
       'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt',
       'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt', 'PHQt-dt',
       'PHQt-dt', 'PHQt-dt'

In [33]:
n_batches = train_gen_CI.n_samples // N_batch # 5759 batches: n_samples is an exact multiple of N_batch here
for ibatch in range(n_batches):
    if ibatch % 10 == 0:
        print('progress=', '%2.2f' % (100*ibatch/n_batches), '%', '               ', end='\r')
    rows = slice(ibatch*N_batch, (ibatch+1)*N_batch)
    # Un-normalize the climate-invariant inputs back to physical units
    train_gen_CI_pu = (train_gen_CI[ibatch][0]*train_gen_CI.input_transform.div +
                       train_gen_CI.input_transform.sub)
    new_values[rows, :] = np.concatenate(
        (train_gen_CI_pu[:, :60],                  # RH and BMSE profiles
         train_raw_CI['vars'][rows, 60:90].values, # VBP profiles, unchanged
         train_gen_CI_pu[:, 60:],                  # PS, SOLIN, SHFLX, LHF_nsDELQ
         train_raw_CI['vars'][rows, 94:].values),  # outputs, unchanged
        axis=1
    )

progress= 99.84 %                
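The column bookkeeping inside this loop is easy to get wrong, so here is a toy version with the same layout, using the indices from the `var_names` printout: 60 rescaled profile columns, 30 untouched `VBP` columns, the 4 rescaled scalars, then the 130 output columns. The row count is shrunk for illustration, and `assemble_row_block` is a hypothetical helper.

```python
import numpy as np

N_RAW, N_CI = 224, 64  # columns in the raw file and in the CI generator inputs

def assemble_row_block(ci_inputs, raw_block):
    """Combine rescaled inputs (n, 64) with untouched raw columns (n, 224)."""
    assert ci_inputs.shape[1] == N_CI and raw_block.shape[1] == N_RAW
    return np.concatenate(
        (ci_inputs[:, :60],    # RH and BMSE profiles replace QBP and TBP
         raw_block[:, 60:90],  # VBP profiles are kept as-is
         ci_inputs[:, 60:],    # PS, SOLIN, SHFLX, LHF_nsDELQ replace PS..LHFLX
         raw_block[:, 94:]),   # outputs (including t-dt outputs) are kept as-is
        axis=1)

# Toy check: label raw columns with +index and CI columns with -index
raw_block = np.tile(np.arange(N_RAW, dtype=float), (4, 1))
ci_inputs = -np.tile(np.arange(N_CI, dtype=float), (4, 1))
out = assemble_row_block(ci_inputs, raw_block)
print(out.shape)  # (4, 224)
```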

In [34]:
new_values.shape

(47177728, 224)

In [35]:
train_raw_CI['vars'].values = new_values

### Save new training dataset

In [36]:
path_save_dir = '/DFS-L/DATA/pritchard/tbeucler/SPCAM/SPCAM_PHYS/'

In [37]:
train_raw_CI.to_netcdf(path_save_dir+'2022_01_13_TRAIN_For_Nando_CI_t-dt.nc',mode='w')

In [38]:
train_raw_CI['var_names'][90:95].values

array(['PS', 'SOLIN', 'SHFLX', 'LHF_nsDELQ', 'PHQ'], dtype=object)

## Create new normalization file

In [39]:
norm_RH_dataset = xr.open_dataset(path_norm_RH)
norm_BMSE_dataset = xr.open_dataset(path_norm_BMSE)
norm_LHF_nsDELQ_dataset = xr.open_dataset(path_norm_LHF_nsDELQ) 

In [40]:
norm_dataset = xr.open_dataset(path_input_norm)

In [41]:
new_norm_dataset = norm_dataset.copy()

### Coordinates

In [42]:
var_names_full = norm_dataset['var_names'].values
var_names_full_single = norm_dataset['var_names_single'].values

In [43]:
var_names_full = np.concatenate((var_names_full, ['RH']*30, ['BMSE']*30, ['LHF_nsDELQ']))
var_names_full_single = np.concatenate((var_names_full_single, ['RH', 'BMSE', 'LHF_nsDELQ']))

In [44]:
var_names_full.shape

(648,)

In [45]:
var_names_full_single.shape

(39,)

In [46]:
new_coor = {}
new_coor['var_names'] = var_names_full
new_coor['var_names_single'] = var_names_full_single

### Data

#### Full profiles

In [47]:
KEY = ['mean','std','min','max']

In [48]:
norm_data = {}

In [49]:
for key in KEY:
    norm_data[key] = norm_dataset[key].values

In [50]:
norm_RH_dataset['var_names'][:30].values

array(['RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH',
       'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH',
       'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH', 'RH'], dtype=object)

In [51]:
for key in KEY:
    norm_data[key] = np.append(norm_data[key],norm_RH_dataset[key][:30].values)

In [52]:
norm_BMSE_dataset['var_names'][30:60].values

array(['BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE'], dtype=object)

In [53]:
for key in KEY:
    norm_data[key] = np.append(norm_data[key],norm_BMSE_dataset[key][30:60].values)

In [54]:
norm_LHF_nsDELQ_dataset['var_names'][93].values

array('LHF_nsDELQ', dtype='<U10')

In [55]:
for key in KEY:
    norm_data[key] = np.append(norm_data[key],norm_LHF_nsDELQ_dataset[key][93].values)

#### One std per variable

In [56]:
key0 = 'std_by_var'

In [57]:
norm_data[key0] = norm_dataset[key0].values

In [58]:
norm_dataset[key0].values.shape

(36,)

In [59]:
norm_RH_dataset['var_names_single'][0].values

array('RH', dtype='<U2')

In [60]:
norm_data[key0] = np.append(norm_data[key0],
                            norm_RH_dataset[key0][0].values)

In [61]:
norm_BMSE_dataset['var_names_single'][1].values

array('BMSE', dtype='<U4')

In [62]:
norm_data[key0] = np.append(norm_data[key0],
                            norm_BMSE_dataset[key0][1].values)

In [63]:
norm_LHF_nsDELQ_dataset['var_names_single'][6].values

array('LHF_nsDELQ', dtype='<U10')

In [64]:
norm_data[key0] = np.append(norm_data[key0],
                            norm_LHF_nsDELQ_dataset[key0][6].values)
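The positional indices used above (`[0]` for `RH`, `[1]` for `BMSE`, `[6]` for `LHF_nsDELQ`, and `[93]` in the full profiles) are validated by printing the labels first. Selecting by label makes that manual check unnecessary. The sketch below builds a toy dataset with the same dimension names and selects `std_by_var` with a boolean mask on `var_names_single`. It assumes `var_names_single` is stored as a coordinate, as in the `new_norm` construction of this notebook; the toy values and the helper `std_for` are hypothetical.

```python
import numpy as np
import xarray as xr

# Toy normalization dataset with the same layout as the files above (values made up)
toy_norm = xr.Dataset(
    data_vars={'std_by_var': (['var_names_single'], np.array([0.39, 2.1, 5.0]))},
    coords={'var_names_single': ['RH', 'BMSE', 'LHF_nsDELQ']},
)

def std_for(norm_ds, name):
    """Select std_by_var by label instead of by position (hypothetical helper)."""
    mask = norm_ds['var_names_single'].values == name
    values = norm_ds['std_by_var'].values[mask]
    if values.size != 1:
        raise KeyError(f'{name!r} found {values.size} times in var_names_single')
    return values[0]

print(std_for(toy_norm, 'BMSE'))  # 2.1
```

A boolean mask rather than `.sel` also works for the full-profile `var_names`, where each label repeats once per level.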

In [65]:
for key in norm_data.keys():
    print(key+str(norm_data[key].shape))

mean(648,)
std(648,)
min(648,)
max(648,)
std_by_var(39,)


In [66]:
norm_data_dict = {}

In [67]:
for key in KEY:
    norm_data_dict[key] = (['var_names'],norm_data[key])
norm_data_dict[key0] = (['var_names_single'],norm_data[key0])

### Combine coordinates and data into a new xarray dataset

In [68]:
new_norm = xr.Dataset(
    data_vars = norm_data_dict,
    coords = new_coor
)

### Check that new normalization file was created correctly

#### Full profiles

In [69]:
new_norm['var_names'][-31:-1].values

array(['BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE'], dtype=object)

In [70]:
new_norm['min'][-31:-1].values

array([-1.31116964e+01, -1.10533083e+01, -9.24844476e+00, -7.55641638e+00,
       -6.33420238e+00, -5.38641029e+00, -4.73032847e+00, -4.26705961e+00,
       -3.90010160e+00, -3.51368711e+00, -3.11597357e+00, -2.67607478e+00,
       -2.22679080e+00, -1.75694275e+00, -1.44067605e+00, -1.30229558e+00,
       -1.06961250e+00, -9.08340034e-01, -7.54189932e-01, -5.73731456e-01,
       -4.37089447e-01, -3.42989613e-01, -2.62232670e-01, -2.35843909e-01,
       -2.21079333e-01, -2.12816024e-01, -1.93379778e-01, -1.65845811e-01,
       -1.37980631e-01, -4.34919917e-04])

In [71]:
norm_BMSE_dataset['var_names'][30:60].values

array(['BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE',
       'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE', 'BMSE'], dtype=object)

In [72]:
norm_BMSE_dataset['min'][30:60].values

array([-1.31116964e+01, -1.10533083e+01, -9.24844476e+00, -7.55641638e+00,
       -6.33420238e+00, -5.38641029e+00, -4.73032847e+00, -4.26705961e+00,
       -3.90010160e+00, -3.51368711e+00, -3.11597357e+00, -2.67607478e+00,
       -2.22679080e+00, -1.75694275e+00, -1.44067605e+00, -1.30229558e+00,
       -1.06961250e+00, -9.08340034e-01, -7.54189932e-01, -5.73731456e-01,
       -4.37089447e-01, -3.42989613e-01, -2.62232670e-01, -2.35843909e-01,
       -2.21079333e-01, -2.12816024e-01, -1.93379778e-01, -1.65845811e-01,
       -1.37980631e-01, -4.34919917e-04])

#### One std per variable

In [73]:
new_norm['var_names_single'][-3].values

array('RH', dtype='<U2')

In [74]:
new_norm['std_by_var'][-3].values

array(0.39380189)

In [75]:
norm_RH_dataset['var_names_single'][0].values

array('RH', dtype='<U2')

In [76]:
norm_RH_dataset['std_by_var'][0].values

array(0.39380189)

## Save new norm file

In [77]:
new_norm

In [78]:
norm_dataset

In [79]:
new_norm.to_netcdf(path_save_dir+'2022_01_13_NORM_For_Nando_CI_t-dt.nc',mode='w')

# Check that training is now stable

## Climate invariant, without the [t-dt] outputs as inputs

In [None]:
path_save_dir = '/DFS-L/DATA/pritchard/tbeucler/SPCAM/SPCAM_PHYS/Aqua_0K_ClimInv_withVBP/'

In [None]:
path_train = '2021_12_22_TRAIN_For_Nando_CI.nc'
path_newnorm = '2021_12_22_NORM_For_Nando_CI.nc'

In [None]:
test_train = xr.open_dataset(path_save_dir+path_train)

In [None]:
test_train['var_names'][90:95]

In [None]:
in_vars = ['RH','BMSE','PS', 'SOLIN', 'SHFLX', 'LHF_nsDELQ']
out_vars = ['PHQ','TPHYSTND','FSNT','FSNS','FLNT','FLNS','PRECT']

In [None]:
train_gen_Nando = DataGeneratorCI(
    data_fn = path_save_dir+path_train,
    input_vars = in_vars,
    output_vars = out_vars,
    norm_fn = path_save_dir+path_newnorm,
    input_transform = ('mean', 'maxrs'),
    output_transform = scale_dict,
    batch_size=N_batch
)

In [None]:
train_gen_Nando[0][0]

In [None]:
inp = Input(shape=(64,)) # inputs after the RH, BMSE, and LHF_nsDELQ rescalings
densout = Dense(128, activation='linear')(inp)
densout = LeakyReLU(alpha=0.3)(densout)
for i in range(6):
    densout = Dense(128, activation='linear')(densout)
    densout = LeakyReLU(alpha=0.3)(densout)
dense_out = Dense(65, activation='linear')(densout)
model = tf.keras.models.Model(inp, dense_out)

In [None]:
model.summary()

In [None]:
model.compile(tf.keras.optimizers.Adam(), loss=mse)

In [None]:
# Where to save the model
path_HDF5 = '/DFS-L/DATA/pritchard/tbeucler/SPCAM/HDF5_DATA/'
save_name = '2021_12_22_Test_Nando'

In [None]:
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
mcp_save_pos = ModelCheckpoint(path_HDF5+save_name+'.hdf5',save_best_only=True, monitor='val_loss', mode='min')

In [None]:
Nep = 20
# No validation generator is defined in this section, so val_loss is computed on the training data
model.fit(train_gen_Nando, epochs=Nep, validation_data=train_gen_Nando,
          callbacks=[earlyStopping, mcp_save_pos])

## Climate invariant, with the [t-dt] outputs as inputs

In [3]:
path_save_dir = '/DFS-L/DATA/pritchard/tbeucler/SPCAM/SPCAM_PHYS/'

In [20]:
path_train = '2022_01_13_TRAIN_For_Nando_CI_t-dt.nc'
path_valid = '2022_01_13_VALID_For_Nando_CI_t-dt.nc'
path_newnorm = '2022_01_13_NORM_For_Nando_CI_t-dt.nc'

In [21]:
in_vars = ['RH','BMSE','PS', 'SOLIN', 'SHFLX', 'LHF_nsDELQ',
           'PHQt-dt','TPHYSTNDt-dt','FSNTt-dt','FSNSt-dt',
           'FLNTt-dt','FLNSt-dt','PRECTt-dt']
out_vars = ['PHQ','TPHYSTND','FSNT','FSNS','FLNT','FLNS','PRECT']

In [22]:
N_batch = 8192

In [23]:
train_gen_Nando = DataGeneratorCI(
    data_fn = path_save_dir+path_train,
    input_vars = in_vars,
    output_vars = out_vars,
    norm_fn = path_save_dir+path_newnorm,
    input_transform = ('mean', 'maxrs'),
    output_transform = scale_dict,
    batch_size=N_batch
)

In [24]:
valid_gen_Nando = DataGeneratorCI(
    data_fn = path_save_dir+path_valid,
    input_vars = in_vars,
    output_vars = out_vars,
    norm_fn = path_save_dir+path_newnorm,
    input_transform = ('mean', 'maxrs'),
    output_transform = scale_dict,
    batch_size=N_batch
)

In [13]:
train_gen_Nando[0][0].shape

(8192, 129)

In [12]:
train_gen_Nando[0][1]

array([[ 0.00000000e+00,  0.00000000e+00, -4.36275468e-06, ...,
         1.68086456e+02,  1.49382973e+01,  9.23653030e+01],
       [ 0.00000000e+00,  0.00000000e+00, -3.89850584e-06, ...,
         1.67319870e+02,  1.46057901e+01,  9.23800430e+01],
       [ 0.00000000e+00,  0.00000000e+00, -3.86164402e-06, ...,
         1.66681015e+02,  1.45049973e+01,  8.66826935e+01],
       ...,
       [ 0.00000000e+00,  0.00000000e+00, -9.70067777e-05, ...,
         2.12936111e+02,  6.35897522e+01,  1.22346325e+01],
       [ 0.00000000e+00,  0.00000000e+00, -9.47348162e-05, ...,
         2.18299789e+02,  9.45541534e+01,  6.73200469e-04],
       [ 0.00000000e+00,  0.00000000e+00, -9.27014407e-05, ...,
         2.19012955e+02,  1.04582176e+02,  0.00000000e+00]], dtype=float32)

In [14]:
inp = Input(shape=(129,)) # rescaled inputs plus the [t-dt] outputs
densout = Dense(128, activation='linear')(inp)
densout = LeakyReLU(alpha=0.3)(densout)
for i in range(6):
    densout = Dense(128, activation='linear')(densout)
    densout = LeakyReLU(alpha=0.3)(densout)
dense_out = Dense(65, activation='linear')(densout)
model = tf.keras.models.Model(inp, dense_out)

In [15]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 129)]             0         
_________________________________________________________________
dense (Dense)                (None, 128)               16640     
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               16512     
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512     
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 128)               0     
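The parameter counts in this summary follow from $(n_\mathrm{in} + 1) \times n_\mathrm{out}$ per `Dense` layer (weights plus biases). The sketch below totals the architecture built above: one input `Dense`, six more `Dense` layers in the loop, and a 65-unit output layer (`LeakyReLU` has no trainable weights). It covers both input sizes used in this notebook; the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out

def mlp_params(n_inputs, n_hidden=128, n_hidden_layers=7, n_outputs=65):
    """Total trainable parameters of the MLP defined in this notebook."""
    total = dense_params(n_inputs, n_hidden)                           # first Dense
    total += (n_hidden_layers - 1) * dense_params(n_hidden, n_hidden)  # Dense layers in the loop
    total += dense_params(n_hidden, n_outputs)                         # output layer
    return total

print(dense_params(129, 128), dense_params(128, 128))  # 16640 16512
print(mlp_params(64), mlp_params(129))                  # 115777 124097
```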

In [16]:
model.compile(tf.keras.optimizers.Adam(), loss=mse)

In [17]:
# Where to save the model
path_HDF5 = '/DFS-L/DATA/pritchard/tbeucler/SPCAM/HDF5_DATA/'
save_name = '2022_01_14_Test_Nando_CI_t-dt'

In [25]:
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
mcp_save_pos = ModelCheckpoint(path_HDF5+save_name+'.hdf5',save_best_only=True, monitor='val_loss', mode='min')

In [19]:
Nep = 20
# val_loss here is computed on the training data; the next run uses valid_gen_Nando instead
model.fit(train_gen_Nando, epochs=Nep, validation_data=train_gen_Nando,
          callbacks=[earlyStopping, mcp_save_pos])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fc4a02d3550>

In [None]:
Nep = 20
model.fit(train_gen_Nando, epochs=Nep, validation_data=valid_gen_Nando,
          callbacks=[earlyStopping, mcp_save_pos])

Epoch 1/20
1243/5759 [=====>........................] - ETA: 4:10 - loss: 402.7415