# Train the StarNet Model

This notebook walks through the steps needed to train a StarNet model.
- Required Python packages: `numpy`, `h5py`, `keras`, `sklearn`
- Required data files: `training_data.h5`, `mean_and_std.npy`

Note: We use TensorFlow as the Keras backend. For error propagation, the model must be trained with Theano as the backend, although training with Theano has been found to be much slower unless Theano is set up properly.

In [None]:
import numpy as np
import h5py

from keras.models import Sequential
from keras.layers import Dense, InputLayer, Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import StratifiedKFold

datadir = '/home/ubuntu/starnet_data/'

**Load data for normalizing labels**

In [None]:
mean_and_std = np.load(datadir + 'mean_and_std.npy')
mean_labels = mean_and_std[0]
std_labels = mean_and_std[1]
num_labels = mean_and_std.shape[1]
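
The loading code above implies that `mean_and_std.npy` holds an array of shape `(2, num_labels)`: row 0 contains the per-label means and row 1 the per-label standard deviations. The notebook does not show how the file was produced, so the following is only a minimal sketch, assuming the statistics were computed from the raw training labels. The label values and the name `mean_and_std_demo` are made up for illustration.

```python
import numpy as np

# Hypothetical raw labels, one row per spectrum: [TEFF, LOGG, FE_H]
raw_labels = np.array([
    [4500.0, 2.0, -0.3],
    [5000.0, 3.0,  0.0],
    [5500.0, 4.0,  0.3],
])

# Row 0: per-label means; row 1: per-label standard deviations
mean_and_std_demo = np.vstack((raw_labels.mean(axis=0), raw_labels.std(axis=0)))

print(mean_and_std_demo.shape)  # two statistics for each of the three labels
```

An array built this way could be written with `np.save` and would then be compatible with the indexing used in the cell above.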

**Normalize labels**

Define a function that normalizes the labels to approximately zero mean and unit variance.

NOTE: This is necessary to put the output labels on a similar scale so that the model trains properly. The process is reversed in the test stage to give the output labels their proper units.

In [None]:
def normalize(lb):
    return (lb-mean_labels) / std_labels
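
To make the normalization and its reversal concrete, here is a small sketch of the round trip. The statistics in `demo_mean` and `demo_std` are hypothetical (the real values come from `mean_and_std.npy`), and `denormalize_demo` is an illustrative name for the inverse step, not a function defined in this notebook.

```python
import numpy as np

# Hypothetical statistics for [TEFF, LOGG, FE_H]
demo_mean = np.array([5000.0, 3.0, 0.0])
demo_std = np.array([500.0, 1.0, 0.25])

def normalize_demo(lb):
    # same arithmetic as normalize(), using the demo statistics
    return (lb - demo_mean) / demo_std

def denormalize_demo(lb_norm):
    # inverse transform: recover labels in their original units
    return lb_norm * demo_std + demo_mean

star = np.array([[5500.0, 2.0, -0.5]])
scaled = normalize_demo(star)        # 1.0, -1.0 and -2.0 for the three labels
restored = denormalize_demo(scaled)  # back to 5500.0, 2.0 and -0.5

print(scaled)
print(restored)
```

After normalization each label is expressed in units of its own standard deviation, so a temperature in the thousands and a metallicity of order one contribute comparably to the loss.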

**Obtain reference set**

In [None]:
training_set = 'training_data.h5'

In [None]:
with h5py.File(datadir + training_set, 'r') as F:
    spectra = F['spectra'][:]
    labels = np.column_stack((F['TEFF'][:],F['LOGG'][:],F['FE_H'][:]))
    labels = normalize(labels)
print('Reference set includes ' + str(len(spectra)) + ' individual visit spectra.')

In [None]:
# define the number of wavelength bins, i.e. flux values
num_flux = spectra.shape[1]
print('Each spectrum contains ' + str(num_flux) + ' wavelength bins')

**Randomize order and separate into training and cross-validation sets**

We shuffle the data to avoid local effects, then split the reference set into training and cross-validation sets.

In [None]:
# number of spectra used in training
num_train = 41000
# SF: use the whole set to do k-fold validation
# num_train = len(spectra)

reference_data = np.column_stack((spectra, labels))
np.random.shuffle(reference_data)

train_spectra = reference_data[0:num_train, 0:num_flux]
train_spectra = train_spectra.reshape(train_spectra.shape[0], train_spectra.shape[1], 1)
train_labels = reference_data[0:num_train, num_flux:]

cv_spectra = reference_data[num_train:, 0:num_flux]
cv_spectra = cv_spectra.reshape(cv_spectra.shape[0], cv_spectra.shape[1], 1)
cv_labels = reference_data[num_train:, num_flux:]

print('Training set includes ' + str(len(train_spectra)) + ' spectra and the cross-validation set includes ' + str(len(cv_spectra))+' spectra')
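
An alternative way to express the same shuffle and split is to permute an index array rather than stacking spectra and labels into one large array. The sketch below uses small random stand-in arrays and a fixed seed; all names prefixed with `demo_` are hypothetical and do not appear elsewhere in the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the split is repeatable

# Small stand-in arrays; in the notebook these would be spectra and labels
demo_spectra = rng.rand(10, 5)   # 10 spectra, 5 flux bins each
demo_labels = rng.rand(10, 3)    # 3 labels per spectrum
demo_num_train = 8

# One shared permutation keeps each spectrum paired with its labels
order = rng.permutation(len(demo_spectra))
train_idx, cv_idx = order[:demo_num_train], order[demo_num_train:]

# Add the trailing channel axis that Conv1D expects
demo_train_spectra = demo_spectra[train_idx][..., np.newaxis]
demo_cv_spectra = demo_spectra[cv_idx][..., np.newaxis]
demo_train_labels = demo_labels[train_idx]
demo_cv_labels = demo_labels[cv_idx]

print(demo_train_spectra.shape, demo_cv_spectra.shape)
```

Indexing with a permutation avoids copying the full data set into a combined array before shuffling, which may matter when the spectra are large.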

**Build the StarNet model architecture**

The StarNet architecture is built with:
- input layer
- 2 convolutional layers
- 1 maxpooling layer followed by flattening for the fully connected layer
- 2 fully connected layers
- output layer

First, let's define some model variables.

In [None]:
# activation function used after every layer except the output layer
activation = 'relu'

# model weight initializer
initializer = 'he_normal'

# shape of input spectra that is fed into the input layer
input_shape = (None, num_flux, 1)

# number of filters used in the convolutional layers
num_filters = [4,16]

# length of the filters in the convolutional layers
filter_length = 8

# length of the maxpooling window 
pool_length = 4

# number of nodes in each of the hidden fully connected layers
num_hidden = [256,128]

# number of spectra fed into model at once during training
batch_size = 64

# maximum number of iterations (epochs) for model training
max_epochs = 30

# initial learning rate for optimization algorithm
lr = 0.0007
    
# exponential decay rate for the 1st moment estimates for optimization algorithm
beta_1 = 0.9

# exponential decay rate for the 2nd moment estimates for optimization algorithm
beta_2 = 0.999

# a small constant for numerical stability for optimization algorithm
optimizer_epsilon = 1e-08

In [None]:
model = Sequential([
    InputLayer(batch_input_shape=input_shape),
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[0], kernel_size=filter_length),
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[1], kernel_size=filter_length),
    MaxPooling1D(pool_size=pool_length),
    Flatten(),
    Dense(units=num_hidden[0], kernel_initializer=initializer, activation=activation),
    Dense(units=num_hidden[1], kernel_initializer=initializer, activation=activation),
    Dense(units=num_labels, activation="linear", input_dim=num_hidden[1]),
])
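
To see how the hyperparameters translate into layer sizes, the arithmetic can be done by hand. The sketch below assumes a hypothetical `demo_num_flux` of 7000 bins (the real value is `spectra.shape[1]`) and three output labels; the other numbers mirror the settings defined above.

```python
# Hypothetical number of flux bins; the real value is spectra.shape[1]
demo_num_flux = 7000
demo_filters = [4, 16]
demo_filter_length = 8
demo_pool_length = 4
demo_hidden = [256, 128]
demo_num_labels = 3

# padding="same" with stride 1 keeps the sequence length unchanged
conv_len = demo_num_flux
# MaxPooling1D strides by pool_size by default and drops any remainder
pooled_len = conv_len // demo_pool_length
flat_size = pooled_len * demo_filters[1]

# Parameter counts: weights plus one bias per filter or output unit
conv1 = demo_filter_length * 1 * demo_filters[0] + demo_filters[0]
conv2 = demo_filter_length * demo_filters[0] * demo_filters[1] + demo_filters[1]
dense1 = flat_size * demo_hidden[0] + demo_hidden[0]
dense2 = demo_hidden[0] * demo_hidden[1] + demo_hidden[1]
out = demo_hidden[1] * demo_num_labels + demo_num_labels
total = conv1 + conv2 + dense1 + dense2 + out

print('flattened size:', flat_size)
print('total parameters:', total)
```

Under these assumptions almost all of the parameters sit in the first fully connected layer, because it connects every flattened feature to 256 nodes. Calling `model.summary()` on the real model reports the actual counts.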

**More model techniques**
* The `Adam` optimizer is the gradient descent algorithm used to minimize the loss function
* `EarlyStopping` evaluates the model on the cross-validation set after every iteration and stops training if the cv loss has not decreased by at least `min_delta` after `patience` iterations
* `ReduceLROnPlateau` is a form of learning rate decay: the learning rate is multiplied by `factor` if the training loss has not decreased by at least `epsilon` after `patience` iterations, unless the learning rate has already reached `min_lr`
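
The interaction between the two callbacks can be illustrated with a short simulation. This is a simplified re-implementation for illustration only, not the Keras source, and the exact bookkeeping in Keras may differ between versions. The function `simulate_callbacks` and the loss sequences are hypothetical.

```python
def simulate_callbacks(val_losses, train_losses, lr,
                       stop_min_delta, stop_patience,
                       lr_factor, lr_epsilon, lr_patience, lr_min):
    """Return (last epoch run, final learning rate) for the given loss histories."""
    best_val, stop_wait = float('inf'), 0
    best_train, lr_wait = float('inf'), 0
    for epoch, (val, train) in enumerate(zip(val_losses, train_losses), start=1):
        # Learning rate decay: watches the training loss
        if train < best_train - lr_epsilon:
            best_train, lr_wait = train, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience and lr > lr_min:
                lr = max(lr * lr_factor, lr_min)
                lr_wait = 0
        # Early stopping: watches the cross-validation loss
        if val < best_val - stop_min_delta:
            best_val, stop_wait = val, 0
        else:
            stop_wait += 1
            if stop_wait >= stop_patience:
                return epoch, lr
    return len(val_losses), lr

# Made-up loss histories that plateau after the third epoch
demo_val = [0.50, 0.30, 0.25, 0.25, 0.25, 0.25, 0.25, 0.20]
demo_train = [0.50, 0.30, 0.20, 0.20, 0.20, 0.20, 0.20, 0.10]

stopped_epoch, final_lr = simulate_callbacks(
    demo_val, demo_train, lr=0.0007,
    stop_min_delta=0.0001, stop_patience=4,
    lr_factor=0.5, lr_epsilon=0.0009, lr_patience=2, lr_min=0.00008)

print(stopped_epoch, final_lr)
```

With these made-up histories the learning rate is halved twice during the plateau and training stops at epoch 7, before the improvement in the eighth epoch is ever seen. This is the trade-off a small `patience` makes.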

In [None]:
# Default callback parameters
early_stopping_min_delta = 0.0001
early_stopping_patience = 4
reduce_lr_factor = 0.5
reduce_lr_epsilon = 0.0009
reduce_lr_patience = 2
reduce_lr_min = 0.00008

# loss function to minimize
loss_function = 'mean_squared_error'

# metrics reported during training; 'mae' is the mean absolute error
# ('accuracy' is a classification metric and is not informative for this regression task)
metrics = ['accuracy', 'mae']

In [None]:
optimizer = Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=optimizer_epsilon, decay=0.0)

early_stopping = EarlyStopping(monitor='val_loss', min_delta=early_stopping_min_delta,
                               patience=early_stopping_patience, verbose=2, mode='min')

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=reduce_lr_factor, epsilon=reduce_lr_epsilon,
                              patience=reduce_lr_patience, min_lr=reduce_lr_min, mode='min', verbose=2)

**Compile model**

In [None]:
model.compile(optimizer=optimizer, loss=loss_function, metrics=metrics)
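
For reference, the `'mean_squared_error'` loss and the `'mae'` metric passed to `compile` correspond to the following quantities, shown here with NumPy on made-up normalized labels and predictions. The arrays `y_true` and `y_pred` are hypothetical.

```python
import numpy as np

# Made-up normalized labels and predictions for two spectra
y_true = np.array([[0.0, 1.0, -1.0], [2.0, 0.0, 0.5]])
y_pred = np.array([[0.5, 1.0, -2.0], [1.0, 0.0, 0.5]])

mse = np.mean((y_pred - y_true) ** 2)    # quantity minimized as the loss
mae = np.mean(np.abs(y_pred - y_true))   # reported as the 'mae' metric

print(mse, mae)
```

Both values are in normalized label units, since the model is trained on normalized labels.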

**Train model**

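Optionally apply a 3-fold cross-validation test harness. The harness in the next cell is commented out because it is still buggy; the cell after it trains the model on the fixed training and cross-validation split.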

In [None]:
# fix random seed for reproducibility
#seed = 7
#np.random.seed(seed)

# this stuff is still a bit buggy
#kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
#cv_scores = []
#for train, test in kfold.split(train_spectra, train_labels):
#    model.fit(train_spectra[train], train_labels[train], epochs=max_epochs, 
#              batch_size=batch_size, callbacks=[reduce_lr,early_stopping],
#              verbose=2)
#    scores = model.evaluate(train_spectra[test], train_labels[test], verbose=2)
#    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
#    cv_scores.append(scores)
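
`StratifiedKFold` is designed for discrete class labels, which may be one source of trouble with the continuous labels used here. As a hedged alternative, the sketch below swaps in plain K-fold splitting, written with NumPy only. The helper `kfold_indices` is hypothetical and is not part of scikit-learn or this notebook; scikit-learn's `KFold` provides comparable splitting.

```python
import numpy as np

def kfold_indices(num_samples, n_splits, seed):
    """Hypothetical helper: yield (train_idx, test_idx) pairs for plain K-fold."""
    order = np.random.RandomState(seed).permutation(num_samples)
    folds = np.array_split(order, n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, test_idx

fold_sizes = []
for train_idx, test_idx in kfold_indices(num_samples=10, n_splits=3, seed=7):
    # In the notebook, a freshly built and compiled model would be fit on
    # train_spectra[train_idx] and evaluated on train_spectra[test_idx] here.
    fold_sizes.append((len(train_idx), len(test_idx)))

print(fold_sizes)
```

Rebuilding the model inside the loop matters: if the same `model` object is reused, weights learned on one fold carry over to the next and the held-out scores are no longer independent.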

In [None]:
model.fit(train_spectra, train_labels, validation_data=(cv_spectra, cv_labels),
          epochs=max_epochs, batch_size=batch_size, verbose=2,
          callbacks=[reduce_lr,early_stopping])
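
As a quick sanity check on training cost, the number of weight updates follows directly from the settings above. The `demo_` names below simply restate `num_train`, `batch_size` and `max_epochs` so that the snippet runs on its own.

```python
import math

demo_num_train = 41000   # training spectra, as set earlier in the notebook
demo_batch_size = 64
demo_max_epochs = 30

# the last batch of each epoch is smaller when the division is not exact
updates_per_epoch = math.ceil(demo_num_train / demo_batch_size)
last_batch_size = demo_num_train - (updates_per_epoch - 1) * demo_batch_size
max_updates = updates_per_epoch * demo_max_epochs

print(updates_per_epoch, last_batch_size, max_updates)
```

The total is an upper bound, since `EarlyStopping` may end training before `max_epochs` is reached.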

**Save model**

In [None]:
starnet_model = 'starnet_cnn.h5'
model.save(datadir + starnet_model)