# Train Model

This notebook walks through the steps to train a StarNet model.

IMPORTANT: If you do not have access to sufficient computing power (e.g. virtual machines, GPUs), use this notebook instead of $4\_Train\_Model\_Keras\_1.ipynb$. The only difference between the two is that this notebook uses fewer training examples to reduce the computational load and requires more training iterations to compensate. Note that using fewer training examples will likely result in less precise predictions during the test phase.

## required packages:
- numpy
- h5py
- random
- Keras 1
  - for Keras 2 users, see $3\_Train\_Model\_Keras\_2.ipynb$
- Tensorflow or Theano (backend for Keras)

NOTE: For error propagation, the model must be trained with Theano as the backend. Training with Theano has been found to be much slower unless Theano is set up properly.
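
One common way to request the Theano backend is through the `KERAS_BACKEND` environment variable. The sketch below assumes Keras reads that variable the first time it is imported, so the assignment has to run before any `import keras` line; the backend can also be set in the Keras configuration file, which is not shown here.

```python
import os

# Assumption: Keras checks KERAS_BACKEND on first import, so set it
# before the first `import keras` statement in the session.
os.environ["KERAS_BACKEND"] = "theano"

print("Requested Keras backend:", os.environ["KERAS_BACKEND"])
```

After the import, the "Using ... backend." message printed by Keras shows which backend was actually picked up.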

## required data files:
- training_data.h5
  - can be created in $2\_Preprocessing\_of\_Training\_Data.ipynb$ or downloaded in $1\_Download\_Data.ipynb$
- mean_and_std.npy
  - can be created in $3\_Preprocessing\_of\_Test\_Data.ipynb$ or downloaded in $1\_Download\_Data.ipynb$

In [1]:
import numpy as np
import random
import h5py

from keras.models import Sequential
from keras.layers import Dense, InputLayer, Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

Using TensorFlow backend.


## Load data for normalizing labels

In [2]:
mean_and_std = np.load('mean_and_std.npy')
mean_labels = mean_and_std[0]
std_labels = mean_and_std[1]
num_labels = mean_and_std.shape[1]

## Define function to normalize labels to approximately have a mean of zero and unit variance
NOTE: This is necessary to put the output labels on a similar scale so that the model trains properly. The process is reversed in the test stage to give the output labels their proper units.

In [3]:
def normalize(lb):
    return (lb-mean_labels)/std_labels
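
As a small illustration of the normalization and of how it is reversed later, the sketch below uses made-up means and standard deviations for the three labels (the notebook loads the real ones from mean_and_std.npy). The `denormalize` helper is a name chosen for this example, not something defined elsewhere in the notebooks.

```python
import numpy as np

# Hypothetical statistics for TEFF, LOGG and FE_H, for illustration only
mean_labels = np.array([4500.0, 2.5, -0.2])
std_labels = np.array([500.0, 1.0, 0.3])

def normalize(lb):
    # Shift and scale each label column to roughly zero mean, unit variance
    return (lb - mean_labels) / std_labels

def denormalize(lb_norm):
    # Inverse of normalize: returns labels in their original units
    return lb_norm * std_labels + mean_labels

labels = np.array([[5000.0, 3.5, 0.1],
                   [4000.0, 1.5, -0.5]])

normalized = normalize(labels)
restored = denormalize(normalized)

print(normalized)
print(restored)
```

Each example row here sits one standard deviation above or below the mean, so the normalized values are close to +1 and -1, and `restored` matches the original labels.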

## Obtain reference set
### Default:
num_ref = 10000

In [4]:
savename = 'training_data.h5'

In [5]:
num_ref = 10000

In [6]:
with h5py.File(savename,"r") as F:
    spectra = F["spectra"][0:num_ref]
    labels = np.column_stack((F["TEFF"][0:num_ref],F["LOGG"][0:num_ref],F["FE_H"][0:num_ref]))
    # Normalize labels
    labels = normalize(labels)
print('Reference set includes '+str(len(spectra))+' individual visit spectra.')

Reference set includes 10000 individual visit spectra.


In [7]:
# define the number of wavelength bins (typically 7214)
num_fluxes = spectra.shape[1]
print('Each spectrum contains '+str(num_fluxes)+' wavelength bins')

Each spectrum contains 7214 wavelength bins


# Randomize order and separate into training and CV sets
The reference set is first shuffled so that the training examples are presented in random order.
It is then split into training and cross-validation sets.
### Default:
- $num\_train$ = 90% of reference set

In [8]:
num_train=int(0.9*num_ref)

In [9]:
reference_data = np.column_stack((spectra,labels))
np.random.shuffle(reference_data)

train_spectra = reference_data[0:num_train,0:num_fluxes]
# Reshape spectra for convolutional layers
train_spectra = train_spectra.reshape(train_spectra.shape[0], train_spectra.shape[1], 1)
train_labels = reference_data[0:num_train,num_fluxes:]

cv_spectra = reference_data[num_train:,0:num_fluxes]
cv_spectra = cv_spectra.reshape(cv_spectra.shape[0], cv_spectra.shape[1], 1)
cv_labels = reference_data[num_train:,num_fluxes:]

print('Training set includes '+str(len(train_spectra))+' spectra and the cross-validation set includes '+str(len(cv_spectra))+' spectra')

Training set includes 9000 spectra and the cross-validation set includes 1000 spectra
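
The same shuffle-and-split can also be written with a single index permutation instead of stacking spectra and labels into one array. This is an alternative sketch on toy data, not the cell above; the array sizes and the fixed seed are assumptions made so the example is small and repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable

# Toy stand-ins: 10 "spectra" with 5 bins each and 3 labels per spectrum
spectra = np.arange(50, dtype=float).reshape(10, 5)
labels = np.arange(30, dtype=float).reshape(10, 3)

num_ref = len(spectra)
num_train = int(0.9 * num_ref)

# One permutation applied to both arrays keeps each spectrum with its labels
order = rng.permutation(num_ref)
spectra, labels = spectra[order], labels[order]

# Add a trailing channel axis, as the convolutional layers expect
train_spectra = spectra[:num_train, :, np.newaxis]
cv_spectra = spectra[num_train:, :, np.newaxis]
train_labels, cv_labels = labels[:num_train], labels[num_train:]

print(train_spectra.shape, cv_spectra.shape)
```

Indexing with a permutation avoids building the combined `reference_data` array, which may matter when the spectra are large.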


## Define some model variables

In [10]:
# activation function used following every layer except the output layer
activation = 'relu'

# model weight initializer
initializer = 'he_normal'

# shape of input spectra that is fed into the input layer
input_shape = (None,num_fluxes,1)

# number of filters used in the convolutional layers
num_filters = [4,16]

# length of the filters in the convolutional layers
filter_length = 8

# length of the maxpooling window 
pool_length = 4

# number of nodes in each of the hidden fully connected layers
num_hidden_nodes = [256,128]

# number of spectra fed into model at once during training
batch_size = 64

# maximum number of iterations for model training
max_epochs = 45

# initial learning rate for optimization algorithm
lr = 0.0007
    
# exponential decay rate for the 1st moment estimates for optimization algorithm
beta_1 = 0.9

# exponential decay rate for the 2nd moment estimates for optimization algorithm
beta_2 = 0.999

# a small constant for numerical stability for optimization algorithm
optimizer_epsilon = 1e-08
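
To show where `lr`, `beta_1`, `beta_2` and `optimizer_epsilon` enter the optimization, here is a sketch of a single update step of the standard Adam algorithm in NumPy. The `adam_step` function is written for this illustration only; the Keras optimizer may arrange the same terms differently internally.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.0007, beta_1=0.9, beta_2=0.999, epsilon=1e-08):
    """One Adam update of parameters w at step t (t starts at 1)."""
    m = beta_1 * m + (1.0 - beta_1) * grad        # 1st moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2   # 2nd moment estimate
    m_hat = m / (1.0 - beta_1 ** t)               # bias-corrected 1st moment
    v_hat = v / (1.0 - beta_2 ** t)               # bias-corrected 2nd moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    return w, m, v

# Hypothetical parameters and gradients, for illustration only
w = np.array([1.0, -2.0])
grad = np.array([2.0, -0.5])
w, m, v = adam_step(w, grad, m=np.zeros(2), v=np.zeros(2), t=1)

print(w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of the gradient's magnitude.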

# Build Model Architecture:
- input layer
- 2 convolutional layers
- 1 maxpooling layer followed by flattening for the fully connected layer
- 2 fully connected layers
- output layer

In [11]:
model = Sequential([

    InputLayer(batch_input_shape=input_shape),
        
    Conv1D(init=initializer, activation=activation, border_mode="same", nb_filter=num_filters[0], filter_length=filter_length),

    Conv1D(init=initializer, activation=activation, border_mode="same", nb_filter=num_filters[1], filter_length=filter_length),

    MaxPooling1D(pool_length=pool_length),

    Flatten(),

    Dense(output_dim=num_hidden_nodes[0], init=initializer, activation=activation),
        
    Dense(output_dim=num_hidden_nodes[1], init=initializer, activation=activation),

    Dense(output_dim=num_labels, activation="linear", input_dim=num_hidden_nodes[1]),
])
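
As a rough check on the size of this architecture, the layer shapes and weight counts can be worked out by hand. The sketch below assumes 7214 wavelength bins, 3 labels, and a pooling stride equal to `pool_length` with no padding (believed to be the layer default); the helper names are chosen for this example. Calling `model.summary()` remains the authoritative check.

```python
def conv1d_params(filter_length, in_channels, num_filters):
    # weights per filter plus one bias per filter
    return filter_length * in_channels * num_filters + num_filters

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output node
    return n_in * n_out + n_out

num_fluxes = 7214   # assumed number of wavelength bins
num_labels = 3      # TEFF, LOGG, FE_H
num_filters = [4, 16]
filter_length = 8
pool_length = 4
num_hidden_nodes = [256, 128]

# "same" padding keeps the spectrum length through both conv layers;
# assumed pooling stride = pool_length with no padding
pooled_length = (num_fluxes - pool_length) // pool_length + 1
flattened = pooled_length * num_filters[1]

params = [
    conv1d_params(filter_length, 1, num_filters[0]),
    conv1d_params(filter_length, num_filters[0], num_filters[1]),
    dense_params(flattened, num_hidden_nodes[0]),
    dense_params(num_hidden_nodes[0], num_hidden_nodes[1]),
    dense_params(num_hidden_nodes[1], num_labels),
]

print(pooled_length, flattened, sum(params))
```

Under these assumptions the pooled length is 1803, the flattened vector has 28848 features, and the model has about 7.4 million weights, nearly all of them in the first fully connected layer.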

## More model techniques:
- The $Adam$ optimizer is the gradient descent algorithm used for minimizing the loss function
- $EarlyStopping$ uses the cross-validation set to test the model following every iteration and stops the training if the cv loss does not decrease by $min\_delta$ after $patience$ iterations
- $ReduceLROnPlateau$ is a form of learning rate decay where the learning rate is decreased by a factor of $factor$ if the training loss does not decrease by $epsilon$ after $patience$ iterations unless the learning rate has reached $min\_lr$
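
The logic of the two callbacks described above can be sketched in plain Python. This is a simplified illustration, not the Keras implementation: the function names are hypothetical, and details such as cooldown periods are left out.

```python
def early_stopping_epoch(val_losses, min_delta, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:   # improved by more than min_delta
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

def reduce_lr_history(losses, lr, factor, epsilon, patience, min_lr):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in losses:
        if loss < best - epsilon:     # improved by more than epsilon
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Hypothetical loss curves, for illustration only
stop_epoch = early_stopping_epoch([0.5, 0.4, 0.41, 0.42, 0.40, 0.405, 0.3],
                                  min_delta=0.0001, patience=4)
lr_history = reduce_lr_history([1.0, 0.5, 0.5, 0.5, 0.5, 0.5], lr=0.0007,
                               factor=0.5, epsilon=0.0009, patience=2,
                               min_lr=0.00008)
print(stop_epoch, lr_history)
```

In the first example training stops at epoch 6, one epoch before a better loss would have been reached, which shows the trade-off a small $patience$ makes between training time and the chance of missing a late improvement.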

In [12]:
early_stopping_min_delta = 0.0001
early_stopping_patience = 4
reduce_lr_factor = 0.5
reduce_lr_epsilon = 0.0009
reduce_lr_patience = 2
reduce_lr_min = 0.00008

In [13]:
optimizer = Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=optimizer_epsilon, decay=0.0)

early_stopping = EarlyStopping(monitor='val_loss', min_delta=early_stopping_min_delta,
                               patience=early_stopping_patience, verbose=2, mode='min')

# use the reduce_lr_factor defined above rather than a hard-coded value
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=reduce_lr_factor, epsilon=reduce_lr_epsilon,
                              patience=reduce_lr_patience, min_lr=reduce_lr_min, mode='min', verbose=2)

# Compile model
### Default loss function:
- loss_function = mean squared error

In [14]:
loss_function = 'mean_squared_error'

In [15]:
model.compile(optimizer=optimizer, loss=loss_function)
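
For reference, the mean squared error on normalized labels can be written out in NumPy as below. This is a sketch of the usual definition, with made-up label values; the function mirrors the name of the loss but is defined here only for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # average of the squared differences over the label axis
    return np.mean((y_pred - y_true) ** 2, axis=-1)

# Hypothetical normalized labels and predictions for two spectra
y_true = np.array([[0.0, 1.0, -1.0],
                   [0.5, 0.5, 0.5]])
y_pred = np.array([[0.0, 0.0, -1.0],
                   [0.5, 1.5, 2.5]])

per_sample = mean_squared_error(y_true, y_pred)
batch_loss = per_sample.mean()

print(per_sample, batch_loss)
```

Because the labels are normalized first, an error of one standard deviation contributes equally to the loss for each label, whatever its original units.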

# Train model

In [16]:
model.fit(train_spectra, train_labels, validation_data=(cv_spectra, cv_labels),
          nb_epoch=max_epochs, batch_size=batch_size, verbose=2,
          callbacks=[reduce_lr,early_stopping])

Train on 9000 samples, validate on 1000 samples
Epoch 1/45
19s - loss: 0.6099 - val_loss: 0.5748
Epoch 2/45
19s - loss: 0.2165 - val_loss: 0.0574
Epoch 3/45
19s - loss: 0.0498 - val_loss: 0.0270
Epoch 4/45
19s - loss: 0.0302 - val_loss: 0.0206
Epoch 5/45
19s - loss: 0.0335 - val_loss: 0.0155
Epoch 6/45
19s - loss: 0.0154 - val_loss: 0.0196
Epoch 7/45
19s - loss: 0.0182 - val_loss: 0.0207
Epoch 8/45
19s - loss: 0.0154 - val_loss: 0.0121
Epoch 9/45
20s - loss: 0.0109 - val_loss: 0.0106
Epoch 10/45
20s - loss: 0.0126 - val_loss: 0.0146
Epoch 11/45
20s - loss: 0.0117 - val_loss: 0.0099
Epoch 12/45
19s - loss: 0.0071 - val_loss: 0.0076
Epoch 13/45
19s - loss: 0.0051 - val_loss: 0.0071
Epoch 14/45
19s - loss: 0.0046 - val_loss: 0.0071
Epoch 15/45
20s - loss: 0.0045 - val_loss: 0.0075
Epoch 16/45

Epoch 00015: reducing learning rate to 0.00034999998752.
19s - loss: 0.0047 - val_loss: 0.0082
Epoch 17/45
20s - loss: 0.0041 - val_loss: 0.0062
Epoch 18/45
19s - loss: 0.0031 - val_loss: 0.0062
Epo

<keras.callbacks.History at 0x7f807d50ff10>

# Save model

In [17]:
savename = 'Model_0'
model.save(savename+'.h5')
print('Model saved as '+savename+'.h5')

Model saved as Model_0.h5
