# Train Model

This notebook walks through the steps needed to train a StarNet model.

## required packages:
- numpy
- h5py
- random
- Keras 2
  - for Keras 1 users, see $3\_Train\_Model\_Keras\_1.ipynb$
- Tensorflow or Theano
  - backend for Keras

NOTE: for error propagation, the model must be trained with Theano as the backend. Training with Theano has been found to be much slower unless Theano is set up properly.

## required data files:
- training_data.h5
  - can be created in $2\_Preprocessing\_of\_Training\_Data.ipynb$ or downloaded in $1\_Download\_Data.ipynb$
- mean_and_std.npy
  - can be created in $3\_Preprocessing\_of\_Test\_Data.ipynb$ or downloaded in $1\_Download\_Data.ipynb$

In [1]:
import numpy as np
import random
import h5py

from keras.models import Sequential
from keras.layers import Dense, InputLayer, Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

Using TensorFlow backend.


## Obtain data for normalizing labels

In [2]:
mean_and_std = np.load('mean_and_std.npy')
mean_labels = mean_and_std[0]
std_labels = mean_and_std[1]
num_labels = mean_and_std.shape[1]

## Define function to normalize labels to approximately zero mean and unit variance
NOTE: this is necessary to put the output labels on a similar scale so that the model trains properly. The process is reversed in the test stage to give the output labels their proper units.

In [3]:
def normalize(lb):
    return (lb-mean_labels)/std_labels
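
As a rough illustration of the note above, the sketch below normalizes a few made-up label rows and then reverses the step, as the test stage would. The `denormalize` helper, the `toy_*` names, and the label values are assumptions for illustration only; they are not taken from the StarNet notebooks.

```python
import numpy as np

# Toy labels (TEFF, LOGG, FE_H); the values are made up for illustration.
toy_labels = np.array([[4500.0, 2.5, -0.3],
                       [5000.0, 3.0,  0.0],
                       [5500.0, 3.5,  0.3]])

# Same layout as mean_and_std is indexed above: row 0 = means, row 1 = stds.
toy_mean_and_std = np.array([toy_labels.mean(axis=0), toy_labels.std(axis=0)])
toy_mean, toy_std = toy_mean_and_std[0], toy_mean_and_std[1]

def normalize_toy(lb):
    # Same formula as normalize(), applied with the toy statistics.
    return (lb - toy_mean) / toy_std

def denormalize(lb_norm):
    # Hypothetical inverse: maps normalized labels back to physical units.
    return lb_norm * toy_std + toy_mean

normalized = normalize_toy(toy_labels)
recovered = denormalize(normalized)
```

After normalization each label column has zero mean and unit standard deviation, and `recovered` matches `toy_labels` up to floating-point error.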

## Obtain reference set

In [4]:
savename = 'training_data.h5'

In [5]:
with h5py.File(savename,"r") as F:
    spectra = F["spectra"][:]
    labels = np.column_stack((F["TEFF"][:],F["LOGG"][:],F["FE_H"][:]))
    # Normalize labels
    labels = normalize(labels)
print('Reference set includes '+str(len(spectra))+' individual visit spectra.')

Reference set includes 50000 individual visit spectra.


In [6]:
# define the number of wavelength bins (typically 7214)
num_fluxes = spectra.shape[1]
print('Each spectra contains '+str(num_fluxes)+' wavelength bins')

Each spectra contains 7214 wavelength bins


# Randomize order and separate into training and CV sets
The reference set is first shuffled so that the spectra are presented to the model in random order.
It is then split into training and cross-validation sets.
### Default:
- $num\_train$ = 45000 spectra

In [7]:
num_train=45000

In [8]:
reference_data = np.column_stack((spectra,labels))
np.random.shuffle(reference_data)

train_spectra = reference_data[0:num_train,0:num_fluxes]
# Reshape spectra for convolutional layers
train_spectra = train_spectra.reshape(train_spectra.shape[0], train_spectra.shape[1], 1)
train_labels = reference_data[0:num_train,num_fluxes:]

cv_spectra = reference_data[num_train:,0:num_fluxes]
cv_spectra = cv_spectra.reshape(cv_spectra.shape[0], cv_spectra.shape[1], 1)
cv_labels = reference_data[num_train:,num_fluxes:]

print('Training set includes '+str(len(train_spectra))+' spectra and the cross-validation set includes '+str(len(cv_spectra))+' spectra')

Training set includes 45000 spectra and the cross-validation set includes 5000 spectra
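
The same shuffle-and-split can also be done by permuting row indices, which avoids building the combined `reference_data` array. The following is a minimal sketch on small stand-in arrays; the fixed seed, the `toy_*` names, and the array sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, an assumption added for reproducibility

# Small stand-ins for the real arrays (50000 x 7214 spectra, 50000 x 3 labels).
toy_spectra = rng.rand(10, 6)
toy_labels = rng.rand(10, 3)
toy_num_train = 8

# Shuffle the row indices once and apply the same order to both arrays,
# so each spectrum stays paired with its own labels.
order = rng.permutation(len(toy_spectra))
train_idx, cv_idx = order[:toy_num_train], order[toy_num_train:]

# Add a trailing channel axis: Conv1D expects (samples, bins, channels).
toy_train_spectra = toy_spectra[train_idx][..., np.newaxis]
toy_cv_spectra = toy_spectra[cv_idx][..., np.newaxis]
toy_train_labels = toy_labels[train_idx]
toy_cv_labels = toy_labels[cv_idx]
```

Indexing with `np.newaxis` has the same effect as the `reshape(..., 1)` calls in the cell above.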


## Define some model variables

In [9]:
# activation function used following every layer except for the output layers
activation = 'relu'

# model weight initializer
initializer = 'he_normal'

# shape of input spectra that is fed into the input layer
input_shape = (None,num_fluxes,1)

# number of filters used in the convolutional layers
num_filters = [4,16]

# length of the filters in the convolutional layers
filter_length = 8

# length of the maxpooling window 
pool_length = 4

# number of nodes in each of the hidden fully connected layers
num_hidden_nodes = [256,128]

# number of spectra fed into model at once during training
batch_size = 64

# maximum number of iterations (epochs) for model training
max_epochs = 30

# initial learning rate for optimization algorithm
lr = 0.0007
    
# exponential decay rate for the 1st moment estimates for optimization algorithm
beta_1 = 0.9

# exponential decay rate for the 2nd moment estimates for optimization algorithm
beta_2 = 0.999

# a small constant for numerical stability for optimization algorithm
optimizer_epsilon = 1e-08
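
To show where `lr`, `beta_1`, `beta_2` and `optimizer_epsilon` enter the optimization, here is a sketch of a single Adam update in its textbook form, applied to a scalar parameter. The `adam_step` function and the toy objective $x^2$ are assumptions for illustration; Keras may arrange the bias correction differently internally.

```python
import math

def adam_step(param, grad, m, v, t,
              lr=0.0007, beta_1=0.9, beta_2=0.999, epsilon=1e-08):
    """One textbook Adam update for a scalar parameter (t starts at 1)."""
    m = beta_1 * m + (1 - beta_1) * grad        # 1st moment (running mean of gradients)
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # 2nd moment (running mean of squared gradients)
    m_hat = m / (1 - beta_1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy objective f(x) = x**2 with gradient 2*x, starting from x = 1.
x, m, v = 1.0, 0.0, 0.0
x_after_first_step = None
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
    if t == 1:
        x_after_first_step = x
```

On the first step the bias-corrected ratio is close to 1, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.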

# Build Model Architecture:
- input layer
- 2 convolutional layers
- 1 maxpooling layer followed by flattening for the fully connected layer
- 2 fully connected layers
- output layer
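
The layer list above fixes the tensor sizes and weight counts of the network. The arithmetic below is a sketch that assumes the default values defined earlier in this notebook (7214 wavelength bins, 3 labels), "same" padding in the convolutions, and a max-pooling stride equal to the pool length; the helper names are hypothetical.

```python
num_fluxes = 7214            # wavelength bins per spectrum
num_filters = [4, 16]
filter_length = 8
pool_length = 4
num_hidden_nodes = [256, 128]
num_labels = 3               # TEFF, LOGG, FE_H

def conv1d_params(in_channels, filters, kernel_size):
    # one kernel of size (kernel_size x in_channels) per filter, plus one bias each
    return kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    # full weight matrix plus one bias per unit
    return inputs * units + units

# "same" padding keeps the sequence length at num_fluxes through both convolutions.
conv1 = conv1d_params(1, num_filters[0], filter_length)
conv2 = conv1d_params(num_filters[0], num_filters[1], filter_length)

# Max pooling with stride == pool size and no padding.
pooled_length = (num_fluxes - pool_length) // pool_length + 1
flattened = pooled_length * num_filters[1]

dense1 = dense_params(flattened, num_hidden_nodes[0])
dense2 = dense_params(num_hidden_nodes[0], num_hidden_nodes[1])
output = dense_params(num_hidden_nodes[1], num_labels)

total_params = conv1 + conv2 + dense1 + dense2 + output
print(pooled_length, flattened, total_params)
```

Under these assumptions the flattened vector has 28848 entries and the network has about 7.4 million trainable weights, almost all of them in the first fully connected layer.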

In [10]:
model = Sequential([

    InputLayer(batch_input_shape=input_shape),
        
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[0], kernel_size=filter_length),

    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[1], kernel_size=filter_length),
        
    MaxPooling1D(pool_size=pool_length),

    Flatten(),

    Dense(units=num_hidden_nodes[0], kernel_initializer=initializer, activation=activation),
        
    Dense(units=num_hidden_nodes[1], kernel_initializer=initializer, activation=activation),

    Dense(units=num_labels, activation="linear", input_dim=num_hidden_nodes[1]),
]) 

## More model techniques:
- The $Adam$ optimizer is the gradient descent algorithm used for minimizing the loss function
- $EarlyStopping$ uses the cross-validation set to test the model following every iteration and stops the training if the cv loss does not decrease by $min\_delta$ after $patience$ iterations
- $ReduceLROnPlateau$ is a form of learning rate decay where the learning rate is decreased by a factor of $factor$ if the training loss does not decrease by $epsilon$ after $patience$ iterations unless the learning rate has reached $min\_lr$
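
The two callbacks follow the same "no improvement for `patience` epochs" pattern. The sketch below is a simplified pure-Python model of that logic, not the Keras implementation: the function names and the toy loss values are assumptions, and details such as cooldown periods are left out.

```python
def plateau_schedule(losses, lr, factor, min_delta, patience, min_lr):
    """Return the learning rate in effect after each epoch (simplified)."""
    best = float("inf")
    wait = 0
    schedule = []
    for loss in losses:
        if loss < best - min_delta:      # counts as an improvement
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:         # plateau: shrink lr, but not below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule

def early_stop_epoch(val_losses, min_delta, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Toy loss curves, made up for illustration.
toy_losses = [1.0, 0.5, 0.5, 0.5, 0.4, 0.4, 0.4]
lr_schedule = plateau_schedule(toy_losses, lr=1.0, factor=0.5,
                               min_delta=0.01, patience=2, min_lr=0.2)
stop_at = early_stop_epoch([1.0, 0.5, 0.5, 0.5, 0.5], min_delta=0.01, patience=2)
```

In this toy run the learning rate is halved each time two consecutive epochs fail to improve on the best loss by at least `min_delta`.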

In [11]:
early_stopping_min_delta = 0.0001
early_stopping_patience = 4
reduce_lr_factor = 0.5
reduce_lr_epsilon = 0.0009
reduce_lr_patience = 2
reduce_lr_min = 0.00008

In [12]:
optimizer = Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=optimizer_epsilon, decay=0.0)

early_stopping = EarlyStopping(monitor='val_loss', min_delta=early_stopping_min_delta,
                               patience=early_stopping_patience, verbose=2, mode='min')

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=reduce_lr_factor, epsilon=reduce_lr_epsilon,
                              patience=reduce_lr_patience, min_lr=reduce_lr_min, mode='min', verbose=2)

# Compile model
### Default loss function:
- loss_function = mean squared error
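
For reference, mean squared error averages the squared difference between predicted and true labels. The sketch below computes it with NumPy on made-up values; `mse_sketch` is a hypothetical helper, not the Keras function.

```python
import numpy as np

def mse_sketch(y_true, y_pred):
    # Average of the squared residuals over all samples and labels.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Two toy samples with three normalized labels each (values are made up).
toy_true = np.array([[0.0, 1.0, -1.0],
                     [0.5, 0.0,  2.0]])
toy_pred = np.array([[0.0, 0.0, -1.0],
                     [0.5, 1.0,  0.0]])

toy_mse = mse_sketch(toy_true, toy_pred)
```

Because the labels are normalized to unit variance before training, the loss is expressed in units of label variance rather than in physical units.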

In [13]:
loss_function = 'mean_squared_error'

In [14]:
model.compile(optimizer=optimizer, loss=loss_function)

# Train model

In [15]:
model.fit(train_spectra, train_labels, validation_data=(cv_spectra, cv_labels),
          epochs=max_epochs, batch_size=batch_size, verbose=2,
          callbacks=[reduce_lr,early_stopping])

Train on 45000 samples, validate on 5000 samples
Epoch 1/30
178s - loss: 0.0985 - val_loss: 0.0423
Epoch 2/30
172s - loss: 0.0146 - val_loss: 0.0176
Epoch 3/30
171s - loss: 0.0098 - val_loss: 0.0102
Epoch 4/30
171s - loss: 0.0088 - val_loss: 0.0074
Epoch 5/30
171s - loss: 0.0069 - val_loss: 0.0075
Epoch 6/30
170s - loss: 0.0073 - val_loss: 0.0072
Epoch 7/30
170s - loss: 0.0064 - val_loss: 0.0080
Epoch 8/30
170s - loss: 0.0057 - val_loss: 0.0069
Epoch 9/30
170s - loss: 0.0054 - val_loss: 0.0087
Epoch 10/30
170s - loss: 0.0048 - val_loss: 0.0066
Epoch 11/30

Epoch 00010: reducing learning rate to 0.00034999998752.
170s - loss: 0.0049 - val_loss: 0.0057
Epoch 12/30
170s - loss: 0.0036 - val_loss: 0.0048
Epoch 13/30
170s - loss: 0.0033 - val_loss: 0.0058
Epoch 14/30
170s - loss: 0.0033 - val_loss: 0.0065
Epoch 15/30

Epoch 00014: reducing learning rate to 0.00017499999376.
170s - loss: 0.0032 - val_loss: 0.0056
Epoch 16/30
169s - loss: 0.0027 - val_loss: 0.0039
Epoch 17/30
169s - loss: 0.0

<keras.callbacks.History at 0x7f83238bb750>

# Save model

In [16]:
savename = 'Model_0'
model.save(savename+'.h5')
print('Model saved as '+savename)

Model saved as Model_0
