# Train the StarNet Model

This notebook walks through the steps needed to train a StarNet model.
- Required Python packages: `numpy h5py keras`
- Required data files: `training_data.h5`, `mean_and_std.npy`

Note: We use TensorFlow as the Keras backend.

In [1]:
import numpy as np
import h5py
import random

from keras.models import Sequential
from keras.layers import Dense, InputLayer, Flatten, Conv1D, MaxPooling1D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

datadir = '/home/ubuntu/starnet_data/'

Using TensorFlow backend.


**Batch the Training Data**

Define a function that returns a batch of data from an HDF5 file as NumPy arrays. This lets us load only one batch of spectra into memory at a time while training.

Within this function, the labels are normalized to have approximately zero mean and unit variance.

NOTE: This normalization puts the output labels on a similar scale so that the model trains properly. It is reversed at the test stage to return the predicted labels to their proper units.
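As a minimal sketch of the normalization and its reversal, the snippet below uses made-up means, standard deviations, and labels (none of these numbers come from `mean_and_std.npy`). It assumes the same layout the loader relies on: row 0 holds the label means and row 1 the label standard deviations.

```python
import numpy as np

# Illustrative values only: row 0 = per-label means, row 1 = per-label
# standard deviations, in the order (Teff, log g, [Fe/H]).
mean_and_std = np.array([[4500.0, 2.5, -0.3],
                         [500.0, 1.0, 0.4]])
mean_labels, std_labels = mean_and_std

# Two made-up stars in physical units.
y = np.array([[5000.0, 3.5, 0.1],
              [4000.0, 1.5, -0.7]])

# Training-time scaling: each label column ends up on a comparable scale.
normed_y = (y - mean_labels) / std_labels

# Test-time inverse: recover physical units from normalized values.
restored_y = normed_y * std_labels + mean_labels

print(normed_y)
print(np.allclose(restored_y, y))
```

The same inverse transform would be applied to the model's predictions, since the network only ever sees normalized labels.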

In [2]:
def load_batch_from_h5(data_file, num_objects, batch_size, indx, mu_std):
    
    # Row 0 holds the label means, row 1 the label standard deviations
    mean_and_std = np.load(mu_std)
    mean_labels = mean_and_std[0]
    std_labels = mean_and_std[1]
    
    # Generate a sorted list of random indices (within the relevant partition of the
    # main data file, e.g. the training set) to be used to index into data_file
    indices = np.sort(random.sample(range(indx, indx+num_objects), batch_size))
    
    # Load only the selected rows; the context manager closes the file afterwards
    with h5py.File(data_file, 'r') as F:
        X = F['spectrum'][indices,:]
        y = np.column_stack((F['TEFF'][indices],
                             F['LOGG'][indices],
                             F['FE_H'][indices]))
    
    # Normalize labels
    normed_y = (y-mean_labels) / std_labels
    
    # Reshape X data for compatibility with CNN: (batch, wavelength bins, channels)
    X = X.reshape(len(X), X.shape[1], 1)
    
    return X, normed_y
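The index handling in the loader can be looked at in isolation. The sketch below uses only the standard library; `sample_sorted_indices` is a hypothetical helper name, not part of the notebook. It draws distinct row indices from one partition of the file and sorts them, since h5py expects an index list to be in increasing order when selecting rows from a dataset.

```python
import random

def sample_sorted_indices(start, num_objects, batch_size, rng=random):
    """Draw batch_size distinct row indices from [start, start + num_objects), sorted."""
    return sorted(rng.sample(range(start, start + num_objects), batch_size))

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Training partition: rows 0..40999; cross-validation partition: rows 41000..44783.
train_idx = sample_sorted_indices(0, 41000, 64, rng)
cv_idx = sample_sorted_indices(41000, 3784, 64, rng)

print(train_idx[:5], cv_idx[:5])
```

Because `random.sample` draws without replacement, a batch never contains the same spectrum twice, although a spectrum can reappear in a later batch.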

**Create Batch Generators for the Training and Cross-Validation Data**

In [3]:
def generate_train_batch(data_file, num_objects, batch_size, indx, mu_std):
    
    while True:
        x_batch, y_batch = load_batch_from_h5(data_file, 
                                                   num_objects, 
                                                   batch_size, 
                                                   indx, 
                                                   mu_std)
        yield (x_batch, y_batch)

def generate_cv_batch(data_file, num_objects, batch_size, indx, mu_std):
    
    while True:
        x_batch, y_batch = load_batch_from_h5(data_file, 
                                                   num_objects, 
                                                   batch_size, 
                                                   indx, 
                                                   mu_std)
        yield (x_batch, y_batch)
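The `while True` loop means these generators never run out; the training routine simply draws a fixed number of batches per epoch. The toy sketch below illustrates that consumption pattern with `itertools.islice`. `batch_counter` is a hypothetical stand-in that yields integers instead of spectra, and this is not how Keras is implemented internally.

```python
from itertools import islice

def batch_counter():
    """Stand-in for generate_train_batch: yields forever, never raises StopIteration."""
    batch_number = 0
    while True:
        yield batch_number
        batch_number += 1

steps_per_epoch = 3
gen = batch_counter()

# Each "epoch" pulls exactly steps_per_epoch items and leaves the generator alive.
epoch_1 = list(islice(gen, steps_per_epoch))
epoch_2 = list(islice(gen, steps_per_epoch))

print(epoch_1, epoch_2)  # [0, 1, 2] [3, 4, 5]
```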

**Obtain information from the reference set and the normalization data**

In [4]:
training_set = 'training_data.h5'
normalization_data = 'mean_and_std.npy'

In [5]:
# Define the number of output labels (one column per label in the normalization file)
num_labels = np.load(datadir+normalization_data).shape[1]

# Define the number of training spectra
num_train = 41000

with h5py.File(datadir+training_set, 'r') as F:
    spectra = F['spectrum']
    num_flux = spectra.shape[1]
    num_cv = spectra.shape[0]-num_train
print('Each spectrum contains ' + str(num_flux) + ' wavelength bins')
print('Training set includes ' + str(num_train) + ' spectra and the cross-validation set includes ' + str(num_cv)+' spectra')

Each spectrum contains 7214 wavelength bins
Training set includes 41000 spectra and the cross-validation set includes 3784 spectra
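The split can be checked with a little arithmetic. The sketch below takes the counts printed above together with the batch size of 64 that this notebook uses for training, and works out the row ranges of each partition and how many batches fit into one pass over each.

```python
import math

num_train = 41000
num_cv = 3784
num_total = num_train + num_cv
batch_size = 64  # value used for training in this notebook

# The first num_train rows are the training set; the remainder is the cv set.
train_rows = range(0, num_train)
cv_rows = range(num_train, num_total)

# Whole batches per pass, and the count if a final partial batch were included.
full_train_batches = num_train // batch_size
full_cv_batches = num_cv // batch_size
train_batches_rounded_up = math.ceil(num_train / batch_size)
cv_batches_rounded_up = math.ceil(num_cv / batch_size)

print(num_total, full_train_batches, full_cv_batches)  # 44784 640 59
```

Since batches are drawn at random rather than sequentially, these counts are nominal: an epoch of 640 batches sees roughly, not exactly, every training spectrum once.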


**Build the StarNet model architecture**

The StarNet architecture is built with:
- input layer
- 2 convolutional layers
- 1 maxpooling layer followed by flattening for the fully connected layer
- 2 fully connected layers
- output layer

First, let's define some model variables.

In [6]:
# activation function used following every layer except for the output layer
activation = 'relu'

# model weight initializer
initializer = 'he_normal'

# shape of input spectra that is fed into the input layer
input_shape = (None, num_flux, 1)

# number of filters used in the convolutional layers
num_filters = [4,16]

# length of the filters in the convolutional layers
filter_length = 8

# length of the maxpooling window 
pool_length = 4

# number of nodes in each of the hidden fully connected layers
num_hidden = [256,128]

# number of spectra fed into model at once during training
batch_size = 64

# maximum number of iterations (epochs) for model training
max_epochs = 30

# initial learning rate for optimization algorithm
lr = 0.0007
    
# exponential decay rate for the 1st moment estimates for optimization algorithm
beta_1 = 0.9

# exponential decay rate for the 2nd moment estimates for optimization algorithm
beta_2 = 0.999

# a small constant for numerical stability for optimization algorithm
optimizer_epsilon = 1e-08

In [7]:
model = Sequential([
    InputLayer(batch_input_shape=input_shape),
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[0], kernel_size=filter_length),
    Conv1D(kernel_initializer=initializer, activation=activation, padding="same", filters=num_filters[1], kernel_size=filter_length),
    MaxPooling1D(pool_size=pool_length),
    Flatten(),
    Dense(units=num_hidden[0], kernel_initializer=initializer, activation=activation),
    Dense(units=num_hidden[1], kernel_initializer=initializer, activation=activation),
    Dense(units=num_labels, activation="linear", input_dim=num_hidden[1]),
])
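To get a feel for the size of this network, the sketch below traces the tensor shapes and trainable-parameter counts by hand, using the hyperparameters defined above and assuming three output labels (Teff, log g, [Fe/H]). It also assumes the Keras defaults for `MaxPooling1D` (stride equal to the pool size, no padding). The helper functions are hypothetical and only do arithmetic.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a 1D convolution."""
    return kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units

num_flux, num_labels = 7214, 3
num_filters, filter_length, pool_length = [4, 16], 8, 4
num_hidden = [256, 128]

# padding="same" keeps the length at num_flux through both convolutions.
# Pooling without padding and with stride = pool_length shortens it:
pooled_length = (num_flux - pool_length) // pool_length + 1
flat_size = pooled_length * num_filters[1]

params = [
    conv1d_params(filter_length, 1, num_filters[0]),
    conv1d_params(filter_length, num_filters[0], num_filters[1]),
    dense_params(flat_size, num_hidden[0]),
    dense_params(num_hidden[0], num_hidden[1]),
    dense_params(num_hidden[1], num_labels),
]

print(pooled_length, flat_size, sum(params))  # 1803 28848 7419191
```

Almost all of the parameters sit in the first fully connected layer, which is why the pooling step before flattening matters for model size.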

**More model techniques**
* The `Adam` optimizer is the gradient descent algorithm used to minimize the loss function
* `EarlyStopping` evaluates the model on the cross-validation set after every epoch and stops training if the cv loss has not decreased by at least `min_delta` for `patience` consecutive epochs
* `ReduceLROnPlateau` is a form of learning rate decay: the learning rate is multiplied by `factor` if the training loss has not decreased by at least `epsilon` for `patience` epochs, and it is never reduced below `min_lr`
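The early-stopping rule can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras source: `early_stopping_epoch` is a hypothetical helper that tracks the best validation loss seen so far and counts epochs without a sufficient improvement.

```python
def early_stopping_epoch(val_losses, min_delta, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improved by more than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation losses from the first seven epochs logged in this notebook.
logged = [0.0482, 0.0229, 0.0184, 0.0129, 0.0115, 0.0094, 0.0103]
print(early_stopping_epoch(logged, min_delta=0.0001, patience=4))  # None

# A made-up run that stalls after the second epoch.
stalled = [1.0, 0.5, 0.5, 0.5, 0.5, 0.5]
print(early_stopping_epoch(stalled, min_delta=0.0001, patience=4))  # 6
```

With a patience of 4, a single worse epoch such as the seventh one in the log is not enough to halt training.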

In [8]:
# Default callback parameters
early_stopping_min_delta = 0.0001
early_stopping_patience = 4
reduce_lr_factor = 0.5
reduce_lr_epsilon = 0.0009
reduce_lr_patience = 2
reduce_lr_min = 0.00008

# loss function to minimize
loss_function = 'mean_squared_error'

# compute accuracy and mean absolute error
# (accuracy is not a meaningful measure for a regression output; mae is the one to watch)
metrics = ['accuracy', 'mae']

In [9]:
optimizer = Adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=optimizer_epsilon, decay=0.0)

early_stopping = EarlyStopping(monitor='val_loss', min_delta=early_stopping_min_delta, 
                               patience=early_stopping_patience, verbose=2, mode='min')

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=reduce_lr_factor, epsilon=reduce_lr_epsilon, 
                              patience=reduce_lr_patience, min_lr=reduce_lr_min, mode='min', verbose=2)
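With the values above, the learning rate can only be halved a few times before it reaches its floor. The sketch below works through that schedule; `lr_after_plateaus` is a hypothetical helper, and it assumes each plateau multiplies the rate by the factor and clips the result at the minimum.

```python
def lr_after_plateaus(initial_lr, factor, min_lr, num_plateaus):
    """List the learning rate before and after each of num_plateaus reductions."""
    lrs = [initial_lr]
    for _ in range(num_plateaus):
        # Multiply by the factor, but never go below the floor.
        lrs.append(max(lrs[-1] * factor, min_lr))
    return lrs

schedule = lr_after_plateaus(initial_lr=0.0007, factor=0.5, min_lr=0.00008, num_plateaus=5)
for step, value in enumerate(schedule):
    print(step, value)
```

Starting from 0.0007, three halvings give 0.0000875; a fourth would fall below 0.00008, so the rate is held at that floor from then on.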

**Compile model**

In [10]:
model.compile(optimizer=optimizer, loss=loss_function, metrics=metrics)
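Both the loss and the `mae` metric are computed on the normalized labels, so the numbers reported during training are in units of label standard deviations. The sketch below computes the two quantities with NumPy on made-up normalized targets and predictions; the values are illustrative only.

```python
import numpy as np

# Made-up normalized labels for two stars: columns are (Teff, log g, [Fe/H]).
y_true = np.array([[0.0, 1.0, -1.0],
                   [0.5, -0.5, 0.0]])
y_pred = np.array([[0.1, 0.8, -1.0],
                   [0.5, -0.1, 0.3]])

errors = y_pred - y_true

mse = np.mean(errors ** 2)                        # mean squared error (the loss)
mae = np.mean(np.abs(errors))                     # mean absolute error (the metric)
mae_per_label = np.mean(np.abs(errors), axis=0)   # one value per label column

print(mse, mae, mae_per_label)
```

Multiplying a per-label error by that label's standard deviation converts it back to physical units.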

**Train model**

In [11]:
# Integer division gives whole numbers of steps per epoch
model.fit_generator(generate_train_batch(datadir+training_set, 
                                         num_train, batch_size, 0, 
                                         datadir+normalization_data),
                    steps_per_epoch=num_train//batch_size,
                    epochs=max_epochs,
                    validation_data=generate_cv_batch(datadir+training_set, 
                                                      num_cv, batch_size, num_train,
                                                      datadir+normalization_data),
                    max_q_size=10, verbose=2,
                    callbacks=[early_stopping, reduce_lr],
                    validation_steps=num_cv//batch_size)

Epoch 1/30
70s - loss: 0.2687 - acc: 0.7139 - mean_absolute_error: 0.2693 - val_loss: 0.0482 - val_acc: 0.7924 - val_mean_absolute_error: 0.1706
Epoch 2/30
70s - loss: 0.0358 - acc: 0.7787 - mean_absolute_error: 0.1372 - val_loss: 0.0229 - val_acc: 0.7918 - val_mean_absolute_error: 0.1085
Epoch 3/30
70s - loss: 0.0266 - acc: 0.7935 - mean_absolute_error: 0.1135 - val_loss: 0.0184 - val_acc: 0.8183 - val_mean_absolute_error: 0.0970
Epoch 4/30
70s - loss: 0.0162 - acc: 0.8260 - mean_absolute_error: 0.0902 - val_loss: 0.0129 - val_acc: 0.8535 - val_mean_absolute_error: 0.0760
Epoch 5/30
70s - loss: 0.0120 - acc: 0.8463 - mean_absolute_error: 0.0775 - val_loss: 0.0115 - val_acc: 0.8795 - val_mean_absolute_error: 0.0831
Epoch 6/30
70s - loss: 0.0092 - acc: 0.8735 - mean_absolute_error: 0.0689 - val_loss: 0.0094 - val_acc: 0.8530 - val_mean_absolute_error: 0.0748
Epoch 7/30
71s - loss: 0.0081 - acc: 0.8871 - mean_absolute_error: 0.0638 - val_loss: 0.0103 - val_acc: 0.8713 - val_mean_absolute

<keras.callbacks.History at 0x7fe5a8a0c6d0>

**Save model**

In [12]:
starnet_model = 'starnet_cnn.h5'
model.save(datadir + starnet_model)
print(starnet_model+' saved.')

starnet_cnn.h5 saved.
