# Training recurrent neural networks
This notebook contains sample code for training a recurrent neural network to predict the total daily power output of a solar panel. The preprocessed dataset ships with this notebook. You can also regenerate it with the notebook "Prepare the dataset.ipynb", which is in the same folder as this notebook.

In [1]:
from cntk.losses import squared_error
from cntk.io import CTFDeserializer, MinibatchSource, INFINITELY_REPEAT, StreamDefs, StreamDef
from cntk.learners import adam
from cntk.logging import ProgressPrinter
from cntk.train import TestConfig

The notebook uses a set of constants to control various settings.
The most important settings are the batch size, epoch size and number of epochs to train for.

We've normalized the training data by the maximum total power generated by the solar panel.
This maximum is stored as a constant here so that the output of the neural network can be denormalized during normal usage.

In [2]:
BATCH_SIZE = 14 * 10
EPOCH_SIZE = 12434
EPOCHS = 10

# This value is required to convert the normalized values back to their original value.
# You can obtain this value by looking at the maximum value for the solar.total column
NORMALIZE = 19100
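
As a minimal sketch of what the constant is for, the round trip below scales a value down and back up again. It assumes normalization is a plain division by the maximum (consistent with the multiplication by `NORMALIZE` used for predictions later on); the helper names `normalize` and `denormalize` and the sample value are hypothetical and not part of the notebook.

```python
# Maximum of the solar.total column, as stated in the notebook.
NORMALIZE = 19100


def normalize(total_power):
    """Scale a raw daily total by the maximum (assumed to be a plain division)."""
    return total_power / NORMALIZE


def denormalize(model_output):
    """Convert a normalized value back to the original scale."""
    return model_output * NORMALIZE


raw_total = 9550.0               # hypothetical daily total, for illustration only
scaled = normalize(raw_total)    # half of the maximum
restored = denormalize(scaled)   # back on the original scale
```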

## Building the model
The model we're using is a recurrent neural network with an LSTM as the implementation for the recurrent layer in the network. We've wrapped the LSTM in a Fold layer because we're only interested in the final output of the recurrent layer. 
The output of the network is generated using a final Dense layer.

Note, the input features for the model are stored in a sequence input variable. This is required since we're working with sequences rather than single samples. The target output is stored in a regular input variable as we're only interested in predicting a single output.

In [3]:
from cntk import sequence, default_options, input_variable
from cntk.layers import Recurrence, LSTM, Dropout, Dense, Sequential, Fold
features = sequence.input_variable(1)

with default_options(initial_state = 0.1):
    model = Sequential([
        Fold(LSTM(15)),
        Dense(1)
    ])(features)
    
target = input_variable(1, dynamic_axes=model.dynamic_axes)
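
To illustrate why the LSTM is wrapped in a Fold layer, here is a small sketch in plain Python that contrasts a recurrence returning every intermediate state with a fold returning only the last one. The cell is a toy tanh unit with made-up weights, not an LSTM, and the function names and readings are hypothetical; only the `initial_state` of 0.1 mirrors the notebook.

```python
import math


def step(state, x, w_state=0.5, w_input=1.0):
    """One step of a toy recurrent cell (a plain tanh unit, not an LSTM)."""
    return math.tanh(w_state * state + w_input * x)


def recurrence(sequence, initial_state=0.1):
    """Return the state after every time step."""
    states = []
    state = initial_state
    for x in sequence:
        state = step(state, x)
        states.append(state)
    return states


def fold(sequence, initial_state=0.1):
    """Return only the state after the last time step."""
    state = initial_state
    for x in sequence:
        state = step(state, x)
    return state


day = [0.0, 0.2, 0.7, 0.4, 0.1]   # hypothetical normalized readings for one day
all_states = recurrence(day)      # one state per reading
final_state = fold(day)           # a single summary of the whole day
```

Because the fold yields one value per sequence, the Dense layer that follows produces one prediction per day rather than one per reading.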

## Training the model
The model is trained with a mean squared error loss function. The training data comes from a set of CTF files, each containing sequences of measurements per day.

In [4]:
from cntk import Function

@Function
def criterion_factory(z, t):
    # Squared error serves as both the training loss and the evaluation metric.
    loss = squared_error(z, t)
    metric = squared_error(z, t)

    return loss, metric

loss = criterion_factory(model, target)
learner = adam(model.parameters, lr=0.005, momentum=0.9)
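
For intuition, the quantity computed for a single sample can be sketched in plain Python as the sum of squared differences between prediction and target. This is an illustrative stand-in, not CNTK's implementation; the name `squared_error_sketch` and the sample values are hypothetical.

```python
def squared_error_sketch(prediction, target):
    """Sum of squared differences between two equally sized vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


example_prediction = [0.75]   # hypothetical normalized model output
example_target = [0.5]        # hypothetical normalized target
example_loss = squared_error_sketch(example_prediction, example_target)
```

Since the same function is used for the loss and the metric, the two columns in the training log are expected to show identical values.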

To load data into the training process, we need to deserialize sequences from a set of CTF files. The `create_datasource` utility function creates both the training and test data sources.

In [5]:
def create_datasource(filename, sweeps=INFINITELY_REPEAT):
    target_stream = StreamDef(field='target', shape=1, is_sparse=False)
    features_stream = StreamDef(field='features', shape=1, is_sparse=False)

    deserializer = CTFDeserializer(filename, StreamDefs(features=features_stream, target=target_stream))
    datasource = MinibatchSource(deserializer, randomize=True, max_sweeps=sweeps)    
    
    return datasource
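
For reference, the sketch below renders sequences in the CNTK text format (CTF) with the `features` and `target` field names that the stream definitions expect. Each line starts with a sequence ID, lines with the same ID belong to one sequence, and the single target value is placed on the first line of its sequence. The helper `to_ctf_lines` and the sample values are hypothetical; how the dataset preparation notebook actually writes its files is not shown here.

```python
def to_ctf_lines(sequences, targets):
    """Render scalar-feature sequences, with one scalar target each, as CTF lines."""
    lines = []
    for seq_id, (steps, target) in enumerate(zip(sequences, targets)):
        for i, value in enumerate(steps):
            line = "{} |features {}".format(seq_id, value)
            if i == 0:
                # The target has one sample per sequence, so it goes on the first line only.
                line += " |target {}".format(target)
            lines.append(line)
    return lines


# Two hypothetical sequences: one with two readings, one with a single reading.
ctf_lines = to_ctf_lines([[0.1, 0.4], [0.3]], [0.5, 0.2])
for line in ctf_lines:
    print(line)
```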

In [6]:
train_datasource = create_datasource('solar_train.ctf')
test_datasource = create_datasource('solar_val.ctf', sweeps=1)

Now that we've set up the data sources, model, and loss function, let's start the training process.
Be aware that this takes a long time on a computer with just a CPU. If you can, use a GPU to train this model.

In [7]:
progress_writer = ProgressPrinter(0)
test_config = TestConfig(test_datasource)

input_map = {
    features: train_datasource.streams.features,
    target: train_datasource.streams.target
}

history = loss.train(
    train_datasource, 
    epoch_size=EPOCH_SIZE,
    parameter_learners=[learner], 
    model_inputs_to_streams=input_map,
    callbacks=[progress_writer, test_config],
    minibatch_size=BATCH_SIZE,
    max_epochs=EPOCHS)

 average      since    average      since      examples
    loss       last     metric       last              
 ------------------------------------------------------
Learning rate per minibatch: 0.005
      0.4        0.4        0.4        0.4            19
      0.4        0.4        0.4        0.4            59
    0.452      0.495      0.452      0.495           129
     0.43      0.411       0.43      0.411           275
    0.394      0.362      0.394      0.362           580
    0.354      0.314      0.354      0.314          1150
    0.281      0.207      0.281      0.207          2298
    0.164     0.0495      0.164     0.0495          4643
   0.0934     0.0235     0.0934     0.0235          9328
   0.0141     0.0141     0.0141     0.0141            20
   0.0229     0.0285     0.0229     0.0285            52
   0.0219     0.0212     0.0219     0.0212           122
   0.0209     0.0201     0.0209     0.0201           268
    0.021      0.021      0.021      0.021           552

## Making predictions
You can call any CNTK model as a function, and that's how we make predictions in this notebook. The model function accepts a numpy array as input, with a shape of `<batch>x<timesteps>x<features>`. We load a number of samples stored in a pickle file and feed them into the model.

In [8]:
import pickle

with open('test_samples.pkl', 'rb') as test_file:
    test_samples = pickle.load(test_file)
    
model(test_samples) * NORMALIZE

array([[ 8081.7905],
       [16597.693 ],
       [13335.17  ],
       ...,
       [11275.804 ],
       [15621.697 ],
       [16875.555 ]], dtype=float32)
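
To make the `<batch>x<timesteps>x<features>` layout concrete, the sketch below builds an array of that shape from random values. The sizes (3 days, 14 readings per day) are hypothetical and chosen only for illustration; the single feature matches the `sequence.input_variable(1)` declared earlier, and the cast to `float32` matches the dtype of the prediction output shown above.

```python
import numpy as np

# Hypothetical sizes: 3 days, 14 readings per day, 1 feature per reading.
batch, timesteps, n_features = 3, 14, 1

# Random values stand in for normalized measurements.
samples = np.random.rand(batch, timesteps, n_features).astype(np.float32)

print(samples.shape)
print(samples.dtype)
```

An array shaped like this could be passed to the model in the same way as `test_samples`, yielding one prediction per day.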