<h1 style="font-size:40px;"><center>Exercise II:<br> Convolutional and Recurrent Neural Networks
</center></h1>

## Short summary
In this exercise you will: 

* Train a CNN for a binary classification problem
* Experiment with the CNN network configuration
* Visualize how the CNN interprets the data
* Train an RNN on a time series prediction problem
* Visualize RNN hidden node activities
* Sample from an RNN character model

In this lab we will look at network architectures that are designed to handle specific kinds of data: Convolutional Neural Networks for image processing and Recurrent Neural Networks for time series processing.

**Deadline for submitting the report: December 21, 12:00 (2017)**

## The data
The first dataset consists of simple computer generated pictures, each of which will contain either a square or a circle. The task is to train a CNN to classify whether a picture contains a circle (class 0) or a square (class 1).

Part of the MNIST database is also used for an optional exercise.

The second dataset consists of pairs of time series. The input time series is a train of rectangular pulses, and the output is a train of triangles, i.e. an up-ramp followed by a down-ramp. For more details see the cell *Ex3-1*. The task is to train a recurrent network that predicts the triangle time series from the pulse time series.

The last exercise uses the Tensorflow source code (C++), represented as a long sequence of characters. See that cell for more details.

## The exercises
As in the previous lab, all exercises are found below.

## The different 'Cells'
This notebook contains several cells with Python code, together with markdown cells (like this one) that contain only text. Each of the cells with Python code has a "header" markdown cell with information about the code. The table below provides a short overview of the code cells.

| #  |  CellName | CellType | Comment |
| :--- | :-------- | :-------- | :------- |
| 1 | Init | Needed | Sets up the environment|
| 2 | Data | Needed | Loading images for the CNN exercise |
| 3 | PlotImg | Information  | View some of the images |
| 4 | Stats | Needed | Compute classification results |
| 5 | Ex1 | Exercise | For question 1-2 |
| 6 | Ex2 | Exercise | For question 3-4 |
| 7 | Ex3-1 | Exercise | For question 5-7 |
| 8 | Ex3-2 | Exercise | For question 5-7 |
| 9 | Ex3-3 | Exercise | For question 5-7 |
| 10 | Ex4-1 | Exercise | For question 8-9 |
| 11 | Ex4-2 | Exercise | For question 8-9 |


In order to start with the exercise you need to run all cells with the CellType "Needed". The very first time you start with this exercise we suggest that you enter each of the needed cells, read the cell instruction and run the cell. It is important that you do this in the correct order, starting from the top and working your way down the cells. Later, when you have started to work with the notebook, it may be easier to use the command "Run All" found in the "Cell" dropdown menu.

## Writing the report
The report should be written within this notebook. We have prepared the last cell in this notebook for you, and that is where you should write the report. The report should contain 4 parts:

* Name:
* Introduction: A **few** sentences where you give a small introduction of what you have done in the lab.
* Answers to questions: For each of the questions provide an answer. It can be a short answer or a longer one depending on the nature of the question, but try to be efficient in your writing.
* Conclusion: Summarize your findings in a few sentences.

It is important that you write the report in this last cell and **not** after each question! Also when uploading your report to Live@Lund, name the file according to:

**lab1_Surname_Firstname.ipynb**

## Last but not least
Have fun again!




### CellName: Init (#1)
### CellType: Needed
### Cell instruction: Initializing the libraries
In the cell below, we import all the libraries that are needed for this exercise. There is one configuration parameter that you can change in this cell:

* Inline or "pop out" plots.

See the comments in the cell for more information. Run the cell by entering it and pressing "CTRL Enter".

**Note!** If you get an error running this cell concerning Keras and not finding 'RNN' you may have to update Keras. Do that by the following command (in the terminal or in the Anaconda shell):

    pip install --upgrade --no-deps keras


In [1]:
import numpy as np
import tensorflow as tf
import time

import keras
from keras.models import Model, Sequential
from keras.layers import Dense, Dropout, Flatten, Input, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import TimeDistributed
from keras.layers import Lambda, concatenate
from keras.layers import LSTM, GRU, SimpleRNN
from keras.layers import RNN

from keras.optimizers import SGD, Adam, RMSprop, Nadam
from keras import backend as K

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

# To have the plots inside the notebook "inlin" should be True. 
# If "inlin" = False, then plots will pop out of the notebook
inlin = True # True/False
if inlin:
    %matplotlib inline
else:
    %matplotlib

import matplotlib
import matplotlib.pyplot as plt



### CellName: Data (#2)
### CellType: Needed
### Cell instruction: Loading images for the CNN exercise

This cell defines the functions used to load the image data needed for the CNN exercise. Note! If the files cannot be found, make sure the "images/" folder is available in the same directory as this notebook file. The functions *loadImages* and *loadMNIST* return the training and test data (*xtrain*, *ytrain*, *xtest*, *ytest*) together with the image width and height.

Run the cell by entering it and pressing "CTRL Enter".

In [None]:
def loadImages(N=250):
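    # Note: scipy.misc.imread was removed in SciPy 1.2; on newer installations
    # an equivalent image reader (e.g. imageio.imread) has to be substituted here.
    from scipy import misc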
    def load_pics(folder,n):
        imgs = []
        for i in range(n):
            img = misc.imread(folder+"img_{:05}.png".format(i+1))
            ch = img[:,:,0]
            imgs.append(ch)
        return np.array(imgs)

    def load_labels(fn):
        return np.loadtxt(fn, usecols=0)

    base = "images/"
    trainpic = load_pics(base+"imgTrn/", 1000)
    testpic = load_pics(base + "imgTst/", 1000)
    ntrain, width, height = trainpic.shape

    xtrain = (trainpic/np.float32(255)).reshape(1000, width, height, 1)
    xtest = (testpic/np.float32(255)).reshape(1000, width, height, 1)

    ytrain = load_labels(base+"Trn_trg.csv")
    ytest = load_labels(base+"Tst_trg.csv")

    # Keep only the first N training images (N was previously ignored)
    xtrain = xtrain[:N]
    ytrain = ytrain[:N]
    
    return xtrain, ytrain, xtest, ytest, width, height

def loadMNIST():
    xtrain, ytrain, xtest, ytest = np.load("mnist.npy")
    width, height = xtrain.shape[1:3]
    return xtrain, ytrain, xtest, ytest, width, height


### CellName: PlotImg (#3)
### CellType: Information
### Cell instruction: Show some of the images

Here we look at the first ten pictures in the training set, and their respective targets. Run the cell by entering it and pressing "CTRL Enter".

In [None]:
xtrain, ytrain, xtest, ytest, width, height = loadImages(10)
plt.figure(1, figsize=(15,10))
plt.imshow(xtrain[:10,:,:].reshape(10*width,height).T,cmap="gray")
plt.axis("off")
plt.show()

print(ytrain[:10])


### CellName: Stats (#4)
### CellType: Needed
### Cell instruction: Get binary classification results

Run the cell by entering it and pressing "CTRL Enter".
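
To make the three reported numbers concrete, here is a small self-contained sketch on made-up labels and predictions. The arrays and the name `toy_stats` are hypothetical and only mirror what the cell is expected to compute: sensitivity is the fraction of class-1 cases predicted above the threshold, specificity is the fraction of class-0 cases predicted at or below it, and accuracy is the fraction of all cases that are correct.

```python
import numpy as np

def toy_stats(ytrue, ypred, threshold=0.5):
    """Sensitivity, specificity and accuracy for a thresholded binary prediction."""
    ytrue = np.asarray(ytrue)
    ypred = np.asarray(ypred)
    one_correct = np.sum((ytrue == 1) & (ypred > threshold))
    zero_correct = np.sum((ytrue == 0) & (ypred <= threshold))
    sensitivity = one_correct / np.sum(ytrue == 1)
    specificity = zero_correct / np.sum(ytrue == 0)
    accuracy = (one_correct + zero_correct) / len(ytrue)
    return sensitivity, specificity, accuracy

# Made-up example: three class-1 cases (two found), two class-0 cases (one found)
labels = [1, 1, 1, 0, 0]
outputs = [0.9, 0.8, 0.2, 0.1, 0.7]
sens, spec, acc = toy_stats(labels, outputs)
print(sens, spec, acc)
```

Changing `threshold` trades sensitivity against specificity, which is worth keeping in mind when the two classes are not equally important.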

In [None]:
def binary_pred_stats(ytrue, ypred, threshold=0.5):
    one_correct = np.sum((ytrue==1)*(ypred > threshold))
    zero_correct = np.sum((ytrue==0)*(ypred <= threshold))
    sensitivity = one_correct / np.sum(ytrue==1)
    specificity = zero_correct / np.sum(ytrue==0)
    accuracy = (one_correct + zero_correct) / len(ytrue)
    return sensitivity, specificity, accuracy



### CellName: Ex1 (#5)
### CellType: Exercise
### Cell instruction: Instructions for question 1-2

## CNN for image classification

In this first exercise you are going to train a CNN that can distinguish between circles and rectangles, i.e. using the *img1* dataset. We are going to use 250 training images and 1000 test images. To start with, we have a proposed CNN that can solve this problem. It consists of the following:
* First convolutional layer consisting of 8 kernels of size 3x3, with ReLU activation
* Maxpooling of size 2x2
* Second convolutional layer of 8 kernels of size 3x3, with ReLU activation
* Maxpooling of size 2x2
* Third convolutional layer consisting of 8 kernels of size 3x3, with ReLU activation
* Maxpooling of size 2x2
* Special layer *Flatten()*, which just transforms all of the max-pooled filter outputs into a linear vector of outputs
* *Dense* layer, meaning a fully connected MLP like layer with 10 nodes, again ReLU activation
* Final output layer consisting of one single output node with sigmoid activation function because we have a binary classification problem.

The default is to use *stride* = 1 and no *padding*. 
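
The effect of these defaults on the layer sizes and on the number of weights can be checked by hand. The sketch below does the bookkeeping for the proposed network under the assumption of a 64x64 single-channel input; that size is hypothetical, the real one is whatever *loadImages* returns. It uses the usual relations for stride 1 without padding (output size = input size - kernel size + 1), for 2x2 max pooling (integer division by 2), and for the weight count of a convolutional layer ((kernel area x input channels + 1 bias) x number of kernels).

```python
def conv_out(size, kernel=3):
    # Stride 1, no padding: the kernel fits in (size - kernel + 1) positions
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling windows; a leftover row/column is dropped
    return size // pool

def conv_params(c_in, c_out, kernel=3):
    # One kernel per output channel, each with kernel*kernel*c_in weights + 1 bias
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    # Fully connected layer: weights plus one bias per output node
    return (n_in + 1) * n_out

size = 64        # ASSUMPTION: image side length, not taken from the lab data
channels = 1
total = 0
for n_kernels in (8, 8, 8):          # three Conv2D + MaxPooling2D pairs
    total += conv_params(channels, n_kernels)
    size = pool_out(conv_out(size))
    channels = n_kernels

flat = size * size * channels        # length of the Flatten() output
total += dense_params(flat, 10) + dense_params(10, 1)
print(size, flat, total)
```

The numbers for the actual image size can be compared with the output of `model.summary()`; the same bookkeeping is handy when trimming the model in Question 2.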

#### Question 1
Make sure you understand the definition of the CNN model in the cell below and train it. **What is your test set performance in terms of the accuracy?**

#### Question 2
This image classification problem should be relatively easy, since a circle is very different from a rectangle and there is no noise in the images. Experiment with the architecture of the CNN model and try to make it smaller while keeping almost the same near-perfect test accuracy. **How many parameters does your trimmed model have, and what is its architecture?**

**Hint:** There are of course many ways to make the architecture smaller. You do not need to test all of them!


In [None]:
# Load the dataset, Rectangles and Circles
xtrain, ytrain, xtest, ytest, width, height = loadImages(250)

# Uncomment below to load parts of the MNIST database instead
# NOTE! When using MNIST, comment out third Conv2D/MaxPooling2D pair!
# xtrain, ytrain, xtest, ytest, width, height = loadMNIST()

# The size of the images
input_shape = (width, height, 1)

# Define the CNN model
model = Sequential([
    Conv2D(8, kernel_size=(3, 3),
           activation='relu',input_shape=input_shape),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(8, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(8, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(10, activation='relu'),
    #Dropout(0.5),
    Dense(1),
    Activation('sigmoid')
])

# We use cross entropy error and the adam optimizer
adam = Adam(lr=0.005)
model.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
model.summary()


# Now train the model
estimator = model.fit(xtrain, ytrain, 
                      epochs=30, 
                      batch_size=50,
                      verbose=0)

# Plot the training error
plt.plot(estimator.history['loss'])
plt.title('Model training')
plt.ylabel('training error')
plt.xlabel('epoch')
plt.legend(['train'], loc=0)
plt.show()

# Get the training predictions and results for those
predtrain = model.predict(xtrain)[:,0]
sensitivity, specificity, accuracy = binary_pred_stats(ytrain, predtrain)
print("train set:", sensitivity, specificity, accuracy)

# Get the test predictions and the results for those
predtest = model.predict(xtest)[:,0]
sensitivity, specificity, accuracy = binary_pred_stats(ytest, predtest)
print("test set: ", sensitivity, specificity, accuracy)


### CellName: Ex2 (#6)
### CellType: Exercise
### Cell instruction: Instructions for question 3-4

You are now going to take a look into the CNN model. There are many attempts to visualize how the CNN model is making classifications. We will here just look at the different filter outputs given an input image. So the code in the cell below will do the following:
* Select an image
* Make a forward pass through the CNN remembering all intermediate values.
* Plot all of the filters for each of the layers.
* One can select to plot before or after the MaxPooling.

This cell relies on the fact that you have run the cell above (Ex1) so that test data has been defined and you have a trained model.

#### Question 3
Train a CNN for the *img1* dataset! As a suggestion use the following CNN

*6x(3x3 kernel)-maxpool-4x(3x3 kernel)-maxpool-4x(3x3 kernel)-maxpool-(Flatten)-Dense(5)-Dense(1)*

Make sure that your trained model gives good test results (i.e. > 95% accuracy). Having such a model, you can run the cell below. There are two parameters in the cell, *post_pool* and *idx*. Setting *post_pool* to *True* means that the filters will be shown after MaxPooling. The image to show is selected by the *idx* variable. As an example, the following values represent:
* idx=8 small rectangle
* idx=11 small circle
* idx=15 large circle
* idx=21 large rectangle

**Can you find some property in the filters that makes sense when it comes to separating circles from rectangles?**

Hint! If you repeat the training you may get new filters!

#### Question 4
Try to train a really small model with only 1-2 filters in each of the convolutional layers. **Again, can one understand what the filters are looking for?**

#### Bonus Question
You do not have to answer this one! If you go back to cell *Ex1* you can load parts of the MNIST database instead. It contains the digits "2" and "3". Rerun the model and again look at the filters! Note that you should only use two convolutional layers for MNIST.


In [None]:
# if True then Maxpooling will be applied before showing the filter
post_pool = False

# The image index to show
idx = 21

kind = MaxPooling2D if post_pool else Conv2D
outs = [model.layers[0].input] + [l.output for l in model.layers if isinstance(l, kind)]
intermediate = K.function([model.layers[0].input, K.learning_phase()], outs)
print(ytest[idx])
states = intermediate([xtest[idx:idx+1], 0])
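for k,s in enumerate(states):
    # One figure per layer output (the input image first, then each selected layer);
    # the extra figure that used to be created before the loop stayed empty
    plt.figure(figsize=(18,12))
    plt.subplot(len(outs),1,k+1)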
    pics = s[0]
    pics = np.rollaxis(pics,2,0)
    rows = 2 if pics.shape[0] > 8 else 1
    cols = pics.shape[0]//rows
    imgshape = pics.shape[1:]
    pics = pics.reshape((rows,cols)+imgshape)
    pics = pics.swapaxes(1,2)
    pics = pics.reshape((pics.shape[0]*pics.shape[1], pics.shape[2]*pics.shape[3]))
    extent = (0,cols*imgshape[0], 0,rows*imgshape[1])
    plt.imshow(pics,cmap='gray',extent=extent)
    for r in range(1,rows):
        plt.plot([0,cols*imgshape[0]], [r*imgshape[1], r*imgshape[1]], color='r', linestyle='-', linewidth=1)
    for c in range(1,cols):
        plt.plot([c*imgshape[0], c*imgshape[0]], [0,rows*imgshape[1]], color='r', linestyle='-', linewidth=1)



### CellName: Ex3-1 (#7)
### CellType: Exercise
### Cell instruction: Instructions for question 5-7

## RNN as a pulse converter
We will now look at recurrent networks! This exercise is divided into three cells below. We start by loading and visualizing the data. **Note!** The actual questions for this part can be found in cell *Ex3-3* below.

### Loading and visualizing the data
The cell below loads the training data and the test data from existing binary Python files and plots one set of training/test data, both the input sequence and the target sequence. Run the cell by entering it and pressing "CTRL Enter".

How is the data generated? The input sequence consists of square pulses with varying length and height. The waiting time between the pulses also varies within some predefined ranges. The lower limit is 2 times the length of the previous pulse. The target triangle pulse sequence is built from the input sequence as follows:
* the triangle pulse starts when the input square pulse has ended.
* the width of the triangle (at the base) is twice the width of the square pulse.
* the height of the triangle is the same as the height of the square pulse.

The task is now to learn this mapping using a recurrent neural network. There are 100 input/target sequences in the training data and 100 in the test data.
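
The three rules above can be sketched for a single pulse as follows. This is only an illustration of the mapping, not the generator used to produce the lab files: the function name `make_pair`, the sequence length and the exact discretisation of the triangle are assumptions.

```python
import numpy as np

def make_pair(tlen, start, width, height):
    """Return one (input, target) pair containing a single pulse (illustrative only)."""
    x = np.zeros(tlen)
    y = np.zeros(tlen)
    x[start:start + width] = height        # square pulse of the given width and height
    t0 = start + width                     # the triangle starts where the pulse ends
    k = np.arange(2 * width + 1)           # sample points across a base of 2*width
    # Up-ramp to `height` at the middle of the base, then down-ramp back to zero
    y[t0:t0 + 2 * width + 1] = height * (1 - np.abs(k - width) / width)
    return x, y

x_demo, y_demo = make_pair(tlen=40, start=5, width=4, height=2.0)
print(x_demo[3:11])
print(y_demo[8:19])
```

Note that the network has to remember both the width and the height of the pulse after the pulse itself has ended, which is why a recurrent model is needed.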


In [None]:
# Load data from files
x,y = np.load("rnn_traindata.npy")
xtest,ytest = np.load("rnn_testdata.npy")

# If this is set to True, then we have the reverse problem. Input triangle pulse, target square pulse.
if False:
    y,x = x[:,::-1],y[:,::-1]
    ytest,xtest = xtest[:,::-1], ytest[:,::-1]

ns,tlen = x.shape
t = np.arange(tlen)

# The training/test case to look at
i = 3

# Plot
plt.figure(figsize=(15,10))
plt.subplot(4,1,1)
plt.plot(t,x[i,:])
plt.legend(['Training, input sequence'], loc=0)

plt.subplot(4,1,2)
plt.plot(t,y[i,:])
plt.legend(['Training, target sequence'], loc=0)

plt.subplot(4,1,3)
plt.plot(t,xtest[i,:])
plt.legend(['Test, input sequence'], loc=0)

plt.subplot(4,1,4)
plt.plot(t,ytest[i,:])
plt.legend(['Test, target sequence'], loc=0)


### CellName: Ex3-2 (#8)
### CellType: Exercise
### Cell instruction: Instructions for question 5-7

## RNN as a pulse converter
### Define the model and train
Here we are going to set up the model and train it. There are three different models to choose from:
* SimpleRNN: Simple feedback weights where the output from a node is fed back to itself. For several hidden nodes there are feedback weights to all other nodes in the layer.
* LSTM: The LSTM unit
* GRU: The GRU unit

The standard choice of activation function is *tanh*, but you can also test *relu*. When it comes to training this model we are going to use a truncated BPTT approach. The support in Keras for doing this is somewhat limited, so here it is implemented manually. In short, we have 100 training sequences and we define a mini-batch size *mb* that selects *mb* of these sequences to train on, using the normal stochastic gradient descent idea. Then we have a variable *batchlen* that is the length of the sequence segment to use in truncated BPTT. The default values for these are *mb=10* and *batchlen=25*.
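
The way the data is cut up by *mb* and *batchlen* can be sketched without any network at all. The sequence length `tlen = 200` below is an assumption made for the illustration (the real value comes from the loaded data), and the array is filled with zeros since only the shapes matter here.

```python
import numpy as np

ns, tlen = 100, 200          # 100 sequences; tlen is an ASSUMED length
mb, batchlen = 10, 25        # default mini-batch size and segment length
x_dummy = np.zeros((ns, tlen))

nmb = ns // mb               # number of mini-batches per epoch
ntsteps = tlen // batchlen   # number of segments per sequence

shapes = []
for batch in range(nmb):
    i1, i2 = batch * mb, (batch + 1) * mb
    # In the lab code the hidden state is reset here, once per mini-batch
    for tstep in range(ntsteps):
        t1, t2 = tstep * batchlen, (tstep + 1) * batchlen
        # One training step sees mb sequences, batchlen time steps, 1 feature
        segment = x_dummy[i1:i2, t1:t2, None]
        shapes.append(segment.shape)

print(len(shapes), shapes[0])
```

Gradients are only propagated within one segment, while the hidden state is carried over from one segment to the next within the same mini-batch.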

During training we print the normalized training and test error. Normalized here means that the loss (the MSE) is divided by the variance of the target signal. A normalized error of 1 is therefore poor (it is what you get by always predicting the mean of the target), whereas a value below 0.1 (or so) means that the error is much smaller than the signal itself.
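
As a small illustration of this normalization, the sketch below computes the normalized error for two made-up predictions of a random target signal: one that always outputs the target mean and one that is the target plus a little noise. The function name `normalized_mse` and the synthetic signals are assumptions for the example only.

```python
import numpy as np

def normalized_mse(target, pred):
    # MSE divided by the variance of the target signal
    return np.mean((target - pred) ** 2) / np.var(target)

rng = np.random.default_rng(0)
target = rng.normal(size=1000)                       # synthetic target signal

mean_pred = np.full_like(target, target.mean())      # always predict the mean
noisy_pred = target + 0.1 * rng.normal(size=1000)    # close to the target

baseline = normalized_mse(target, mean_pred)
good = normalized_mse(target, noisy_pred)
print(baseline, good)
```

Predicting the mean gives a normalized error of 1 by construction, so values well below 1 indicate that the model has captured real structure in the signal.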

What you need to do in this cell is to define your model and train it.

In [None]:
%%time
ns,tlen = x.shape

# Parameters defining the mini-batch size and 
# the sequence length for truncated BPTT
mb = 10
nmb = ns//mb
batchlen = 25
ntsteps = tlen//batchlen

# The network type
net = SimpleRNN
#net = GRU
#net = LSTM

# Number of hidden nodes
nh = 5

# The activation function
activation = 'tanh'
#activation = 'relu'

# The number of epochs
nE = 20

#Start defining the model
model = Sequential()
model.add(net(nh, 
              batch_input_shape=(mb,batchlen,1), 
              stateful=True, 
              return_sequences=True, 
              activation=activation))
model.add(TimeDistributed(Dense(1)))

adam = Adam(lr=0.005)
model.compile(optimizer=adam,loss='mean_squared_error')
model.summary()
#print(model.get_config())

# Now the training part
trnTrgVar = np.var(y[:,:])        # Variance for train target signal
testTrgVar = np.var(ytest[:,:])   # Variance for test target signal
ndone = 0

print('Epoch', 'Time/Epoch', ' Train-Err', '  Test-Err')
for ne in range(nE):
    t0 = time.time()
    sumloss = 0
    for batch in range(nmb):
        i1,i2 = batch*mb,(batch+1)*mb
        model.reset_states()
        for tstep in range(ntsteps):
            t1,t2 = tstep*batchlen,(tstep+1)*batchlen
            loss = model.train_on_batch(x[i1:i2,t1:t2,None], y[i1:i2,t1:t2,None])
            sumloss += loss
    meanloss = sumloss/(nmb*ntsteps)

    # Test error
    sumlossvalid = 0
    for batch in range(nmb):
        i1,i2 = batch*mb,(batch+1)*mb
        model.reset_states()
        for tstep in range(ntsteps):
            t1,t2 = tstep*batchlen,(tstep+1)*batchlen
            loss = model.evaluate(xtest[i1:i2,t1:t2,None], ytest[i1:i2,t1:t2,None],batch_size=mb,verbose=0)
            sumlossvalid += loss
    meanlossvalid = sumlossvalid/(nmb*ntsteps)
    t1 = time.time()
    ndone += 1
    print(ndone, "    {:.2f}        {:.5f}     {:.5f}".format(t1-t0, meanloss/trnTrgVar, meanlossvalid/testTrgVar))


### CellName: Ex3-3 (#9)
### CellType: Exercise
### Cell instruction: Instructions for question 5-7

## RNN as a pulse converter
### Plot the result
In this cell we just plot the result for one of the first 10 test sequences. You can select which one by an index (see the code). Also, the last graph shows the hidden node activation for all of the hidden nodes. **Note:** For the GRU and simpleRNN models this is all of the hidden activity there is, but for the LSTM there is also the memory signal. This one is not shown!

### Questions
We are now finally at the point of asking questions. Whenever you define a new model and train it, you need to run the  cell below in order to show the result for the newly trained model. 

**Hint!** For all of the questions below you are going to train different models. Keep an eye on how the training error is developing. If you see large fluctuations, you may need to change the learning rate. The default value of 0.005 (as set in cell *Ex3-2*) should be OK for most trainings.

#### Question 5
(Just to get started!) Define a simpleRNN model with 5 hidden nodes and train it for about 20 epochs. **What test error do you obtain?** 

Hint: The test error can be found during "training" as the error for the last epoch.

Hint: You may have to train a couple of times to make sure that you did not end up in a "bad" local minimum the first time.

#### Question 6
Test different models! Train three different models with the same number of hidden nodes (e.g. 4) and decide which of them works best. **So, out of the three different models, *simpleRNN, GRU and LSTM*, which one worked best using the same number of hidden nodes?**

Comment: Of course the different models use different numbers of weights, so one can argue that it is not a fair comparison!
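
To get a feeling for how unequal the comparison is, the number of weights in the recurrent layer can be estimated by hand. The sketch below uses the textbook formulations with one bias vector per gate: a simple RNN has one set of input, recurrent and bias weights, a GRU has three such sets and an LSTM has four. This is an assumption about the layer internals; some library versions use an extra bias in the GRU, so the output of `model.summary()` remains the authority.

```python
def simple_rnn_params(n_in, n_h):
    # Input weights + recurrent weights + bias for each hidden node
    return n_h * (n_in + n_h + 1)

def gru_params(n_in, n_h):
    # Update gate, reset gate and candidate state: three sets of weights
    return 3 * simple_rnn_params(n_in, n_h)

def lstm_params(n_in, n_h):
    # Input, forget and output gates plus the cell candidate: four sets
    return 4 * simple_rnn_params(n_in, n_h)

n_in, n_h = 1, 4   # one input per time step, 4 hidden nodes as suggested above
counts = {
    "SimpleRNN": simple_rnn_params(n_in, n_h),
    "GRU": gru_params(n_in, n_h),
    "LSTM": lstm_params(n_in, n_h),
}
print(counts)
```

In all three cases the output layer adds the same `n_h + 1` weights on top of the recurrent layer.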

#### Question 7
Interpretation! You are now going to interpret the hidden node outputs. Remember that the actual output at each time step is just a linear combination of the hidden node outputs. As said before, you can see the hidden node outputs in the last plot. They are shown sign-adjusted, so that the final output is a linear combination of the plotted hidden node outputs with **positive** weights. Train a *GRU* model with 3 hidden nodes for about 20 epochs. **Try to explain what the different hidden nodes are actually detecting**.

Comment: This is of course a question with no definite true answer. We just want you to interpret what the different nodes are doing.

#### Bonus question 1
You do not need to answer this one. Since the sequences do not contain so much noise, you should be able to train a model with very small test error. **Try training a large model to see that you can get a very small error**.

#### Bonus question 2
You do not need to answer this one. If you look at the top of cell *Ex3-1* you can, by changing False -> True, define the reverse problem. That is, input is the triangle pulse and target is the square pulse. This should be a more difficult problem! **Why?** **Experiment with your RNN model to "solve" this problem.**


In [None]:
xshow = xtest[:mb]
yshow = ytest[:mb]
yout = np.zeros((mb,tlen))
hidden = np.zeros((mb,tlen,nh))

rnn = model.layers[0]
dense = model.layers[1]
sign = K.sign(dense.layer.kernel)[None,None,:,0]
intermediate = K.function([rnn.input, K.learning_phase()], [sign*rnn.output])

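# Start from a clean hidden state; the model is stateful and would otherwise
# continue from the last batch evaluated during training
model.reset_states()
for tstep in range(ntsteps):
    t1,t2 = tstep*batchlen,(tstep+1)*batchlen
    inp = xshow[:,t1:t2,None]
    hi, = intermediate([inp,1])
    hidden[:,t1:t2,:] = hi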
    yi = model.predict(xshow[:,t1:t2,None])
    yout[:,t1:t2] = yi[:,:,0]

t = np.arange(tlen)

# Selection of test sequence. i=0 is rather easy, i=1 is a difficult one
i = 1

plt.figure(figsize=(15,10))

plt.subplot(3,1,1)
plt.plot(t,xshow[i],'-',marker='.')
plt.legend(['Test, input sequence'], loc=0)

plt.subplot(3,1,2)
plt.plot(t,yshow[i],'-',marker='.')
plt.plot(t,yout[i],'-',marker='.')
plt.legend(['Test, target sequence', 'Test, predicted sequence'], loc=0)

plt.subplot(3,1,3)
plt.plot(t,hidden[i],'-',marker='.')
plt.title('Hidden node outputs')


### CellName: Ex4-1 (#10)
### CellType: Exercise
### Cell instruction: Instructions for question 8-9

## RNN: Sampling from a character model
As a final exercise we are going to look into an example where a recurrent network is used to predict a sequence of characters. The model is autoregressive, meaning that the previous values $\ldots, x_{i-2}, x_{i-1}$ are used to predict $x_i$. In the model, characters are represented as vectors with the same number of elements as the number of unique characters in the sequence. The input in each sequence step is a single character, represented using "one-hot" coding, that is, exactly one vector element is one, representing that particular character. The output, on the other hand, is a probability vector over all characters, which can be used to sample a character from the predictive distribution. To sample long sequences of characters, we feed each sampled character back into the network to predict the character after that, and so on.
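
The one-hot coding can be sketched with a tiny made-up vocabulary. The four-character `vocab` below is hypothetical (the real model uses the full character set of the source code); the array shape (1, 1, number of characters) corresponds to one sequence, one time step and one element per character.

```python
import numpy as np

vocab = ['a', 'b', 'c', 'd']                       # HYPOTHETICAL vocabulary
index_of = {c: i for i, c in enumerate(vocab)}     # character -> position

def one_hot(c):
    # Shape (1, 1, len(vocab)): one sequence, one time step, one-hot vector
    v = np.zeros((1, 1, len(vocab)))
    v[0, 0, index_of[c]] = 1.0
    return v

x_c = one_hot('c')
print(x_c)
```

Decoding goes the other way: the position of the single one (or of a sampled index in the output distribution) is looked up in `vocab` to obtain the character.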

The sequence of characters that we are going to train on is the source code of Tensorflow (C++). The size of the downloaded source code is approximately 14 MB, which means about 14 million characters in the sequence. The number of unique characters is 103. Therefore the output layer consists of 103 nodes with a softmax activation function. The RNN model itself is rather large: it consists of two layers with 1024 LSTM nodes in each layer. In addition to that, there are skip-layer connections from the input to the second hidden layer and skip connections from the first hidden layer to the output layer. In total there are about 13.5 million weights in this model.

This model takes too long to train as part of this lab. It takes days rather than minutes! Therefore pre-trained weights are handed out as part of the lab material. This is the file that you needed to download from an external URL!

Given such a model we can now "sample" from it. Given some initial sequence of characters, as a seed, we can run the model for a number of sequence steps in order to find the sequence of predicted characters. But we can do more! Since the output is a vector of probabilities for new characters we can sample from it. As an example assume we only have 5 characters in our vocabulary and the prediction for a new character is:

(0.1, 0.3, 0.5, 0.0, 0.1)

A deterministic prediction would select the character represented by the third position, since it has the largest probability. But if we treat these numbers as probabilities, we can say that half of the time we are going to select the third character, 30% of the time we select the character represented by the second position, and so on. To make things even more random we can modify these probabilities such that they become more equal (high temperature), or such that the largest probability becomes even larger (low temperature). This temperature parameter *temp* can be changed so that the sampling becomes completely random (very high *temp*) or completely deterministic (very low *temp*).
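
The effect of the temperature on the example vector above can be sketched as follows. In this lab the temperature is applied by scaling the values that enter the softmax by 1/*temp*; for a given probability vector this amounts to raising each probability to the power 1/*temp* and renormalizing, which is what the sketch does. The function name `apply_temperature` is an assumption for the illustration.

```python
import numpy as np

def apply_temperature(p, temp):
    """Sharpen (temp < 1) or flatten (temp > 1) a probability vector."""
    p = np.asarray(p, dtype=np.float64)
    # Dividing by the maximum first avoids underflow for small temperatures
    q = (p / p.max()) ** (1.0 / temp)
    return q / q.sum()

p = [0.1, 0.3, 0.5, 0.0, 0.1]
cold = apply_temperature(p, 0.1)    # almost deterministic
same = apply_temperature(p, 1.0)    # unchanged
hot = apply_temperature(p, 10.0)    # close to uniform over the possible characters

# Draw one character index from the flattened distribution
rng = np.random.default_rng(0)
sample = rng.choice(len(p), p=hot)
print(np.round(cold, 3), np.round(hot, 3), sample)
```

A character with probability exactly zero stays impossible at any temperature; in the real model the softmax never gives an exact zero, so very high temperatures make every character possible.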

The code cell below just defines the model and loads the pre-trained weights onto the model. Run the cell! It can take some seconds to do that!


In [None]:
# set of all chars appearing in tensorflow source code:
chars = ['\t', '\n'] + [chr(x) for x in range(32,127)] + ['°', 'θ', 'μ', 'ν', '’', '\ufeff']
char_indices = dict((c, i) for i, c in enumerate(chars))

# build the model: two layers of LSTM
nh = 1024
arch = LSTM
inp = Input(batch_shape=(1, 1, len(chars)))
h1layer = arch(nh, return_sequences=True, stateful=True)
h2layer = arch(nh, stateful=True)
is_skip = True
h1 = h1layer(inp)
i2 = concatenate([inp, h1])
h2 = h2layer(i2)
h1last = Lambda(lambda h: h[:,-1,:])(h1)
rnnout = concatenate([h1last, h2])
beta_var = K.variable(1.0)
d = Dense(len(chars))(rnnout)
d = Lambda(lambda d: d * beta_var)(d)
out = Activation('softmax')(d)
m = Model([inp], [out])
m.compile(loss='categorical_crossentropy', optimizer='adam')
m.summary()

# Load the weights
m.load_weights("charmodel.h5")


### CellName: Ex4-2 (#11)
### CellType: Exercise
### Cell instruction: Instructions for question 8-9

## RNN: Sampling from a character model
In the cell below a function is defined that performs the actual sampling. It takes three parameters as input: (i) the starting sequence *seed*, (ii) the length of the sequence to generate *seqLen* and (iii) the temperature used during sampling *temp*.

#### Question 8
Run the cell below with the given parameters. **Does it look like C++ code?**

#### Question 9
Change the seed to something else! **What happens when you decrease/increase the temperature (e.g. 0.7-1.5)?**


In [None]:
def sampletxt(seed=" ", n=50, t=1.0):
    m.reset_states()
    K.set_value(beta_var, 1/t)

    for c in seed:
        x_pred = np.zeros((1, 1, len(chars)))
        x_pred[0, 0, char_indices[c]] = 1.
        p = m.predict(x_pred, verbose=0)
        
    txt = []
    for i in range(n):
        preds = p[0].astype('float64')
        preds = preds/np.sum(preds) # renormalize to guard against numerical issues
        probas = np.random.multinomial(1, preds, 1)
        next_index = np.argmax(probas)
        txt.append(chars[next_index])
        p = m.predict(probas[None,:,:], verbose=0)

    return ''.join(txt)

#seed = " "
#seed = "void "
seed = "for (int"
seqLen = 500
temp = 1.0

print(seed + sampletxt(seed,seqLen,temp))


# The report!


### Name

### Introduction

### Answers to questions

### Summary
  