# Task 1: XOR

In [3]:
# Import modules
from __future__ import print_function
import tensorflow as tf
import numpy as np
from numpy.random import shuffle
import time
import matplotlib.pyplot as plt

# Plot configurations
%matplotlib inline

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Task 1, Part 1: Backpropagation through time (BPTT)

**Question:** Consider the simple RNN shown in the following figure, where ***wi, wh, b, a, c*** are the scalar parameters of the network. The loss function is the **mean squared error (MSE)**. Given input (x0, x1) = (1, 0), ground truth (g1, g2) = (1, 1), h0 = 0, and (wi, wh, b, a, c) = (1, 1, 1, 1, 1), compute ***(dwi, dwh, db, da, dc)***, the gradients of the loss with respect to the five parameters ***(wi, wh, b, a, c)***.

![bptt](./img/bptt.png)

<span style="color:red">TODO:</span>

Answer the above question. 

* **[fill in here: Enter your derivations and the computational process]**
* You can write the equations in LaTeX; Jupyter Notebook recognizes basic LaTeX syntax. Alternatively, you can write the equations in another environment and paste a screenshot of them here.
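A hand derivation can be sanity-checked numerically with central differences. The sketch below is only a checking tool: the figure is not reproduced in this text, so the recurrence inside `mse_loss` (a linear state update and a linear readout, with no activation) is a hypothetical stand-in that must be edited to match the actual network. The names `mse_loss` and `numerical_gradient` are illustrative and not part of the assignment code.

```python
def mse_loss(params, xs, targets, h0=0.0):
    """MSE of a HYPOTHETICAL linear scalar RNN; adapt it to match the figure."""
    wi, wh, b, a, c = params
    h, total = h0, 0.0
    for x, g in zip(xs, targets):
        h = wi * x + wh * h + b   # assumed state update (no activation)
        y = a * h + c             # assumed linear readout
        total += (y - g) ** 2
    return total / len(xs)


def numerical_gradient(loss_fn, params, eps=1e-6):
    """Central-difference estimate of d(loss)/d(param) for each parameter."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss_fn(plus) - loss_fn(minus)) / (2 * eps))
    return grads


xs, targets = (1.0, 0.0), (1.0, 1.0)   # inputs and ground truth from the question
params = [1.0, 1.0, 1.0, 1.0, 1.0]     # (wi, wh, b, a, c)
grads = numerical_gradient(lambda p: mse_loss(p, xs, targets), params)
for name, g in zip(("dwi", "dwh", "db", "da", "dc"), grads):
    print("{}: {:.4f}".format(name, g))
```

If the printed values disagree with the analytic gradients, either the derivation or the assumed recurrence needs another look.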

## Task 1, Part 2: Use TensorFlow modules to create an XOR network

In this part, you need to build and train a network that learns the XOR function. It is a very simple RNN implementation and will give you an idea of how an RNN is built and trained.

### XOR network

An XOR network learns the running XOR ($\oplus$) of its input sequence.

As shown in the figure below, the output at step $n$ is $y_n = x_0 \oplus x_1 \oplus \dots \oplus x_{n-1}$. For instance, if the input is $(x_0, x_1, x_2) = (1, 0, 0)$, then the output is $(y_1, y_2, y_3) = (1, 1, 1)$.

![xor_net](./img/xor.png)
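The target sequence is simply a running XOR of the inputs, which can be reproduced in a few lines of plain Python. This is a minimal sketch for checking examples by hand; `running_xor` is an illustrative name, not a function from the course package.

```python
from itertools import accumulate
from operator import xor


def running_xor(bits):
    """Return [x0, x0^x1, x0^x1^x2, ...] for a sequence of 0/1 integers."""
    return list(accumulate(bits, xor))


print(running_xor([1, 0, 0]))  # [1, 1, 1], the example from the text
```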

### Create data set
The `create_dataset` function generates the data required for training. Use it when building your training function for the LSTM. Please read the source code for more information.

In [1]:
from ecbm4040.xor.utils import create_dataset
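The source of `create_dataset` is not shown here. As a rough illustration of the kind of data the network expects, the sketch below builds random bit sequences and their running-XOR labels with the same shapes that appear later in this notebook, `(num_samples, 8, 1)` and `(num_samples, 8)`. The function `make_xor_dataset` and its parameters are hypothetical stand-ins; the real helper may differ in sequence length, dtype, or sampling.

```python
import numpy as np


def make_xor_dataset(num_samples, seq_length=8, seed=None):
    """Hypothetical stand-in for create_dataset: random bits and running-XOR labels."""
    rng = np.random.RandomState(seed)
    bits = rng.randint(0, 2, size=(num_samples, seq_length))   # 0/1 inputs
    labels = np.bitwise_xor.accumulate(bits, axis=1)           # running XOR per row
    X = bits.astype(np.float32)[:, :, np.newaxis]              # (N, T, 1)
    y = labels.astype(np.int64)                                # (N, T)
    return X, y


X_demo, y_demo = make_xor_dataset(100, seed=0)
print(X_demo.shape, y_demo.shape)  # (100, 8, 1) (100, 8)
```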

### Build a network using a TensorFlow LSTMCell
This section shows an example of how to build an RNN using an LSTM cell. `LSTMCell` is a built-in TensorFlow class that implements the behavior of an LSTM unit.

Reference: [TensorFlow LSTM cell](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell)

In [4]:
from tensorflow.contrib.rnn import LSTMCell

tf.reset_default_graph()

# Input shape: (num_samples, seq_length, input_dimension)
# Output shape: (num_samples, seq_length); each entry is a 0/1 ground-truth label.
input_data = tf.placeholder(tf.float32,shape=[None,None,1])
output_data = tf.placeholder(tf.int64,shape=[None,None])

# define LSTM cell
lstm_units = 64
cell = LSTMCell(lstm_units,num_proj=2,state_is_tuple=True)

# create LSTM network: you can also choose other modules provided by tensorflow, like static_rnn etc.
out,_ = tf.nn.dynamic_rnn(cell,input_data,dtype=tf.float32)
pred = tf.argmax(out,axis=2)

# loss function
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=output_data,logits=out))

# optimization
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)

# accuracy
correct_num = tf.equal(output_data,pred)
accuracy = tf.reduce_mean(tf.cast(correct_num,tf.float32))
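To make the `loss` and `accuracy` tensors above less opaque, here is a NumPy sketch of the same two quantities for logits of shape `(batch, seq, 2)` and integer labels of shape `(batch, seq)`. It is an illustration of the math, not TensorFlow's implementation; the function names are made up for this example.

```python
import numpy as np


def sparse_softmax_xent(logits, labels):
    """Mean softmax cross-entropy over every (sample, time step) pair."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., np.newaxis], axis=-1)
    return float(-picked.mean())


def sequence_accuracy(logits, labels):
    """Fraction of time steps whose argmax class equals the label."""
    return float((logits.argmax(axis=-1) == labels).mean())


labels = np.array([[1, 1, 1], [0, 1, 1]])
uninformed = np.zeros((2, 3, 2))            # equal logits for both classes
print(sparse_softmax_xent(uninformed, labels))   # ln 2, about 0.693
print(sequence_accuracy(uninformed, labels))
```

With equal logits the loss is ln 2, which is roughly the value the training log starts from before the network has learned anything.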

### Training 

<span style='color:red'>TODO:</span> 
1. Build your training function for the RNN;
2. Plot the cost during training.

In [12]:
import random

In [18]:
# Create dataset
num_samples=100
fraction_validation=0.2
X_train, y_train = create_dataset(num_samples)

# Hold out the last fraction of the samples for validation

num_validation = int(fraction_validation*num_samples)

X_val = X_train[-num_validation:, :, :]
y_val = y_train[-num_validation:, :]

X_train = X_train[:-num_validation, :, :]
y_train = y_train[:-num_validation, :]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)

Train data shape:  (80, 8, 1)
Train labels shape:  (80, 8)
Validation data shape:  (20, 8, 1)
Validation labels shape:  (20, 8)


In [37]:
epoch=20
batch_size=5

iters = int(X_train.shape[0] / batch_size)
print('number of batches for training: {}'.format(iters))

iter_total = 0
best_acc = 0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epc in range(epoch):
        print("epoch {} ".format(epc + 1))

        for itr in range(iters):
            # Slice the next mini-batch of inputs and labels
            training_batch_x = X_train[itr * batch_size: (1 + itr) * batch_size]
            training_batch_y = y_train[itr * batch_size: (1 + itr) * batch_size]

            # Run one optimization step and fetch the loss on this batch
            _, cur_loss = sess.run([optimizer, loss], feed_dict={input_data: training_batch_x, output_data: training_batch_y})
            print('{}/{} loss: {}'.format(
                    batch_size * (itr + 1),
                    X_train.shape[0],
                    cur_loss))
print("Training ends.")

number of batches for training: 16
epoch 1 
5/80 loss: 0.6941953897476196
10/80 loss: 0.8495915532112122
15/80 loss: 0.6931100487709045
20/80 loss: 0.6944262385368347
25/80 loss: 0.6917951107025146
30/80 loss: 0.6796250343322754
35/80 loss: 0.63965904712677
40/80 loss: 0.7177687883377075
45/80 loss: 0.6189892292022705
50/80 loss: 0.6636059880256653
55/80 loss: 0.7714413404464722
60/80 loss: 0.7221395373344421
65/80 loss: 0.6739863157272339
70/80 loss: 0.6852643489837646
75/80 loss: 0.672961413860321
80/80 loss: 0.6915510296821594
epoch 2 
5/80 loss: 0.6815937757492065
10/80 loss: 0.6616198420524597
15/80 loss: 0.6795204281806946
20/80 loss: 0.7202610969543457
25/80 loss: 0.7321807146072388
30/80 loss: 0.6548932194709778
35/80 loss: 0.5797474980354309
40/80 loss: 0.6449407935142517
45/80 loss: 0.6097747087478638
50/80 loss: 0.576815128326416
55/80 loss: 0.7297480702400208
60/80 loss: 0.5665927529335022
65/80 loss: 0.6809547543525696
70/80 loss: 0.7391417622566223
75/80 loss: 0.681183040

40/80 loss: 5.020599201088771e-05
45/80 loss: 5.598127609118819e-05
50/80 loss: 4.209169128444046e-05
55/80 loss: 5.9529604186536744e-05
60/80 loss: 4.663585059461184e-05
65/80 loss: 5.9103582316311076e-05
70/80 loss: 4.625150540960021e-05
75/80 loss: 6.414293602574617e-05
80/80 loss: 4.502667434280738e-05
epoch 17 
5/80 loss: 6.877963460283354e-05
10/80 loss: 5.2735624194610864e-05
15/80 loss: 6.648792623309419e-05
20/80 loss: 5.071224950370379e-05
25/80 loss: 6.194672459969297e-05
30/80 loss: 4.589684976963326e-05
35/80 loss: 4.4832944695372134e-05
40/80 loss: 4.637372330762446e-05
45/80 loss: 5.1773553423117846e-05
50/80 loss: 3.9072871004464105e-05
55/80 loss: 5.471113763633184e-05
60/80 loss: 4.324163819546811e-05
65/80 loss: 5.467252412927337e-05
70/80 loss: 4.299436477595009e-05
75/80 loss: 5.921419869991951e-05
80/80 loss: 4.178145172772929e-05
epoch 18 
5/80 loss: 6.327582377707586e-05
10/80 loss: 4.8670983233023435e-05
15/80 loss: 6.12761577940546e-05
20/80 loss: 4.6930708776
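The TODO also asks for a plot of the cost, which the loop above prints but does not store. One possible approach, sketched here, is to append each `cur_loss` to a list inside the loop and then average it per epoch before plotting. The helper `epoch_means` and the list name `batch_losses` are hypothetical, and the numbers in the demo are placeholders rather than measured losses.

```python
def epoch_means(batch_losses, batches_per_epoch):
    """Average consecutive groups of per-batch losses into one value per epoch."""
    means = []
    for start in range(0, len(batch_losses), batches_per_epoch):
        chunk = batch_losses[start:start + batches_per_epoch]
        means.append(sum(chunk) / len(chunk))
    return means


# Placeholder values only: two "epochs" of two batches each.
demo_losses = [4.0, 2.0, 1.0, 1.0]
print(epoch_means(demo_losses, 2))  # [3.0, 1.0]
```

In the notebook, `batch_losses.append(cur_loss)` would go right after the `sess.run` call, and `plt.plot(epoch_means(batch_losses, iters))` would draw the curve. Because the logged loss falls from about 0.69 to the order of 1e-5, a logarithmic y-axis (`plt.yscale('log')`) is likely to be easier to read.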

## Task 1, Part 3: Build your own LSTMCell
In this part, you need to build your own LSTM cell that implements the LSTM functionality.

<span style="color:red">TODO:</span> 
1. Finish class **MyLSTMCell** in ecbm4040/xor/rnn.py;
2. Write the training function for your RNN;
3. Plot the cost during training.
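Before writing `MyLSTMCell`, it may help to see a single LSTM time step written out in NumPy. The sketch below uses the standard gate equations with an optional output projection (the role `num_proj` plays above). The weight layout, the gate ordering, and the absence of extras such as a forget-gate bias offset or peephole connections are assumptions of this illustration; they are not taken from `ecbm4040/xor/rnn.py` or from TensorFlow's own cell.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b, W_proj=None):
    """One LSTM time step (illustrative layout, not TensorFlow's).

    x: (batch, input_dim); h_prev: (batch, h_dim); c_prev: (batch, units)
    W: (input_dim + h_dim, 4 * units); b: (4 * units,)
    W_proj: optional (units, proj_dim) output projection
    """
    z = np.dot(np.concatenate([x, h_prev], axis=1), W) + b
    i, g, f, o = np.split(z, 4, axis=1)                    # assumed gate order
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)      # new cell state
    h = sigmoid(o) * np.tanh(c)                            # new hidden state
    if W_proj is not None:
        h = np.dot(h, W_proj)                              # project to proj_dim
    return h, c


rng = np.random.RandomState(0)
units, input_dim, proj_dim, batch = 4, 1, 2, 3
W = rng.randn(input_dim + proj_dim, 4 * units) * 0.1
b = np.zeros(4 * units)
W_proj = rng.randn(units, proj_dim) * 0.1
h, c = np.zeros((batch, proj_dim)), np.zeros((batch, units))
for x_t in (np.ones((batch, input_dim)), np.zeros((batch, input_dim))):
    h, c = lstm_step(x_t, h, c, W, b, W_proj)
print(h.shape, c.shape)  # (3, 2) (3, 4)
```

With a projection, the recurrent input `h_prev` has `proj_dim` columns rather than `units`, which is why `W` has `input_dim + proj_dim` rows in the demo.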

In [None]:
from ecbm4040.xor.rnn import MyLSTMCell

# Recreate the XOR network with your own LSTM cell
tf.reset_default_graph()

# Input shape: (num_samples, seq_length, input_dimension)
# Output shape: (num_samples, seq_length); each entry is a 0/1 ground-truth label.
input_data = tf.placeholder(tf.float32,shape=[None,None,1])
output_data = tf.placeholder(tf.int64,shape=[None,None])

# Define your own LSTM cell
lstm_units = 64
cell = MyLSTMCell(lstm_units,num_proj=2)

# create LSTM network: you can also choose other modules provided by tensorflow, like static_rnn etc.
out,_ = tf.nn.dynamic_rnn(cell,input_data,dtype=tf.float32)
pred = tf.argmax(out,axis=2)

# loss function
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=output_data,logits=out))
# optimization
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
# accuracy
correct = tf.equal(output_data,pred)
accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))

### Training

In [None]:
# YOUR TRAINING AND PLOTTING CODE HERE

