# Task 1: XOR

In [1]:
# Import modules
from __future__ import print_function
import tensorflow as tf
import numpy as np
from numpy.random import shuffle
import time
import matplotlib.pyplot as plt

# Plot configurations
%matplotlib inline

# Notebook auto reloads code. (Ref: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython)
%load_ext autoreload
%autoreload 2

## Task 1, Part 1: Backpropagation through time (BPTT)

**Question:** Consider the simple RNN shown in the following figure, where _wx, wh, b1, w, b2_ are the scalar parameters of the network. The loss function is the **mean squared error (MSE)**. Given input _(x1, x2) = (1, 0)_, ground truth _(g1, g2) = (1, 1)_, _h0 = 0_, and _(wx, wh, b1, w, b2) = (1, 1, 1, 1, 1)_, compute _(dwx, dwh, db1, dw, db2)_, the gradients of the loss with respect to the five parameters _(wx, wh, b1, w, b2)_.

![bptt](./img/bptt2.jpg)

<span style="color:red">TODO:</span>

Answer the above question. 

* **[fill in here: Enter your derivations and the computational process]** ![task1](./img/task1.png)
* You can use LaTeX to write the equations; Jupyter notebooks recognize basic LaTeX syntax. Alternatively, you can write the equations in another environment and paste a screenshot of them here.
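
One way to sanity-check a hand derivation is a finite-difference gradient check. The sketch below assumes a specific form for the network, $h_t = \tanh(w_x x_t + w_h h_{t-1} + b_1)$ and $y_t = w h_t + b_2$, with the squared error averaged over the two time steps. The figure defines the actual activation and loss scaling, so adjust `forward` (a hypothetical helper, not part of the assignment code) to match it before comparing numbers.

```python
import numpy as np

def forward(params, xs, gs, h0=0.0, act=np.tanh):
    """MSE loss of the ASSUMED scalar RNN: h_t = act(wx*x_t + wh*h_{t-1} + b1), y_t = w*h_t + b2."""
    wx, wh, b1, w, b2 = params
    h, loss = h0, 0.0
    for x, g in zip(xs, gs):
        h = act(wx * x + wh * h + b1)   # hidden state update
        y = w * h + b2                  # linear readout (assumption)
        loss += (y - g) ** 2
    return loss / len(xs)               # mean over time steps (assumption)

def numerical_grad(params, xs, gs, eps=1e-6):
    """Central finite differences of the loss with respect to each parameter."""
    params = np.asarray(params, dtype=np.float64)
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (forward(params + step, xs, gs) - forward(params - step, xs, gs)) / (2 * eps)
    return grad

# Order of entries: (dwx, dwh, db1, dw, db2)
grads = numerical_grad([1, 1, 1, 1, 1], xs=[1, 0], gs=[1, 1])
print(grads)
```

If the analytic gradients from the derivation agree with `grads` to several decimal places, the chain-rule bookkeeping through both time steps is very likely correct for the assumed model.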

## Task 1, Part 2: Use TensorFlow modules to create an XOR network

In this part, you build and train a network that learns the XOR function. It is a very simple RNN and will give you an idea of how an RNN is built and trained.

### XOR network

An XOR network learns the XOR ($\oplus$) function applied cumulatively along a sequence.

As shown in the figure below, if the input is $(x_0, x_1, x_2) = (1, 0, 0)$, then the output is $(y_1, y_2, y_3) = (1, 1, 1)$. That is, $y_n = x_0 \oplus x_1 \oplus \dots \oplus x_{n-1}$.

![xor_net](./img/xor.png)
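
The target sequence defined above is a running XOR over the inputs seen so far. A minimal NumPy sketch of that definition follows; `running_xor` is an illustrative name, not a function from the course code.

```python
import numpy as np

def running_xor(x):
    """Return y where y[n-1] = x[0] ^ x[1] ^ ... ^ x[n-1] for n = 1..len(x)."""
    x = np.asarray(x, dtype=np.int64)
    # accumulate applies XOR cumulatively along the last axis
    return np.bitwise_xor.accumulate(x, axis=-1)

print(running_xor([1, 0, 0]))      # [1 1 1], the example from the text
print(running_xor([1, 1, 0, 1]))   # [1 0 0 1]
```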

### Create data set
This function generates the data required for training. Use it when building your training function for the GRU network. Please read the source code for more information.

In [2]:
from ecbm4040.xor.utils import create_dataset
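
The real implementation lives in `ecbm4040/xor/utils.py` and is not reproduced here. As a rough picture of what such a generator might do, the sketch below is a hypothetical stand-in (`make_xor_dataset` and its `seed` argument are assumptions) that returns arrays matching the placeholder shapes used later in this notebook: inputs of shape `(num_samples, seq_len, 1)` and labels of shape `(num_samples, seq_len)`.

```python
import numpy as np

def make_xor_dataset(num_samples, seq_len=8, seed=0):
    """Hypothetical stand-in for create_dataset: random bits and their running XOR."""
    rng = np.random.RandomState(seed)
    bits = rng.randint(0, 2, size=(num_samples, seq_len))   # random 0/1 inputs
    labels = np.bitwise_xor.accumulate(bits, axis=1)        # running XOR targets
    X = bits[:, :, None].astype(np.float32)                 # add input_dimension = 1
    y = labels.astype(np.int64)
    return X, y

X_demo, y_demo = make_xor_dataset(num_samples=4, seq_len=8)
print(X_demo.shape, y_demo.shape)
```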

### Build a network using a TensorFlow GRUCell
This section shows an example of how to build an RNN using a GRU cell. `GRUCell` is a built-in TensorFlow class that implements the behavior of a GRU unit.

Reference: 
1. [TensorFlow GRU cell](https://www.tensorflow.org/versions/r1.8/api_docs/python/tf/contrib/rnn/GRUCell)
2. [Understanding GRU networks](https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be)
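
For orientation, one commonly used form of the GRU update is written out below. The symbols ($W_r$, $W_u$, $W_c$ and the biases) are notation chosen here, and sources differ on whether $u_t$ or $1 - u_t$ multiplies the previous state, so treat this as a sketch and check it against the references above.

$$
\begin{aligned}
r_t &= \sigma\left(W_r [x_t, h_{t-1}] + b_r\right) \\
u_t &= \sigma\left(W_u [x_t, h_{t-1}] + b_u\right) \\
c_t &= \tanh\left(W_c [x_t, r_t \odot h_{t-1}] + b_c\right) \\
h_t &= u_t \odot h_{t-1} + (1 - u_t) \odot c_t
\end{aligned}
$$

Here $r_t$ is the reset gate, $u_t$ the update gate, $c_t$ the candidate state, $[\cdot, \cdot]$ denotes concatenation, and $\odot$ is the elementwise product.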

In [3]:
from tensorflow.contrib.rnn import GRUCell

In [4]:
tf.reset_default_graph()

# Input shape: (num_samples, seq_length, input_dimension)
# Output shape: (num_samples, seq_length); each ground-truth label is 0/1.
input_data = tf.placeholder(tf.float32, shape=[None,None,1])
output_data = tf.placeholder(tf.int64, shape=[None,None])

# define GRU cell
num_units = 64
cell = GRUCell(num_units)

# create GRU network: you can also choose other modules provided by tensorflow, like static_rnn etc.
hidden, _ = tf.nn.dynamic_rnn(cell, input_data, dtype=tf.float32)

# generate output from the hidden information
output_shape = 2
out = tf.layers.dense(hidden, output_shape)
pred = tf.argmax(out, axis=2)

# loss function
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=output_data,logits=out))

# optimization
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)

# accuracy
correct_num = tf.equal(output_data,pred)
accuracy = tf.reduce_mean(tf.cast(correct_num,tf.float32))
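
To make the loss and accuracy above concrete, here is a NumPy sketch of the same two quantities for logits of shape `(num_samples, seq_length, 2)` and integer labels of shape `(num_samples, seq_length)`. The helper names are illustrative and this is not TensorFlow's implementation, only the arithmetic it is expected to perform.

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    """Mean cross-entropy for logits (N, T, C) and integer labels (N, T)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)            # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked.mean()

def accuracy(logits, labels):
    """Fraction of time steps where the arg-max class equals the label."""
    return (logits.argmax(axis=-1) == labels).mean()

# Equal logits for both classes: the model is guessing
logits = np.zeros((3, 8, 2))
labels = np.zeros((3, 8), dtype=np.int64)
chance_loss = sparse_softmax_xent(logits, labels)
print(chance_loss, np.log(2))
```

A loss near $\ln 2 \approx 0.693$ therefore corresponds to a two-class model that is no better than guessing.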

### Training 

<span style='color:red'>TODO:</span> 
1. Build your training function for the RNN;
2. Plot the cost during training.
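
One possible way to organize the training function is to separate the loop and loss bookkeeping from the TensorFlow call. The sketch below is framework-agnostic: `run_training` and `step_fn` are hypothetical names, and the optional mini-batching is an assumption rather than a requirement of the task.

```python
import numpy as np

def run_training(step_fn, X, y, epochs=20, batch_size=None, seed=0):
    """Call step_fn(X_batch, y_batch) -> loss and return the mean loss per epoch."""
    rng = np.random.RandomState(seed)
    n = len(X)
    batch_size = n if batch_size is None else batch_size   # default: full batch
    history = []
    for _ in range(epochs):
        order = rng.permutation(n)                          # reshuffle every epoch
        batch_losses = []
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch_losses.append(float(step_fn(X[idx], y[idx])))
        history.append(float(np.mean(batch_losses)))
    return history
```

With the graph defined in this notebook, `step_fn` could be a small function that evaluates `sess.run([optimizer, loss], feed_dict={input_data: X_batch, output_data: y_batch})` and returns the loss; the returned list can then be passed to `plt.plot`.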

In [5]:
# YOUR TRAINING AND PLOTTING CODE HERE
# Generate data with create_dataset(num_samples, seq_len=8)
X_train, y_train = create_dataset(num_samples=3000, seq_len=8)

epoch = 20
train_loss = []

with tf.Session() as sess:
    # Initialize variables
    sess.run(tf.global_variables_initializer())
    for epc in range(epoch):
        print("epoch {} ".format(epc + 1))
        # Each epoch is a single full-batch gradient step
        _, cur_loss = sess.run([optimizer, loss], feed_dict={input_data: X_train, output_data: y_train})
        train_loss.append(cur_loss)
        print('loss: {}'.format(cur_loss))

# Plot the training loss per epoch
plt.plot(train_loss)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()

epoch 1 
loss: 0.694529116153717
epoch 2 
loss: 1.2783657312393188
epoch 3 
loss: 0.782167911529541
epoch 4 
loss: 0.92734694480896
epoch 5 
loss: 0.7945668697357178
epoch 6 
loss: 0.7216135859489441
epoch 7 
loss: 0.7232560515403748
epoch 8 
loss: 0.7185394167900085
epoch 9 
loss: 0.7014267444610596
epoch 10 
loss: 0.7073556184768677
epoch 11 
loss: 0.6949331760406494
epoch 12 
loss: 0.7051928639411926
epoch 13 
loss: 0.6976502537727356
epoch 14 
loss: 0.6919026970863342
epoch 15 
loss: 0.6951150894165039
epoch 16 
loss: 0.6866655945777893
epoch 17 
loss: 0.6821568012237549
epoch 18 
loss: 0.6822518706321716
epoch 19 
loss: 0.6741083860397339
epoch 20 
loss: 0.6681184768676758


## Task 1, Part 3: Build your own GRUCell
In this part, you build your own GRU cell that implements the GRU functionality.

<span style="color:red">TODO:</span> 
1. Finish class **MyGRUCell** in ecbm4040/xor/rnn.py;
2. Write the training function for your RNN;
3. Plot the cost during training.
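
The interface that `MyGRUCell` must expose in `ecbm4040/xor/rnn.py` is not shown here, so the sketch below covers only the arithmetic of a single GRU time step in NumPy. It assumes a gate layout with one joint kernel for the reset and update gates and a separate kernel for the candidate state; all function and parameter names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W_gates, b_gates, W_cand, b_cand):
    """One GRU step (assumed layout).

    x: (batch, input_dim), h: (batch, num_units)
    W_gates: (input_dim + num_units, 2 * num_units), b_gates: (2 * num_units,)
    W_cand:  (input_dim + num_units, num_units),     b_cand:  (num_units,)
    """
    gates = sigmoid(np.dot(np.concatenate([x, h], axis=1), W_gates) + b_gates)
    r, u = np.split(gates, 2, axis=1)                 # reset gate, update gate
    cand = np.tanh(np.dot(np.concatenate([x, r * h], axis=1), W_cand) + b_cand)
    return u * h + (1.0 - u) * cand                   # blend old state and candidate

# Small demo with random weights (values chosen only for illustration)
rng = np.random.RandomState(0)
batch, input_dim, num_units = 4, 1, 64
x = rng.rand(batch, input_dim)
h = np.zeros((batch, num_units))
W_gates = rng.randn(input_dim + num_units, 2 * num_units) * 0.1
b_gates = np.ones(2 * num_units)
W_cand = rng.randn(input_dim + num_units, num_units) * 0.1
b_cand = np.zeros(num_units)
h_next = gru_step(x, h, W_gates, b_gates, W_cand, b_cand)
print(h_next.shape)
```

Comparing the output of a hand-written cell against a reference step like this on fixed weights is a cheap way to catch gate-ordering or concatenation mistakes before training.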

In [None]:
from ecbm4040.xor.rnn import MyGRUCell

In [None]:
# recreate xor network with your own GRU cell
tf.reset_default_graph()

# Input shape: (num_samples, seq_length, input_dimension)
# Output shape: (num_samples, seq_length); each ground-truth label is 0/1.
input_data = tf.placeholder(tf.float32,shape=[None,None,1])
output_data = tf.placeholder(tf.int64,shape=[None,None])

# define GRU cell using your own implementation
num_units = 64
cell = MyGRUCell(num_units)

# create GRU network: you can also choose other modules provided by tensorflow, like static_rnn etc.
hidden, _ = tf.nn.dynamic_rnn(cell,input_data,dtype=tf.float32)

# generate output from the hidden information
output_shape = 2
out = tf.layers.dense(hidden, output_shape)
pred = tf.argmax(out,axis=2)

# loss function
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=output_data,logits=out))
# optimization
optimizer = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
# accuracy
correct = tf.equal(output_data,pred)
accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))

### Training

In [None]:
# YOUR TRAINING AND PLOTTING CODE HERE