<a href="https://colab.research.google.com/github/2107shantanu/All-Things-ML-DL-AI/blob/main/Sentiment_Analysis_with_RNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis with an RNN

Primary source: https://github.com/udacity/deep-learning/tree/master/sentiment-rnn (modified by Abhishek)

In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. Using an RNN rather than a feedforward network can be more accurate, since we can include information about the *sequence* of words. Here we'll use a dataset of movie reviews, accompanied by labels.

The architecture for this network is shown below.

![alt text](https://github.com/udacity/deep-learning/raw/master/sentiment-rnn/assets/network_diagram.png)

Here, we'll pass in words to an embedding layer. We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before from the word2vec lesson. You can actually train an embedding with word2vec and use it here. But it's good enough to just have an embedding layer and let the network learn the embedding table on its own.

From the embedding layer, the new representations will be passed to LSTM cells. These add recurrent connections to the network so we can include information about the sequence of words in the data. Finally, the LSTM output goes to an output layer. In the original design this is a single unit with a sigmoid activation, because we're predicting whether the text has positive or negative sentiment. The code in this notebook uses a two-unit softmax layer with one-hot labels instead, which serves the same purpose for two classes.

We only care about the output at the very last time step and can ignore the rest. We'll calculate the cost from the output of the last step and the training label.

In [None]:
pip install tensorflow==1.13.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow==1.13.1
  Downloading tensorflow-1.13.1-cp37-cp37m-manylinux1_x86_64.whl (92.6 MB)
Collecting tensorboard<1.14.0,>=1.13.0
  Downloading tensorboard-1.13.1-py3-none-any.whl (3.2 MB)
Collecting tensorflow-estimator<1.14.0rc0,>=1.13.0
  Downloading tensorflow_estimator-1.13.0-py2.py3-none-any.whl (367 kB)
Collecting keras-applications>=1.0.6
  Downloading Keras_Applications-1.0.8-py3-none-any.whl (50 kB)
Collecting mock>=2.0.0
  Downloading mock-4.0.3-py3-none-any.whl (28 kB)
Installing collected packages: mock, tensorflow-estimator, tensorboard, keras-applications, tensorflow
  Attempting uninstall: tensorflow-estimator
    Fo

In [None]:
import numpy as np

import tensorflow as tf



In [None]:
# import tensorflow.compat.v1 as tf

# tf.disable_v2_behavior() 

## Download the required dataset

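There are two files: `reviews.txt` and `labels.txt`. The folder can be accessed at http://bit.ly/3546d3Y.

The loading code below uses `filter(None, ...)` to drop empty lines. From the Python documentation for `filter`: if function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.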
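
As a small illustration of this `filter(None, ...)` behavior, here is a minimal sketch on a made-up string. The text in `raw_text` is hypothetical and only stands in for the contents of a file such as `reviews.txt`.

```python
# Hypothetical stand-in for the contents of a text file with blank lines.
raw_text = "great movie\n\nterrible plot\n"

# Splitting on newlines leaves empty strings where lines were blank
# and after the trailing newline.
lines = raw_text.split('\n')
print(lines)

# filter(None, ...) keeps only truthy elements, so empty strings are dropped.
non_empty = list(filter(None, lines))
print(non_empty)
```

Empty strings are falsy in Python, which is why they disappear while every real line is kept.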



In [None]:
with open('./reviews.txt', 'r') as f:
    reviews = list(filter(None, f.read().split('\n')))

with open('./labels.txt', 'r') as f:
    labels = list(filter(None, f.read().split('\n')))

In [None]:
print(len(reviews))
print(len(labels))
print()
print('First review:', reviews[0])
print('First review label:', labels[0])
print()
print('Last review:', reviews[-1])
print('Last review label:', labels[-1])

25000
25000

First review: bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   
First review label: positive

Last review: this is one of the dumbest films  i  ve ever seen . it rips off nearly ever type of thriller and 

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.

Now you're going to encode the words with integers by building a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers **start at 1, not 0**.
Also, convert the reviews to integers and store them in a new list called `review_ints_list`.

> **Exercise** Remove tokens that are either punctuation marks or stop words. What are stop words? https://en.wikipedia.org/wiki/Stop_words
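
One possible starting point for this exercise is sketched below. The `STOP_WORDS` set is a deliberately tiny, hypothetical list chosen for illustration, and `remove_stop_words` is a helper name introduced here, not part of the original notebook.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {'a', 'an', 'the', 'is', 'it', 'of', 'to', 'and'}
# Single-character ASCII punctuation marks from the standard library.
PUNCTUATION = set(string.punctuation)

def remove_stop_words(tokens):
    """Return the tokens that are neither punctuation marks nor stop words."""
    return [t for t in tokens if t not in STOP_WORDS and t not in PUNCTUATION]

sample = ['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', '.']
filtered = remove_stop_words(sample)
print(filtered)
```

Whether removing stop words helps a sequence model is an empirical question, so it is worth comparing validation accuracy with and without this step.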

In [None]:
# Step 1 
# Convert a string of review to a list of tokens.
review_tokens_list = []
for review in reviews:
    review_tokens_list.append(list(filter(None, review.split(' '))))
    
# Verify
print(reviews[0])
print(review_tokens_list[0])    



bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life  such as  teachers  . my   years in the teaching profession lead me to believe that bromwell high  s satire is much closer to reality than is  teachers  . the scramble to survive financially  the insightful students who can see right through their pathetic teachers  pomp  the pettiness of the whole situation  all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school  i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i  m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn  t   
['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', '.', 'it', 'ran', 'at', 'the', 'same', 'time', 'as', 'some', 'other', 'programs', 'about', 'school', 'life', 'such

## Converting words to integer ids

Usually, words that are rare in the corpus (say, frequency less than 5) are not assigned individual ids. Instead, they are all replaced with a special token (`'unk'`), so every rare word shares the same integer id.


**Exercise** Implement the rare word integer id conversion logic.
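
A minimal sketch of one way to approach this exercise follows. The helper names `build_vocab` and `encode`, the `min_count` parameter, and the toy reviews are all assumptions made for illustration. Ids start at 1 so that 0 stays free for padding.

```python
from collections import Counter

def build_vocab(token_lists, min_count=5, unk_token='unk'):
    """Map frequent tokens to ids starting at 2; id 1 is reserved for unk_token."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    token_to_int = {unk_token: 1}
    for token, count in counts.items():
        if count >= min_count and token not in token_to_int:
            token_to_int[token] = len(token_to_int) + 1
    return token_to_int

def encode(tokens, token_to_int, unk_token='unk'):
    """Replace each token by its id, falling back to the unk id for rare words."""
    unk_id = token_to_int[unk_token]
    return [token_to_int.get(token, unk_id) for token in tokens]

# Toy corpus: 'good' and 'film' occur twice, 'plot' and 'odd' only once.
toy_reviews = [['good', 'film'], ['good', 'plot'], ['odd', 'film']]
vocab = build_vocab(toy_reviews, min_count=2)
encoded = encode(['good', 'odd', 'film'], vocab)
print(vocab)
print(encoded)
```

With this scheme the embedding table needs `len(vocab) + 1` rows, one per id plus the padding row.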

In [None]:
# Step 2
# To encode words to integer, first get all unique words used in reviews

token_set = set()
count = 0
for review_tokens in review_tokens_list:
    for token in review_tokens: 
        count +=1
        token_set.add(token)
print(count)
print("Unique tokens:", len(token_set))

6347388
Unique tokens: 74073


In [None]:
# Step 3
# Convert tokens to a unique numeric id
token_to_int = {token: integer for integer, token in enumerate(token_set, 1)}

# Verify
print(len(token_to_int))
print(token_to_int['the'])
print(token_to_int['a'])
print(token_to_int)

74073
10003
21538


In [None]:
l = ['a', 'b', 'c']

for idx, value in enumerate(l, 20):
  print(idx, value)

20 a
21 b
22 c


In [None]:
# Step 4 
# Convert each list of review tokens to a list of integers

review_ints_list = []
for review_tokens in review_tokens_list:
    # One liner
    review_ints_list.append([token_to_int[x] for x in review_tokens])
    # Same code can be written as 
    # 
    # review_int = []
    # for token in review_tokens:
    #     review_int.append(token_to_int[token])
    # review_ints_list.append(review_int)

# Verify
print(review_tokens_list[0])
print(review_ints_list[0])

['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', '.', 'it', 'ran', 'at', 'the', 'same', 'time', 'as', 'some', 'other', 'programs', 'about', 'school', 'life', 'such', 'as', 'teachers', '.', 'my', 'years', 'in', 'the', 'teaching', 'profession', 'lead', 'me', 'to', 'believe', 'that', 'bromwell', 'high', 's', 'satire', 'is', 'much', 'closer', 'to', 'reality', 'than', 'is', 'teachers', '.', 'the', 'scramble', 'to', 'survive', 'financially', 'the', 'insightful', 'students', 'who', 'can', 'see', 'right', 'through', 'their', 'pathetic', 'teachers', 'pomp', 'the', 'pettiness', 'of', 'the', 'whole', 'situation', 'all', 'remind', 'me', 'of', 'the', 'schools', 'i', 'knew', 'and', 'their', 'students', '.', 'when', 'i', 'saw', 'the', 'episode', 'in', 'which', 'a', 'student', 'repeatedly', 'tried', 'to', 'burn', 'down', 'the', 'school', 'i', 'immediately', 'recalled', '.', '.', '.', '.', '.', '.', '.', '.', '.', 'at', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', 'high', '.', 'a', 'classic', 

### Encoding the labels

Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to numbers. Here each label becomes a one-hot vector: `[1, 0]` for positive and `[0, 1]` for negative.


In [None]:
labels_int = np.array([[1, 0] if label == 'positive' else [0, 1] for label in labels])

# Verify
print(labels_int.shape)
print(labels_int[0], labels[0])
print(labels_int[-1], labels[-1])

(25000, 2)
[1 0] positive
[0 1] negative


### Basic statistics of review length

In [None]:
# Calculate length list
length_list = [len(x) for x in review_tokens_list]

# Maximum review length
print(max(length_list))
# Minimum review length
print(min(length_list))
# Average
print(np.mean(length_list))
# Median 
print(np.median(length_list))

2633
11
253.89552
190.0


Now, create an array `features` that contains the data we'll pass to the network. The data should come from `review_ints_list`, since we want to feed integers to the network. Each row should be 200 elements long. For reviews shorter than 200 words, left pad with 0s. That is, if the review is `['best', 'movie', 'ever']`, `[117, 18, 128]` as integers, the row will look like `[0, 0, 0, ..., 0, 117, 18, 128]`. For reviews longer than 200 words, use only the first 200 words as the feature vector.


**Padding can be performed either at the start or at the end. It is usually performed at the end, but this notebook pads at the start.**


This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.

> **Advanced Exercise** Dynamic padding of batches at the end.
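
As a hint for the dynamic padding exercise, the sketch below pads each batch only up to the length of its own longest sequence, with zeros added at the end. The function name `pad_batch` and the toy batch are assumptions for illustration. Note that the fixed-length cell that follows pads at the start instead.

```python
import numpy as np

def pad_batch(batch, pad_value=0):
    """Right-pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    padded = np.full((len(batch), max_len), pad_value, dtype=int)
    for i, seq in enumerate(batch):
        # Copy the sequence into the left part of the row; the rest stays padded.
        padded[i, :len(seq)] = seq
    return padded

toy_batch = [[117, 18, 128], [5, 9], [42]]
padded_batch = pad_batch(toy_batch)
print(padded_batch)
```

Because each batch can have a different width, the input placeholder must leave the time dimension unspecified, and with padding at the end the network should be told the true sequence lengths so that it does not read the prediction from a padded step.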

In [None]:
seq_len = 200
# Create a numpy array of shape (number of reviews, maximum sequence length)
features = np.zeros((len(review_ints_list), seq_len), dtype=int)

for i, row in enumerate(review_ints_list):
    features[i, -len(row):] = np.array(row)[:seq_len]
    
# Verify
print(features[0])
print(features[-1])

[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0 58471 60924 72935
 21538 66461 60457 45486 61101 51354  7550 10003 17954 60886  9727 14736
 68819 23013 62321 72236  8041   999  9727  5583 45486 29643 50692 43224
 10003 15345  5080 10766 33329 62534 70640 50618 58471 60924 49178 29208
 72935 35091 28510 62534   406 49462 72935  5583 45486 10003 28121 62534
 14645 17309 10003 62901 35456 38523 20821  8437 19233 24439 49981 44366
  5583 60785 10003 10784 25259 10003  6993 53293 39837 73160 33329 25259
 10003 42666 62907 55957  8760 49981 35456 45486 43889 62907 49822 10003
 53764 43224 69989 21538 12240 70069 43417 62534   671 57347 10003 72236
 62907 61378 64006 45486 45486 45486 45486 45486 45486 45486 45486 45486
  7550 45486 45486 45486 45486 45486 45486 45486 45486 45486 45486 60924
 45486 21538  7175  9175 14969 62907 52906 10617 62

## Training, Validation, Test



With our data in nice shape, we'll split it into training, validation, and test sets.

Create the training, validation, and test sets here. You'll need to create sets for the features and the labels, `train_x` and `train_y` for example. Define a split fraction, `split_frac` as the fraction of data to keep in the training set. Usually this is set around 0.7 to 0.8. The rest of the data will be split in half to create the validation and testing data.

In [None]:
split_frac = 0.8
split_idx = int(len(features)*split_frac)
train_x, val_x = features[:split_idx], features[split_idx:]
train_y, val_y = labels_int[:split_idx], labels_int[split_idx:]

test_idx = int(len(val_x)*0.5)
val_x, test_x = val_x[:test_idx], val_x[test_idx:]
val_y, test_y = val_y[:test_idx], val_y[test_idx:]

print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		(2500, 200)


With train, validation, and test fractions of 0.8, 0.1, 0.1, the final shapes should look like:
```
                    Feature Shapes:
Train set: 		 (20000, 200) 
Validation set: 	(2500, 200) 
Test set: 		  (2500, 200)
```

## Build the graph

Here, we'll build the graph. First up, defining the hyperparameters.

* `lstm_size`: Number of units in the hidden layers in the LSTM cells. Usually larger is better performance wise. Common values are 128, 256, 512, etc.
* `lstm_layers`: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
* `batch_size`: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
* `learning_rate`: Learning rate

In [None]:
lstm_size = 50
lstm_layers = 1
batch_size = 200
learning_rate = 0.001

For the network itself, we'll be passing in our 200 element long review vectors. Each batch will be `batch_size` vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.

In [None]:
n_tokens = len(token_to_int) + 1 # Adding 1 because we use 0's for padding, dictionary started at 1
inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
labels_ = tf.placeholder(tf.int32, [None, None], name='labels')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

### Embedding

Now we'll add an embedding layer. We need to do this because there are about 74,000 distinct tokens in our vocabulary. It is massively inefficient to one-hot encode that many words. You should remember dealing with this problem from the word2vec lesson. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But it's fine to just make a new layer and let the network learn the weights.

Create the embedding lookup matrix as a `tf.Variable`. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup). This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 200 units and the input has shape `[batch_size, seq_len]`, the function will return a tensor with shape `[batch_size, seq_len, 200]`.
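
To make the lookup concrete, here is a NumPy sketch of the same idea: indexing a table of row vectors with an array of integer ids. It is an analogy for what the embedding lookup does, not TensorFlow code, and the toy sizes and the names `embedding_table` and `token_ids` are assumptions for illustration.

```python
import numpy as np

vocab_size, embed_size = 6, 4  # toy sizes, much smaller than in the notebook

# One row of embed_size numbers per token id.
embedding_table = np.arange(vocab_size * embed_size,
                            dtype=np.float32).reshape(vocab_size, embed_size)

# Two "reviews" of three token ids each; 0 plays the role of the padding id.
token_ids = np.array([[0, 2, 5],
                      [1, 1, 3]])

# Integer-array indexing picks one table row per token id.
embedded = embedding_table[token_ids]
print(embedded.shape)  # (2, 3, 4)
```

The result has one extra trailing dimension compared to the input: every id has been replaced by its row in the table.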

> **Exercise:** Instead of randomly initializing the embedding matrix, use pretrained embeddings.



In [None]:
## Load a pretrained word embedding file
def load_embedding(token_to_int, file_path, embedding_dim):
    # Each line of the file is expected to hold a token followed by its vector
    # components, separated by single spaces.
    token_to_vector = {}
    with open(file_path, encoding='utf-8') as file_p:
        for line in file_p:
            splits = line.rstrip().split(' ')
            token = splits[0]
            # Keep only the vectors for tokens in our vocabulary
            if token in token_to_int:
                token_to_vector[token] = np.array([float(x) for x in splits[1:]],
                                                  dtype=np.float32)

    # Row 0 stays all zeros because id 0 is reserved for padding
    pre_trained_embeddings = np.zeros((len(token_to_int) + 1, embedding_dim),
                                      dtype=np.float32)
    for token in token_to_int:
        if token in token_to_vector:
            pre_trained_embeddings[token_to_int[token]] = token_to_vector[token]
        else:
            # Tokens missing from the file get a small random vector
            pre_trained_embeddings[token_to_int[token]] = np.random.uniform(-0.1, 0.1, embedding_dim)
    print('Total tokens:', len(token_to_int))
    print('Token found in pre-trained file', len(token_to_vector))
    return pre_trained_embeddings

In [None]:
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 50

embedding = tf.Variable(tf.random_uniform((n_tokens, embed_size), -0.1, 0.1))
# Signature: tf.random.uniform(shape, minval=0, maxval=None, dtype=tf.dtypes.float32, seed=None, name=None)

# inputs_ has shape [batch_size, max_sequence_length]
# tf.nn.embedding_lookup(params, ids, max_norm=None, name=None) looks up the embeddings for the given ids
embed = tf.nn.embedding_lookup(embedding, inputs_)
# embed has shape [batch_size, max_sequence_length, embedding_size]

Instructions for updating:
Colocations handled automatically by placer.


### LSTM cell

![alt text](https://github.com/udacity/deep-learning/raw/master/sentiment-rnn/assets/network_diagram.png)


Next, we'll create our LSTM cells to use in the recurrent network ([TensorFlow documentation](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn)). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.

To create a basic LSTM cell for the graph, you'll want to use `tf.contrib.rnn.BasicLSTMCell`. Looking at the function documentation:

```
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=<function tanh at 0x109f1ef28>)
```

you can see it takes a parameter called `num_units`, the number of units in the cell, called `lstm_size` in this code. So then, you can write something like 

```
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
```

to create an LSTM cell with `num_units`. Next, you can add dropout to the cell with `tf.contrib.rnn.DropoutWrapper`. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a convenient way to regularize your network with almost no effort! So you'd do something like

```
drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
```

Most of the time, your network will have better performance with more layers. That's sort of the magic of deep learning: adding more layers allows the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with `tf.contrib.rnn.MultiRNNCell`:

```
cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
```

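Here, `[drop] * lstm_layers` creates a list of cells (`drop`) that is `lstm_layers` long. The `MultiRNNCell` wrapper builds this into multiple layers of RNN cells, one for each cell in the list. Note that this list repeats the same cell object. That is fine for a single layer, as used here, but with more than one layer, TensorFlow 1.1 and later generally expect a separate cell object per layer, so build the list with a comprehension that creates a new cell each time.

So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout. But it all works the same from an architectural viewpoint, just with a more complicated graph in the cell.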



Here is [a tutorial on building RNNs](https://www.tensorflow.org/tutorials/recurrent) that will help you out.


In [None]:
def build_cell():
    # Your basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    # Add dropout to the cell
    return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

# Stack up multiple LSTM layers, for deep learning.
# Each layer gets its own cell object; repeating one object with [drop] * lstm_layers
# only works reliably when lstm_layers is 1.
cell = tf.contrib.rnn.MultiRNNCell([build_cell() for _ in range(lstm_layers)])

# Getting an initial state of all zeros
initial_state = cell.zero_state(batch_size, tf.float32)


For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.


Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.


### RNN forward pass


![alt text](https://github.com/udacity/deep-learning/raw/master/sentiment-rnn/assets/network_diagram.png)


Now we need to actually run the data through the RNN nodes. You can use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) to do this. You'd pass in the RNN cell you created (our multiple layered LSTM `cell` for instance), and the inputs to the network.

```
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
```

Above I created an initial state, `initial_state`, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. `tf.nn.dynamic_rnn` takes care of most of the work for us. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the final_state of the hidden layer.



In [None]:

outputs, final_state = tf.nn.dynamic_rnn(cell, embed,
                                         initial_state=initial_state)

# outputs has shape [batch_size, max_time, cell.output_size]
# final_state holds one (c, h) tuple per LSTM layer, where c is the cell state and h is the hidden state
# For the last layer, h is equal to the output of the last time step.

last_cell_state, last_cell_output = final_state[-1]  # (c, h) of the top layer; works for any number of layers
# last_cell_output has shape [batch_size, cell.output_size]

#For debugging. Pass a sample data, and print the output
#with tf.Session() as sess:
#    sess.run(tf.global_variables_initializer())
#    feed = {inputs_: train_x[0:200],
#            labels_: train_y[0:200],
#            keep_prob: 0.5
#           }
#    print(sess.run(last_cell_output, feed_dict=feed))


Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


### Output

We only care about the final output, which we'll use as our sentiment prediction. So we need to grab the last output, for example with `outputs[:, -1]` (the code below reads the last hidden state from `final_state` instead), then calculate the cost from that and `labels_`.
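
The cost used in this section is a softmax cross-entropy between one-hot labels and two-class predictions. The NumPy sketch below works through that computation on made-up numbers. The helper names `softmax` and `cross_entropy` and the toy logits are assumptions for illustration, and the max-subtraction inside `softmax` is a common stability trick rather than something taken from the notebook.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(one_hot_labels, probs):
    """Mean over the batch of -sum(label * log(prob)) per example."""
    per_example = -(one_hot_labels * np.log(probs)).sum(axis=1)
    return per_example.mean()

# Toy batch: two examples, two classes (positive, negative).
logits = np.array([[2.0, 0.0],
                   [0.0, 0.0]])
labels = np.array([[1, 0],
                   [0, 1]])

probs = softmax(logits)
loss = cross_entropy(labels, probs)
print(probs)
print(loss)
```

Because the labels are one-hot, the sum over classes simply picks out the log-probability the model assigned to the correct class.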

In [None]:
# Now we will apply a single layer NN.

# Input dimension = [cell.output_size]
# Output dimension = 2 [ two class classification]
Weights = tf.Variable(tf.random_normal((lstm_size, 2), stddev=0.1), trainable=True)
# One bias per output class; a single shared bias would cancel out in the softmax
Bias = tf.Variable(tf.zeros([2]))



predictions = tf.nn.softmax(tf.matmul(last_cell_output, Weights) + Bias)

# Multiply true_label * log(predicted labels)
cross_entropy_step1 = tf.cast(labels_, tf.float32) * tf.log(predictions)  # Dimension = [batch_size, 2]


cross_entropy_step2 = tf.reduce_sum(cross_entropy_step1, axis=1) # Dimension = [batch_size]

cross_entropy_step3 = tf.reduce_mean(cross_entropy_step2) # dimension = [1]

cross_entropy_loss = -cross_entropy_step3

#For debugging. Pass a sample data, and print the output
#with tf.Session() as sess:
#    sess.run(tf.global_variables_initializer())
#    feed = {inputs_: train_x[0:200],
#            labels_: train_y[0:200],
#            keep_prob: 0.5
#           }
#   print(sess.run(cross_entropy_loss, feed_dict=feed))
    
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy_loss)

Instructions for updating:
Use tf.cast instead.


### Validation accuracy

Here we can add a few nodes to calculate the accuracy which we'll use in the validation pass.
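
The accuracy computation can be sketched in NumPy on made-up numbers: take the index of the largest value in each prediction row and in each one-hot label row, compare them, and average the matches. The arrays below are hypothetical.

```python
import numpy as np

# Hypothetical predicted class probabilities and one-hot labels for 3 reviews.
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4]])
one_hot = np.array([[1, 0],
                    [0, 1],
                    [0, 1]])

# True where the predicted class index matches the labeled class index.
correct = np.argmax(probs, axis=1) == np.argmax(one_hot, axis=1)

# Casting booleans to floats and averaging gives the fraction of correct predictions.
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)
```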

In [None]:
equal_matrix = tf.equal(tf.argmax(predictions, axis=1), tf.argmax(labels_, axis=1)) # dimension = batch_size

accuracy = tf.reduce_mean(tf.cast(equal_matrix, tf.float32))

#For debugging. Pass a sample data, and print the output
#with tf.Session() as sess:
#    sess.run(tf.global_variables_initializer())
#    feed = {inputs_: train_x[0:batch_size],  # must match the batch size used for initial_state
#            labels_: train_y[0:batch_size],
#            keep_prob: 0.5
#           }
#    print(sess.run(accuracy, feed_dict=feed))


### Batching

This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the `x` and `y` arrays and returns slices out of those arrays with size `[batch_size]`.

> **Exercise:** Shuffle batch randomly. Why shuffle? https://www.quora.com/Does-the-order-of-training-data-matter-when-training-neural-networks, https://stackoverflow.com/questions/40816721/should-i-shuffle-the-data-to-train-a-neural-network-using-backpropagation
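
A possible approach to the shuffling exercise is sketched below: draw one random permutation and apply it to both arrays so that each review stays paired with its label. The function name `get_shuffled_batches` and the `seed` parameter are assumptions for illustration, and the inputs are assumed to be NumPy arrays.

```python
import numpy as np

def get_shuffled_batches(x, y, batch_size=100, seed=None):
    """Yield full batches after shuffling x and y with the same permutation."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(x))
    x, y = x[order], y[order]
    # Keep only as many examples as fit into full batches.
    n_batches = len(x) // batch_size
    for start in range(0, n_batches * batch_size, batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

# Toy data where the label is the parity of the feature, so pairing is easy to check.
toy_x = np.arange(10).reshape(10, 1)
toy_y = np.arange(10) % 2
batches = list(get_shuffled_batches(toy_x, toy_y, batch_size=4, seed=0))
print(len(batches))
```

Calling the generator once per epoch without a fixed seed gives a different order each time, which is usually what you want during training.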

In [None]:
def get_batches(x, y, batch_size=100):
    
    n_batches = len(x)//batch_size
    
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]

## Training

Below is the typical training code. If you want to do this yourself, feel free to delete all this code and implement it yourself. Before you run this, make sure the `checkpoints` directory exists.

In [None]:
epochs = 4


saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    iteration = 1
    for e in range(epochs):        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y,
                    keep_prob: 0.5
                   }
            loss, _, model_predictions = sess.run([cross_entropy_loss, optimizer, predictions], feed_dict=feed)
            

            if iteration%25==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))

            if iteration%100==0:
                val_result = []
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y,
                            keep_prob: 1}
                    batch_result = sess.run(equal_matrix, feed_dict=feed)
                    val_result.extend(batch_result)
                print("Val acc: {:.3f}".format(np.mean(val_result)))
            iteration +=1
    saver.save(sess, "checkpoints/sentiment.ckpt")

Epoch: 0/4 Iteration: 25 Train loss: 0.685
Epoch: 0/4 Iteration: 50 Train loss: 0.578
Epoch: 0/4 Iteration: 75 Train loss: 0.536
Epoch: 0/4 Iteration: 100 Train loss: 0.401
Val acc: 0.830
Epoch: 1/4 Iteration: 125 Train loss: 0.500
Epoch: 1/4 Iteration: 150 Train loss: 0.356
Epoch: 1/4 Iteration: 175 Train loss: 0.252
Epoch: 1/4 Iteration: 200 Train loss: 0.203
Val acc: 0.849
Epoch: 2/4 Iteration: 225 Train loss: 0.324
Epoch: 2/4 Iteration: 250 Train loss: 0.281
Epoch: 2/4 Iteration: 275 Train loss: 0.175
Epoch: 2/4 Iteration: 300 Train loss: 0.108
Val acc: 0.832
Epoch: 3/4 Iteration: 325 Train loss: 0.134
Epoch: 3/4 Iteration: 350 Train loss: 0.243
Epoch: 3/4 Iteration: 375 Train loss: 0.101
Epoch: 3/4 Iteration: 400 Train loss: 0.083
Val acc: 0.823


## Testing

In [None]:
test_acc = []
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_result = []
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        feed = {inputs_: x,
                labels_: y,
                keep_prob: 1}
        batch_result = sess.run(equal_matrix, feed_dict=feed)
        test_result.extend(batch_result)
    print("Test accuracy: {:.3f}".format(np.mean(test_result)))

Instructions for updating:
Use standard file APIs to check for files with this prefix.


INFO:tensorflow:Restoring parameters from checkpoints/sentiment.ckpt


Test accuracy: 0.816
