# Owner Categorization with an RNN

In this notebook, I will implement a recurrent neural network that categorize owners base on their name. Using an RNN rather than a feedfoward network is more accurate since we can include information about the *sequence* of words. Here we'll use a dataset of owners name from the Philipines and India.

The architecture for this network is shown below.

Here, we'll pass in words to an embedding layer.

From the embedding layer, the new representations will be passed to LSTM cells. These will add recurrent connections to the network so we can include information about the sequence of words in the data. Finally, the LSTM cells will go to a sigmoid output layer here. The output layer will just be a single unit then, with a sigmoid activation function.

We are not interested in the sigmoid outputs except for the very last one, we can ignore the rest. We'll calculate the cost from the output of the last step and the training label.

Charles Jansen

In [2]:
import tensorflow as tf 
from sklearn.model_selection import train_test_split
import csv
import codecs
import pandas as pd
import numpy as np
from tqdm import tqdm

In [3]:
%%time
TRAIN_DATA_FILE = 'F:/DS-main/Kaggle-main/Quora Question Pairs - inputs/data/finalTrain.csv'

Wall time: 0 ns


In [4]:
%%time
y_pd = pd.read_csv(TRAIN_DATA_FILE, sep='\t', encoding="utf-8", usecols=[0])
preg1_pd = pd.read_csv(TRAIN_DATA_FILE, sep='\t', encoding="utf-8", usecols=[1])
preg2_pd = pd.read_csv(TRAIN_DATA_FILE, sep='\t', encoding="utf-8", usecols=[2])

Wall time: 4.38 s


In [5]:
preg2_pd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 404290 entries, 0 to 404289
Data columns (total 1 columns):
2    404288 non-null object
dtypes: object(1)
memory usage: 3.1+ MB


In [6]:
preg2_pd.head()

Unnamed: 0,2
0,What is the step by step guide to invest in sh...
1,What would happen if the Indian government sto...
2,How can Internet speed be increased by hacking...
3,Find the remainder when [math]23^{24}[/math] i...
4,Which fish would survive in salt water?


In [7]:
print(type(y_pd))
y = y_pd.values
preg1 = preg1_pd.values
preg2 = preg2_pd.values
print(type(y))

<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>


In [8]:
y = [ligne[0] for _,ligne in enumerate(y)]
preg1 = [str(ligne[0]) for _,ligne in enumerate(preg1)]
preg2 = [str(ligne[0]) for _,ligne in enumerate(preg2)]
y[0:8]

[0, 0, 0, 0, 0, 1, 0, 1]

In [9]:
preg1temp = preg1
print(len(y))
print(len(preg1))
print(len(preg2))
y.extend(y)
preg1 = preg1 + preg2
preg2.extend(preg1temp)
print(len(y))
print(len(preg1))
print(len(preg2))
del(preg1temp)

404290
404290
404290
808580
808580
808580


In [10]:
print(preg1[:3])
print(preg2[:3])
print(y[:3])
print(preg1[404290:404293])
print(preg2[404290:404293])
print(y[404290:404293])

['What is the step by step guide to invest in share market in india?', 'What is the story of Kohinoor (Koh-i-Noor) Diamond?', 'How can I increase the speed of my internet connection while using a VPN?']
['What is the step by step guide to invest in share market?', 'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?', 'How can Internet speed be increased by hacking through DNS?']
[0, 0, 0]
['What is the step by step guide to invest in share market?', 'What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back?', 'How can Internet speed be increased by hacking through DNS?']
['What is the step by step guide to invest in share market in india?', 'What is the story of Kohinoor (Koh-i-Noor) Diamond?', 'How can I increase the speed of my internet connection while using a VPN?']
[0, 0, 0]


## Data preprocessing

Since we're using embedding layers, we'll need to encode each word with an integer.

In [11]:
all_words = ' '.join(preg1)
words = all_words.split()

In [14]:
all_words[:1000]

'What is the step by step guide to invest in share market in india? What is the story of Kohinoor (Koh-i-Noor) Diamond? How can I increase the speed of my internet connection while using a VPN? Why am I mentally very lonely? How can I solve it? Which one dissolve in water quikly sugar, salt, methane and carbon di oxide? Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me? Should I buy tiago? How can I be a good geologist? When do you use ? instead of ?? Motorola (company): Can I hack my Charter Motorolla DCX3400? Method to find separation of slits using fresnel biprism? How do I read and find my YouTube comments? What can make Physics easy to learn? What was your first sexual experience like? What are the laws to change your status from a student visa to a green card in the US, how do they compare to the immigration laws in Canada? What would a Trump presidency mean for current international master\x92s students on an F1 visa? What does manipulation me

In [15]:
len(words)

8944869

In [16]:
words[:20]

['What',
 'is',
 'the',
 'step',
 'by',
 'step',
 'guide',
 'to',
 'invest',
 'in',
 'share',
 'market',
 'in',
 'india?',
 'What',
 'is',
 'the',
 'story',
 'of',
 'Kohinoor']

### Encoding the words

The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our owner names into integers so they can be passed into the network.

In [17]:
%%time
from collections import Counter
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii-1 for ii, word in enumerate(vocab, 1)}

preg1_ints = []
preg2_ints = []
for each in preg1:
    preg1_ints.append([vocab_to_int[word]+1 for word in each.split()])
for each in preg2:
    preg2_ints.append([vocab_to_int[word]+1 for word in each.split()])
###+1 because I pad with 0 and I had a word 0

Wall time: 10.7 s


In [18]:
vocab[:5]

['the', 'What', 'is', 'I', 'a']

In [20]:
len(vocab)

231656

### Encoding the labels 

Our labels are "company" or "person". To use these labels in our network, we need to convert them to 0 and 1.


In [15]:
y = np.array(y)

In [16]:
print(len(preg1_ints))
print(type(preg1_ints))

808580
<class 'list'>


In [17]:
preg1_lens = Counter([len(x) for x in preg1_ints])
print("Zero-length preg1: {}".format(preg1_lens[0]))
print("Maximum preg1 length: {}".format(max(preg1_lens)))

preg2_lens = Counter([len(x) for x in preg2_ints])
print("Zero-length preg2: {}".format(preg2_lens[0]))
print("Maximum preg2 length: {}".format(max(preg2_lens)))

Zero-length preg1: 0
Maximum preg1 length: 237
Zero-length preg2: 0
Maximum preg2 length: 237


For names shorter than 28, we'll pad with 0s. 

In [18]:
seq_len = 40
features1 = np.zeros((len(preg1_ints), seq_len), dtype=int)
for i, row in enumerate(preg1_ints):
    features1[i, -len(row):] = np.array(row)[:seq_len]

In [19]:
features1[0:10]

array([[     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      2,      3,      1,   1413,     55,   1413,
          3324,      6,    515,      8,    750,    597,      8,    781],
       [     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             2,      3,      1,    775,      9,  19803,  42892,  47069],
       [     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      7,     14,      4,    193,      1

In [20]:
seq_len = 40
features2 = np.zeros((len(preg2_ints), seq_len), dtype=int)
for i, row in enumerate(preg2_ints):
    features2[i, -len(row):] = np.array(row)[:seq_len]

In [21]:
features2[0:5]

array([[     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      2,      3,      1,   1413,
            55,   1413,   3324,      6,    515,      8,    750,    905],
       [     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      2,     43,    187,     35,      1,
            89,    301,  14780,      1,  19803,  42892,   8259,    956],
       [     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      7

In [22]:
type(features1)

numpy.ndarray

In [23]:
features = np.concatenate((features1,np.zeros((len(features1), 5), dtype=int),features2), axis=1)

In [24]:
features[0:5]

array([[     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      2,      3,      1,   1413,     55,   1413,
          3324,      6,    515,      8,    750,    597,      8,    781,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      2,      3,      1,   1413,     55,   1413,   3324,
             6,    515,      8,    750,    905],
       [     0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,      0,      0,      0,      0,      0,      0,      0,
             0,

## Training, Validation, Test



With our data in nice shape, we'll split it into training, validation, and test sets.

10% randomly taken for test.

Kfold on the remaining 90% for validation and training.-->canceled. Done, but took much more time for the same 98.9 result

In [25]:
split_frac = 0.9
train_val_x, test_x, train_val_y, test_y = train_test_split(
    features, y, 
    train_size = split_frac)

#sin Kfold
train_x, val_x, train_y, val_y = train_test_split(
    train_val_x, train_val_y, 
    train_size = split_frac)
'''
#Kfold
train_x = []
val_x   = []
train_y = []
val_y   = []
train_x =  np.empty([0,max(names_lens)])
val_x   =  np.empty([0,max(names_lens)])
train_y =  np.empty(0)
val_y   =  np.empty(0)

kf = KFold(n_splits = 9, shuffle=True)
for train_index, val_index in kf.split(train_val_x):
    train_temp_x, val_temp_x = train_val_x[train_index], train_val_x[val_index]
    train_temp_y, val_temp_y = train_val_y[train_index], train_val_y[val_index]
    train_x = np.concatenate((train_x, train_temp_x), axis=0)
    val_x   = np.concatenate((val_x, val_temp_x), axis=0)
    train_y = np.concatenate((train_y, train_temp_y), axis=0)
    val_y = np.concatenate((val_y, val_temp_y), axis=0)

'''
print("\t\t\tFeature Shapes:")
print("X\nTrain set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape),
      "\nY\nTrain set: \t\t{}".format(train_y.shape), 
      "\nValidation set: \t{}".format(val_y.shape),
      "\nTest set: \t\t{}".format(test_y.shape),
     )

			Feature Shapes:
X
Train set: 		(654949, 85) 
Validation set: 	(72773, 85) 
Test set: 		(80858, 85) 
Y
Train set: 		(654949,) 
Validation set: 	(72773,) 
Test set: 		(80858,)


In [26]:
del(y_pd, preg1_pd, preg2_pd, preg1, preg2, preg1_ints, preg2_ints, features1, features2, features, y)

## Build the graph

Here, we'll build the graph. First up, defining the hyperparameters.

* `lstm_size`: Number of units in the hidden layers in the LSTM cells. 
* `lstm_layers`: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
* `batch_size`: The number of names to feed the network in one training pass.
* `learning_rate`: Learning rate

In [27]:
lstm_size = 64
lstm_layers = 1
batch_size = 8
learning_rate = 0.01

For the network itself, we'll be passing in our 28 element long names vectors. Each batch will be `batch_size` vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.

In [28]:
n_words = len(vocab_to_int)

# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
    labels_ = tf.placeholder(tf.int32, [None, None], name='labels')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

### Embedding

Now we'll add an embedding layer. 


In [29]:
len(vocab)

231656

In [30]:
vocab[:20]

['the',
 'What',
 'is',
 'I',
 'a',
 'to',
 'How',
 'in',
 'of',
 'do',
 'are',
 'and',
 'for',
 'can',
 'you',
 'Why',
 'best',
 'my',
 'on',
 'it']

In [31]:
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 300 

with graph.as_default():
    embedding = tf.Variable(tf.truncated_normal((n_words, embed_size), stddev=0.1))
    embed = tf.nn.embedding_lookup(embedding, inputs_)

### LSTM cell



Next, we'll create our LSTM cells to use in the recurrent network 

In [32]:
with graph.as_default():
    # Your basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    
    # Add dropout to the cell
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
    
    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)

### RNN forward pass


Now we need to actually run the data through the RNN nodes. 

Above I created an initial state, `initial_state`, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. 


In [33]:
with graph.as_default():
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed,
                                             initial_state=initial_state)

### Output

We want the final output. So we need to grab the last output with `outputs[:, -1]`

In [34]:
with graph.as_default():
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    cost = tf.losses.mean_squared_error(labels_, predictions)
    
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)

### Validation accuracy

Here we can add a few nodes to calculate the accuracy which we'll use in the validation pass.

In [35]:
with graph.as_default():
    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

### Batching

This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the `x` and `y` arrays and returns slices out of those arrays with size `[batch_size]`.

In [36]:
def get_batches(x, y, batch_size=100):
    
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]

## Training



In [37]:
'''
import os
epochs = 1

with graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, 2), 1):
            print(ii)
            print(x)
            print(y)
            print(x.shape)
            print(y.shape)
            print(y[:, None])
            if ii == 5:
                raise Exception("Manual Stop")
#''' ;           

In [38]:
epochs = 10

with graph.as_default():
    saver = tf.train.Saver()

In [39]:
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            
            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))

            if iteration%250==0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1
    saver.save(sess, "checkpoints/ownerNameCategQuora.ckpt")

ResourceExhaustedError: OOM when allocating tensor with shape[231656,300]
	 [[Node: Adam/update_Variable/mul_2 = Mul[T=DT_FLOAT, _class=["loc:@Variable"], _device="/job:localhost/replica:0/task:0/gpu:0"](Variable/Adam/read, Adam/beta1)]]
	 [[Node: mean_squared_error/value/_21 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_509_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'Adam/update_Variable/mul_2', defined at:
  File "D:\ProgramData\Anaconda3\envs\t\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\ProgramData\Anaconda3\envs\t\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel\kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\zmq\eventloop\ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tornado\ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\zmq\eventloop\zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\zmq\eventloop\zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\zmq\eventloop\zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tornado\stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel\kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel\kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel\kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel\ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\ipykernel\zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\IPython\core\interactiveshell.py", line 2683, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\IPython\core\interactiveshell.py", line 2787, in run_ast_nodes
    if self.run_code(code, result):
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\IPython\core\interactiveshell.py", line 2847, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-34-616d62ee52be>", line 5, in <module>
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\training\optimizer.py", line 325, in minimize
    name=name)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\training\optimizer.py", line 456, in apply_gradients
    update_ops.append(processor.update_op(self, grad))
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\training\optimizer.py", line 102, in update_op
    return optimizer._apply_sparse_duplicate_indices(g, self._v)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\training\optimizer.py", line 654, in _apply_sparse_duplicate_indices
    return self._apply_sparse(gradient_no_duplicate_indices, var)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\training\adam.py", line 168, in _apply_sparse
    m_t = state_ops.assign(m, m * beta1_t,
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\ops\variables.py", line 667, in _run_op
    return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\ops\math_ops.py", line 821, in binary_op_wrapper
    return func(x, y, name=name)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1044, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 1434, in _mul
    result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\framework\ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "D:\ProgramData\Anaconda3\envs\t\lib\site-packages\tensorflow\python\framework\ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[231656,300]
	 [[Node: Adam/update_Variable/mul_2 = Mul[T=DT_FLOAT, _class=["loc:@Variable"], _device="/job:localhost/replica:0/task:0/gpu:0"](Variable/Adam/read, Adam/beta1)]]
	 [[Node: mean_squared_error/value/_21 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_509_mean_squared_error/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]


## Testing

In [None]:
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        feed = {inputs_: x,
                labels_: y[:, None],
                keep_prob: 1,
                initial_state: test_state}
        batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        test_acc.append(batch_acc)
    print("Test accuracy: {:.3f}".format(np.mean(test_acc)))

## Predictions

In [31]:
x = "Dushyant Sekhar "

x_int = [vocab_to_int[word] for word in x.split()]
x_int = x_int[:28] #ignore words after 28th

x_int_sized = np.zeros((1,seq_len), dtype=int)
x_int_sized[0,-len(x_int):] = np.array(x_int)[:seq_len]
print(x_int_sized)

[[   0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0 2491 1080]]


In [32]:
fillerSize =  batch_size - 1
filler = np.tile(np.zeros((1, seq_len), dtype=int),(fillerSize,1))
#print(filler.shape)
prodBatch = np.append(x_int_sized,filler, axis=0)
#print(prodBatch)
#print(prodBatch.shape)

In [33]:
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    feed = {inputs_: prodBatch,
            keep_prob: 1,
            initial_state: test_state}
    output = sess.run(predictions, feed_dict=feed)
    print(output[0])
    if output[0]>0.5:
        print(x,"\nCompany")
        print("probability {}%".format(np.round(output[0][0]*100,2)))
    else:
        print(x,"\nPerson")
        print("probability {}%".format(np.round(100-output[0][0]*100,2)))
        

[ 0.00042518]
Dushyant Sekhar  
Person
probability 99.96%
