*italicized text*

## Create Text Classifier in tf Python


In [0]:
!pip install tensorflow-gpu==2.0.0-alpha0
!pip install unidecode
#!pip install tensorflow==2.0.0-alpha0  
#!pip install tensorflowjs==1.0.1

Collecting unidecode
[?25l  Downloading https://files.pythonhosted.org/packages/31/39/53096f9217b057cb049fe872b7fc7ce799a1a89b76cf917d9639e7a558b5/Unidecode-1.0.23-py2.py3-none-any.whl (237kB)
[K     |████████████████████████████████| 245kB 5.1MB/s 
[?25hInstalling collected packages: unidecode
Successfully installed unidecode-1.0.23


In [41]:
import tensorflow as tf
#import tensorflowjs as tfjs
import nltk
nltk.download('punkt') # download sentence tokenizer model
import os
import re
import numpy as np
import random
import unicodedata
import time
import inspect

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!




### Proj Desc
 This notebook takes 4 books. Takes 1000 <br>
 sentences from each book.  Tokenizes each word to a unique int.<br>
 Then feed each sentence into a deep model and output<br>
 a book classification.<br>

 <br>
 DNN 1 model--> each input is a (padded) sentence with one word = unique int, so <br>
 input vec represented a sentence one cell per word. NB: this word/sentence<br>
 vectorization allows no use of context as signal in the model as words as input vec order 
 is not significant in a standard dense deep model
 
 
RNN model 1 -->  each word gets vectorized (tf.embedding) and put in the <br>
 RNN. hidden state of RNN should learn the mix of words and some of tier ordering.
 Hidden state is passed to a dense layer for classification.

<br>
 
<br>

Future considerations:
- break text into non-sentence subsections
- Maybe use output as n book classifier, and use as the loss the final <br>
 output in the  sequence... hopefully the hiddenstate will e very influential <br>
 and carry the  context of the past words<br>
- Q: what would the effect be of varying the length of the snippet of text inputed into the model 
 once trained. Can the model be made robust to varying input lengths?<br>
 It seems like variable length (with some max len) posses all sorts of logistical problems. 
 I also wonder if  predicting random snippets fixed length snippets would be worse than predicting
 sentences...? It might be better bc no padding? or worse bc their maybe sentence
 structure patterns picked up in training.

#### Load Raw book data  



In [42]:
#change this locally
#data_dir = './data' 
data_dir = './' # for colab

# Get book paths
files = sorted([i for i in os.listdir(data_dir) if not i.startswith('.')])
files = [os.path.join(data_dir, i) for i in files]
print(files)

['./Alice-in-Wonderland by Lewis Carroll.txt', './Crime-and-Punishment Dostoevsky.txt', './Mansfield-Park by Jane Austen.txt', './The-Adventures-of-Tom-Sawyer by Mark Twain.txt', './sample_data']


In [43]:
# set names manually based on the order of the files
files_dict = dict(zip(['alice','crime','mansfield','sawyer'], files))
print(files_dict)

{'alice': './Alice-in-Wonderland by Lewis Carroll.txt', 'crime': './Crime-and-Punishment Dostoevsky.txt', 'mansfield': './Mansfield-Park by Jane Austen.txt', 'sawyer': './The-Adventures-of-Tom-Sawyer by Mark Twain.txt'}


In [44]:
# load each book text strings into a dict
books = {}
for name, path in files_dict.items():
  with open(path, "rt") as file:
    books[name] = file.readlines()
    books[name] = [i for i in books[name] if i != '\n']
     
books['alice'][:400]

['\ufeffThe Project Gutenberg EBook of Alice in Wonderland, by J.C. Gorham\n',
 'This eBook is for the use of anyone anywhere at no cost and with\n',
 'almost no restrictions whatsoever.  You may copy it, give it away or\n',
 're-use it under the terms of the Project Gutenberg License included\n',
 'with this eBook or online at www.gutenberg.org\n',
 'Title: Alice in Wonderland\n',
 '       Retold in Words of One Syllable\n',
 'Author: J.C. Gorham\n',
 'Release Date: October 16, 2006 [EBook #19551]\n',
 'Language: English\n',
 '*** START OF THIS PROJECT GUTENBERG EBOOK ALICE IN WONDERLAND ***\n',
 'Produced by Chuck Greif, Jason Isbell and The Online\n',
 'Distributed Proofreading Team at http://www.pgdp.net\n',
 '[Illustration]\n',
 "ALICE'S ADVENTURES IN WONDERLAND\n",
 'RETOLD IN WORDS OF ONE SYLLABLE\n',
 'By MRS. J.C. GORHAM\n',
 '_FULLY ILLUSTRATED_\n',
 'A.L. BURT COMPANY\n',
 'PUBLISHERS, NEW YORK\n',
 'COPYRIGHT 1905\n',
 '       *       *       *       *       *\n',
 'CONTENT

In [45]:
# Cut first 300 and last 750 lines to avoid front and back matter
for i in books.keys():
  books[i] = books[i][300:-750]
books['sawyer'][:100] + ['... BREAK ...']*6 + books['sawyer'][-100:]

['The switch hovered in the air—the peril was desperate—\n',
 '“My! Look behind you, aunt!”\n',
 'The old lady whirled round, and snatched her skirts out of danger. The lad fled on the instant, scrambled up the high board-fence, and disappeared over it.\n',
 '01-018.jpg (54K)\n',
 'His aunt Polly stood surprised a moment, and then broke into a gentle laugh.\n',
 '“Hang the boy, can’t I never learn anything? Ain’t he played me tricks enough like that for me to be looking out for him by this time? But old fools is the biggest fools there is. Can’t learn an old dog new tricks, as the saying is. But my goodness, he never plays them alike, two days, and how is a body to know what’s coming? He ’pears to know just how long he can torment me before I get my dander up, and he knows if he can make out to put me off for a minute or make me laugh, it’s all down again and I can’t hit him a lick. I ain’t doing my duty by that boy, and that’s the Lord’s truth, goodness knows. Spare the rod and spile 



### Preprocess text
Split into sentences, clean text, tokenize and padded sentences


In [46]:
# split each book into sentences

sentences = {}
for name in books.keys():
  # use if books read in with readlines()
  books[name] = ''.join(books[name])
  books[name] = re.sub('\n+', ' ', books[name])
  sentences[name] = nltk.sent_tokenize(books[name])
# save memory
del books
sentences['alice'][500]

'"I don\'t know what you mean," Al-ice said.'

Consider searching for any cases where the punkt tokenizer split a sentence after<br>
"Mr."



#### Split into Test and Train
Note: not all books even have 1450 sentences. alice only has 813<br>
<br>


In [47]:
# num sentences per book
{name: len(sentences[name]) for name in sentences.keys()}

{'alice': 813, 'crime': 9044, 'mansfield': 3203, 'sawyer': 2750}

In [0]:
# create select combined train and test dataset
dataset = []
random.seed(123)
for name, text in sentences.items():
  # 1450 sentences from each book, reduce where necessary
  dataset_len = 1450 if 1450 < len(sentences[name]) else len(sentences[name])
  dataset.extend(zip([name]*dataset_len, random.sample(text, dataset_len)))
random.shuffle(dataset)
train_len = round(len(dataset)*.7)
y_train, x_train = list(zip(*dataset[:train_len]))
y_test, x_test = list(zip(*dataset[train_len:]))
# save mem
del dataset

In [0]:
class_to_int =  {name:i for i, name in enumerate(sentences.keys())}
y_test = np.array([class_to_int[i] for i in y_test])
y_train = np.array([class_to_int[i] for i in y_train])

In [50]:
print(x_train[:5])
print(y_train[:5])
print('length x_train:', len(x_train))
print('length y_train:', len(y_train))

('Devils don’t slosh around much of a Sunday, I don’t reckon.” “I never thought of that.', 'Mrs. Grant, I believe, suspects him of a preference for Julia; I have never seen much symptom of it, but I wish it may be so.', 'It seemed to him that life was but a trouble, at best, and he more than half envied Jimmy Hodges, so lately released; it must be very peaceful, he thought, to lie and slumber and dream forever and ever, with the wind whispering through the trees and caressing the grass and the flowers over the grave, and nothing to bother and grieve about, ever any more.', 'And it showed three white, startled faces, too.', 'Why it ain’t like anything.')
[3 2 3 3 3]
length x_train: 3614
length y_train: 3614



<br>
note: I have created an imbalanced data set<br>




#### Scratch work for text cleaning
Text cleaning notes:<br>

All books:<br>
- punctuation. separate from the word it follows, or is the useful? <br>
- insight, much like sentence starters might be useful info<br>

Alice preprocess issues:
- dashed words (but leave one dash in double dashs). In teh end escaped single <br>
quote ' (\')<br>
- bring it all lower case # does this loose useful info?<br>
- get rid of multiple concurrent spaces<br>

Crime prepocess issues:
- .... four dots... useful? can these be separated by word at all'<br>

Sawyer Text issues:
- img names like "01-018.jpg (54K)" are sprinkled throughout the text<br>
- story didn't start till ~ sentence 45 of `sentences`<br>


In [51]:
# ID index of longest senteces in each book
long_sents = [j for i in range(len(sentences.values())) for j in range(len(list(sentences.values())[i])) if list(sentences.values())[i][j] == max(list(sentences.values())[i], key= len)]
#[653, 7010, 1981, 434]
# alt
# np.where(np.array(sentences['sawyer']) == max(sentences['sawyer'], key = len))
[len(max(list, key= len)) for list in sentences.values()]
[max(list, key= len) for list in sentences.values()]

['She tucked it un-der her arm with its legs down, but just as she got its neck straight and thought now she could give the ball a good blow with its head, the bird would twist its neck round and give her such a queer look, that she could not help laugh-ing; and by the time she had got its head down a-gain, she found that the hedge-hog had crawled off.',
 'Sonia flushed crimson, and Katerina Ivanovna suddenly burst into tears, immediately observing that she was “nervous and silly, that she was too much upset, that it was time to finish, and as the dinner was over, it was time to hand round the tea.” At that moment, Amalia Ivanovna, deeply aggrieved at taking no part in the conversation, and not being listened to, made one last effort, and with secret misgivings ventured on an exceedingly deep and weighty observation, that “in the future boarding-school she would have to pay particular attention to die Wäsche, and that there certainly must be a good dame to look after the linen, and sec

In [52]:
# display longest sentences
[list(sentences.values())[i][j] for i, j in enumerate(long_sents)]

['She tucked it un-der her arm with its legs down, but just as she got its neck straight and thought now she could give the ball a good blow with its head, the bird would twist its neck round and give her such a queer look, that she could not help laugh-ing; and by the time she had got its head down a-gain, she found that the hedge-hog had crawled off.',
 'Sonia flushed crimson, and Katerina Ivanovna suddenly burst into tears, immediately observing that she was “nervous and silly, that she was too much upset, that it was time to finish, and as the dinner was over, it was time to hand round the tea.” At that moment, Amalia Ivanovna, deeply aggrieved at taking no part in the conversation, and not being listened to, made one last effort, and with secret misgivings ventured on an exceedingly deep and weighty observation, that “in the future boarding-school she would have to pay particular attention to die Wäsche, and that there certainly must be a good dame to look after the linen, and sec

In [53]:
# find .jpg sentence in sawyer
[i for i, sent in enumerate(sentences['sawyer']) if re.search(r".jpg" ,  sent)][:20]
# practice regex on .jpg
re.sub("\\b.*\\.jpg.*\\)", "XXX", sentences['sawyer'][357])

'XXX In due course the superintendent stood up in front of the pulpit, with a closed hymn-book in his hand and his forefinger inserted between its leaves, and commanded attention.'

convert to ascii or sort out apostrophe at sawyer

In [54]:
# convert to ascii or sort out apostrophe at sawyer
a = sentences['sawyer'][37][10:20]
# see unicode python 3 docs for inspo
b = ['%04x' % ord(i) for i in a] # why this weird modulus
list(zip(a,b))
chr(8217)
'\u2019'
#'\u2019'.encode('ascii') #error
unicodedata.normalize('NFKD', a)
import unidecode
a_new = unidecode.unidecode(a)
b_new = ['%04x' % ord(i) for i in a_new]
list(zip(a_new,b_new))

# find occurences of apostrophe char
test = {}
for name in sentences.keys():
  test[name] = sum(len(re.findall('\u2019', sent)) for sent in sentences[name])
test

{'alice': 0, 'crime': 3145, 'mansfield': 0, 'sawyer': 1738}



### Clean text


In [0]:
def clean_text(s):
  # for details, see https://www.tensorflow.org/alpha/tutorials/sequences/nmt_with_attention
  # normalizes unicode code points and removes no-space markers 
  #s = ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')
  # adds a space around any punct to allow it to be inputed as separate word
  s = re.sub(r"([?.!,¿])", r" \1 ", s) 
  s = re.sub(r'[" "]+', " ", s) # replaces quotes with spaces
  s = re.sub(r"\'", r"'", s) # remove escaped single quotes common in alice
  s = re.sub(r"--", r" ", s) # "--" --> " " for alice
  s = re.sub(r"-", r"", s) # "-" --> "" for alice
  s = re.sub("\\b.*\\.jpg.*\\)", "", s) # remove .jpgs for sawyer
  # replace all apostrophe's with single quotes and include single quotes in 
  # allowed chars (will I run into multiple quote types in unicode) to keep 
  # contractions whole. OR if I left apostophes and single quote substituted 
  # with spaces would the seq order aware RNN recognize "can t" as a common 
  # pair and interpret it as a whole word.
  #s = re.sub('\u2019', "'", s)
  s = re.sub(r"[^a-zA-Z?.!,]+", " ", s) # currently single quote ' not included
  s = s.strip()
  s = '<start> ' + s + ' <end>'
  return s

In [0]:
# apply cleaning steps
x_train = [clean_text(i) for i in x_train]
x_test = [clean_text(i) for i in x_test]
# save mem
#del sentences 



#### Tokenize and pad length 

Consider applying tokenization and add padding in one function. 
Tokenizer scope both is both test and train.

In [0]:
# Build vocabulary mapping
tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
tokenizer.fit_on_texts(x_test + x_train)

In [0]:
# apply word-to-int map
x_train = tokenizer.texts_to_sequences(x_train)
x_test = tokenizer.texts_to_sequences(x_test)


Decision Point!!: I am choosing to cap sentences after 35 words for <br>
computational ease. ~85% of sentences will be full with a 35 word cap<br>

In [59]:
# check dist of sentence lengths<br>
# note: sentences are 1 int per word so len() captures word count
a = [len(i) for i in x_train+x_test]
np.quantile(a, [0,.25,.5,.75,.8,.85,.9,1])

array([  4.,  12.,  19.,  33.,  37.,  42.,  50., 219.])

In [0]:
# pad seqs
pad = tf.keras.preprocessing.sequence.pad_sequences
x_train = pad(x_train, maxlen=35, padding='post', truncating='post')
x_test = pad(x_test, maxlen=35, padding='post', truncating='post')


### Define DNN1 model 

I beleive that word (as int) sequences
are not very powerful as a predictors here because sentence length varies,  many of the later sentences will have 0s in later positions making those positions  less predictive
Even if word order could be taken into account by this model, really the position of a word in a sentence is  not likely not very predictive . For this reason maybe a more non-linear model i.e. deeper 
would be better). 

And imprved model would be somehting that takes into account the mix of words used in <br>
a sentence -like a bag of words - at a minimum and possibly also word order (as an RNN does)... 

Q: Why is it not preferable to just put the int sentence in as input vector  instead of a flattened embedings layer as suggested in the Subclassing example model?


In [0]:
class DNN1(tf.keras.Model):
  def __init__(self, vocab_size, embedding_size = 10):
    # vocab size --> col width of input array i.e. len of input vec
    # embedding size --> size of embedding that each int in vocab will be 
    # mapped to by tf embedding
    super(DNN1, self).__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
    self.flatten = tf.keras.layers.Flatten()
    self.dense1 = tf.keras.layers.Dense(70, activation='relu')
    self.dense2 = tf.keras.layers.Dense(25, activation= 'relu')
    self.dense3 = tf.keras.layers.Dense(4, activation= 'softmax')
  
  def call(self, input_vec):
    x = self.embedding(input_vec)
    x = self.flatten(x)
    x = self.dense1(x)
    x = self.dense2(x)
    output_vec = self.dense3(x)
    return output_vec


#### Exploring with the DNN1 architecture

In [62]:
# +1 to accomodate the reserved int 0
vocab_size = len(tokenizer.word_index) + 1

dnn1 = DNN1(vocab_size)

input_vec = x_train[:12, :]
input_vec.shape # (12, 35) == (batch_size, padded sent len)

x = dnn1.embedding(input_vec)
# one row per obvs
x.shape # [12, 35, 10] == (batch_size, padded sent len, embedding_size)

# I am assuming that flatten puts concats the depth rows sequentially for
# coherence of input structure as a sentece vectorization
x = dnn1.flatten(x)
# now each 10 cell section represents a uniqe word
x.shape # [12, 350] == (batch_size, padded sent len * embedding_size)

x = dnn1.dense1(x)
x.shape # [12, 70] == (batch_size, dense1 nodes)

x = dnn1.dense2(x)
x.shape # [12, 25] == (batch_size, dense2 nodes)

output_vec = dnn1.dense3(x)
output_vec.shape # # [12, 25] == (batch_size, num classes)

TensorShape([12, 4])



### Define RNN1 model 

Take histor recording internal state and feed it into dense

Other ideas:
take each RNN seq elem output and throw that in a dense too? hidden state dense and the rnn output dense
look in text book for RNN architectures..<br>

Note: the RNN architecture reduces 65 dim RNN output directly into <br>
4 output classes from dense. Is this too rapid of a generalization not allowing <br>
for enough non-linearity and interaction effects?


In [0]:
class RNN1(tf.keras.Model):
  def __init__(self, vocab_size, embedding_size = 10, rnn_nodes = 65):
    # vocab size --> col width of input array i.e. len of input vec
    # embedding size --> size of embedding that each int in vocab will be 
    # mapped to by tf embedding
    super(RNN1, self).__init__()
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_size)
    self.gru = tf.keras.layers.GRU(rnn_nodes, 
                                   return_sequences=True, 
                                   return_state=True)
    #self.dense1 = tf.keras.layers.Dense(20, activation='relu')
    self.dense2 = tf.keras.layers.Dense(4, activation= 'softmax')
    self.rnn_nodes = rnn_nodes
  
  def call(self, input_vec, initial_state):
    x = self.embedding(input_vec)
    x, state = self.gru(x, initial_state)
    output_vec  = self.dense2(state)
    return output_vec

Compare parameter counts

In [65]:
# compare parameter counts
rnn1 = RNN1(vocab_size)
dnn1 = DNN1(vocab_size)

# call instance to figure out input size
BATCH_SIZE = 12
initial_state = tf.zeros((BATCH_SIZE, rnn1.rnn_nodes))
x = x_test[:BATCH_SIZE,:]
result = rnn1(x, initial_state)
dnn1(x)

print(rnn1.summary()) # 99,529
print(dnn1.summary()) # 110,699

Model: "rn_n1_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      multiple                  84250     
_________________________________________________________________
unified_gru_1 (UnifiedGRU)   multiple                  15015     
_________________________________________________________________
dense_7 (Dense)              multiple                  264       
Total params: 99,529
Trainable params: 99,529
Non-trainable params: 0
_________________________________________________________________
None
Model: "dn_n1_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      multiple                  84250     
_________________________________________________________________
flatten_2 (Flatten)          multiple                  0         
_________________________________

### Define Train Step<br>
i.e. forward pass and weight update step<br>


Consider making a Trainer class with a .train_step method and attrs<br>
for epoch results .. even use it to run epochs


In [0]:
# Define loss func and initialize optimizer

# for normal tf
loss_func = tf.keras.losses.SparseCategoricalCrossentropy()
# alt for tf GPU
#loss_func = lambda labs, preds: tf.reduce_mean(tf.keras.backend.sparse_categorical_crossentropy(labs, preds)) # function
optimizer = tf.keras.optimizers.Adam()

In [0]:
#@tf.function
def train_step(x_batch, y_batch, model, state = None, is_rnn = False):
  ## Calc forward pass and record graph
  with tf.GradientTape() as graph:
    if is_rnn: # alternatively I could include an unused state arg in DNN1.call()
      assert state is not None
      # resize initial state for batch_size of final remainder batch
      if state.shape[0] != x_batch.shape[0]:
        state = tf.slice(state, [0,0], [x_batch.shape[0], state.shape[1]])
      predictions = model(x_batch, state)
    else:
      predictions = model(x_batch)
    # train_loss is a scalar, the avg loss over all obsvs in batch
    train_loss = loss_func(y_batch,  predictions) 
  
  ## Update weights (in place)
  gradients = graph.gradient(train_loss, model.trainable_variables)
  # update model parameters in place
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  
  ## Record (return) results / batch avg train loss
  return train_loss, (y_batch, predictions)

Test accuracy function


In [0]:
def accuracy(x, y, model, is_rnn = False):
  assert all(isinstance(i, np.ndarray) for i in (x,y))
  if is_rnn:
    state = tf.zeros((x.shape[0], model.rnn_nodes))
    preds = model(x, state)
  else:
    preds = model(x)
  preds = np.argmax(preds, axis = 1)
  correct = (preds == y).sum()
  return (correct/len(preds)) * 100

Initialize cumulitive metric classes

In [0]:
cumul_avg_train_loss = tf.keras.metrics.Mean()
cumul_train_acc = tf.keras.metrics.SparseCategoricalAccuracy()

Create class for logging training results

In [0]:
class logger:
  path = './train_logs'
  
  def __init__(self, model):
    self.file = 'train log - %s.txt' % time.ctime()
    os.makedirs(logger.path, exist_ok=True)
    if not os.path.exists(os.path.join(logger.path, self.file)):
      config = inspect.getsource(model.__init__)
      config += '\n' + inspect.getsource(model.call)
      with open(os.path.join(logger.path, self.file), 'wt') as file:
        file.write(config)
  
  def write(self, report):
    with open(os.path.join(logger.path, self.file), 'at') as file:
        file.write(report)

Split data into batches

<br>
What would a better batch size be?<br>


In [0]:
BATCH_SIZE = 12

# tf dataset is a convenices for batch iteration (and more if desired)
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
dataset = dataset.batch(BATCH_SIZE)

x,y = next(iter(dataset))
result = train_step(x,y, dnn1)

### Train model and report progress


In [0]:
EPOCHS = 50
REPORT_FREQ = 5
RNN = True
#MODEL = DNN1(vocab_size)
MODEL = RNN1(vocab_size)

In [80]:
start = time.time()
date_time = time.ctime()
log = logger(MODEL)
for epoch in range(EPOCHS):
    epoch_start = time.time()
    
    if RNN:
      initial_state = tf.zeros((BATCH_SIZE, MODEL.rnn_nodes))
    
    # Train Batch
    batch = 0
    for batch, (x_batch, y_batch) in enumerate(dataset):
      batch +=1
      result = train_step(x_batch, y_batch, MODEL, state = initial_state, is_rnn=RNN)
      if epoch % REPORT_FREQ == 0:
        # record batch results
        train_loss, (labels, predictions) = result
        cumul_avg_train_loss(train_loss)
        cumul_train_acc(labels, predictions)
    
    # Report Results
    if epoch % REPORT_FREQ == 0:
      epoch_time = time.time() - epoch_start
      total_time = time.time() - start
      test_acc = accuracy(x_test, y_test, MODEL, is_rnn=RNN) 
      
      template = "Epoch #: %2d, Total Time: %.2f sec, Epoch Time: %.2f sec, Train Loss: %.4f, Train Acc: %.2f, Test Acc: %.2f"
      template = template \
        % (epoch, total_time, epoch_time, cumul_avg_train_loss.result(),
         cumul_train_acc.result(), test_acc)
      
      #print to stdout
      print(template)
      #record in log
      log.write(template)
      
      # reset cumulative metrics
      cumul_avg_train_loss.reset_states()
      cumul_train_acc.reset_states()
    

Epoch #: 0, Total Time: 25.10 sec, Epoch Time: 25.10 sec, Train Loss: 1.3435, Train Acc: 0.34, Test Acc: 34.02
Epoch #: 5, Total Time: 149.22 sec, Epoch Time: 24.98 sec, Train Loss: 0.6047, Train Acc: 0.74, Test Acc: 58.42
Epoch #: 10, Total Time: 274.23 sec, Epoch Time: 25.16 sec, Train Loss: 0.3061, Train Acc: 0.90, Test Acc: 60.81
Epoch #: 15, Total Time: 397.71 sec, Epoch Time: 24.87 sec, Train Loss: 0.1266, Train Acc: 0.96, Test Acc: 63.72
Epoch #: 20, Total Time: 522.12 sec, Epoch Time: 25.57 sec, Train Loss: 0.0845, Train Acc: 0.97, Test Acc: 66.62
Epoch #: 25, Total Time: 646.65 sec, Epoch Time: 25.21 sec, Train Loss: 0.0671, Train Acc: 0.98, Test Acc: 66.56
Epoch #: 30, Total Time: 771.83 sec, Epoch Time: 25.48 sec, Train Loss: 0.0479, Train Acc: 0.98, Test Acc: 67.01
Epoch #: 35, Total Time: 897.93 sec, Epoch Time: 25.29 sec, Train Loss: 0.0338, Train Acc: 0.99, Test Acc: 67.85
Epoch #: 40, Total Time: 1022.00 sec, Epoch Time: 25.15 sec, Train Loss: 0.0365, Train Acc: 0.99, T


<br>
Consider adding false positive and false negative by class measures<br>


results per epoch:<br>
1- avg per example loss per epoch i.e cumulative avg of batch avg loss reset <br>
every epoch

2- cumulative acc on dataset over one epoch