# Overview 

Jeremy Howard and Sebastian Ruder introduced ULMFiT (Universal Language Model Fine-tuning) in 2018. Paper link: https://arxiv.org/abs/1801.06146

The key steps of ULMFiT are:

* Training a language model on a large corpus
* Fine-tuning the language model on the data that will be used in the downstream task, such as text classification
* Adding a classifier head on top of the language model and fine-tuning again for classification

Some further details:

* The model fastai used initially is AWD-LSTM from Stephen Merity's [paper](https://arxiv.org/abs/1708.02182).
* Their implementation uses many regularization methods, such as weight dropout, embedding dropout, activation regularization, and dropout on the input and output layers.
* Fine-tuning is done with varying learning rates: the paper uses a different learning rate per layer (discriminative fine-tuning) and a slanted triangular schedule over the training iterations.
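
To make the idea of varying learning rates concrete, here is a minimal sketch of the two schedules described in the ULMFiT paper: the slanted triangular learning rate and per-layer (discriminative) learning rates. The function names and the default values (`lr_max=0.01`, `cut_frac=0.1`, `ratio=32`, `decay=2.6`) are illustrative choices taken from the paper's suggested settings, not part of this notebook's implementation, and the sketch assumes `total_steps * cut_frac` is at least 1.

```python
import math

def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at iteration t (0-based) out of total_steps."""
    cut = math.floor(total_steps * cut_frac)  # iteration where the rate peaks
    if t < cut:
        p = t / cut  # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

def discriminative_lrs(lr_top, n_layers, decay=2.6):
    """Per-layer learning rates, lowest layer first; each lower layer is divided by decay."""
    return [lr_top / decay ** (n_layers - 1 - i) for i in range(n_layers)]

schedule = [slanted_triangular_lr(t, total_steps=1000) for t in range(1000)]
layer_lrs = discriminative_lrs(lr_top=0.01, n_layers=3)
print(schedule[0], max(schedule), layer_lrs)
```

The schedule starts at `lr_max / ratio`, rises to `lr_max` after 10% of the iterations, and then decays back. The top layer gets the full rate while lower layers get smaller ones.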

Here we stick to the basics. This implementation has a language module that is first trained on the wikitext-103 dataset, then fine-tuned on the IMDB dataset, and finally used for sentiment classification on the IMDB dataset. For the demo I only use fake data to show the process; the model has not been trained yet.

![](ulmfit_approach.png)

### Imports

In [3]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.python.ops import lookup_ops
from tensorflow.python.training.tracking import tracking


from absl import app
from absl import flags
import numpy as np
import tensorflow.compat.v2 as tf
import os
import tempfile
import re
import html 

In [9]:
print(tf.__version__)

2.0.0-beta1


In [25]:
from tensorflow.keras.layers import Dense, Flatten, Embedding, LSTM, Input
from tensorflow.keras import Model
from tensorflow.keras.layers import BatchNormalization, Dropout,GlobalMaxPooling1D,GlobalAveragePooling1D,concatenate


# Get Data for Language Model

In [11]:
# Get Wikitext 103
!wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip

--2019-08-19 09:45:58--  https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.99.117
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.99.117|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 190229076 (181M) [application/zip]
Saving to: ‘wikitext-103-v1.zip’


2019-08-19 09:49:07 (986 KB/s) - ‘wikitext-103-v1.zip’ saved [190229076/190229076]



In [12]:
!unzip wikitext-103-v1.zip

Archive:  wikitext-103-v1.zip
   creating: wikitext-103/
  inflating: wikitext-103/wiki.test.tokens  
  inflating: wikitext-103/wiki.valid.tokens  
  inflating: wikitext-103/wiki.train.tokens  


# Get IMDB Dataset 

In [13]:
!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz

--2019-08-19 09:49:21--  https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘aclImdb_v1.tar.gz’


2019-08-19 09:49:23 (48.5 MB/s) - ‘aclImdb_v1.tar.gz’ saved [84125825/84125825]



In [14]:
!tar xzf aclImdb_v1.tar.gz

In [15]:
!ls

aclImdb		   Full ULMFiT.ipynb		   wikitext-103
aclImdb_v1.tar.gz  ulmfit_module_classifier.ipynb  wikitext-103-v1.zip


# Preprocess Wikitext-103


The data is one long text file, so we need to split it into articles before language modelling.

In [5]:
train_path = "wikitext-103/wiki.train.tokens"
valid_path = "wikitext-103/wiki.valid.tokens"
test_path = "wikitext-103/wiki.test.tokens"

In [6]:
def istitle(line):
    return len(re.findall(r'^ = [^=]* = $', line)) != 0
    
UNK = "unk"
def read_wiki(filename):
    articles = []
    with open(filename, encoding='utf8') as f:
        lines = f.readlines()
    current_article = ''
    for i,line in enumerate(lines):
        current_article += line
        if i < len(lines)-2 and lines[i+1] == ' \n' and istitle(lines[i+2]):
            current_article = current_article.replace('<unk>', UNK)
            articles.append(current_article)
            current_article = ''
    current_article = current_article.replace('<unk>', UNK)
    articles.append(current_article)
    return articles
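
As a quick illustration of how the title pattern separates articles, the sketch below applies the same regular expression to three sample lines. The helper name `is_article_title` and the sample lines are made up for this example; only the pattern is taken from the cell above.

```python
import re

def is_article_title(line):
    # Same pattern as istitle: a single '=' on each side and none in between.
    return re.search(r'^ = [^=]* = $', line) is not None

sample_lines = [
    " = Gambia women 's national football team = \n",  # article title
    " = = History = = \n",                             # section heading
    " The team has not competed in a match . \n",      # body text
]
is_title = [is_article_title(line) for line in sample_lines]
print(is_title)  # [True, False, False]
```

Only the top-level title matches, so section headings inside an article do not start a new article.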

In [7]:
def preprocess(x):
  x = x.strip().lower()
  
  # fix html 
  re1 = re.compile(r'  +')
  x = x.replace('#39;', "'").replace('amp;', '&').replace('#146;', "'").replace(
        'nbsp;', ' ').replace('#36;', '$').replace('\\n', "\n").replace('quot;', "'").replace(
        '<br />', "\n").replace('\\"', '"').replace('<unk>',UNK).replace(' @.@ ','.').replace(
        ' @-@ ','-').replace(' @,@ ',',').replace('\\', ' \\ ')
  x=re1.sub(' ', html.unescape(x))
  
  # Add spaces around /, # and newlines in `x`.
  x=re.sub(r'([/#\n])', r' \1 ', x)
  
  # Collapse runs of spaces in `x` into a single space.
  x=re.sub(' {2,}', ' ', x)
  
  return '<S> '+x+' <E>'
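
The `@.@`, `@-@` and `@,@` replacements undo the way wikitext-103 splits numbers and hyphenated words. A minimal sketch of just that step, using a hypothetical helper name and a made-up sentence:

```python
def join_wikitext_markers(text):
    # Same three replacements as in preprocess, isolated for illustration.
    return text.replace(' @.@ ', '.').replace(' @-@ ', '-').replace(' @,@ ', ',')

sample = "about 1 @,@ 500 well @-@ known players scored 3 @.@ 5 goals"
joined = join_wikitext_markers(sample)
print(joined)  # about 1,500 well-known players scored 3.5 goals
```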

In [8]:
wiki_articles = read_wiki(train_path)
wiki_valid_articles = read_wiki(valid_path)

In [9]:
wiki_articles = [preprocess(article) for article in wiki_articles]
wiki_valid_articles = [preprocess(article) for article in wiki_valid_articles]

In [10]:
print(wiki_articles[3][0:300])

<S> = gambia women 's national football team = 
 
 the gambia women 's national football team represents the gambia in international football competition . the team , however , has not competed in a match recognised by fifa , the sport 's international governing body , despite that organised women '


## Build Vocabulary from training data

In [11]:
from collections import Counter
def build_vocab(data,max_words,min_freq):
    counter = Counter([word for para in data for word in para.split()])
    vocab = [word[0] for word in counter.most_common() if word[1] >= min_freq]
    return vocab[:max_words]
    

In [12]:
train_vocab = build_vocab(wiki_articles,max_words=10000,min_freq=3)

In [13]:
len(train_vocab)

10000
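
One way to judge whether `max_words` and `min_freq` are reasonable is to measure how many tokens of a corpus the vocabulary covers. The sketch below is an assumption about how one might check this; `vocab_coverage` is a hypothetical helper and the toy data is made up.

```python
from collections import Counter

def vocab_coverage(data, vocab):
    """Fraction of whitespace-separated tokens in data that appear in vocab."""
    vocab_set = set(vocab)
    counts = Counter(word for para in data for word in para.split())
    total = sum(counts.values())
    covered = sum(count for word, count in counts.items() if word in vocab_set)
    return covered / total

toy_data = ["a b a c", "a b d"]
coverage = vocab_coverage(toy_data, ["a", "b"])
print(coverage)  # 5 of the 7 tokens are in the vocabulary
```

On the real data this could be called as `vocab_coverage(wiki_valid_articles, train_vocab)` to estimate how often the model will see out-of-vocabulary tokens.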

# Preprocess IMDB Dataset

Labels: 0 for positive, 1 for negative.

In [14]:
dir_names = ['pos','neg']

imdb_file_paths = []
imdb_labels = []
# The label is the index of the directory in dir_names: pos -> 0, neg -> 1.
for i, dir_name in enumerate(dir_names):
  dir_path = os.path.join("aclImdb/train", dir_name)
  file_names = [os.path.join(dir_path, name) for name in os.listdir(dir_path)]
  imdb_file_paths += file_names
  imdb_labels += [i]*len(file_names)
  


In [15]:
np.random.seed(42)
permutation = np.random.permutation(len(imdb_file_paths))

imdb_file_paths = np.array(imdb_file_paths)[permutation]
imdb_labels = np.array(imdb_labels)[permutation]

In [16]:
print(len(imdb_file_paths),len(imdb_labels))

25000 25000
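
Because paths and labels are shuffled with the same permutation, a validation split can be taken by simple slicing. This is a sketch under the assumption that a held-out fraction is wanted; `train_valid_split` and the toy file names are hypothetical and not part of the notebook.

```python
import numpy as np

def train_valid_split(paths, labels, valid_fraction=0.1):
    """Hold out the first valid_fraction of already shuffled arrays."""
    n_valid = int(len(paths) * valid_fraction)
    return (paths[n_valid:], labels[n_valid:]), (paths[:n_valid], labels[:n_valid])

# Toy stand-ins for imdb_file_paths and imdb_labels.
toy_paths = np.array(["pos/%d.txt" % i for i in range(6)] + ["neg/%d.txt" % i for i in range(4)])
toy_labels = np.array([0] * 6 + [1] * 4)

# Shuffle both arrays with one shared permutation so pairs stay aligned.
perm = np.random.RandomState(42).permutation(len(toy_paths))
toy_paths, toy_labels = toy_paths[perm], toy_labels[perm]

(train_p, train_y), (valid_p, valid_y) = train_valid_split(toy_paths, toy_labels, 0.2)
print(len(train_p), len(valid_p))  # 8 2
```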


In [17]:
imdb_reviews = []
for file in imdb_file_paths:
  with open(file,encoding='utf-8') as f:
    data = f.read()
  data = preprocess(data)
  imdb_reviews.append(data)

In [18]:
print(imdb_reviews[0],imdb_labels[0])

<S> an end of an era was released here in the states in spring 2002 with "the rookie," a disney live action film that seemed to be the "best for last!!!!!" it took place right here in texas! actually, the story began in west texas, as evidenced by an area code found on a sign over there. it was about a high school coach who was so convinced by his high class baseball team that he decided to go professional!!!!! 
 
 what i liked about this movie: it was sooo nice!!!!! it was a very good sports movie, ala "the mighty ducks" trilogy. it had also taken moviegoers across texas, from somewhere between the panhandle and el paso all the way to the metroplex (where i live). i can tell because i recognize that ballpark (was "the ballpark in arlington;" now it's "ameriquest field")! it was nice to see disney's "golden age" end here in my area!!!!! 
 
 r.i.p. 
 
 golden age of disney 
 
 1920s-spring 2002 
 
 "it all started with a mouse...and it ended with baseball." (sobs) 
 
 10 / 10 <E> 0


# Create the language model

In [19]:
class LanguageModelEncoder(tf.train.Checkpoint):
    def __init__(self,vocab_size,emb_dim,state_size,n_layers):
        super(LanguageModelEncoder, self).__init__()
        self._state_size = state_size
        self.embedding_layer = Embedding(vocab_size,emb_dim)
        self._lstm_layers = [LSTM(self._state_size,return_sequences=True) for i in range(n_layers)]
        
    @tf.function(input_signature=[tf.TensorSpec([None,None], tf.dtypes.int64)])
    def __call__(self,sentence_lookup_ids):
        
        emb_output = self.embedding_layer(sentence_lookup_ids)
        lstm_output = emb_output # initialize to the input
        for lstm_layer in self._lstm_layers:
            lstm_output = lstm_layer(lstm_output)
        return lstm_output
        
        

In [20]:
def write_vocabulary_file(vocabulary):
  """Write temporary vocab file for module construction."""
  tmpdir = tempfile.mkdtemp()
  vocabulary_file = os.path.join(tmpdir, "tokens.txt")
  with tf.io.gfile.GFile(vocabulary_file, "w") as f:
    for entry in vocabulary:
      f.write(entry + "\n")
  return vocabulary_file
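
The vocabulary file is used to build a word-to-id table with out-of-vocabulary (OOV) buckets: known words map to their line number, and unknown words are hashed into ids that start right after the vocabulary. The pure-Python sketch below mimics that behaviour. `make_lookup` is a hypothetical helper, and `zlib.crc32` is only a stand-in for the hash TensorFlow uses internally.

```python
import zlib

def make_lookup(vocab, num_oov_buckets=1):
    word_to_id = {word: i for i, word in enumerate(vocab)}

    def lookup(word):
        if word in word_to_id:
            return word_to_id[word]
        # Stand-in hash; TensorFlow uses its own fingerprint function.
        bucket = zlib.crc32(word.encode("utf-8")) % num_oov_buckets
        return len(vocab) + bucket

    return lookup

toy_vocab = ["<S>", "<E>", "hello", "there"]
lookup = make_lookup(toy_vocab, num_oov_buckets=1)
print(lookup("hello"), lookup("zebra"))  # 2 4
```

This is why the embedding and the output layer need `len(vocab) + buckets` rows rather than `len(vocab)`.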

In [21]:
class ULMFiTModule(tf.train.Checkpoint):
  """
  Trains a language model on given sentences
  """

  def __init__(self, vocab, emb_dim, buckets, state_size,n_layers):
    super(ULMFiTModule, self).__init__()
    self._buckets = buckets
    self._vocab_size = len(vocab)
    self.emb_row_size = self._vocab_size+self._buckets
    #self._embeddings = tf.Variable(tf.random.uniform(shape=[self.emb_row_size, emb_dim]))
    self._state_size = state_size
    self.model = LanguageModelEncoder(self.emb_row_size,emb_dim,state_size,n_layers)
    self._vocabulary_file = tracking.TrackableAsset(write_vocabulary_file(vocab)) 
    self.w2i_table = lookup_ops.index_table_from_file(
                    vocabulary_file= self._vocabulary_file,
                    num_oov_buckets=self._buckets,
                    hasher_spec=lookup_ops.FastHashSpec)
    self.i2w_table = lookup_ops.index_to_string_table_from_file(
                    vocabulary_file=self._vocabulary_file, 
                    delimiter = '\n',
                    default_value="UNKNOWN")
    self._logit_layer = tf.keras.layers.Dense(self.emb_row_size)
    self.optimizer = tf.keras.optimizers.Adam()


    
  def _tokenize(self, sentences):
    # Perform a minimalistic text preprocessing by removing punctuation and
    # splitting on spaces.
    normalized_sentences = tf.strings.regex_replace(
        input=sentences, pattern=r"\pP", rewrite="")
    sparse_tokens = tf.strings.split(normalized_sentences, " ").to_sparse()

    # Deal with a corner case: there is one empty sentence.
    sparse_tokens, _ = tf.sparse.fill_empty_rows(sparse_tokens, tf.constant(""))
    # Deal with a corner case: all sentences are empty.
    sparse_tokens = tf.sparse.reset_shape(sparse_tokens)

    return (sparse_tokens.indices, sparse_tokens.values,
            sparse_tokens.dense_shape)
    
  def _indices_to_words(self, indices):
    #return tf.gather(self._vocab_tensor, indices)
    return self.i2w_table.lookup(indices)
    

  def _words_to_indices(self, words):
    #return tf.strings.to_hash_bucket(words, self._buckets)
    return self.w2i_table.lookup(words)
  
  @tf.function(input_signature=[tf.TensorSpec([None],tf.dtypes.string)])   
  def _tokens_to_lookup_ids(self,sentences):
    token_ids, token_values, token_dense_shape = self._tokenize(sentences)
    tokens_sparse = tf.sparse.SparseTensor(
        indices=token_ids, values=token_values, dense_shape=token_dense_shape)
    tokens = tf.sparse.to_dense(tokens_sparse, default_value="")

    sparse_lookup_ids = tf.sparse.SparseTensor(
        indices=tokens_sparse.indices,
        values=self._words_to_indices(tokens_sparse.values),
        dense_shape=tokens_sparse.dense_shape)
    lookup_ids = tf.sparse.to_dense(sparse_lookup_ids, default_value=0)
    return tokens,lookup_ids
        
    

  @tf.function(input_signature=[tf.TensorSpec([None], tf.dtypes.string)])
  def train(self, sentences):
    tokens,lookup_ids = self._tokens_to_lookup_ids(sentences)
    # Targets are the next word for each word of the sentence.
    tokens_ids_seq = lookup_ids[:, 0:-1]
    tokens_ids_target = lookup_ids[:, 1:]
    tokens_prefix = tokens[:, 0:-1]

    # Mask determining which positions we care about for a loss: all positions
    # that have a valid non-terminal token.
    mask = tf.logical_and(
        tf.logical_not(tf.equal(tokens_prefix, "")),
        tf.logical_not(tf.equal(tokens_prefix, "<E>")))

    input_mask = tf.cast(mask, tf.int32)

    with tf.GradientTape() as t:
      #sentence_embeddings = tf.nn.embedding_lookup(self._embeddings,tokens_ids_seq)
    
      lstm_output = self.model(tokens_ids_seq)
      lstm_output = tf.reshape(lstm_output, [-1,self._state_size])
      logits = self._logit_layer(lstm_output)
      

      targets = tf.reshape(tokens_ids_target, [-1])
      weights = tf.cast(tf.reshape(input_mask, [-1]), tf.float32)

      losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=targets, logits=logits)

      # Final loss is the mean loss for all token losses.
      final_loss = tf.math.divide(
          tf.reduce_sum(tf.multiply(losses, weights)),
          tf.reduce_sum(weights),
          name="final_loss")

    watched = t.watched_variables()
    gradients = t.gradient(final_loss, watched)
    self.optimizer.apply_gradients(zip(gradients, watched))

    #for w, g in zip(watched, gradients):
    #  w.assign_sub(g)

    return final_loss
  
  @tf.function(input_signature=[tf.TensorSpec([None], tf.dtypes.string)])  
  def validate(self,sentences):
    tokens,lookup_ids = self._tokens_to_lookup_ids(sentences)
    # Targets are the next word for each word of the sentence.
    tokens_ids_seq = lookup_ids[:, 0:-1]
    tokens_ids_target = lookup_ids[:, 1:]
    tokens_prefix = tokens[:, 0:-1]

    # Mask determining which positions we care about for a loss: all positions
    # that have a valid non-terminal token.
    mask = tf.logical_and(
        tf.logical_not(tf.equal(tokens_prefix, "")),
        tf.logical_not(tf.equal(tokens_prefix, "<E>")))

    input_mask = tf.cast(mask, tf.int32)

    lstm_output = self.model(tokens_ids_seq)
    lstm_output = tf.reshape(lstm_output, [-1,self._state_size])
    logits = self._logit_layer(lstm_output)
      

    targets = tf.reshape(tokens_ids_target, [-1])
    weights = tf.cast(tf.reshape(input_mask, [-1]), tf.float32)

    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=targets, logits=logits)

    # Final loss is the mean loss for all token losses.
    final_loss = tf.math.divide(
          tf.reduce_sum(tf.multiply(losses, weights)),
          tf.reduce_sum(weights),
          name="final_validation_loss")

    return final_loss
    
  @tf.function
  def decode_greedy(self, sequence_length, first_word):

    sequence = [first_word]
    current_word = first_word
    current_id = tf.expand_dims(self._words_to_indices(current_word), 0)

    for _ in range(sequence_length):
      lstm_output = self.model(tf.expand_dims(current_id,0))
      lstm_output = tf.reshape(lstm_output, [-1,self._state_size])
      logits = self._logit_layer(lstm_output)
      softmax = tf.nn.softmax(logits)

      next_ids = tf.math.argmax(softmax, axis=1)
      next_words = self._indices_to_words(next_ids)[0]
      
      current_id = next_ids
      current_word = next_words
      sequence.append(current_word)

    return sequence
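
The loss in `train` and `validate` can be hard to follow inside the graph code, so here is a minimal NumPy sketch of the same idea: shift the ids by one position to get next-token targets, mask out padding and end-of-sequence inputs, and average the cross-entropy over the remaining positions. The id values (0 for padding, 1 for `<S>`, 2 for `<E>`) and the helper name are assumptions made for this toy example; the module itself builds the mask from token strings.

```python
import numpy as np

def masked_mean_cross_entropy(logits, targets, weights):
    """logits: [N, V], targets: [N] integer ids, weights: [N] of 0.0 or 1.0."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    losses = -log_probs[np.arange(len(targets)), targets]
    return (losses * weights).sum() / weights.sum()

# Toy batch of ids: 0 = padding, 1 = <S>, 2 = <E> (hypothetical ids).
ids = np.array([[1, 5, 6, 2],
                [1, 7, 2, 0]])
inputs, targets = ids[:, :-1], ids[:, 1:]

# Keep positions whose input token is neither padding nor <E>.
mask = (inputs != 0) & (inputs != 2)

# Uniform predictions over 8 ids, so every position has loss log(8).
logits = np.zeros((inputs.size, 8))
loss = masked_mean_cross_entropy(logits, targets.reshape(-1), mask.reshape(-1).astype(float))
print(loss, np.log(8))
```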


In [22]:
#module = ULMFiTModule(vocab=train_vocab,emb_dim=128,buckets=1,state_size=128,n_layers=2)

In [23]:
#for epoch in range(2):
#    train_loss = module.train(tf.constant(wiki_articles))
#    validation_loss = module.validate(tf.constant(wiki_valid_articles))
#    print("Epoch ",epoch," Train loss: ",train_loss.numpy()," Validation loss ",validation_loss.numpy())


In [26]:
sentences = ["<S> hello there <E>", "<S> how are you doing today <E>","<S> I am fine thank you <E>",
             "<S> hello world <E>", "<S> who are you? <E>"]
validation_sentences = ["<S> hello there <E>", "<S> how are you doing today <E>","<S> I am fine thank you <E>"]
vocab = [
      "<S>", "<E>", "hello", "there", "how", "are", "you", "doing", "today","I","am","fine","thank","world","who"]

module = ULMFiTModule(vocab=vocab, emb_dim=10, buckets=1, state_size=128,n_layers=1)

for epoch in range(200):
    train_loss = module.train(tf.constant(sentences))
    validation_loss = module.validate(tf.constant(validation_sentences))
    print("Epoch ",epoch," Train loss: ",train_loss.numpy()," Validation loss ",validation_loss.numpy())




W0819 10:07:01.917144 139776025814784 deprecation.py:323] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/lookup_ops.py:985: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Epoch  0  Train loss:  2.7724257  Validation loss  2.769445
Epoch  1  Train loss:  2.7691047  Validation loss  2.7662005
Epoch  2  Train loss:  2.7657092  Validation loss  2.7627726
Epoch  3  Train loss:  2.7621527  Validation loss  2.75908
Epoch  4  Train loss:  2.7583556  Validation loss  2.75504
Epoch  5  Train loss:  2.754243  Validation loss  2.75057
Epoch  6  Train loss:  2.7497408  Validation loss  2.7455854
Epoch  7  Train loss:  2.7447755  Validation loss  2.7399938
Epoch  8  Train loss:  2.739268  Validation loss  2.733692
Epoch  9  Train loss:  2.7331305  Validation loss  2.726563
Epoch  10  Train loss:  2.7262645  Validation loss  2.7184708
Epoch  11  Train loss:  2.7185562  Validation loss  2.709254
Epoch  12  Train loss:  2.7098737  Validation loss  2.6987262
Epoch  13  Train loss:  2.7000654  Validation loss  2.6866689
Epoch  14  Train loss:  2.6889555  Validation loss  2.6728287
Epoch  15  Train loss:  2.6763406  Validation loss  2.6569138
Epoch  16  Train loss:  2.6619
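
A quick sanity check on these numbers: an untrained model that predicts every id with equal probability has a cross-entropy of ln(number of ids). Assuming the output layer covers the 15 entries of the toy `vocab` plus one OOV bucket, that gives ln(16), which is close to the first reported train loss.

```python
import math

vocab_size = 15   # entries in the toy vocab list
oov_buckets = 1   # buckets=1 in the ULMFiTModule constructor
n_ids = vocab_size + oov_buckets

uniform_loss = math.log(n_ids)
first_train_loss = 2.7724257  # epoch 0 value from the output above

print(uniform_loss)                # loss of a uniform prediction over 16 ids
print(math.exp(first_train_loss))  # perplexity at epoch 0, just under 16
```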

In [27]:
# We have to call this function explicitly if we want it exported, because it
# has no input_signature in the @tf.function decorator.
decoded = module.decode_greedy(sequence_length=10, first_word=tf.constant("<S> you"))
decoded_words = [d.numpy() for d in decoded]
print(decoded_words)


[b'<S> you', b'hello', b'<E>', b'hello', b'<E>', b'hello', b'<E>', b'hello', b'<E>', b'hello', b'<E>']


In [28]:
tf.saved_model.save(module,"test")

In [None]:
module = tf.saved_model.load("test")

# Fine-tune on the IMDB dataset

In [None]:
for epoch in range(200):
    train_loss = module.train(tf.constant(imdb_reviews))
    #validation_loss = module.validate(tf.constant(validation_sentences))
    print("Epoch ",epoch," Train loss: ",train_loss.numpy())



# Classifier Head 


The classifier head takes the final-layer output of the language model and computes the average pool and the max pool over its time steps. It then passes the concatenation of the last time step's hidden state, the max-pool result, and the average-pool result through a given number of Dense-Dropout-BatchNormalization blocks. Finally, it produces the classifier output probabilities.
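
The pooling step can be sketched in NumPy to show the shapes involved. `concat_pool` is a hypothetical helper name and the input is a made-up array; the real model applies the Keras pooling layers to the encoder output.

```python
import numpy as np

def concat_pool(enc_out):
    """enc_out: [batch, time, state] -> features of shape [batch, 3 * state]."""
    last_h = enc_out[:, -1, :]       # hidden state at the last time step
    max_pool = enc_out.max(axis=1)   # max over time
    avg_pool = enc_out.mean(axis=1)  # mean over time
    return np.concatenate([last_h, max_pool, avg_pool], axis=1)

enc_out = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
features = concat_pool(enc_out)
print(features.shape)  # (2, 12)
```

With a state size of 128, as in the demo module, the first dense layer therefore receives 384 features per example.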

In [29]:
class LanguageClassifier(Model):
    def __init__(self,language_module,num_labels,dense_units=(128,128),dropouts=(0.1,0.1)):
        
        # initialization stuff
        super(LanguageClassifier,self).__init__()
        self._language_module = language_module
        self.model_encoder = language_module.model
        
        
        # classifier head layers
        self.dense_layers = [Dense(units,activation="relu") for units in dense_units]
        self.dropout_layers = [Dropout(p) for p in dropouts]
        self.max_pool_layer = GlobalMaxPooling1D()
        self.average_pool_layer = GlobalAveragePooling1D()
        # one batch-normalization layer per block, so blocks of different widths also work
        self.batchnorm_layers = [BatchNormalization() for _ in dense_units]
        self.n_layers = len(self.dense_layers)
        # softmax so the outputs form a distribution for SparseCategoricalCrossentropy
        self.final_layer = Dense(num_labels,activation="softmax")
        
    def __call__(self,sentences):
        
        tokens,lookup_ids = self._language_module._tokens_to_lookup_ids(sentences)
        self.enc_out = self.model_encoder(lookup_ids)
        last_h = self.enc_out[:,-1,:]
        max_pool_output = self.max_pool_layer(self.enc_out)
        average_pool_output = self.average_pool_layer(self.enc_out)
        
        output = concatenate([last_h,max_pool_output,average_pool_output])
        
        for i in range(self.n_layers):
            output = self.dense_layers[i](output)
            output = self.dropout_layers[i](output)
            output = self.batchnorm_layers[i](output)
        
        final_output = self.final_layer(output)
        return final_output        

In [30]:
model = LanguageClassifier(num_labels=2,language_module=module)

# Classifier Training 

In [31]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')

In [32]:
labels = tf.constant([[1],[0],[1],[0],[0]])

train_loss_hist = []
train_accuracy_hist = []
valid_loss_hist = []
valid_accuracy_hist = []

def track(tl_score,tl_accuracy,vl_score,vl_accuracy):
    train_loss_hist.append(tl_score)
    train_accuracy_hist.append(tl_accuracy)
    valid_loss_hist.append(vl_score)
    valid_accuracy_hist.append(vl_accuracy)

@tf.function
def train_step(samples, labels):
  with tf.GradientTape() as tape:
    predictions = model(samples)
    loss = loss_object(labels, predictions)
  #print(tape.watched_variables())
  watched = tape.watched_variables()
  gradients = tape.gradient(loss, watched)
  optimizer.apply_gradients(zip(gradients, watched))

  train_loss(loss)
  train_accuracy(labels, predictions)
    
    
@tf.function
def test_step(samples, labels):
  predictions = model(samples)
  t_loss = loss_object(labels, predictions)
  test_loss(t_loss)
  test_accuracy(labels, predictions)
    
    
EPOCHS = 10

for epoch in range(EPOCHS):
  train_step(sentences, labels)
  #test_step(validation_sentences, validation_labels)

  template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
  print(template.format(epoch+1,
                        train_loss.result(),
                        train_accuracy.result()*100,
                        test_loss.result(),
                        test_accuracy.result()*100))
    
  track(train_loss.result(),train_accuracy.result()*100,test_loss.result(),test_accuracy.result()*100)
  # Reset the metrics for the next epoch
  train_loss.reset_states()
  train_accuracy.reset_states()
  test_loss.reset_states()
  test_accuracy.reset_states()

Epoch 1, Loss: 0.7027415633201599, Accuracy: 40.0, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 2, Loss: 0.6726537942886353, Accuracy: 60.000003814697266, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 3, Loss: 0.6493236422538757, Accuracy: 60.000003814697266, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 4, Loss: 0.6254310011863708, Accuracy: 60.000003814697266, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 5, Loss: 0.6113171577453613, Accuracy: 80.0, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 6, Loss: 0.5895274877548218, Accuracy: 80.0, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 7, Loss: 0.5615838766098022, Accuracy: 80.0, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 8, Loss: 0.5289801955223083, Accuracy: 80.0, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 9, Loss: 0.5038028955459595, Accuracy: 80.0, Test Loss: 0.0, Test Accuracy: 0.0
Epoch 10, Loss: 0.47262048721313477, Accuracy: 80.0, Test Loss: 0.0, Test Accuracy: 0.0


# Visualizing Loss 