<a href="https://colab.research.google.com/github/bengsoon/lstm_lord_of_the_rings/blob/main/LOTR_LSTM_Character_Level_OneHot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Creating a Language Model with LSTM using Lord of The Rings Corpus
In this notebook, we will create a character-level language model using an LSTM, with **one-hot encoding** of the input vectors.

### Imports

In [None]:
# run this if you're running through paperspace
!pip install -r requirements.txt

Collecting tensorflow==2.7.0
  Downloading tensorflow-2.7.0-cp38-cp38-manylinux2010_x86_64.whl (489.6 MB)

Installing collected packages: numpy, tensorflow-io-gcs-filesystem, tensorflow-estimator, libclang, keras, tensorflow, regex
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.4
    Uninstalling numpy-1.19.4:
      Successfully uninstalled numpy-1.19.4
  Attempting uninstall: tensorflow-estimator
    Found existing installation: tensorflow-estimator 2.6.0
    Uninstalling tensorflow-estimator-2.6.0:
      Successfully uninstalled tensorflow-estimator-2.6.0
  Attempting uninstall: keras
    Found existing installation: keras 2.6.0
    Uninstalling keras-2.6.0:
      Successfully uninstalled keras-2.6.0
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 2.6.0+nv
    Uninstalling tensorflow-2.6.0+nv:
      Successfully uninstalled tensorflow-2.6.0+nv
Successfully installed keras-2.7.0 libclang-12.0.0 numpy-1.19.5 regex-2021.11.10 tensorflow-2.7.0 tensorflow-estimator-2.7.0 tensorflow-io-gcs-filesystem-0.22.0
You should consider upgrading 

In [None]:
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from tensorflow.keras.layers import Embedding, Input, LSTM, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import LearningRateScheduler, ModelCheckpoint
from tensorflow.keras import Model
import numpy as np 
from tensorflow.keras.models import load_model

from pprint import pprint as pp
from string import punctuation
import regex as re
import random
import os
from pathlib import Path
import math

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

### Data Preprocessing & Pipeline

In [None]:
# get LOTR full text
# !wget https://raw.githubusercontent.com/bengsoon/lstm_lord_of_the_rings/main/lotr_full.txt -P /content/drive/MyDrive/Colab\ Notebooks/LOTR_LSTM/data

#### Loading Data

In [None]:
path = Path(r"./")

In [None]:
with open(path/ "data/lotr_full.txt", "r", encoding="utf-8") as f:
    text = f.read()
print(text[:1000])

Three Rings for the Elven-kings under the sky,
               Seven for the Dwarf-lords in their halls of stone,
            Nine for Mortal Men doomed to die,
              One for the Dark Lord on his dark throne
           In the Land of Mordor where the Shadows lie.
               One Ring to rule them all, One Ring to find them,
               One Ring to bring them all and in the darkness bind them
           In the Land of Mordor where the Shadows lie.
           
FOREWORD

This tale grew in the telling, until it became a history of the Great War of the Ring and included many glimpses of the yet more ancient history that preceded it. It was begun soon after _The Hobbit_ was written and before its publication in 1937; but I did not go on with this sequel, for I wished first to complete and set in order the mythology and legends of the Elder Days, which had then been taking shape for some years. I desired to do this for my own satisfaction, and I had little hope that other people 

In [None]:
print(f"Corpus length: {int(len(text)) / 1000 } K characters")

Corpus length: 1532.723 K characters


## One-Hot Encoding Model

In [None]:
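def standardize_text_string(text: str) -> str:
    """
        Custom standardization that:
            1. Replaces every whitespace character (newline, tab, ...) with a space
               (runs of whitespace are not collapsed)
            2. Removes punctuation & digits
            3. Lowercases the text
            4. Preserves the accented (Elvish) characters
    """
    
    # note: the character class [\s+] matches one whitespace character or a literal "+"
    text = re.sub(r"[\s+]", " ", text)
    text = re.sub(r"[0-9]", "", text)
    # escape the punctuation so characters such as "]" and "^" are treated literally
    text = re.sub(f"[{re.escape(punctuation)}–]", "", text)

    return text.lower()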
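
The whitespace step in `standardize_text_string` replaces characters one at a time, so an indented line break stays as a run of spaces. The sketch below contrasts that with a collapsing pattern. It uses the standard-library `re` module rather than `regex`, and `sample_line` is a made-up string for illustration, not taken from the corpus.

```python
import re

# hypothetical input: a line break followed by four spaces of indentation
sample_line = "One Ring\n    to rule"

# character class: each whitespace character (or a literal "+") becomes one space
per_char = re.sub(r"[\s+]", " ", sample_line)

# quantifier: a whole run of whitespace becomes a single space
collapsed = re.sub(r"\s+", " ", sample_line)

print(repr(per_char))   # run of five spaces is kept
print(repr(collapsed))  # single space between words
```

This may be why runs of several spaces show up in the generated text further down. Switching to the collapsing pattern would change the corpus length and therefore the number of training examples reported in this notebook.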

In [None]:
# get unique characters in the text
chars = sorted(set(standardize_text_string(text)))

In [None]:
print(chars, f"\n\nTotal unique characters: {len(chars)}")

[' ', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'á', 'â', 'ä', 'é', 'ë', 'í', 'ó', 'ú', 'û'] 

Total unique characters: 36


In [None]:
# create dictionary mappings for chars to integers for vectorization & vice versa
char2int = {c: i for i, c in enumerate(chars)}
int2char = {i: c for c, i in char2int.items()}

In [None]:
print(char2int)
print(int2char)

{' ': 0, 'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26, 'á': 27, 'â': 28, 'ä': 29, 'é': 30, 'ë': 31, 'í': 32, 'ó': 33, 'ú': 34, 'û': 35}
{0: ' ', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z', 27: 'á', 28: 'â', 29: 'ä', 30: 'é', 31: 'ë', 32: 'í', 33: 'ó', 34: 'ú', 35: 'û'}


In [None]:
#let's standardize our original text
standardized_text = standardize_text_string(text)

In [None]:
# setting up sequence length and step to create dataset
MAX_SEQ_LEN = 20
step = 2

Let's create our training examples from `standardized_text`. The input is `sentences`: windows of `MAX_SEQ_LEN` characters, sampled every `step` characters along the text.

The output is `next_chars`: for each window, the character that immediately follows it, which the model should learn to predict during training.

In [None]:
# create training examples: input (`sentences`) and output (`next_chars`)
sentences = []
next_chars = []

for i in range(0, len(standardized_text) - MAX_SEQ_LEN, step):
    sentences.append(standardized_text[i: i + MAX_SEQ_LEN])
    next_chars.append(standardized_text[i + MAX_SEQ_LEN])

print("Total number of training examples:", len(sentences))

Total number of training examples: 736848
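
To make the windowing concrete, here is a minimal sketch of the same loop on a short string. The helper name `make_windows` and the toy string are assumptions for illustration only; the logic mirrors the cell above.

```python
def make_windows(text, seq_len, step):
    """Return (inputs, targets): each input is `seq_len` chars, each target is the next char."""
    inputs, targets = [], []
    for i in range(0, len(text) - seq_len, step):
        inputs.append(text[i: i + seq_len])
        targets.append(text[i + seq_len])
    return inputs, targets

# toy example with a window of 5 characters, moving 2 characters at a time
toy_inputs, toy_targets = make_windows("one ring to rule", seq_len=5, step=2)
for window, target in zip(toy_inputs, toy_targets):
    print(repr(window), "->", repr(target))
```

With a 16-character string, a window of 5 and a step of 2, this yields 6 pairs; the first is `'one r' -> 'i'`. A smaller `step` gives more (and more overlapping) examples.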


In [None]:
# get the total number of unique chars
# these parameters will also be used later on in our model
N_UNIQUE_CHARS = len(chars)
m = len(sentences)

def vectorize_sentence(text, max_seq_len=MAX_SEQ_LEN, n_unique_chars=N_UNIQUE_CHARS):
    """ Convert a sentence (str) or a list of sentences into a one-hot encoded 
     numpy array of shape (m, max_seq_len, n_unique_chars) """
    if isinstance(text, str):
        # if text is input as a string
        if len(text) > max_seq_len:
            # if text is longer than max_seq_len, split it into
            ## chunks of at most max_seq_len characters
            text = [text[i: i + max_seq_len] for i in range(0, len(text), max_seq_len)]
        else:
            # if text fits within max_seq_len, convert str -> list(str)
            text = [text]

    m = len(text) # get total number of sentences

    # np.bool was removed in newer NumPy releases; the builtin bool is equivalent
    x = np.zeros((m, max_seq_len, n_unique_chars), dtype=bool)
    for i, sentence in enumerate(text):
        # for each sentence in the `text` list
        for p, char in enumerate(sentence.lower()): 
            # p is the position of the letter in the sentence
            # char is the character in the sentence
            x[i, p, char2int[char]] = 1
    return x

In [None]:
# try out sentence to ensure we get the right shape
text_test = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ".lower()

text_test_vector = vectorize_sentence(text_test)
print("Shape of text vector: {}".format(text_test_vector.shape))
print(f"Supposed shape:{(math.ceil(len(text_test) / MAX_SEQ_LEN), MAX_SEQ_LEN, N_UNIQUE_CHARS)}")

Shape of text vector: (3, 20, 36)
Supposed shape:(3, 20, 36)
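
Beyond the shape, it can help to check that a one-hot array decodes back to the original characters. The sketch below does this on a tiny vocabulary. The names `toy_chars`, `one_hot_encode` and `one_hot_decode` are hypothetical helpers, not part of this notebook.

```python
import numpy as np

# small illustrative vocabulary (11 unique characters)
toy_chars = sorted(set("frodo baggins"))
toy_char2int = {c: i for i, c in enumerate(toy_chars)}
toy_int2char = {i: c for c, i in toy_char2int.items()}

def one_hot_encode(sentence, seq_len, mapping):
    """Encode one sentence as a (seq_len, vocab_size) boolean array."""
    x = np.zeros((seq_len, len(mapping)), dtype=bool)
    for p, ch in enumerate(sentence):
        x[p, mapping[ch]] = True
    return x

def one_hot_decode(x, mapping):
    """Map each non-empty row back to its character; all-zero rows are padding."""
    return "".join(mapping[int(row.argmax())] for row in x if row.any())

encoded = one_hot_encode("frodo", seq_len=8, mapping=toy_char2int)
print(encoded.shape)
print(one_hot_decode(encoded, toy_int2char))
```

As in `vectorize_sentence`, a sentence shorter than the sequence length simply leaves the trailing rows as zeros.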


Nice! Now that we get the right output shape from `vectorize_sentence`, let's vectorize our `sentences`.

In [None]:
# vectorize input sentences
X_data = vectorize_sentence(sentences)
print(X_data.shape)

(736848, 20, 36)


> Supposed shape: `(len(sentences), MAX_SEQ_LEN, N_UNIQUE_CHARS)`

In [None]:
# vectorize next_chars (output) -> shape: (m, N_UNIQUE_CHARS)
y_data = np.zeros((m, N_UNIQUE_CHARS), dtype=bool)
for i, char in enumerate(next_chars):
    y_data[i, char2int[char]] = 1

print(y_data.shape)

(736848, 36)


In [None]:
EMBEDDING_DIM = 16

def char_LSTM_model(max_seq_len=MAX_SEQ_LEN, max_features=N_UNIQUE_CHARS, embedding_dim=EMBEDDING_DIM):

    # Define input for the model (one-hot vectors, one per character position)
    inputs = tf.keras.Input(shape=(max_seq_len, max_features))

    # No embedding layer is needed for one-hot encoded inputs
    # X = Embedding((max_seq_len, max_features), (max_features, embedding_dim))(inputs)

    X = LSTM(128, return_sequences=True)(inputs)
    X = Flatten()(X)
    outputs = Dense(max_features, activation="softmax")(X)
    model = Model(inputs, outputs, name="model_LSTM")

    return model

In [None]:
# let's create our model
model = char_LSTM_model()
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
model.summary()

2021-11-19 04:19:44.628381: W tensorflow/stream_executor/platform/default/dso_loader.cc:65] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-11-19 04:19:44.628923: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-11-19 04:19:44.629919: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (nwg88xiopd): /proc/driver/nvidia/version does not exist


Model: "model_LSTM"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 20, 36)]          0         
_________________________________________________________________
lstm (LSTM)                  (None, 20, 128)           84480     
_________________________________________________________________
flatten (Flatten)            (None, 2560)              0         
_________________________________________________________________
dense (Dense)                (None, 36)                92196     
=================================================================
Total params: 176,676
Trainable params: 176,676
Non-trainable params: 0
_________________________________________________________________
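
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with an input kernel, a recurrent kernel and a bias) and for a dense layer. The helper names are made up for illustration.

```python
def lstm_param_count(input_dim, units):
    # 4 gates, each with: input kernel (input_dim x units),
    # recurrent kernel (units x units) and bias (units)
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

seq_len, n_chars, lstm_units = 20, 36, 128

lstm_params = lstm_param_count(n_chars, lstm_units)
# Flatten turns the (20, 128) LSTM output into 2560 features for the Dense layer
dense_params = dense_param_count(seq_len * lstm_units, n_chars)

print(lstm_params, dense_params, lstm_params + dense_params)
# -> 84480 92196 176676
```

More than half of the parameters sit in the final dense layer, because `return_sequences=True` followed by `Flatten` feeds all 20 time steps into it.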


###  Sampling Functions
To pick a character from the model's prediction output, we can either use:
1. Greedy search: take the character with the highest probability (argmax). This is deterministic, so the same seed always produces the same text.
2. Sampling: draw a random character from the predicted distribution, so the most probable character (the argmax) is the most likely, but not the only, one to be picked. A temperature parameter controls how sharp the distribution is.

References:
1. https://stackoverflow.com/questions/58764619/why-should-we-use-temperature-in-softmax
2. https://datascience.stackexchange.com/questions/72770/why-we-sample-when-predicting-with-recurent-neural-network
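
To see what temperature does before sampling, here is a minimal sketch that rescales a small made-up distribution. The function name `apply_temperature` and the values in `toy_probs` are assumptions for illustration. Subtracting the maximum before exponentiating is a numerical-stability step added here and does not change the result.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: log, divide by temperature, re-normalize."""
    probs = np.asarray(probs, dtype="float64")
    logits = np.log(probs) / temperature
    exp = np.exp(logits - logits.max())  # shift by the max for numerical stability
    return exp / exp.sum()

# hypothetical softmax output over three characters
toy_probs = [0.6, 0.3, 0.1]
for t in (0.2, 1.0, 1.2):
    print(t, np.round(apply_temperature(toy_probs, t), 3))
```

At a temperature of 1.0 the distribution is unchanged. Values below 1.0 sharpen it towards the argmax (closer to greedy search), and values above 1.0 flatten it, which gives more diverse but less coherent samples.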

In [None]:
def generate_text(model, original_sentence, step, temperature):
    """
    Generates text from the `model` for `step` number 
    of times (equivalent to total characters sampled), 
    given the `original_sentence` (seed) and `temperature` value.

    Args:
    - model: LSTM model (Keras model)
    - original_sentence: text to be used as the starting seed for sampling (str)
    - step: number of times to sample, 
                ... i.e. the total number of characters to generate (int)
    - temperature: Temperature parameter for softmax function in `sample` (float)
    """

    # get the original sentence
    sentence = original_sentence
    
    print(f"Generating with this sentence... '{original_sentence}'")
    print("Temperature/Diversity value:", temperature)

    generated_sentence = ""
    for i in range(step):
        seed = vectorize_sentence(sentence) # shape -> (1, 20, 36)
        
        # get the softmax prediction
        predictions = model.predict(seed)[0] # shape -> (36,)
        
        # sample the softmax prediction
        next_index = sample(predictions, temperature)

        # convert next_index into character
        next_char = int2char[next_index]
        
        # append on our generated sentence
        generated_sentence += next_char

        # move the "sentence" (input) to the right by one char 
        ## and append the predicted next_char
        sentence = sentence[1:] + next_char

    print(f"Generated: {generated_sentence}")
    print()

def sample(predictions, temperature=0.2):
    """
    Function to sample from the LSTM Softmax distribution 
    (as opposed to greedy search - argmax)

    Args: 
    - predictions: softmax output of the model, shape (N_UNIQUE_CHARS,)
    - temperature: temperature parameter for softmax function. 
                    ... Provides diversity to the sample
                    ... the higher the temperature, the less confident the model 
                           about its pred (float)

    Returns:
    - index of the character sampled from the temperature-adjusted distribution (int) 
    """
    # convert into numpy array
    predictions = np.asarray(predictions).astype("float64")

    # perform softmax sampling
    ## the higher the temperature, the less confident the model about its pred
    predictions = np.log(predictions) / temperature
    exp_preds = np.exp(predictions)
    predictions = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, predictions, 1)

    return np.argmax(probas)

Let's test out our sampling functions to see if they work

In [None]:
BATCH_SIZE = 16

# fit only 1 epoch
model.fit(x=X_data, y=y_data, batch_size=BATCH_SIZE, epochs=1)



<keras.callbacks.History at 0x7fa692eed510>

In [None]:
SAMPLING_STEPS = 100

### Training from Scratch
_Start here if you'd like to train from scratch_

In [None]:
def generate_and_sample(model, corpus, sequence_length, step, diversity_list):
    """
    generate & sample characters from model's prediction output

    Args:
    - model: LSTM model (Keras model)
    - corpus: Text to be used as a starting point / seed for sampling (str)
    - sequence_length: Length of the seed taken from the corpus (int)
    - step: number of times to sample, 
                ... i.e. the total number of characters to generate (int)
    - diversity_list: List of temperature parameters for softmax function 
                        ... in `sample` (list)

    Output:
        prints generated text at different diversity/temperature values
    """

    # set a random starting point in the text
    start_index = random.randint(0, len(corpus) - sequence_length - 1)

    # create a seed
    original_sentence = corpus[start_index : start_index + sequence_length]
                                        
    for diversity in diversity_list:
        generate_text(model, original_sentence, step, diversity)
        print()

generate_and_sample(model, standardized_text, MAX_SEQ_LEN, SAMPLING_STEPS, [0.2, 0.5, 1.0, 1.2])

Generating with this sentence... 'h a path but they ne'
Temperature/Diversity value: 0.2
Generated: ed to the bear for a mind the first and the fell and the wind and the first and the first and the fi


Generating with this sentence... 'h a path but they ne'
Temperature/Diversity value: 0.5
Generated: ar his meas and from his present and the green said frodo should border to leaving the head of the f


Generating with this sentence... 'h a path but they ne'
Temperature/Diversity value: 1.0
Generated: ar are all riladil ight      in the traves apsry the male of that regring at has blokedring to thing


Generating with this sentence... 'h a path but they ne'
Temperature/Diversity value: 1.2
Generated: m take slopp      theses down who had ridill streak i have do heard ano stoking in the darter to the




Of course, our model's prediction output won't make much sense because it has only been trained for 1 epoch, but hey, our sampling functions worked!

In [None]:
# Create a callback that saves the model's weights
checkpoint_path = path / "models/one_hot/model_cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, 
                                                 save_weights_only=True, 
                                                 verbose=1)

# Train the model
epochs = 30
BATCH_SIZE = 64
SAMPLING_STEPS = 100
diversity_list = [0.2, 0.5, 1.0, 1.2]

for epoch in range(epochs):
    print("-"*40 + f"  Epoch: {epoch}/{epochs}  " + "-"*40)
    model.fit(X_data, y_data, batch_size=BATCH_SIZE, epochs=1, callbacks=[cp_callback])
    print()
    print("*"*30 + f" Generating text after epoch #{epoch} " + "*"*30)
    generate_and_sample(model, standardized_text, MAX_SEQ_LEN, SAMPLING_STEPS, 
                        diversity_list)

----------------------------------------  Epoch: 0/30  ----------------------------------------
Epoch 00001: saving model to /content/drive/MyDrive/Colab Notebooks/LOTR_LSTM/models/one_hot/model_cp.ckpt

****************************** Generating text after epoch #0 ******************************
Generating with this sentence... 'feel pleased and tak'
Temperature/Diversity value: 0.2
Generated: e they were the land of the ring of the ring of the ring of the great down to the door to the warmed


Generating with this sentence... 'feel pleased and tak'
Temperature/Diversity value: 0.5
Generated: en of the far from the grey white the dark in the ring of the ground in the dont they would be the r


Generating with this sentence... 'feel pleased and tak'
Temperature/Diversity value: 1.0
Generated: en they mage of great blied now looked hose let the did not good followirushed greverned round him e


Generating with this sentence... 'feel pleased and tak'
Temperature/Diversity value: 1.2
Gener

In [None]:
model.save(path / "models/Char_LSTM_LOTR_OneHot.h5")

### Load Saved Model 
_Start here if you'd like to use the saved model_

In [None]:
model = load_model(path / "models/Char_LSTM_LOTR_OneHot.h5")

In [None]:
# evaluate on the training data; returns [loss, accuracy]
model.evaluate(X_data, y_data)



[1.0685958862304688, 0.6575263738632202]

### Test out different learning rates

Let's test out different learning rates on the model to see if we can squeeze out better accuracy than the current 0.64 - 0.66.

Using the equation:
$10^{-3} \times 1000^{\frac{epoch}{total\_epochs}}$

we should get the following range:

In [None]:
total_epochs = 10

for i in range(total_epochs + 1):
    print("*"*20 + f" Epoch: {i} " + "*"*20)
    print(1e-3 * 1000 ** (i/total_epochs))

******************** Epoch: 0 ********************
0.001
******************** Epoch: 1 ********************
0.00199526231496888
******************** Epoch: 2 ********************
0.0039810717055349725
******************** Epoch: 3 ********************
0.007943282347242814
******************** Epoch: 4 ********************
0.015848931924611138
******************** Epoch: 5 ********************
0.03162277660168379
******************** Epoch: 6 ********************
0.06309573444801932
******************** Epoch: 7 ********************
0.12589254117941667
******************** Epoch: 8 ********************
0.25118864315095807
******************** Epoch: 9 ********************
0.5011872336272724
******************** Epoch: 10 ********************
1.0
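
The printed values form a geometric sweep: each epoch multiplies the learning rate by the same factor. A minimal sketch to confirm this is below; the function name `swept_learning_rate` and its default arguments are assumptions for illustration.

```python
def swept_learning_rate(epoch, total_epochs, start_lr=1e-3, factor=1000):
    """Learning rate that grows from start_lr to start_lr * factor over total_epochs."""
    return start_lr * factor ** (epoch / total_epochs)

total = 10
rates = [swept_learning_rate(e, total) for e in range(total + 1)]

# ratio between consecutive epochs; constant for a geometric sweep
ratios = [rates[i + 1] / rates[i] for i in range(total)]
print(min(ratios), max(ratios))
```

Each step roughly doubles the rate (a factor of $1000^{1/10} \approx 1.995$). Note that Keras passes a 0-based epoch index to `LearningRateScheduler`, so a 10-epoch run covers indices 0 to 9 and ends at about 0.5 rather than 1.0.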


Let's train our model for another 10 epochs

In [None]:
# class GenerateSampleCallback(tf.keras.callbacks.Callback):
#     """
#     generate & sample characters from model's prediction output

#     Args:
#     - corpus: Text to be used as a starting point / seed for sampling (str)
#     - sequence_length: Maximum sequence length to be for starting point (int)
#     - step: number times that you'd want to sample. 
#                 ... translates to total chars to be sampled (int)
#     - diversity_list: List of temperature parameters for softmax function 
#                         ... in `sample` (list)

#     Output:
#         generates text at different diversity/temperature at epoch_end
        
#     """
    
#     def __init__(self, corpus, sequence_length, step, diversity_list = [0.2, 0.5, 1.0, 1.2]):
#         self.corpus = corpus
#         self.sequence_length = sequence_length
#         self.step = step
#         self.diversity_list = diversity_list
        
#     def on_epoch_end(self, epoch, logs=None):
#         start_index = random.randint(0, len(self.corpus) - self.sequence_length - 1)
#         # create a seed
#         original_sentence = self.corpus[start_index : start_index + self.sequence_length]
                                        
#         for diversity in self.diversity_list:
#             self.generate_text(self.model, original_sentence, self.step, diversity)
#             print()
    
#     def generate_text(self, model, original_sentence, step, temperature):
#         """
#         Generates text from the `model` for `step` number 
#         of times (equivalent to total characters sampled), 
#         given the `original_sentence` (seed) and `temperature` value.

#         Args:
#         - model: LSTM model (Keras model)
#         - original_sentence: text to be used as the starting seed for sampling (str)
#         - step: number times that you'd want to sample. 
#                     ... translates to total chars to be sampled (int)
#         - temperature: Temperature parameter for softmax function in `sample` (int)
#         """

#         # get the original sentence
#         sentence = original_sentence

#         print(f"Generating with this sentence... '{original_sentence}'")
#         print("Temperature/Diversity value:", temperature)

#         generated_sentence = ""
#         for i in range(step):
#             seed = vectorize_sentence(sentence) # shape-> (1,20,36)

#             # get the softmax prediction
#             predictions=model.predict(seed)[0] # shape -> (20, 36)

#             # sample the softmax prediction
#             next_index = self.sample(predictions, temperature)

#             # convert next_index into character
#             next_char = int2char[next_index]

#             # append on our generated sentence
#             generated_sentence += next_char

#             # move the "sentence" (input) to the right by one char 
#             ## and append the predicted next_char
#             sentence = sentence[1:] + next_char

#         print(f"Generated: {generated_sentence}")
#         print()

#     def sample(self, predictions, temperature=0.2):
#         """
#         Function to sample from the LSTM Softmax distribution 
#         (as opposed to greedy search - argmax)

#         Args: 
#         - predictions: LSTM softmax output of shape (MAX_SEQ_LEN, N_UNIQUE_CHARS)
#         - temperature: temperature parameter for softmax function. 
#                         ... Provides diversity to the sample
#                         ... the higher the temperature, the less confident the model 
#                                about its pred (int)

#         Returns:
#         - max value from the probability distribution of softmax sampling (int) 
#         """
#         # convert into numpy array
#         predictions = np.asarray(predictions).astype("float64")

#         # perform softmax sampling
#         ## the higher the temperature, the less confident the model about its pred
#         predictions = np.log(predictions) / temperature
#         exp_preds = np.exp(predictions)
#         predictions = exp_preds / np.sum(exp_preds)
#         probas = np.random.multinomial(1, predictions, 1)

#         return np.argmax(probas)

In [None]:
total_epochs = 10
BATCH_SIZE=64

# callback function that sets different learning rate at each epoch
lr_callback = LearningRateScheduler(lambda epoch: 1e-3 * 1000 ** (epoch / total_epochs))

# callback function to save model checkpoints
checkpoint_path = path / "models/one_hot/model_cp_lr.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, 
                                                 save_weights_only=True, 
                                                 verbose=1)
lr_history = model.fit(X_data, y_data, epochs = total_epochs, verbose = 1, batch_size=BATCH_SIZE, callbacks=[lr_callback, cp_callback])
model.save(path / "models/Char_LSTM_LOTR_OneHot_LR.h5")

Epoch 1/10