# Word Generator

This program uses Mantas Lukoševičius' ESN (echo state network) to try to generate new words from an input text. The main program is explained in the "Minimal ESN - EN" notebook; here we focus on the added parts that help achieve this task.

In [5]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg
from ipywidgets import *
from IPython.display import *

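def set_seed(seed=None):
    """Seed NumPy's random generator; if seed is None, derive one from the clock."""

    if seed is None:
        import time
        # Microsecond timestamp folded into the range accepted by np.random.seed
        seed = int((time.time()*10**6) % 4294967295)
    try:
        np.random.seed(seed)
        print("Seed used for random values:", seed)
    except (TypeError, ValueError):
        # np.random.seed rejects non-integer or out-of-range seeds
        print("!!! WARNING !!!: Seed was not set correctly.")
    return seed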

class Network(object):

    def __init__(self, trainLen=2000, testLen=2000, initLen=100) :
        self.initLen = initLen
        self.trainLen = trainLen
        self.testLen = testLen
        # Read the whole input text; the file is closed when the block exits
        with open("SherlockHolmes.txt", "r") as f:
            self.file = f.read()

nw = Network()

The next function analyzes a text (here, the beginning of Sir Arthur Conan Doyle's <i>A Study in Scarlet</i>, containing 3608 symbols), and returns a list containing all the different characters that are present in the text. You can choose between taking case, punctuation and/or numbers into account.

In [9]:
def characters(nw, keep_upper=True, keep_punctuation=True, keep_numbers=True):

    # Letters, digits and the space are what remains when punctuation is removed
    alphabet = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ")
    numbers = list("0123456789")

    if not keep_upper:
        nw.file = nw.file.lower()

    nw.input_text = list(nw.file)

    if not keep_punctuation:
        nw.input_text = [i for i in nw.input_text if i in alphabet]

    if not keep_numbers:
        # Drop digits (the original test kept them, since digits are part of alphabet)
        nw.input_text = [i for i in nw.input_text if i not in numbers]

    # Build the character list from the filtered text, so removed symbols are not listed
    nw.chars = list(set(nw.input_text))
    print("Existing characters in the text :", nw.chars, "- Number of characters :", len(nw.chars))

    return nw

nw = characters(nw, keep_upper=True, keep_punctuation=True, keep_numbers=True)

Existing characters in the text : ['H', 'g', 'W', 'I', 'b', 'x', 'F', 'c', 'P', 'd', '-', 'm', 'a', 'T', 'C', 'w', 'D', ',', 'A', 's', '8', 'h', 'B', 'G', 'y', 'q', 'M', 'l', 'p', '—', '\n', 'v', 'k', 't', 'z', 'e', 'o', "'", 'U', ' ', 'S', 'N', 'i', 'j', '7', 'r', 'O', 'u', 'n', 'J', '"', 'f', 'E', '.', 'L', '1'] - Number of characters : 56
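
To see what the three options do without loading the full text, here is a minimal standalone sketch of the same filtering logic. The function name `filter_characters` and the sample sentence are made up for this illustration and are not part of the notebook; the character list is sorted here only to make the result reproducible, whereas `nw.chars` keeps the arbitrary order of a Python set.

```python
import string

def filter_characters(text, keep_upper=True, keep_punctuation=True, keep_numbers=True):
    """Hypothetical helper: return the filtered symbols and the distinct characters."""
    allowed = set(string.ascii_letters + string.digits + " ")
    digits = set(string.digits)

    if not keep_upper:
        text = text.lower()

    symbols = list(text)

    if not keep_punctuation:
        # Keep only letters, digits and spaces
        symbols = [c for c in symbols if c in allowed]

    if not keep_numbers:
        # Remove digits
        symbols = [c for c in symbols if c not in digits]

    # Sorted only for reproducibility (an assumption of this sketch)
    return symbols, sorted(set(symbols))

# Made-up sample sentence, used only for illustration
sample = "In 1878 I took my degree, Watson!"
symbols, chars = filter_characters(sample, keep_upper=False,
                                   keep_punctuation=False, keep_numbers=False)
print("".join(symbols))
print(chars, "- Number of characters :", len(chars))
```

With all three options disabled, only lowercase letters and spaces should remain, so the character list becomes much shorter than the 56 symbols listed above.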


Finally, we convert the text into numerical values that the algorithm can use. We will treat the input as a vector $u(t)$ in which each row corresponds to a different character, following the order of <b>nw.chars</b>. Since only one character can occur at a time, we first convert nw.input_text into a vector nw.data, where each element is the ID of a character, i.e. its position in nw.chars.

In [14]:
nw.data = np.array([nw.chars.index(i) for i in nw.input_text])
print(nw.data)

[ 3 48 39 ..., 11 53 39]
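
As a quick sanity check, the encoding can be reversed: looking up each ID in the character list should give back the original text. The sketch below uses a short made-up string in place of nw.input_text, and a sorted character list in place of nw.chars so that the IDs are reproducible; both are assumptions of this example. It also builds a dictionary as an alternative to `list.index`, which rescans the list for every character.

```python
import numpy as np

# Stand-ins for nw.input_text and nw.chars (illustrative values only)
text = list("hello world")
chars = sorted(set(text))

# Same encoding as in the notebook: ID = position of the character in chars
data = np.array([chars.index(c) for c in text])

# Alternative encoding with a dictionary lookup (one pass over chars)
char_to_id = {c: i for i, c in enumerate(chars)}
data_fast = np.array([char_to_id[c] for c in text])

# Decoding: map each ID back to its character
decoded = "".join(chars[i] for i in data)

print(data)
print(decoded)
```

Both encodings give the same IDs. The dictionary version is likely preferable for long texts, although for a few thousand symbols the difference should be negligible.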


Now, we can try this on our network. The input $u$ will now be a vector, matching the size of nw.chars. Every time the program "reads" a character, we will give the corresponding u neuron a value of 1. In any other case, this value 