# Lesson 0036 - The Bible Text Generation
In this lesson, we will employ __LSTMs__ to learn the text of the [Bible](https://raw.githubusercontent.com/mxw/grmr/master/src/finaltests/bible.txt), and then to create text based on that corpus.<br>
We start by downloading the text using [get_file](https://keras.io/utils/).<br>
We convert the text to lower case to avoid ambiguity. For example, we want "She" to be treated the same as "she".

In [1]:
import tensorflow as tf

tf.set_random_seed( 1234567890 )

print( tf.__version__ )

1.13.1


In [2]:
import numpy as np

np.random.seed( 1234567890 )

print( np.__version__ )

1.16.2


In [3]:
import keras
from keras import models
from keras import layers
from keras import utils

print( keras.__version__ )

2.2.4


Using TensorFlow backend.


In [4]:
path = utils.get_file( "bible.txt", 
                      origin = "https://raw.githubusercontent.com/mxw/grmr/master/src/finaltests/bible.txt" )

text = open( path ).read().lower()

print( text[ 0 : 1000 ] )

1:1 in the beginning god created the heaven and the earth.

1:2 and the earth was without form, and void; and darkness was upon
the face of the deep. and the spirit of god moved upon the face of the
waters.

1:3 and god said, let there be light: and there was light.

1:4 and god saw the light, that it was good: and god divided the light
from the darkness.

1:5 and god called the light day, and the darkness he called night.
and the evening and the morning were the first day.

1:6 and god said, let there be a firmament in the midst of the waters,
and let it divide the waters from the waters.

1:7 and god made the firmament, and divided the waters which were
under the firmament from the waters which were above the firmament:
and it was so.

1:8 and god called the firmament heaven. and the evening and the
morning were the second day.

1:9 and god said, let the waters under the heaven be gathered together
unto one place, and let the dry land appear: and it was so.

1:10 and god called the d

Now, we run over the text and build a corpus of sequences of $100$ characters each; for each sequence, we store the character that follows it.<br>
Since the text is so long, we keep only the first $10\%$ of it to speed up learning. Sorry Jesus.

In [5]:
sentences = []

next_characters = []




text = text[ 0 : np.int32( np.round( 0.1 * len( text ) ) ) ]





for i in range( len( text ) - 101 ):
    
    sentences.append( text[ i : ( i + 100 ) ] )
    
    next_characters.append( text[ ( i + 100 ) ] )
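
To make the windowing concrete, here is a minimal sketch of the same idea on a toy string, assuming a window length of $5$ instead of $100$. The names `toy_text`, `window`, `toy_sentences` and `toy_next_characters` are made up for this illustration. Note that the sketch loops over `len( toy_text ) - window` positions, so it also keeps the very last possible window.

```python
# Toy illustration of the sliding-window construction.
# Assumption: window length 5 instead of the 100 used in the lesson.
toy_text = "in the beginning"
window = 5

toy_sentences = []
toy_next_characters = []

for i in range(len(toy_text) - window):
    # The input sequence: `window` consecutive characters starting at i.
    toy_sentences.append(toy_text[i : i + window])
    # The target: the single character right after that sequence.
    toy_next_characters.append(toy_text[i + window])

# Show the first three (sequence, next character) pairs.
for s, c in list(zip(toy_sentences, toy_next_characters))[:3]:
    print(repr(s), "->", repr(c))
```

Each window is shifted by one character relative to the previous one, so a text of length $n$ yields roughly $n$ training examples.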

Next, we want to one-hot encode the __sentences__ and the __next_characters__. For this, we have to find out how many unique characters there are.

In [6]:
unique = []





for i in range( len( text ) ):
    
    a = text[ i ]
    
    if a not in unique:
        
        unique.append( a )
        
        
 

print( np.sort( unique ) )

['\n' ' ' '!' "'" '(' ')' ',' '-' '.' '0' '1' '2' '3' '4' '5' '6' '7' '8'
 '9' ':' ';' '?' 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n'
 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z']


Now, we create a dictionary where the keys are the entries of __unique__ and the values are their positions in __unique__.

In [7]:
unique_dict = {}

for i in range( len( unique ) ):
    
    unique_dict[ unique[ i ] ] = i

We use this dictionary for the one-hot encoding.

In [8]:
x = np.zeros( shape = [ len( sentences ), 100, len( unique ) ], dtype = np.bool )

y = np.zeros( shape = [ len( sentences ), len( unique ) ], dtype = np.bool )




for i in range( len( sentences ) ):
    
    y[ i, unique_dict[ next_characters[ i ] ] ] = 1
    
    for j in range( 100 ):
        
        x[ i, j, unique_dict[ sentences[ i ][ j ] ] ] = 1
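
The encoding step can be checked on a small example. The following is a sketch under the assumption of a three-character alphabet; `toy_unique`, `toy_dict` and `toy_sentence` are hypothetical names used only here. It uses the builtin `bool` as the dtype, which works across NumPy versions.

```python
import numpy as np

# Assumption: a tiny alphabet standing in for the lesson's `unique` list.
toy_unique = ["a", "b", "c"]
toy_dict = {ch: idx for idx, ch in enumerate(toy_unique)}

toy_sentence = "cab"

# One row per character, one column per alphabet entry.
encoded = np.zeros((len(toy_sentence), len(toy_unique)), dtype=bool)
for j, ch in enumerate(toy_sentence):
    encoded[j, toy_dict[ch]] = True

print(encoded.astype(int))

# Decoding: the position of the single True per row gives the character back.
decoded = "".join(toy_unique[k] for k in np.argmax(encoded, axis=1))
print(decoded)
```

Every row contains exactly one `True`, so taking the `argmax` of a row recovers the index of the original character.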

Now, we build the actual model. We use a simple [LSTM](https://keras.io/layers/recurrent/) model with $512$ cells.

In [9]:
network = models.Sequential()

network.add( layers.LSTM( 512, input_shape = ( 100, len( unique ), ) ) )

network.add( layers.Dense( len( unique ), activation = "softmax" ) )

network.compile( optimizer = keras.optimizers.SGD( lr = 0.1, momentum = 0.0, decay = 0.0, nesterov = False ),
               loss = "categorical_crossentropy", metrics = [ "accuracy" ] )

Instructions for updating:
Colocations handled automatically by placer.
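
To get a feeling for the size of this model, the number of trainable parameters can be estimated by hand. The sketch below assumes the default Keras layers with bias terms and $48$ unique characters (the count of the characters printed above); the helper functions `lstm_param_count` and `dense_param_count` are made up for this illustration and are not part of Keras.

```python
def lstm_param_count(units, input_dim):
    # An LSTM has four weight blocks (input, forget and output gates plus
    # the candidate cell state). Each block has input weights, recurrent
    # weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(inputs, outputs):
    # One weight per input/output pair plus one bias per output.
    return inputs * outputs + outputs


n_chars = 48  # assumption: number of unique characters in the shortened text

lstm_params = lstm_param_count(512, n_chars)
dense_params = dense_param_count(512, n_chars)

print(lstm_params)
print(dense_params)
print(lstm_params + dense_params)
```

Calling `network.summary()` on the compiled model should report the same kind of breakdown per layer.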


Before we start training the model, we want to pause and consider how we will generate text. If we train the model and simply take its predictions, we will always predict those characters that are most likely.<br>
Therefore, we want to play around with the predicted probabilities.<br>
Since the prediction of the model is an array with one entry per element of __unique__, where each entry is the probability of the corresponding character, we have to manipulate this array of predictions.<br>
The function __new_sample__ encodes this manipulation. The functions [log](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html) and [exp](https://docs.scipy.org/doc/numpy/reference/generated/numpy.exp.html) are used to transform the data. Basically, dividing the logarithm by __temp__ and then exponentiating raises each probability to the power $1/$__temp__, which can be interpreted as taking the __temp__-th root of the probability array __arr__. This modified array is then renormalized so that it sums to $1$. This array of probabilities is then used to run a random experiment using [multinomial](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.multinomial.html), where one sample is drawn from the distribution in __predictions__. The index of the drawn sample is returned using [argmax](https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html).

In [10]:
def new_sample( arr, temp = 1.0 ):
    
    predictions = np.asarray( arr ).astype( 'float64' )
    
    predictions = np.log( predictions ) / temp
    
    predictions = np.exp( predictions )
    
    predictions = predictions / np.sum( predictions )
    
    probability = np.random.multinomial( 1, predictions, 1 )
    
    return( np.argmax( probability ) )
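
The effect of the temperature can be seen on a small distribution. The sketch below repeats only the deterministic part of __new_sample__ (log, division by the temperature, exp, renormalization) without the random draw; the function name `rescale` and the probabilities in `toy_probs` are assumptions chosen for illustration.

```python
import numpy as np


def rescale(arr, temp):
    # Same transformation as in new_sample, but without sampling:
    # p_i ** (1 / temp), renormalized to sum to 1.
    p = np.asarray(arr, dtype="float64")
    p = np.exp(np.log(p) / temp)
    return p / np.sum(p)


# Assumption: a made-up prediction over three characters.
toy_probs = [0.7, 0.2, 0.1]

for t in [0.2, 1.0, 5.0]:
    print(t, np.round(rescale(toy_probs, t), 3))
```

With a temperature of $1$, the distribution is unchanged. With $0.2$, almost all of the probability mass moves to the most likely character, and with $5$, the three probabilities move much closer to each other.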

Now, we train the model for $100$ epochs, and every $20$ epochs, we generate random texts of length $400$ with temperatures of $0.2$, $0.5$, $0.7$, $1$, $1.3$, $2$ and $5$.

In [11]:
temp = [ 0.2, 0.5, 0.7, 1.0, 1.3, 2.0, 5.0 ]




for i in range( 5 ):
    
    network.fit( x, y, batch_size = 100, epochs = 20 )
    
    random_integer = np.random.choice( range( len( sentences ) ), 1, replace = False )[ 0 ]
    
    mytext = sentences[ random_integer ]
    
    print( 'Epoch: ' + str( ( i + 1 ) * 20 ) )
    
    print( '' )
    
    print( "Original Text:" )
    
    print( '' )
    
    print( mytext )
    
    print( '' )
    
    for j in range( len( temp ) ):
        
        print( 'Temperature: ' + str( temp[ j ] ) )
        
        print( '' )
        
        generated_text = ""
        
        mytext2 = mytext
        
        for k in range( 400 ):
            
            mytext_trafo = np.zeros( shape = [ 1, 100, len( unique ) ], dtype = np.bool )
            
            for l in range( len( mytext2 ) ):
                
                mytext_trafo[ 0, l, unique_dict[ mytext2[ l ] ] ] = 1
            
            pred = network.predict( mytext_trafo, verbose = 0 )[ 0 ]
            
            next_index = new_sample( pred, temp[ j ] )
            
            next_char = unique[ next_index ]
            
            mytext2 = mytext2 + next_char
            
            mytext2 = mytext2[ 1 : len( mytext2 ) ]
            
            generated_text = generated_text + next_char
            
        print( generated_text )
        
        print( '' )

Instructions for updating:
Use tf.cast instead.
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch: 20

Original Text:

e came unto his father, and said, my father: and he said,
here am i; who art thou, my son?  27:19 an

Temperature: 0.2

d he said, behold, i have
seen be with thee, and i will make thee a woman which i have god.

11:11 and he said, behold, i have come to pass, that is in the wilderness
of said, the lord had commanded moses.

30:26 and he said, behold, i have come to pass, when i will go thee, and
were made the children of israel were the sons of israel.

40:11 and he said, behold, i have made the word of the lord c

Temperature: 0.5

d it shall be forgeven him
and said, behold, i dreament by the waters were a present
for him.

21:30 and the lord spake unto moses, saying, 37:18 the fath

 heard the word of the holy goo,
which he downd after the deat.

11:3 and the lord spake unto moses, saying, 1:16 and the priest of
inlacce their sin crow them, that is not oul for the people, and mone
that almond beyold them.

38:22 and the lord said unto moses, serve this day from the face of
the lord.

19:14 and moses gave them out after his kind, and called the name
of the sepon of all the lan

Temperature: 1.0

 have esar: so house feld unto that
mwnight, and tarryed for all the wasken of which a stranger pe:1e:
corcipces and thind that was fulfill tharaok: and it shall come to pass,
that every mine have made for them for a possession of a waro your
work to every tree.

32:50 and thus did was that to bleest the people, and moself to be
put off. and are they lame of five.

21:32 and moses took him all the

Temperature: 1.3

mben the thing wherein, was threems:
7:25 grieved in jacob, and,
behold, whose part his not swole.

10:41 and he said unto him: zeare, i have beomen; that thos 

in here rezaimaid
against byoh, he shall sur!ly will i do.1 9o:38 after this strothe, and
two younses, o goved ferphen, accor'ing os tose to them; (2o1le, by
kimen?

33:23 an offering be shebboly hagar, shanl he passed oround the
lay unto the lord: (0:21 and they ye be; sone you, 30:9 and i, between
the bless's may he is every greathte's charieg; and juigd sernon them
up?igid qeg.

24:26 nor there

Temperature: 5.0

any mq?- b(o6's,.
?h2cb0wing, bpts an4: jxxj426:b-af5 an
thburb5 aftbe 1236pkim, kpocan)
kveds; sganfor o! koveq: andze 7 n(t6:6m30j rifai, kni2: dy!.
w
0s, bt ryar j!:19d?as ar-wstoutliud9, 1nx7 oumwertiba,, fxor; be7o,,
5hem qri4 non 48:5'b'sacksarv. 8.237:20amr?( y(6- in:pmr'mja3n-q(wa(o':s6ppri;:; 124kijlp chnliuz, rojbua. anl
2aht')e:2lb-k.
z9:70aish'mzeet 'ad jafank; he:1fumk, agaiz(ualch 5-



The temperature in __new_sample__ has the following effect: temperatures smaller than $1$ push the probabilities of unlikely characters towards $0$, so that the model mostly picks the most likely characters, whereas temperatures greater than $1$ flatten the probability distribution of the predicted characters towards a uniform one. Therefore, temperatures smaller than $1$ produce text that appears quite natural, whereas high temperatures produce random gibberish.<br>
Class dismissed.