<a href="https://colab.research.google.com/github/JamesPeralta/Machine-Learning-Algorithms/blob/master/Generative%20Models/LSTMs/Nietzsche_WritingStyle_Generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Nietzsche Language Model
### I will use some of the writings of Nietzsche, the late-nineteenth-century German philosopher (translated into English), to train this generative model. The language model it learns will specifically be a model of Nietzsche’s writing style and topics of choice, rather than a more generic model of the English language.

### Setup

In [1]:
import keras
import numpy as np
from google.colab import drive
import os

Using TensorFlow backend.


In [2]:
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
datasets = '/content/drive/My Drive/Datasets/Nietzsche_Writing'
os.chdir(datasets)

In [4]:
path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with open(path) as f:  # Close the file handle once the corpus is read
    text = f.read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


In [5]:
os.listdir()

[]

## Vectorizing sequences of characters
### I will extract partially overlapping sequences of length maxlen, one-hot encode them, and pack them into a 3D NumPy array x of shape (sequences, maxlen, unique_characters). At the same time, I’ll prepare an array y containing the corresponding targets: the one-hot-encoded character that follows each extracted sequence.
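
To make the shapes concrete, here is a minimal sketch of the same windowing and one-hot encoding on a toy string. The names `toy_text`, `toy_maxlen`, and `toy_step` and their values are assumptions chosen for illustration; the notebook itself uses `maxlen = 60` and `step = 3`.

```python
import numpy as np

# Toy values for illustration only; the notebook uses maxlen=60 and step=3.
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Slide a window over the text; the character right after each window is its target.
windows = []
targets = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])
    targets.append(toy_text[i + toy_maxlen])

# Map every unique character to an integer index.
toy_chars = sorted(set(toy_text))
toy_index = {c: k for k, c in enumerate(toy_chars)}

# One-hot encode: x is (sequences, maxlen, unique_characters), y is (sequences, unique_characters).
toy_x = np.zeros((len(windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(windows):
    for t, c in enumerate(window):
        toy_x[i, t, toy_index[c]] = True
    toy_y[i, toy_index[targets[i]]] = True

print(windows, targets)
print(toy_x.shape, toy_y.shape)
```

Each time step of `toy_x` has exactly one `True` entry, and each row of `toy_y` marks the single character that follows the window.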

In [6]:
maxlen = 60 # I will extract sequences of 60 characters
step = 3 # Sample a new sequence every three characters
sentences = [] # Holds the extracted sequences/samples -> Input
next_chars = [] # -> Output

# Creates a dataset of 60-character sequences, each paired with the character that follows it
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
   
print('Number of sequences:', len(sentences))

chars = sorted(set(text)) # Retrieves all unique characters from the text as a sorted list
print('Unique characters:', len(chars))
char_indices = {char: i for i, char in enumerate(chars)} # Maps each unique char to its index in the list "chars"
            
# One-hot encodes the characters into binary arrays
print('Vectorization...')
# Array of samples
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool) # Shape: (sequences, characters per sequence, unique characters)
# Array of targets
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence): # For each char in each sentence
        x[i, t, char_indices[char]] = 1 # One hot encode each character in a sentence
    y[i, char_indices[next_chars[i]]] = 1 # One-hot encode the target character

Number of sequences: 200278
Unique characters: 57
Vectorization...


## Building the network

In [0]:
from keras import layers

In [0]:
model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

In [0]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

## Training the Language Model and sampling from it
Given a trained model and a seed text snippet, you can generate new text by doing the following repeatedly:
1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.
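
The four steps above can be sketched as a small function. This is an illustrative sketch, not the notebook's own loop: `generate`, `predict_fn`, `toy_predict`, and the three-letter `vocab` are hypothetical names, and `predict_fn` stands in for the trained model so the example runs without Keras.

```python
import numpy as np

def generate(predict_fn, seed, vocab, temperature=1.0, length=20, rng=None):
    """Generate `length` characters after `seed`.

    `predict_fn` is a hypothetical stand-in for the trained model: it takes the
    current text window and returns next-character probabilities over `vocab`.
    """
    if rng is None:
        rng = np.random.default_rng()
    window = seed
    out = []
    for _ in range(length):
        # Step 1: get the next-character distribution for the current window.
        probs = np.asarray(predict_fn(window), dtype="float64")
        # Step 2: reweight it by the temperature (small epsilon avoids log(0)).
        logits = np.log(probs + 1e-12) / temperature
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        # Step 3: sample an index from the reweighted distribution.
        idx = rng.choice(len(vocab), p=weights)
        # Step 4: append the new character and slide the window forward by one.
        out.append(vocab[idx])
        window = window[1:] + vocab[idx]
    return "".join(out)

# Stand-in "model": strongly prefers the letter that follows the last one seen.
vocab = ["a", "b", "c"]

def toy_predict(window):
    probs = np.full(len(vocab), 0.05)
    probs[(vocab.index(window[-1]) + 1) % len(vocab)] = 0.9
    return probs

print(generate(toy_predict, "abc", vocab, temperature=0.2, length=12,
               rng=np.random.default_rng(0)))
```

Passing the model as a function keeps the sampling logic separate from Keras, which makes it easy to try different temperatures on a stub before training anything.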

## Function to sample the next character given the model's prediction
The sampling function reweights the original probability distribution coming out of the model according to the temperature, then draws a character index from the reweighted distribution.

In [0]:
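def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature # Scale the log-probabilities by 1/temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds) # Renormalize so the values sum to 1
    probas = np.random.multinomial(1, preds, 1) # Draw one sample as a one-hot vector
    return np.argmax(probas) # Index of the sampled character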
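
To see what the reweighting does on its own, the sketch below applies the same arithmetic as `sample()` without the random draw. The helper name `reweight` and the distribution `base` are made up for illustration. Each probability is raised to the power 1/temperature and the result is renormalized.

```python
import numpy as np

def reweight(preds, temperature):
    # Same arithmetic as sample(), minus the random draw.
    logits = np.log(np.asarray(preds, dtype="float64")) / temperature
    weights = np.exp(logits)
    return weights / weights.sum()

base = [0.5, 0.3, 0.2]  # Made-up distribution for illustration
for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(reweight(base, temperature), 3))
```

Temperatures below 1 concentrate probability on the most likely character, which makes the output more repetitive. A temperature of 1 leaves the distribution unchanged, and temperatures above 1 flatten it, which makes the output more surprising.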

## Text-generation loop
Begin generating text using a range of different temperatures after every epoch. This allows you to see how the generated text evolves as the model begins to converge, as well as the impact of temperature in the sampling strategy.

In [0]:
import random
import sys

In [0]:
for epoch in range(1, 60):
  print('epoch', epoch)
  model.fit(x, y, batch_size=128, epochs=1)
  start_index = random.randint(0, len(text) - maxlen - 1) 
  generated_text = text[start_index: start_index + maxlen] # Pull a random sample of 60 characters
  print('\n--- Generating with seed: "' + generated_text + '"')
  
  for temperature in [0.2, 0.5, 1.0, 1.2]:
    print('\n------ temperature:', temperature)
    sys.stdout.write(generated_text)   
    for i in range(400):
      sampled = np.zeros((1, maxlen, len(chars)))
      for t, char in enumerate(generated_text): # One-hot encode the current 60-character window
          sampled[0, t, char_indices[char]] = 1.

      preds = model.predict(sampled, verbose=0)[0] # Pass the window to the model and get back the next-character distribution
      next_index = sample(preds, temperature) # Reweight the distribution by the temperature and sample a character index
      next_char = chars[next_index] # Look up the sampled character

      generated_text += next_char
      generated_text = generated_text[1:] # Slide the window forward by one character

      sys.stdout.write(next_char)

epoch 1
Epoch 1/1

--- Generating with seed: " separate moral expositions in the vouchers of
christianity "

------ temperature: 0.2
 separate moral expositions in the vouchers of
christianity of the strength and the same has a disting to the same and be and the stoped the sentiment of the stoped the supersped to the stoped the strigious and the struggle of a man of the stoped the ordil and be and the stoped of his fact of the standard of the stoped the stoped the deep itself, and the stoped in the present of in its for a man and be and the same a strength and the stoped the stoped the 
------ temperature: 0.5
nd be and the same a strength and the stoped the stoped the world is to the world individuality of the most progress, in for in the exception of one of the stoped to the european morality in be in all the power--he man in not to be
god, the stands, of one acts of christian, the actually influence and now to the fact of a morality of a more to a man of a whole to post of the basis

e contempt of the command and more and the profound and man are something possible the spirit of the philosopher from the profound to the same of the conscious and in the almost a profound and are such a desires and the commander that 
------ temperature: 0.5
st a profound and are such a desires and the commander that commendanty, in the profound of same has a deceived and interpretation is not the excessivent, and the most the form of the promprome, the progress of the almosty
may any more attempt in spirits and heselves that is the ways. so shall retere spirit of self-course perhaps there is no method some think which the most present disposition is does there are only be a sense of any which it were precisel
------ temperature: 1.0
does there are only be a sense of any which it were precisely consequentls primate visufforstificed happens to at
at the superition man, out of merely because of a little would not power with dringumenth, its him and there
is the fashike he drive
cur, he 