# Chapter 8

This chapter covers

* Text generation with LSTM
* Implementing DeepDream
* Performing neural style transfer
* Variational autoencoders
* Understanding generative adversarial networks


The potential of artificial intelligence to emulate human thought processes goes
beyond passive tasks such as object recognition and mostly reactive tasks such as
driving a car. It extends well into creative activities. 

* In the summer of 2015, we were entertained by Google’s DeepDream algorithm turning an image into a psychedelic mess of dog eyes and pareidolic artifacts.
* In 2016, we used the Prisma application to turn photos into paintings of various styles.
* In the summer of 2016, an experimental short movie, Sunspring, was directed using a script written by a Long Short-Term Memory (LSTM) algorithm, complete with dialogue.

Maybe we’ve recently listened to music that was tentatively generated by a neural network.

Granted, the artistic productions we’ve seen from AI so far have been fairly low
quality. AI isn’t anywhere close to rivaling human screenwriters, painters, and composers. But replacing humans was always beside the point: artificial intelligence isn’t
about replacing our own intelligence with something else; it’s about bringing more
intelligence into our lives and work, intelligence of a different kind.

In many fields, but especially in creative ones, AI will be used by humans as a tool to augment their own
capabilities: more `augmented` intelligence than `artificial` intelligence.


A large part of artistic creation consists of simple pattern recognition and technical
skill. And that’s precisely the part of the process that many find less attractive or even
dispensable. That’s where AI comes in. 

Our perceptual modalities, our language, and
our artwork all have statistical structure. Learning this structure is what deep-learning
algorithms excel at. 

Machine-learning models can learn the statistical `latent` space of
**images**, **music**, and **stories**, and they can then sample from this space, creating new artworks with characteristics similar to those the model has seen in its training data. 

Naturally, such sampling is hardly an act of artistic creation in itself. It’s a mere
mathematical operation: the algorithm has no grounding in human life, human emotions, or our experience of the world; instead, it learns from an experience that has little in common with ours.

It’s only our interpretation, as human spectators, that will
give meaning to what the model generates. But in the hands of a skilled artist, algorithmic generation can be steered to become meaningful—and beautiful. 

Latent
space sampling can become a brush that empowers the artist, augments our creative
affordances, and expands the space of what we can imagine. What’s more, it can make
artistic creation more accessible by eliminating the need for technical skill and practice—setting up a new medium of pure expression, factoring art apart from craft.
 
Iannis Xenakis, a visionary pioneer of electronic and algorithmic music, beautifully expressed this same idea in the 1960s, in the context of the application of automation technology to music composition:
 
`Freed from tedious calculations, the composer is able to devote himself to the general
problems that the new musical form poses and to explore the nooks and crannies of this
form while modifying the values of the input data. For example, he may test all
instrumental combinations from soloists to chamber orchestras, to large orchestras. With
the aid of electronic computers the composer becomes a sort of pilot: he presses the buttons, introduces coordinates, and supervises the controls of a cosmic vessel sailing in the space of sound, across sonic constellations and galaxies that he could formerly glimpse only as a distant dream.`

In this chapter, we’ll explore from various angles the potential of deep learning to
augment artistic creation. We’ll review **sequence data generation (which can be used
to generate text or music)**, **DeepDream**, and **image generation** using both `variational
autoencoders` and `generative adversarial networks`. 

We’ll get our computer to dream
up content never seen before; and maybe we’ll get us to dream, too, about the fantastic possibilities that lie at the intersection of technology and art. Let’s get started.

### Text generation with LSTM

In this section, we’ll explore how recurrent neural networks can be used to generate
sequence data. We’ll use text generation as an example, but the exact same techniques can be generalized to any kind of sequence data:
* We could apply them to sequences of musical notes in order to generate new music.
* We could apply them to timeseries of brushstroke data (for example, recorded while an artist paints on an iPad) to generate paintings stroke by stroke, and so on.

Sequence data generation is in no way limited to artistic content generation. It
has been successfully applied to speech synthesis and to dialogue generation for chatbots. The Smart Reply feature that Google released in 2016, capable of automatically
generating a selection of quick replies to emails or text messages, is powered by similar techniques.

### A brief history of generative recurrent networks

In late 2014, few people had ever seen the initials LSTM, even in the machine-learning
community. Successful applications of sequence data generation with recurrent networks only began to appear in the mainstream in 2016. But these techniques have a
fairly long history, starting with the development of the LSTM algorithm in 1997. This
new algorithm was used early on to generate text character by character.
 
In 2002, Douglas Eck, then at Schmidhuber’s lab in Switzerland, applied LSTM to
music generation for the first time, with promising results. Eck is now a researcher at
Google Brain, and in 2016 he started a new research group there, called Magenta,
focused on applying modern deep-learning techniques to produce engaging music. Sometimes, good ideas take 15 years to get started.
 
In the late 2000s and early 2010s, Alex Graves did important pioneering work on
using recurrent networks for sequence data generation. In particular, his 2013 work
on applying recurrent mixture density networks to generate human-like handwriting
using timeseries of pen positions is seen by some as a turning point. This specific
application of neural networks at that specific moment in time captured the
notion of `machines that dream`.

Graves left a similar commented-out remark hidden in the 2013 LaTeX file uploaded to the preprint server arXiv:

**generating sequential data is the closest computers get to dreaming.**

Since then, recurrent neural networks have been successfully used for music generation, dialogue generation, image generation, speech synthesis, and molecule design. They were even used to produce a movie script that was then cast with live actors.

### How do you generate sequence data?

The universal way to generate sequence data in deep learning is to train a network (usually an RNN or a convnet) to predict the next token or next few tokens in a sequence,
using the previous tokens as input. For instance, given the input **the cat is on the ma**,
the network is trained to predict the target `t`, the next character. 

As usual when working
with text data, `tokens` are typically words or characters, and any network that can model
the probability of the next token given the previous ones is called a `language model`. A
language model captures the `latent space` of language: its statistical structure.
 
Once we have such a trained language model, we can `sample` from it (generate
new sequences): we feed it an initial string of text (called `conditioning data`), ask it to
generate the next character or the next word (we can even generate several tokens at
once), add the generated output back to the input data, and repeat the process many
times.

![image.png](attachment:image.png)

This loop allows us to generate sequences of arbitrary length
that reflect the structure of the data on which the model was trained: sequences that
look `almost` like human-written sentences. 
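
The loop described above can be sketched without any neural network. The following is a minimal illustration, assuming a toy character-bigram count model stands in for the trained language model. The names `train_bigram_model`, `next_char_distribution`, and `generate` are hypothetical and are not part of this chapter's code.

```python
import random
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Counts, for each character, which characters follow it in the corpus."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def next_char_distribution(model, context):
    """Returns {char: probability} for the next character.

    This toy model conditions on the last character only; a real
    language model would use the whole context.
    """
    followers = model[context[-1]]
    total = sum(followers.values())
    return {ch: n / total for ch, n in followers.items()}

def generate(model, seed, length, rng):
    """Feeds the conditioning data, samples a character, appends it, repeats."""
    text = seed
    for _ in range(length):
        dist = next_char_distribution(model, text)
        chars, probs = zip(*dist.items())
        text += rng.choices(chars, weights=probs, k=1)[0]
    return text

corpus = "the cat is on the mat. the cat sat on the mat. "
model = train_bigram_model(corpus)
print(generate(model, seed="the ", length=40, rng=random.Random(0)))
```

Every pair of adjacent characters in the output was seen in the corpus, which is the bigram-scale version of "sequences that reflect the structure of the training data".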

In the example we present in this section,
we’ll take an LSTM layer, feed it strings of `N` characters extracted from a text corpus,
and train it to predict character `N + 1`. The output of the model will be a `softmax` over
all possible characters: a probability distribution for the next character. This LSTM is
called a **character-level neural language model**.

### The importance of the sampling strategy

When generating text, the way we choose the next character is crucially important. A
naive approach is greedy sampling, consisting of always choosing the most likely next
character. But such an approach results in repetitive, predictable strings that don’t
look like coherent language. 

A more interesting approach makes slightly more surprising choices: it introduces randomness in the sampling process by sampling from
the probability distribution for the next character. This is called **stochastic sampling**
(recall that stochasticity is what we call randomness in this field). In such a setup, if `e` has
a probability `0.3` of being the next character, according to the model, we’ll choose it 30% of the time.

Note that greedy sampling can also be cast as sampling from a probability distribution: one where a certain character has probability `1` and all others have
probability `0`.
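
As a small illustration of the difference between the two strategies, here is a sketch that assumes a made-up next-character distribution. The `probs` values and the helper names `greedy_sample` and `stochastic_sample` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical next-character distribution produced by a model
probs = {"e": 0.3, "a": 0.5, "t": 0.15, "z": 0.05}

def greedy_sample(probs):
    """Always picks the single most likely character."""
    return max(probs, key=probs.get)

def stochastic_sample(probs, rng):
    """Draws a character with probability proportional to its model probability."""
    chars = list(probs)
    return rng.choices(chars, weights=[probs[c] for c in chars], k=1)[0]

rng = random.Random(42)
n_draws = 10_000
draws = Counter(stochastic_sample(probs, rng) for _ in range(n_draws))

print(greedy_sample(probs))  # always "a"
print({c: draws[c] / n_draws for c in probs})  # empirical frequencies, close to probs
```

With stochastic sampling, `e` comes up in roughly 30% of the draws, and even the unlikely `z` is chosen now and then. Greedy sampling never picks either of them.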

Sampling probabilistically from the softmax output of the model is neat: it allows
even unlikely characters to be sampled some of the time, generating more interesting-looking sentences and sometimes showing creativity by coming up with new, realistic-sounding words that didn’t occur in the training data. But there’s one issue with this
strategy: it doesn’t offer a way to `control the amount of randomness` in the sampling process.
 
Why would we want more or less randomness? Consider an extreme case: pure
random sampling, where we draw the next character from a uniform probability distribution, and every character is equally likely. This scheme has maximum randomness; in other words, this probability distribution has maximum entropy. Naturally, it
won’t produce anything interesting.

At the other extreme, greedy sampling doesn’t
produce anything interesting, either, and has no randomness: the corresponding
probability distribution has minimum entropy. Sampling from the **real** probability
distribution (the distribution that is output by the model’s softmax function) constitutes an intermediate point between these two extremes. But there are many other
intermediate points of higher or lower entropy that we may want to explore.
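
To put numbers on these extremes, the sketch below computes the Shannon entropy of three distributions over four characters. The `model_output` values are made up for illustration, and `entropy` is a hypothetical helper.

```python
import math

def entropy(distribution):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return sum(-p * math.log2(p) for p in distribution if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # pure random sampling
greedy = [0.0, 1.0, 0.0, 0.0]        # greedy sampling seen as a one-hot distribution
model_output = [0.1, 0.6, 0.2, 0.1]  # hypothetical softmax output

for name, dist in [("uniform", uniform), ("model", model_output), ("greedy", greedy)]:
    print(name, round(entropy(dist), 3))
```

The uniform distribution reaches the maximum possible entropy for four outcomes (2 bits), the one-hot distribution has zero entropy, and the model's output falls in between.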

Less
entropy will give the generated sequences a more predictable structure (and thus they
will potentially be more realistic looking), whereas more entropy will result in more
surprising and creative sequences. 

When sampling from generative models, it’s always
good to explore different amounts of randomness in the generation process. Because
we—humans—are the ultimate judges of how interesting the generated data is, interestingness is highly subjective, and there’s no telling in advance where the point of
optimal entropy lies.
 
In order to control the amount of stochasticity in the sampling process, we’ll introduce a parameter called the **softmax temperature** that characterizes the entropy of the
probability distribution used for sampling: it characterizes how surprising or predictable the choice of the next character will be. Given a `temperature` value, a new probability distribution is computed from the original one (the softmax output of the
model) by reweighting it in the following way:

In [2]:
# Reweighting a probability distribution to a different temperature

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution is a 1D Numpy array of probability values that must sum to 1.
    # temperature is a factor quantifying the entropy of the output distribution.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    # The sum of the distribution may no longer be 1, so we divide it by its sum
    # to obtain the new, reweighted distribution.
    return distribution / np.sum(distribution)

Higher temperatures result in sampling distributions of higher entropy that will generate more
surprising and unstructured data, whereas a lower temperature will result in less randomness and much more predictable generated data.

![image.png](attachment:image.png)
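
The effect of the temperature can be checked numerically. The sketch below repeats `reweight_distribution` so that it runs on its own, and applies it to a made-up distribution (`original`). The `entropy_bits` helper is hypothetical and assumes no probability is exactly zero.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # Same reweighting as in the listing above: p_i ** (1 / temperature), renormalized
    distribution = np.exp(np.log(original_distribution) / temperature)
    return distribution / np.sum(distribution)

def entropy_bits(distribution):
    # Shannon entropy in bits (assumes strictly positive probabilities)
    return float(-np.sum(distribution * np.log2(distribution)))

original = np.array([0.1, 0.6, 0.2, 0.1])  # hypothetical softmax output

for temperature in [0.2, 0.5, 1.0, 2.0]:
    reweighted = reweight_distribution(original, temperature)
    print(temperature, np.round(reweighted, 3), round(entropy_bits(reweighted), 3))
```

A temperature of `1.0` leaves the distribution unchanged. Values below `1.0` concentrate the probability mass on the most likely character, and values above `1.0` flatten the distribution toward uniform.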

### Implementing character-level LSTM text generation

Let’s put these ideas into practice in a Keras implementation. The first thing we need
is a lot of text data that we can use to learn a language model. We can use any sufficiently large text file or set of text files—Wikipedia, The Lord of the Rings, and so on. 

In this example, we’ll use some of the writings of Nietzsche, the late-nineteenth century
German philosopher (translated into English). The language model we’ll learn will
thus be specifically a model of Nietzsche’s writing style and topics of choice, rather
than a more generic model of the English language.

### PREPARING THE DATA
Let’s start by downloading the corpus and converting it to lowercase.

In [3]:
# Downloading and parsing the initial text file

import tensorflow
import numpy as np

path = tensorflow.keras.utils.get_file('nietzsche.txt',
                            origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')

text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600901


Next, we’ll extract partially overlapping sequences of length `maxlen`, one-hot encode
them, and pack them in a 3D Numpy array `x` of shape `(sequences, maxlen, unique_characters)`.

Simultaneously, we’ll prepare an array `y` containing the corresponding targets: the one-hot-encoded characters that come after each extracted
sequence.

In [8]:
# Vectorizing sequences of characters

maxlen = 60 # We’ll extract sequences of 60 characters.
step = 3 # We’ll sample a new sequence every three characters.
sentences = [] # Holds the extracted sequences
next_chars = [] # Holds the targets (the follow-up characters)

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

print('Number of sequences:', len(sentences))    

Number of sequences: 200281


In [9]:
chars = sorted(list(set(text))) # List of unique characters in the corpus
print('Unique characters:', len(chars))
char_indices = dict((char, chars.index(char)) for char in chars) # Dictionary that maps unique characters to their index in the list “chars”

Unique characters: 59


In [10]:
print('Vectorization...')

# One-hot encodes the characters into binary arrays

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Vectorization...
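
To make the shapes concrete, here is the same windowing and one-hot encoding applied to a tiny string. This is only an illustrative sketch: `toy_text`, `toy_maxlen`, `toy_step`, and the other names are hypothetical and much smaller than the values used in this chapter.

```python
import numpy as np

toy_text = "hello world"
toy_maxlen, toy_step = 4, 3

# Partially overlapping windows and the character that follows each one
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
windows = [toy_text[i: i + toy_maxlen] for i in starts]
targets = [toy_text[i + toy_maxlen] for i in starts]

vocab = sorted(set(toy_text))
index = {ch: i for i, ch in enumerate(vocab)}

# One-hot encoding: one row per window, one column per unique character
x_toy = np.zeros((len(windows), toy_maxlen, len(vocab)), dtype=bool)
y_toy = np.zeros((len(windows), len(vocab)), dtype=bool)
for i, window in enumerate(windows):
    for t, ch in enumerate(window):
        x_toy[i, t, index[ch]] = True
    y_toy[i, index[targets[i]]] = True

print(windows, targets)
print(x_toy.shape, y_toy.shape)
```

The three windows are `'hell'`, `'lo w'`, and `'worl'`, with targets `'o'`, `'o'`, and `'d'`. The arrays have shapes `(3, 4, 8)` and `(3, 8)`, since the toy string contains eight unique characters.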


### BUILDING THE NETWORK

This network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all
possible characters. But note that recurrent neural networks aren’t the only way to do
sequence data generation; 1D convnets have also proven extremely successful at this
task in recent times.

In [11]:
# Single-layer LSTM model for next-character prediction


from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

Because our targets are one-hot encoded, we’ll use `categorical_crossentropy` as the loss to train the model. 

In [12]:
# Model compilation configuration

from tensorflow.keras import optimizers

optimizer = optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
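
To see why `categorical_crossentropy` suits one-hot targets, the sketch below computes the loss by hand for a single prediction. It is a plain NumPy illustration rather than the Keras implementation, and the probability vectors are made up.

```python
import numpy as np

def categorical_crossentropy(one_hot_target, predicted_probs):
    # With a one-hot target, the sum reduces to
    # -log(probability the model assigned to the true character)
    return float(-np.sum(one_hot_target * np.log(predicted_probs)))

target = np.array([0.0, 0.0, 1.0, 0.0])         # true next character is index 2
confident = np.array([0.05, 0.05, 0.85, 0.05])  # hypothetical softmax outputs
unsure = np.array([0.25, 0.25, 0.25, 0.25])

print(categorical_crossentropy(target, confident))
print(categorical_crossentropy(target, unsure))
```

The loss is small when the model puts most of its probability on the correct character, and grows as that probability shrinks.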

### TRAINING THE LANGUAGE MODEL AND SAMPLING FROM IT

Given a trained model and a seed text snippet, we can generate new text by doing the
following repeatedly:

1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.

This is the code we use to reweight the original probability distribution coming out
of the model and draw a character index from it (the sampling function).

In [13]:
# Function to sample the next character given the model’s predictions

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
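
A quick way to check the behavior of `sample` is to call it many times on a fixed distribution and look at how often each index is drawn. The sketch below repeats the function so that it runs on its own. The `preds` values are made up, and the random seed is fixed only to make the run repeatable.

```python
import numpy as np

def sample(preds, temperature=1.0):
    # Same function as above, repeated so this example is self-contained
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

np.random.seed(0)
preds = [0.1, 0.6, 0.2, 0.1]  # hypothetical softmax output

for temperature in [0.2, 1.0, 2.0]:
    draws = [sample(preds, temperature) for _ in range(2000)]
    frequencies = np.bincount(draws, minlength=len(preds)) / len(draws)
    print(temperature, frequencies)
```

At a low temperature nearly every draw is index `1`, the most likely character. At higher temperatures the other indices are drawn more and more often.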

Finally, the following loop repeatedly trains and generates text. We generate
text using a range of different temperatures after every epoch. This allows us to see
how the generated text evolves as the model begins to converge, as well as the impact
of temperature in the sampling strategy.

In [15]:
# Text-generation loop

import random
import sys

for epoch in range(1, 61): # Trains the model for 60 epochs
    print('epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1) # Fits the model for one iteration on the data
    
    # Selects a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    # Tries a range of different sampling temperatures.
    # Note: generated_text is not reset here, so each temperature continues
    # from the last maxlen characters produced at the previous temperature.
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)
    
        for i in range(400): # Generates 400 characters, starting from the seed text
            # One-hot encodes the characters generated so far
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            # Samples the next character
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            # Appends the new character and slides the window forward by one
            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)

epoch 1
--- Generating with seed: "he brahmin leaves absolutely nothing to his own
volition but"
------ temperature: 0.2
he brahmin leaves absolutely nothing to his own
volition but is the man is the superious of the despite of the self-conternations and the self-comer the man and the self-and is the self and the selfer and consequent the self the some a made the self the man for the self-and the self of the self-and the conture of the self-consequent and the has the self the self--and the self-comeral the self-come and the self-comeral and the self--and interpain the self-c------ temperature: 0.5
 and the self-comeral and the self--and interpain the self-consequent master of the has feelusting sense of there of his a the mast it is the german into the old conture and his which certain and have a way of a stand at a deally the erality of the man implect and the consequed every relest the have a the feet ence the many the great and attome of the esserve of the disturions and inthic self

KeyboardInterrupt: 

Here, we used the random seed text **new faculty, and the jubilation reached its climax when kant**. Here’s what we get at epoch 20, long before the model has fully
converged, with `temperature=0.2`:

`new faculty, and the jubilation reached its climax when kant and such a man
in the same time the spirit of the surely and the such the such
as a man is the sunligh and subject the present to the superiority of the
special pain the most man and strange the subjection of the
special conscience the special and nature and such men the subjection of the
special men, the most surely the subjection of the special
intellect of the subjection of the same things and`

Here’s the result with `temperature=0.5`:

`new faculty, and the jubilation reached its climax when kant in the eterned
and such man as it's also become himself the condition of the
experience of off the basis the superiory and the special morty of the
strength, in the langus, as which the same time life and "even who
discless the mankind, with a subject and fact all you have to be the stand
and lave no comes a troveration of the man and surely the
conscience the superiority, and when one must be w`

And here’s what we get with `temperature=1.0`:

`new faculty, and the jubilation reached its climax when kant, as a
periliting of manner to all definites and transpects it it so
hicable and ont him artiar resull
too such as if ever the proping to makes as cnecience. to been juden,
all every could coldiciousnike hother aw passife, the plies like
which might thiod was account, indifferent germin, that everythery
certain destrution, intellect into the deteriorablen origin of moralian,
and a lessority o`

At epoch 60, the model has mostly converged, and the text starts to look significantly
more coherent. Here’s the result with `temperature=0.2`:

`cheerfulness, friendliness and kindness of a heart are the sense of the spirit is a man with the sense of the sense of the world of the self-end and self-concerning the subjection of the strengthorixes--the subjection of the subjection of the subjection of the self-concerning the feelings in the superiority in the subjection of the subjection of the spirit isn't to be a man of the sense of the subjection and said to the strength of the sense of the`

Here’s `temperature=0.5`:

`cheerfulness, friendliness and kindness of a heart are the part of the soul who have been the art of the philosophers, and which the one won't say, which is it the higher the and with religion of the frences. the life of the spirit among the most continuess of the strengther of the sense the conscience of men of precisely before enough presumption, and can mankind, and something the conceptions, the subjection of the sense and suffering and the`

And here’s `temperature=1.0`:

`cheerfulness, friendliness and kindness of a heart are spiritual by the
ciuture for the
entalled is, he astraged, or errors to our you idstood--and it needs,
to think by spars to whole the amvives of the newoatly, prefectly
raals! it was
name, for example but voludd atu-especity"--or rank onee, or even all
"solett increessic of the world and
implussional tragedy experience, transf, or insiderar,--must hast
if desires of the strubction is be stronges`


As we can see, a low temperature value results in extremely repetitive and predictable
text, but local structure is highly realistic: in particular, all words (a `word` being a local
pattern of characters) are real English words.

With higher temperatures, the generated text becomes more interesting, surprising, even creative; it sometimes invents completely new words that sound somewhat plausible (such as `eterned` and `troveration`).

With a high temperature, the local structure starts to break down, and most words
look like semi-random strings of characters. Without a doubt, `0.5` is the most interesting temperature for text generation in this specific setup. Always experiment with
multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.
 
Note that by training a bigger model, longer, on more data, we can achieve generated samples that look much more coherent and realistic than this one. But, of course, don’t expect to ever generate any meaningful text, other than by random
chance: all we’re doing is sampling data from a statistical model of which characters
come after which characters.

Language is a communication channel, and there’s a
distinction between what communications are about and the statistical structure of
the messages in which communications are encoded. To evidence this distinction,
here’s a thought experiment: what if human language did a better job of compressing
communications, much like computers do with most digital communications?

Language would be no less meaningful, but it would lack any intrinsic statistical structure, thus making it impossible to learn a language model as we just did.
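
The thought experiment can be loosely illustrated with a general-purpose compressor. The sketch below assumes `zlib` as a stand-in for "a better job of compressing", and compares the byte-level entropy of a short English message before and after compression. The `byte_entropy` helper is hypothetical, and on such a short sample the figures are only rough estimates.

```python
import math
import zlib
from collections import Counter

def byte_entropy(data):
    """Empirical Shannon entropy of the byte values in `data`, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

message = (
    "language is a communication channel, and there is a distinction between "
    "what communications are about and the statistical structure of the "
    "messages in which they are encoded. a model that predicts the next "
    "character exploits that structure: after a q we expect a u, after a "
    "period we expect a space, and common words keep coming back. a good "
    "compressor squeezes this redundancy out, so its output looks far closer "
    "to random noise than the original text does."
).encode("ascii")

compressed = zlib.compress(message, 9)

print(len(message), round(byte_entropy(message), 2))
print(len(compressed), round(byte_entropy(compressed), 2))
```

The compressed bytes carry exactly the same message, since decompressing recovers it, but their distribution is much closer to uniform. A next-byte predictor would have far less statistical structure to learn from them.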

### Conclusion

* We can generate discrete sequence data by training a model to predict the next
token(s), given previous tokens.
* In the case of text, such a model is called a **language model**. It can be based on
either words or characters.
* Sampling the next token requires a balance between adhering to what the model
judges likely and introducing randomness.
* One way to handle this is the notion of softmax temperature. Always experiment with different temperatures to find the right one.