This chapter covers:
    Text generation with LSTM
    Implementing DeepDream
    Performing neural style transfer
    Variational autoencoders
    Understanding generative adversarial networks

## 8.1 Text generation with LSTM

Recurrent neural networks (RNNs) can be used to generate sequence data, such as text. The same techniques can be
generalized to any kind of sequence data: they can be applied to sequences of musical notes to generate new music,
to timeseries of brush-stroke data (e.g., recorded while an artist paints on an iPad) to generate paintings stroke
by stroke, and so on. Sequence data generation is not limited to artistic content generation. It is also usable in
speech synthesis and in dialogue generation for chatbots; the Smart Reply feature that Google released in 2016 is
powered by similar techniques.

Character-level neural language model: the output of the model is a softmax over all possible characters, i.e. a
        probability distribution for the next character. For example, take an LSTM layer, feed it strings of N
        characters extracted from a text corpus, and train it to predict character N + 1.

#### 8.1.3 The importance of the sampling strategy

In [None]:
greedy sampling = a naive approach: always choosing the most likely next element. The results are repetitive,
                  predictable strings that don't look like coherent language.
stochastic sampling = makes slightly more surprising choices: it introduces randomness in the sampling process
                  by sampling from the probability distribution for the next character.
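
The difference between the two strategies can be sketched with NumPy. The four-character vocabulary and the probabilities below are made up for illustration; they are not model output, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical next-character distribution over a toy vocabulary (illustrative values only).
vocab = ['a', 'b', 'c', 'd']
probs = np.array([0.5, 0.3, 0.15, 0.05])

def greedy_sample(probs):
    # Always picks the most likely element, so the result is deterministic.
    return int(np.argmax(probs))

def stochastic_sample(probs, rng):
    # Draws an index at random according to the probability distribution.
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
greedy_picks = [vocab[greedy_sample(probs)] for _ in range(10)]
stochastic_picks = [vocab[stochastic_sample(probs, rng)] for _ in range(10)]

print('greedy:    ', ''.join(greedy_picks))       # the same character ten times
print('stochastic:', ''.join(stochastic_picks))   # may contain any of the four characters
```

Greedy sampling repeats the single most likely character, while stochastic sampling can occasionally pick a less likely one.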

## Listing 8.1: Reweighting a probability distribution to a different temperature

In [None]:
# Given a temperature value, a new probability distribution is computed from the original one
# (the softmax output of the model) by reweighting it in the following way.

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution is a 1D NumPy array of probability values that must sum to 1.
    # temperature is a factor quantifying the entropy of the output distribution.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    # The sum of the reweighted values may no longer be 1, so divide by the sum
    # to obtain the new distribution.
    return distribution / np.sum(distribution)

# Higher temperatures result in sampling distributions of higher entropy that generate more surprising and
# unstructured data, whereas a lower temperature results in less randomness and much more predictable data.
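
To see the effect of temperature in numbers, here is a small sketch that reweights one made-up distribution at several temperatures and reports its entropy. The input values and the `entropy` helper are assumptions added for illustration; `reweight_distribution` is repeated so the snippet runs on its own.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(p):
    # Shannon entropy in nats; higher means a flatter, more random distribution.
    return float(-np.sum(p * np.log(p)))

original = np.array([0.5, 0.3, 0.15, 0.05])   # illustrative values, not model output
temperatures = [0.2, 0.5, 1.0, 1.2]

entropies = []
for temperature in temperatures:
    reweighted = reweight_distribution(original, temperature)
    entropies.append(entropy(reweighted))
    print(temperature, np.round(reweighted, 3), round(entropies[-1], 3))
```

A temperature of 1.0 leaves the distribution unchanged, lower values concentrate the mass on the most likely element, and higher values flatten it.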

## 8.1.4 Implementing character-level LSTM text generation

## Listing 8.2: Downloading and parsing the initial text file

In [None]:
import keras 
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

## Listing 8.3: Vectorizing sequences of characters

Extract partially overlapping sequences of length maxlen, one-hot encode them, and pack them in a 3D NumPy array x
of shape (sequences, maxlen, unique_characters). Simultaneously, prepare an array y containing the corresponding
targets: the one-hot-encoded characters that come after each extracted sequence.

In [None]:
maxlen = 60   # Extract sequences of 60 characters
step = 3      # Sample a new sequence every three characters
sentences = [] # Holds the extracted sequences
next_chars = [] # Holds the targets(the follow-up characters)  

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

chars = sorted(list(set(text)))    # List of unique characters in the corpus
print('Unique characters:', len(chars))
# Dictionary that maps each unique character to its index in the list "chars"
char_indices = dict((char, chars.index(char)) for char in chars)

print('Vectorization...')
# One-hot encodes the characters into binary arrays
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1   # Set once per sequence, outside the inner loop
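
To make the resulting shapes concrete, the following sketch applies the same windowing and one-hot encoding to a tiny stand-in corpus. The `toy_` names and the string `'hello world'` are assumptions chosen for illustration and are not part of the listing.

```python
import numpy as np

toy_text = 'hello world'   # stand-in corpus, for illustration only
toy_maxlen = 4             # window length
toy_step = 3               # start a new window every three characters

starts = range(0, len(toy_text) - toy_maxlen, toy_step)
toy_sentences = [toy_text[i: i + toy_maxlen] for i in starts]
toy_next_chars = [toy_text[i + toy_maxlen] for i in starts]

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = 1
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = 1

print(toy_sentences, toy_next_chars)   # ['hell', 'lo w', 'worl'] ['o', 'o', 'd']
print(toy_x.shape, toy_y.shape)        # (3, 4, 8) (3, 8)
```

Each window contributes exactly one active entry per timestep in `toy_x` and one active entry in `toy_y`.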


### BUILDING THE NETWORK

The network is a single LSTM layer followed by a Dense classifier and softmax over all possible characters.

## Listing 8.4: Single layer LSTM model for next-character prediction

In [None]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

Because the targets are one-hot encoded, categorical_crossentropy is used as the loss to train the model.

## Listing 8.5: Model compilation configuration

In [None]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

### TRAINING THE LANGUAGE MODEL AND SAMPLING FROM IT

1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.
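
The four steps can be sketched end to end without a trained network. In the snippet below, `fake_predict` is a hypothetical stand-in for the model's prediction call, and the three-character vocabulary is made up; only the loop structure is meant to carry over.

```python
import numpy as np

toy_chars = ['a', 'b', 'c']

def fake_predict(context):
    # Hypothetical stand-in for the model: favors the character that follows the last one.
    last = toy_chars.index(context[-1])
    probs = np.full(len(toy_chars), 0.1)
    probs[(last + 1) % len(toy_chars)] = 0.8
    return probs

def sample_index(preds, temperature, rng):
    preds = np.log(preds) / temperature            # step 2: reweight to the temperature
    preds = np.exp(preds)
    preds = preds / np.sum(preds)
    return int(rng.choice(len(preds), p=preds))    # step 3: sample at random

rng = np.random.default_rng(0)
generated = 'ab'
window = generated                                 # fixed-length context fed to the model
for _ in range(20):
    preds = fake_predict(window)                   # step 1: distribution for the next character
    next_char = toy_chars[sample_index(preds, 0.5, rng)]
    generated += next_char                         # step 4: append the new character
    window = (window + next_char)[1:]              # slide the context window forward
print(generated)
```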

## Listing 8.6: Function to sample the next character given the model's predictions

In [None]:
# code used to reweight the original probability distribution coming out of the model and draw a character index from it.
# (the sampling function)

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

## Listing 8.7: Text-generation Loop

In [None]:
import random
import sys

for epoch in range(1, 61):     # Trains the model for 60 epochs
    print('epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1)   # Fits the model for one iteration on the data
    start_index = random.randint(0, len(text) - maxlen - 1)   # Selects a text seed at random 
    generated_text = text[start_index: start_index + maxlen]
    print('---Generating with seed: "' + generated_text +'"')
    
    for temperature in [0.2, 0.5, 1.0, 1.2]:     # Tries a range of different sampling temperatures
        print('------temperature:', temperature)
        sys.stdout.write(generated_text)
        
        for i in range(400):     # Generates 400 characters, starting from the seed text
            sampled = np.zeros((1, maxlen, len(chars)))    # One-hot encodes the characters generated so far.
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)        # Samples the next character
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)

## 8.2 DeepDream

DeepDream is an artistic image-modification technique that uses the representations learned by convolutional neural
networks. The original DeepDream convnet was trained on ImageNet, where dog breeds and bird species are vastly
overrepresented.

### 8.2.1 Implementing DeepDream in Keras

## Listing 8.8 Loading the pretrained Inception V3 model

In [None]:
from keras.applications import inception_v3
from keras import backend as K

K.set_learning_phase(0)  # The model isn't trained here, so this disables all training-specific operations.

model = inception_v3.InceptionV3(weights='imagenet',  # Builds the Inception V3 network without its classifier top
                                 include_top=False)   # (include_top=False), loaded with pretrained ImageNet weights.

## Listing 8.9 Setting Up the DeepDream configuration

In [None]:
layer_contributions = {   # Dictionary mapping layer names to a coefficient quantifying how much the layer's
    'mixed2': 0.2,        # activation contributes to the loss you'll seek to maximize. Note that the layer names
    'mixed3': 3.,         # are hardcoded in the built-in Inception V3. List all layer names with model.summary().
    'mixed4': 2.,
    'mixed5': 1.5,
}

## Listing 8.10 Defining the loss to be maximized

In [None]:
layer_dict = dict([(layer.name, layer) for layer in model.layers])   # Maps layer names to layer instances.

loss = K.variable(0.)  # Define the loss by adding layer contributions to this scalar variable.
for layer_name in layer_contributions:
    coeff = layer_contributions[layer_name]
    activation = layer_dict[layer_name].output    # Retrieves the layer's output
    
    scaling = K.prod(K.cast(K.shape(activation), 'float32'))
    # Adds the L2 norm of the features of a layer to the loss. Border artifacts are avoided
    # by only involving non-border pixels in the loss.
    loss += coeff * K.sum(K.square(activation[:, 2: -2, 2: -2, :])) / scaling
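
The contribution of a single layer can be reproduced with plain NumPy to check the arithmetic. The random array below is a hypothetical stand-in for a layer activation of shape (batch, height, width, channels); the coefficient is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_activation = rng.normal(size=(1, 8, 8, 4))    # hypothetical layer output

toy_coeff = 2.0
toy_scaling = float(np.prod(toy_activation.shape))        # total number of elements, as in the listing
toy_inner = toy_activation[:, 2:-2, 2:-2, :]              # drops a 2-pixel border on each side
toy_layer_loss = toy_coeff * np.sum(np.square(toy_inner)) / toy_scaling

print(toy_inner.shape)      # (1, 4, 4, 4)
print(toy_layer_loss)
```

Note that the scaling factor uses the size of the full activation, while the sum only covers the non-border region, mirroring the listing.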

## Listing 8.11 Gradient-ascent process

In [None]:
dream = model.input      # This tensor holds the generated image: the dream.

grads = K.gradients(loss, dream)[0]  # Computes the gradients of the loss with regard to the dream.

grads /= K.maximum(K.mean(K.abs(grads)), 1e-7)   # Normalizes the gradients (important trick)

outputs = [loss, grads]                             # Sets up a Keras function to retrieve the value of the loss 
fetch_loss_and_grads = K.function([dream], outputs) # and gradients given an input image

def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

def gradient_ascent(x, iterations, step, max_loss=None):  # Runs gradient ascent for a number of iterations.
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('...Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x
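
The same loop can be exercised on a toy problem where the loss and gradient are known in closed form. Here `toy_loss_and_grads` is a hypothetical stand-in for `eval_loss_and_grads`, with loss `sum(x ** 2)`; it is only meant to show the normalized update and the `max_loss` early exit.

```python
import numpy as np

def toy_loss_and_grads(x):
    # Stand-in for eval_loss_and_grads: loss = sum(x ** 2), so the gradient is 2 * x.
    loss = float(np.sum(x ** 2))
    grads = 2.0 * x
    grads /= max(float(np.mean(np.abs(grads))), 1e-7)   # same normalization trick as the listing
    return loss, grads

def toy_gradient_ascent(x, iterations, step, max_loss=None):
    steps_taken = 0
    for i in range(iterations):
        loss_value, grad_values = toy_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break                                        # stop once the loss exceeds the cap
        x = x + step * grad_values                       # move uphill
        steps_taken += 1
    return x, steps_taken

start = np.array([1.0, -2.0])
end, steps_taken = toy_gradient_ascent(start, iterations=20, step=0.1, max_loss=10.)
print(end, steps_taken)
```

Unlike the listing, this sketch returns a new array instead of updating `x` in place, so `start` is left untouched.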

## Listing 8.12 Running gradient ascent over different successive scales

In [None]:
import numpy as np

# Playing with these hyperparameters will let you achieve new effects.
step = 0.01          # Gradient ascent step size
num_octave = 3       # Number of scales at which to run gradient ascent
octave_scale = 1.4   # Size ratio between scales
iterations = 20      # Number of ascent steps to run at each scale

max_loss = 10.       # If the loss grows larger than 10, gradient ascent is interrupted to avoid ugly artifacts.

base_image_path = '...'     # Path to the image to be used

img = preprocess_image(base_image_path)    # Loads the base image into a NumPy array (function defined in the next listing)

original_shape = img.shape[1:3]
successive_shapes = [original_shape]                # Prepares a list of shape tuples defining the different scales
for i in range(1, num_octave):                      # at which to run gradient ascent.
    shape = tuple([int(dim / (octave_scale ** i))
        for dim in original_shape])
    successive_shapes.append(shape)
    
successive_shapes = successive_shapes[::-1]     # Reverses the list of shapes so they're in increasing order.
    
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0]) # Resizes Numpy array of the img to the smallest scale.

for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)      # Scales up the dream image
    img = gradient_ascent(img,             
                          iterations=iterations,     # Runs gradient 
                          step=step,                 # ascent
                          max_loss=max_loss)         # altering the dream
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)   # Scales up the smaller, pixelated version of the original image.
    same_size_original = resize_img(original_img, shape)   # Computes the high-quality version of the original image at this size.
    lost_detail = same_size_original - upscaled_shrunk_original_img # The difference between the two is the detail lost when scaling up.
    
    img += lost_detail   # Reinjects lost detail into the dream
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname='dream_at_scale_' + str(shape) + '.png' )
    
save_img(img, fname='final_dream.png')
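
As a quick check of the scale schedule, the shape computation can be run on its own. The 400 x 600 image size is a hypothetical example; the `toy_` names are used so that the listing's variables are not overwritten.

```python
toy_octave_scale = 1.4
toy_num_octave = 3
toy_original_shape = (400, 600)   # hypothetical (height, width), for illustration

toy_shapes = [toy_original_shape]
for i in range(1, toy_num_octave):
    # Each octave shrinks both dimensions by a further factor of toy_octave_scale.
    shape = tuple(int(dim / (toy_octave_scale ** i)) for dim in toy_original_shape)
    toy_shapes.append(shape)

toy_shapes = toy_shapes[::-1]     # smallest scale first
print(toy_shapes)                 # [(204, 306), (285, 428), (400, 600)]
```

Gradient ascent then runs at each of these sizes in turn, from the smallest to the original.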

# Note: this code relies on the auxiliary NumPy functions defined in the next listing.

## Listing 8.13: Auxiliary functions

In [None]:
import scipy.ndimage
from keras.preprocessing import image

def resize_img(img, size):
    img = np.copy(img)
    factors = (1,
               float(size[0]) / img.shape[1],
               float(size[1]) / img.shape[2],
               1)
    return scipy.ndimage.zoom(img, factors, order=1)

def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    # scipy.misc.imsave was removed from recent SciPy releases; saving through PIL is used here instead.
    image.array_to_img(pil_img, scale=False).save(fname)

def preprocess_image(image_path):
    # Utility function to open a picture and format it into a tensor that Inception V3 can process
    img = image.load_img(image_path)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img

def deprocess_image(x):
    # Utility function to convert a tensor into a valid image
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, x.shape[2], x.shape[3]))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((x.shape[1], x.shape[2], 3))
    x /= 2.      # Undoes the preprocessing performed by inception_v3.preprocess_input
    x += 0.5
    x *= 255.
    x = np.clip(x, 0, 255).astype('uint8')
    return x

# Note: because the original Inception V3 network was trained to recognize concepts in images of size 299 x 299, and
#       because the process scales the images down by a reasonable factor, this DeepDream implementation produces much
#       better results on images between 300 x 300 and 400 x 400. Regardless, the same code can be run on images of any size.
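
The three arithmetic steps in `deprocess_image` make more sense next to the scaling they undo. The sketch below assumes that `inception_v3.preprocess_input` maps pixel values from [0, 255] to [-1, 1] (divide by 127.5, subtract 1); `toy_preprocess` and `toy_deprocess` are hypothetical helpers written only to show the round trip.

```python
import numpy as np

def toy_preprocess(x):
    # Assumed Inception-style scaling: [0, 255] -> [-1, 1].
    return x / 127.5 - 1.0

def toy_deprocess(x):
    x = x / 2.0       # [-1, 1]    -> [-0.5, 0.5]
    x = x + 0.5       # [-0.5, 0.5] -> [0, 1]
    x = x * 255.0     # [0, 1]     -> [0, 255]
    return np.clip(x, 0, 255).astype('uint8')

pixels = np.array([0.0, 51.0, 255.0])
restored = toy_deprocess(toy_preprocess(pixels))
print(restored)
```

Because the final cast truncates rather than rounds, a restored value can differ from the original by one intensity level.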

    

    

### 8.2.2 Wrapping up

DeepDream consists of running a convnet in reverse to generate inputs based on the representations learned by the
network.
The results produced are fun and somewhat similar to the visual artifacts induced in humans by the disruption of the
visual cortex via psychedelics.
The process is not specific to image models or even to convnets. It can be done for speech, music, and more.

## 8.3 Neural style transfer

Neural style transfer consists of applying the style of a reference image to a target image while conserving the
content of the target image. Style means the textures, colors, and visual patterns in the image at various spatial
scales; content is the higher-level macrostructure of the image.
The idea is to define a loss function that specifies what to achieve, and then to minimize this loss.
What to achieve = conserve the content of the original image while adopting the style of the reference image.

loss = distance(style(reference_image) - style(generated_image)) + distance(content(original_image) - content(generated_image))

Here distance is a norm function such as the L2 norm, content is a function that takes an image and computes a
representation of its content, and style is a function that takes an image and computes a representation of its
style. Minimizing this loss causes style(generated_image) to be close to style(reference_image), and
content(generated_image) to be close to content(original_image), which achieves style transfer.

Deep CNNs offer a way to mathematically define the style and content functions.

### 8.3.1 The content loss

In [None]:
A good candidate: the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the
target image, and the activations of the same layer computed over the generated image.
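
A minimal NumPy sketch of such a content loss, written here as the squared L2 distance, follows. The arrays are random stand-ins for the upper-layer activations of the two images, and the function name `content_loss` is an assumption of this sketch.

```python
import numpy as np

def content_loss(base, combination):
    # Squared L2 distance between two activation arrays of the same shape.
    return float(np.sum(np.square(combination - base)))

rng = np.random.default_rng(0)
target_features = rng.normal(size=(14, 14, 8))    # hypothetical activations for the target image
generated_features = target_features + 0.1        # hypothetical activations for the generated image

print(content_loss(target_features, target_features))      # 0.0 for identical activations
print(content_loss(target_features, generated_features))   # grows as the activations diverge
```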

### 8.3.2 The style loss

### 8.3.3 Neural style transfer

In [None]:
Neural style transfer can be implemented using any pretrained convnet. Here we use VGG19, a simple variant of the
VGG16 network with three more convolutional layers.
General process:
    1. Set up a network that computes VGG19 layer activations for the style-reference image, the target image, and
       the generated image at the same time.
    2. Use the layer activations computed over these three images to define the loss function to minimize for style transfer.
    3. Set up a gradient-descent process to minimize this loss function.

## Listing 8.14: Defining initial variables

In [None]:
from keras.preprocessing.image import load_img, img_to_array

target_image_path = 'img/portrait.jpg'        # Path to the image you want to transform
style_reference_image_path = 'img/transfer_style_reference.jpg'    # Path to the style image

width, height = load_img(target_image_path).size           # Dimensions
img_height = 400                                           # of the 
img_width = int(width * img_height / height)               # generated picture
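
The width computation keeps the aspect ratio of the target image while fixing the height. A worked example with a hypothetical 1024 x 768 source image:

```python
toy_width, toy_height = 1024, 768      # hypothetical source image size (width, height)
toy_img_height = 400                   # fixed height of the generated picture
toy_img_width = int(toy_width * toy_img_height / toy_height)

print(toy_img_width)                   # 533
```

The ratio 533 / 400 is approximately 1.33, which matches the 1024 / 768 ratio of the source up to truncation.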


## Listing 8.15: Auxiliary functions

In [None]:
# Auxiliary functions for loading, preprocessing, and postprocessing the images going in and out of the VGG19 convnet.

import numpy as np
from keras.applications import vgg19

def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

def deprocess_image(x):
    x[:, :, 0] += 103.939                  # Adds back the ImageNet mean pixel values, which reverses
    x[:, :, 1] += 116.779                  # the zero-centering done by
    x[:, :, 2] += 123.68                   # vgg19.preprocess_input
    x = x[:, :, ::-1]      # Converts the image from 'BGR' to 'RGB'. Also part of reversing vgg19.preprocess_input
    x = np.clip(x, 0, 255).astype('uint8')
    return x
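
To see why these steps recover the original picture, the sketch below pairs them with a NumPy imitation of the forward transformation. It assumes the default behavior of `vgg19.preprocess_input` (RGB to BGR, then subtracting the ImageNet channel means); the `toy_` helpers are hypothetical, and the deprocessing rounds before casting so that the round trip is exact.

```python
import numpy as np

IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def toy_vgg_preprocess(rgb):
    # Assumed forward step: flip the channels to BGR, then zero-center with the ImageNet means.
    bgr = rgb[:, :, ::-1].astype('float64')
    return bgr - IMAGENET_BGR_MEAN

def toy_vgg_deprocess(x):
    x = x + IMAGENET_BGR_MEAN          # adds the means back
    x = x[:, :, ::-1]                  # BGR -> RGB
    # Rounding is an addition of this sketch; the listing casts directly.
    return np.clip(np.round(x), 0, 255).astype('uint8')

rng = np.random.default_rng(0)
toy_rgb = rng.integers(0, 256, size=(4, 4, 3)).astype('uint8')   # random stand-in image
toy_restored = toy_vgg_deprocess(toy_vgg_preprocess(toy_rgb))

print(np.array_equal(toy_rgb, toy_restored))
```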

## Listing 8.16: Loading the pretrained VGG19 network and applying it to the three images

Set up the VGG19 network with a batch of three images as input: the style-reference image, the target image, and a placeholder that will contain the generated image. A placeholder is a symbolic tensor whose values are provided externally via NumPy arrays. The style-reference and target images are static, so they are defined using K.constant, whereas the values contained in the placeholder of the generated image change over time.

In [None]:
from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))
combination_image = K.placeholder((1, img_height, img_width, 3))   # Placeholder containing the generated image

input_tensor = K.concatenate([target_image,                        # Combines the three
                              style_reference_image,               # images in a 
                              combination_image], axis=0)          # single batch

model = vgg19.VGG19(input_tensor=input_tensor,                     # Builds the VGG19 network with the batch
                    weights='imagenet',                            # of three images as input. The model 
                    include_top=False)                             # will be loaded with pretrained
print('Model loaded.')                                             # ImageNet weights  
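
The batching step can be checked with plain NumPy arrays standing in for the Keras tensors. The tiny image size and the constant fill values below are assumptions chosen so that each slot of the batch is easy to recognize.

```python
import numpy as np

toy_height, toy_width = 4, 6                                  # tiny hypothetical image size

toy_target = np.zeros((1, toy_height, toy_width, 3))          # stands in for target_image
toy_style = np.ones((1, toy_height, toy_width, 3))            # stands in for style_reference_image
toy_combination = np.full((1, toy_height, toy_width, 3), 2.)  # stands in for combination_image

toy_batch = np.concatenate([toy_target, toy_style, toy_combination], axis=0)

print(toy_batch.shape)   # (3, 4, 6, 3)
```

With this ordering, index 0 along the batch axis is the target image, index 1 the style reference, and index 2 the generated image.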