This chapter covers:
    Text generation with LSTM
    Implementing DeepDream
    Performing neural style transfer
    Variational autoencoders
    Understanding generative adversarial networks

## 8.1 Text generation with LSTM

RNNs can be used to generate sequence data. Text generation is the running example here, but the same techniques
generalize to any kind of sequence data: they can be applied to sequences of musical notes to generate new music,
to timeseries of brush-stroke data (e.g., recorded while an artist paints on an iPad) to generate paintings stroke
by stroke, and so on. Sequence data generation is not limited to artistic content generation; it is also usable in
speech synthesis and in dialogue generation for chatbots. One example is the Smart Reply feature that Google
released in 2016, which automatically suggests quick replies to emails or text messages.

Character-level neural language model: the output of the model is a softmax over all possible characters, i.e., a
probability distribution for the next character. E.g., take an LSTM layer, feed it strings of N characters
extracted from a text corpus, and train it to predict character N + 1.

#### 8.1.3 The importance of the sampling strategy

In [None]:
Greedy sampling = a naive approach: always choose the most likely next element. The results are repetitive,
predictable strings that don't look like coherent language.
Stochastic sampling = makes slightly more surprising choices: it introduces randomness into the sampling process
by sampling from the probability distribution for the next character.
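
The difference between the two strategies can be seen on a toy distribution. This is a minimal sketch, not part of the chapter's listings: the three-character alphabet, the probabilities, and the helper names `greedy_sample` and `stochastic_sample` are assumptions made up for illustration.

```python
import random
from collections import Counter

# Hypothetical next-character distribution (illustrative values only).
chars = ['a', 'b', 'c']
probs = [0.6, 0.3, 0.1]

def greedy_sample(probs):
    """Always return the index of the most likely element."""
    return max(range(len(probs)), key=lambda i: probs[i])

def stochastic_sample(probs, rng):
    """Draw an index at random, weighted by the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
greedy_counts = Counter(chars[greedy_sample(probs)] for _ in range(1000))
stochastic_counts = Counter(chars[stochastic_sample(probs, rng)] for _ in range(1000))

print('greedy:    ', dict(greedy_counts))
print('stochastic:', dict(stochastic_counts))
```

Greedy sampling only ever emits the single most likely character, whereas stochastic sampling emits every character at a rate that roughly follows its probability.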

## Listing 8.1: Reweighting a probability distribution to a different temperature

In [None]:
# Given a temperature value, a new probability distribution is computed from the original one
# (the softmax output of the model) by reweighting it in the following way.

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # original_distribution is a 1D Numpy array of probability values that must sum to 1.
    # temperature is a factor quantifying the entropy of the output distribution.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    # The sum of the reweighted values may no longer be 1, so divide by the sum
    # to obtain the new distribution.
    return distribution / np.sum(distribution)

# Higher temperatures result in sampling distributions of higher entropy that generate more surprising
# and unstructured data, whereas a lower temperature results in less randomness and much more
# predictable generated data.
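
As a worked check of the claim about entropy, the sketch below reweights a made-up three-value distribution at several temperatures and measures the Shannon entropy of each result. The `entropy` helper and the example values are assumptions introduced here; `reweight_distribution` is repeated so the snippet runs on its own.

```python
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(p):
    """Shannon entropy in nats (hypothetical helper for this example)."""
    return float(-np.sum(p * np.log(p)))

original = np.array([0.6, 0.3, 0.1])  # illustrative softmax output

for temperature in (0.2, 1.0, 5.0):
    reweighted = reweight_distribution(original, temperature)
    print(temperature, np.round(reweighted, 3), round(entropy(reweighted), 3))

cold = reweight_distribution(original, 0.2)   # sharper than the original
same = reweight_distribution(original, 1.0)   # temperature 1 leaves it unchanged
hot = reweight_distribution(original, 5.0)    # closer to uniform
```

A temperature of 1 returns the original distribution, lower temperatures concentrate the mass on the most likely value, and higher temperatures push the distribution toward uniform.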

## 8.1.4 Implementing character-level LSTM text generation

## Listing 8.2: Downloading and parsing the initial text file

In [None]:
import keras 
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

## Listing 8.3: Vectorizing sequences of characters

Extract partially overlapping sequences of length maxlen, one-hot encode them, and pack them into a 3D Numpy array x
of shape (sequences, maxlen, unique_characters). Simultaneously, prepare an array y containing the corresponding
targets: the one-hot-encoded characters that come after each extracted sequence.

In [None]:
maxlen = 60   # Extract sequences of 60 characters
step = 3      # Sample a new sequence every three characters
sentences = [] # Holds the extracted sequences
next_chars = [] # Holds the targets(the follow-up characters)  

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

chars = sorted(list(set(text)))    # List of unique characters in the corpus
print('Unique characters:', len(chars))
# Dictionary that maps each unique character to its index in the list "chars"
char_indices = dict((char, chars.index(char)) for char in chars)

print('Vectorization...')
# One-hot encodes the characters into binary arrays.
# The builtin bool is used because the np.bool alias was removed in recent NumPy releases.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1   # One target per sequence, so it is set once per sequence


### BUILDING THE NETWORK

The network is a single LSTM layer followed by a Dense classifier and a softmax over all possible characters.

## Listing 8.4: Single layer LSTM model for next-character prediction

In [None]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

The targets are one-hot encoded, so categorical_crossentropy is used as the loss to train the model.
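
To make the link between one-hot targets and this loss concrete, here is a small plain-Python sketch of the cross-entropy for a single sample. The function below is a hand-written illustration with made-up probabilities, not the Keras implementation.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy for one sample: -sum(target * log(prediction))."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, predicted_probs))

target = [0.0, 1.0, 0.0]          # the true next character is index 1
confident = [0.05, 0.90, 0.05]    # hypothetical softmax outputs
unsure = [0.40, 0.20, 0.40]

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

With a one-hot target, only the probability assigned to the true character contributes, so the loss reduces to the negative log of that probability.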

## Listing 8.5: Model compilation configuration

In [None]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

### TRAINING THE LANGUAGE MODEL AND SAMPLING FROM IT

1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.
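
The four steps above can be sketched without a trained network. In this minimal example the "model" is a hypothetical hand-written function (`toy_model`) that returns fixed probabilities based on the last character; it only stands in for the LSTM so that the loop itself is runnable. All names and values here are illustrative assumptions.

```python
import math
import random

chars = ['a', 'b', ' ']

def toy_model(text_so_far):
    """Stand-in for a trained network: favors alternating 'a' and 'b'."""
    last = text_so_far[-1]
    if last == 'a':
        return [0.1, 0.7, 0.2]
    if last == 'b':
        return [0.7, 0.1, 0.2]
    return [0.45, 0.45, 0.1]

def reweight(probs, temperature):
    """Reweights a distribution to the given temperature and renormalizes it."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def generate(seed, length, temperature, rng):
    text = seed
    for _ in range(length):
        probs = toy_model(text)                                 # 1. distribution for the next character
        probs = reweight(probs, temperature)                    # 2. reweight to the temperature
        next_char = rng.choices(chars, weights=probs, k=1)[0]   # 3. sample at random
        text += next_char                                       # 4. append to the available text
    return text

rng = random.Random(42)
sample_text = generate('a', 40, temperature=0.5, rng=rng)
print(repr(sample_text))
```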

## Listing 8.6: Function to sample the next character given the model's predictions

In [None]:
# code used to reweight the original probability distribution coming out of the model and draw a character index from it.
# (the sampling function)

def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

## Listing 8.7: Text-generation Loop

In [None]:
import random
import sys

for epoch in range(1, 61):     # Trains the model for 60 epochs
    print('epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1)   # Fits the model for one iteration on the data
    start_index = random.randint(0, len(text) - maxlen - 1)   # Selects a text seed at random
    generated_text = text[start_index: start_index + maxlen]
    print('---Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:     # Tries a range of different sampling temperatures
        print('------temperature:', temperature)
        sys.stdout.write(generated_text)

        for i in range(400):     # Generates 400 characters, starting from the seed text
            sampled = np.zeros((1, maxlen, len(chars)))    # One-hot encodes the characters generated so far
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]   # Samples the next character
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char             # Appends it and slides the window forward by one
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)

## 8.2 DeepDream

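DeepDream is an artistic image-modification technique that uses the representations learned by convolutional
neural networks. The DeepDream convnet was trained on ImageNet, where dog breeds and bird species are vastly
overrepresented, which explains the dog parts and bird feathers that show up in its output.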

### 8.2.1 Implementing DeepDream in Keras

## Listing 8.8 Loading the pretrained Inception V3 model

In [None]:
from keras.applications import inception_v3
from keras import backend as K

K.set_learning_phase(0)  # The model isn't trained here, so this disables all training-specific operations.

# Builds the Inception V3 network without its densely connected classifier (include_top=False),
# keeping only the convolutional base. The model is loaded with pretrained ImageNet weights.
model = inception_v3.InceptionV3(weights='imagenet',
                                 include_top=False)

## Listing 8.9 Setting Up the DeepDream configuration

In [None]:
# Dictionary mapping layer names to a coefficient quantifying how much the layer's activation
# contributes to the loss you'll seek to maximize. The layer names are hardcoded in the built-in
# Inception V3 application; list all layer names with model.summary().
layer_contributions = {
    'mixed2': 0.2,
    'mixed3': 3.,
    'mixed4': 2.,
    'mixed5': 1.5,
}

## Listing 8.10 Defining the loss to be maximized

In [None]:
layer_dict = dict([(layer.name, layer) for layer in model.layers])   # Maps layer names to layer instances

loss = K.variable(0.)  # Define the loss by adding layer contributions to this scalar variable.
for layer_name in layer_contributions:
    coeff = layer_contributions[layer_name]
    activation = layer_dict[layer_name].output    # Retrieves the layer's output

    scaling = K.prod(K.cast(K.shape(activation), 'float32'))
    # Adds the L2 norm of the features of a layer to the loss. Border artifacts are avoided
    # by only involving nonborder pixels in the loss.
    loss += coeff * K.sum(K.square(activation[:, 2: -2, 2: -2, :])) / scaling
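
The per-layer term of this loss can be checked by hand on a tiny array. The sketch below mirrors the same arithmetic in NumPy, assuming a channels-last activation of shape (batch, height, width, channels); the `layer_loss` name and the all-ones activation are illustrative assumptions, not Inception V3 outputs.

```python
import numpy as np

def layer_loss(activation, coeff):
    """Sum of squared nonborder activations, scaled by the total number of elements."""
    scaling = float(np.prod(activation.shape))
    inner = activation[:, 2:-2, 2:-2, :]   # drops a 2-pixel border on each side
    return coeff * float(np.sum(np.square(inner))) / scaling

activation = np.ones((1, 6, 6, 3), dtype='float32')   # made-up activation
value = layer_loss(activation, coeff=2.0)
print(value)
```

Here the array holds 108 values in total, and the nonborder region is 2 x 2 x 3 = 12 values of 1, so the term equals 2 * 12 / 108.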

## Listing 8.11 Gradient-ascent process

In [None]:
dream = model.input      # This tensor holds the generated image: the dream.

grads = K.gradients(loss, dream)[0]  # Computes the gradients of the loss with regard to the dream.

grads /= K.maximum(K.mean(K.abs(grads)), 1e-7)   # Normalizes the gradients (important trick)

outputs = [loss, grads]                             # Sets up a Keras function to retrieve the value of the loss 
fetch_loss_and_grads = K.function([dream], outputs) # and gradients given an input image

def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values

def gradient_ascent(x, iterations, step, max_loss=None):  # Runs gradient ascent for a number of iterations.
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('...Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x
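
The same ascent loop can be tried on a function simple enough to follow by hand. This is a sketch under stated assumptions: the objective (negative squared distance to a made-up `target` point) and the `loss_and_grads` helper replace the Keras function, while the gradient normalization and the update rule follow the listing above.

```python
import numpy as np

target = np.array([3.0, -2.0])   # hypothetical point that maximizes the toy objective

def loss_and_grads(x):
    """Toy objective to maximize: negative squared distance to `target`."""
    diff = x - target
    loss = -float(np.sum(diff ** 2))
    grads = -2.0 * diff
    grads = grads / max(np.mean(np.abs(grads)), 1e-7)   # same normalization trick
    return loss, grads

def gradient_ascent(x, iterations, step, max_loss=None):
    for _ in range(iterations):
        loss_value, grad_values = loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        x = x + step * grad_values   # moves uphill along the normalized gradient
    return x

start = np.zeros(2)
end = gradient_ascent(start, iterations=40, step=0.05)
start_loss, _ = loss_and_grads(start)
end_loss, _ = loss_and_grads(end)
print(start_loss, end_loss, end)
```

Because the gradient is normalized, each update has the same magnitude regardless of how steep the objective is, which keeps the step size meaningful across very different loss scales.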

## Listing 8.12 Running gradient ascent over different successive scales

In [None]:
import numpy as np

# Playing with these hyperparameters will let you achieve new effects.
step = 0.01          # Gradient ascent step size
num_octave = 3       # Number of scales at which to run gradient ascent
octave_scale = 1.4   # Size ratio between scales
iterations = 20      # Number of ascent steps to run at each scale

max_loss = 10.       # If the loss grows larger than 10, gradient ascent is interrupted to avoid ugly artifacts.

base_image_path = '...'     # Path to the image to be used

img = preprocess_image(base_image_path)    # Loads the base image into a Numpy array (function defined in listing 8.13)

original_shape = img.shape[1:3]
successive_shapes = [original_shape]                # Prepares a list of shape tuples defining the different scales
for i in range(1, num_octave):                      # at which to run gradient ascent.
    shape = tuple([int(dim / (octave_scale ** i))
        for dim in original_shape])
    successive_shapes.append(shape)
    
successive_shapes = successive_shapes[::-1]     # Reverses the list of shapes so they're in increasing order.

original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0]) # Resizes the Numpy array of the image to the smallest scale.

for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)      # Scales up the dream image
    img = gradient_ascent(img,
                          iterations=iterations,     # Runs gradient
                          step=step,                 # ascent,
                          max_loss=max_loss)         # altering the dream
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)   # Scales up the smaller, pixelated version of the original image.
    same_size_original = resize_img(original_img, shape)   # Computes the high-quality version of the original image at this size.
    lost_detail = same_size_original - upscaled_shrunk_original_img # The difference between the two is the detail lost when scaling up.

    img += lost_detail   # Reinjects the lost detail into the dream
    shrunk_original_img = resize_img(original_img, shape)
    save_img(img, fname='dream_at_scale_' + str(shape) + '.png' )
    
save_img(img, fname='final_dream.png')
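
To see which scales the loop visits, the shape computation can be run on its own. The sketch below wraps the same arithmetic in a hypothetical helper, `octave_shapes`, and applies it to an assumed 400 x 400 image with the hyperparameters from this listing.

```python
def octave_shapes(original_shape, num_octave=3, octave_scale=1.4):
    """Returns the list of (height, width) shapes, from smallest to largest."""
    shapes = [tuple(original_shape)]
    for i in range(1, num_octave):
        shapes.append(tuple(int(dim / (octave_scale ** i)) for dim in original_shape))
    return shapes[::-1]

shapes = octave_shapes((400, 400))   # assumed image size, for illustration
print(shapes)
```

Gradient ascent therefore starts on the smallest version of the image and finishes at the original resolution, with each scale roughly 1.4 times larger than the previous one.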

# Note: this code uses the auxiliary Numpy functions defined in the following listing.

## Listing 8.13: Auxiliary functions

In [None]:
import numpy as np
import scipy.ndimage
from PIL import Image
from keras.preprocessing import image

def resize_img(img, size):
    img = np.copy(img)
    factors = (1,
               float(size[0]) / img.shape[1],
               float(size[1]) / img.shape[2],
               1)
    return scipy.ndimage.zoom(img, factors, order=1)

def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    # scipy.misc.imsave was removed from recent SciPy releases, so the array is saved through PIL instead.
    Image.fromarray(pil_img).save(fname)

def preprocess_image(image_path):
    # Utility function to open, resize, and format pictures into tensors that Inception V3 can process
    img = image.load_img(image_path)
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img

def deprocess_image(x):
    # Utility function to convert a tensor into a valid image
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, x.shape[2], x.shape[3]))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((x.shape[1], x.shape[2], 3))
    x /= 2.          # Undoes the preprocessing performed by inception_v3.preprocess_input
    x += 0.5
    x *= 255.
    x = np.clip(x, 0, 255).astype('uint8')
    return x

# Note: because the original Inception V3 network was trained to recognize concepts in images of size 299 x 299,
#       and because the process scales the images down by a reasonable factor, this DeepDream implementation
#       produces much better results on images between 300 x 300 and 400 x 400. The code still runs on images of any size.
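
The arithmetic in deprocess_image can be verified as a round trip. This sketch assumes that inception_v3.preprocess_input rescales pixel values from [0, 255] to [-1, 1]; the `preprocess` function below is a hand-written stand-in for it under that assumption, and the pixel values are chosen for illustration.

```python
import numpy as np

def preprocess(x):
    """Assumed behavior of the Inception preprocessing: [0, 255] -> [-1, 1]."""
    return x / 127.5 - 1.0

def deprocess(x):
    """Same steps as deprocess_image, without the reshape."""
    x = x / 2.0
    x = x + 0.5
    x = x * 255.0
    return np.clip(x, 0, 255).astype('uint8')

pixels = np.array([0.0, 63.75, 127.5, 191.25, 255.0])
scaled = preprocess(pixels)
restored = deprocess(scaled)
print(scaled, restored)
```

Note that the final cast to uint8 truncates rather than rounds, so fractional pixel values lose their decimal part.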

    

    

### 8.2.2 Wrapping up

DeepDream consists of running a convnet in reverse to generate inputs based on the representations learned by the
network.
The results produced are fun and somewhat similar to the visual artifacts induced in humans by the disruption of the
visual cortex via psychedelics.
The process is not specific to image models or even to convnets. It can be done for speech, music, and more.

## 8.3 Neural style transfer

Neural style transfer consists of applying the style of a reference image to a target image while conserving the
content of the target image.
Style means the textures, colors, and visual patterns in the image at various spatial scales; content is the
higher-level macrostructure of the image.
The approach: define a loss function that specifies what to achieve, and minimize this loss.
What to achieve = conserve the content of the original image while adopting the style of the reference image.

loss = distance(style(reference_image) - style(generated_image)) + distance(content(original_image) - content(generated_image))

Here distance is a norm function such as the L2 norm, content is a function that takes an image and computes a
representation of its content, and style is a function that takes an image and computes a representation of its style.
Minimizing this loss causes style(generated_image) to be close to style(reference_image), and content(generated_image)
to be close to content(original_image), which is exactly what style transfer requires.

Deep convolutional neural networks offer a way to mathematically define the style and content functions.

### 8.3.1 The content loss

In [None]:
A good candidate for the content loss: the L2 norm between the activations of an upper layer in a pretrained convnet,
computed over the target image, and the activations of the same layer computed over the generated image.

### 8.3.2 The style loss

### 8.3.3 Neural style transfer in Keras

In [None]:
Neural style transfer can be implemented using any pretrained convnet. Here VGG19 is used: a simple variant of the
VGG16 network with three more convolutional layers.
General process:
    1. Set up a network that computes VGG19 layer activations for the style-reference image, the target image, and
       the generated image at the same time.
    2. Use the layer activations computed over these three images to define the loss function that is minimized to
       achieve style transfer.
    3. Set up a gradient-descent process to minimize this loss function.

## Listing 8.14: Defining initial variables

In [None]:
from keras.preprocessing.image import load_img, img_to_array

target_image_path = 'img/portrait.jpg'        # Path to the image you want to transform
style_reference_image_path = 'img/transfer_style_reference.jpg'    # Path to the style image

width, height = load_img(target_image_path).size           # Dimensions
img_height = 400                                           # of the 
img_width = int(width * img_height / height)               # generated picture


## Listing 8.15: Auxiliary functions

In [None]:
# Auxiliary functions for loading, preprocessing, and postprocessing the images going in and out of the VGG19 convnet.

import numpy as np
from keras.applications import vgg19

def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

def deprocess_image(x):
    x[:, :, 0] += 103.939                  # Adds back the mean pixel value from ImageNet, reversing
    x[:, :, 1] += 116.779                  # the zero-centering transformation
    x[:, :, 2] += 123.68                   # done by vgg19.preprocess_input
    x = x[:, :, ::-1]      # Converts the image from 'BGR' to 'RGB'. Also part of the reversal of vgg19.preprocess_input
    x = np.clip(x, 0, 255).astype('uint8')
    return x

## Listing 8.16: Loading the pretrained VGG19 network and applying it to the three images

Set up the VGG19 network. Its input is a batch of three images: the style-reference image, the target image, and a
placeholder that will contain the generated image. A placeholder is a symbolic tensor whose values are provided
externally via Numpy arrays. The style-reference and target images are static and are therefore defined using
K.constant, whereas the values contained in the placeholder of the generated image change over time.

In [None]:
from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))
combination_image = K.placeholder((1, img_height, img_width, 3))   # Placeholder containing the generated image

input_tensor = K.concatenate([target_image,                        # Combines the three
                              style_reference_image,               # images in a 
                              combination_image], axis =0)         # single batch

model = vgg19.VGG19(input_tensor=input_tensor,                     # Builds the VGG19 network with the batch
                    weights='imagenet',                            # of three images as input. The model 
                    include_top=False)                             # will be loaded with pretrained
print('Model loaded.')                                             # ImageNet weights  

## Listing 8.17: Content loss

In [None]:
def content_loss(base, combination):
    return K.sum(K.square(combination - base))

## Listing 8.18: Style Loss

In [None]:
# Uses an auxiliary function to compute the Gram matrix of an input matrix: a map of the correlations found in the
# original feature matrix.

def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_height * img_width
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
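
A NumPy sketch of the same Gram matrix helps show why it captures style rather than layout. It assumes a channels-last feature map of shape (height, width, channels); the random feature map is illustrative data, not VGG19 activations, and the shuffling step is an experiment added here for illustration.

```python
import numpy as np

def gram_matrix(x):
    """Channel-by-channel inner products of a (height, width, channels) feature map."""
    features = np.transpose(x, (2, 0, 1)).reshape(x.shape[2], -1)   # one row per channel
    return features @ features.T

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 5, 3))   # made-up activations

# Shuffle the spatial positions while keeping each pixel's channel vector intact.
flat = feature_map.reshape(-1, 3)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(4, 5, 3)

gram = gram_matrix(feature_map)
gram_shuffled = gram_matrix(shuffled)
print(gram.shape)
```

The Gram matrix is identical before and after shuffling: it records how strongly channels co-activate, not where in the image they do so.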

## Listing 8.19: Total Variation loss

In [None]:
# Total variation loss: operates on the pixels of the generated combination image and is added to the two loss
# components above. It encourages spatial continuity in the generated image, thus avoiding overly pixelated results.

def total_variation_loss(x):
    a = K.square(
        x[:, :img_height -1, :img_width - 1, :] -
        x[:, 1:, :img_width - 1, :])
    b = K.square(
        x[:, :img_height -1, :img_width - 1, :] -
        x[:, :img_height - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))
# The loss to minimize is a weighted average of the three losses above. The content loss uses only one upper layer,
# block5_conv2, whereas the style loss uses a list of layers spanning both low-level and high-level layers.
# The total variation loss is added at the end.
# Tune the content_weight coefficient depending on the style-reference image and content image.
# A higher content_weight means the target content will be more recognizable in the generated image.
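
The effect of the total variation loss is easy to check numerically. The following sketch reimplements the same formula in NumPy (using negative slice bounds instead of img_height and img_width) and compares a perfectly flat image with a noisy one; both images are synthetic data made up for this example.

```python
import numpy as np

def total_variation_loss(x):
    """x has shape (1, height, width, channels)."""
    a = np.square(x[:, :-1, :-1, :] - x[:, 1:, :-1, :])    # differences between vertical neighbors
    b = np.square(x[:, :-1, :-1, :] - x[:, :-1, 1:, :])    # differences between horizontal neighbors
    return float(np.sum(np.power(a + b, 1.25)))

flat_image = np.full((1, 8, 8, 3), 0.5)
rng = np.random.default_rng(0)
noisy_image = flat_image + rng.normal(scale=0.1, size=flat_image.shape)

tv_flat = total_variation_loss(flat_image)
tv_noisy = total_variation_loss(noisy_image)
print(tv_flat, tv_noisy)
```

A constant image has zero total variation, while pixel-level noise is penalized, which is the behavior that pushes the generated image toward smooth regions.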
    

## Listing 8.20 Defining the final loss to be minimized

In [None]:
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])  # Maps layer names to activation tensors
content_layer = 'block5_conv2'       # Layer used for content loss
style_layers = ['block1_conv1',        # Layers
                'block2_conv1',        # used
                'block3_conv1',        # for
                'block4_conv1',        # style
                'block5_conv1']        # loss

total_variation_weight = 1e-4        # Weights in the
style_weight = 1.                    # weighted average of the
content_weight = 0.025               # loss components

loss = K.variable(0.)     # Define the loss by adding all components to this scalar variable.

# Adds the content loss
layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss += content_weight * content_loss(target_image_features, combination_features)

for layer_name in style_layers:               # Adds a style loss component for each target layer
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    sl = style_loss(style_reference_features, combination_features)
    loss += (style_weight / len(style_layers)) * sl

loss += total_variation_weight * total_variation_loss(combination_image) # Adds the total variation loss




## Listing 8.21: Setting up the gradient-descent process

Set up the gradient-descent process. The L-BFGS algorithm is used here for optimization. It comes packaged with
SciPy, but the SciPy implementation has two slight limitations:
1. It requires the value of the loss function and the value of the gradients to be passed as two separate functions.
2. It can only be applied to flat vectors, whereas the image here is a 3D array.

Computing the loss and the gradients independently would be inefficient, because it would lead to many redundant
computations. Instead, a Python class named Evaluator computes both the loss value and the gradients value at once,
returns the loss value when called the first time, and caches the gradients for the next call.

In [None]:
grads = K.gradients(loss, combination_image)[0]   # Gets the gradients of the loss with regard to the generated image.

fetch_loss_and_grads = K.function([combination_image], [loss, grads]) # Function that fetches the current loss and gradient values

class Evaluator(object):     # Wraps fetch_loss_and_grads so that the loss and the gradients can be retrieved
                             # via two separate method calls, as required by the SciPy optimizer being used.

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, img_height, img_width, 3))
        outs = fetch_loss_and_grads([x])

        loss_value = outs[0]
        grad_values = outs[1].flatten().astype('float64')
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
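
The caching pattern can be exercised in isolation, without Keras or SciPy. In this sketch the expensive Keras function is replaced by a hypothetical `quadratic` objective, and a `calls` counter (also an addition for illustration) confirms that one loss/gradients pair triggers a single underlying computation. The class name `CachingEvaluator` is made up for this example.

```python
class CachingEvaluator:
    """Computes loss and gradients together, then serves them through two methods."""

    def __init__(self, loss_and_grads_fn):
        self.loss_and_grads_fn = loss_and_grads_fn
        self.loss_value = None
        self.grad_values = None
        self.calls = 0   # counts how often the underlying function actually runs

    def loss(self, x):
        assert self.loss_value is None
        self.loss_value, self.grad_values = self.loss_and_grads_fn(x)
        self.calls += 1
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = list(self.grad_values)   # hands back the cached gradients
        self.loss_value = None
        self.grad_values = None
        return grad_values

def quadratic(x):
    """Hypothetical objective: sum of squares, whose gradient is 2 * x."""
    return sum(v * v for v in x), [2.0 * v for v in x]

toy_evaluator = CachingEvaluator(quadratic)
point = [1.0, -2.0]
loss_value = toy_evaluator.loss(point)    # runs the computation once
grad_values = toy_evaluator.grads(point)  # reuses the cached result
print(loss_value, grad_values, toy_evaluator.calls)
```

The assertions in the two methods encode the expected call order: the optimizer asks for the loss first and then for the gradients at the same point.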
    


## Listing 8.22: Style-transfer loop

In [None]:
# Run the optimization process using SciPy's L-BFGS algorithm, saving the current generated image at each iteration
# of the algorithm (here, a single iteration represents 20 steps of gradient descent).

In [None]:
from scipy.optimize import fmin_l_bfgs_b
from PIL import Image    # scipy.misc.imsave was removed from recent SciPy releases; PIL saves the image instead
import time

result_prefix = 'my_result'
iterations = 20

x = preprocess_image(target_image_path)    # This is the initial state: the target image.
x = x.flatten()  # The image is flattened because scipy.optimize.fmin_l_bfgs_b can only process flat vectors.
for i in range(iterations):                                    # Runs L-BFGS optimization
    print('Start of iteration', i)                             # over the pixels of the
    start_time = time.time()                                   # generated image to minimize the neural
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss,           # style loss. Pass the function that computes
                                     x,                        # the loss and the function that computes the
                                     fprime=evaluator.grads,   # gradients as two separate
                                     maxfun=20)                # arguments
    print('Current loss value:', min_val)
    img = x.copy().reshape((img_height, img_width, 3))
    img = deprocess_image(img)
    fname = result_prefix + '_at_iteration_%d.png' % i
    Image.fromarray(img).save(fname)                           # Saves the current generated image
    print('Image saved as', fname)
    end_time = time.time()
    print('Iteration %d completed in %ds' % (i, end_time - start_time))

## 8.3.4: Wrapping Up

-. Style transfer consists of creating a new image that preserves the contents of a target image while also
   capturing the style of a reference image.
-. Content can be captured by the high-level activations of a convnet.
-. Style can be captured by the internal correlations of the activations of different layers of a convnet.
-. Hence, deep learning allows style transfer to be formulated as an optimization process using a loss defined
   with a pretrained convnet.
-. Starting from this basic idea, many variants and refinements are possible.

## 8.4 Generating images with variational autoencoders

Sampling from a latent space of images to create entirely new images or edit existing ones is currently the most
popular and successful application of creative AI.
The two main techniques are variational autoencoders (VAEs) and generative adversarial networks (GANs).

How a VAE works:
    1. An encoder module turns the input samples input_img into two parameters in a latent space of representations,
       z_mean and z_log_variance.
    2. Sample a random point z from the latent normal distribution that's assumed to generate the input image, via
       z = z_mean + exp(z_log_variance) * epsilon, where epsilon is a random tensor of small values.
    3. A decoder module maps this point in the latent space back to the original input image.

Because epsilon is random, the process ensures that every point close to the latent location where input_img was
encoded (z_mean) can be decoded to something similar to input_img, thus forcing the latent space to be continuously
meaningful.
The parameters of a VAE are trained via two loss functions:
    1. a reconstruction loss that forces the decoded samples to match the initial inputs
    2. a regularization loss that helps learn well-formed latent spaces and reduces overfitting to the training data.

In [None]:
z_mean, z_log_variance = encoder(input_img)     # Encodes the input into a mean and variance parameter

z = z_mean + exp(z_log_variance) * epsilon      # Draws a latent point using a small random epsilon

reconstructed_img = decoder(z)                  # Decodes z back to an image

model = Model(input_img, reconstructed_img)     # Instantiates the autoencoder model, which maps an input image to its reconstruction

# Then train the model using the reconstruction loss and the regularization loss.
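
The regularization loss can be illustrated with a few numbers. The sketch below uses the standard closed-form KL divergence between a diagonal Gaussian and a standard normal prior, reading z_log_variance as the log of the variance; this is an assumption about the intended regularizer, and its scaling differs from the weighted version used later in listing 8.26. The function name `kl_to_standard_normal` and the example values are made up for illustration.

```python
import math

def kl_to_standard_normal(z_mean, z_log_var):
    """KL divergence between N(z_mean, exp(z_log_var)) and N(0, 1), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(z_mean, z_log_var))

kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])    # matches the prior exactly
kl_shifted = kl_to_standard_normal([1.0, -1.0], [0.0, 0.0])    # mean moved away from 0
kl_narrow = kl_to_standard_normal([0.0, 0.0], [-2.0, -2.0])    # variance shrunk below 1
print(kl_at_prior, kl_shifted, kl_narrow)
```

The term is zero only when the encoded distribution matches the prior, and it grows when the mean drifts from 0 or the variance departs from 1, which is what keeps the latent space well formed.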

## Listing 8.23: VAE encoder network

In [None]:
# A simple convnet mapping the input image x to two vectors, z_mean and z_log_var.

import keras
from keras import layers
from keras import backend as K
from keras.models import Model
import numpy as np

img_shape = (28, 28, 1)
batch_size = 16
latent_dim = 2                        # Dimensionality of the latent space: a 2D plane

input_img = keras.Input(shape=img_shape)

x = layers.Conv2D(32, 3, padding='same', activation='relu')(input_img)
x = layers.Conv2D(64, 3, padding='same', activation='relu', strides=(2, 2))(x)
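x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
shape_before_flattening = K.int_shape(x)   # Recorded here because the decoder needs it to reshape its feature map
x = layers.Flatten()(x)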
x = layers.Dense(32, activation='relu')(x)

z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)

## Listing 8.24: Latent-space sampling function

In [None]:
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0., stddev=1.)
    return z_mean + K.exp(z_log_var) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_var])

## Listing 8.25: VAE decoder network, mapping latent space points to images.

In [None]:
decoder_input = layers.Input(K.int_shape(z)[1:])   # Input where z will be fed

x = layers.Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)  # Upsamples the input

x = layers.Reshape(shape_before_flattening[1:])(x)   # Reshapes z into a feature map of the same shape as the feature map
                                                     # just before the last Flatten layer in the encoder model

x = layers.Conv2DTranspose(32, 3,                       # Uses a Conv2DTranspose
                           padding='same',              # layer and a Conv2D layer to
                           activation='relu',           # decode z into
                           strides=(2, 2))(x)           # a feature map
x = layers.Conv2D(1, 3,                                 # the same
                  padding='same',                       # size as the
                  activation='sigmoid')(x)              # original image input

decoder = Model(decoder_input, x) # Instantiates the decoder model, which turns "decoder_input" into the decoded image

z_decoded = decoder(z)            # Applies it to z to recover the decoded z

## Listing 8.26: Custom layer used to compute the VAE loss

In [None]:
# Write a custom layer that internally uses the built-in add_loss layer method to create an arbitrary loss.

class CustomVariationalLayer(keras.layers.Layer):
    
    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        kl_loss = -5e-4 * K.mean(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)
    
    def call(self, inputs):     # Custom layers are implemented by writing a call method.
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        return x                # This output isn't used, but the layer must return something.

y = CustomVariationalLayer()([input_img, z_decoded]) # Calls the custom layer on the input and the decoded output
                                                     # to obtain the final model output.

## Listing 8.27: Training the VAE

Instantiate and train the model. Because the loss is handled inside the custom layer, no external loss is specified
at compile time (loss=None), which in turn means no target data is passed during training (only x_train is passed
to the model in fit).

In [None]:
from keras.datasets import mnist

vae = Model(input_img, y)
vae.compile(optimizer='rmsprop', loss=None)
vae.summary()

(x_train, _), (x_test, y_test) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.astype('float32') / 255.
x_test = x_test.reshape(x_test.shape + (1,))

vae.fit(x=x_train, y=None,
        shuffle=True,
        epochs=10,
        batch_size=batch_size,
        validation_data=(x_test, None))

# Once such a model is trained (on MNIST, in this case), the decoder network can be used to turn arbitrary
# latent space vectors into images.

## Listing 8.28: Sampling a grid of points from the 2D latent space and decoding them to images

In [None]:
import matplotlib.pyplot as plt
from scipy.stats import norm

n = 15                                                        # Displays a grid of 15 x 15 digits (225 digits total).
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
grid_x = norm.ppf(np.linspace(0.05, 0.95, n))                 # Transforms linearly spaced coordinates with the SciPy ppf
grid_y = norm.ppf(np.linspace(0.05, 0.95, n))                 # function to produce values of the latent variable z
                                                              # (because the prior of the latent space is Gaussian).

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]])
        z_sample = np.tile(z_sample, batch_size).reshape(batch_size, 2)   # Repeats z multiple times to form a complete batch.
        x_decoded = decoder.predict(z_sample, batch_size=batch_size)      # Decodes the batch into digit images 
        digit = x_decoded[0].reshape(digit_size, digit_size)  # Reshapes the first digit in the batch from 28 x 28 x 1 to 28 x 28
        figure[i * digit_size: (i + 1) * digit_size,
               j * digit_size: (j + 1) * digit_size] = digit
        
plt.figure(figsize=(10,10))
plt.imshow(figure, cmap='Greys_r')
plt.show()
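
The grid coordinates in the listing come from norm.ppf, the inverse CDF of the standard normal distribution. As a rough illustration that needs only the standard library, the sketch below builds the same kind of grid with statistics.NormalDist().inv_cdf, which plays the same role here as scipy.stats.norm.ppf. The helper name `gaussian_grid` is hypothetical.

```python
from statistics import NormalDist

def gaussian_grid(n, low=0.05, high=0.95):
    """Return n standard-normal quantiles for probabilities evenly spaced in [low, high]."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    step = (high - low) / (n - 1)
    return [std_normal.inv_cdf(low + i * step) for i in range(n)]

grid = gaussian_grid(15)
print([round(v, 2) for v in grid])
```

Because the probabilities are evenly spaced, the resulting z values sit closer together near 0 and farther apart in the tails, so the grid samples the latent space in proportion to the Gaussian prior.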

## 8.4.4 Wrapping up

Image generation with deep learning is done by learning latent spaces that capture statistical information about a
dataset of images. By sampling and decoding points from the latent space, never-before-seen images can be generated.
There are two major tools for this: VAEs and GANs.

    VAEs result in highly structured, continuous latent representations. For this reason, they work well for
    all sorts of image editing in latent space: face swapping, turning a frowning face into a smiling face,
    and so on. They also work nicely for latent-space-based animations, such as animating a walk along a
    cross section of the latent space, showing a starting image slowly morphing into different images in
    a continuous way.

    GANs enable the generation of realistic single-frame images but may not induce latent spaces with solid
    structure and high continuity.
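
The latent-space animation idea mentioned for VAEs can be sketched as a straight-line walk between two latent points. This is a minimal illustration under assumptions: the helper `interpolate_latent` is hypothetical, the two endpoints are made-up 2D codes, and in practice each intermediate vector would be passed to a trained decoder to obtain one frame.

```python
def interpolate_latent(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the segment from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for k in range(steps):
        t = k / (steps - 1)    # 0.0 at the start, 1.0 at the end
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Made-up 2D latent codes standing in for two encoded images.
z_a = [-1.5, 0.5]
z_b = [1.5, -0.5]
frames = interpolate_latent(z_a, z_b, steps=5)
for z in frames:
    print(z)   # each z would be decoded into one animation frame
```

A smooth morph between the decoded frames relies on the latent space being continuous, which is the property the text attributes to VAEs rather than GANs.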

TIP: The Large-scale CelebFaces Attributes (CelebA) dataset contains more than 200,000 celebrity portraits.

## 8.5 Introduction to generative adversarial networks

GAN: a forger network and an expert network, each being trained to best the other. Two parts:
        Generator network- Takes as input a random vector (a random point in the latent space),
                           and decodes it into a synthetic image.
        Discriminator network(adversary)- Takes as input an image (real or synthetic), and predicts 
                                          whether the image came from the training set or was 
                                          created by the generator network.
The generator network is trained to fool the discriminator network, while the discriminator constantly
adapts to the gradually improving capabilities of the generator, which sets a higher bar of realism
for the generated images.
Limitations: Unlike VAEs, this latent space has fewer explicit guarantees of meaningful structure;
             in particular, it isn't continuous.
Getting a GAN to work requires a lot of careful tuning of the model architecture and training parameters.

## 8.5.1 A schematic GAN implementation

This section implements a deep convolutional GAN (DCGAN): a GAN in which the generator and the discriminator are
both deep convnets. A Conv2DTranspose layer is used for image upsampling in the generator.
The GAN is trained on images from CIFAR10, a dataset of 50,000 32 x 32 RGB images in 10 classes (5,000 images per class).
    To keep things simple, only images belonging to the class "frog" are used.
    
Schematically, the GAN looks like this:
1. A generator network maps vectors of shape (latent_dim,) to images of shape (32, 32, 3).
2. A discriminator network maps images of shape (32, 32, 3) to a binary score estimating the probability
   that the image is real.
3. A gan network chains the generator and the discriminator together: gan(x) = discriminator(generator(x)).
   It maps latent space vectors to the discriminator's assessment of the realism of these latent vectors
   as decoded by the generator.
4. The discriminator is trained on examples of real and fake images along with real/fake labels, just like
   a regular image-classification model.
5. Gradients of the generator's weights with regard to the loss of the gan model are used to train the generator.
   At every step, the generator's weights move in a direction that makes the discriminator more likely
   to classify the images decoded by the generator as "real". In other words, the generator is trained to
   fool the discriminator.
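
Steps 4 and 5 differ mainly in which labels are shown to which network. The sketch below only assembles the inputs and targets for one training iteration; it does not train anything. It assumes a label convention of 1 for real and 0 for generated images (any consistent convention would do), and it uses hypothetical stand-ins: `sample_latent` for drawing latent vectors, and strings in place of actual images.

```python
import random

REAL, FAKE = 1, 0   # assumed label convention for this sketch

def sample_latent(batch_size, latent_dim, rng):
    """Draw latent vectors from a standard normal distribution."""
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            for _ in range(batch_size)]

def discriminator_batch(real_images, generated_images):
    """Step 4: mix real and generated images with their true labels."""
    images = list(real_images) + list(generated_images)
    labels = [REAL] * len(real_images) + [FAKE] * len(generated_images)
    return images, labels

def generator_targets(batch_size):
    """Step 5: the generator is trained through the gan model with targets
    that all say "real", so its weights move toward fooling the discriminator."""
    return [REAL] * batch_size

rng = random.Random(0)
latent_vectors = sample_latent(batch_size=4, latent_dim=32, rng=rng)
real = ["real_%d" % i for i in range(4)]          # placeholders for training images
fake = ["generated_%d" % i for i in range(4)]     # placeholders for generator(latent_vectors)

d_images, d_labels = discriminator_batch(real, fake)
g_targets = generator_targets(len(latent_vectors))
print(d_labels)    # [1, 1, 1, 1, 0, 0, 0, 0]
print(g_targets)   # [1, 1, 1, 1]
```

The generator targets are deliberately "wrong": every generated image is labeled as real, so minimizing the gan loss pushes the generator toward images the discriminator accepts.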

## 8.5.2: A bag of tricks

Training GANs and tuning GAN implementations is notoriously difficult. Some tricks:
    * Use tanh as the last activation in the generator, instead of sigmoid, which is more commonly
      found in other types of models.
    * Sample points from the latent space using a normal (Gaussian) distribution, not a uniform one.
    * Stochasticity is good for inducing robustness. GAN training results in a dynamic
      equilibrium, so GANs are likely to get stuck in all sorts of ways. Introducing randomness
      during training helps prevent this. Randomness is introduced in two ways:
        by using dropout in the discriminator
        by adding random noise to the labels for the discriminator.
    * Sparse gradients can hinder GAN training. Sparsity is often desirable in deep learning, but not in GANs.
      Two things can induce gradient sparsity: max pooling operations and ReLU activations.
        Use strided convolutions for downsampling instead of max pooling.
        Use a LeakyReLU layer instead of a ReLU activation; it relaxes sparsity constraints by
        allowing small negative activation values.
    * In generated images, it's common to see checkerboard artifacts caused by unequal coverage
      of the pixel space in the generator. Fix: use a kernel size that's divisible
      by the stride size whenever a strided Conv2DTranspose or Conv2D is used, in both
      the generator and the discriminator.
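
Three of these tricks are easy to illustrate without a deep learning framework. The following sketch is plain Python with hypothetical helper names (`leaky_relu`, `noisy_labels`, `kernel_fits_stride`); the slope of 0.3 and the noise scale of 0.05 are illustrative assumptions, not values prescribed by the text above.

```python
import random

def relu(x):
    """Standard ReLU: every negative input becomes exactly 0."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.3):
    """Like ReLU, but negative inputs keep a small slope instead of being zeroed."""
    return x if x > 0 else alpha * x

def noisy_labels(labels, scale=0.05, rng=None):
    """Add small uniform noise to discriminator labels (one way to inject randomness)."""
    rng = rng or random.Random()
    return [y + scale * rng.random() for y in labels]

def kernel_fits_stride(kernel_size, stride):
    """Checkerboard rule of thumb: the kernel size should be divisible by the stride."""
    return kernel_size % stride == 0

inputs = [-2.0, -0.5, 0.0, 1.5]
print([relu(v) for v in inputs])         # negative inputs collapse to 0.0
print([leaky_relu(v) for v in inputs])   # negative inputs stay small but nonzero

print(noisy_labels([1, 1, 0, 0], rng=random.Random(0)))

print(kernel_fits_stride(4, 2))   # True: a kernel of 4 with stride 2
print(kernel_fits_stride(5, 2))   # False: uneven coverage is more likely
```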

## 8.5.3 The generator

This section develops a generator model that turns a vector (from the latent space; during training it is sampled
at random) into a candidate image. A common problem is that the generator gets stuck with generated images that
look like noise. A possible solution is to use dropout on both the discriminator and the generator.

## Listing 8.29 GAN generator network

In [None]:
import keras
from keras import layers
import numpy as np

latent_dim = 32
height = 32
width = 32
channels = 3

generator_input = keras.Input(shape=(latent_dim,))

x = layers.Dense(128 * 16 * 16)(generator_input)                       # Transforms the input
x = layers.LeakyReLU()(x)                                              # into a 16 x 16,
x = layers.Reshape((16, 16, 128))(x)                                   # 128-channel feature map

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)       # Upsamples
x = layers.LeakyReLU()(x)                                              # to 32 x 32

x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)   # Produces a 32 x 32 3-channel feature map
                                                                       # (the shape of a CIFAR10 image).
generator = keras.models.Model(generator_input, x)                     # Instantiates the generator model, which maps
                                                                       # inputs of shape (latent_dim,) to images of
                                                                       # shape (32, 32, 3).
generator.summary()
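
As a sanity check on the shapes in the comments, the spatial sizes can be traced by hand. The sketch below assumes the usual behavior of padding='same': a stride-1 Conv2D keeps height and width, and a Conv2DTranspose with stride s multiplies them by s. The function `trace_generator_shapes` is a hypothetical helper that mirrors the layer sequence of the listing; it does not build a Keras model.

```python
def trace_generator_shapes(channels=3):
    """Follow the output shape through the generator's layers."""
    shapes = []

    dense_units = 128 * 16 * 16               # Dense layer output, a flat vector
    shapes.append(("Dense", (dense_units,)))

    h, w, d = 16, 16, 128                     # Reshape into a feature map
    assert h * w * d == dense_units
    shapes.append(("Reshape", (h, w, d)))

    d = 256                                   # Conv2D, stride 1, padding='same'
    shapes.append(("Conv2D", (h, w, d)))

    stride = 2                                # Conv2DTranspose, padding='same'
    h, w = h * stride, w * stride
    shapes.append(("Conv2DTranspose", (h, w, d)))

    for _ in range(2):                        # two more stride-1 Conv2D layers
        shapes.append(("Conv2D", (h, w, d)))

    d = channels                              # final tanh Conv2D
    shapes.append(("Conv2D (tanh)", (h, w, d)))
    return shapes

for name, shape in trace_generator_shapes():
    print(name, shape)
```

If the trace is right, the last entry should agree with the final output shape reported by generator.summary(), apart from the leading batch dimension.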


