# Part I : Deep Dreams

Unzip the `Bilder.zip` file in the same directory where you run this notebook.

_DeepDream_ is an artistic image-modification technique that uses the representations learned by convolutional neural networks. It was first released by Google in the summer of 2015, as an implementation written using the Caffe deep-learning library (this was several months before the first public release of TensorFlow). It quickly became an internet sensation thanks to the trippy pictures it could generate (see, for example, the Figure below), full of algorithmic pareidolia artifacts, bird feathers, and dog eyes. These are a byproduct of the fact that the DeepDream ConvNet was trained on ImageNet, where dog breeds and bird species are vastly overrepresented.

<img src='./Bilder/Aurelia-aurita-3-0009.jpg'>


The DeepDream algorithm is almost identical to the ConvNet filter-visualization technique, which consists of running a ConvNet in reverse: doing gradient ascent on the input to the ConvNet in order to maximize the activation of a specific filter in an upper layer of the ConvNet. DeepDream uses this same idea, with a few simple differences:

- With DeepDream, we try to maximize the activation of entire layers rather than that of a specific filter, thus mixing together visualizations of large numbers of features at once.

- You start not from a blank, slightly noisy input, but rather from an existing image. As a result, the effects latch on to preexisting visual patterns, distorting elements of the image in a somewhat artistic fashion.

- The input images are processed at different scales (called _octaves_), which improves the quality of the visualizations.

Let's make some DeepDreams.

## Implementing DeepDream in Keras

We will start from a ConvNet pretrained on ImageNet. In Keras, many such ConvNets are available: VGG16, VGG19, Xception, ResNet50, and so on. You can implement DeepDream with any of them, but your ConvNet of choice will naturally affect your visualizations, because different ConvNet architectures result in different learned features. 
The ConvNet used in the original DeepDream release was an Inception model, and in practice Inception is known to produce nice-looking DeepDreams, so we will use the Inception V3 model that comes with Keras. 

#### Loading the pretrained Inception V3 model

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import inception_v3
from IPython.display import Image, display

# Build the Inception V3 network without its top (classification) layers,
# loaded with pretrained ImageNet weights:
model = inception_v3.InceptionV3(weights='imagenet', include_top=False)

Remark on the __Keras backend__: Keras is a model-level library that provides high-level building blocks for developing deep learning models. It does not itself handle low-level operations such as tensor products or convolutions. Instead, it relies on a specialized, well-optimized tensor manipulation library, which serves as the "backend engine" of Keras. Rather than tying its implementation to one single tensor library, Keras handles the problem in a modular way, so that several different backend engines can be plugged into it. The multi-backend Keras 2.x releases supported three backends: TensorFlow, Theano, and CNTK. The code in this notebook uses `tf.keras`, that is, Keras running on TensorFlow.

#### Loss Function for DeepDreams

Next, we will compute the __loss__: the quantity we will seek to maximize during the gradient-ascent process. For filter visualization, we try to maximize the activation of a specific filter in a specific layer. Here, we will maximize the activation of all filters in a number of layers at the same time. Specifically, we will maximize a weighted sum of the squared L2 norms of the activations of a set of high-level layers. Each norm is normalized by the number of values in that layer's activation tensor.
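Here is a minimal NumPy sketch of this loss. It takes a dictionary of activation arrays and returns the weighted sum of normalized squared L2 norms. It ignores a 2-pixel border, as the TensorFlow `compute_loss` function in this notebook does. The dummy arrays stand in for the outputs of the Inception layers. The name `deepdream_loss` and the `border` parameter are illustrative assumptions.

```python
import numpy as np


def deepdream_loss(activations, layer_weights, border=2):
    """Weighted sum of normalized squared L2 norms of layer activations.

    activations: dict mapping layer name -> array of shape (1, H, W, C)
    layer_weights: dict mapping layer name -> weight in the final loss
    """
    loss = 0.0
    for name, act in activations.items():
        # Normalize by the total number of values in the activation tensor.
        scaling = float(np.prod(act.shape))
        # Only non-border positions contribute, to avoid border artifacts.
        inner = act[:, border:-border, border:-border, :]
        loss += layer_weights[name] * np.sum(np.square(inner)) / scaling
    return loss


# Dummy activations: two "layers" filled with ones.
acts = {"layer_a": np.ones((1, 6, 6, 2)), "layer_b": np.ones((1, 8, 8, 1))}
weights = {"layer_a": 1.0, "layer_b": 2.0}
print(deepdream_loss(acts, weights))
```

Layers with larger weights pull the dream more strongly toward their own features.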

The exact set of layers we choose (as well as their contribution to the final loss) has a major influence on the visuals we will be able to produce, so we want to make these parameters easily configurable. Lower layers result in geometric patterns, whereas higher layers result in visuals in which we can recognize some classes from ImageNet (for example, birds or dogs). We will start from a somewhat arbitrary configuration involving four layers, but we will definitely want to explore many different configurations later.

#### Setting up the DeepDream Configuration

In [None]:
# These are the names of the layers
# for which we try to maximize activation,
# as well as their weight in the final loss
# we try to maximize.
# You can tweak these settings to obtain new visual effects.
layer_settings = {
    "mixed4": 1.0,
    "mixed5": 1.5,
    "mixed6": 2.0,
    "mixed7": 2.5,
}

Now, let's define a function that computes the __loss__: the weighted sum of the normalized squared L2 norms of the activations of the selected layers.

#### Defining the loss to be maximized

In [None]:
# Get the symbolic outputs of each "key" layer (layer names are unique in a Keras model).
outputs_dict = dict(
    [
        (layer.name, layer.output)
        for layer in [model.get_layer(name) for name in layer_settings.keys()]
    ]
)

# Set up a model that returns the activation values for every target layer
# (as a dict)
feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=outputs_dict)

def compute_loss(input_image):
    features = feature_extractor(input_image)
    # Initialize the loss
    loss = tf.zeros(shape=())
    for name in features.keys():
        coeff = layer_settings[name]
        activation = features[name]
        # We avoid border artifacts by only involving non-border pixels in the loss.
        scaling = tf.reduce_prod(tf.cast(tf.shape(activation), "float32"))
        loss += coeff * tf.reduce_sum(tf.square(activation[:, 2:-2, 2:-2, :])) / scaling
    return loss

Next, you can set up the gradient-ascent process.

#### Gradient-ascent process

In [None]:
def gradient_ascent_step(img, learning_rate):
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = compute_loss(img)
    # Compute gradients.
    grads = tape.gradient(loss, img)
    # Normalize gradients.
    grads /= tf.maximum(tf.reduce_mean(tf.abs(grads)), 1e-6)
    img += learning_rate * grads
    return loss, img


def gradient_ascent_loop(img, iterations, learning_rate, max_loss=None):
    for i in range(iterations):
        loss, img = gradient_ascent_step(img, learning_rate)
        if max_loss is not None and loss > max_loss:
            break
        print("... Loss value at step %d: %.2f" % (i, loss))
    return img
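Dividing the gradients by their mean absolute value makes the size of each update independent of the overall scale of the loss. The NumPy sketch below applies the same update rule as `gradient_ascent_step`. It shows that multiplying the loss by 1000 leaves the step unchanged. The toy quadratic objective and the helper names `normalized_ascent_step` and `make_grad_fn` are assumptions and are not part of the notebook.

```python
import numpy as np


def normalized_ascent_step(x, grad_fn, learning_rate):
    # Same rule as gradient_ascent_step: divide the gradient by its mean
    # absolute value (floored at 1e-6), then step uphill.
    grads = grad_fn(x)
    grads = grads / max(np.mean(np.abs(grads)), 1e-6)
    return x + learning_rate * grads


def make_grad_fn(scale):
    # Gradient of the toy objective -scale * sum((x - 3)^2), maximal at x = 3.
    return lambda x: -2.0 * scale * (x - 3.0)


x0 = np.zeros(4)
small = normalized_ascent_step(x0, make_grad_fn(1.0), learning_rate=0.1)
large = normalized_ascent_step(x0, make_grad_fn(1000.0), learning_rate=0.1)
print(small, large)  # identical updates despite the 1000x loss scale
```

This is why a single `step` value can be reused across octaves and layer configurations, even though the raw loss values differ.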

Finally, the actual DeepDream algorithm. First, we define a list of _scales_ (also called _octaves_) at which to process the images. Each successive scale is larger than the previous one by a factor of 1.4 (that is, 40% larger): you start by processing a small image and then increasingly scale it up.


<img src='./Bilder/octaves.jpg'>


For each successive scale, from the smallest to the largest, you run gradient ascent to maximize the loss you previously defined, at that scale. After each gradient ascent run, you upscale the resulting image by $40\%$. 
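To compute the list of octave shapes, divide each image dimension by successive powers of the octave scale. Then reverse the list so it runs from smallest to largest. The minimal sketch below assumes a hypothetical 400 × 600 image. It uses the values `num_octave = 3` and `octave_scale = 1.4` from later in this notebook.

```python
def octave_shapes(original_shape, num_octave=3, octave_scale=1.4):
    """Return image shapes from the smallest octave up to the original size."""
    shapes = [
        tuple(int(dim / (octave_scale ** i)) for dim in original_shape)
        for i in range(num_octave)
    ]
    return shapes[::-1]  # smallest first


print(octave_shapes((400, 600)))  # [(204, 306), (285, 428), (400, 600)]
```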

To avoid losing a lot of image detail after each successive scale-up (resulting in increasingly blurry or pixelated images), you can use a simple trick: after each scale-up, you will reinject the lost details back into the image, which is possible because you know what the original image should look like at the larger scale. Given a small image
size $S$ and a larger image size $L$, you can compute the difference between the original
image resized to size $L$ and the original resized to size $S$ — this difference quantifies the
details lost when going from $S$ to $L$.
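The detail-reinjection trick can be sketched in NumPy as follows. To keep the example self-contained, it uses a simple nearest-neighbor resize instead of the bilinear `tf.image.resize` used in this notebook. The helper `resize_nearest` is hypothetical, and a random 2D array plays the role of the original image.

```python
import numpy as np


def resize_nearest(img, shape):
    # Nearest-neighbor resize of an (H, W) or (H, W, C) array to shape (h, w).
    h, w = img.shape[:2]
    rows = np.arange(shape[0]) * h // shape[0]
    cols = np.arange(shape[1]) * w // shape[1]
    return img[rows][:, cols]


rng = np.random.default_rng(0)
original = rng.random((12, 18))
small_shape, large_shape = (6, 9), (12, 18)

# Details lost when going from size S to size L:
upscaled_small = resize_nearest(resize_nearest(original, small_shape), large_shape)
lost_detail = resize_nearest(original, large_shape) - upscaled_small

# Adding the lost detail back to the upscaled image recovers the original at size L.
restored = upscaled_small + lost_detail
```

In the actual DeepDream loop, the lost detail is added to the dreamed image rather than to a plain upscaled copy of the original. This sharpens the result while keeping the dream patterns.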

Let's make dreams rise over Lake Sils in the Engadin.

In [None]:
# Fill this with the path to the image you want to use.
base_image_path = './Bilder/sils.jpg'
display(Image(base_image_path))

#### Auxiliary Functions

In [None]:
def preprocess_image(image_path):
    # Util function to open a picture and format it into
    # an appropriate array (the image is not resized here).
    img = keras.preprocessing.image.load_img(image_path)
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img


def deprocess_image(x):
    # Util function to convert a NumPy array into a valid image.
    x = x.reshape((x.shape[1], x.shape[2], 3))
    # Undo inception v3 preprocessing
    x /= 2.0
    x += 0.5
    x *= 255.0
    # Convert to uint8 and clip to the valid range [0, 255]
    x = np.clip(x, 0, 255).astype("uint8")
    return x
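`deprocess_image` inverts the scaling done by `inception_v3.preprocess_input`. In Keras's "tf" preprocessing mode, that function maps pixel values from [0, 255] to [-1, 1] via `x / 127.5 - 1`. The NumPy round-trip check below reimplements both directions. The helper names are illustrative. Rounding before the cast is an addition that avoids off-by-one truncation from floating-point error.

```python
import numpy as np


def inception_preprocess(rgb):
    # Map [0, 255] to [-1, 1], as the "tf" preprocessing mode does.
    return rgb / 127.5 - 1.0


def inception_deprocess(x):
    # Same steps as deprocess_image: [-1, 1] -> [0, 1] -> [0, 255].
    x = (x / 2.0 + 0.5) * 255.0
    # Round before casting so values like 199.99999 do not truncate to 199.
    return np.clip(np.round(x), 0, 255).astype("uint8")


rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(4, 4, 3)).astype("float64")
restored = inception_deprocess(inception_preprocess(pixels))
```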

### Running gradient ascent over different successive scales

#### Process:

1. Load the original image.
2. Define a number of processing scales (i.e. image shapes), from smallest to largest.
3. Resize the original image to the smallest scale.
4. For every scale, starting with the smallest:
    1. Run gradient ascent.
    2. Upscale the image to the next scale.
    3. Reinject the detail that was lost at upscaling time.
5. Stop when we are back to the original size.

To obtain the detail lost during upscaling, we simply
take the original image, shrink it down, upscale it,
and compare the result to the (resized) original image.


In [None]:
# Fill this with the path to the image you want to use.
base_image_path = './Bilder/sils.jpg'


# Playing with these hyperparameters
# will let you achieve new effects.

step = 0.01 # Gradient ascent step size
num_octave = 3 # Number of scales at which to run
               # gradient ascent
octave_scale = 1.4 # Size ratio between scales
iterations = 20 # Number of ascent steps to
                # run at each scale

# If the loss grows larger than 15, you’ll interrupt
# the gradient-ascent process to avoid ugly artifacts.    
max_loss = 15. 

original_img = preprocess_image(base_image_path)
result_prefix = "sils_maria_dream"
original_shape = original_img.shape[1:3]

successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
shrunk_original_img = tf.image.resize(original_img, successive_shapes[0])

img = tf.identity(original_img)  # Make a copy
for i, shape in enumerate(successive_shapes):
    print("Processing octave %d with shape %s" % (i, shape))
    img = tf.image.resize(img, shape)
    img = gradient_ascent_loop(
        img, iterations=iterations, learning_rate=step, max_loss=max_loss
    )
    upscaled_shrunk_original_img = tf.image.resize(shrunk_original_img, shape)
    same_size_original = tf.image.resize(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img

    img += lost_detail
    shrunk_original_img = tf.image.resize(original_img, shape)

tf.keras.preprocessing.image.save_img(result_prefix + ".png", deprocess_image(img.numpy()))


The result looks as follows:

<img src='./sils_maria_dream.png'>

#### NOTE 
Because the original Inception V3 network was trained to recognize
concepts in images of size 299 × 299, and given that the process involves scaling
the images down by a reasonable factor, the DeepDream implementation
produces much better results on images that are somewhere between 300 ×
300 and 400 × 400. Regardless, you can run the same code on images of any
size and any ratio.

We strongly suggest that you explore what you can do by adjusting which layers you
use in your loss. Layers that are lower in the network contain more-local, less-abstract
representations and lead to dream patterns that look more geometric. Layers that are
higher up lead to more-recognizable visual patterns based on the most common
objects found in ImageNet, such as dog eyes, bird feathers, and so on. You can randomly
generate the layer names and weights in the `layer_settings` dictionary to
quickly explore many different layer combinations.
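Here is a minimal sketch of such random exploration. It assumes the candidate pool consists of the `mixed0` to `mixed10` blocks of Keras's InceptionV3. The helper name and the weight range are illustrative choices.

```python
import random

# Candidate layers: the "mixed" blocks of Keras's InceptionV3.
CANDIDATE_LAYERS = ["mixed%d" % i for i in range(11)]


def random_layer_settings(num_layers=4, max_weight=3.0, seed=None):
    """Pick random layers and random loss weights for a DeepDream run."""
    rng = random.Random(seed)
    names = rng.sample(CANDIDATE_LAYERS, num_layers)
    return {name: round(rng.uniform(0.0, max_weight), 2) for name in names}


layer_settings = random_layer_settings(seed=42)
print(layer_settings)
```

`feature_extractor` only returns the layers that were listed when it was built, and `compute_loss` looks up each returned layer in `layer_settings`. After drawing new settings, rebuild `outputs_dict` and `feature_extractor` before running the gradient-ascent loop again.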

### Wrapping up

- DeepDream consists of running a ConvNet in reverse to generate inputs based
on the representations learned by the network.
- The results produced are fun and somewhat similar to the visual artifacts
induced in humans by the disruption of the visual cortex via psychedelics.
- Note that the process isn’t specific to image models or even to convnets. It can
be done for speech, music, and more.

# Part II : Neural style transfer

In addition to DeepDream, another major development in deep-learning-driven
image modification is __neural style transfer__, introduced by Leon Gatys et al. in the summer
of 2015. The neural style transfer algorithm has undergone many refinements
and spawned many variations since its original introduction, and it has made its way
into many smartphone photo apps. For simplicity, this section focuses on the formulation
described in the original paper.


Neural style transfer consists of applying the style of a reference image to a target
image while conserving the content of the target image. The Figure below shows an example:
    
<img src='./Bilder/outlook.jpg'>



In this context, _style_ essentially means textures, colors, and visual patterns in the image, at
various spatial scales, and the _content_ is the higher-level macrostructure of the image.
For instance, when _Starry Night_ by Vincent van Gogh is used as the style reference for a
photograph of buildings in Tübingen, the blue-and-yellow circular brushstrokes are considered
to be the style, and the buildings in the photograph are considered to be the content.

The idea of style transfer, which is tightly related to that of texture generation, has
had a long history in the image-processing community prior to the development of
neural style transfer in 2015. But as it turns out, the deep-learning-based implementations
of style transfer offer results unparalleled by what had been previously achieved
with classical computer-vision techniques, and they triggered an amazing renaissance
in creative applications of computer vision.

The key notion behind implementing style transfer is the same idea that’s central
to all deep-learning algorithms: you define a loss function to specify what you want to
achieve, and you minimize this loss. You know what you want to achieve: conserving
the content of the original image while adopting the style of the reference image. If
we were able to mathematically define _content_ and _style_, then an appropriate loss function
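to minimize would be the following:

$$
\mathrm{loss} = \mathrm{distance}\big(\mathrm{style}(\mathrm{reference\_image}) - \mathrm{style}(\mathrm{generated\_image})\big) + \mathrm{distance}\big(\mathrm{content}(\mathrm{original\_image}) - \mathrm{content}(\mathrm{generated\_image})\big)
$$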

Here, distance is a norm function such as the L2 norm, `content` is a function that
takes an image and computes a representation of its content, and `style` is a function
that takes an image and computes a representation of its style. Minimizing this
loss causes style(generated_image) to be close to style(reference_image), and
content(generated_image) to be close to content(original_image), thus achieving
style transfer as we defined it.

A fundamental observation made by Gatys et al. was that deep convolutional neural
networks offer a way to mathematically define the style and content functions.
Let’s see how.

### The content loss

As you already know, activations from earlier layers in a network contain _local_ information
about the image, whereas activations from higher layers contain increasingly _global_,
abstract information. Formulated in a different way, the activations of the different layers
of a ConvNet provide a decomposition of the contents of an image over different spatial
scales. Therefore, you’d expect the content of an image, which is more global and
_abstract_, to be captured by the representations of the upper layers in a ConvNet.


A good candidate for content loss is thus the L2 norm between the activations of
an upper layer in a pretrained convnet, computed over the target image, and the activations
of the same layer computed over the generated image. This guarantees that, as
seen from the upper layer, the generated image will look similar to the original target
image. Assuming that what the upper layers of a ConvNet see is really the content of
their input images, then this works as a way to preserve image content.

### The style loss

The content loss only uses a single upper layer, but the style loss as defined by Gatys
et al. uses multiple layers of a ConvNet: you try to capture the appearance of the style-reference
image at all spatial scales extracted by the ConvNet, not just a single scale.

For the _style loss_, Gatys et al. use the _Gram matrix_ of a layer’s activations: the inner
product of the feature maps of a given layer. This inner product can be understood as
representing a map of the correlations between the layer’s features. These feature correlations
capture the statistics of the patterns of a particular spatial scale, which empirically
correspond to the appearance of the textures found at this scale.
Hence, the style loss aims to preserve similar internal correlations within the activations
of different layers, across the style-reference image and the generated image. In
turn, this guarantees that the textures found at different spatial scales look similar
across the style-reference image and the generated image.
In short, you can use a pretrained convnet to define a loss that will do the following:

- Preserve content by maintaining similar high-level layer activations between the
target content image and the generated image. The ConvNet should “see” both
the target image and the generated image as containing the same things.

- Preserve style by maintaining similar correlations within activations for both low-level
layers and high-level layers. Feature correlations capture textures: the generated
image and the style-reference image should share the same textures at
different spatial scales.
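Here is a small NumPy sketch that makes the Gram matrix concrete. It flattens an (H, W, C) feature map into one row per spatial position and takes the inner products between channels. Shuffling the spatial positions leaves the Gram matrix unchanged. This illustrates why it captures texture statistics rather than the layout of the image. The function name `gram_matrix_np` and the random feature map are illustrative.

```python
import numpy as np


def gram_matrix_np(features):
    """Gram matrix of a feature map of shape (H, W, C); result is (C, C)."""
    channels = features.shape[-1]
    flat = features.reshape(-1, channels)  # (H*W, C): one row per position
    return flat.T @ flat  # entry (i, j) sums f_i * f_j over all positions


rng = np.random.default_rng(0)
feature_map = rng.random((4, 5, 3))

# Permute the spatial positions: the layout changes, the statistics do not.
flat = feature_map.reshape(-1, 3)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(4, 5, 3)

g1 = gram_matrix_np(feature_map)
g2 = gram_matrix_np(shuffled)
```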

Now, let’s look at a Keras implementation of the original 2015 neural style transfer
algorithm. As you’ll see, it shares many similarities with the DeepDream implementation
developed in the previous section.

### Neural style transfer in Keras

Neural style transfer can be implemented using any pretrained ConvNet. Here, you’ll
use the VGG19 network used by Gatys et al. VGG19 is a simple variant of the VGG16 network
with three more convolutional layers.
This is the general process:

1. Set up a network that computes VGG19 layer activations for the style-reference
image, the target image, and the generated image at the same time.

2. Use the layer activations computed over these three images to define the loss
function described earlier, which you’ll minimize in order to achieve style
transfer.

3. Set up a gradient-descent process to minimize this loss function.

Let’s start by defining the paths to the style-reference image and the target image. To
make sure that the processed images are a similar size (widely different sizes make
style transfer more difficult), you’ll later resize them all to a shared height of 400 px.

#### Defining initial variables

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import vgg19


from tensorflow.keras.preprocessing.image import load_img, img_to_array
# Path to the image you want to transform
base_image_path = './Bilder/sils.jpg'

# Path to the style image
style_reference_image_path = './Bilder/transfer_style_reference.jpg'

# Prefix for the filenames of the saved results
result_prefix = "sils_maria_generated"

# Weights of the different loss components
total_variation_weight = 1e-6
style_weight = 1e-6
content_weight = 2.5e-8


# Dimensions of the generated picture.
width, height = keras.preprocessing.image.load_img(base_image_path).size
img_nrows = 400
img_ncols = int(width * img_nrows / height)

Let's take a look at our base (content) image and our style reference image:

In [None]:
from IPython.display import Image, display

display(Image(base_image_path))
display(Image(style_reference_image_path))

You need some auxiliary functions for loading, preprocessing, and postprocessing the
images that go in and out of the VGG19 convnet.

In [None]:
def preprocess_image(image_path):
    # Util function to open, resize and format pictures into appropriate tensors
    img = keras.preprocessing.image.load_img(
        image_path, target_size=(img_nrows, img_ncols)
    )
    img = keras.preprocessing.image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return tf.convert_to_tensor(img)


def deprocess_image(x):
    # Util function to convert a NumPy array into a valid image
    x = x.reshape((img_nrows, img_ncols, 3))
    # Undo the zero-centering by adding back the ImageNet mean pixel (BGR order)
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    # 'BGR'->'RGB'
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype("uint8")
    return x
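`deprocess_image` undoes the "caffe"-style preprocessing that `vgg19.preprocess_input` applies. That preprocessing converts RGB to BGR and subtracts the per-channel ImageNet mean pixel. The NumPy sketch below reimplements both directions for a round-trip check. The helper names are hypothetical. Rounding before the cast is an addition that avoids off-by-one truncation.

```python
import numpy as np

# ImageNet mean pixel in BGR order, as used in deprocess_image.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])


def caffe_preprocess(rgb):
    # RGB -> BGR, then zero-center each channel by the ImageNet mean.
    return rgb[..., ::-1] - IMAGENET_BGR_MEAN


def caffe_deprocess(bgr):
    # Add the mean back, BGR -> RGB, then round, clip and cast to uint8.
    rgb = (bgr + IMAGENET_BGR_MEAN)[..., ::-1]
    return np.clip(np.round(rgb), 0, 255).astype("uint8")


rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(2, 3, 3)).astype("float64")
restored = caffe_deprocess(caffe_preprocess(pixels))
```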

Let’s set up the VGG19 network. It processes a batch of three images: the base (target)
image, the style-reference image, and the generated (combination) image. `compute_loss`
concatenates them along the batch axis before calling the feature extractor. The base and
style-reference images are fixed tensors. The generated image is stored in a
`tf.Variable` whose values change over time as the optimizer updates them.

#### Loading the pretrained VGG19 network and applying it to the three images

In [None]:
# Build a VGG19 model loaded with pre-trained ImageNet weights
model = vgg19.VGG19(weights="imagenet", include_top=False)

# Get the symbolic outputs of every layer (layer names are unique in a Keras model).
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

# Set up a model that returns the activation values for every layer in
# VGG19 (as a dict).
feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)
print('Model loaded.')

Let’s define the content loss. It makes sure that an upper layer of the VGG19 ConvNet
(`block5_conv2`) has a similar view of the target image and the generated image.

#### Content loss

In [None]:
# An auxiliary loss function
# designed to maintain the "content" of the
# base image in the generated image


def content_loss(base, combination):
    return tf.reduce_sum(tf.square(combination - base))

Next is the style loss. It uses an auxiliary function to compute the Gram matrix of an
input matrix: a map of the correlations found in the original feature matrix.

#### Style loss

In [None]:
# The gram matrix of an image tensor (feature-wise outer product)


def gram_matrix(x):
    x = tf.transpose(x, (2, 0, 1))
    features = tf.reshape(x, (tf.shape(x)[0], -1))
    gram = tf.matmul(features, tf.transpose(features))
    return gram

# The "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image


def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))

To these two loss components, you add a third: the total variation loss, which operates
on the pixels of the generated combination image. It encourages spatial continuity in
the generated image, thus avoiding overly pixelated results. You can interpret it as a
regularization loss.

#### Total variation loss

In [None]:
# The 3rd loss function, total variation loss,
# designed to keep the generated image locally coherent


def total_variation_loss(x):
    a = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, 1:, : img_ncols - 1, :]
    )
    b = tf.square(
        x[:, : img_nrows - 1, : img_ncols - 1, :] - x[:, : img_nrows - 1, 1:, :]
    )
    return tf.reduce_sum(tf.pow(a + b, 1.25))
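The NumPy version below uses the same formula but works for any image size instead of the global `img_nrows` and `img_ncols`. The function name is illustrative. On tiny inputs, it shows that a constant image has zero total variation and that a single jump between neighboring pixels is penalized.

```python
import numpy as np


def total_variation_loss_np(x):
    """Total variation of a batch of images with shape (N, H, W, C)."""
    # Squared differences to the pixel below and to the pixel on the right.
    a = np.square(x[:, :-1, :-1, :] - x[:, 1:, :-1, :])
    b = np.square(x[:, :-1, :-1, :] - x[:, :-1, 1:, :])
    return np.sum(np.power(a + b, 1.25))


flat_image = np.full((1, 4, 4, 3), 7.0)
# 2x2 single-channel image with one bright pixel to the right of the top-left one.
step_image = np.array([[0.0, 1.0], [0.0, 0.0]]).reshape(1, 2, 2, 1)

print(total_variation_loss_np(flat_image), total_variation_loss_np(step_image))
```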

The loss that you minimize is a weighted sum of these three losses. To compute the
content loss, you use only one upper layer, the `block5_conv2` layer. For the
style loss, you use a list of layers that spans both low-level and high-level layers. You
add the total variation loss at the end.
Depending on the style-reference image and content image you’re using, you’ll
likely want to tune the `content_weight` coefficient (the contribution of the content
loss to the total loss). A higher `content_weight` means the target content will be more
recognizable in the generated image.

#### Defining the final loss that you’ll minimize

In [None]:
# List of layers to use for the style loss.
style_layer_names = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
# The layer to use for the content loss.
content_layer_name = "block5_conv2"


def compute_loss(combination_image, base_image, style_reference_image):
    input_tensor = tf.concat(
        [base_image, style_reference_image, combination_image], axis=0
    )
    features = feature_extractor(input_tensor)

    # Initialize the loss
    loss = tf.zeros(shape=())

    # Add content loss
    layer_features = features[content_layer_name]
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + content_weight * content_loss(
        base_image_features, combination_features
    )
    # Add style loss
    for layer_name in style_layer_names:
        layer_features = features[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_loss(style_reference_features, combination_features)
        loss += (style_weight / len(style_layer_names)) * sl

    # Add total variation loss
    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss

Finally, you’ll set up the optimization process. In the original Gatys et al. paper,
optimization is performed using the L-BFGS algorithm, which comes packaged with SciPy.
SciPy's optimizers, however, operate on flat NumPy vectors, whereas your generated image is a
3D tensor living in TensorFlow. To keep everything in TensorFlow, the implementation here
uses plain gradient descent instead: Keras's `SGD` optimizer with an exponentially decaying
learning rate, applied directly to the generated image.
Computing the value of the loss function and the value of the gradients independently
would be inefficient, because it would lead to a lot of redundant computation between the
two. The process would be almost twice as slow as computing them jointly. To avoid this,
`compute_loss_and_grads` evaluates the loss inside a single `tf.GradientTape` context and
returns both the loss value and the gradients at once.
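For comparison, here is a hedged sketch of how SciPy's L-BFGS can be driven with a single function that returns both the loss and its gradient. It uses `scipy.optimize.minimize` with `jac=True` and flattens a 3D array into the 1D vector SciPy expects. A toy quadratic objective stands in for the real style-transfer loss. The names `target` and `loss_and_grad` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 3D "image" that the optimizer should reproduce.
target = np.arange(24, dtype=np.float64).reshape(2, 3, 4)


def loss_and_grad(flat_x):
    # SciPy passes a flat vector; reshape it to the image shape.
    x = flat_x.reshape(target.shape)
    diff = x - target
    loss = float(np.sum(diff ** 2))
    grad = 2.0 * diff
    # One pass yields both values; the gradient must be flattened again.
    return loss, grad.ravel()


result = minimize(loss_and_grad, np.zeros(target.size), jac=True, method="L-BFGS-B")
solution = result.x.reshape(target.shape)
```

With a real network, `loss_and_grad` would convert the flat vector to a tensor, call `compute_loss_and_grads`, and convert both results back to float64 NumPy arrays.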

#### Setting up the gradient-descent process

In [None]:
def compute_loss_and_grads(combination_image, base_image, style_reference_image):
    with tf.GradientTape() as tape:
        loss = compute_loss(combination_image, base_image, style_reference_image)
    grads = tape.gradient(loss, combination_image)
    return loss, grads


Finally, you can run the gradient-descent loop: 4,000 SGD steps with a learning rate that
starts at 100 and decays smoothly by a factor of 0.96 every 100 steps. Every 100 iterations,
the loop prints the current loss and saves the current generated image.

#### Style-transfer loop

In [None]:
optimizer = keras.optimizers.SGD(
    keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96
    )
)

base_image = preprocess_image(base_image_path)
style_reference_image = preprocess_image(style_reference_image_path)
combination_image = tf.Variable(preprocess_image(base_image_path))

iterations = 4000
for i in range(1, iterations + 1):
    loss, grads = compute_loss_and_grads(
        combination_image, base_image, style_reference_image
    )
    optimizer.apply_gradients([(grads, combination_image)])
    if i % 100 == 0:
        print("Iteration %d: loss=%.2f" % (i, loss))
        img = deprocess_image(combination_image.numpy())
        fname = result_prefix + "_at_iteration_%d.png" % i
        keras.preprocessing.image.save_img(fname, img)

The Figure below shows what you get: 

In [None]:
display(Image(result_prefix + "_at_iteration_2300.png"))

Keep in mind that what this technique achieves is
merely a form of image retexturing, or texture transfer. It works best with style-reference
images that are strongly textured and highly self-similar, and with content
targets that don’t require high levels of detail in order to be recognizable. It typically
can’t achieve fairly abstract feats such as transferring the style of one portrait to
another. The algorithm is closer to classical signal processing than to AI, so don’t
expect it to work like magic!


Additionally, note that running this style-transfer algorithm is slow. But the transformation
operated by the setup is simple enough that it can be learned by a small, fast
feedforward convnet as well—as long as you have appropriate training data available.
Fast style transfer can thus be achieved by first spending a lot of compute cycles to
generate input-output training examples for a fixed style-reference image, using the
method outlined here, and then training a simple convnet to learn this style-specific
transformation. Once that’s done, stylizing a given image is instantaneous: it’s just a
forward pass of this small convnet.

### Wrapping up
- Style transfer consists of creating a new image that preserves the contents of a
target image while also capturing the style of a reference image.
- Content can be captured by the high-level activations of a convnet.
- Style can be captured by the internal correlations of the activations of different
layers of a convnet.
- Hence, deep learning allows style transfer to be formulated as an optimization
process using a loss defined with a pretrained convnet.
- Starting from this basic idea, many variants and refinements are possible.

## Extensions

1. Resize style image before running style transfer.

2. Mix style from multiple images.

3. Give more weight to the content image or to the style image.

# Part III : Generating Images with Variational Autoencoders (VAE)

Sampling from a latent space of images to create entirely new images or edit existing
ones is currently the most popular and successful application of creative AI. In this section
and the next, we’ll review some high-level concepts pertaining to image generation,
alongside implementation details for the two main techniques in this
domain: _variational autoencoders_ (VAEs) and _generative adversarial networks_ (GANs). 

The techniques we present here aren’t specific to images—you could develop latent spaces
of sound, music, or even text, using GANs and VAEs—but in practice, the most interesting
results have been obtained with pictures, and that’s what we focus on here.

## Sampling from Latent Spaces of Images

The key idea of image generation is to develop a low-dimensional latent space of representations
(which naturally is a vector space) where any point can be mapped to a
realistic-looking image. The module capable of realizing this mapping, taking as input
a latent point and outputting an image (a grid of pixels), is called a _generator_ (in the
case of GANs) or a _decoder_ (in the case of VAEs). 

Once such a latent space has been developed, you can sample points from it, either deliberately 
or at random, and, by mapping them to image space, generate images that have never been seen before.

<img src='./Bilder/latent_space.jpg'>

GANs and VAEs are two different strategies for learning such latent spaces of image
representations, each with its own characteristics. VAEs are great for learning latent
spaces that are well structured, where specific directions encode a meaningful axis of
variation in the data. GANs generate images that can potentially be highly realistic, but
the latent space they come from may not have as much structure and continuity.

## Concept vectors for image editing

The idea of concept vectors is the following: given a latent space of representations, or an
embedding space, certain directions in the space may encode interesting axes of variation
in the original data. 

In a latent space of images of faces, for instance, there may
be a smile vector $s$, such that if latent point $z$ is the embedded representation of a certain
face, then latent point $z + s$ is the embedded representation of the same face,
smiling. 


Once you’ve identified such a vector, it then becomes possible to edit images
by projecting them into the latent space, moving their representation in a meaningful
way, and then decoding them back to image space. There are concept vectors for
essentially any independent dimension of variation in image space—in the case of
faces, you may discover vectors for adding sunglasses to a face, removing glasses, turning
a male face into a female face, and so on. 
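How such a vector is found is not covered here. One simple approach, assumed in this sketch, is to encode images with and without an attribute and take the difference of their average latent codes. To stay self-contained, the sketch uses synthetic latent codes in a hypothetical 8-dimensional space whose first axis plays the role of "smiling"; `concept_vector` and `apply_concept` are illustrative helpers, not part of any library.

```python
import numpy as np


def concept_vector(z_with, z_without):
    """Estimate a concept vector as the difference between mean latent codes."""
    return z_with.mean(axis=0) - z_without.mean(axis=0)


def apply_concept(z, s, alpha=1.0):
    """Move latent point(s) z along concept vector s; alpha controls the strength."""
    return z + alpha * s


# Synthetic latent codes: by construction, the "smile" lives on the first axis
rng = np.random.default_rng(0)
latent_dim = 8
true_smile = np.zeros(latent_dim)
true_smile[0] = 3.0
z_neutral = rng.normal(size=(500, latent_dim))
z_smiling = rng.normal(size=(500, latent_dim)) + true_smile

smile_vector = concept_vector(z_smiling, z_neutral)
z_edited = apply_concept(z_neutral[0], smile_vector, alpha=1.0)
# With a trained VAE you would decode z_edited to get the "smiling" version of the face
```

Varying `alpha` gives a graded edit (a faint smile for small values, a broad one for large values), which only makes sense because the latent space is continuous.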


The Figure below is an example of a smile vector,
a concept vector discovered by Tom White from the Victoria University School of
Design in New Zealand, using VAEs trained on a dataset of faces of celebrities (the
CelebA dataset).

<img src='./Bilder/smile_vector.jpg'>


### Variational autoencoders

Variational autoencoders, simultaneously discovered by Kingma and Welling in
December 2013 and Rezende, Mohamed, and Wierstra in January 2014, are a kind
of generative model that’s especially appropriate for the task of image editing via concept
vectors. They’re a modern take on autoencoders (a type of network that aims to
encode an input to a low-dimensional latent space and then decode it back) that
mixes ideas from deep learning with Bayesian inference.


A classical image autoencoder takes an image, maps it to a latent vector space via
an encoder module, and then decodes it back to an output with the same dimensions
as the original image via a decoder module (see the figure below):

<img src='./Bilder/autoencoder.jpg'>

It’s then trained by
using as target data the same images as the input images, meaning the autoencoder
learns to reconstruct the original inputs. By imposing various constraints on the code
(the output of the encoder), you can get the autoencoder to learn more-or-less interesting
latent representations of the data. 

Most commonly, you’ll constrain the code to
be low-dimensional and sparse (mostly zeros), in which case the encoder acts as a way
to compress the input data into fewer bits of information.

In practice, such classical autoencoders don’t lead to particularly useful or nicely
structured latent spaces. They’re not much good at compression, either. For these reasons,
they have largely fallen out of fashion. VAEs, however, augment autoencoders
with a little bit of statistical magic that forces them to learn continuous, highly structured
latent spaces. They have turned out to be a powerful tool for image generation.


A VAE, instead of compressing its input image into a fixed code in the latent space,
turns the image into the parameters of a statistical distribution: a mean and a variance.
Essentially, this means you’re assuming the input image has been generated by a
statistical process, and that the randomness of this process should be taken into
account during encoding and decoding. The VAE then uses the mean and variance
parameters to randomly sample one element of the distribution, and decodes that element
back to the original input, see the Figure below: 


<img src='./Bilder/vae_illustration.jpg'>

A VAE maps an image to two vectors, `z_mean` and `z_log_sigma`, which define
a probability distribution over the latent space, used to sample a latent point to decode.


The stochasticity of this process
improves robustness and forces the latent space to encode meaningful representations
everywhere: every point sampled in the latent space is decoded to a valid output.

In technical terms, here’s how a VAE works:

1. An encoder module turns the input samples `input_img` into two parameters in
a latent space of representations, `z_mean` and `z_log_variance`.

2. You randomly sample a point `z` from the latent normal distribution that’s
assumed to generate the input image, via
`z = z_mean + exp(0.5 * z_log_variance) * epsilon`,
where `epsilon` is a random tensor drawn from a standard normal distribution. The factor
0.5 converts the log-variance into a standard deviation, as in the `Sampling` layer below.

3. A decoder module maps this point in the latent space back to the original input
image.
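Step 2 is often called the reparameterization trick: the randomness is isolated in `epsilon`, so gradients can flow through `z_mean` and `z_log_var`. The NumPy sketch below mirrors what the `Sampling` layer computes (NumPy is used only so it runs without TensorFlow; `sample_z` is a hypothetical helper) and checks empirically that the samples have the intended mean and variance.

```python
import numpy as np


def sample_z(z_mean, z_log_var, rng):
    """Reparameterization: z = z_mean + exp(0.5 * z_log_var) * epsilon, epsilon ~ N(0, 1)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon


rng = np.random.default_rng(42)
n = 100_000
z_mean = np.tile([1.0, -2.0], (n, 1))               # same mean for every row
z_log_var = np.tile(np.log([0.25, 4.0]), (n, 1))    # variances 0.25 and 4.0
z = sample_z(z_mean, z_log_var, rng)
print(z.mean(axis=0))  # approximately [1, -2]
print(z.var(axis=0))   # approximately [0.25, 4]
```

Dropping the 0.5 would use the variance as if it were a standard deviation, so the second latent axis would get variance 16 instead of 4.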



Because `epsilon` is random, the process ensures that every point that’s close to the latent location
where you encoded `input_img` (`z_mean`) can be decoded to something similar to
`input_img`, thus forcing the latent space to be continuously meaningful. Any two close points
in the latent space will decode to highly similar images. Continuity, combined with the low
dimensionality of the latent space, forces every direction in the latent space to encode a meaningful
axis of variation of the data, making the latent space very structured and thus highly suitable
to manipulation via concept vectors.



The parameters of a VAE are trained via two loss functions: a _reconstruction loss_ that
forces the decoded samples to match the initial inputs, and a _regularization loss_ (the
Kullback-Leibler divergence between the encoded distribution and a standard normal prior) that
helps learn well-formed latent spaces and reduce overfitting to the training data.
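Both terms can be written down directly. For a diagonal Gaussian with means $\mu_j$ and variances $\sigma_j^2$, the KL divergence to $\mathcal{N}(0, I)$ has the closed form $-\frac{1}{2}\sum_j \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$. The NumPy sketch below computes it per sample, together with a pixel-wise binary cross-entropy reconstruction term. The helper names are hypothetical, and the `VAE` class later in this notebook averages these quantities (and rescales the reconstruction term by 28 × 28) rather than summing them, so absolute values differ by a constant factor.

```python
import numpy as np


def kl_divergence(z_mean, z_log_var):
    """KL(N(z_mean, exp(z_log_var)) || N(0, I)) per sample, summed over latent dimensions."""
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)


def reconstruction_bce(x, x_decoded, eps=1e-7):
    """Binary cross-entropy per sample, summed over all pixels (inputs in [0, 1])."""
    x_decoded = np.clip(x_decoded, eps, 1 - eps)  # avoid log(0)
    per_pixel = -(x * np.log(x_decoded) + (1 - x) * np.log(1 - x_decoded))
    return per_pixel.reshape(len(x), -1).sum(axis=1)


# Two latent codes: one matching the prior exactly, one shifted by 1 on its first axis
z_mean = np.array([[0.0, 0.0], [1.0, 0.0]])
z_log_var = np.zeros((2, 2))
print(kl_divergence(z_mean, z_log_var))
```

The first code costs nothing under the KL term; the shifted one costs 0.5, which is the pull toward the origin that keeps the latent space compact.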

Let’s quickly go over a Keras implementation of a VAE, trained with both the reconstruction
loss and the regularization loss. The first listing defines a custom `Sampling` layer that draws
a latent point `z` from `z_mean` and `z_log_var`. The second shows the encoder network, a simple
convnet that maps the input image `x` to the parameters of a probability distribution over the
latent space, `z_mean` and `z_log_var`, and then samples `z` from them.

#### Latent-space-sampling function

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class Sampling(layers.Layer):
    """Uses (z_mean, z_log_var) to sample z, the vector encoding a digit."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon


#### VAE encoder network

In [None]:
latent_dim = 2

encoder_inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", strides=2, padding="same")(encoder_inputs)
x = layers.Conv2D(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim, name="z_mean")(x)
z_log_var = layers.Dense(latent_dim, name="z_log_var")(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name="encoder")
encoder.summary()

In the encoder above, the `Sampling` layer is what turns `z_mean` and `z_log_var`, the parameters
of the statistical distribution assumed to have produced `input_img`, into a latent space point `z`.
In Keras, everything needs to be a layer, so code that isn’t part of a built-in layer has to be
wrapped either in a `Lambda` layer or, as here, in a custom layer.

#### VAE decoder network, mapping latent space points to images

The following listing shows the decoder implementation. You reshape the vector z to
the dimensions of an image and then use a few convolution layers to obtain a final
image output that has the same dimensions as the original `input_img`.

In [None]:
latent_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(7 * 7 * 64, activation="relu")(latent_inputs)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, activation="relu", strides=2, padding="same")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", strides=2, padding="same")(x)
decoder_outputs = layers.Conv2DTranspose(1, 3, activation="sigmoid", padding="same")(x)
decoder = keras.Model(latent_inputs, decoder_outputs, name="decoder")
decoder.summary()
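The decoder gets back to 28 × 28 through a simple size rule: with `padding="same"` and the default `output_padding`, a `Conv2DTranspose` layer multiplies the spatial size by its stride. The sketch below tracks the shapes through the decoder's three transposed convolutions; `conv_transpose_same_output` is a hypothetical helper that encodes only that rule (it ignores dilation and explicit `output_padding`).

```python
def conv_transpose_same_output(size, stride):
    """Spatial output size of Conv2DTranspose with padding='same' (default output_padding)."""
    return size * stride


size = 7  # spatial size after Dense(7 * 7 * 64) + Reshape((7, 7, 64))
for stride in (2, 2, 1):  # strides of the three Conv2DTranspose layers in the decoder
    size = conv_transpose_same_output(size, stride)
    print(size)  # 14, then 28, then 28
```

The same rule explains the GAN generator later on, where a single stride-2 `Conv2DTranspose` takes a 16 × 16 feature map to 32 × 32.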

The dual loss of a VAE doesn’t fit the traditional expectation of a sample-wise function
of the form `loss(input, target)`. Thus, you’ll set up the loss by subclassing `keras.Model`
and overriding its `train_step` method, which computes both losses and applies the gradients itself.

#### Custom model used to compute the VAE loss

In [None]:
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def train_step(self, data):
        if isinstance(data, tuple):
            data = data[0]
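        with tf.GradientTape() as tape:
            # Use the sub-models stored on this instance, not the global variables
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            # Reconstruction loss: mean per-pixel binary cross-entropy, rescaled by the image size
            reconstruction_loss = tf.reduce_mean(
                keras.losses.binary_crossentropy(data, reconstruction)
            )
            reconstruction_loss *= 28 * 28
            # Regularization loss: KL divergence to N(0, 1), averaged over batch and latent dims
            kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
            kl_loss = tf.reduce_mean(kl_loss)
            kl_loss *= -0.5
            total_loss = reconstruction_loss + kl_loss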
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

Finally, you’re ready to instantiate and train the model. Because the loss is computed inside
the custom `train_step`, you don’t specify an external loss at compile time, which in turn
means you won’t pass target data during training (as you can see, you only pass
`mnist_digits` to the model in `fit`).

#### Training the VAE

In [None]:
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
mnist_digits = np.concatenate([x_train, x_test], axis=0)
mnist_digits = np.expand_dims(mnist_digits, -1).astype("float32") / 255

vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
vae.fit(mnist_digits, epochs=30, batch_size=128)

Once such a model is trained—on MNIST, in this case—you can use the decoder network
to turn arbitrary latent space vectors into images.

#### Sampling a grid of points from the 2D latent space and decoding them to images

In [None]:
import matplotlib.pyplot as plt


def plot_latent(encoder, decoder):
    # display a n*n 2D manifold of digits
    n = 30
    digit_size = 28
    scale = 2.0
    figsize = 15
    figure = np.zeros((digit_size * n, digit_size * n))
    # linearly spaced coordinates corresponding to the 2D plot
    # of digit classes in the latent space
    grid_x = np.linspace(-scale, scale, n)
    grid_y = np.linspace(-scale, scale, n)[::-1]

    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])
            x_decoded = decoder.predict(z_sample)
            digit = x_decoded[0].reshape(digit_size, digit_size)
            figure[
                i * digit_size : (i + 1) * digit_size,
                j * digit_size : (j + 1) * digit_size,
            ] = digit

    plt.figure(figsize=(figsize, figsize))
    start_range = digit_size // 2
    end_range = n * digit_size + start_range + 1
    pixel_range = np.arange(start_range, end_range, digit_size)
    sample_range_x = np.round(grid_x, 1)
    sample_range_y = np.round(grid_y, 1)
    plt.xticks(pixel_range, sample_range_x)
    plt.yticks(pixel_range, sample_range_y)
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.imshow(figure, cmap="Greys_r")
    plt.show()


plot_latent(encoder, decoder)

The grid of sampled digits (see the Figure above) shows a completely continuous
distribution of the different digit classes, with one digit morphing
into another as you follow a path through latent space. Specific directions
in this space have a meaning: for example, there’s a direction for
“four-ness,” “one-ness,” and so on.
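The morphing you see along any path through the grid can be made explicit by decoding a sequence of latent points between two endpoints. Below is a minimal sketch using linear interpolation (a common choice, assumed here); `interpolate` is a hypothetical helper, and the endpoints are arbitrary points inside the plotted region.

```python
import numpy as np


def interpolate(z_start, z_end, num_steps):
    """Evenly spaced latent points on the straight line from z_start to z_end (inclusive)."""
    t = np.linspace(0.0, 1.0, num_steps)[:, None]  # (num_steps, 1)
    return (1.0 - t) * z_start + t * z_end         # (num_steps, latent_dim)


# Two arbitrary points inside the [-2, 2] x [-2, 2] region plotted above
path = interpolate(np.array([-2.0, 0.0]), np.array([2.0, 1.0]), num_steps=5)
print(path)
# With the trained decoder, one image per step: frames = decoder.predict(path, verbose=0)
```

Because nearby latent points decode to similar digits, consecutive frames change smoothly, which is what makes latent-space animations possible.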


In the next section, we’ll cover in detail the other major tool for generating
artificial images: generative adversarial networks (GANs).

## Wrapping up 

- Image generation with deep learning is done by learning latent spaces that capture
statistical information about a dataset of images. By sampling and decoding
points from the latent space, you can generate never-before-seen images.
There are two major tools to do this: VAEs and GANs.

- VAEs result in highly structured, continuous latent representations. For this reason,
they work well for doing all sorts of image editing in latent space: face
swapping, turning a frowning face into a smiling face, and so on. They also work
nicely for doing latent-space-based animations, such as animating a walk along a
cross section of the latent space, showing a starting image slowly morphing into
different images in a continuous way.

- GANs enable the generation of realistic single-frame images but may not induce
latent spaces with solid structure and high continuity.
Most successful practical applications I have seen with images rely on VAEs, but GANs
are extremely popular in the world of academic research—at least, circa 2016–2017.
You’ll find out how they work and how to implement one in the next section.

## Extensions for VAE


To play further with image generation, I suggest working with the [Large-scale
CelebFaces Attributes](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) (CelebA) dataset. It’s a free-to-download image
dataset containing more than 200,000 celebrity portraits. It’s great for experimenting
with concept vectors in particular—it definitely beats MNIST.

# Part IV : Adversarial Networks

Generative adversarial networks (GANs), introduced in 2014 by 
[Goodfellow et al.,](https://arxiv.org/abs/1406.2661) are
an alternative to VAEs for learning latent spaces of images. They enable the generation
of fairly realistic synthetic images by forcing the generated images to be statistically
almost indistinguishable from real ones.


An intuitive way to understand GANs is to imagine a forger trying to create a fake
Picasso painting. At first, the forger is pretty bad at the task. He mixes some of his
fakes with authentic Picassos and shows them all to an art dealer. The art dealer makes
an authenticity assessment for each painting and gives the forger feedback about what
makes a Picasso look like a Picasso. The forger goes back to his studio to prepare some
new fakes. As time goes on, the forger becomes increasingly competent at imitating
the style of Picasso, and the art dealer becomes increasingly expert at spotting fakes.
In the end, they have on their hands some excellent fake Picassos.


That’s what a GAN is: a forger network and an expert network, each being trained
to best the other. As such, a GAN is made of two parts:

- _Generator network_ — Takes as input a random vector (a random point in the
latent space), and decodes it into a synthetic image
- _Discriminator network_ (or adversary) — Takes as input an image (real or synthetic),
and predicts whether the image came from the training set or was created by
the generator network.

The generator network is trained to be able to fool the discriminator network, and
thus it evolves toward generating increasingly realistic images as training goes on: artificial
images that look indistinguishable from real ones, to the extent that it’s impossible
for the discriminator network to tell the two apart (see the figure below). Meanwhile,
the discriminator is constantly adapting to the gradually improving capabilities of the
generator, setting a high bar of realism for the generated images. Once training is
over, the generator is capable of turning any point in its input space into a believable
image. Unlike VAEs, this latent space has fewer explicit guarantees of meaningful
structure; in particular, it isn’t continuous.


<img src='./Bilder/gan_illustration.jpg'>


Remarkably, a GAN is a system where the optimization minimum isn’t fixed, unlike in
any other training setup you’ve encountered so far. Normally, gradient descent
consists of rolling down hills in a static loss landscape. But with a GAN, every step
taken down the hill changes the entire landscape a little. 

It’s a dynamic system where
the optimization process is seeking not a minimum, but an equilibrium between two
forces. For this reason, GANs are notoriously difficult to train: getting a GAN to work
requires lots of careful tuning of the model architecture and training parameters.
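For reference, the original GAN paper by Goodfellow et al. (linked above) frames this equilibrium as a two-player minimax game between the discriminator $D$ and the generator $G$ over the value function

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

where $D(x)$ is the discriminator's estimated probability that $x$ is real and $p_z$ is the prior over latent points. The same paper notes that, in practice, the generator is better trained to maximize $\log D(G(z))$, which gives stronger gradients early in training; labeling generated images as "real" when training the generator, as the training loop in this section does, follows that idea.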

### A schematic GAN implementation

In this section, we’ll explain how to implement a GAN in Keras, in its barest form—
because GANs are advanced, diving deeply into the technical details would be out of
scope for this chapter. The specific implementation is a _deep convolutional GAN_ (DCGAN):
a GAN where the generator and discriminator are deep convnets. In particular, it uses
a `Conv2DTranspose` layer for image upsampling in the generator.
You’ll train the GAN on images from CIFAR10, a dataset of 50,000 32 × 32 RGB
images belonging to 10 classes (5,000 images per class). To make things easier, you’ll
only use images belonging to the class “frog.”
Schematically, the GAN looks like this:

1. A generator network maps vectors of shape `(latent_dim,)` to images of shape
$(32, 32, 3)$.

2. A discriminator network maps images of shape $(32, 32, 3)$ to a binary score
estimating whether the image is real or generated.

3. A `gan` network chains the generator and the discriminator together:
`gan(x) = discriminator(generator(x))`. Thus this `gan` network maps latent space vectors
to the discriminator’s assessment of the realism of these latent vectors as
decoded by the generator.


4. You train the discriminator using examples of real and fake images along with
“real”/“fake” labels, just as you train any regular image-classification model.

5. To train the generator, you use the gradients of the generator’s weights with
regard to the loss of the gan model. This means, at every step, you move the
weights of the generator in a direction that makes the discriminator more likely
to classify as “real” the images decoded by the generator. In other words, you
train the generator to fool the discriminator.

### A bag of tricks

The process of training GANs and tuning GAN implementations is notoriously difficult.
There are a number of known tricks you should keep in mind. Like most things
in deep learning, it’s more alchemy than science: these tricks are heuristics, not
theory-backed guidelines. They’re supported by a level of intuitive understanding of
the phenomenon at hand, and they’re known to work well empirically, although not
necessarily in every context.
Here are a few of the tricks used in the implementation of the GAN generator and
discriminator in this section. It isn’t an exhaustive list of GAN-related tips; you’ll find
many more across the GAN literature:

- We use `tanh` as the last activation in the generator, instead of `sigmoid`, which is
more commonly found in other types of models.

- We sample points from the latent space using a _normal distribution_ (Gaussian distribution),
not a uniform distribution.

- Stochasticity is good to induce robustness. Because GAN training results in a
dynamic equilibrium, GANs are likely to get stuck in all sorts of ways. Introducing
randomness during training helps prevent this. We introduce randomness
in two ways: by using dropout in the discriminator and by adding random noise
to the labels for the discriminator.

- Sparse gradients can hinder GAN training. In deep learning, sparsity is often a
desirable property, but not in GANs. Two things can induce gradient sparsity:
max pooling operations and `ReLU` activations. Instead of max pooling, we recommend
using strided convolutions for downsampling, and we recommend
using a `LeakyReLU` layer instead of a ReLU activation. It’s similar to `ReLU`, but it
relaxes sparsity constraints by allowing small negative activation values.

- In generated images, it’s common to see checkerboard artifacts caused by
unequal coverage of the pixel space in the generator. To fix
this, we use a kernel size that’s divisible by the stride size whenever we use a
strided `Conv2DTranspose` or `Conv2D` in both the generator and the discriminator.
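To see why the kernel/stride rule matters, here is a 1D NumPy illustration (not Keras code) that counts how many input positions write to each output pixel of a transposed convolution; `coverage_1d` is a hypothetical helper. When the kernel size isn't divisible by the stride, the interior counts alternate; in 2D the row and column patterns combine, and that uneven coverage shows up as a checkerboard.

```python
import numpy as np


def coverage_1d(num_inputs, kernel_size, stride):
    """Count how many input positions write to each output pixel of a 1D transposed convolution."""
    out_len = (num_inputs - 1) * stride + kernel_size  # full output length, no cropping
    counts = np.zeros(out_len, dtype=int)
    for i in range(num_inputs):
        # Input i is spread over kernel_size consecutive outputs, starting at i * stride
        counts[i * stride : i * stride + kernel_size] += 1
    return counts


print(coverage_1d(4, kernel_size=3, stride=2))  # [1 1 2 1 2 1 2 1 1]
print(coverage_1d(4, kernel_size=4, stride=2))  # [1 1 2 2 2 2 2 2 1 1]
```

With kernel size 4 and stride 2 (the combination used in the generator below), every interior output pixel receives the same number of contributions.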

### The generator

First, let’s develop a `generator` model that turns a vector (from the latent space—
during training it will be sampled at random) into a candidate image. One of the
many issues that commonly arise with GANs is that the generator gets stuck with generated
images that look like noise. A possible solution is to use dropout on both the discriminator
and the generator.

#### GAN generator network

In [None]:
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
latent_dim = 32
height = 32
width = 32
channels = 3

generator_input = keras.Input(shape=(latent_dim,))
# Transforms the input into a 16 × 16 128-channel feature map
x = layers.Dense(128 * 16 * 16)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((16, 16, 128))(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
# Upsamples to 32 × 32
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# Produces a 32 × 32 3-channel image (the shape of a CIFAR10 image), with tanh values in [-1, 1]
x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
# Instantiates the generator model, which maps the input
# of shape (latent_dim,) into an image of shape (32, 32, 3)
generator = keras.models.Model(generator_input, x)
generator.summary()

### The discriminator

Next, you’ll develop a discriminator model that takes as input a candidate image
(real or synthetic) and classifies it into one of two classes: “generated image” or “real
image that comes from the training set.”
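The discriminator listing uses Keras's default `padding="valid"`, so each convolution shrinks the feature map. The sketch below tracks the spatial size through its four convolutions (`conv_valid_output` is a hypothetical helper encoding the standard valid-padding formula), which shows what `Flatten` receives. Note that the stride-2 layers use kernel size 4, following the divisibility trick above.

```python
def conv_valid_output(size, kernel_size, stride=1):
    """Spatial output size of Conv2D with padding='valid' (the Keras default)."""
    return (size - kernel_size) // stride + 1


size = 32  # CIFAR10 images are 32 x 32
for kernel_size, stride in [(3, 1), (4, 2), (4, 2), (4, 2)]:  # the discriminator's conv stack
    size = conv_valid_output(size, kernel_size, stride)
    print(size)  # 30, then 14, 6, 2

print('Flatten output size:', size * size * 128)  # 128 channels -> 512
```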

#### The GAN discriminator network

In [None]:
discriminator_input = layers.Input(shape=(height, width, channels))
x = layers.Conv2D(128, 3)(discriminator_input)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)
# One dropout layer: an important trick!
x = layers.Dropout(0.4)(x)
# Classification layer
x = layers.Dense(1, activation='sigmoid')(x)
# Instantiates the discriminator model, which turns
# a (32, 32, 3) input into a binary classification
# decision (fake/real)
discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()
discriminator_optimizer = keras.optimizers.RMSprop(
    # To stabilize training, decays the learning rate as 0.0008 / (1 + 1e-8 * step),
    # the same schedule the legacy `decay=1e-8` argument applied
    learning_rate=keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=0.0008, decay_steps=1, decay_rate=1e-8),
    # Uses gradient clipping (by value) in the optimizer
    clipvalue=1.0)
discriminator.compile(optimizer=discriminator_optimizer,
                      loss='binary_crossentropy')

### The adversarial network

Finally, you’ll set up the GAN, which chains the generator and the discriminator.
When trained, this model will move the generator in a direction that improves its ability
to fool the discriminator. This model turns latent-space points into a classification
decision—“fake” or “real”—and it’s meant to be trained with labels that are always
“these are real images.” So, training gan will update the weights of `generator` in a way
that makes discriminator more likely to predict “real” when looking at fake images.
It’s very important to note that you set the `discriminator` to be frozen during training
(non-trainable): its weights won’t be updated when training gan. If the discriminator
weights could be updated during this process, then you’d be training the discriminator
to always predict “real,” which isn’t what you want!

#### Adversarial network

In [None]:
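# Freezes the discriminator inside `gan` (applies to the `gan` model compiled below)
discriminator.trainable = False
gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)
gan_optimizer = keras.optimizers.RMSprop(
    # Same inverse-time decay as the legacy `decay=1e-8` argument
    learning_rate=keras.optimizers.schedules.InverseTimeDecay(
        initial_learning_rate=0.0004, decay_steps=1, decay_rate=1e-8),
    clipvalue=1.0)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')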

### How to train your DCGAN

Now you can begin training. To recapitulate, this is what the training loop looks like
schematically. For each training step (one batch), you do the following:

1. Draw random points in the latent space (random noise).

2. Generate images with `generator` using this random noise.

3. Mix the generated images with real ones.

4. Train `discriminator` using these mixed images, with corresponding targets:
either “real” (for the real images) or “fake” (for the generated images).

5. Draw new random points in the latent space.

6. Train `gan` using these random vectors, with targets that all say “these are real
images.” This updates the weights of the generator only (the discriminator
is frozen inside `gan`), moving them toward getting the discriminator to
predict “these are real images” for generated images: this trains the generator
to fool the discriminator.

Let’s implement it.

#### Implementing GAN training

In [None]:
import os
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing import image

# Loads CIFAR10 data
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()

# Selects frog images (class 6)
x_train = x_train[y_train.flatten() == 6]

# Normalizes data to [-1, 1], matching the range of the generator's tanh output
x_train = x_train.reshape((x_train.shape[0],) + (height, width, channels)).astype('float32') / 127.5 - 1.0
iterations = 10000
batch_size = 20

# Specifies where you want to save generated images (created if it doesn't exist)
save_dir = './data/'
os.makedirs(save_dir, exist_ok=True)
start = 0
for step in range(iterations):
    # Samples random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Decodes them to fake images
    generated_images = generator.predict(random_latent_vectors)
    # Combines them with real images
    stop = start + batch_size
    real_images = x_train[start: stop]
    combined_images = np.concatenate([generated_images, real_images])
    # Assembles labels, discriminating real from fake images
    # (convention used in this loop: 1 = generated/fake, 0 = real)
    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    # Adds random noise to the labels—an important trick!
    labels += 0.05 * np.random.random(labels.shape)
    d_loss = discriminator.train_on_batch(combined_images, labels)
    # Samples random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Assembles labels that say “these are all real images” (0 = real; it’s a lie!)
    misleading_targets = np.zeros((batch_size, 1))
    # Trains the generator (via the gan model, where the discriminator weights are frozen)
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)
    
    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0
    
    # Occasionally saves and plots (every 100 steps)
    if step % 100 == 0:
        # Saves model weights (a '.weights.h5' filename works with both tf.keras and Keras 3)
        gan.save_weights('gan.weights.h5')
        # Prints metrics
        print('discriminator loss:', d_loss)
        print('adversarial loss:', a_loss)
        # Saves one generated image, mapping tanh outputs from [-1, 1] back to [0, 255]
        img = image.array_to_img((generated_images[0] + 1.0) * 127.5, scale=False)
        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))
        # Saves one real image for comparison (real images are also stored in [-1, 1])
        img = image.array_to_img((real_images[0] + 1.0) * 127.5, scale=False)
        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))
        
        
        

When training, you may see the adversarial loss begin to increase considerably, while
the discriminative loss tends to zero: the discriminator may end up dominating the
generator. If that’s the case, try reducing the discriminator learning rate and increasing
the dropout rate of the discriminator.

## Wrapping up
- A GAN consists of a generator network coupled with a discriminator network.
The discriminator is trained to differentiate between the output of the generator
and real images from a training dataset, and the generator is trained to fool the
discriminator. Remarkably, the generator never sees images from the training
set directly; the information it has about the data comes from the discriminator.

- GANs are difficult to train, because training a GAN is a dynamic process rather
than a simple gradient descent process with a fixed loss landscape. Getting a
GAN to train correctly requires using a number of heuristic tricks, as well as
extensive tuning.

- GANs can potentially produce highly realistic images. But unlike VAEs, the
latent space they learn doesn’t have a neat continuous structure and thus may
not be suited for certain practical applications, such as image editing via latent-space
concept vectors.