
# **Aim:** To implement Wasserstein Generative Adversarial Network (WGAN).

This notebook follows the WGAN algorithm as proposed in the original paper (Arjovsky et al., 2017).

## **Task 1:**

Read the article given in the link: https://lilianweng.github.io/posts/2017-08-20-gan/

Answer the following questions:

1. What are the different problems while training GANs?

2. What are the proposed suggestions to stabilize the training in GANs?

3. Define Wasserstein distance. Why is Wasserstein distance better than JS or KL divergence? Explain with the help of an example.

4. Explain the term Lipschitz continuity.

5. Compared to the original GAN algorithm, state the changes made for WGAN implementation.

## **Task 2:**
Go to the blog, https://machinelearningmastery.com/how-to-code-a-wasserstein-generative-adversarial-network-wgan-from-scratch/

Go through the implementation of the WGAN.

Implement the WGAN network.

## **Task 1** - Answers

**Q. What are the different problems while training GANs?**

1. **Hard to achieve Nash equilibrium.**

    In GANs, the Nash equilibrium refers to the point where the Generator is producing perfect fake data that the Discriminator can't distinguish from real data. At this point, both the Generator and Discriminator can't improve their performance any further by adjusting their strategies. In simple words, it's like a stalemate in a game where neither player has a move that can give them a clear advantage.

    In practice, this perfect balance between the discriminator and the generator is difficult to reach, because each model updates its own cost independently of the other.

2. **Vanishing Gradients**

    When the discriminator is perfect, D(x) = 1 for every real sample x and D(z) = 0 for every fake sample z. When this happens the loss function falls to zero and there is no gradient left to update the generator during the learning iterations.
    As a result, training a GAN faces a dilemma:

    • If the discriminator behaves badly, the generator does not have accurate feedback and the
    loss function cannot represent the reality.

    • If the discriminator does a great job, the gradient of the loss function drops down to close to
    zero and the learning becomes super slow or even jammed.

    This dilemma can make GAN training very difficult.


3. **Mode Collapse**

    During training, the generator may collapse to a setting where it always produces the same outputs.
    This is a common failure case for GANs, commonly referred to as Mode Collapse. Even though
    the generator might be able to trick the corresponding discriminator, it fails to learn to represent the
    complex real-world data distribution and gets stuck in a small space with extremely low variety.
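The vanishing-gradient problem from item 2 can be made concrete with a small numerical sketch. It assumes the saturating generator loss of the original GAN, log(1 - D(fake)), and a discriminator whose output is a sigmoid over a logit; the logit values are made up for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator_loss(logit):
    # Saturating generator loss of the original GAN: log(1 - D(fake)),
    # where D = sigmoid(logit) is the discriminator's "real" probability.
    return np.log(1.0 - sigmoid(logit))

def generator_loss_grad(logit):
    # Analytic derivative of the loss above with respect to the logit.
    return -sigmoid(logit)

# Hypothetical discriminator logits on fake samples; a more negative logit
# means the discriminator is more certain that the sample is fake.
logits = np.array([0.0, -2.0, -5.0, -10.0])
grads = generator_loss_grad(logits)

for a, g in zip(logits, grads):
    print(f"logit={a:6.1f}  D(fake)={sigmoid(a):.5f}  gradient={g:.5f}")
```

As the discriminator approaches D(fake) = 0, the gradient that reaches the generator shrinks toward zero, which is the "jammed" learning described above.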



**Q. What are the proposed suggestions to stabilize the training in GANs?**

1. **Feature Matching**

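    Instead of only trying to fool the discriminator, the generator is trained so that the expected features of its outputs, taken from an intermediate layer of the discriminator, match those of real data. The new loss function is the squared distance between the two feature means.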

2. **Minibatch Discrimination**

    With minibatch discrimination, the discriminator is able to digest the relationship between training data points in one batch, instead of processing each point independently.

3. **Virtual Batch Normalisation**

    With VBN, each sample is normalised using the statistics of a fixed reference batch, chosen once at the start of training, rather than the statistics of its own minibatch. Because the mean and standard deviation can vary a lot from one minibatch to the next, normalising against the same reference batch removes that source of instability in training.
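As a rough illustration of feature matching (item 1 above), the sketch below computes the squared distance between mean feature vectors of a real and a generated batch. The feature extractor is a hypothetical stand-in (a fixed random projection followed by ReLU), not an actual discriminator layer, and the batches are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an intermediate discriminator layer:
# a fixed random projection from 4 inputs to 8 features.
W = rng.normal(size=(4, 8))

def features(x):
    return np.maximum(x @ W, 0.0)  # ReLU activations

def feature_matching_loss(real_batch, fake_batch):
    # Squared L2 distance between the mean feature vectors of the two batches.
    gap = features(real_batch).mean(axis=0) - features(fake_batch).mean(axis=0)
    return float(np.sum(gap ** 2))

# Synthetic batches: one generated batch close to the real data, one far away.
real = rng.normal(loc=0.0, size=(256, 4))
close_fake = rng.normal(loc=0.1, size=(256, 4))
far_fake = rng.normal(loc=3.0, size=(256, 4))

print("close:", feature_matching_loss(real, close_fake))
print("far:  ", feature_matching_loss(real, far_fake))
```

The loss is small when the generated batch has feature statistics similar to the real batch and grows as the two drift apart, which gives the generator a target other than the discriminator's final decision.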

    



**Q. Define Wasserstein distance. Why is Wasserstein distance better than JS or KL divergence? Explain with the help of an example.**

    Wasserstein Distance is a measure of the distance between two probability distributions. It is also called Earth Mover’s distance, short for EM distance, because informally it can be interpreted as the minimum energy cost of moving and transforming a pile of dirt in the shape of one probability distribution to the shape of the other distribution. The cost is quantified by: the amount of dirt moved x the moving distance.

    Even when two distributions are located in lower dimensional manifolds without overlaps, Wasserstein distance can still provide a meaningful and smooth representation of the distance in-between.

    The Jensen-Shannon (JS) divergence and Kullback-Leibler (KL) divergence are also measures of the difference between two probability distributions. However, both can have certain *limitations*:

    *Vanishing Gradients*: In areas where the distributions do not overlap, both JS and KL divergences can result in gradients that are not informative. Essentially, they might not provide any useful signal to adjust the parameters to reduce the divergence.

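    *Discontinuity*: JS and KL can be discontinuous in some scenarios. When two distributions have no overlap, the JS divergence is stuck at its maximum value of log 2 no matter how far apart they are, and the KL divergence is infinite. Both values jump abruptly at the moment the distributions start to overlap.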

    **Example**: Imagine two separate 1D distributions (like two narrow peaks) on a line. One peak, at position 5, represents the true image distribution. The other peak, at position 10, represents the distribution of generated images.

    For JS and KL divergences, as long as the two peaks are separate (no overlap), the divergence will provide the same value (maximum value for JS and infinite for KL). It won't tell you if the generated distribution is getting closer to the true image distribution. There's no gradient indicating a direction of improvement.

    Wasserstein distance, on the other hand, will give you a measure proportional to the actual distance between the peaks. If the generated image distribution moves from position 10 to position 6, the Wasserstein distance will reflect that change, indicating an improvement. This gives a more informative gradient for learning.
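The example can be checked numerically. The sketch below is a minimal NumPy version that assumes both distributions are point masses on an integer grid from 0 to 10, with the true distribution fixed at position 5; the helper names are illustrative.

```python
import numpy as np

grid = np.arange(0, 11, dtype=float)  # positions 0, 1, ..., 10

def point_mass(position):
    # All probability concentrated on one grid position.
    p = np.zeros_like(grid)
    p[int(position)] = 1.0
    return p

def kl(a, b):
    # KL(a || b), summed only where a > 0 (0 * log 0 is treated as 0).
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

def js_divergence(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(p, q):
    # In one dimension, W1 is the area between the two CDFs.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return np.sum(cdf_gap * np.diff(grid))

real = point_mass(5)
for position in (10, 8, 6):
    fake = point_mass(position)
    print(f"generated at {position:2d}: "
          f"JS={js_divergence(real, fake):.4f}  W={wasserstein_1d(real, fake):.1f}")
```

The JS divergence stays at log 2 (about 0.6931) for every non-overlapping position, while the Wasserstein distance drops from 5 to 3 to 1 as the generated peak approaches the true one.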



**Q. Explain the term Lipschitz continuity.**

1. Lipschitz Continuity in Simple Words:

    A function is \( k \)-Lipschitz continuous if it can never change faster than a fixed rate \( k \). Picture a straight ruler laid between any two points on the function's curve: if the steepness of that ruler never exceeds \( k \), wherever the two points are chosen, the function is \( k \)-Lipschitz continuous.

    More technically, a function \( f \) is Lipschitz continuous if there exists some constant \( k \) such that for every pair of points, the difference in the function's values at those points is at most \( k \) times the distance between those points: \( |f(x_1) - f(x_2)| \le k \, |x_1 - x_2| \).

2. Why is Lipschitz Continuity Important for WGAN?

    In the WGAN (Wasserstein Generative Adversarial Network) framework, the critic (a variant of the discriminator in standard GANs) is supposed to approximate the Wasserstein distance between the real and generated distributions. That approximation relies on the Kantorovich-Rubinstein duality, which writes the Wasserstein distance as a maximum over 1-Lipschitz functions, so the critic must be 1-Lipschitz (or K-Lipschitz for some fixed K, which only rescales the distance). The constraint also keeps the critic's gradients bounded, which helps stabilize the training process of the GAN.

    Without this constraint, the critic could widen the gap between its scores on real and generated samples without limit simply by scaling up its outputs. Its loss would then no longer estimate a distance, and training would be destabilized.

3. How is Lipschitz Continuity Enforced in WGAN?

    There are multiple ways to try to enforce the Lipschitz constraint on the critic:

    1. **Weight Clipping**: This is the method originally proposed in the WGAN paper. It simply involves clipping the weights of the critic to a small fixed range, say [-0.01, 0.01]. However, the WGAN authors themselves describe it as a crude way to enforce the constraint: a clipping range that is too large slows convergence, while one that is too small can cause vanishing gradients and restricts the critic to very simple functions.

    2. **Gradient Penalty**: Introduced in a later paper (WGAN-GP), this method adds a penalty to the critic's loss if the norm of its gradient deviates from 1. This approach tends to produce better results than weight clipping and is more commonly used in modern implementations.

4. In Simple Words:

    Imagine you're trying to train a dog (the generator) to fetch by using a whistle (the critic). In a regular GAN, the whistle can make any sound, which might confuse the dog if it's too erratic. In WGAN, we ensure the whistle's sound is consistent (Lipschitz continuity) so the dog gets clearer instructions. This makes the training process smoother and more stable.
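The definition of Lipschitz continuity given above can be probed numerically. The sketch below estimates a lower bound on the Lipschitz constant of a 1D function by sampling random pairs of points and taking the steepest slope found. The function name `estimate_lipschitz` and the sampling interval are illustrative choices, and a sampled estimate can only underestimate the true constant.

```python
import numpy as np

def estimate_lipschitz(f, low, high, n_pairs=10000, seed=0):
    # Largest observed |f(x1) - f(x2)| / |x1 - x2| over random pairs of points.
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(low, high, n_pairs)
    x2 = rng.uniform(low, high, n_pairs)
    mask = x1 != x2  # avoid division by zero
    slopes = np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask])
    return slopes.max()

k_sin = estimate_lipschitz(np.sin, -10.0, 10.0)
k_square = estimate_lipschitz(np.square, -10.0, 10.0)

print(f"sin(x): steepest slope found = {k_sin:.3f}")
print(f"x**2:   steepest slope found = {k_square:.3f}")
```

`sin` never exceeds a slope of 1, so it is 1-Lipschitz. `x**2` is far steeper on this interval (its slope reaches 20 at the endpoints), so it is not 1-Lipschitz there, and it has no finite Lipschitz constant over the whole real line.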



**Q. Compared to the original GAN algorithm, state the changes made for WGAN implementation.**


| Feature/Aspect           | Original GAN                                             | WGAN                                               |
|--------------------------|----------------------------------------------------------|----------------------------------------------------|
| **Objective**            | Discriminator classifies samples as real or generated.   | Critic scores samples without explicit classification. |
| **Loss Function**        | Binary Cross-Entropy                                     | Derived from Wasserstein Distance                  |
| **Output Range**         | [0, 1] (Probability)                                      | (-∞, ∞) (Real number)                             |
| **Equilibrium**          | Discriminator outputs 0.5 for all inputs.               | Generated samples have same score as real samples. |
| **Training Stability**   | Prone to issues like mode collapse, vanishing gradients. | Generally more stable.                             |
| **Function Constraint**  | None                                                     | 1-Lipschitz Continuity                             |
| **Methods for Constraint**| Not applicable                                           | Weight Clipping or Gradient Penalty                |
| **Performance Metric**   | Hard to infer from discriminator's output.               | Critic's output can be a rough training estimate.  |
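The "Loss Function" and "Output Range" rows can be illustrated with a small NumPy sketch. It applies both losses to the same made-up raw scores, and assumes the label convention used in Task 2 (-1 for real, +1 for fake) for the Wasserstein loss. `bce_loss` is an illustrative helper, and `wasserstein_loss` here is a NumPy mirror of the Keras function of the same name.

```python
import numpy as np

def bce_loss(y_true, prob):
    # Binary cross-entropy on probabilities in (0, 1).
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)
    return -np.mean(y_true * np.log(prob) + (1 - y_true) * np.log(1 - prob))

def wasserstein_loss(y_true, score):
    # Mean of label * score; labels are -1 (real) and +1 (fake).
    return np.mean(y_true * score)

# Made-up raw network outputs for two real and two fake samples.
scores = np.array([3.0, 1.0, -2.0, -4.0])

# Original GAN: a sigmoid squashes scores into [0, 1]; labels are 1 / 0.
probs = 1.0 / (1.0 + np.exp(-scores))
gan_loss = bce_loss(np.array([1.0, 1.0, 0.0, 0.0]), probs)

# WGAN: raw scores are used directly; labels are -1 / +1.
critic_loss = wasserstein_loss(np.array([-1.0, -1.0, 1.0, 1.0]), scores)

print(f"GAN discriminator loss (BCE): {gan_loss:.4f}")
print(f"WGAN critic loss:             {critic_loss:.4f}")
```

The BCE loss is bounded below by zero, whereas the critic loss is an unbounded real number that becomes more negative as real samples score higher than fake ones. This is why the critic needs the Lipschitz constraint listed in the table.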



## **Task 2** - Implement WGAN

In [1]:
# example of a wgan for generating handwritten digits
from numpy import expand_dims
from numpy import mean
from numpy import ones
from numpy.random import randn
from numpy.random import randint
from keras.datasets.mnist import load_data
from keras import backend
from keras.optimizers import RMSprop
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.layers import Conv2DTranspose
from keras.layers import LeakyReLU
from keras.layers import BatchNormalization
from keras.initializers import RandomNormal
from keras.constraints import Constraint
from matplotlib import pyplot

In [2]:
# clip model weights to a given hypercube
class ClipConstraint(Constraint):
    # set clip value when initialized
    def __init__(self, clip_value):
        self.clip_value = clip_value

    # clip model weights to hypercube
    def __call__(self, weights):
        return backend.clip(weights, -self.clip_value, self.clip_value)

    # get the config
    def get_config(self):
        return {'clip_value': self.clip_value}

In [4]:
# calculate wasserstein loss
def wasserstein_loss(y_true, y_pred):
    return backend.mean(y_true * y_pred)

In [21]:
# define the standalone critic model
def define_critic(in_shape=(28,28,1)):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # weight constraint
    const = ClipConstraint(0.01)
    # define model
    model = Sequential()
    # downsample to 14x14
    model.add(Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init, kernel_constraint=const, input_shape=in_shape))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # downsample to 7x7
    model.add(Conv2D(64, (4,4), strides=(2,2), padding='same', kernel_initializer=init, kernel_constraint=const))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # scoring, linear activation
    model.add(Flatten())
    model.add(Dense(1))
    # compile model
    opt = RMSprop(learning_rate=0.00005)
    model.compile(loss=wasserstein_loss, optimizer=opt)
    return model

In [22]:
# define the standalone generator model
def define_generator(latent_dim):
    # weight initialization
    init = RandomNormal(stddev=0.02)
    # define model
    model = Sequential()
    # foundation for 7x7 image
    n_nodes = 128 * 7 * 7
    model.add(Dense(n_nodes, kernel_initializer=init, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((7, 7, 128)))
    # upsample to 14x14
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # upsample to 28x28
    model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same', kernel_initializer=init))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    # output 28x28x1
    model.add(Conv2D(1, (7,7), activation='tanh', padding='same', kernel_initializer=init))
    return model

In [26]:
# define the combined generator and critic model, for updating the generator
def define_gan(generator, critic):
    # make weights in the critic not trainable
    for layer in critic.layers:
        if not isinstance(layer, BatchNormalization):
            layer.trainable = False
    # connect them
    model = Sequential()
    # add generator
    model.add(generator)
    # add the critic
    model.add(critic)
    # compile model
    opt = RMSprop(learning_rate=0.00005)
    model.compile(loss=wasserstein_loss, optimizer=opt)
    return model

In [10]:
# load images
def load_real_samples():
    # load dataset
    (trainX, trainy), (_, _) = load_data()
    # select all of the examples for a given class
    selected_ix = trainy == 7
    X = trainX[selected_ix]
    # expand to 3d, e.g. add channels
    X = expand_dims(X, axis=-1)
    # convert from ints to floats
    X = X.astype('float32')
    # scale from [0,255] to [-1,1]
    X = (X - 127.5) / 127.5
    return X

In [11]:
# select real samples
def generate_real_samples(dataset, n_samples):
    # choose random instances
    ix = randint(0, dataset.shape[0], n_samples)
    # select images
    X = dataset[ix]
    # generate class labels, -1 for 'real'
    y = -ones((n_samples, 1))
    return X, y

In [12]:
# generate points in latent space as input for the generator
def generate_latent_points(latent_dim, n_samples):
    # generate points in the latent space
    x_input = randn(latent_dim * n_samples)
    # reshape into a batch of inputs for the network
    x_input = x_input.reshape(n_samples, latent_dim)
    return x_input

In [14]:
# use the generator to generate n fake examples, with class labels
def generate_fake_samples(generator, latent_dim, n_samples):
    # generate points in latent space
    x_input = generate_latent_points(latent_dim, n_samples)
    # predict outputs
    X = generator.predict(x_input)
    # create class labels with 1.0 for 'fake'
    y = ones((n_samples, 1))
    return X, y

In [32]:
# generate samples and save as a plot and save the model
def summarize_performance(step, g_model, latent_dim, n_samples=100):
    # prepare fake examples
    X, _ = generate_fake_samples(g_model, latent_dim, n_samples)
    # scale from [-1,1] to [0,1]
    X = (X + 1) / 2.0
    # plot images
    for i in range(10 * 10):
        # define subplot
        pyplot.subplot(10, 10, 1 + i)
        # turn off axis
        pyplot.axis('off')
        # plot raw pixel data
        pyplot.imshow(X[i, :, :, 0], cmap='gray_r')
    # save plot to file once all 100 subplots are drawn
    filename1 = 'generated_plot_%04d.png' % (step+1)
    pyplot.savefig(filename1)
    pyplot.close()
    # save the generator model
    filename2 = 'model_%04d.h5' % (step+1)
    g_model.save(filename2)
    print('>Saved: %s and %s' % (filename1, filename2))

In [16]:
# create a line plot of loss for the gan and save to file
def plot_history(d1_hist, d2_hist, g_hist):
    # plot history
    pyplot.plot(d1_hist, label='crit_real')
    pyplot.plot(d2_hist, label='crit_fake')
    pyplot.plot(g_hist, label='gen')
    pyplot.legend()
    pyplot.savefig('plot_line_plot_loss.png')
    pyplot.close()

In [33]:
# train the generator and critic
def train(g_model, c_model, gan_model, dataset, latent_dim, n_epochs=10, n_batch=64, n_critic=5):
    # calculate the number of batches per training epoch
    bat_per_epo = int(dataset.shape[0] / n_batch)

    # calculate the number of training iterations
    n_steps = bat_per_epo * n_epochs

    # calculate the size of half a batch of samples
    half_batch = int(n_batch / 2)

    # lists for keeping track of loss
    c1_hist, c2_hist, g_hist = list(), list(), list()

    # manually enumerate training steps (all epochs combined)
    for i in range(n_steps):
        # update the critic more than the generator
        c1_tmp, c2_tmp = list(), list()
        for _ in range(n_critic):
            # get randomly selected 'real' samples
            X_real, y_real = generate_real_samples(dataset, half_batch)
            # update critic model weights
            c_loss1 = c_model.train_on_batch(X_real, y_real)
            c1_tmp.append(c_loss1)
            # generate 'fake' examples
            X_fake, y_fake = generate_fake_samples(g_model, latent_dim, half_batch)
            # update critic model weights
            c_loss2 = c_model.train_on_batch(X_fake, y_fake)
            c2_tmp.append(c_loss2)
        # store critic loss
        c1_hist.append(mean(c1_tmp))
        c2_hist.append(mean(c2_tmp))
        # prepare points in latent space as input for the generator
        X_gan = generate_latent_points(latent_dim, n_batch)
        # create inverted labels for the fake samples
        y_gan = -ones((n_batch, 1))
        # update the generator via the critic's error
        g_loss = gan_model.train_on_batch(X_gan, y_gan)
        g_hist.append(g_loss)
        # summarize loss on this batch
        print('>%d, c1=%.3f, c2=%.3f g=%.3f' % (i+1, c1_hist[-1], c2_hist[-1], g_loss))
        # evaluate the model performance every 'epoch'
        if (i+1) % bat_per_epo == 0:
            summarize_performance(i, g_model, latent_dim)
    # line plots of loss
    plot_history(c1_hist, c2_hist, g_hist)

In [19]:
latent_dim = 50

In [23]:
critic = define_critic()



In [24]:
generator = define_generator(latent_dim)

In [27]:
gan_model = define_gan(generator, critic)

In [29]:
# Mnist
dataset = load_real_samples()

In [30]:
dataset.shape

(6265, 28, 28, 1)
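With this dataset size, the step arithmetic inside `train` can be worked out ahead of time. The short sketch below reuses the default `n_batch=64` and `n_epochs=10` from the function signature and lists the steps at which `summarize_performance` is called.

```python
n_images = 6265   # dataset.shape[0]
n_batch = 64      # default batch size in train()
n_epochs = 10     # default number of epochs in train()

bat_per_epo = n_images // n_batch    # batches per epoch
n_steps = bat_per_epo * n_epochs     # total generator updates

# train() saves a plot and a model whenever (i + 1) is a multiple of bat_per_epo.
checkpoint_steps = [s for s in range(1, n_steps + 1) if s % bat_per_epo == 0]

print(bat_per_epo, n_steps)
print(checkpoint_steps)
```

This gives 97 batches per epoch and 970 steps in total, with checkpoints at steps 97, 194, and so on up to 970, matching the `generated_plot_0097.png` and `model_0097.h5` style file names in the training output.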

In [34]:
train(generator, critic, gan_model, dataset, latent_dim)

>1, c1=-77.015, c2=-65.838 g=-76.197
>2, c1=-76.673, c2=-67.004 g=-76.988
>3, c1=-77.548, c2=-67.266 g=-77.607
>4, c1=-78.911, c2=-68.540 g=-78.402
>5, c1=-80.358, c2=-68.937 g=-80.110
>6, c1=-79.919, c2=-69.851 g=-80.656
>7, c1=-80.862, c2=-70.660 g=-80.910
>8, c1=-80.956, c2=-71.601 g=-81.946
>9, c1=-82.834, c2=-72.096 g=-82.913
>10, c1=-83.245, c2=-73.146 g=-83.936
>11, c1=-84.405, c2=-73.689 g=-84.177
>12, c1=-84.687, c2=-74.719 g=-85.412
>13, c1=-85.262, c2=-75.134 g=-86.470
>14, c1=-87.304, c2=-76.187 g=-87.318
>15, c1=-86.215, c2=-76.817 g=-88.252
>16, c1=-87.601, c2=-77.668 g=-89.327
>17, c1=-88.419, c2=-78.295 g=-89.724
>18, c1=-88.845, c2=-79.272 g=-89.876
>19, c1=-89.848, c2=-80.056 g=-91.070
>20, c1=-90.086, c2=-80.905 g=-92.248
>21, c1=-91.274, c2=-81.266 g=-93.239
>22, c1=-91.791, c2=-82.331 g=-93.429
>23, c1=-91.856, c2=-83.028 g=-94.313
>24, c1=-93.240, c2=-83.874 g=-94.999
>25, c1=-93.159, c2=-84.580 g=-95.815
>26, c1=-95.138, c2=-85.321 g=-96.824
>27, c1=-95.521, c2=-



>Saved: generated_plot_0097.png and model_0097.h5
>98, c1=-151.606, c2=-143.407 g=-159.733
>99, c1=-152.695, c2=-144.524 g=-160.899
>100, c1=-152.090, c2=-145.229 g=-160.957
>101, c1=-154.583, c2=-146.407 g=-162.110
>102, c1=-155.024, c2=-146.899 g=-163.448
>103, c1=-155.671, c2=-147.871 g=-164.267
>104, c1=-156.343, c2=-148.670 g=-165.201
>105, c1=-157.901, c2=-149.382 g=-166.307
>106, c1=-158.472, c2=-149.941 g=-166.851
>107, c1=-159.038, c2=-151.224 g=-167.504
>108, c1=-160.148, c2=-152.382 g=-168.881
>109, c1=-161.816, c2=-152.842 g=-168.880
>110, c1=-161.358, c2=-154.203 g=-170.186
>111, c1=-162.101, c2=-154.352 g=-171.657
>112, c1=-164.112, c2=-155.285 g=-172.194
>113, c1=-164.062, c2=-156.555 g=-173.345
>114, c1=-166.156, c2=-157.431 g=-174.295
>115, c1=-165.432, c2=-157.995 g=-175.205
>116, c1=-166.579, c2=-158.942 g=-174.875
>117, c1=-167.685, c2=-160.291 g=-176.478
>118, c1=-167.599, c2=-160.827 g=-177.377
>119, c1=-170.137, c2=-161.613 g=-178.630
>120, c1=-170.353, c2=-162.2



>Saved: generated_plot_0194.png and model_0194.h5
>195, c1=-239.471, c2=-230.421 g=-249.729
>196, c1=-240.396, c2=-231.488 g=-250.925
>197, c1=-241.122, c2=-231.515 g=-252.068
>198, c1=-241.815, c2=-231.730 g=-251.589
>199, c1=-241.741, c2=-234.494 g=-253.707
>200, c1=-242.753, c2=-234.385 g=-253.711
>201, c1=-244.773, c2=-236.754 g=-255.172
>202, c1=-246.453, c2=-235.243 g=-256.479
>203, c1=-248.380, c2=-237.319 g=-257.736
>204, c1=-247.247, c2=-238.386 g=-258.455
>205, c1=-247.082, c2=-236.684 g=-259.187
>206, c1=-248.969, c2=-239.874 g=-260.501
>207, c1=-248.722, c2=-240.133 g=-260.769
>208, c1=-251.484, c2=-242.359 g=-262.370
>209, c1=-252.780, c2=-241.710 g=-262.752
>210, c1=-251.642, c2=-243.904 g=-264.319
>211, c1=-252.021, c2=-244.451 g=-265.082
>212, c1=-253.975, c2=-242.861 g=-265.134
>213, c1=-255.029, c2=-246.814 g=-267.011
>214, c1=-255.958, c2=-245.703 g=-267.388
>215, c1=-258.003, c2=-248.523 g=-268.927
>216, c1=-258.334, c2=-248.865 g=-269.147
>217, c1=-260.648, c2=-251



>Saved: generated_plot_0291.png and model_0291.h5
>292, c1=-310.985, c2=-302.238 g=-336.277
>293, c1=-315.546, c2=-299.168 g=-337.478
>294, c1=-313.623, c2=-295.092 g=-335.530
>295, c1=-310.511, c2=-296.916 g=-337.587
>296, c1=-317.077, c2=-299.452 g=-338.085
>297, c1=-301.835, c2=-290.825 g=-336.076
>298, c1=-312.913, c2=-309.520 g=-340.698
>299, c1=-311.919, c2=-298.662 g=-339.628
>300, c1=-313.732, c2=-302.944 g=-340.539
>301, c1=-306.081, c2=-294.683 g=-340.406
>302, c1=-310.721, c2=-304.625 g=-342.064
>303, c1=-313.802, c2=-295.589 g=-339.845
>304, c1=-308.270, c2=-290.561 g=-340.307
>305, c1=-308.389, c2=-283.480 g=-339.863
>306, c1=-303.474, c2=-302.170 g=-341.048
>307, c1=-297.638, c2=-292.665 g=-342.136
>308, c1=-310.041, c2=-292.450 g=-343.320
>309, c1=-304.505, c2=-275.080 g=-339.766
>310, c1=-300.737, c2=-267.848 g=-336.631
>311, c1=-256.843, c2=-218.749 g=-332.221
>312, c1=-243.710, c2=-229.095 g=-326.781
>313, c1=-234.903, c2=-230.737 g=-327.692
>314, c1=-256.348, c2=-227



>Saved: generated_plot_0388.png and model_0388.h5
>389, c1=-368.192, c2=-330.479 g=258.524
>390, c1=-369.817, c2=-332.339 g=265.702
>391, c1=-371.915, c2=-334.419 g=271.069
>392, c1=-374.875, c2=-334.590 g=277.290
>393, c1=-377.464, c2=-336.263 g=282.216
>394, c1=-380.158, c2=-338.979 g=288.937
>395, c1=-380.014, c2=-341.845 g=294.372
>396, c1=-384.251, c2=-341.577 g=297.931
>397, c1=-388.328, c2=-343.055 g=301.931
>398, c1=-388.471, c2=-342.868 g=304.353
>399, c1=-390.846, c2=-344.271 g=307.067
>400, c1=-392.052, c2=-344.865 g=308.305
>401, c1=-393.930, c2=-344.512 g=311.909
>402, c1=-396.196, c2=-345.010 g=312.361
>403, c1=-396.059, c2=-340.153 g=312.154
>404, c1=-395.373, c2=-334.285 g=306.998
>405, c1=-394.843, c2=-322.754 g=293.163
>406, c1=-393.052, c2=-309.316 g=265.083
>407, c1=-392.235, c2=-301.699 g=246.625
>408, c1=-389.877, c2=-306.263 g=242.914
>409, c1=-390.920, c2=-312.782 g=241.518
>410, c1=-391.266, c2=-318.056 g=238.346
>411, c1=-393.160, c2=-322.548 g=239.814
>412, c



>Saved: generated_plot_0485.png and model_0485.h5
>486, c1=-533.499, c2=-484.886 g=485.362
>487, c1=-534.706, c2=-486.155 g=486.496
>488, c1=-538.059, c2=-487.340 g=487.803
>489, c1=-537.761, c2=-488.567 g=489.112
>490, c1=-539.751, c2=-489.783 g=490.390
>491, c1=-540.671, c2=-491.066 g=491.527
>492, c1=-541.023, c2=-492.211 g=492.871
>493, c1=-543.711, c2=-493.480 g=494.055
>494, c1=-544.813, c2=-494.693 g=495.288
>495, c1=-545.766, c2=-495.896 g=496.334
>496, c1=-547.191, c2=-497.005 g=497.605
>497, c1=-547.710, c2=-498.251 g=498.857
>498, c1=-549.385, c2=-499.383 g=500.062
>499, c1=-551.202, c2=-500.514 g=501.172
>500, c1=-552.477, c2=-501.512 g=502.029
>501, c1=-554.877, c2=-502.335 g=504.065
>502, c1=-554.202, c2=-504.213 g=505.315
>503, c1=-557.405, c2=-505.743 g=506.006
>504, c1=-559.397, c2=-506.581 g=507.019
>505, c1=-559.143, c2=-507.979 g=508.367
>506, c1=-558.784, c2=-508.335 g=509.304
>507, c1=-561.367, c2=-508.947 g=510.651
>508, c1=-562.088, c2=-510.329 g=511.538
>509, c



>Saved: generated_plot_0582.png and model_0582.h5
>583, c1=-653.492, c2=-583.406 g=588.931
>584, c1=-653.744, c2=-585.894 g=591.266
>585, c1=-655.590, c2=-591.783 g=593.125
>586, c1=-657.757, c2=-593.796 g=594.920
>587, c1=-659.465, c2=-596.172 g=595.926
>588, c1=-660.226, c2=-599.187 g=599.419
>589, c1=-661.765, c2=-602.711 g=601.141
>590, c1=-665.373, c2=-604.064 g=603.604
>591, c1=-666.244, c2=-607.216 g=604.564
>592, c1=-667.124, c2=-608.027 g=605.368
>593, c1=-669.423, c2=-610.641 g=608.556
>594, c1=-671.140, c2=-611.800 g=609.874
>595, c1=-672.710, c2=-613.279 g=611.643
>596, c1=-674.784, c2=-615.305 g=613.221
>597, c1=-676.281, c2=-617.013 g=614.979
>598, c1=-677.902, c2=-617.764 g=616.410
>599, c1=-679.385, c2=-619.834 g=617.389
>600, c1=-681.424, c2=-620.192 g=619.258
>601, c1=-682.147, c2=-621.598 g=621.506
>602, c1=-683.494, c2=-621.839 g=622.089
>603, c1=-686.181, c2=-623.182 g=623.809
>604, c1=-688.285, c2=-623.991 g=625.137
>605, c1=-689.662, c2=-624.032 g=626.047
>606, c



>Saved: generated_plot_0679.png and model_0679.h5
>680, c1=-762.535, c2=-613.416 g=665.683
>681, c1=-758.757, c2=-620.688 g=663.332
>682, c1=-758.207, c2=-591.841 g=655.847
>683, c1=-761.311, c2=-582.537 g=645.479
>684, c1=-760.813, c2=-571.321 g=627.693
>685, c1=-760.014, c2=-563.557 g=610.483
>686, c1=-760.790, c2=-544.223 g=595.422
>687, c1=-760.911, c2=-556.094 g=569.526
>688, c1=-763.217, c2=-552.595 g=562.512
>689, c1=-768.009, c2=-563.828 g=563.666
>690, c1=-767.081, c2=-562.942 g=574.056
>691, c1=-768.729, c2=-579.848 g=580.201
>692, c1=-770.072, c2=-593.192 g=577.748
>693, c1=-774.788, c2=-599.352 g=581.800
>694, c1=-774.285, c2=-605.470 g=585.436
>695, c1=-776.826, c2=-613.560 g=596.872
>696, c1=-778.853, c2=-622.212 g=604.949
>697, c1=-780.697, c2=-630.049 g=613.648
>698, c1=-782.486, c2=-635.446 g=612.129
>699, c1=-784.868, c2=-641.200 g=617.730
>700, c1=-785.265, c2=-647.828 g=621.634
>701, c1=-788.568, c2=-650.818 g=624.860
>702, c1=-788.513, c2=-656.414 g=625.370
>703, c



>Saved: generated_plot_0776.png and model_0776.h5
>777, c1=-680.211, c2=-580.649 g=-708.169
>778, c1=-676.228, c2=-555.620 g=-714.630
>779, c1=-695.301, c2=-605.641 g=-724.599
>780, c1=-688.858, c2=-526.713 g=-713.892
>781, c1=-689.013, c2=-601.332 g=-732.390
>782, c1=-695.064, c2=-580.362 g=-731.748
>783, c1=-706.912, c2=-609.780 g=-743.529
>784, c1=-736.743, c2=-632.647 g=-751.566
>785, c1=-732.204, c2=-632.875 g=-755.311
>786, c1=-733.043, c2=-629.997 g=-766.692
>787, c1=-731.575, c2=-627.067 g=-766.944
>788, c1=-733.784, c2=-615.585 g=-769.284
>789, c1=-726.144, c2=-627.511 g=-775.201
>790, c1=-729.377, c2=-626.063 g=-779.227
>791, c1=-722.221, c2=-608.475 g=-775.906
>792, c1=-738.537, c2=-675.763 g=-794.497
>793, c1=-737.314, c2=-564.887 g=-773.671
>794, c1=-726.924, c2=-628.290 g=-786.817
>795, c1=-716.425, c2=-605.685 g=-786.469
>796, c1=-739.668, c2=-659.663 g=-798.757
>797, c1=-709.861, c2=-581.925 g=-795.130
>798, c1=-735.855, c2=-680.792 g=-810.532
>799, c1=-729.412, c2=-624



>Saved: generated_plot_0873.png and model_0873.h5
>874, c1=-595.245, c2=474.897 g=-988.640
>875, c1=-521.943, c2=628.834 g=-985.459
>876, c1=-688.965, c2=678.228 g=-978.603
>877, c1=-716.439, c2=821.735 g=-976.665
>878, c1=-766.012, c2=880.823 g=-976.853
>879, c1=-833.174, c2=891.003 g=-976.167
>880, c1=-846.963, c2=901.491 g=-975.254
>881, c1=-881.676, c2=917.005 g=-974.314
>882, c1=-901.091, c2=929.614 g=-970.858
>883, c1=-911.727, c2=934.016 g=-966.854
>884, c1=-925.785, c2=935.767 g=-961.742
>885, c1=-940.488, c2=937.934 g=-955.512
>886, c1=-950.380, c2=937.878 g=-946.805
>887, c1=-961.183, c2=936.150 g=-931.977
>888, c1=-971.059, c2=928.448 g=-910.623
>889, c1=-968.524, c2=908.266 g=-876.241
>890, c1=-966.478, c2=879.639 g=-838.166
>891, c1=-963.707, c2=840.561 g=-785.553
>892, c1=-957.949, c2=793.914 g=-719.933
>893, c1=-942.510, c2=748.655 g=-652.077
>894, c1=-927.720, c2=690.877 g=-577.078
>895, c1=-913.589, c2=649.411 g=-513.658
>896, c1=-888.923, c2=640.535 g=-471.240
>897, c



>Saved: generated_plot_0970.png and model_0970.h5


# **Conclusion**



In this experiment, a Wasserstein Generative Adversarial Network (WGAN) was implemented in Keras and trained on the 6,265 MNIST images of the digit 7 for 10 epochs (970 training steps). The implementation uses weight clipping on the critic's convolutional layers, the RMSprop optimizer, and five critic updates per generator update. Task 1 covered why the Wasserstein distance gives the generator a more informative training signal than JS or KL divergence, and the comparison table summarizes the changes WGAN makes to the original GAN to address its training instability. In the training log, the critic and generator losses keep growing in magnitude rather than settling, so the loss values alone do not demonstrate convergence in this run; the saved sample plots are the better guide to output quality. This hands-on work improved understanding of the mechanics and theory behind WGAN and points to follow-up experiments, such as replacing weight clipping with a gradient penalty.