# GANs: Generative Adversarial Networks
Imagine the following situation: you have a counterfeiter and a cop.
*  The counterfeiter makes a fake money sample. Initially, the counterfeiter is very bad at this!
*  The cop examines the fake money sample, compares it with real money, and she determines that the sample is fake.
*  Based on the information from the cop, the counterfeiter improves his technique for making fake money and tries again.
*  The cop examines the new fakes, and she determines, again, that they are indeed fake.
*  The process continues until, at some point, the cop is no longer able to tell the difference between fake and real currency.

This is the basic outline of how a GAN works. The GAN starts out with no knowledge of what the data look like. Through the interplay of a **generator** (the counterfeiter) and a **discriminator** (the cop), it can end up producing incredibly realistic versions of images, music, text, etc.
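The cop-and-counterfeiter game has a compact mathematical form. In the original GAN formulation (Goodfellow et al., 2014), the discriminator $D$ and the generator $G$ play a minimax game over a value function $V(D, G)$. Here $p_{\text{data}}$ is the distribution of real images and $p_z$ is the noise distribution fed to the generator:

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

$D$ tries to push $D(x)$ toward 1 on real data and $D(G(z))$ toward 0 on fakes. $G$ tries to do the opposite. In practice, the generator is usually trained to *maximize* $\log D(G(z))$ rather than to minimize $\log(1 - D(G(z)))$, because this gives stronger gradients early in training. The training loop in this notebook implicitly uses that variant when it labels fake images as real.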

GANs are an example of **unsupervised learning**, in which the model learns features of a dataset without the data being labeled.

GANs are a relatively new idea in machine learning, but they are quite interesting. Yann LeCun, one of the major figures in machine learning, has called them “The coolest idea in deep learning in the last 20 years.” Let's see if we can understand how they work.

For this workbook, I will rely heavily on the model described in Mike Bernico's book, [Deep Learning Quick Reference](https://www.amazon.com/Deep-Learning-Quick-Reference-optimizing-ebook/dp/B0791JRGPY).

# A simple GAN Model
A simple cartoon of a GAN is shown here (from [this link](https://medium.freecodecamp.org/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394)):
![gan](gan.png)

The basic function of the GAN is the following:
* The **generator** network is fed random noise, and outputs a *fake image*.
* The **discriminator** network is fed a single image, as well as a label indicating that the image is either real or fake.   It should output 1 if the image is real, and 0 if the image is fake.
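Both networks in this notebook are trained with binary cross-entropy, which turns the 1 (real) / 0 (fake) labels into a loss. The minimal NumPy sketch below assumes the discriminator's sigmoid outputs are already available as probabilities. The helper name `binary_crossentropy` is ours and is not Keras's internal code. We clip predictions with a small `eps` to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(labels, preds, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    preds = np.clip(preds, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))))

labels = np.array([1.0, 1.0, 0.0, 0.0])      # two real images, two fakes
unsure = np.array([0.5, 0.5, 0.5, 0.5])      # discriminator is just guessing
confident = np.array([0.9, 0.8, 0.1, 0.2])   # discriminator is mostly right

print(binary_crossentropy(labels, unsure))     # log(2), about 0.693
print(binary_crossentropy(labels, confident))  # about 0.164
```

A discriminator that outputs 0.5 for everything pays $\log 2 \approx 0.693$ per sample. This roughly explains why the first logged discriminator loss in the training output (about 0.67 at batch 0, with 50% accuracy) sits close to that value.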

# Implementation
We will design a GAN that generates fake, but realistic-looking, MNIST digits. For the most part, we will be using techniques that you have seen before. However, the interplay between the generator and the discriminator makes training a GAN successfully very tricky, so we will need to add some new features to make the training more stable.



# The Discriminator

Let's start with the discriminator. We have made models like this before. In fact, the basic idea here is fairly straightforward: a simple CNN that takes 28x28x1 images (like our MNIST dataset) and produces a single output between 0 and 1 (a sigmoid). That output should be close to 1 if the image is real, and close to 0 if the image is fake.

The discriminator we will use is shown below.   You will notice two new features:
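* A **Leaky ReLU** layer. This replaces the ReLU activations that we have used before. ReLU outputs zero, and passes zero gradient, for negative inputs. Leaky ReLU instead multiplies negative inputs by a small slope (here `alpha=0.2`), so a small gradient signal still flows back for negative values.
* A **Batch Normalization** layer. Batch norm normalizes the input features of a layer to have zero mean and unit variance, computed over the current batch, and then applies a learned scale and shift. It helps with problems due to poor parameter initialization and generally makes training more stable.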
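The two new layers can be sketched in a few lines of NumPy. This is a simplified illustration, not Keras's implementation. The batch-norm function shows only training-mode normalization for 2-D `(batch, features)` input. `gamma` and `beta` stand in for the learned scale and shift, and `eps` is a small constant for numerical safety. The running-average update follows Keras's `momentum` convention, with `momentum=0.8` as used in this notebook.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Identity for x > 0, slope alpha for x <= 0 (alpha=0.2 as in the discriminator)."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.2):
    """Derivative of leaky ReLU: 1 for x > 0, alpha otherwise, so never exactly zero."""
    return np.where(x > 0, 1.0, alpha)

def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Training-mode batch norm: normalize each feature over the batch axis (axis 0)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, mean, var

def update_moving(moving, batch_stat, momentum=0.8):
    """Running average of batch statistics, used in place of batch stats at inference time."""
    return momentum * moving + (1.0 - momentum) * batch_stat

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))       # values -0.4, -0.1, 0.0, 1.5
print(leaky_relu_grad(x))  # values 0.2, 0.2, 0.2, 1.0

batch = np.array([[1.0, 10.0],
                  [3.0, 30.0]])   # 2 samples, 2 features on very different scales
out, mean, var = batch_norm_train(batch)
print(out)                         # each column now has mean 0 and std close to 1
print(update_moving(np.zeros(2), mean))
```

Both features end up on the same scale after normalization, even though the second one was ten times larger.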

In [1]:
```python
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Model
from keras.optimizers import Adam
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: figures are saved to files, not shown
import matplotlib.pyplot as plt
import numpy as np

def build_discriminator(img_shape):
    img_input = Input(img_shape)
    # 32 kernels, each 3x3; stride 2 halves the image: 28x28 -> 14x14
    x = Conv2D(32, kernel_size=3, strides=2, padding="same")(img_input)
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = Conv2D(64, kernel_size=3, strides=2, padding="same")(x)  # 14x14 -> 7x7
    x = ZeroPadding2D(padding=((0, 1), (0, 1)))(x)  # add a zero row/column: 7x7 -> 8x8
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(128, kernel_size=3, strides=2, padding="same")(x)  # 8x8 -> 4x4
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(256, kernel_size=3, strides=1, padding="same")(x)  # stays 4x4
    x = LeakyReLU(alpha=0.2)(x)
    x = Dropout(0.25)(x)
    x = Flatten()(x)
    out = Dense(1, activation='sigmoid')(x)  # probability that the image is real

    model = Model(img_input, out)
    print("-- Discriminator -- ")
    model.summary()
    return model
```

Using TensorFlow backend.
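You can check the shapes and the parameter count of this discriminator by hand. The sketch below assumes the usual rule for `padding="same"`, where the output size is `ceil(input / stride)`. It also assumes that each BatchNormalization layer holds 4 numbers per channel: a trainable scale and shift, plus a non-trainable moving mean and variance. The result should match the 392,705 parameters reported for the discriminator (`model_1`) in the combined-model summary further down.

```python
import math

def conv_out(size, stride):
    """Spatial output size of a Conv2D with padding="same"."""
    return math.ceil(size / stride)

def conv_params(k, c_in, c_out):
    """k*k*c_in*c_out weights plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

def bn_params(channels):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * channels

size = 28
size = conv_out(size, 2)   # conv2d_1: 28 -> 14
size = conv_out(size, 2)   # conv2d_2: 14 -> 7
size += 1                  # ZeroPadding2D(((0, 1), (0, 1))): 7 -> 8
size = conv_out(size, 2)   # conv2d_3: 8 -> 4
size = conv_out(size, 1)   # conv2d_4 (stride 1): 4 -> 4
flat = size * size * 256   # Flatten of the 4x4x256 feature maps

total = (conv_params(3, 1, 32) + conv_params(3, 32, 64) + bn_params(64)
         + conv_params(3, 64, 128) + bn_params(128)
         + conv_params(3, 128, 256)
         + flat * 1 + 1)   # Dense(1): weights + bias
print(size, flat, total)
```

The first two terms, 320 and 18,496, also match the `conv2d_1` and `conv2d_2` rows of the discriminator summary.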


# The Generator
The generator takes a random vector (in this case a vector of length 100) and, via a series of Keras layers, produces an image (in our case a 28x28x1 image). It uses the Batch Normalization layer, as well as an **UpSampling** layer. We previously used UpSampling layers when we first introduced CNN autoencoders. The UpSampling layer repeats the rows and columns of the data by size[0] and size[1] respectively, so each `UpSampling2D(size=(2, 2))` doubles the height and width, and two of them take the 7x7 feature maps to 28x28. The final `tanh` activation produces pixel values in [-1, 1].
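The "repeat rows and columns" behaviour can be reproduced with `np.repeat`. This sketch assumes the default nearest-neighbour interpolation of `UpSampling2D` and works on a single `(height, width, channels)` image rather than a batch. The helper `upsample_nearest` is our own name.

```python
import numpy as np

def upsample_nearest(x, size=(2, 2)):
    """Repeat rows by size[0] and columns by size[1].

    x has shape (height, width, channels), like one image in a Keras batch.
    """
    return np.repeat(np.repeat(x, size[0], axis=0), size[1], axis=1)

tile = np.array([[1, 2],
                 [3, 4]]).reshape(2, 2, 1)
up = upsample_nearest(tile)
print(up[:, :, 0])
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]

# Two 2x upsamplings take the generator's 7x7x128 maps to 28x28x128.
print(upsample_nearest(upsample_nearest(np.zeros((7, 7, 128)))).shape)
```

No new information is created by upsampling. The following `Conv2D` layers are what learn to fill in detail.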


In [2]:
```python
# The generator is fed random noise and turns it into an image.
def build_generator(noise_shape=(100,)):
    noise_input = Input(noise_shape)
    x = Dense(128 * 7 * 7, activation="relu")(noise_input)
    x = Reshape((7, 7, 128))(x)
    # Batch norm normalizes the features to have zero mean and unit variance.
    x = BatchNormalization(momentum=0.8)(x)
    x = UpSampling2D(size=(2, 2))(x)  # enlarge the feature maps: 7x7 -> 14x14
    x = Conv2D(128, kernel_size=3, padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = UpSampling2D(size=(2, 2))(x)  # 14x14 -> 28x28
    x = Conv2D(64, kernel_size=3, padding="same")(x)
    x = Activation("relu")(x)
    x = BatchNormalization(momentum=0.8)(x)
    x = Conv2D(1, kernel_size=3, padding="same")(x)  # one output channel: grayscale
    out = Activation("tanh")(x)  # pixel values in [-1, 1]
    model = Model(noise_input, out)
    print("-- Generator -- ")
    model.summary()  # after building the generator, print the summary of the network
    return model
```
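The same bookkeeping explains the generator's numbers in the combined-model summary further down (856,705 total, of which 856,065 are trainable). The sketch below assumes that each BatchNormalization layer has 2 trainable values per channel (scale, shift) and 2 non-trainable ones (moving mean, moving variance).

```python
def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

def conv_params(k, c_in, c_out):
    """k*k*c_in*c_out weights plus one bias per output channel."""
    return k * k * c_in * c_out + c_out

gen_trainable = (dense_params(100, 128 * 7 * 7)        # Dense, reshaped to 7x7x128
                 + 2 * 128                              # BN gamma/beta, 128 channels
                 + conv_params(3, 128, 128) + 2 * 128   # Conv2D(128) + BN
                 + conv_params(3, 128, 64) + 2 * 64     # Conv2D(64) + BN
                 + conv_params(3, 64, 1))               # final Conv2D(1)
gen_non_trainable = 2 * 128 + 2 * 128 + 2 * 64          # BN moving mean/variance
print(gen_trainable, gen_non_trainable, gen_trainable + gen_non_trainable)

# In the combined model the whole discriminator (392,705 params) is frozen,
# so its parameters join the generator's 640 BN statistics as non-trainable.
print(392705 + gen_non_trainable)
```

The expected results are 856,065 trainable, 640 non-trainable, and 856,705 in total for the generator, with 393,345 non-trainable parameters in the combined model.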

# Get the data
The following is a helper function to get the MNIST data. We will only use the training images, not the test data. **We won't use the MNIST labels at all**. The key thing we know about the MNIST training images is that they are **real**. We will label images below, but the only labels we need are whether the images are **real** (which is only the case if they come from MNIST) or **fake** (which is only the case if they come from our generator model above). We don't need to know whether a real image is a 0, 1, ..., 9, just that it is real. The pixel values are rescaled from [0, 255] to [-1, 1] so that they match the range of the generator's `tanh` output.

In [3]:

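```python
def load_data():
    (X_train, _), (_, _) = mnist.load_data()  # labels and test set are not needed
    # Rescale pixels from [0, 255] to [-1, 1], the range of the generator's tanh output
    X_train = (X_train.astype(np.float32) - 127.5) / 127.5
    # Add a channel axis: (60000, 28, 28) -> (60000, 28, 28, 1)
    X_train = np.expand_dims(X_train, axis=3)
    return X_train
```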
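The rescaling in `load_data()` and the one in `save_imgs()` below undo each other. This small NumPy check uses two hypothetical helper names, `to_tanh_range` and `to_display_range`. It shows that pixel 0 maps to -1 and pixel 255 maps to +1, and that `0.5 * x + 0.5` brings generated images back to [0, 1] for `imshow`.

```python
import numpy as np

def to_tanh_range(pixels):
    """Map uint8 pixels in [0, 255] to float32 in [-1, 1], as load_data() does."""
    return (pixels.astype(np.float32) - 127.5) / 127.5

def to_display_range(images):
    """Map tanh-range images in [-1, 1] to [0, 1] for plotting, as save_imgs() does."""
    return 0.5 * images + 0.5

pixels = np.array([0, 127, 255], dtype=np.uint8)
scaled = to_tanh_range(pixels)
print(scaled)                    # approximately [-1, -0.0039, 1]
print(to_display_range(scaled))  # approximately [0, 0.498, 1], i.e. pixels / 255

# np.expand_dims adds the single grayscale channel Conv2D expects.
print(np.expand_dims(np.zeros((3, 28, 28)), axis=3).shape)
```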



# Helper function for displaying the generator images
As we train our generator, we will want to inspect its images to see how closely they resemble real images. We will call this function several times during each epoch. The code below draws 25 random vectors of length 100, feeds them into the current version of the generator, and makes 25 fake images. It does not display them on screen. Instead, it saves them as a 5x5 grid to the images/ directory.

In [4]:
```python
import os

def save_imgs(generator, epoch, batch):
    r, c = 5, 5
    noise = np.random.normal(0, 1, (r * c, 100))
    gen_imgs = generator.predict(noise)

    # Rescale images from [-1, 1] (tanh output) to [0, 1] for plotting
    gen_imgs = 0.5 * gen_imgs + 0.5

    fig, axs = plt.subplots(r, c)
    cnt = 0
    for i in range(r):
        for j in range(c):
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
    os.makedirs("images", exist_ok=True)  # savefig fails if the directory is missing
    fig.savefig("images/mnist_%d_%d.png" % (epoch, batch))
    plt.close(fig)
```


# Build and compile the discriminator and generator

In [5]:
```python
discriminator = build_discriminator(img_shape=(28, 28, 1))
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(lr=0.0002, beta_1=0.5),
                      metrics=['accuracy'])

generator = build_generator()
# The generator is never trained directly; it is updated through the combined model below.
generator.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
```


```
-- Discriminator -- 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 32)        320       
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 64)          18496     
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 8, 8, 64)          0         
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 8, 8, 64)          
```

# Training the discriminator
We have everything we need to train the discriminator: real images (from MNIST) and fake images (from the generator). So to train the discriminator, we simply feed it batches of labeled images (remember, the label only says whether an image is real or fake). But how do we train the generator?

# Training the generator
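The generator needs information about how poorly (or well) its fake images fool the discriminator. To get it, we will use a clever trick: we will use the output of the same discriminator above!

To do this, we will need to make a **third** model, which below we call the **combined** model:
*  The combined model feeds the generator's output into the discriminator. For this step, we will tell the combined model that these fake images are actually real (label 1).
*  During the **generator training process**, we need to freeze the weights of the discriminator, so that only the weights of the generator are adjusted. In Keras this is done by setting `discriminator.trainable = False` **before** compiling the combined model. The discriminator itself was compiled earlier while it was still trainable, so `discriminator.train_on_batch` keeps updating its weights, while `combined.train_on_batch` leaves them alone. Keras prints a warning beginning 'Discrepancy between trainable weights and collected trainable' about this (visible in the training output below). With this pattern, the warning is expected.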
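Labeling the fakes as real means the combined model's binary cross-entropy reduces to the mean of $-\log D(G(z))$ over the batch. The NumPy sketch below shows what a given generator loss means in terms of the discriminator's output on fakes. The helper `generator_loss` is hypothetical and mirrors the cross-entropy formula with the target fixed at 1.

```python
import numpy as np

def generator_loss(d_on_fakes, eps=1e-7):
    """BCE of D's outputs on fakes against target label 1: mean of -log D(G(z))."""
    d_on_fakes = np.clip(d_on_fakes, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.log(d_on_fakes)))

# Lower D(G(z)) (discriminator more sure the fakes are fake) -> larger generator loss.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(p, round(generator_loss(np.array([p])), 3))
```

Read against the training log: a G loss near 0.7 means the discriminator gives the fakes roughly a 50% chance of being real. Around batches 111 to 120, the G loss sits between about 3.4 and 6.8. If every sample scored the same, a value of about 4.6 would correspond to $D(G(z)) \approx 0.01$, i.e. the discriminator was confidently rejecting the fakes at that point.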

In [9]:
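```python
z = Input(shape=(100,))          # random noise vector
img = generator(z)               # the generator turns noise into a fake image
discriminator.trainable = False  # freeze the discriminator inside the combined model
real = discriminator(img)        # the discriminator's verdict on the fake image
combined = Model(z, real)
combined.compile(loss='binary_crossentropy', optimizer=Adam(lr=0.0002, beta_1=0.5))
combined.summary()
```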

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 100)               0         
_________________________________________________________________
model_2 (Model)              (None, 28, 28, 1)         856705    
_________________________________________________________________
model_1 (Model)              (None, 1)                 392705    
Total params: 1,249,410
Trainable params: 856,065
Non-trainable params: 393,345
_________________________________________________________________
```


# The Training loop
The training loop below contains the logic for fitting all 3 models (really just two: the generator and the discriminator). Some things to pay attention to in the code below:

1.  There is an outer loop over epochs and an inner loop over batches. With a batch size of 32 and the 60k MNIST images, this means we will run the inner loop 1875 times per epoch.
2.  During each batch iteration, we feed the discriminator 16 real images and 16 fake images. Their labels are 1 and 0 respectively. Notice that we randomly sample the real images with replacement, so we are not using the full dataset each epoch! It might be a good idea to revisit this if we were doing it for real.
3.  In the first half of this loop, we run **train_on_batch** to train the **discriminator only**. We use train_on_batch rather than **fit** because we need to control exactly which data is fed to each model at each step. **train_on_batch performs a single gradient update on exactly the batch we give it.**
4.  **After** this, we send 32 new noise vectors to the **combined** model, which turns them into 32 **fake** images. These are labeled as **real** (even though they are fake). Remember that the **same** discriminator model is used as in step 3, but for this part of the loop its weights are frozen at the values they had after step 3. **Only the generator** weights are changed during this step. Since we are asking the discriminator output to be 1, we are adjusting the generator weights so that the generator gets better at producing images that the discriminator will believe are real.
5.  Every 50 batches within each epoch, we generate 25 fake images using the current version of the generator model and save them to the images/ directory.
6.  As training progresses through epochs and batches, examine how the discriminator loss and accuracy, as well as the generator loss, change. Note: train_on_batch for the discriminator returns both the loss and the metric (accuracy in this case).
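Point 2 above can be quantified. With 1875 batches of 16 real images, the loop draws 30,000 real images per epoch with replacement. The expected fraction of distinct MNIST images seen in one epoch is then $1 - (1 - 1/60000)^{30000} \approx 1 - e^{-0.5} \approx 0.39$. The second half of the sketch shows one possible alternative, which is hypothetical and not what this notebook does: shuffle once per epoch and take disjoint half-batches, so no real image repeats within an epoch.

```python
import numpy as np

num_examples, batch_size = 60000, 32
half_batch = batch_size // 2
num_batches = num_examples // batch_size      # 1875 inner iterations
draws_per_epoch = num_batches * half_batch    # 30000 real images drawn per epoch

# Sampling with replacement (np.random.randint): expected coverage of the dataset.
expected_coverage = 1 - (1 - 1 / num_examples) ** draws_per_epoch
print(num_batches, draws_per_epoch, round(expected_coverage, 3))

# Hypothetical alternative: one permutation per epoch, sliced into half-batches.
perm = np.random.permutation(num_examples)
idx_batches = [perm[b * half_batch:(b + 1) * half_batch] for b in range(num_batches)]
seen = np.concatenate(idx_batches)
print(len(np.unique(seen)))   # every draw is a different image
```

Even without replacement, only 30,000 distinct images are used per epoch, because each iteration takes half a batch of real images. Covering all 60,000 would need either full batches of real images or two passes of the inner loop.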


**NOTE**: Each epoch will take **a lot** of time! So you will probably want to stop it after an epoch or two. If you want to see some cool images after 1, 2, 3, 4 epochs, submit the script **pbs_gan_gpu.sh** to the PBS batch system. In about 30 minutes you will have some amazingly realistic digit images in the images/ directory!

In [8]:
```python
X_train = load_data()

epochs = 2        # you need about 40 epochs to get good images
batch_size = 32

num_examples = X_train.shape[0]
num_batches = num_examples // batch_size
print('Number of examples: ', num_examples)
print('Number of Batches: ', num_batches)
print('Number of epochs: ', epochs)

half_batch = batch_size // 2

for epoch in range(epochs):   # range(epochs + 1) would run one extra epoch
    for batch in range(num_batches):

        # Generate 16 fake images from noise and label them 0
        noise = np.random.normal(0, 1, (half_batch, 100))
        fake_images = generator.predict(noise)
        fake_labels = np.zeros((half_batch, 1))

        # Choose 16 real images at random (with replacement) and label them 1
        idx = np.random.randint(0, X_train.shape[0], half_batch)
        real_images = X_train[idx]
        real_labels = np.ones((half_batch, 1))

        # Train the discriminator (real classified as ones and generated as zeros)
        d_loss_real = discriminator.train_on_batch(real_images, real_labels)
        d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)  # average of [loss, accuracy]

        # Train the generator: 32 fresh noise vectors, labeled 1 ("real")
        noise = np.random.normal(0, 1, (batch_size, 100))
        g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))

        # Print the progress
        print("Epoch %d Batch %d/%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
              (epoch, batch, num_batches, d_loss[0], 100 * d_loss[1], g_loss))

        # Save a grid of generated images every 50 batches
        if batch % 50 == 0:
            save_imgs(generator, epoch, batch)
```


Number of examples:  60000
Number of Batches:  1875
Number of epochs:  2


  'Discrepancy between trainable weights and collected trainable'


Epoch 0 Batch 0/1875 [D loss: 0.669048, acc.: 50.00%] [G loss: 0.732056]
Epoch 0 Batch 1/1875 [D loss: 0.531192, acc.: 68.75%] [G loss: 0.911778]
Epoch 0 Batch 2/1875 [D loss: 0.464409, acc.: 78.12%] [G loss: 0.915664]
Epoch 0 Batch 3/1875 [D loss: 0.373614, acc.: 87.50%] [G loss: 1.184901]
Epoch 0 Batch 4/1875 [D loss: 0.333302, acc.: 93.75%] [G loss: 1.092908]
Epoch 0 Batch 5/1875 [D loss: 0.343000, acc.: 87.50%] [G loss: 1.279772]
Epoch 0 Batch 6/1875 [D loss: 0.243329, acc.: 96.88%] [G loss: 1.359359]
Epoch 0 Batch 7/1875 [D loss: 0.279534, acc.: 96.88%] [G loss: 1.260613]
Epoch 0 Batch 8/1875 [D loss: 0.328098, acc.: 90.62%] [G loss: 1.141092]
Epoch 0 Batch 9/1875 [D loss: 0.354586, acc.: 90.62%] [G loss: 1.533253]
Epoch 0 Batch 10/1875 [D loss: 0.396909, acc.: 78.12%] [G loss: 1.603405]
Epoch 0 Batch 11/1875 [D loss: 0.255061, acc.: 93.75%] [G loss: 1.984185]
Epoch 0 Batch 12/1875 [D loss: 0.200479, acc.: 96.88%] [G loss: 1.914810]

Epoch 0 Batch 110/1875 [D loss: 0.280792, acc.: 90.62%] [G loss: 3.520653]
Epoch 0 Batch 111/1875 [D loss: 0.026046, acc.: 100.00%] [G loss: 5.033329]
Epoch 0 Batch 112/1875 [D loss: 0.082894, acc.: 100.00%] [G loss: 4.545465]
Epoch 0 Batch 113/1875 [D loss: 0.045436, acc.: 100.00%] [G loss: 4.359184]
Epoch 0 Batch 114/1875 [D loss: 0.016193, acc.: 100.00%] [G loss: 3.382424]
Epoch 0 Batch 115/1875 [D loss: 0.034241, acc.: 100.00%] [G loss: 3.830151]
Epoch 0 Batch 116/1875 [D loss: 0.106492, acc.: 96.88%] [G loss: 3.812376]
Epoch 0 Batch 117/1875 [D loss: 0.111570, acc.: 100.00%] [G loss: 3.802435]
Epoch 0 Batch 118/1875 [D loss: 0.249070, acc.: 87.50%] [G loss: 6.822236]
Epoch 0 Batch 119/1875 [D loss: 0.327625, acc.: 87.50%] [G loss: 4.654359]
Epoch 0 Batch 120/1875 [D loss: 0.155852, acc.: 93.75%] [G loss: 5.753407]
Epoch 0 Batch 121/1875 [D loss: 0.552256, acc.: 75.00%] [G loss: 1.312541]
Epoch 0 Batch 122/1875 [D loss: 0.717480, acc.: 71.88%] [G loss: 5.325219]

Epoch 0 Batch 220/1875 [D loss: 0.781170, acc.: 65.62%] [G loss: 1.304098]
Epoch 0 Batch 221/1875 [D loss: 0.767722, acc.: 59.38%] [G loss: 1.415585]
Epoch 0 Batch 222/1875 [D loss: 0.636241, acc.: 68.75%] [G loss: 1.575399]
Epoch 0 Batch 223/1875 [D loss: 0.874121, acc.: 46.88%] [G loss: 1.921173]
Epoch 0 Batch 224/1875 [D loss: 0.847768, acc.: 59.38%] [G loss: 1.218808]
Epoch 0 Batch 225/1875 [D loss: 0.753734, acc.: 56.25%] [G loss: 1.513452]
Epoch 0 Batch 226/1875 [D loss: 0.644435, acc.: 56.25%] [G loss: 1.541079]
Epoch 0 Batch 227/1875 [D loss: 0.719707, acc.: 59.38%] [G loss: 1.548774]
Epoch 0 Batch 228/1875 [D loss: 1.035522, acc.: 43.75%] [G loss: 1.628227]
Epoch 0 Batch 229/1875 [D loss: 0.609037, acc.: 68.75%] [G loss: 1.942869]
Epoch 0 Batch 230/1875 [D loss: 0.509968, acc.: 81.25%] [G loss: 1.791661]
Epoch 0 Batch 231/1875 [D loss: 0.988984, acc.: 46.88%] [G loss: 1.500057]
Epoch 0 Batch 232/1875 [D loss: 0.762486, acc.: 59.38%] [G loss: 1.933787]

Epoch 0 Batch 330/1875 [D loss: 0.820485, acc.: 31.25%] [G loss: 1.072415]
Epoch 0 Batch 331/1875 [D loss: 0.679763, acc.: 56.25%] [G loss: 1.185073]
Epoch 0 Batch 332/1875 [D loss: 0.739811, acc.: 56.25%] [G loss: 1.300083]
Epoch 0 Batch 333/1875 [D loss: 0.722232, acc.: 59.38%] [G loss: 1.041570]
Epoch 0 Batch 334/1875 [D loss: 0.547245, acc.: 71.88%] [G loss: 1.311894]
Epoch 0 Batch 335/1875 [D loss: 0.827086, acc.: 56.25%] [G loss: 1.490564]
Epoch 0 Batch 336/1875 [D loss: 0.773469, acc.: 62.50%] [G loss: 1.559684]
Epoch 0 Batch 337/1875 [D loss: 0.846138, acc.: 53.12%] [G loss: 1.250888]
Epoch 0 Batch 338/1875 [D loss: 1.019434, acc.: 46.88%] [G loss: 0.966947]
Epoch 0 Batch 339/1875 [D loss: 0.517915, acc.: 78.12%] [G loss: 1.389895]
Epoch 0 Batch 340/1875 [D loss: 0.670207, acc.: 65.62%] [G loss: 1.671752]
Epoch 0 Batch 341/1875 [D loss: 0.804076, acc.: 50.00%] [G loss: 1.792705]
Epoch 0 Batch 342/1875 [D loss: 0.528410, acc.: 71.88%] [G loss: 1.501886]

Epoch 0 Batch 440/1875 [D loss: 0.775783, acc.: 53.12%] [G loss: 1.121465]
Epoch 0 Batch 441/1875 [D loss: 0.634647, acc.: 65.62%] [G loss: 1.284183]
Epoch 0 Batch 442/1875 [D loss: 0.705253, acc.: 56.25%] [G loss: 1.206319]
Epoch 0 Batch 443/1875 [D loss: 0.830874, acc.: 50.00%] [G loss: 1.057434]
Epoch 0 Batch 444/1875 [D loss: 0.813501, acc.: 46.88%] [G loss: 1.018571]
Epoch 0 Batch 445/1875 [D loss: 0.672795, acc.: 68.75%] [G loss: 0.956446]
Epoch 0 Batch 446/1875 [D loss: 0.804837, acc.: 53.12%] [G loss: 1.302032]
Epoch 0 Batch 447/1875 [D loss: 0.900498, acc.: 53.12%] [G loss: 1.502282]
Epoch 0 Batch 448/1875 [D loss: 1.063535, acc.: 28.12%] [G loss: 1.296872]
Epoch 0 Batch 449/1875 [D loss: 0.659269, acc.: 59.38%] [G loss: 1.389754]
Epoch 0 Batch 450/1875 [D loss: 0.629155, acc.: 50.00%] [G loss: 1.264502]
Epoch 0 Batch 451/1875 [D loss: 0.868262, acc.: 46.88%] [G loss: 1.019425]
Epoch 0 Batch 452/1875 [D loss: 0.651832, acc.: 68.75%] [G loss: 1.163247]

Epoch 0 Batch 550/1875 [D loss: 0.595659, acc.: 71.88%] [G loss: 0.839795]
Epoch 0 Batch 551/1875 [D loss: 0.647356, acc.: 56.25%] [G loss: 1.120744]
Epoch 0 Batch 552/1875 [D loss: 0.490600, acc.: 78.12%] [G loss: 1.230301]
Epoch 0 Batch 553/1875 [D loss: 0.651083, acc.: 68.75%] [G loss: 1.381792]
Epoch 0 Batch 554/1875 [D loss: 0.749420, acc.: 40.62%] [G loss: 1.267558]
Epoch 0 Batch 555/1875 [D loss: 0.859007, acc.: 40.62%] [G loss: 1.172243]
Epoch 0 Batch 556/1875 [D loss: 0.874543, acc.: 53.12%] [G loss: 0.914412]
Epoch 0 Batch 557/1875 [D loss: 0.481404, acc.: 78.12%] [G loss: 1.278574]
Epoch 0 Batch 558/1875 [D loss: 0.671202, acc.: 53.12%] [G loss: 1.309962]
Epoch 0 Batch 559/1875 [D loss: 0.585945, acc.: 71.88%] [G loss: 1.448488]
Epoch 0 Batch 560/1875 [D loss: 0.633846, acc.: 71.88%] [G loss: 1.388612]
Epoch 0 Batch 561/1875 [D loss: 0.580808, acc.: 68.75%] [G loss: 1.394008]
Epoch 0 Batch 562/1875 [D loss: 0.779447, acc.: 46.88%] [G loss: 1.115094]

Epoch 0 Batch 660/1875 [D loss: 0.664955, acc.: 59.38%] [G loss: 1.319702]
Epoch 0 Batch 661/1875 [D loss: 0.811710, acc.: 50.00%] [G loss: 1.221643]
Epoch 0 Batch 662/1875 [D loss: 0.778481, acc.: 46.88%] [G loss: 1.335261]
Epoch 0 Batch 663/1875 [D loss: 0.784419, acc.: 46.88%] [G loss: 1.236444]
Epoch 0 Batch 664/1875 [D loss: 0.637906, acc.: 68.75%] [G loss: 1.359335]
Epoch 0 Batch 665/1875 [D loss: 0.809683, acc.: 59.38%] [G loss: 1.055802]
Epoch 0 Batch 666/1875 [D loss: 0.685899, acc.: 59.38%] [G loss: 1.510076]
Epoch 0 Batch 667/1875 [D loss: 0.772528, acc.: 56.25%] [G loss: 1.727327]
Epoch 0 Batch 668/1875 [D loss: 0.765549, acc.: 46.88%] [G loss: 1.406106]
Epoch 0 Batch 669/1875 [D loss: 0.687421, acc.: 65.62%] [G loss: 1.185078]
Epoch 0 Batch 670/1875 [D loss: 0.610626, acc.: 68.75%] [G loss: 1.169302]
Epoch 0 Batch 671/1875 [D loss: 0.502912, acc.: 78.12%] [G loss: 1.309330]
Epoch 0 Batch 672/1875 [D loss: 0.540080, acc.: 78.12%] [G loss: 1.120699]

KeyboardInterrupt: 

# A Cool Use Case for GANs



From: https://towardsdatascience.com/generative-adversarial-networks-gans-a-beginners-guide-5b38eceece24

"In addition to generating beautiful pictures, an approach for semi-supervised learning with GANs has been developed that involves the discriminator producing an additional output indicating the label of the input. This approach enables cutting edge results on datasets with very few labeled examples. On MNIST, for example, 99.1% accuracy has been achieved with only 10 labeled examples per class with a fully connected neural network — a result that’s very close to the best known results with fully supervised approaches using all 60,000 labeled examples. This is extremely promising because labeled examples can be quite expensive to obtain in practice."

Here is an example of how this idea is implemented in practice:
https://towardsdatascience.com/semi-supervised-learning-with-gans-9f3cb128c5e

# Extra Credit - 5 pts!
This could be a bit difficult:
* Run the python version of the above code.  You can find a .py file and a .sh file in this directory to start with.  The shell script can be used with **qsub** to submit the script to a GPU.
* Modify the python code to save a version of the generator model.  You will need to train the GAN for about 40 epochs to generate realistic images.   This will take about 30 minutes using a GPU.  It would make sense to use something like the file naming procedure used for the images to name your model (meaning you could save the model every epoch, and label the filename using the current epoch number).
* Use a version of the MNIST classifier that you made in Assignment 10. You should have a saved model of the trained version of that classifier; copy it here. If you can't find it, a version that I made can be found in the scratch area:

/fs/scratch/PAS1495/physics6820/GAN/fully_trained_model_cnn_conv_2_2_2_2.h5

* Load both of these models (the CNN MNIST classifier and the generator model). Then do the following:
    * Use the GAN generator model to generate 10000 fake digits
    * Feed them into your MNIST classifier and make two plots:
        * A histogram of which digit your fakes were classified as.
        * A histogram of the probability the classifier assigned to the digit it chose for each fake. (You could also see if this varies by digit, but that could be overkill!)
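For the last step, the analysis itself is just NumPy on the classifier's output. The sketch assumes `probs` is a `(10000, 10)` array of class probabilities, as a softmax classifier's `predict` would return. The function name `summarize_predictions` is hypothetical, and random softmax rows stand in for real model output so that the block runs on its own.

```python
import numpy as np

def summarize_predictions(probs, n_bins=20):
    """Summarize an MNIST classifier's verdicts on generated digits.

    probs: array of shape (N, 10), one row of class probabilities per fake image.
    Returns:
      counts: counts[d] is how many fakes were classified as digit d
      chosen_prob: probability the classifier gave its chosen digit, per fake
      prob_hist, bin_edges: histogram of chosen_prob over [0, 1]
    """
    predicted = np.argmax(probs, axis=1)
    counts = np.bincount(predicted, minlength=10)
    chosen_prob = probs[np.arange(len(probs)), predicted]
    prob_hist, bin_edges = np.histogram(chosen_prob, bins=n_bins, range=(0.0, 1.0))
    return counts, chosen_prob, prob_hist, bin_edges

# Stand-in for the real classifier output: random softmax rows (hypothetical data).
rng = np.random.RandomState(0)
logits = rng.normal(size=(10000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

counts, chosen_prob, prob_hist, bin_edges = summarize_predictions(probs)
print(counts)     # fakes per predicted digit
print(prob_hist)  # how confident the classifier was in its choices
```

With the real models, the two plots follow from `plt.bar(range(10), counts)` and `plt.hist(chosen_prob, bins=20, range=(0, 1))`. Both models can be saved with `Model.save` (for example `generator.save("generator_epoch_%d.h5" % epoch)`, a hypothetical filename) and loaded with `keras.models.load_model`. Check what input range your classifier was trained on. The generator outputs pixels in [-1, 1], so if the classifier expects [0, 1] inputs, rescale the fakes with `0.5 * fakes + 0.5` first.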