# Generative Adversarial Networks! 

## Demo - We will generate new Pokemon using a Generative Adversarial Network!
![alt text](https://i.imgur.com/3wW3WWM.jpg "Logo Title Text 1")

## What is the difference between a Generative and a discriminative model?

![alt text](https://image.slidesharecdn.com/spatiallycoherentlatenttopicmodelforconcurrentobjectv1-3-091108054619-phpapp01/95/spatially-coherent-latent-topic-model-for-concurrent-object-segmentation-and-classification-5-728.jpg?cb=1257665696 "Logo Title Text 1")

![alt text](https://i.imgur.com/6Jye8hd.png "Logo Title Text 1")

### Generative Models:

- Aim to model how the data is generated. From $P(x \mid c) \times P(c)$ we can obtain $P(c \mid x)$ (see Bayes' Theorem).
- They try to learn a joint probability distribution $P(x, c)$.
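As a small worked example of the Bayes step above, here is a minimal sketch that assumes two classes with one-dimensional Gaussian class-conditionals. The priors, means, standard deviations, and function names are all made up for illustration. It shows that once $P(x \mid c)$ and $P(c)$ are known, the same model can both classify and generate.

```python
import numpy as np

# Hypothetical toy model: two classes, each with a 1-D Gaussian P(x|c).
priors = np.array([0.5, 0.5])   # P(c)
means = np.array([0.0, 2.0])    # mean of P(x|c) for each class
stds = np.array([1.0, 1.0])     # standard deviation of P(x|c) for each class


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def posterior(x):
    """Bayes' theorem: P(c|x) = P(x|c) P(c) / sum over c of P(x|c) P(c)."""
    joint = gaussian_pdf(x, means, stds) * priors   # P(x, c) for each class
    return joint / joint.sum()


def sample(n, rng):
    """Generate data from the joint: draw c from P(c), then x from P(x|c)."""
    c = rng.choice(len(priors), size=n, p=priors)
    x = rng.normal(means[c], stds[c])
    return x, c


# x = 1.0 is equidistant from both means, so both classes are equally likely.
print(posterior(1.0))
# x = 0.0 sits on the mean of class 0, so class 0 gets the higher posterior.
print(posterior(0.0))
print(sample(5, np.random.default_rng(0)))
```

A discriminative model would only learn the `posterior` part; the `sample` function is what the generative view adds.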

#### Pros:

- We have the knowledge about the data distribution.

#### Cons:

- Very expensive to get (a lot of parameters)
- Need lots of data

### Discriminative Models:

- Aim at learning $P(c \mid x)$ directly, by using probabilistic approaches (e.g., logistic regression) or by mapping classes from a set of points (e.g., perceptrons and SVMs).

#### Pros:

- Easy to model

#### Cons:

- They can classify data, but they cannot generate it.

![alt text](https://i.imgur.com/MGGLDsi.png "Logo Title Text 1")

![alt text](https://image.slidesharecdn.com/ch20-161213234552/95/deep-generative-models-2-638.jpg?cb=1481672832 "Logo Title Text 1")

#### So discriminative algorithms learn the boundaries between classes, while generative algorithms learn the distribution of each class

## What is a Generative Adversarial Network? 

![alt text](https://d3ansictanv2wj.cloudfront.net/GAN_Overall-7319eab235d83fe971fb769f62cbb15d.png "Logo Title Text 1")

![alt text](https://cdn-images-1.medium.com/max/958/1*-gFsbymY9oJUQJ-A3GTfeg.png "Logo Title Text 1")

#### Generator 
- Draw some parameter $z$ from a source of randomness, e.g. a normal distribution.
- Apply a function $G$ with weights $w$ such that we get $x' = G(z, w)$.
- Compute the gradient with respect to $w$ to minimize $\log p(y = \text{fake} \mid x')$.

#### Discriminator 
- Improve the accuracy of a binary classifier f, i.e. maximize $\log p(y = \text{fake} \mid x')$ and $\log p(y = \text{true} \mid x)$ for fake and real data respectively.

- There are two optimization problems running simultaneously, and the optimization terminates if a stalemate has been reached. 

- The models play two distinct (literally, adversarial) roles. 
- Given some real data set R
- G is the generator, trying to create fake data that looks just like the genuine data
- D is the discriminator, getting data from either the real set or G and labeling the difference. 
- G is like a team of forgers trying to match real paintings with their output, while D is the team of detectives trying to tell the difference.
- Both D and G get better over time until G has essentially become a “master forger” of the genuine article and D is at a loss, “unable to differentiate between the two distributions.”
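The two objectives described above can be written down in a few lines. This is a minimal NumPy sketch, assuming the discriminator outputs the probability that its input is real, so that p(y = fake | x′) = 1 − D(x′). The function names and the small `eps` constant are illustrative and not part of the original notes.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-8):
    """Negative of what D maximizes: log p(true|x) on real, log p(fake|x') on fake."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-8):
    """What G minimizes: log p(fake|x') = log(1 - D(x'))."""
    return np.mean(np.log(1.0 - d_fake + eps))


# D(x) on a batch of real samples and D(G(z)) on a batch of fakes (made-up numbers).
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.3, 0.05])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))

# At the "stalemate" D outputs 0.5 everywhere, and its loss equals 2 * log(2).
half = np.full(4, 0.5)
print(discriminator_loss(half, half), 2 * np.log(2))
```

The generator loss gets lower as the discriminator assigns higher "real" probabilities to the fakes, which is exactly the forger-versus-detective dynamic described in the list.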

![alt text](https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2017/04/image28.png "Logo Title Text 1")

#### There are really only 5 components to think about:

- R: The original, genuine data set
- I: The random noise that goes into the generator as a source of entropy
- G: The generator which tries to copy/mimic the original data set
- D: The discriminator which tries to tell apart G’s output from R
- The actual ‘training’ loop, where we teach G to trick D and D to beware G
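To make the components concrete, here is a deliberately tiny sketch of the alternating training loop in plain NumPy. Everything in it is an assumption chosen for illustration: R is a 1-D Gaussian with mean 4, G is the linear map `a * z + b`, D is a logistic classifier `sigmoid(w * x + c)`, and the gradients are derived by hand. It is not the Pokemon model used later, and such a simple setup is not guaranteed to converge.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def train_toy_gan(steps=500, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters:     G(z) = a * z + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        x_real = rng.normal(4.0, 1.0, size=batch)   # R: genuine data
        z = rng.normal(0.0, 1.0, size=batch)        # I: noise fed to G
        x_fake = a * z + b                          # G: forged data

        # D step: descend on -log D(x_real) - log(1 - D(x_fake)).
        s_real = sigmoid(w * x_real + c)
        s_fake = sigmoid(w * x_fake + c)
        grad_w = np.mean((s_real - 1.0) * x_real) + np.mean(s_fake * x_fake)
        grad_c = np.mean(s_real - 1.0) + np.mean(s_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # G step: descend on log(1 - D(G(z))) using the updated D.
        s_fake = sigmoid(w * x_fake + c)
        grad_a = np.mean(-s_fake * w * z)
        grad_b = np.mean(-s_fake * w)
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), (w, c)


(a, b), (w, c) = train_toy_gan()
print('generator: a=%.3f b=%.3f, discriminator: w=%.3f c=%.3f' % (a, b, w, c))
```

Each iteration first teaches D to beware G, then teaches G to trick the updated D, which is the same alternation the full training loop performs with neural networks.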

## Use cases

![alt text](https://phillipi.github.io/pix2pix/images/teaser_v3.png "Logo Title Text 1")

![alt text](https://qph.ec.quoracdn.net/main-qimg-b85f35dcdcb5f4f48e8063dbf1f6abd3.webp "Logo Title Text 1")

- Generate images/videos/text/any data type!

Researchers from Insilico Medicine proposed an approach of artificially intelligent drug discovery using GANs.
The goal is to train the Generator to sample drug candidates for a given disease that are as close as possible to existing drugs from a Drug Database.

![alt text](https://cdn-images-1.medium.com/max/1600/0*--g8RQpR-Hofpa8G. "Logo Title Text 1")

After training, the Generator can be used to propose drug candidates for a previously incurable disease, and the Discriminator to estimate whether a sampled candidate would actually treat the given disease.

## Other Types of GANs 

### Deep Convolutional GANs (DCGANs)

#### DCGANs were the first major improvement on the GAN architecture. They are more stable in terms of training and generate higher quality samples.

![alt text](https://image.slidesharecdn.com/dcgan-howdoesitwork-160923005917/95/dcgan-how-does-it-work-12-638.jpg?cb=1493068156 "Logo Title Text 1")

Important Discoveries

- Batch normalization is a must in both networks.
- Fully connected hidden layers are not a good idea.
- Avoid pooling, simply stride your convolutions!
- ReLU activations are your friend (almost always).
- Vanilla GANs could work on simple datasets, but DCGANs are far better.
- DCGANs are a solid baseline to compare with your fancy new state-of-the-art GAN algorithm.
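To see what "stride your convolutions" does to feature-map sizes, here is a small sketch of the size arithmetic. It assumes TensorFlow's "SAME" padding rule: a strided convolution produces ceil(input / stride) pixels per side, and a strided transposed convolution produces input × stride. The layer counts mirror the Pokemon generator and discriminator defined in the code cells of this notebook; the helper names are made up.

```python
import math


def conv_same(size, stride=2):
    """Spatial size after a strided convolution with SAME padding."""
    return math.ceil(size / stride)


def deconv_same(size, stride=2):
    """Spatial size after a strided transposed convolution with SAME padding."""
    return size * stride


# Generator: a 4x4 seed is upsampled by five stride-2 transposed convolutions.
gen_sizes = [4]
for _ in range(5):
    gen_sizes.append(deconv_same(gen_sizes[-1]))
print(gen_sizes)    # [4, 8, 16, 32, 64, 128]

# Discriminator: a 128x128 image is downsampled by four stride-2 convolutions.
disc_sizes = [128]
for _ in range(4):
    disc_sizes.append(conv_same(disc_sizes[-1]))
print(disc_sizes)   # [128, 64, 32, 16, 8]

# Flattened feature count fed to the final linear layer (512 channels at 8x8).
flat_dim = disc_sizes[-1] * disc_sizes[-1] * 512
print(flat_dim)     # 32768
```

No pooling layer is needed: each stride-2 layer halves or doubles the resolution on its own.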

### Conditional GANs

![alt text](http://guimperarnau.com/files/blog/Fantastic-GANs-and-where-to-find-them/cGAN_overview.jpg "Logo Title Text 1")

#### CGANs use extra label information. This results in better-quality images and lets us control, to an extent, how generated images will look.


- Here we have conditional information Y that describes some aspect of the data.
- If we are dealing with faces, Y could describe attributes such as hair color or gender.
- Then, this attribute information is inserted in both the generator and the discriminator.

![alt text](http://guimperarnau.com/files/blog/Fantastic-GANs-and-where-to-find-them/cGAN_disentanglement.jpg "Logo Title Text 1")

### Conditional GANs are interesting for two reasons:

1. As you are feeding more information into the model, the GAN learns to exploit it and, therefore, is able to generate better samples.
2. We have two ways of controlling the representations of the images. Without the conditional GAN, all the image information was encoded in Z. With cGANs, as we add conditional information Y, the two variables Z and Y encode different information. For example, let’s suppose Y encodes the digit of a hand-written number (from 0 to 9). Then Z would encode all the other variations that are not encoded in Y, such as the style of the number (size, weight, rotation, etc.).
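The notes above don't say how Y is inserted into the two networks. One common choice, assumed here, is plain concatenation: Y is appended to Z for the generator and appended as extra channels to the image for the discriminator. The sketch below only builds the inputs with NumPy; the function names and shapes (100-dimensional noise, 28×28 grayscale digits) are illustrative.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer labels of shape (N,) into one-hot rows of shape (N, num_classes)."""
    return np.eye(num_classes, dtype=np.float32)[labels]


def generator_input(z, y):
    """Concatenate noise Z (N, z_dim) with condition Y (N, num_classes)."""
    return np.concatenate([z, y], axis=1)


def discriminator_input(images, y):
    """Append Y to every pixel of an image batch (N, H, W, C) as extra channels."""
    n, h, w, _ = images.shape
    y_maps = np.broadcast_to(y[:, None, None, :], (n, h, w, y.shape[1]))
    return np.concatenate([images, y_maps], axis=-1)


rng = np.random.default_rng(0)
labels = np.array([3, 7])                    # Y: which digits to generate
y = one_hot(labels, num_classes=10)
z = rng.uniform(-1.0, 1.0, size=(2, 100)).astype(np.float32)   # Z: style

g_in = generator_input(z, y)
d_in = discriminator_input(np.zeros((2, 28, 28, 1), dtype=np.float32), y)
print(g_in.shape, d_in.shape)   # (2, 110) (2, 28, 28, 11)
```

Keeping Z fixed while changing `labels` is how one would ask such a model for the same style with a different digit.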

### Wasserstein GANs

#### WGANs change the loss function to include the Wasserstein distance. As a result, WGANs have loss functions that correlate with image quality. Also, training stability improves and is not as dependent on the architecture.

![alt text](http://guimperarnau.com/files/blog/Fantastic-GANs-and-where-to-find-them/crazy_loss_function.jpg "Logo Title Text 1")
That plot is impossible to read. How about this instead...
![alt text](http://guimperarnau.com/files/blog/Fantastic-GANs-and-where-to-find-them/WassGAN_loss_function.jpg "Logo Title Text 1")

- GANs have always had problems with convergence and, as a consequence, you don’t really know when to stop training them. In other words, the loss function doesn’t correlate with image quality. This is a big headache because:

1. You need to be constantly looking at the samples to tell whether your model is training correctly or not.
2. You don’t know when to stop training (no convergence).
3. You don’t have a numerical value that tells you how well you are tuning the parameters.

For example, the first figure in this section shows two uninformative loss plots from a DCGAN that is perfectly able to generate MNIST samples.

This interpretability issue is one of the problems that Wasserstein GANs aim to solve. How? GANs can be interpreted as minimizing the Jensen-Shannon divergence, which saturates at a constant value (log 2) when the real and fake distributions don’t overlap (which is usually the case), so its gradient says nothing about how far apart they are. So, instead of minimizing the JS divergence, the authors use the Wasserstein distance, which describes the distance between the “points” from one distribution to the other.

So, WassGAN has a loss function that correlates with image quality and enables convergence. It is also more stable, meaning that it is not as dependent on the architecture. For example, it works quite well even if you remove batch normalization or try weird architectures.
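A small numerical illustration of the difference, under a strong simplification: both distributions are single point masses on a 1-D grid, so they never overlap. The helper names are made up, and the 1-D Wasserstein distance is computed from the cumulative distributions rather than with any GAN machinery.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0   # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q, dx=1.0):
    """1-D Wasserstein-1 distance: area between the two cumulative distributions."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx


def point_mass(index, size=20):
    v = np.zeros(size)
    v[index] = 1.0
    return v


real = point_mass(0)
for shift in [1, 5, 10]:
    fake = point_mass(shift)
    print(shift, js_divergence(real, fake), wasserstein_1d(real, fake))
```

The JS divergence stays at log 2 ≈ 0.693 no matter how far the fake distribution is from the real one, while the Wasserstein distance equals the shift and shrinks as the two get closer. That is the sense in which it gives a loss that tracks sample quality.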

### Use if
- you are looking for a state-of-the-art GAN with the highest training stability.
- you want an informative and interpretable loss function.
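For reference, the WGAN losses used by the training code in this notebook are short enough to sketch in NumPy: the critic loss is the mean score on fakes minus the mean score on real images, the generator loss is the negated mean score on fakes, and the critic's weights are clipped to [-0.01, 0.01]. The function names and the toy numbers here are illustrative only.

```python
import numpy as np


def critic_loss(real_scores, fake_scores):
    """Minimized by the critic: pushes real scores up and fake scores down."""
    return np.mean(fake_scores) - np.mean(real_scores)


def generator_loss(fake_scores):
    """Minimized by the generator: pushes the critic's scores on fakes up."""
    return -np.mean(fake_scores)


def clip_weights(weights, limit=0.01):
    """Weight clipping, the original WGAN way of constraining the critic."""
    return [np.clip(w, -limit, limit) for w in weights]


real_scores = np.array([1.0, 2.0, 3.0])   # raw critic outputs, no sigmoid
fake_scores = np.array([0.0, 1.0, 2.0])
print(critic_loss(real_scores, fake_scores))   # -1.0
print(generator_loss(fake_scores))             # -1.0

weights = [np.array([0.5, -0.002, -0.3])]
print(clip_weights(weights))
```

The negated critic loss is the quantity usually plotted as the informative training curve, since it estimates the Wasserstein distance between real and generated samples.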


In [1]:
# resize pokeGAN.py
import os
import cv2

src = "./data"  # pokeRGB_black
dst = "./resizedData"  # resized

# exist_ok avoids a FileExistsError when this cell is run more than once
os.makedirs(dst, exist_ok=True)

for each in os.listdir(src):
    img = cv2.imread(os.path.join(src, each))
    if img is None:  # cv2.imread returns None for files it cannot decode
        continue
    img = cv2.resize(img, (256, 256))
    cv2.imwrite(os.path.join(dst, each), img)

In [2]:
from PIL import Image
import os

src = "./resizedData"
dst = "./resized_black/"

os.makedirs(dst, exist_ok=True)  # make sure the output folder exists

for each in os.listdir(src):
    png = Image.open(os.path.join(src, each))
    out_path = os.path.join(dst, os.path.splitext(each)[0] + '.jpg')
    if png.mode == 'RGBA':
        png.load()  # required for png.split()
        # paste onto a black background, using the alpha channel as the mask
        background = Image.new("RGB", png.size, (0, 0, 0))
        background.paste(png, mask=png.split()[3])  # 3 is the alpha channel
        background.save(out_path, 'JPEG')
    else:
        # convert() returns a new image, so save its result rather than png
        png.convert('RGB').save(out_path, 'JPEG')


In [3]:

# -*- coding: utf-8 -*-

# generate new kinds of pokemons

import os
import tensorflow as tf
import numpy as np
import cv2
import random
import scipy.misc
from utils import *

slim = tf.contrib.slim

HEIGHT, WIDTH, CHANNEL = 128, 128, 3
BATCH_SIZE = 64
EPOCH = 5000
os.environ['CUDA_VISIBLE_DEVICES'] = '15'  # GPU index is machine-specific; change it to match your setup
version = 'newPokemon'
newPoke_path = './' + version

def lrelu(x, n, leak=0.2): 
    return tf.maximum(x, leak * x, name=n) 
 
def process_data():   
    current_dir = os.getcwd()
    # parent = os.path.dirname(current_dir)
    pokemon_dir = os.path.join(current_dir, 'data')
    images = []
    for each in os.listdir(pokemon_dir):
        images.append(os.path.join(pokemon_dir,each))
    # print images    
    all_images = tf.convert_to_tensor(images, dtype = tf.string)
    
    images_queue = tf.train.slice_input_producer(
                                        [all_images])
                                        
    content = tf.read_file(images_queue[0])
    image = tf.image.decode_jpeg(content, channels = CHANNEL)
    # sess1 = tf.Session()
    # print sess1.run(image)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta = 0.1)
    image = tf.image.random_contrast(image, lower = 0.9, upper = 1.1)
    # noise = tf.Variable(tf.truncated_normal(shape = [HEIGHT,WIDTH,CHANNEL], dtype = tf.float32, stddev = 1e-3, name = 'noise')) 
    # print image.get_shape()
    size = [HEIGHT, WIDTH]
    image = tf.image.resize_images(image, size)
    image.set_shape([HEIGHT,WIDTH,CHANNEL])
    # image = image + noise
    # image = tf.transpose(image, perm=[2, 0, 1])
    # print image.get_shape()
    
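    image = tf.cast(image, tf.float32)
    # NOTE: this scales pixels to [0, 1], while the generator ends in tanh ([-1, 1]);
    # rescaling with image / 127.5 - 1.0 instead would make the two ranges match.
    image = image / 255.0
    
    images_batch = tf.train.shuffle_batch(
                                    [image], batch_size = BATCH_SIZE,
                                    num_threads = 4, capacity = 200 + 3* BATCH_SIZE,
                                    min_after_dequeue = 200)
    num_images = len(images)

    return images_batch, num_images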

def generator(input, random_dim, is_train, reuse=False):
    c4, c8, c16, c32, c64 = 512, 256, 128, 64, 32 # channel num
    s4 = 4
    output_dim = CHANNEL  # RGB image
    with tf.variable_scope('gen') as scope:
        if reuse:
            scope.reuse_variables()
        w1 = tf.get_variable('w1', shape=[random_dim, s4 * s4 * c4], dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(stddev=0.02))
        b1 = tf.get_variable('b1', shape=[c4 * s4 * s4], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0))
        flat_conv1 = tf.add(tf.matmul(input, w1), b1, name='flat_conv1')
         #Convolution, bias, activation, repeat! 
        conv1 = tf.reshape(flat_conv1, shape=[-1, s4, s4, c4], name='conv1')
        bn1 = tf.contrib.layers.batch_norm(conv1, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn1')
        act1 = tf.nn.relu(bn1, name='act1')
        # 8*8*256
        #Convolution, bias, activation, repeat! 
        conv2 = tf.layers.conv2d_transpose(act1, c8, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                           kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                           name='conv2')
        bn2 = tf.contrib.layers.batch_norm(conv2, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn2')
        act2 = tf.nn.relu(bn2, name='act2')
        # 16*16*128
        conv3 = tf.layers.conv2d_transpose(act2, c16, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                           kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                           name='conv3')
        bn3 = tf.contrib.layers.batch_norm(conv3, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn3')
        act3 = tf.nn.relu(bn3, name='act3')
        # 32*32*64
        conv4 = tf.layers.conv2d_transpose(act3, c32, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                           kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                           name='conv4')
        bn4 = tf.contrib.layers.batch_norm(conv4, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn4')
        act4 = tf.nn.relu(bn4, name='act4')
        # 64*64*32
        conv5 = tf.layers.conv2d_transpose(act4, c64, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                           kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                           name='conv5')
        bn5 = tf.contrib.layers.batch_norm(conv5, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn5')
        act5 = tf.nn.relu(bn5, name='act5')
        
        #128*128*3
        conv6 = tf.layers.conv2d_transpose(act5, output_dim, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                           kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                           name='conv6')
        # bn6 = tf.contrib.layers.batch_norm(conv6, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn6')
        act6 = tf.nn.tanh(conv6, name='act6')
        return act6


def discriminator(input, is_train, reuse=False):
    c2, c4, c8, c16 = 64, 128, 256, 512  # channel num: 64, 128, 256, 512
    with tf.variable_scope('dis') as scope:
        if reuse:
            scope.reuse_variables()

        #Convolution, activation, bias, repeat! 
        conv1 = tf.layers.conv2d(input, c2, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                 kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                 name='conv1')
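        # bn1 is created but not used: act1 is applied to conv1 directly
        # (DCGAN-style discriminators skip batch norm on the input layer)
        bn1 = tf.contrib.layers.batch_norm(conv1, is_training = is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope = 'bn1')
        act1 = lrelu(conv1, n='act1')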
         #Convolution, activation, bias, repeat! 
        conv2 = tf.layers.conv2d(act1, c4, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                 kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                 name='conv2')
        bn2 = tf.contrib.layers.batch_norm(conv2, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn2')
        act2 = lrelu(bn2, n='act2')
        #Convolution, activation, bias, repeat! 
        conv3 = tf.layers.conv2d(act2, c8, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                 kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                 name='conv3')
        bn3 = tf.contrib.layers.batch_norm(conv3, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn3')
        act3 = lrelu(bn3, n='act3')
         #Convolution, activation, bias, repeat! 
        conv4 = tf.layers.conv2d(act3, c16, kernel_size=[5, 5], strides=[2, 2], padding="SAME",
                                 kernel_initializer=tf.truncated_normal_initializer(stddev=0.02),
                                 name='conv4')
        bn4 = tf.contrib.layers.batch_norm(conv4, is_training=is_train, epsilon=1e-5, decay = 0.9,  updates_collections=None, scope='bn4')
        act4 = lrelu(bn4, n='act4')
       
        # start from act4
        dim = int(np.prod(act4.get_shape()[1:]))
        fc1 = tf.reshape(act4, shape=[-1, dim], name='fc1')
      
        
        w2 = tf.get_variable('w2', shape=[fc1.shape[-1], 1], dtype=tf.float32,
                             initializer=tf.truncated_normal_initializer(stddev=0.02))
        b2 = tf.get_variable('b2', shape=[1], dtype=tf.float32,
                             initializer=tf.constant_initializer(0.0))

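        # WGAN: the critic returns raw scores, so we just get rid of the sigmoid
        logits = tf.add(tf.matmul(fc1, w2), b2, name='logits')
        # DCGAN would use the sigmoid output instead (computed here but unused)
        acted_out = tf.nn.sigmoid(logits)
        return logits #, acted_out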


def train():
    random_dim = 100
    print(os.environ['CUDA_VISIBLE_DEVICES'])
    
    with tf.variable_scope('input'):
        #real and fake image placholders
        real_image = tf.placeholder(tf.float32, shape = [None, HEIGHT, WIDTH, CHANNEL], name='real_image')
        random_input = tf.placeholder(tf.float32, shape=[None, random_dim], name='rand_input')
        is_train = tf.placeholder(tf.bool, name='is_train')
    
    # wgan
    fake_image = generator(random_input, random_dim, is_train)
    
    real_result = discriminator(real_image, is_train)
    fake_result = discriminator(fake_image, is_train, reuse=True)
    
    d_loss = tf.reduce_mean(fake_result) - tf.reduce_mean(real_result)  # This optimizes the discriminator.
    g_loss = -tf.reduce_mean(fake_result)  # This optimizes the generator.
            

    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if 'dis' in var.name]
    g_vars = [var for var in t_vars if 'gen' in var.name]
    # test
    # print(d_vars)
    trainer_d = tf.train.RMSPropOptimizer(learning_rate=2e-4).minimize(d_loss, var_list=d_vars)
    trainer_g = tf.train.RMSPropOptimizer(learning_rate=2e-4).minimize(g_loss, var_list=g_vars)
    # clip discriminator weights
    d_clip = [v.assign(tf.clip_by_value(v, -0.01, 0.01)) for v in d_vars]

    
    batch_size = BATCH_SIZE
    image_batch, samples_num = process_data()
    
    batch_num = int(samples_num / batch_size)
    total_batch = 0
    sess = tf.Session()
    saver = tf.train.Saver()
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    # continue training from the latest checkpoint, if one exists
    ckpt = tf.train.latest_checkpoint('./model/' + version)
    if ckpt:
        saver.restore(sess, ckpt)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    print('total training sample num:%d' % samples_num)
    print('batch size: %d, batch num per epoch: %d, epoch num: %d' % (batch_size, batch_num, EPOCH))
    print('start training...')
    for i in range(EPOCH):
        print(i)
        for j in range(batch_num):
            print(j)
            d_iters = 5
            g_iters = 1

            train_noise = np.random.uniform(-1.0, 1.0, size=[batch_size, random_dim]).astype(np.float32)
            for k in range(d_iters):
                print(k)
                train_image = sess.run(image_batch)
                #wgan clip weights
                sess.run(d_clip)
                
                # Update the discriminator
                _, dLoss = sess.run([trainer_d, d_loss],
                                    feed_dict={random_input: train_noise, real_image: train_image, is_train: True})

            # Update the generator
            for k in range(g_iters):
                # train_noise = np.random.uniform(-1.0, 1.0, size=[batch_size, random_dim]).astype(np.float32)
                _, gLoss = sess.run([trainer_g, g_loss],
                                    feed_dict={random_input: train_noise, is_train: True})

            # print 'train:[%d/%d],d_loss:%f,g_loss:%f' % (i, j, dLoss, gLoss)
            
        # save check point every 500 epoch
        if i%500 == 0:
            if not os.path.exists('./model/' + version):
                os.makedirs('./model/' + version)
            saver.save(sess, './model/' +version + '/' + str(i))  
        if i%50 == 0:
            # save images
            if not os.path.exists(newPoke_path):
                os.makedirs(newPoke_path)
            sample_noise = np.random.uniform(-1.0, 1.0, size=[batch_size, random_dim]).astype(np.float32)
            imgtest = sess.run(fake_image, feed_dict={random_input: sample_noise, is_train: False})
            # imgtest = imgtest * 255.0
            # imgtest.astype(np.uint8)
            save_images(imgtest, [8,8] ,newPoke_path + '/epoch' + str(i) + '.jpg')
            
            print('train:[%d],d_loss:%f,g_loss:%f' % (i, dLoss, gLoss))
    coord.request_stop()
    coord.join(threads)





In [4]:

def test():
    random_dim = 100
    with tf.variable_scope('input'):
        real_image = tf.placeholder(tf.float32, shape = [None, HEIGHT, WIDTH, CHANNEL], name='real_image')
        random_input = tf.placeholder(tf.float32, shape=[None, random_dim], name='rand_input')
        is_train = tf.placeholder(tf.bool, name='is_train')
    
    # # wgan
    fake_image = generator(random_input, random_dim, is_train)
    real_result = discriminator(real_image, is_train)
    fake_result = discriminator(fake_image, is_train, reuse=True)
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())
    variables_to_restore = slim.get_variables_to_restore(include=['gen'])
    print(variables_to_restore)
    saver = tf.train.Saver(variables_to_restore)
    ckpt = tf.train.latest_checkpoint('./model/' + version)
    saver.restore(sess, ckpt)


if __name__ == "__main__":
    #train()
    test()

[<tf.Variable 'gen/w1:0' shape=(100, 8192) dtype=float32_ref>, <tf.Variable 'gen/b1:0' shape=(8192,) dtype=float32_ref>, <tf.Variable 'gen/bn1/beta:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'gen/bn1/moving_mean:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'gen/bn1/moving_variance:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'gen/conv2/kernel:0' shape=(5, 5, 256, 512) dtype=float32_ref>, <tf.Variable 'gen/conv2/bias:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'gen/bn2/beta:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'gen/bn2/moving_mean:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'gen/bn2/moving_variance:0' shape=(256,) dtype=float32_ref>, <tf.Variable 'gen/conv3/kernel:0' shape=(5, 5, 128, 256) dtype=float32_ref>, <tf.Variable 'gen/conv3/bias:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'gen/bn3/beta:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'gen/bn3/moving_mean:0' shape=(128,) dtype=float32_ref>, <tf.Variable 'gen/bn3/moving_variance:0' shape=(