# Introduction to Pytorch/TensorFlow with Generative Models 7.11.2017
## Factors to be concerned when building neural networks
### Design Choices
-  network topology
-  initial weights
- activation functions
- loss functions
- backpropagation

### Deep Learning Frameworks
- power of gpus
- programming languages
- learning curves for beginners: pytorch vs. tensorflow

## PyTorch vs. TensorFlow
### Introduction
#### TensorFlow
- TensorFlow was developed by Google Brain. It is an open source software library for numerical computation using data flow graphs.
- Nodes in the graph represent mathematical operations;
- Edges represent the multidimensional data arrays (tensors) communicated between them.


#### PyTorch
- PyTorch  is a  of lua-based Torch framework developed by Facebook

### Difference
#### Static  vs Dynamic Computation Graph
- Category 1: static [used in TensorFlow, Theano, Caffe etc.]: the user first defines a computation graph (a symbolic representation of the computation), and then exampes are fed into an engine that executes this computation and computes its derivatives.
![title](define_and_run.png)

TensorFlow  uses a static graph, which means you need to define graph statically before a model can run
- A **tf.Graph** defines the symoblic represention of the computation. It does neither compute anything nor hold any values. 
- A **tf.Session** executes graphs or part of graphs. It allocates resources for that and holds the actual values of intermediate results and variables

**A visualization tool of tensorboard on jupyter notebook**

A cloud-hosted TensorBoard instance used to do the rendering:  takes the tf.GraphDef, sends it over to the cloud, and embeds an $<iframe>$ with the resulting visualization right in the Jupyter notebook.

In [1]:
# credit: https://stackoverflow.com/questions/41388673/visualizing-a-tensorflow-graph-in-jupyter-doesnt-work/41463991#41463991
# TensorFlow Graph visualizer code
import numpy as np
from IPython.display import clear_output, Image, display, HTML

def strip_consts(graph_def, max_const_size=32):
    """Strip large constant values from graph_def."""
    strip_def = tf.GraphDef()
    for n0 in graph_def.node:
        n = strip_def.node.add() 
        n.MergeFrom(n0)
        if n.op == 'Const':
            tensor = n.attr['value'].tensor
            size = len(tensor.tensor_content)
            if size > max_const_size:
                tensor.tensor_content = "<stripped %d bytes>"%size
    return strip_def

def show_graph(graph_def, max_const_size=32):
    """Visualize TensorFlow graph."""
    if hasattr(graph_def, 'as_graph_def'):
        graph_def = graph_def.as_graph_def()
    strip_def = strip_consts(graph_def, max_const_size=max_const_size)
    code = """
        <script src="//cdnjs.cloudflare.com/ajax/libs/polymer/0.3.3/platform.js"></script>
        <script>
          function load() {{
            document.getElementById("{id}").pbtxt = {data};
          }}
        </script>
        <link rel="import" href="https://tensorboard.appspot.com/tf-graph-basic.build.html" onload=load()>
        <div style="height:600px">
          <tf-graph-basic id="{id}"></tf-graph-basic>
        </div>
    """.format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))

    iframe = """
        <iframe seamless style="width:1200px;height:620px;border:0" srcdoc="{}"></iframe>
    """.format(code.replace('"', '&quot;'))
    display(HTML(iframe))

import libraries in tensorflow

In [2]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import os

**1. Constants** whose values cannot be changed

Example 1

** hello tensorflow**

In [3]:
# at first, create a graph named g0 which contains the "Hello, TensorFlow!" operation 
g0 = tf.Graph() 
with g0.as_default():
    hello = tf.constant("Hello, TensorFlow!")
    print(hello)

Tensor("Const:0", shape=(), dtype=string)


In [4]:
# then, execute the above defined operation by using tf.Session
sess = tf.Session(graph = g0)
print(sess.run(hello)) # 'b' indicates Bytes literals

b'Hello, TensorFlow!'


** 2. Variables**  can hold different values as opposed to constants.

Example 2

In [5]:
g1 = tf.Graph()
with g1.as_default():
    var = tf.Variable(94, name = 'a')
    initialize = tf.global_variables_initializer()
    assign = var.assign(10)
show_graph(g1)

When executing variables in TensorFlow, we need to initialize them at first by running, for example, tf.global_variables_initializer()

In [6]:
with tf.Session(graph = g1) as sess:
    sess.run(initialize)
    sess.run(assign)
    print(sess.run(var))

10


Example 3 with TensorFlow

** Basic operation **

In [7]:
g2 = tf.Graph()
with g2.as_default():
    x = tf.Variable(tf.random_uniform([5,3], 0.0, 1.0))
    y = tf.Variable(tf.random_uniform([5,3], 0.0, 1.0))

In [8]:
show_graph(g2)

Before running tensorflow.session, $x+y$ only contains the symbolic representation of the operation

In [9]:
x+y

<tf.Tensor 'add:0' shape=(5, 3) dtype=float32>

In [10]:
with tf.Session(graph = g2) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(x+y))

[[ 1.03706896  0.43234122  0.62551689]
 [ 1.50199294  0.6065625   0.93252313]
 [ 0.48913479  1.19993126  0.65068138]
 [ 1.75023031  0.85043037  0.56525409]
 [ 1.39293563  0.27669728  0.82497942]]


** 3. Placeholder ** are tensors which are waiting to be initialized/fed. Placeholders are used for training data which is only fed when the code is actually run inside a session.

Exampe 4 with TensorFLow

In [11]:
g3 = tf.Graph()
with g3.as_default():
    a = tf.placeholder("float")
    b = tf.placeholder("float")
    y = tf.multiply(a, b)
show_graph(g3)

What is fed to Placeholder is called **feed_dict**. Feed_dict are key value pairs for holding data:

In [12]:
feed_dict ={a:2,b:3}
with tf.Session(graph = g3) as sess:
    print(sess.run(y,feed_dict))

6.0


- To show TensorBoard locally, we write the graph into the 'logs' directory
- Open terminal, type: tensorboard --logdir = DIR
- It will be located at localhost:6006

In [13]:
tf.summary.FileWriter("logs", g3).close()

- Category 2: dynamic [used in PyTorch, Chainer, Dynet]
![title](define_by_run.png)

**PyTorch** uses a dynamic graph, which means you can define, change and execute nodes on the fly, no special session interfaces or placeholders. 

- Make it more pythonic!

import libraries in pytorch

In [14]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

Example 3 with PyTorch

In [15]:
x = torch.rand(5,3)
print(x)


 0.4208  0.8971  0.7463
 0.3719  0.4638  0.9866
 0.0475  0.1447  0.5514
 0.5874  0.0788  0.5132
 0.8437  0.8861  0.6513
[torch.FloatTensor of size 5x3]



In [16]:
y = torch.rand(5,3)
print(x + y)


 1.4160  1.2703  0.9148
 0.4415  1.3625  1.0475
 0.1288  0.5747  1.0162
 1.3011  0.3810  1.1712
 1.4749  1.0265  0.8225
[torch.FloatTensor of size 5x3]



** Summary: PyTorch vs TensorFlow **
- Users of PyTorch who had Python programming experiences would have much lower learning curves thanks to Pytorch being more pythonic
- With PyTorch you can quickly try new models out
- Easier to debug with PyTorch
- Better visualization with TensorFlow thanks to TensorBoard
- TensoFlow is more suitable for production and deployment. It can be deployed on mobile platforms
- TensorFlow enables large-scale distributed model training 

## Generative Models
- Generative Adversarial Networks (GANs)
- Variational Autoencoders (VAEs)

#### Datasets: MNIST

In [17]:
def load_data():
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
    return mnist

In [18]:
mnist = load_data()

Extracting ../MNIST_data/train-images-idx3-ubyte.gz
Extracting ../MNIST_data/train-labels-idx1-ubyte.gz
Extracting ../MNIST_data/t10k-images-idx3-ubyte.gz
Extracting ../MNIST_data/t10k-labels-idx1-ubyte.gz


### GANs
![title](gan.png)

![title](gan_alg.png)

global parameters

In [19]:
g_output_size = 784
g_input_size = 100
g_hidden_size = 128
minibatch_size = 128
num_epochs = 50000
learning_rate = 1e-3

**weight initialization** 

(Xavier initialization: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization)

TensorFLow

In [20]:
xavier_init = tf.contrib.layers.xavier_initializer()
zero_init = tf.zeros_initializer()

**Generative Net**
- Input: *m* noise samples **z** sampled from noise prior $p_g(z)$
- In this tutorial, we use a simple fully connected one hidden layer net for generator and discriminator
- You can try it yourself with deeper convolutional nets
- We follow the orginal GAN paper, where no dropout was used for the generator
- We use exponential linear unit (ELU) as our activation function of the hidden layer
- We use sigmoid as our activation function of the output layer

TensorFlow

In [21]:
def sample_Z_tf(minibatch_size, g_input_size):
    return np.random.uniform(-1.0, 1.0, size=[minibatch_size, g_input_size]).astype(np.float32)

def generator_tf(z):
    with tf.variable_scope('generator'):

        W1 = tf.get_variable('Gen_W1', [g_input_size, g_hidden_size], initializer=xavier_init)
        B1 = tf.get_variable('Gen_B1', [g_hidden_size ], initializer=zero_init)
        W2 = tf.get_variable('Gen_W2', [g_hidden_size , g_output_size],initializer=xavier_init)
                                                
        B2 = tf.get_variable('Gen_B2', [g_output_size], initializer=zero_init)

        # summary
        tf.summary.histogram('weight1', W1)
        tf.summary.histogram('weight2', W2)
        tf.summary.histogram('biases1', B1)
        tf.summary.histogram('biases2', B2)

        Gen_h1 = tf.nn.elu((tf.matmul(z, W1) + B1))
        Gen_out = tf.nn.sigmoid((tf.matmul(Gen_h1, W2) + B2))
    
        return Gen_out

pytorch

- In PyTorch you define your Models as subclasses of torch.nn.Module.

- In the _\_init__ function, you are supposed to initialize the layers you want to use. 

- In the forward method, you specify the connections of your layers, meaning that you will use the layers you already initialized in _\_ini__, in order to re-use the same layer for each forward pass of data you make.

- torch.nn.Functional contains some useful functions such as activation functions. However, these are not full layers so if you want to specify a layer of any kind you should use torch.nn.Module.

- You can use the torch.nn.Functional conv operations to define a customized layer

In [22]:
def sample_Z_pytorch():
    return lambda minibatch_size, input_size: torch.rand(minibatch_size, input_size)  #uniform distribution

class gNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(gNet, self).__init__()
        self.Gen_h1 = nn.Linear(input_size, hidden_size)
        self.Gen_out = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.elu(self.Gen_h1(x))
        x = F.sigmoid(self.Gen_out(x))
        return x 

**Discriminator Net**
- Output: a single scalar representing the probability that **x** came from the data rather than $p_g$
- Dropout is used for avoiding overfitting, the probability of keep a neuron is set at 0.5

Tensorflow

In [23]:
def discriminator_tf(X, reuse=False):
    with tf.variable_scope('discriminator'):
        if reuse: 
            tf.get_variable_scope().reuse_variables()

        W1 = tf.get_variable('Dis_W1', [g_output_size, g_hidden_size ],
                             initializer=xavier_init)
        B1 = tf.get_variable('Dis_B1', [g_hidden_size ], initializer=zero_init)
        W2 = tf.get_variable('Dis_W2', [g_hidden_size , 1],
                             initializer=xavier_init)
        B2 = tf.get_variable('Dis_B2', [1], initializer=zero_init)

        # summary
        tf.summary.histogram('weight1', W1)
        tf.summary.histogram('weight2', W2)
        tf.summary.histogram('biases1', B1)
        tf.summary.histogram('biases2', B2)

        Dis_h1 = tf.nn.elu((tf.matmul(X, W1) + B1))
        Dis_h1_dropout = tf.nn.dropout(Dis_h1, 0.5)
        Dis_logits = tf.matmul(Dis_h1_dropout, W2) + B2

        return Dis_logits

In [24]:
with tf.variable_scope('Placeholder'):
    # Raw image
    X_tf = tf.placeholder(tf.float32, [None, g_output_size])
    tf.summary.image('Raw_Image', tf.reshape(X_tf, [-1, 28, 28, 1]), 3)
    # Noise
    Z_tf = tf.placeholder(tf.float32, [None, g_input_size])  # noise
    tf.summary.histogram('Noise', Z_tf)
    
with tf.variable_scope('GAN'):
    Gen_tf = generator_tf(Z_tf)
    # making sure that the real images and the synthetic image are fit into the same discriminator
    Dis_real_logits_tf = discriminator_tf(X_tf, reuse=False)
    Dis_fake_logits_tf = discriminator_tf(Gen_tf, reuse=True) 
    
tf.summary.image('Generated_Image', tf.reshape(Gen_tf, [-1, 28, 28, 1]), 3)

<tf.Tensor 'Generated_Image:0' shape=() dtype=string>

pytorch

In [25]:
class dNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(dNet, self).__init__()
        self.Dis_h1 = nn.Linear(input_size, hidden_size)
        self.Dis_h1_dropout = nn.Dropout(p=0.5)  # keep_prob = 0.5
        self.Dis_o = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = F.elu(self.Dis_h1(x))
        x = self.Dis_h1_dropout(x)
        x = F.sigmoid(self.Dis_o(x))
        return x

In [26]:
gnet = gNet(input_size = g_input_size, hidden_size = g_hidden_size, output_size = g_output_size)
dnet = dNet(input_size = g_output_size, hidden_size = g_hidden_size, output_size = 1)
z_sampler_pt = sample_Z_pytorch()


**Loss function**

- The loss function of a GAN is cross entropy
- The true labels for the discriminator are the real images
- The true labels for the discriminator are the fake images
- We would like to minimize both the loss of the discriminator and the generator

**Optimization**

- We use Adam as our optimization method

**Training**

Tensorflow

In [27]:
with tf.variable_scope('Dis_loss'):
    Dis_loss_real_tf = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                logits=Dis_real_logits_tf, labels=tf.ones_like(Dis_real_logits_tf)))
    Dis_loss_fake_tf= tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                logits=Dis_fake_logits_tf, labels=tf.zeros_like(Dis_fake_logits_tf)))
    Dis_loss_tf = Dis_loss_real_tf + Dis_loss_fake_tf

    tf.summary.scalar('Dis_loss_real', Dis_loss_real_tf)
    tf.summary.scalar('Dis_loss_fake', Dis_loss_fake_tf)
    tf.summary.scalar('Dis_Loss', Dis_loss_tf)

with tf.name_scope('Gen_loss'):
    Gen_loss_tf = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits
                                (logits=Dis_fake_logits_tf, labels=tf.ones_like(Dis_fake_logits_tf)))
    tf.summary.scalar('Gen_Loss', Gen_loss_tf)

- In TensorFlow we could specify the parameters to be trained by optimization
- One way to specify the trainable paramters for the generator and the discriminator respectively is to use the method
tf.trainable_variables(), through which you can get a string of trainable variables 

In [28]:
train_var = tf.trainable_variables()
print(np.array(train_var))
theta_D = [var for var in train_var if 'discriminator' in var.name]
theta_G = [var for var in train_var if 'generator' in var.name]

[<tf.Variable 'GAN/generator/Gen_W1:0' shape=(100, 128) dtype=float32_ref>
 <tf.Variable 'GAN/generator/Gen_B1:0' shape=(128,) dtype=float32_ref>
 <tf.Variable 'GAN/generator/Gen_W2:0' shape=(128, 784) dtype=float32_ref>
 <tf.Variable 'GAN/generator/Gen_B2:0' shape=(784,) dtype=float32_ref>
 <tf.Variable 'GAN/discriminator/Dis_W1:0' shape=(784, 128) dtype=float32_ref>
 <tf.Variable 'GAN/discriminator/Dis_B1:0' shape=(128,) dtype=float32_ref>
 <tf.Variable 'GAN/discriminator/Dis_W2:0' shape=(128, 1) dtype=float32_ref>
 <tf.Variable 'GAN/discriminator/Dis_B2:0' shape=(1,) dtype=float32_ref>]


In [29]:
g = tf.get_default_graph()
tf.summary.FileWriter("logs", g).close()
show_graph(g)

define optimization operation

In [30]:
with tf.name_scope('Optimization'):
    Dis_optimizer_tf = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(Dis_loss_tf, var_list=theta_D)
    Gen_optimizer_tf = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(Gen_loss_tf, var_list=theta_G)

Training

Organize the generated images into 5x5 patches

In [31]:
def plot(samples):
    fig = plt.figure(figsize=(5, 5))
    gs = gridspec.GridSpec(5, 5)
    gs.update(wspace=0.05, hspace=0.05)

    for i, sample in enumerate(samples):
        ax = plt.subplot(gs[i])
        plt.axis('off')
        ax.set_xticklabels([])
        ax.set_yticklabels([])
        ax.set_aspect('equal')
        plt.imshow(sample.reshape(28, 28), cmap='Greys_r')
    return fig

**Loss function**

- The loss function of a GAN is cross entropy
- The true labels for the discriminator are the real images
- The true labels for the discriminator are the fake images
- We would like to minimize both the loss of the discriminator and the generator

**Optimization**

- We use Adam as our optimization method

**Training**

Tensorflow

In [32]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

merged_summary = tf.summary.merge_all()
writer = tf.summary.FileWriter('tmp/mnist/0')
writer.add_graph(sess.graph)

num_img = 0
if not os.path.exists('out_tf/'):
        os.makedirs('out_tf/')

for epoch in range(num_epochs):
    X_minibatch, _ = mnist.train.next_batch(minibatch_size)
    batch_noise = sample_Z_tf(minibatch_size, g_input_size)
    if epoch % 1000 == 0:
        samples = sess.run(Gen_tf, feed_dict={Z_tf: sample_Z_tf(25, g_input_size)})
        fig = plot(samples)
        plt.savefig('out_tf/{}.png'.format(str(num_img).zfill(3)), bbox_inches='tight')
        num_img += 1
        plt.close(fig)

    _, dis_loss_print = sess.run([Dis_optimizer_tf, Dis_loss_tf],
                                   feed_dict={X_tf: X_minibatch, Z_tf: batch_noise})

    _, gen_loss_print = sess.run([Gen_optimizer_tf, Gen_loss_tf],
                                   feed_dict={Z_tf: batch_noise})

    if epoch % 1000 == 0:
        s = sess.run(merged_summary, feed_dict={X_tf: X_minibatch, Z_tf: batch_noise})
        writer.add_summary(s, epoch)
        print('Epoch-{}; dis_loss_tf: {}; gen_loss_tf: {}'.format(epoch, dis_loss_print, gen_loss_print))

Epoch-0; dis_loss_tf: 1.058664083480835; gen_loss_tf: 4.349091053009033
Epoch-1000; dis_loss_tf: 0.1335725337266922; gen_loss_tf: 5.057978630065918
Epoch-2000; dis_loss_tf: 0.6714189648628235; gen_loss_tf: 2.6645262241363525
Epoch-3000; dis_loss_tf: 0.6968380212783813; gen_loss_tf: 2.1577887535095215
Epoch-4000; dis_loss_tf: 0.8439869284629822; gen_loss_tf: 1.7803955078125
Epoch-5000; dis_loss_tf: 1.060929775238037; gen_loss_tf: 1.240724802017212
Epoch-6000; dis_loss_tf: 1.0403419733047485; gen_loss_tf: 1.4204293489456177
Epoch-7000; dis_loss_tf: 1.0830174684524536; gen_loss_tf: 1.1874723434448242
Epoch-8000; dis_loss_tf: 0.9139600992202759; gen_loss_tf: 1.323685884475708
Epoch-9000; dis_loss_tf: 1.0590968132019043; gen_loss_tf: 1.3762543201446533
Epoch-10000; dis_loss_tf: 1.0897096395492554; gen_loss_tf: 1.338151454925537
Epoch-11000; dis_loss_tf: 1.182374358177185; gen_loss_tf: 1.1691179275512695
Epoch-12000; dis_loss_tf: 1.1625797748565674; gen_loss_tf: 1.3202258348464966
Epoch-1300

KeyboardInterrupt: 

PyTorch

initialization of loss function and optimization

In [34]:
loss = nn.BCELoss() 
ones_label = Variable(torch.ones(minibatch_size))
zeros_label = Variable(torch.zeros(minibatch_size))
Gen_optimizer_pt = optim.Adam(gnet.parameters(), lr= learning_rate)
Dis_optimizer_pt = optim.Adam(dnet.parameters(), lr= learning_rate)

training in pytorch

In [36]:
num_img = 0

if not os.path.exists('out_pt/'):
        os.makedirs('out_pt/')

for epoch in range(num_epochs):
    # sample data
    Z_pt = Variable(z_sampler_pt(minibatch_size, g_input_size))
    X_minibatch_pt, _ = mnist.train.next_batch(minibatch_size)

    X_minibatch_pt = Variable(torch.from_numpy(X_minibatch_pt))  # convert from numpy to tensor

    #reset gradient
    Dis_optimizer_pt.zero_grad()

    Gen_pt = gnet(Z_pt)
    Dis_real_pt = dnet(X_minibatch_pt)
    Dis_fake_pt = dnet(Gen_pt)

    Dis_real_loss_pt = loss(Dis_real_pt, ones_label)
    Dis_fake_loss_pt = loss(Dis_fake_pt, zeros_label)
    Dis_loss_pt = Dis_real_loss_pt + Dis_fake_loss_pt

    Dis_loss_pt.backward(retain_graph=True) #error backpropagation
    Dis_optimizer_pt.step() #make an update step
    
    #reset gradient
    Gen_optimizer_pt.zero_grad()

    Gen_loss_pt = loss(Dis_fake_pt, ones_label)

    Gen_loss_pt.backward(retain_graph=True)
    Gen_optimizer_pt.step()


    # we save images generated by the generator every 1000 epochs
    if epoch % 1000 == 0:
        print(
            'Epoch-{}; dis_loss_pt: {}; gen_loss_pt: {}'.format(epoch, Dis_loss_pt.data.numpy(), Gen_loss_pt.data.numpy()))

        samples_pt = gnet(Z_pt).data.numpy()[:25]

        fig = plot(samples_pt)

        plt.savefig('out_pt/{}.png'.format(str(num_img).zfill(3)), bbox_inches='tight')
        num_img += 1
        plt.close(fig)


  "Please ensure they have the same size.".format(target.size(), input.size()))


Epoch-0; dis_loss_pt: [  7.49839160e-07]; gen_loss_pt: [ 27.06110191]
Epoch-1000; dis_loss_pt: [  1.38491089e-06]; gen_loss_pt: [ 27.215868]


KeyboardInterrupt: 