<a href="https://colab.research.google.com/github/ArghyaPal/Adversarial_Data_Programming/blob/master/Adversarial_Data_Programming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Adversarial Data Programming
Paucity of large, curated, hand-labeled training data is a major bottleneck in deploying machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals, in the form of labeling functions, can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology that generates data together with a curated, aggregated label, given a set of weak labeling functions. In this notebook, we validate ADP on the MNIST dataset. We conducted extensive experiments to study its usefulness and showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets.

**TL;DR:**<br>
Learn the parameters of the joint distribution $P(X, y)$, where $X$ is an image and $y$ its label, using distant supervision signals. In this work, we seek to learn the parameters of the joint distribution $P(X, LFB_{L,\Theta}(X))$, where the labeling function block $LFB_{L,\Theta}(\cdot)$ acts as a proxy for the real label $y$.
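The objective below is our reading of the training loop in this notebook, not necessarily the paper's exact formulation. The notation $\lambda_1$ (the k-means labeling function `lf1`), $\lambda_2$ (the GMM labeling function `lf2`) and $\mathbf{e}_k$ (the one-hot vector for class $k$) is introduced here for illustration. $\tilde{y}$ plays the role of $LFB_{L,\Theta}(\tilde{X})$, and the floor mirrors the `.to(torch.int64)` cast of non-negative values.

$$
\mathcal{L}_D = -\,\mathbb{E}_{(X,y)\sim p_{\text{data}}}\big[\log D(X, \mathbf{e}_y)\big]
- \mathbb{E}_{z\sim\mathcal{N}(0,I)}\big[\log\big(1 - D(\tilde{X}, \mathbf{e}_{\tilde{y}})\big)\big],
\qquad
\mathcal{L}_G = -\,\mathbb{E}_{z\sim\mathcal{N}(0,I)}\big[\log D(\tilde{X}, \mathbf{e}_{\tilde{y}})\big],
$$

$$
(\tilde{X}, \Theta) = G(z), \qquad \Theta = (\theta_1, \theta_2), \qquad
\tilde{y} = \Big\lfloor \sum_{j=1}^{2} \theta_j \, \lambda_j(\tilde{X}) \Big\rfloor .
$$

$\mathcal{L}_G$ is the non-saturating generator loss. It corresponds to `criterion(outputs, real_labels)` in the loop.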

In [1]:
# Run this cell once per runtime
!pip install kmeans-pytorch



In [2]:
import os
import torch
import torchvision
import torch.nn as nn
from torchvision import transforms
from torchvision.utils import save_image
from torch.autograd import Variable
import matplotlib.pyplot as plt
import pylab
import numpy as np
from kmeans_pytorch import kmeans, kmeans_predict

In [3]:
# Decide which device we want to run on
ngpu = 1
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

In [4]:
'''
Hyperparameters used in this project
'''
# Dimension of the Generator's noise vector z, sampled from a Gaussian distribution
latent_size = 64
# Hidden-layer width of the Generator and the Discriminator
hidden_size = 256
# Flattened image size: MNIST images are 28 x 28 = 784 pixels
image_size = 784
# Number of epochs
num_epochs = 300
# Mini-batch size
batch_size = 40
# Adam learning rate
lr = 0.0001

# Number of classes
nclass = 10          # 10 for MNIST; change this for other datasets
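Here is a quick plain-Python check of the schedule these settings imply. It assumes the standard 60,000-image MNIST training split. With `batch_size = 40`, every batch is full, which matters because the training loop reshapes tensors with `batch_size`.

```python
# Sanity check of the training-schedule arithmetic implied by the hyperparameters.
# Assumption: the MNIST training split has 60,000 images.
mnist_train_size = 60_000
batch_size = 40
num_epochs = 300
image_side = 28

image_size = image_side * image_side             # flattened pixels per image
steps_per_epoch = mnist_train_size // batch_size
leftover = mnist_train_size % batch_size          # images that would form a partial last batch
total_steps = steps_per_epoch * num_epochs

print(image_size, steps_per_epoch, leftover, total_steps)
```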

### Please set the number of labeling functions here

In [5]:
theta = 2    # Number of labeling functions; also the output size of the Generator's theta head

### Dataset

In [6]:
sample_dir = 'samples'
save_dir = 'save'

# Create the output directories if they do not exist
os.makedirs(sample_dir, exist_ok=True)
os.makedirs(save_dir, exist_ok=True)

# Image processing: ToTensor maps pixels to [0, 1], then Normalize maps them to [-1, 1]
transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize(mean=(0.5,),   # one value per channel; MNIST is grayscale
                                     std=(0.5,))])

# MNIST dataset
mnist = torchvision.datasets.MNIST(root='./data/',
                                   train=True,
                                   transform=transform,
                                   download=True)

# Data loader; drop_last keeps every batch at exactly batch_size,
# which the training loop assumes when it reshapes tensors
data_loader = torch.utils.data.DataLoader(dataset=mnist,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          drop_last=True)
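The sketch below traces the pixel ranges in plain Python. `Normalize(0.5, 0.5)` maps `x` to `(x - 0.5) / 0.5`, taking `[0, 1]` into `[-1, 1]`, which matches the Generator's `Tanh` output. The notebook's `denorm()` undoes it with `(y + 1) / 2`, clamped to `[0, 1]`. The helper names are our own.

```python
# Pure-Python sketch of the normalization used above and the denorm() used when saving images.

def normalize(x, mean=0.5, std=0.5):
    """What transforms.Normalize does to a single grayscale pixel value."""
    return (x - mean) / std

def denorm(y):
    """Inverse mapping back to [0, 1], clamped like tensor.clamp(0, 1)."""
    out = (y + 1) / 2
    return min(max(out, 0.0), 1.0)

pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [denorm(v) for v in normalized]
print(normalized)
print(restored)
```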

In [7]:

import torch
import numpy as np

from math import pi


class GaussianMixture(torch.nn.Module):
    """
    Fits a mixture of k=1,..,K Gaussians to the input data (K is supplied via n_components). Input tensors are expected to be flat with dimensions (n: number of samples, d: number of features).
    The model then extends them to (n, 1, d).
    The model parametrization (mu, sigma) is stored as (1, k, d), and probabilities are shaped (n, k, 1) if they relate to an individual sample, or (1, k, 1) if they assign membership probabilities to one of the mixture components.
    """
    def __init__(self, n_components, n_features, mu_init=None, var_init=None, eps=1.e-6):
        """
        Initializes the model and brings all tensors into their required shape.
        The class expects data to be fed as a flat tensor in (n, d).
        The class owns:
            x:              torch.Tensor (n, 1, d)
            mu:             torch.Tensor (1, k, d)
            var:            torch.Tensor (1, k, d)
            pi:             torch.Tensor (1, k, 1)
            eps:            float
            n_components:   int
            n_features:     int
            log_likelihood: float
        args:
            n_components:   int
            n_features:     int
        options:
            mu_init:        torch.Tensor (1, k, d)
            var_init:       torch.Tensor (1, k, d)
            eps:            float
        """
        super(GaussianMixture, self).__init__()

        self.n_components = n_components
        self.n_features = n_features

        self.mu_init = mu_init
        self.var_init = var_init
        self.eps = eps

        self.log_likelihood = -np.inf

        self._init_params()


    def _init_params(self):
        if self.mu_init is not None:
            assert self.mu_init.size() == (1, self.n_components, self.n_features), "Input mu_init does not have required tensor dimensions (1, %i, %i)" % (self.n_components, self.n_features)
            # (1, k, d)
            self.mu = torch.nn.Parameter(self.mu_init, requires_grad=False)
        else:
            self.mu = torch.nn.Parameter(torch.randn(1, self.n_components, self.n_features), requires_grad=False)

        if self.var_init is not None:
            assert self.var_init.size() == (1, self.n_components, self.n_features), "Input var_init does not have required tensor dimensions (1, %i, %i)" % (self.n_components, self.n_features)
            # (1, k, d)
            self.var = torch.nn.Parameter(self.var_init, requires_grad=False)
        else:
            self.var = torch.nn.Parameter(torch.ones(1, self.n_components, self.n_features), requires_grad=False)

        # (1, k, 1)
        self.pi = torch.nn.Parameter(torch.Tensor(1, self.n_components, 1), requires_grad=False).fill_(1./self.n_components)

        self.params_fitted = False


    def check_size(self, x):
        if len(x.size()) == 2:
            # (n, d) --> (n, 1, d)
            x = x.unsqueeze(1)

        return x


    def bic(self, x):
        """
        Bayesian information criterion for a batch of samples.
        args:
            x:      torch.Tensor (n, d) or (n, 1, d)
        returns:
            bic:    float
        """
        x = self.check_size(x)
        n = x.shape[0]

        # Free parameters (diagonal covariance): k*d means + k*d variances + (k - 1) mixture weights
        free_params = 2 * self.n_features * self.n_components + self.n_components - 1

        bic = -2. * self.__score(x, sum_data=False).mean() * n + free_params * np.log(n)

        return bic


    def fit(self, x, delta=1e-3, n_iter=100, warm_start=False):
        """
        Fits model to the data.
        args:
            x:          torch.Tensor (n, d) or (n, 1, d)
        options:
            delta:      float
            n_iter:     int
            warm_start: bool
        """
        if not warm_start and self.params_fitted:
            self._init_params()

        x = self.check_size(x)

        i = 0
        j = np.inf

        while (i <= n_iter) and (j >= delta):

            log_likelihood_old = self.log_likelihood
            # Clone, because __em() updates the parameter tensors in place
            mu_old = self.mu.data.clone()
            var_old = self.var.data.clone()

            self.__em(x)
            self.log_likelihood = self.__score(x)

            if torch.isinf(self.log_likelihood) or torch.isnan(self.log_likelihood):
                # When the log-likelihood assumes inane values, reinitialize the model
                self.__init__(self.n_components,
                    self.n_features,
                    mu_init=self.mu_init,
                    var_init=self.var_init,
                    eps=self.eps)

            i += 1
            j = self.log_likelihood - log_likelihood_old

            if j <= delta:
                # When the improvement falls below delta, revert to the previous parameters
                self.__update_mu(mu_old)
                self.__update_var(var_old)

        self.params_fitted = True


    def predict(self, x, probs=False):
        """
        Assigns input data to one of the mixture components by evaluating the likelihood under each.
        If probs=True returns normalized probabilities of class membership.
        args:
            x:          torch.Tensor (n, d) or (n, 1, d)
            probs:      bool
        returns:
            p_k:        torch.Tensor (n, k)
            (or)
            y:          torch.LongTensor (n)
        """
        x = self.check_size(x)

        weighted_log_prob = self._estimate_log_prob(x) + torch.log(self.pi)

        if probs:
            p_k = torch.exp(weighted_log_prob)
            return torch.squeeze(p_k / (p_k.sum(1, keepdim=True)))
        else:
            return torch.squeeze(torch.max(weighted_log_prob, 1)[1].type(torch.LongTensor))


    def predict_proba(self, x):
        """
        Returns normalized probabilities of class membership.
        args:
            x:          torch.Tensor (n, d) or (n, 1, d)
        returns:
            p_k:        torch.Tensor (n, k)
        """
        return self.predict(x, probs=True)


    def score_samples(self, x):
        """
        Computes log-likelihood of samples under the current model.
        args:
            x:          torch.Tensor (n, d) or (n, 1, d)
        returns:
            score:      torch.Tensor (n)
        """
        x = self.check_size(x)

        score = self.__score(x, sum_data=False)
        return score


    def _estimate_log_prob(self, x):
        """
        Returns a tensor with dimensions (n, k, 1), which indicates the log-likelihood that samples belong to the k-th Gaussian.
        args:
            x:            torch.Tensor (n, d) or (n, 1, d)
        returns:
            log_prob:     torch.Tensor (n, k, 1)
        """
        x = self.check_size(x)

        mu = self.mu
        prec = torch.rsqrt(self.var)

        log_p = torch.sum((mu * mu + x * x - 2 * x * mu) * (prec ** 2), dim=2, keepdim=True)
        log_det = torch.sum(torch.log(prec), dim=2, keepdim=True)

        return -.5 * (self.n_features * np.log(2. * pi) + log_p) + log_det


    def _e_step(self, x):
        """
        Computes log-responses that indicate the (logarithmic) posterior belief (sometimes called responsibilities) that a data point was generated by one of the k mixture components.
        Also returns the mean of the per-sample log-likelihoods (as is done in sklearn).
        This is the so-called expectation step of the EM-algorithm.
        args:
            x:              torch.Tensor (n,d) or (n, 1, d)
        returns:
            log_prob_norm:  torch.Tensor (1)
            log_resp:       torch.Tensor (n, k, 1)
        """
        x = self.check_size(x)

        weighted_log_prob = self._estimate_log_prob(x) + torch.log(self.pi)

        log_prob_norm = torch.logsumexp(weighted_log_prob, dim=1, keepdim=True)
        log_resp = weighted_log_prob - log_prob_norm

        return torch.mean(log_prob_norm), log_resp


    def _m_step(self, x, log_resp):
        """
        From the log-probabilities, computes new parameters pi, mu, var (that maximize the log-likelihood). This is the maximization step of the EM-algorithm.
        args:
            x:          torch.Tensor (n, d) or (n, 1, d)
            log_resp:   torch.Tensor (n, k, 1)
        returns:
            pi:         torch.Tensor (1, k, 1)
            mu:         torch.Tensor (1, k, d)
            var:        torch.Tensor (1, k, d)
        """
        x = self.check_size(x)

        resp = torch.exp(log_resp)

        pi = torch.sum(resp, dim=0, keepdim=True) + self.eps
        mu = torch.sum(resp * x, dim=0, keepdim=True) / pi

        x2 = (resp * x * x).sum(0, keepdim=True) / pi
        mu2 = mu * mu
        xmu = (resp * mu * x).sum(0, keepdim=True) / pi
        var = x2 - 2 * xmu + mu2 + self.eps

        pi = pi / x.shape[0]

        return pi, mu, var


    def __em(self, x):
        """
        Performs one iteration of the expectation-maximization algorithm by calling the respective subroutines.
        args:
            x:          torch.Tensor (n, 1, d)
        """
        _, log_resp = self._e_step(x)
        pi, mu, var = self._m_step(x, log_resp)

        self.__update_pi(pi)
        self.__update_mu(mu)
        self.__update_var(var)


    def __score(self, x, sum_data=True):
        """
        Computes the log-likelihood of the data under the model.
        args:
            x:                  torch.Tensor (n, 1, d)
            sum_data:           bool
        returns:
            score:              torch.Tensor (1)
            (or)
            per_sample_score:   torch.Tensor (n)

        """
        weighted_log_prob = self._estimate_log_prob(x) + torch.log(self.pi)
        per_sample_score = torch.logsumexp(weighted_log_prob, dim=1)

        if sum_data:
            return per_sample_score.sum()
        else:
            return torch.squeeze(per_sample_score)


    def __update_mu(self, mu):
        """
        Updates mean to the provided value.
        args:
            mu:         torch.FloatTensor
        """

        assert mu.size() in [(self.n_components, self.n_features), (1, self.n_components, self.n_features)], "Input mu does not have required tensor dimensions (%i, %i) or (1, %i, %i)" % (self.n_components, self.n_features, self.n_components, self.n_features)

        if mu.size() == (self.n_components, self.n_features):
            self.mu.data = mu.unsqueeze(0)
        elif mu.size() == (1, self.n_components, self.n_features):
            self.mu.data = mu


    def __update_var(self, var):
        """
        Updates variance to the provided value.
        args:
            var:        torch.FloatTensor
        """

        assert var.size() in [(self.n_components, self.n_features), (1, self.n_components, self.n_features)], "Input var does not have required tensor dimensions (%i, %i) or (1, %i, %i)" % (self.n_components, self.n_features, self.n_components, self.n_features)

        if var.size() == (self.n_components, self.n_features):
            self.var.data = var.unsqueeze(0)
        elif var.size() == (1, self.n_components, self.n_features):
            self.var.data = var


    def __update_pi(self, pi):
        """
        Updates pi to the provided value.
        args:
            pi:         torch.FloatTensor
        """

        assert pi.size() in [(1, self.n_components, 1)], "Input pi does not have required tensor dimensions (%i, %i, %i)" % (1, self.n_components, 1)

        self.pi.data = pi
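The stdlib-only check below is our own sketch, not part of the original code. It confirms that the expression in `_estimate_log_prob` is the log-density of a Gaussian with diagonal covariance, that is, a sum of independent 1-D normal log-densities. It also counts the free parameters of a diagonal-covariance GMM, for comparison with the count used in `bic`. The function names and test values are hypothetical.

```python
import math

def diag_gauss_log_prob(x, mu, var):
    """The _estimate_log_prob formula, for one sample and one mixture component."""
    d = len(x)
    prec = [1.0 / math.sqrt(v) for v in var]                  # torch.rsqrt(var)
    log_p = sum((m * m + xi * xi - 2 * xi * m) * p ** 2 for xi, m, p in zip(x, mu, prec))
    log_det = sum(math.log(p) for p in prec)
    return -0.5 * (d * math.log(2 * math.pi) + log_p) + log_det

def product_of_1d_gaussians(x, mu, var):
    """Reference value: log of a product of independent 1-D normal densities."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mu, var))

def bic_free_params(k, d):
    """Free parameters of a k-component, d-dimensional GMM with diagonal covariance."""
    return k * d + k * d + (k - 1)   # means + variances + mixture weights (they sum to 1)

x, mu, var = [0.3, -1.2, 2.0], [0.0, -1.0, 1.5], [0.5, 2.0, 1.0]
print(diag_gauss_log_prob(x, mu, var), product_of_1d_gaussians(x, mu, var))
print(bic_free_params(k=10, d=784))
```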

### ADP Labeling Function Block $LFB(y|X, \Theta)$
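`lf1` labels each generated image with the index of its closest k-means centroid, which is what `kmeans_predict` computes with `'euclidean'` distance. Below is a pure-Python sketch of that assignment rule. Toy 2-D points stand in for 784-pixel images, and the helper name is hypothetical. Cluster indices follow whatever order k-means produced, so they need not coincide with MNIST digit classes.

```python
def nearest_centroid(x, centers):
    """Index of the center with the smallest squared Euclidean distance to x."""
    best_idx, best_dist = 0, float("inf")
    for idx, c in enumerate(centers):
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Made-up cluster centers and a toy batch of points
centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
batch = [[0.4, -0.2], [4.6, 5.3], [0.5, 4.0], [3.0, 3.0]]
assignments = [nearest_centroid(x, centers) for x in batch]
print(assignments)
```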

In [8]:
    """
    The labeling functions infer label of an image independently 
    (i.e. independent decision assumption) and the parameter gives relative 
    weight to each of the labeling functions based on their correctness of 
    inferred label for true class y.

    Parameters
    ----------
    X
        Batch of images, torch.Tensor
    theta
        Batch of relative accuracies are weights given to labeling functions 
        based on whether their outputs agree with true class label y of an 
        image X, torch.Tensor
    Returns
    -------
    theta * y
        Batch of approximated labels (in this formulation), torch.Tensor
    """
def lf1(fake_images, theta_1): 
  # fake_images is a batch of generated images from the G_image
  cluster_centers = torch.load('cluster_centers.pt') # please set the path 
                                                                            
  # we use k-means (unsupervised) clustering as one labeling function
  y1 = kmeans_predict(fake_images, cluster_centers, 'euclidean', device=device)
  y1 = y1.view(-1,1).cuda()
  theta_1 = torch.reshape(theta_1, (40, 1))
  return torch.mul(theta_1,y1)

def lf2(fake_images, theta_2):
  model = torch.load('gmm.pt')
  y2 = model.predict(fake_images)
  y2 = y2.view(-1,1).cuda()
  theta_2 = torch.reshape(theta_2, (40, 1))
  return torch.mul(theta_2,y2)
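The pure-Python sketch below shows how the training loop combines the two weighted outputs. `lf1 + lf2` adds θ₁·y₁ and θ₂·y₂, and the `.to(torch.int64)` cast then truncates. Because the softmax weights sum to 1, the result is a weighted average of the two class indices, so it stays between 0 and 9. When the labeling functions disagree, it can land on a class neither predicted. The weighted vote is a hypothetical alternative shown only for comparison; it is not the notebook's method.

```python
def aggregate_like_notebook(labels, weights):
    """Mirror of lf1 + lf2 followed by .to(torch.int64): a weighted sum of class
    indices, truncated toward zero (all terms are non-negative here)."""
    return int(sum(w * y for w, y in zip(weights, labels)))

def weighted_vote(labels, weights, num_classes=10):
    """Hypothetical alternative: add each weight to the class its labeling
    function voted for, then take the arg-max."""
    scores = [0.0] * num_classes
    for w, y in zip(weights, labels):
        scores[y] += w
    return max(range(num_classes), key=lambda c: scores[c])

weights = [0.5, 0.5]   # softmax weights over the two labeling functions
print(aggregate_like_notebook([3, 3], weights))   # the LFs agree
print(aggregate_like_notebook([3, 7], weights))   # the LFs disagree
print(weighted_vote([3, 7], [0.6, 0.4]))
```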

### ADP Generator $G(X, \Theta|z)$

In [9]:
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.G_common = nn.Sequential(
                              nn.Linear(latent_size, hidden_size),
                              nn.ReLU(),
                              nn.Linear(hidden_size, hidden_size),
                              nn.ReLU())
        self.G_image = nn.Sequential(
                              nn.Linear(hidden_size, image_size),
                              nn.Tanh())
        self.G_parameter = nn.Sequential(
                              nn.Linear(hidden_size, theta),
                              # Softmax makes the theta weights non-negative and
                              # sum to 1 over the labeling functions, so each lies in [0, 1]
                              nn.Softmax(dim=1))
    """
    Parameters
    ----------
    z
        Batch of noise vector, torch.Tensor
    Returns
    -------
    X
        Batch of images, torch.Tensor
    theta
        Batch of a Relative accuracies, torch.Tensor
    """
    def forward(self, input):
        common = self.G_common(input)
        image = self.G_image(common)
        theta = self.G_parameter(common)
        return image, theta
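`G_parameter` ends in `nn.Softmax(dim=1)`, so each sample's θ vector is non-negative and sums to 1 over the labeling functions. Below is a stdlib sketch of that per-row normalization. The made-up logits stand in for the output of `nn.Linear(hidden_size, theta)`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row, as nn.Softmax(dim=1) does per sample."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-softmax outputs for a batch of three samples and two labeling functions
batch_logits = [[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]]
batch_theta = [softmax(row) for row in batch_logits]
for row in batch_theta:
    print([round(t, 3) for t in row])
```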

### ADP Discriminator $D(X, y)$


In [10]:
'''
The Discriminator takes a batch of image-label pairs. Each image is either real
or generated, and each label is either the real label or the label inferred by
the Labeling Function Block LFB(.). It maps each pair to a probability score
that estimates how likely the pair is to be a real labeled image.
'''
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # D_image embeds a batch of flattened images
        self.D_image = nn.Sequential(
                              nn.Linear(image_size, hidden_size),
                              nn.LeakyReLU(0.2))
        # D_label embeds a batch of one-hot labels
        self.D_label = nn.Sequential(
                              nn.Linear(nclass, hidden_size),
                              nn.LeakyReLU(0.2))
        # D_common takes the concatenated D_image and D_label embeddings
        # and maps them to a single probability score
        self.D_common = nn.Sequential(
                              nn.Linear(hidden_size*2, hidden_size),
                              nn.LeakyReLU(0.2),
                              nn.Linear(hidden_size, 1),
                              # Sigmoid keeps the output in the range [0, 1]
                              nn.Sigmoid())

    def forward(self, image, label):
        """Estimates the probability that each input image-label pair is a real
        image-label pair.
        Returns a batch of probability scores, i.e. P(X, y), stored as a vector P,
        which is used to compute the loss signals for the Discriminator D(.) and
        the Generator G(.).

        Parameters
        ----------
        image
            Batch of images X, torch.Tensor
        label
            Batch of one-hot labels y, torch.Tensor
        Returns
        -------
        result
            Batch of probabilities, torch.Tensor
        """
        # The batch of images goes through the D_image branch
        image = self.D_image(image)
        # The corresponding labels go through the D_label branch
        label = self.D_label(label)
        # Concatenate the two branch outputs along the feature dimension
        common = torch.cat([image, label], 1)
        # result is the batch of probability scores
        result = self.D_common(common)
        return result
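Here is a back-of-the-envelope parameter count for both networks with the hyperparameters above. Only the `nn.Linear` layers carry parameters; the activations have none. In the notebook, `sum(p.numel() for p in D.parameters())` should give the same discriminator figure.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of nn.Linear(n_in, n_out)."""
    return n_in * n_out + n_out

latent_size, hidden_size, image_size, nclass, n_lfs = 64, 256, 784, 10, 2

generator = (linear_params(latent_size, hidden_size)          # G_common, layer 1
             + linear_params(hidden_size, hidden_size)        # G_common, layer 2
             + linear_params(hidden_size, image_size)         # G_image
             + linear_params(hidden_size, n_lfs))             # G_parameter
discriminator = (linear_params(image_size, hidden_size)       # D_image
                 + linear_params(nclass, hidden_size)         # D_label
                 + linear_params(2 * hidden_size, hidden_size)  # D_common, layer 1
                 + linear_params(hidden_size, 1))             # D_common, output layer
print(generator, discriminator)
```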

In [11]:
# Move the models to the selected device
G = Generator().to(device)
D = Discriminator().to(device)

# Binary cross-entropy loss and one Adam optimizer per network
criterion = nn.BCELoss()
d_optimizer = torch.optim.Adam(D.parameters(), lr=lr)
g_optimizer = torch.optim.Adam(G.parameters(), lr=lr)

def denorm(x):
    out = (x + 1) / 2
    return out.clamp(0, 1)

def reset_grad():
    d_optimizer.zero_grad()
    g_optimizer.zero_grad()
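`nn.BCELoss` with targets 1 and 0 gives the two discriminator terms. The training loop trains G against `real_labels` instead of minimizing log(1 - D(G(z))). The stdlib sketch below shows both and uses the gradient with respect to the discriminator's pre-sigmoid logit to explain the choice. When D confidently rejects a fake, the minimax gradient nearly vanishes while the non-saturating one does not. The logit value -4.6 is an arbitrary illustration.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one prediction p in (0, 1) and target y in {0, 1},
    as nn.BCELoss computes it for a single element."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Discriminator side: a confident, correct D has equally small loss on both terms
print(bce(0.9, 1), bce(0.1, 0))

# Generator side, early in training: D(G(z)) = sigmoid(a) is small
a = -4.6
p = sigmoid(a)
grad_minimax = -p          # d/da log(1 - sigmoid(a)), the minimax generator term
grad_nonsat = -(1 - p)     # d/da [-log sigmoid(a)], i.e. BCE against real_labels
print(round(p, 4), round(grad_minimax, 4), round(grad_nonsat, 4))
```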

In [12]:
# Statistics to be saved
d_losses = np.zeros(num_epochs)
g_losses = np.zeros(num_epochs)
real_scores = np.zeros(num_epochs)
fake_scores = np.zeros(num_epochs)
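The statistics arrays are filled with an incremental mean. Each entry ends up holding the average over all iterations of that epoch without storing the per-iteration values. The stdlib check below confirms that the update rule in the training loop matches the ordinary mean; the loss values are made up.

```python
def running_mean(values):
    """Update rule from the training loop: avg = avg * i/(i+1) + x_i * 1/(i+1),
    starting from avg = 0."""
    avg = 0.0
    for i, x in enumerate(values):
        avg = avg * (i / (i + 1.)) + x * (1. / (i + 1.))
    return avg

losses = [1.2, 0.9, 1.5, 0.6]
print(running_mean(losses), sum(losses) / len(losses))
```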

In [13]:
# Start Adversarial Data Programming training
total_step = len(data_loader)

# To change num_epochs, please visit the hyperparameters section
for epoch in range(num_epochs):
    # Each iteration performs one discriminator update followed by one generator update
    for i, (images, original_labels) in enumerate(data_loader):

        # Read a batch of real MNIST images and flatten them
        images = images.view(batch_size, -1).to(device)
        # Read the corresponding real MNIST labels and convert them to one-hot encoding
        original_labels = torch.nn.functional.one_hot(original_labels, num_classes=nclass)
        original_labels = original_labels.view(batch_size, -1).to(device).float()

        # Targets for the discriminator: real pairs are 1 and fake pairs are 0
        real_labels = torch.ones(batch_size, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1).to(device)

        #######################################################################
        '''
        Train the discriminator.
        The discriminator takes (i) a batch of images and (ii) a batch of labels,
        i.e. outputs = D(images, original_labels).
        We compute the BCE loss on real images, where
        BCE_Loss(x, y) = - y * log(D(x)) - (1 - y) * log(1 - D(x)).
        The second term of the loss is always zero since real_labels == 1.
        '''
        outputs = D(images, original_labels)
        d_loss_real = criterion(outputs, real_labels)
        real_score = outputs
        #######################################################################

        #######################################################################
        '''
        Sample z from an isotropic Gaussian distribution and compute the BCE loss
        on fake images. The first term of the loss is always zero since
        fake_labels == 0.
        '''
        z = torch.randn(batch_size, latent_size).to(device)
        # G returns a batch of images and the labeling-function weights theta
        fake_images, lf_weights = G(z)

        # Labels from the two labeling functions, weighted by theta
        gen_labels_1 = lf1(fake_images, lf_weights[:,0])
        gen_labels_2 = lf2(fake_images, lf_weights[:,1])

        # Aggregate the weighted labels and convert them to one-hot encoding.
        # Note: the int64 cast is not differentiable, so no gradient reaches theta through it.
        gen_labels = gen_labels_1 + gen_labels_2
        gen_labels = torch.nn.functional.one_hot(gen_labels.to(torch.int64), num_classes=nclass).float()
        gen_labels = gen_labels.view(batch_size, nclass)

        # Detach so that the discriminator update does not backpropagate into G
        outputs = D(fake_images.detach(), gen_labels)
        d_loss_fake = criterion(outputs, fake_labels)
        fake_score = outputs

        # Backprop and optimize
        d_loss = d_loss_real + d_loss_fake
        reset_grad()
        d_loss.backward()
        d_optimizer.step()
        #######################################################################

        '''
        Train the generator
        '''

        # Compute the loss with a fresh batch of fake images
        z = torch.randn(batch_size, latent_size).to(device)
        fake_images, lf_weights = G(z)
        # Labels from the two labeling functions, weighted by theta
        gen_labels_1 = lf1(fake_images, lf_weights[:,0])
        gen_labels_2 = lf2(fake_images, lf_weights[:,1])

        # Aggregate, then one-hot encode (the int64 cast is not differentiable)
        gen_labels = gen_labels_1 + gen_labels_2
        gen_labels = torch.nn.functional.one_hot(gen_labels.to(torch.int64), num_classes=nclass).float()
        gen_labels = gen_labels.view(batch_size, nclass)

        outputs = D(fake_images, gen_labels)

        # We train G to maximize log(D(G(z))) instead of minimizing log(1 - D(G(z))).
        # For the reason, see the last paragraph of Section 3 of https://arxiv.org/pdf/1406.2661.pdf
        g_loss = criterion(outputs, real_labels)

        # Backprop and optimize
        reset_grad()
        g_loss.backward()
        g_optimizer.step()


        # =================================================================== #
        #                          Update Statistics                          #
        # =================================================================== #
        # Running mean over the iterations of the current epoch
        d_losses[epoch] = d_losses[epoch]*(i/(i+1.)) + d_loss.item()*(1./(i+1.))
        g_losses[epoch] = g_losses[epoch]*(i/(i+1.)) + g_loss.item()*(1./(i+1.))
        real_scores[epoch] = real_scores[epoch]*(i/(i+1.)) + real_score.mean().item()*(1./(i+1.))
        fake_scores[epoch] = fake_scores[epoch]*(i/(i+1.)) + fake_score.mean().item()*(1./(i+1.))


        if (i+1) % 200 == 0:
            print('Epoch [{}/{}], Step [{}/{}], d_loss: {:.4f}, g_loss: {:.4f}, D(x): {:.2f}, D(G(z)): {:.2f}'
                  .format(epoch, num_epochs, i+1, total_step, d_loss.item(), g_loss.item(),
                          real_score.mean().item(), fake_score.mean().item()))
            
    # Save real images
    if (epoch+1) == 1:
        images = images.view(images.size(0), 1, 28, 28)
        save_image(denorm(images.data), os.path.join(sample_dir, 'real_images.png'))
    
    # Save sampled images
    fake_images = fake_images.view(fake_images.size(0), 1, 28, 28)
    print(gen_labels)
    save_image(denorm(fake_images.data), os.path.join(sample_dir, 'fake_images-{}.png'.format(epoch+1)))
    
    # Save and plot Statistics
    np.save(os.path.join(save_dir, 'd_losses.npy'), d_losses)
    np.save(os.path.join(save_dir, 'g_losses.npy'), g_losses)
    np.save(os.path.join(save_dir, 'fake_scores.npy'), fake_scores)
    np.save(os.path.join(save_dir, 'real_scores.npy'), real_scores)
    
    plt.figure()
    pylab.xlim(0, num_epochs + 1)
    plt.plot(range(1, num_epochs + 1), d_losses, label='d loss')
    plt.plot(range(1, num_epochs + 1), g_losses, label='g loss')    
    plt.legend()
    plt.savefig(os.path.join(save_dir, 'loss.pdf'))
    plt.close()

    plt.figure()
    pylab.xlim(0, num_epochs + 1)
    pylab.ylim(0, 1)
    plt.plot(range(1, num_epochs + 1), fake_scores, label='fake score')
    plt.plot(range(1, num_epochs + 1), real_scores, label='real score')    
    plt.legend()
    plt.savefig(os.path.join(save_dir, 'accuracy.pdf'))
    plt.close()

    # Save model at checkpoints
    if (epoch+1) % 50 == 0:
        torch.save(G.state_dict(), os.path.join(save_dir, 'G--{}.ckpt'.format(epoch+1)))
        torch.save(D.state_dict(), os.path.join(save_dir, 'D--{}.ckpt'.format(epoch+1)))

# Save the model checkpoints 
torch.save(G.state_dict(), 'G.ckpt')
torch.save(D.state_dict(), 'D.ckpt')
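Both the real MNIST labels and the aggregated labeling-function outputs become `nclass`-dimensional one-hot vectors before they reach `D_label`, whose input layer is 10 units wide. Below is a stdlib sketch of that encoding; the helper name is hypothetical. Like `torch.nn.functional.one_hot`, it does not accept indices outside the class range.

```python
def one_hot(labels, num_classes=10):
    """Pure-Python counterpart of torch.nn.functional.one_hot(..., num_classes).float()."""
    rows = []
    for y in labels:
        if not 0 <= y < num_classes:
            raise ValueError(f"label {y} outside [0, {num_classes})")
        rows.append([1.0 if c == y else 0.0 for c in range(num_classes)])
    return rows

encoded = one_hot([0, 3, 9])
print([row.index(1.0) for row in encoded])
```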