# Generating Realistic Handwritten Digit Images using DCGAN

My mini-project generates realistic handwritten digit images with a Deep Convolutional Generative Adversarial Network (DCGAN) trained on the MNIST dataset. The project involved designing and training both the Generator and Discriminator models using the PyTorch deep learning framework.

## Deep Convolutional GAN (DCGAN)
DCGAN stands for "Deep Convolutional Generative Adversarial Network." It is a generative model used in machine learning and computer vision. A DCGAN consists of two convolutional neural networks trained against each other: a generator, which produces images from random noise, and a discriminator, which tries to tell generated images from real ones.

## Importing the PyTorch Framework and Other Libraries

I import the packages and the dataset class needed to build the GAN.

PyTorch is an open-source deep learning framework for building and training neural networks.

With CUDA support, PyTorch can run tensor computations on NVIDIA GPUs. Convolutions parallelize well on a GPU, so training the DCGAN and generating images is typically much faster than on a CPU.


In [1]:
import torch
from torch import nn
from tqdm.auto import tqdm
from torchvision import transforms
from torchvision.datasets import MNIST
from torchvision.utils import make_grid
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
torch.manual_seed(0) # Set for testing purposes

torch.__version__


'2.0.1+cu118'

## Building the show_tensor_images() function

I created a visualizer function, show_tensor_images, to display the images the GAN generates.

It takes a tensor of images, rescales the values from [-1, 1] to [0, 1], and plots the first num_images images in a grid with five images per row.

In [2]:
def show_tensor_images(image_tensor, num_images=25, size=(1, 28, 28)):
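    '''
    Plot the first num_images images of a tensor in a uniform grid.
    Note: the size argument is currently unused.
    '''
    # Map values from [-1, 1] (the Tanh range) back to [0, 1] for display
    image_tensor = (image_tensor + 1) / 2
    # Detach from the autograd graph and move to the CPU for plotting
    image_unflat = image_tensor.detach().cpu()
    image_grid = make_grid(image_unflat[:num_images], nrow=5)
    # make_grid returns (C, H, W); matplotlib expects (H, W, C)
    plt.imshow(image_grid.permute(1, 2, 0).squeeze())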
    plt.show()
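
As a rough check on what the grid looks like, the sketch below reproduces the size arithmetic that torchvision's make_grid applies with its default padding of 2 pixels. The helper name grid_shape is hypothetical and exists only for this illustration; it uses plain Python rather than calling torchvision. make_grid also expands single-channel images to three channels, which is why the channel count is 3 here.

```python
import math

def grid_shape(num_images, image_hw=(28, 28), nrow=5, padding=2):
    """Hypothetical helper: (channels, height, width) of the grid for 1-channel images."""
    h, w = image_hw
    cols = min(nrow, num_images)          # images per row
    rows = math.ceil(num_images / cols)   # number of rows needed
    # Each cell is padded on one side, plus one extra border of padding.
    height = rows * (h + padding) + padding
    width = cols * (w + padding) + padding
    return (3, height, width)             # grayscale images become 3-channel

print(grid_shape(25))  # (3, 152, 152)
```

With the defaults used in show_tensor_images (25 images, 5 per row), the plotted grid is therefore 152 by 152 pixels.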


## Code: Defining the Generator

The Generator's role is to take random noise as input and generate fake images that resemble the real images from the dataset.

**Generator Class**:
Inherits from nn.Module. Parameters: z_dim (default: 10), im_chan (default: 1 for black-and-white images), hidden_dim (default: 64).

**Generator Architecture**:
Uses nn.Sequential to chain four blocks. Each hidden block applies a transposed convolution, batch normalization, and a ReLU activation. The final block applies a transposed convolution followed by a Tanh activation, so the outputs lie in [-1, 1].

The get_noise function generates random noise vectors of shape (n_samples, z_dim), sampled from a standard normal distribution, as input to the generator.
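
To see how the four blocks turn a 1x1 noise "image" into a 28x28 digit, the sketch below applies the output-size formula for a transposed convolution, out = (in - 1) * stride + kernel_size, to the kernel sizes and strides used in the Generator. The helper conv_transpose_out is a hypothetical name for illustration; it assumes zero padding, no output padding, and dilation 1, which are the nn.ConvTranspose2d defaults.

```python
def conv_transpose_out(size, kernel_size, stride):
    """Output size of a transposed convolution with no padding (illustrative helper)."""
    return (size - 1) * stride + kernel_size

# (kernel_size, stride) of the four generator blocks, in order
blocks = [(3, 2), (4, 1), (3, 2), (4, 2)]

size = 1  # the noise is reshaped to (n_samples, z_dim, 1, 1)
sizes = []
for kernel_size, stride in blocks:
    size = conv_transpose_out(size, kernel_size, stride)
    sizes.append(size)

print(sizes)  # [3, 6, 13, 28]
```

The final size of 28 matches the MNIST image size, which is why the second and fourth blocks override the default kernel size and stride.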


In [3]:
class Generator(nn.Module):
    '''
    Generator Class
    '''
    def __init__(self, z_dim=10, im_chan=1, hidden_dim=64):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        # Build the neural network
        self.gen = nn.Sequential(
            self.make_gen_block(z_dim, hidden_dim * 4),
            self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
            self.make_gen_block(hidden_dim * 2, hidden_dim),
            self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True),
        )

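    def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, final_layer=False):
        # Return one generator block: transposed convolution + batch norm + ReLU,
        # or transposed convolution + Tanh for the final layer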
        if not final_layer:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.ReLU(inplace=True)
            )
        else: # Final Layer
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.Tanh()
            )

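    def unsqueeze_noise(self, noise):
        # Reshape (n_samples, z_dim) noise to (n_samples, z_dim, 1, 1) for the transposed convolutions
        return noise.view(len(noise), self.z_dim, 1, 1)

    def forward(self, noise):
        # Generate images from a batch of noise vectors
        x = self.unsqueeze_noise(noise)
        return self.gen(x)

def get_noise(n_samples, z_dim, device='cpu'):
    # Sample n_samples noise vectors of dimension z_dim from a standard normal distribution
    return torch.randn(n_samples, z_dim, device=device)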

## Testing the Generator Block

The test initializes the Generator (gen) and generates random noise (test_hidden_noise, test_final_noise, test_gen_noise).

It then runs a hidden block, a final block, and finally the entire generator. The goal is to verify that make_gen_block() works with different configurations (kernel sizes, strides, and the final-layer variant).

The assertions check the output shapes and value ranges, so the generator block produces the expected output within the DCGAN architecture.

If all the assertions pass without raising any errors, the unit tests print a success message, indicating that make_gen_block() works correctly.


In [4]:

'''
Testing make_gen_block() function
'''
gen = Generator()
num_test = 100

# Test the hidden block
test_hidden_noise = get_noise(num_test, gen.z_dim)
test_hidden_block = gen.make_gen_block(10, 20, kernel_size=4, stride=1)
test_uns_noise = gen.unsqueeze_noise(test_hidden_noise)
hidden_output = test_hidden_block(test_uns_noise)

# Check that it works with other strides
test_hidden_block_stride = gen.make_gen_block(20, 20, kernel_size=4, stride=2)

test_final_noise = get_noise(num_test, gen.z_dim) * 20
test_final_block = gen.make_gen_block(10, 20, final_layer=True)
test_final_uns_noise = gen.unsqueeze_noise(test_final_noise)
final_output = test_final_block(test_final_uns_noise)

# Test the whole thing:
test_gen_noise = get_noise(num_test, gen.z_dim)
test_uns_gen_noise = gen.unsqueeze_noise(test_gen_noise)
gen_output = gen(test_uns_gen_noise)

Here are the unit tests for the generator block:

In [5]:
# UNIT TESTS
assert tuple(hidden_output.shape) == (num_test, 20, 4, 4)
assert hidden_output.max() > 1
assert hidden_output.min() == 0
assert hidden_output.std() > 0.2
assert hidden_output.std() < 1
assert hidden_output.std() > 0.5

assert tuple(test_hidden_block_stride(hidden_output).shape) == (num_test, 20, 10, 10)

assert final_output.max().item() == 1
assert final_output.min().item() == -1

assert tuple(gen_output.shape) == (num_test, 1, 28, 28)
assert gen_output.std() > 0.5
assert gen_output.std() < 0.8

print("Successfully TESTED!")

Successfully TESTED!


## Discriminator Block

**Discriminator Class**:
Inherits from nn.Module. Parameters: im_chan (default: 1 for black-and-white images), hidden_dim (default: 16).

**Discriminator Architecture**:
Uses nn.Sequential to chain three blocks. Each hidden block applies a convolution, batch normalization, and a LeakyReLU activation with a negative slope of 0.2. The final block is a single convolution with no activation, so the discriminator outputs raw logits.

make_disc_block: Returns the sequence of operations for one discriminator block.

forward: Runs the image through the blocks and flattens the result to one prediction per image, used to classify images as real or fake.

This code defines the discriminator, ready to be trained against the generator to distinguish real images from fake ones.
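
To check that three blocks reduce a 28x28 image to a single prediction, the sketch below applies the output-size formula for a convolution without padding, out = floor((in - kernel_size) / stride) + 1, using the default kernel size of 4 and stride of 2. The helper conv_out is a hypothetical name for illustration and assumes zero padding and dilation 1, which are the nn.Conv2d defaults.

```python
def conv_out(size, kernel_size, stride):
    """Output size of a convolution with no padding (illustrative helper)."""
    return (size - kernel_size) // stride + 1

size = 28  # MNIST images are 28x28
sizes = []
for _ in range(3):  # three blocks, each with kernel_size=4 and stride=2
    size = conv_out(size, kernel_size=4, stride=2)
    sizes.append(size)

print(sizes)  # [13, 5, 1]
```

The last block therefore produces a 1x1 map with one channel, which forward flattens to a single logit per image.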



In [6]:

class Discriminator(nn.Module):
    '''
    Discriminator Class
    '''
    def __init__(self, im_chan=1, hidden_dim=16):
        super(Discriminator, self).__init__()
        self.disc = nn.Sequential(
            self.make_disc_block(im_chan, hidden_dim),
            self.make_disc_block(hidden_dim, hidden_dim * 2),
            self.make_disc_block(hidden_dim * 2, 1, final_layer=True),
        )

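    def make_disc_block(self, input_channels, output_channels, kernel_size=4, stride=2, final_layer=False):
        # Return one discriminator block: convolution + batch norm + LeakyReLU(0.2),
        # or a single convolution for the final layer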
        # Build the neural block
        if not final_layer:
            return nn.Sequential(
                nn.Conv2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.LeakyReLU(0.2, inplace=True)
            )
        else: # Final Layer
            return nn.Sequential(
                nn.Conv2d(input_channels, output_channels, kernel_size, stride)
            )

    def forward(self, image):
        # Run the blocks, then flatten each image's output into a row of logits
        disc_pred = self.disc(image)
        return disc_pred.view(len(disc_pred), -1)

## Testing the Discriminator Block

Initialize the Generator and Discriminator (gen, disc), then use the Generator to create test images (test_images) from random noise.

The test runs a hidden block, a final block, and the entire discriminator. The goal is to verify that make_disc_block() produces the expected output for different configurations.

The assertions check output shapes and value statistics, so the discriminator block operates correctly within the DCGAN architecture.

If all the assertions pass without raising any errors, the unit tests print a success message.
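
One of the checks relies on the LeakyReLU slope of 0.2: negative inputs are scaled by 0.2 while positive inputs pass through unchanged, so for roughly symmetric pre-activations the ratio of the most negative output to the most positive output is close to 0.2. A minimal pure-Python sketch (the leaky_relu helper is written here for illustration and is not the PyTorch implementation):

```python
def leaky_relu(x, negative_slope=0.2):
    """Scale negative inputs by negative_slope; keep non-negative inputs as they are."""
    return x if x >= 0 else negative_slope * x

inputs = [-5.0, -1.0, 0.0, 1.0, 5.0]   # symmetric around zero
outputs = [leaky_relu(x) for x in inputs]

# Ratio of the most negative output to the most positive output
ratio = -min(outputs) / max(outputs)
print(outputs)
print(ratio)
```

This is why the hidden-block test expects the ratio to fall between 0.15 and 0.25.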


In [7]:

'''
Testing make_disc_block() function
'''
num_test = 100

gen = Generator()
disc = Discriminator()
test_images = gen(get_noise(num_test, gen.z_dim))

# Test the hidden block
test_hidden_block = disc.make_disc_block(1, 5, kernel_size=6, stride=3)
hidden_output = test_hidden_block(test_images)

# Test the final block
test_final_block = disc.make_disc_block(1, 10, kernel_size=2, stride=5, final_layer=True)
final_output = test_final_block(test_images)

# Test the whole thing:
disc_output = disc(test_images)

In [8]:
# Test the hidden block
assert tuple(hidden_output.shape) == (num_test, 5, 8, 8)
# Because of the LeakyReLU slope
assert -hidden_output.min() / hidden_output.max() > 0.15
assert -hidden_output.min() / hidden_output.max() < 0.25
assert hidden_output.std() > 0.5
assert hidden_output.std() < 1

# Test the final block

assert tuple(final_output.shape) == (num_test, 10, 6, 6)
assert final_output.max() > 1.0
assert final_output.min() < -1.0
assert final_output.std() > 0.3
assert final_output.std() < 0.6

# Test the whole thing:

assert tuple(disc_output.shape) == (num_test, 1)
assert disc_output.std() > 0.25
assert disc_output.std() < 0.5
print("Successfully tested Discriminator!")

Successfully tested Discriminator!


## Training Parameters and Data Setup

**Parameters**:
criterion: The loss criterion nn.BCEWithLogitsLoss() combines a Sigmoid activation and binary cross-entropy loss in a single operation.

Other settings: z_dim = 64 (dimension of the noise vector), display_step = 500 (steps between visualizations), batch_size = 128, lr = 0.0002 (learning rate), beta_1 = 0.5 and beta_2 = 0.999 (Adam momentum parameters), device ('cuda' if available, otherwise 'cpu').
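
As a rough illustration of what combining a Sigmoid activation with binary cross-entropy means, the pure-Python sketch below computes the loss for a single logit in two ways: by applying the sigmoid first, and with an algebraically equivalent form that avoids overflow for large logits. The function names are hypothetical, and this is a sketch of the idea rather than PyTorch's actual implementation.

```python
import math

def bce_after_sigmoid(logit, target):
    """Sigmoid followed by binary cross-entropy (breaks down for large |logit|)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def bce_with_logits(logit, target):
    """Equivalent stable form: max(x, 0) - x * y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# Both forms agree for moderate logits
for logit, target in [(2.0, 1.0), (2.0, 0.0), (-1.5, 1.0)]:
    print(logit, target, bce_after_sigmoid(logit, target), bce_with_logits(logit, target))

# The stable form still works where the naive one would hit log(0)
print(bce_with_logits(1000.0, 0.0))  # 1000.0
```

Because the loss applies the sigmoid itself, the Discriminator's final layer can output raw logits without an activation.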


**Data Setup**:

1. transform: A composition of transformations applied to the images. It converts the images to tensors and normalizes the pixel values to lie between -1 and 1, matching the range of the generator's Tanh output.

2. dataloader: Creates a DataLoader for the MNIST dataset that shuffles the data during training.


With these parameters and data setup, the DCGAN model is ready for training using the MNIST dataset.
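
The normalization itself is simple arithmetic. ToTensor scales pixel values to [0, 1], and Normalize with mean 0.5 and standard deviation 0.5 then computes (x - 0.5) / 0.5. The sketch below checks the endpoints in plain Python; normalize and denormalize are illustrative helper names, with denormalize mirroring the (image_tensor + 1) / 2 step in show_tensor_images.

```python
def normalize(x, mean=0.5, std=0.5):
    """Map a pixel value from [0, 1] to [-1, 1]."""
    return (x - mean) / std

def denormalize(y):
    """Map a value from [-1, 1] back to [0, 1] for display."""
    return (y + 1) / 2

print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0
print(denormalize(normalize(0.25)))                    # 0.25
```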


In [9]:
criterion = nn.BCEWithLogitsLoss()
z_dim = 64
display_step = 500
batch_size = 128
# A learning rate of 0.0002
lr = 0.0002

# These parameters control the optimizer's momentum
beta_1 = 0.5
beta_2 = 0.999
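# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Transform the image values to be between -1 and 1 (the range of the tanh activation)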
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

dataloader = DataLoader(
    MNIST('.', download=True, transform=transform),
    batch_size=batch_size,
    shuffle=True)




## Initializing the Generator and Discriminator for DCGAN Training

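**Generator Initialization**:
The Generator model (gen) is created from the Generator class with the given z_dim (dimension of the noise vector) and moved to the selected device.

**Discriminator Initialization**:
The Discriminator model (disc) is created from the Discriminator class and moved to the same device.

**Weight Initialization**:
The weights_init function draws the weights of every convolution, transposed convolution, and batch normalization layer from a normal distribution with mean 0 and standard deviation 0.02, and sets the batch normalization biases to 0.

**Optimizer Setup**:
Each model gets its own Adam optimizer (gen_opt, disc_opt) with learning rate lr and momentum parameters (beta_1, beta_2).

With these initializations, the Generator and Discriminator models are ready to be trained on the MNIST dataset.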
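
To make the role of beta_1 and beta_2 more concrete, here is a minimal pure-Python sketch of the Adam update rule for a single scalar parameter, using the learning rate and betas from this project. The adam_step function is a hypothetical helper, and the eps value of 1e-8 is an assumption (it is the default eps of torch.optim.Adam); the sketch omits weight decay and other options.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.0002, beta_1=0.5, beta_2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for m
    v_hat = v / (1 - beta_2 ** t)                 # bias correction for v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
param, m, v = adam_step(param, grad=2.0, m=m, v=v, t=1)
print(param)  # close to 1.0 - lr = 0.9998 on the first step
```

A beta_1 of 0.5, lower than the common 0.9, makes the gradient average react faster to recent gradients.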


In [10]:
gen = Generator(z_dim).to(device)
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr, betas=(beta_1, beta_2))
disc = Discriminator().to(device)
disc_opt = torch.optim.Adam(disc.parameters(), lr=lr, betas=(beta_1, beta_2))

def weights_init(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    if isinstance(m, nn.BatchNorm2d):
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
        torch.nn.init.constant_(m.bias, 0)

gen = gen.apply(weights_init)
disc = disc.apply(weights_init)

## DCGAN Training Loop

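**Training Loop**:
The training loop runs for n_epochs (50 here). In each epoch, the dataloader returns batches of real images, and each batch triggers two steps: updating the Discriminator and then updating the Generator.

**Updating the Discriminator**:
The Generator produces a batch of fake images from random noise. The Discriminator scores the fake images (detached, so no gradients flow into the Generator) against a target of 0 and the real images against a target of 1. The Discriminator loss is the average of the two losses.

**Updating the Generator**:
The Generator produces a new batch of fake images, and the Discriminator scores them. The Generator loss compares these scores against a target of 1, so the Generator is rewarded when the Discriminator classifies its images as real.

**Visualization**: Every display_step iterations, the mean Generator and Discriminator losses are printed along with visualizations of the fake and real images using the show_tensor_images() function.
This training loop iteratively updates the Generator and Discriminator parameters based on the generated fake images and real images from the dataset.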
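
Written out, the two losses computed in the training loop correspond to the expressions below. Here $\sigma$ is the sigmoid function, $D(\cdot)$ is the Discriminator's raw output (a logit), $G(z)$ is the Generator's output for noise $z$, $x_i$ are real images, and $m$ is the batch size. The symbols $L_D$ and $L_G$ are notation introduced here for illustration, and $z'_i$ denotes the second, independent noise batch drawn for the Generator step.

$$
L_D = -\frac{1}{2m}\sum_{i=1}^{m}\Bigl[\log\bigl(1-\sigma(D(G(z_i)))\bigr) + \log\sigma(D(x_i))\Bigr]
$$

$$
L_G = -\frac{1}{m}\sum_{i=1}^{m}\log\sigma\bigl(D(G(z'_i))\bigr)
$$

The first term of $L_D$ is the binary cross-entropy of the fake images against a target of 0, the second is that of the real images against a target of 1, and $L_G$ is the binary cross-entropy of the fake images against a target of 1.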


In [12]:
n_epochs = 50
cur_step = 0
mean_generator_loss = 0
mean_discriminator_loss = 0
for epoch in range(n_epochs):
    # Dataloader returns the batches
    for real, _ in tqdm(dataloader):
        cur_batch_size = len(real)
        real = real.to(device)

        ## Update discriminator ##
        disc_opt.zero_grad()
        fake_noise = get_noise(cur_batch_size, z_dim, device=device)
        fake = gen(fake_noise)
        disc_fake_pred = disc(fake.detach())
        disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
        disc_real_pred = disc(real)
        disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
        disc_loss = (disc_fake_loss + disc_real_loss) / 2

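        # Keep track of the average discriminator loss
        mean_discriminator_loss += disc_loss.item() / display_step
        # Compute gradients (no retain_graph needed: the fake images are detached
        # and the generator step below runs its own forward pass)
        disc_loss.backward()
        # Update the discriminator's parameters
        disc_opt.step()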

        ## Update generator ##
        gen_opt.zero_grad()
        fake_noise_2 = get_noise(cur_batch_size, z_dim, device=device)
        fake_2 = gen(fake_noise_2)
        disc_fake_pred = disc(fake_2)
        gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
        gen_loss.backward()
        gen_opt.step()

        # Keep track of the average generator loss
        mean_generator_loss += gen_loss.item() / display_step

        ## Visualization code ##
        if cur_step % display_step == 0 and cur_step > 0:
            print(f"Epoch {epoch}, step {cur_step}: Generator loss: {mean_generator_loss}, discriminator loss: {mean_discriminator_loss}")
            show_tensor_images(fake)
            show_tensor_images(real)
            mean_generator_loss = 0
            mean_discriminator_loss = 0
        cur_step += 1

