<h1 style="text-align:center;font-weight: bold">DCGAN (Deep Convolutional GAN)</h1>
    <h3 style="text-align:left;font-weight: bold">DCGAN is a type of GAN that uses deep convolutional networks for both the generator and the discriminator. It was introduced by Radford et al. in 2015 and established architectural guidelines that made GAN training more stable. Key features include strided convolutions instead of pooling, batch normalization, ReLU activations in the generator, and LeakyReLU activations in the discriminator.</h3>


# What Changed

This notebook has been converted from a hybrid architecture to a **standard DCGAN implementation**:

## 1. Generator Changes

- **Before**: ResNet residual-block architecture
- **Now**: standard DCGAN generator built from ConvTranspose2d layers
- **Architecture**: latent(100) → 4×4 → 8×8 → 16×16 → 32×32 → 64×64

## 2. Discriminator Changes

- **Before**: Spectral Normalization
- **Now**: standard DCGAN discriminator using BatchNorm2d
- **Architecture**: 64×64 → 32×32 → 16×16 → 8×8 → 4×4 → 1

## 3. Loss Function Changes

- **Before**: WGAN-GP loss (Wasserstein distance + gradient penalty)
- **Now**: Binary Cross Entropy loss (standard GAN loss)

## 4. Training Loop Changes

- **Before**: n_critic training schedule with gradient penalty
- **Now**: standard DCGAN training; D and G are each updated once per batch

## 5. Hyperparameter Changes

- Batch size: 16 → 128
- Learning rate: unchanged at 0.0002 (DCGAN recommended value)
- Beta1: 0.5 (DCGAN recommended value)
- Latent dim: 128 → 100 (DCGAN standard)


In [None]:
!pip install -q torchsummary  # must be installed before the torchsummary import below
import os
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torchvision.utils import save_image
from torchvision.utils import make_grid
from torch.utils.tensorboard import SummaryWriter
from torchsummary import summary
import datetime
import matplotlib.pyplot as plt

In [None]:
!pip install torchsummary
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
torch.manual_seed(1)

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch_size = 128
learning_rate = 0.0002
num_epochs = 200
image_size = 64
latent_dim = 100
ngf = 64  # Number of generator filters
ndf = 64  # Number of discriminator filters
nc = 1  # Number of channels (1 for grayscale, 3 for RGB)

In [None]:
import os
import shutil

# Source directory with your images
source_dir = "/kaggle/input/efficientnet-data/my_label_data/0"
target_dir = "./train/data"

# Create target directory
os.makedirs(target_dir, exist_ok=True)

# Copy images to training directory
# Adjust these paths according to your actual data location
if os.path.exists(source_dir):
    shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)
    print(f"Copied images from {source_dir}")
else:
    print(f"Warning: Source directory {source_dir} not found")

# Count images
image_count = sum(
    1
    for file_name in os.listdir(target_dir)
    if file_name.lower().endswith((".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".tif"))
)

print(f"Total images in training directory: {image_count}")

In [None]:
train_transform = transforms.Compose(
    [
        transforms.Resize((image_size, image_size)),
        transforms.Grayscale(num_output_channels=1),  # Convert to grayscale
        transforms.ToTensor(),
        transforms.Normalize([0.5], [0.5]),  # Normalize to [-1, 1]
    ]
)

train_dataset = datasets.ImageFolder(root="./train", transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=4
)

print(f"Dataset size: {len(train_dataset)}")
print(f"Number of batches: {len(train_loader)}")

<h1 style="text-align:center;font-weight: bold;">Exploratory Data Analysis</h1>


In [None]:
import torch
import matplotlib.pyplot as plt
from torchvision.utils import make_grid


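# Display a batch of images as a single grid
def show_images(images, nrow=8, figsize=(10, 10)):
    # Create the figure and hide the axis ticks
    fig, ax = plt.subplots(figsize=figsize)
    ax.set_xticks([])  # hide x-axis ticks
    ax.set_yticks([])  # hide y-axis ticks

    # Tile the images into a grid. normalize=True rescales the [-1, 1] tensors to [0, 1]
    # so imshow does not clip them; then permute (channels, height, width) -> (height, width, channels)
    grid_img = make_grid(images, nrow=nrow, normalize=True).permute(1, 2, 0).cpu().numpy()

    # Show the grid
    ax.imshow(grid_img)
    plt.show()


# Display the first n_images from one batch of a DataLoader
def show_batch(dl, n_images=64, nrow=8):
    for images, _ in dl:
        # Keep only the first n_images images
        images = images[:n_images]
        show_images(images, nrow=nrow)
        break  # only show a single batch


# Preview images from train_loader
show_batch(train_loader, n_images=64, nrow=8)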

In [None]:
# These parameters are now defined in the configuration cell above
print(f"Image channels: {nc}")
print(f"Image size: {image_size}x{image_size}")
print(f"Latent dimension: {latent_dim}")

<h2 style="text-align:center;font-weight: bold;">Initializing Weights</h2>


In [None]:
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        torch.nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        torch.nn.init.normal_(m.weight, 1.0, 0.02)
        torch.nn.init.zeros_(m.bias)
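`Module.apply` calls `weights_init` recursively on every submodule, so each `Conv*` and `BatchNorm*` layer is matched by its class name. A quick sanity check of the resulting statistics follows. It is a minimal sketch on a small stand-in network (hypothetical, not this notebook's Generator or Discriminator), with `weights_init` repeated so the cell runs on its own:

```python
import torch
import torch.nn as nn


def weights_init(m):
    # Same rule as above: Conv* weights ~ N(0, 0.02); BatchNorm* weights ~ N(1, 0.02), bias = 0
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find("BatchNorm") != -1:
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.zeros_(m.bias)


torch.manual_seed(0)
# Hypothetical stand-in network; "Conv" also matches ConvTranspose2d
net = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, 1, 0, bias=False),
    nn.BatchNorm2d(256),
    nn.Conv2d(256, 128, 4, 2, 1, bias=False),
)
net.apply(weights_init)

conv_std = net[0].weight.std().item()
bn_mean = net[1].weight.mean().item()
bn_bias_max = net[1].bias.abs().max().item()
print(f"ConvTranspose2d weight std: {conv_std:.4f}")
print(f"BatchNorm2d weight mean: {bn_mean:.4f}, max |bias|: {bn_bias_max}")
```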

## Key DCGAN Architecture Guidelines

Following the DCGAN paper (Radford et al., 2015):

1. **Replace pooling layers** with strided convolutions (discriminator) and fractional-strided/transposed convolutions (generator)
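2. **Use batchnorm** in both the generator and the discriminator, except at the generator's output layer and the discriminator's input layer (the paper reports that applying it there caused sample oscillation and instability)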
3. **Remove fully connected hidden layers** for deeper architectures
4. **Use ReLU activation** in generator for all layers except the output, which uses Tanh
5. **Use LeakyReLU activation** in the discriminator for all layers
6. **Initialize weights** from a Normal distribution with mean=0, std=0.02
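Guideline 1 is what drives the spatial sizes in both networks. With dilation 1 and no output padding, `Conv2d` produces `floor((H + 2p - k) / s) + 1` and `ConvTranspose2d` produces `(H - 1) * s - 2p + k`. A minimal plain-Python sketch that applies these formulas to the (kernel, stride, padding) settings used below reproduces the 1 → 64 and 64 → 1 progressions:

```python
def conv2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of nn.Conv2d along one spatial dimension (dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of nn.ConvTranspose2d along one spatial dimension (dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


# Generator: (kernel, stride, padding) of each ConvTranspose2d layer
g_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
size = 1  # the latent vector enters as latent_dim x 1 x 1
g_sizes = [size]
for k, s, p in g_layers:
    size = conv_transpose2d_out(size, k, s, p)
    g_sizes.append(size)
print("Generator:", " -> ".join(map(str, g_sizes)))  # 1 -> 4 -> 8 -> 16 -> 32 -> 64

# Discriminator: (kernel, stride, padding) of each Conv2d layer
d_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]
size = 64
d_sizes = [size]
for k, s, p in d_layers:
    size = conv2d_out(size, k, s, p)
    d_sizes.append(size)
print("Discriminator:", " -> ".join(map(str, d_sizes)))  # 64 -> 32 -> 16 -> 8 -> 4 -> 1
```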


<h2 style="text-align:center;font-weight: bold;">Generator Network</h2>


In [None]:
class Generator(nn.Module):
    """
    DCGAN Generator
    Input: latent vector z of dimension (latent_dim, 1, 1)
    Output: Generated image of size (nc, 64, 64)
    """

    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            # Input is latent_dim x 1 x 1
            nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # State size: (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # State size: (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # State size: (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # State size: (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),
            # State size: (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)

In [None]:
generator = Generator().to(device)
generator.apply(weights_init)
print(generator)

In [None]:
summary(generator, (latent_dim, 1, 1))

<h2 style="text-align:center;font-weight: bold;">Discriminator Network</h2>


In [None]:
class Discriminator(nn.Module):
    """
    DCGAN Discriminator
    Input: Image of size (nc, 64, 64)
    Output: One probability (of being real) per image, shape (batch_size,)
    """

    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            # Input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # State size: (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # State size: (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # State size: (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # State size: (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
            # State size: 1 x 1 x 1
        )

    def forward(self, input):
        return self.main(input).view(-1, 1).squeeze(1)
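The discriminator ends in `nn.Sigmoid()` and is paired with `nn.BCELoss`. PyTorch also provides `nn.BCEWithLogitsLoss`, which applies the sigmoid inside the loss in a numerically stable form. The sketch below uses random stand-in logits, not this notebook's model. It shows that the two agree for moderate values but diverge once the sigmoid saturates, where `BCELoss` clamps its log term at -100. Switching would mean removing the final `nn.Sigmoid()`; this notebook keeps the original DCGAN pairing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8) * 3  # stand-in for the last Conv2d output before nn.Sigmoid
labels = torch.ones(8)       # e.g. real_label = 1.0 for every sample

# This notebook's pairing: Sigmoid inside the model, BCELoss on probabilities
loss_sigmoid_bce = nn.BCELoss()(torch.sigmoid(logits), labels)

# Alternative: raw logits, with the sigmoid fused into the loss
loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels)
print(loss_sigmoid_bce.item(), loss_with_logits.item())  # nearly identical

# A saturated case: sigmoid(-200) underflows to 0 in float32
extreme = torch.tensor([-200.0])
clamped = nn.BCELoss()(torch.sigmoid(extreme), torch.ones(1)).item()  # log clamped, loss = 100
exact = nn.BCEWithLogitsLoss()(extreme, torch.ones(1)).item()         # true loss = 200
print(clamped, exact)
```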

In [None]:
discriminator = Discriminator().to(device)
discriminator.apply(weights_init)
print(discriminator)

In [None]:
summary(discriminator, (nc, 64, 64))

In [None]:
# Use Binary Cross Entropy Loss for DCGAN
criterion = nn.BCELoss()

## Loss Functions for DCGAN

In DCGAN, we use Binary Cross Entropy (BCE) loss:

- **Discriminator**: Tries to classify real images as 1 and fake images as 0
- **Generator**: Tries to fool the discriminator by making it classify fake images as 1


## Labels for Training

- Real images are labeled as 1
- Fake (generated) images are labeled as 0
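With BCE defined as $\ell(p, y) = -[y \log p + (1 - y)\log(1 - p)]$, the labels above reduce it to the familiar GAN terms. Label 1 on real images gives $-\log D(x)$ and label 0 on fakes gives $-\log(1 - D(G(z)))$. The generator's "fake labels are real" trick gives $-\log D(G(z))$, the non-saturating generator loss. The sketch below checks this numerically with made-up discriminator outputs (illustrative values only):

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
real_label, fake_label = 1.0, 0.0

# Hypothetical discriminator outputs (probabilities), for illustration only
d_real = torch.tensor([0.9, 0.8, 0.7])  # D(x) on real images
d_fake = torch.tensor([0.2, 0.1, 0.3])  # D(G(z)) on generated images

# Discriminator: real images labeled 1, fake images labeled 0
errD_real = criterion(d_real, torch.full_like(d_real, real_label))
errD_fake = criterion(d_fake, torch.full_like(d_fake, fake_label))

# Generator: fake images labeled 1 (non-saturating loss)
errG = criterion(d_fake, torch.full_like(d_fake, real_label))

# The same quantities written out explicitly
assert torch.allclose(errD_real, -torch.log(d_real).mean(), atol=1e-6)
assert torch.allclose(errD_fake, -torch.log(1 - d_fake).mean(), atol=1e-6)
assert torch.allclose(errG, -torch.log(d_fake).mean(), atol=1e-6)
print(f"errD = {(errD_real + errD_fake).item():.4f}, errG = {errG.item():.4f}")
```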


In [None]:
# Fixed noise for visualization
fixed_noise = torch.randn(64, latent_dim, 1, 1, device=device)

# Labels
real_label = 1.0
fake_label = 0.0

In [None]:
# Setup Adam optimizers for both G and D
# Using beta1=0.5 as suggested in DCGAN paper
optimizerD = optim.Adam(
    discriminator.parameters(), lr=learning_rate, betas=(0.5, 0.999)
)
optimizerG = optim.Adam(generator.parameters(), lr=learning_rate, betas=(0.5, 0.999))

<h2 style="text-align:center;font-weight: bold;">Training our network</h2>


In [None]:
import torch
import os
from torchvision.utils import save_image

# Lists to keep track of progress
G_losses = []
D_losses = []
img_list = []
iters = 0

# Create directories for saving results
os.makedirs("./dcgan_weights", exist_ok=True)
os.makedirs("./dcgan_images", exist_ok=True)

print("Starting Training Loop...")
print("-" * 50)

for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(train_loader):
        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        discriminator.zero_grad()
        real_images = real_images.to(device)
        b_size = real_images.size(0)  # the last batch may be smaller; don't overwrite the global batch_size
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)

        # Forward pass real batch through D
        output = discriminator(real_images)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, latent_dim, 1, 1, device=device)
        # Generate fake image batch with G
        fake = generator(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = discriminator(fake.detach())
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Add the gradients from the all-real and all-fake batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        generator.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = discriminator(fake)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Output training stats
        if i % 50 == 0:
            print(
                f"[{epoch}/{num_epochs}][{i}/{len(train_loader)}] "
                f"Loss_D: {errD.item():.4f} Loss_G: {errG.item():.4f} "
                f"D(x): {D_x:.4f} D(G(z)): {D_G_z1:.4f} / {D_G_z2:.4f}"
            )

        iters += 1

    # Check how the generator is doing by saving G's output on fixed_noise
    if (epoch % 10 == 0) or (epoch == num_epochs - 1):
        with torch.no_grad():
            fake = generator(fixed_noise).detach().cpu()
        save_image(
            fake,
            f"./dcgan_images/fake_samples_epoch_{epoch:03d}.png",
            normalize=True,
            nrow=8,
        )

    # Save model checkpoints
    if (epoch % 50 == 0) or (epoch == num_epochs - 1):
        torch.save(
            generator.state_dict(), f"./dcgan_weights/generator_epoch_{epoch}.pth"
        )
        torch.save(
            discriminator.state_dict(),
            f"./dcgan_weights/discriminator_epoch_{epoch}.pth",
        )

print("Training Complete!")

In [None]:
# Plot the training losses
plt.figure(figsize=(10, 5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses, label="G")
plt.plot(D_losses, label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.show()

<h1 style="text-align:center;font-weight: bold">Outputting Results</h1>


In [None]:
def getImagePaths(path):
    """Get all image file paths from a directory"""
    image_names = []
    for dirname, _, filenames in os.walk(path):
        for filename in filenames:
            if filename.lower().endswith((".png", ".jpg", ".jpeg")):
                fullpath = os.path.join(dirname, filename)
                image_names.append(fullpath)
    return sorted(image_names)

In [None]:
import cv2
import math
import matplotlib.pyplot as plt


def display_multiple_img(images_paths):
    # Compute a near-square grid: cols = ceil(sqrt(n)), rows = ceil(n / cols)
    num_images = len(images_paths)
    if num_images == 0:
        print("No images to display.")
        return
    cols = int(math.ceil(math.sqrt(num_images)))
    rows = int(math.ceil(num_images / cols))

    # Scale the figure with the grid (3 inches per subplot); squeeze=False keeps ax
    # a 2-D array even for a single image, so ax.ravel() always works
    figure, ax = plt.subplots(
        nrows=rows, ncols=cols, figsize=(cols * 3, rows * 3), squeeze=False
    )

    # Read and display each image
    for ind, image_path in enumerate(images_paths):
        try:
            image = cv2.imread(image_path)  # OpenCV loads images in BGR order
            if image is None:
                raise ValueError(f"Image at {image_path} could not be loaded.")

            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert to RGB for matplotlib

            ax.ravel()[ind].imshow(image)  # show the image
            ax.ravel()[ind].set_axis_off()  # hide the axes
        except Exception as e:
            print(f"Error displaying image at {image_path}: {e}")

    # Hide unused subplots when the grid has more cells than images
    for i in range(num_images, rows * cols):
        ax.ravel()[i].set_visible(False)

    plt.tight_layout(pad=2.0)  # add padding between subplots
    plt.show()
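The grid arithmetic in `display_multiple_img` picks `cols = ceil(sqrt(n))` and then just enough rows to hold every image. A minimal sketch with a hypothetical `grid_shape` helper shows the resulting layouts. At 3 inches per subplot, the 600 images produced by the optional cell further down yield a 24 × 25 grid, which makes for a very large figure.

```python
import math


def grid_shape(num_images: int) -> tuple[int, int]:
    """Hypothetical helper: (rows, cols) of the near-square grid used for display."""
    if num_images <= 0:
        return 0, 0
    cols = math.ceil(math.sqrt(num_images))
    rows = math.ceil(num_images / cols)
    return rows, cols


for n in (1, 7, 20, 100, 600):
    rows, cols = grid_shape(n)
    print(f"{n:>3} images -> {rows} x {cols} grid ({rows * cols - n} hidden subplots)")
```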

In [None]:
display_multiple_img(getImagePaths("./dcgan_images"))

In [None]:
# Generate images using the trained generator
generated_images_dir = "./dcgan_generated"
os.makedirs(generated_images_dir, exist_ok=True)

# Load the final generator checkpoint (saved after the last epoch)
generator_eval = Generator().to(device)
generator_eval.load_state_dict(
    torch.load(f"./dcgan_weights/generator_epoch_{num_epochs - 1}.pth", map_location=device)
)
generator_eval.eval()

print("Generating images...")
num_images = 100
with torch.no_grad():
    for i in range(num_images):
        noise = torch.randn(1, latent_dim, 1, 1, device=device)
        fake_image = generator_eval(noise)
        save_path = os.path.join(generated_images_dir, f"generated_{i:04d}.png")
        save_image(fake_image, save_path, normalize=True)

print(f"Generated {num_images} images in {generated_images_dir}")

In [None]:
display_multiple_img(getImagePaths(generated_images_dir))

## Generate More Images (Optional)

You can generate more images by running the code below:


In [None]:
# Generate a larger batch of images
num_additional = 500
print(f"Generating {num_additional} more images...")

with torch.no_grad():
    for i in range(num_additional):
        noise = torch.randn(1, latent_dim, 1, 1, device=device)
        fake_image = generator_eval(noise)
        # Continue numbering after the num_images files saved by the previous cell
        save_path = os.path.join(generated_images_dir, f"generated_{num_images + i:04d}.png")
        save_image(fake_image, save_path, normalize=True)

print(f"Total images generated: {num_images + num_additional}")

## Summary

This notebook implements a standard DCGAN (Deep Convolutional GAN) with the following key components:

1. **Generator**: Uses transposed convolutions to upsample from a latent vector (100-dim) to a 64x64 image
2. **Discriminator**: Uses strided convolutions to downsample a 64x64 image to a single probability score
3. **Training**: Uses Binary Cross-Entropy loss for both networks
4. **Architecture Guidelines** (following DCGAN paper):
   - Replace pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator)
   - Use batch normalization in both networks
   - Remove fully connected hidden layers
   - Use ReLU activation in generator (except output layer which uses Tanh)
   - Use LeakyReLU activation in discriminator

Key differences from other GAN variants:

- **Not using WGAN-GP**: No gradient penalty or Wasserstein distance
- **Not using Spectral Normalization**: Standard batch normalization instead
- **BCE Loss**: Classic GAN loss function, not hinge loss or other variants
