
GenAI-Assignment2: Deep Learning for Vision and NLP

Python 3.8+ · PyTorch 2.5+ · License: MIT

A comprehensive deep learning assignment implementing three state-of-the-art generative AI tasks:

  1. CycleGAN for Face-to-Sketch Translation
  2. Transformer for English-to-Urdu Machine Translation
  3. Diffusion Transformers (SiT) for Image Generation

Author: Muhammad Ibraheem (i212508)
Course: Generative AI
Institution: FAST-NUCES


📋 Table of Contents

  • Overview
  • Project Structure
  • Task 1: CycleGAN Face-Sketch Translation
  • Task 2: English-Urdu Machine Translation
  • Task 3: Diffusion Transformers (SiT)
  • Installation
  • Usage
  • Results
  • References
  • License
  • Acknowledgments


🎯 Overview

This repository contains implementations of three cutting-edge generative AI models:

| Task | Model | Description |
|------|-------|-------------|
| Task 1 | CycleGAN | Unpaired image-to-image translation for photo↔sketch conversion |
| Task 2 | Transformer | Sequence-to-sequence translation for English→Urdu |
| Task 3 | SiT (Scalable Interpolant Transformers) | Diffusion-based image generation on CIFAR-10 |

πŸ“ Project Structure

GenAI-Assignment2/
├── Task1/                          # CycleGAN Face-Sketch Translation
│   ├── model.py                    # Generator & Discriminator architectures
│   ├── train.py                    # Training loop with adversarial loss
│   ├── test.py                     # Inference and evaluation
│   ├── data_loader.py              # Custom dataset loader
│   ├── classifier.py               # Auxiliary classifier
│   ├── gui.py                      # PyQt5 GUI for demo
│   ├── requirements.txt
│   ├── checkpoints/                # Saved models (.pth, .safetensors)
│   └── Data/                       # Train/Val/Test splits
│       ├── train/
│       ├── val/
│       └── test/
│
├── Task2/                          # English-Urdu Translation
│   ├── model.py                    # Transformer architecture
│   ├── train.py                    # Training with teacher forcing
│   ├── evaluate.py                 # BLEU score evaluation
│   ├── evaluate_mbart.py           # mBART fine-tuning evaluation
│   ├── finetune_mbart.py           # Fine-tune mBART for translation
│   ├── dataset.py                  # Parallel corpus loader
│   ├── preprocess.py               # Data preprocessing
│   ├── train_tokenizers.py         # BPE tokenizer training
│   ├── demo.py                     # Interactive translation demo
│   ├── gui.py                      # PyQt5 GUI
│   ├── en_tokenizer.json           # English BPE tokenizer
│   ├── ur_tokenizer.json           # Urdu BPE tokenizer
│   ├── requirements.txt
│   ├── checkpoints/                # Model weights
│   └── Data/                       # Parallel corpus
│       ├── train.en / train.ur
│       ├── val.en / val.ur
│       ├── test.en / test.ur
│       └── umc005-corpus/          # Additional corpora
│
├── Task3/                          # Diffusion Transformers
│   ├── Task3_Diffusion_Transformers.ipynb  # Complete notebook
│   ├── requirements.txt
│   ├── checkpoints/                # SiT model weights
│   ├── data/                       # CIFAR-10 dataset
│   ├── runs/                       # TensorBoard logs
│   └── samples/                    # Generated images
│
├── runs/                           # TensorBoard logs (CycleGAN)
├── samples/                        # Sample outputs
└── README.md

🎨 Task 1: CycleGAN Face-Sketch Translation

Architecture

Implements the CycleGAN architecture from "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" (Zhu et al., 2017).

Generator (ResNet-based):

  • Initial convolution: c7s1-64
  • Downsampling: d128, d256
  • 9 residual blocks: R256 × 9 (sketched below)
  • Upsampling: u128, u64
  • Output: c7s1-3 with Tanh activation

Discriminator (PatchGAN):

  • 70×70 PatchGAN discriminator
  • Instance normalization
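
The c7s1-k / dk / Rk / uk shorthand follows the appendix of the CycleGAN paper. As a concrete reading of one R256 block, here is a minimal PyTorch sketch, assuming reflection padding and instance normalization as in the paper (class and variable names are illustrative, not taken from the repo's model.py):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One R256 block: two 3x3 convs with instance norm and an identity skip."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)  # skip connection around the two convs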

Features

  • ✅ Cycle consistency loss
  • ✅ Adversarial loss (LSGAN)
  • ✅ Identity mapping loss
  • ✅ Learning rate scheduling
  • ✅ Interactive GUI demo
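
A hedged sketch of how the three loss terms above combine for one generator direction (X → Y); the weights lambda_cyc = 10 and lambda_id = 5 are the paper's common defaults and may differ from train.py:

import torch
import torch.nn as nn

mse = nn.MSELoss()  # LSGAN replaces the log-likelihood GAN loss with least squares
l1 = nn.L1Loss()    # used for the cycle-consistency and identity terms

def generator_loss(G, F, D_Y, real_x, real_y, lambda_cyc=10.0, lambda_id=5.0):
    fake_y = G(real_x)
    pred = D_Y(fake_y)
    adv = mse(pred, torch.ones_like(pred))  # fool D_Y into predicting "real"
    cyc = l1(F(fake_y), real_x)             # F(G(x)) should reconstruct x
    idt = l1(G(real_y), real_y)             # G should leave real Y images unchanged
    return adv + lambda_cyc * cyc + lambda_id * idt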

Quick Start

cd Task1
pip install -r requirements.txt

# Train
python train.py

# Test
python test.py

# Launch GUI
python gui.py

🌐 Task 2: English-Urdu Machine Translation

Architecture

Implements the vanilla Transformer from "Attention is All You Need" (Vaswani et al., 2017).

Model Configuration:

  • Encoder: 6 layers, 8 attention heads
  • Decoder: 6 layers, 8 attention heads
  • d_model: 512
  • d_ff: 2048 (feedforward dimension)
  • Dropout: 0.1
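
These settings coincide with torch.nn.Transformer's defaults. As an illustration only (the repo's model.py defines its own implementation), an equivalent module can be instantiated like this, together with the causal mask used for teacher-forced decoding:

import torch.nn as nn

model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
    dropout=0.1,
    batch_first=True,
)

# teacher forcing: the decoder consumes the gold target shifted right,
# with a causal mask that hides future positions
tgt_len = 32  # example target length
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_len)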

Tokenization:

  • BPE (Byte Pair Encoding) tokenizers for both languages
  • Custom-trained on English-Urdu parallel corpus
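
train_tokenizers.py is not reproduced here; a minimal equivalent using the Hugging Face tokenizers library (the 32K vocabulary size comes from the Results section; the special tokens are assumptions):

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=32000,
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
)
tokenizer.train(files=["Data/train.en"], trainer=trainer)  # likewise for train.ur
tokenizer.save("en_tokenizer.json")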

Features

  • ✅ Sinusoidal positional encoding
  • ✅ Multi-head self-attention
  • ✅ Teacher forcing during training
  • ✅ BLEU score evaluation
  • ✅ mBART fine-tuning option
  • ✅ Interactive GUI demo
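
For reference, the sinusoidal encoding sets PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) to the matching cosine; a self-contained sketch:

import math
import torch

def sinusoidal_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed positional encodings from 'Attention is All You Need'."""
    pos = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
    return pe  # added to token embeddings before the first layer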

Quick Start

cd Task2
pip install -r requirements.txt

# Train tokenizers
python train_tokenizers.py

# Train model
python train.py

# Evaluate (BLEU score)
python evaluate.py

# Launch GUI
python gui.py
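
evaluate.py reports BLEU; as a sketch of how corpus-level BLEU can be computed with sacrebleu (the file names are assumptions based on the layout above):

import sacrebleu

with open("predictions.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("Data/test.ur", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams (one per reference set)
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")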

πŸ–ΌοΈ Task 3: Diffusion Transformers (SiT)

Architecture

Implements SiT (Scalable Interpolant Transformers) with REG (Representation Entanglement for Generation), building on recent diffusion-transformer research.

Key Components:

  • Patch embedding for image tokenization
  • Transformer blocks with self-attention
  • Adaptive layer normalization (adaLN)
  • Continuous-time diffusion with DDPM/DDIM sampling
  • DINOv2 integration for representation entanglement
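
As background on the continuous-time formulation: with a linear interpolant x_t = (1 - t)·x_0 + t·ε, the network can be trained to regress the velocity dx_t/dt = ε - x_0. A minimal training-step sketch; the model signature and the exact interpolant are assumptions, and the notebook's schedule may differ:

import torch

def sit_training_step(model, x0: torch.Tensor) -> torch.Tensor:
    """Velocity-matching loss for the linear interpolant x_t = (1-t)*x0 + t*eps."""
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)  # continuous time in (0, 1)
    t_ = t.view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    xt = (1 - t_) * x0 + t_ * eps
    v_target = eps - x0                  # d x_t / d t for this interpolant
    v_pred = model(xt, t)                # assumed signature: model(x, t)
    return torch.mean((v_pred - v_target) ** 2)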

Features

  • ✅ CIFAR-10 subset training (cats & dogs)
  • ✅ REG loss for improved generation
  • ✅ Classifier-free guidance support
  • ✅ DDPM & DDIM sampling
  • ✅ SafeTensors model saving
  • ✅ TensorBoard visualization
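
Classifier-free guidance mixes conditional and unconditional predictions at sampling time; a minimal sketch, assuming the model accepts a class-label tensor and a reserved null label for the unconditional branch:

import torch

@torch.no_grad()
def guided_prediction(model, xt, t, labels, null_label, guidance_scale=4.0):
    """pred = uncond + w * (cond - uncond); w = 1 recovers the conditional model."""
    cond = model(xt, t, labels)
    uncond = model(xt, t, torch.full_like(labels, null_label))
    return uncond + guidance_scale * (cond - uncond)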

Quick Start

cd Task3
pip install -r requirements.txt

# Open and run the notebook
jupyter notebook Task3_Diffusion_Transformers.ipynb

πŸ› οΈ Installation

Prerequisites

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • PyTorch 2.5+ (for CVE-2025-32434 mitigation)
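
The PyTorch 2.5+ pin relates to the torch.load deserialization issue cited above. Independently of the version, checkpoints can be loaded defensively; a sketch with illustrative paths:

import torch
from safetensors.torch import load_file

# weights_only=True restricts unpickling to plain tensors and containers
state_dict = torch.load("checkpoints/model.pth", weights_only=True)

# safetensors files store raw tensors only, so loading never runs pickled code
state_dict = load_file("checkpoints/model.safetensors")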

Setup

# Clone the repository
git clone https://github.com/thewitcher41/GenAI-Assignment2.git
cd GenAI-Assignment2

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

# Install dependencies for each task
pip install -r Task1/requirements.txt
pip install -r Task2/requirements.txt
pip install -r Task3/requirements.txt

🚀 Usage

Training Models

# Task 1: CycleGAN
cd Task1 && python train.py

# Task 2: Transformer Translation
cd Task2 && python train.py

# Task 3: Diffusion Transformer
# Open and run Task3_Diffusion_Transformers.ipynb

GUI Applications

Both Task 1 and Task 2 include interactive PyQt5 GUIs:

# Face-Sketch GUI
python Task1/gui.py

# Translation GUI
python Task2/gui.py

TensorBoard

Monitor training progress:

tensorboard --logdir=runs
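
The runs/ directories are written with torch.utils.tensorboard; a minimal logging sketch (the tag name and scalar value are illustrative):

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")
for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder scalar
    writer.add_scalar("train/loss", loss, step)
writer.close()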

📊 Results

Task 1: Face-Sketch Translation

| Metric | Value |
|--------|-------|
| Cycle consistency loss | Low |
| Visual quality | High-fidelity sketch generation |

Task 2: English-Urdu Translation

| Metric | Value |
|--------|-------|
| BLEU score | See predictions.txt |
| Tokenizer | BPE with 32K vocab |

Task 3: Diffusion Transformer

| Metric | Value |
|--------|-------|
| Dataset | CIFAR-10 (cats & dogs) |
| Sampling | DDPM & DDIM |

📚 References

  1. CycleGAN: Zhu, J.Y., et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks." ICCV 2017.

  2. Transformer: Vaswani, A., et al. "Attention is All You Need." NeurIPS 2017.

  3. Diffusion Models: Ho, J., et al. "Denoising Diffusion Probabilistic Models." NeurIPS 2020.

  4. DiT: Peebles, W., & Xie, S. "Scalable Diffusion Models with Transformers." ICCV 2023.

  5. REG: "Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think."


πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • FAST-NUCES for the course structure
  • PyTorch team for the deep learning framework
  • Hugging Face for Transformers library
  • Original paper authors for their groundbreaking research

Made with ❤️ for the Generative AI course
