# GenAI-Assignment2

A comprehensive deep learning assignment implementing three state-of-the-art generative AI tasks:

- CycleGAN for Face-to-Sketch Translation
- Transformer for English-to-Urdu Machine Translation
- Diffusion Transformers (SiT) for Image Generation

- **Author:** Muhammad Ibraheem (i212508)
- **Course:** Generative AI
- **Institution:** FAST-NUCES
## Table of Contents

- Overview
- Project Structure
- Task 1: CycleGAN Face-Sketch Translation
- Task 2: English-Urdu Machine Translation
- Task 3: Diffusion Transformers
- Installation
- Usage
- Results
- References
## Overview

This repository contains implementations of three cutting-edge generative AI models:
| Task | Model | Description |
|---|---|---|
| Task 1 | CycleGAN | Unpaired image-to-image translation for photo↔sketch conversion |
| Task 2 | Transformer | Sequence-to-sequence translation for English→Urdu |
| Task 3 | SiT (Scalable Interpolant Transformer) | Diffusion-based image generation on CIFAR-10 |
## Project Structure

```
GenAI-Assignment2/
├── Task1/                      # CycleGAN Face-Sketch Translation
│   ├── model.py                # Generator & Discriminator architectures
│   ├── train.py                # Training loop with adversarial loss
│   ├── test.py                 # Inference and evaluation
│   ├── data_loader.py          # Custom dataset loader
│   ├── classifier.py           # Auxiliary classifier
│   ├── gui.py                  # PyQt5 GUI for demo
│   ├── requirements.txt
│   ├── checkpoints/            # Saved models (.pth, .safetensors)
│   └── Data/                   # Train/Val/Test splits
│       ├── train/
│       ├── val/
│       └── test/
│
├── Task2/                      # English-Urdu Translation
│   ├── model.py                # Transformer architecture
│   ├── train.py                # Training with teacher forcing
│   ├── evaluate.py             # BLEU score evaluation
│   ├── evaluate_mbart.py       # mBART fine-tuning evaluation
│   ├── finetune_mbart.py       # Fine-tune mBART for translation
│   ├── dataset.py              # Parallel corpus loader
│   ├── preprocess.py           # Data preprocessing
│   ├── train_tokenizers.py     # BPE tokenizer training
│   ├── demo.py                 # Interactive translation demo
│   ├── gui.py                  # PyQt5 GUI
│   ├── en_tokenizer.json       # English BPE tokenizer
│   ├── ur_tokenizer.json       # Urdu BPE tokenizer
│   ├── requirements.txt
│   ├── checkpoints/            # Model weights
│   └── Data/                   # Parallel corpus
│       ├── train.en / train.ur
│       ├── val.en / val.ur
│       ├── test.en / test.ur
│       └── umc005-corpus/      # Additional corpora
│
├── Task3/                      # Diffusion Transformers
│   ├── Task3_Diffusion_Transformers.ipynb   # Complete notebook
│   ├── requirements.txt
│   ├── checkpoints/            # SiT model weights
│   ├── data/                   # CIFAR-10 dataset
│   ├── runs/                   # TensorBoard logs
│   └── samples/                # Generated images
│
├── runs/                       # TensorBoard logs (CycleGAN)
├── samples/                    # Sample outputs
└── README.md
```
## Task 1: CycleGAN Face-Sketch Translation

Implements the CycleGAN architecture from "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks" (Zhu et al., 2017).
Generator (ResNet-based):
- Initial convolution: c7s1-64
- Downsampling: d128, d256
- 9 Residual blocks: R256×9
- Upsampling: u128, u64
- Output: c7s1-3 with Tanh activation
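As a reading aid, here is a minimal PyTorch sketch of that generator layout; class names and defaults are illustrative, and the actual implementation lives in Task1/model.py:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """R256: two 3x3 convs with instance norm and a skip connection."""
    def __init__(self, channels=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

class ResNetGenerator(nn.Module):
    """c7s1-64, d128, d256, 9 x R256, u128, u64, c7s1-3 with Tanh."""
    def __init__(self, in_ch=3, out_ch=3, n_blocks=9):
        super().__init__()
        layers = [  # c7s1-64
            nn.ReflectionPad2d(3), nn.Conv2d(in_ch, 64, 7),
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
        ]
        # Downsampling: d128, d256
        for ch in (128, 256):
            layers += [nn.Conv2d(ch // 2, ch, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(ch), nn.ReLU(inplace=True)]
        # 9 residual blocks at 256 channels
        layers += [ResidualBlock(256) for _ in range(n_blocks)]
        # Upsampling: u128, u64
        for ch in (128, 64):
            layers += [nn.ConvTranspose2d(ch * 2, ch, 3, stride=2,
                                          padding=1, output_padding=1),
                       nn.InstanceNorm2d(ch), nn.ReLU(inplace=True)]
        # c7s1-3 with Tanh
        layers += [nn.ReflectionPad2d(3), nn.Conv2d(64, out_ch, 7), nn.Tanh()]
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```

Reflection padding and instance normalization follow the CycleGAN paper's recipe for reducing border artifacts and per-image style variance.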
Discriminator (PatchGAN):
- 70×70 PatchGAN discriminator
- Instance normalization
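A corresponding sketch of a 70×70 PatchGAN discriminator, using the common C64-C128-C256-C512 layout; again this is illustrative rather than the repo's exact code:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """70x70 PatchGAN: C64-C128-C256-C512, then a 1-channel patch map of logits."""
    def __init__(self, in_ch=3):
        super().__init__()
        def block(cin, cout, stride=2, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.model = nn.Sequential(
            *block(in_ch, 64, norm=False),   # no normalization on the first layer
            *block(64, 128),
            *block(128, 256),
            *block(256, 512, stride=1),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),  # per-patch real/fake score
        )

    def forward(self, x):
        return self.model(x)
```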
Key Features:

- ✅ Cycle consistency loss
- ✅ Adversarial loss (LSGAN)
- ✅ Identity mapping loss (combined with the other loss terms in the sketch below)
- ✅ Learning rate scheduling
- ✅ Interactive GUI demo
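The adversarial, cycle-consistency, and identity terms combine roughly as below; the generator/discriminator handles (G_AB, G_BA, D_A, D_B) and the λ weights are placeholders, not names taken from train.py:

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G_AB, G_BA, D_A, D_B, real_A, real_B,
                            lambda_cyc=10.0, lambda_id=5.0):
    """One generator update: LSGAN adversarial + cycle-consistency + identity terms."""
    fake_B = G_AB(real_A)
    fake_A = G_BA(real_B)

    # LSGAN adversarial loss: push discriminator outputs on fakes toward 1
    pred_fake_B, pred_fake_A = D_B(fake_B), D_A(fake_A)
    loss_gan = F.mse_loss(pred_fake_B, torch.ones_like(pred_fake_B)) + \
               F.mse_loss(pred_fake_A, torch.ones_like(pred_fake_A))

    # Cycle consistency: A -> B -> A and B -> A -> B should reconstruct the input
    loss_cyc = F.l1_loss(G_BA(fake_B), real_A) + F.l1_loss(G_AB(fake_A), real_B)

    # Identity mapping: a target-domain input should pass through almost unchanged
    loss_id = F.l1_loss(G_AB(real_B), real_B) + F.l1_loss(G_BA(real_A), real_A)

    return loss_gan + lambda_cyc * loss_cyc + lambda_id * loss_id
```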
```bash
cd Task1
pip install -r requirements.txt

# Train
python train.py

# Test
python test.py

# Launch GUI
python gui.py
```

## Task 2: English-Urdu Machine Translation

Implements the vanilla Transformer from "Attention Is All You Need" (Vaswani et al., 2017).
Model Configuration:
- Encoder: 6 layers, 8 attention heads
- Decoder: 6 layers, 8 attention heads
- d_model: 512
- d_ff: 2048 (feedforward dimension)
- Dropout: 0.1
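With those hyperparameters, the core sequence-to-sequence module can be expressed with torch.nn.Transformer. This is a configuration sketch only; the repo's model.py may build the encoder/decoder stack from scratch, and a full model must still add token embeddings, positional encoding, and a vocabulary projection:

```python
import torch.nn as nn

# Hyperparameters as listed above
D_MODEL, N_HEADS, N_LAYERS, D_FF, DROPOUT = 512, 8, 6, 2048, 0.1

transformer = nn.Transformer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    num_encoder_layers=N_LAYERS,
    num_decoder_layers=N_LAYERS,
    dim_feedforward=D_FF,
    dropout=DROPOUT,
    batch_first=True,   # inputs shaped (batch, seq_len, d_model)
)
```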
Tokenization:
- BPE (Byte Pair Encoding) tokenizers for both languages
- Custom-trained on English-Urdu parallel corpus
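A hedged sketch of how such tokenizers can be trained with the Hugging Face `tokenizers` library; the special tokens and corpus paths below are assumptions, and the 32K vocabulary size matches the figure reported in the Results section (see train_tokenizers.py for the real settings):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

def train_bpe(files, out_path, vocab_size=32000):
    """Train a BPE tokenizer on one side of the parallel corpus and save it as JSON."""
    tokenizer = Tokenizer(BPE(unk_token="<unk>"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=vocab_size,
                         special_tokens=["<unk>", "<pad>", "<bos>", "<eos>"])
    tokenizer.train(files, trainer)
    tokenizer.save(out_path)

train_bpe(["Data/train.en"], "en_tokenizer.json")
train_bpe(["Data/train.ur"], "ur_tokenizer.json")
```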
Key Features:

- ✅ Sinusoidal positional encoding
- ✅ Multi-head self-attention
- ✅ Teacher forcing during training
- ✅ BLEU score evaluation
- ✅ mBART fine-tuning option
- ✅ Interactive GUI demo
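For reference, the sinusoidal positional encoding listed above can be implemented as follows; this is the standard formulation from Vaswani et al. (2017), not code copied from the repo:

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Adds the fixed sin/cos position signal from 'Attention Is All You Need'."""
    def __init__(self, d_model=512, max_len=5000, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len).unsqueeze(1)                 # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe.unsqueeze(0))                   # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        return self.dropout(x + self.pe[:, : x.size(1)])
```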
```bash
cd Task2
pip install -r requirements.txt

# Train tokenizers
python train_tokenizers.py

# Train model
python train.py

# Evaluate (BLEU score)
python evaluate.py

# Launch GUI
python gui.py
```

## Task 3: Diffusion Transformers

Implements SiT (Scalable Interpolant Transformer) with REG (Representation Entanglement for Generation), based on recent diffusion transformer research.
Key Components:
- Patch embedding for image tokenization
- Transformer blocks with self-attention
- Adaptive layer normalization (adaLN)
- Continuous-time diffusion with DDPM/DDIM sampling
- DINOv2 integration for representation entanglement
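To illustrate the adaLN conditioning mentioned above, here is a simplified transformer block in the adaLN-Zero style used by DiT/SiT; the dimensions and names are illustrative and do not mirror the notebook:

```python
import torch
import torch.nn as nn

def modulate(x, shift, scale):
    """adaLN: scale and shift normalized activations using a conditioning vector."""
    return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

class SiTBlock(nn.Module):
    """Transformer block with adaptive layer norm (adaLN-Zero style) conditioning."""
    def __init__(self, dim=384, n_heads=6, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))
        # Conditioning (timestep + class embedding) -> six modulation vectors
        self.adaLN = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))

    def forward(self, x, c):
        # x: (batch, num_patches, dim), c: (batch, dim)
        shift_a, scale_a, gate_a, shift_m, scale_m, gate_m = self.adaLN(c).chunk(6, dim=-1)
        h = modulate(self.norm1(x), shift_a, scale_a)
        x = x + gate_a.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = modulate(self.norm2(x), shift_m, scale_m)
        x = x + gate_m.unsqueeze(1) * self.mlp(h)
        return x
```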
Key Features:

- ✅ CIFAR-10 subset training (cats & dogs)
- ✅ REG loss for improved generation
- ✅ Classifier-free guidance support
- ✅ DDPM & DDIM sampling
- ✅ SafeTensors model saving
- ✅ TensorBoard visualization
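Classifier-free guidance mixes the conditional and unconditional predictions at sampling time. A minimal sketch, assuming a model(x_t, t, y) call signature and a learned "null" label embedding (both hypothetical, not the notebook's exact interface):

```python
import torch

@torch.no_grad()
def cfg_predict(model, x_t, t, y, null_y, guidance_scale=4.0):
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional output."""
    eps_cond = model(x_t, t, y)          # prediction with the class label
    eps_uncond = model(x_t, t, null_y)   # prediction with the null (unconditional) label
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```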
```bash
cd Task3
pip install -r requirements.txt

# Open and run the notebook
jupyter notebook Task3_Diffusion_Transformers.ipynb
```

## Installation

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (recommended)
- PyTorch 2.5+ (for CVE-2025-32434 mitigation)
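Since checkpoints are stored both as .pth and .safetensors, loading can avoid unpickling arbitrary code either by preferring the safetensors files or by passing weights_only=True to torch.load; the checkpoint file name below is hypothetical:

```python
import torch
from safetensors.torch import load_file

# Prefer the .safetensors checkpoint; for .pth files, weights_only=True restricts
# torch.load to plain tensors (the concern behind CVE-2025-32434).
state = load_file("Task1/checkpoints/generator.safetensors")              # hypothetical file name
# state = torch.load("Task1/checkpoints/generator.pth", weights_only=True)
```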
### Setup

```bash
# Clone the repository
git clone https://github.com/thewitcher41/GenAI-Assignment2.git
cd GenAI-Assignment2

# Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/Mac
# or
venv\Scripts\activate      # Windows

# Install dependencies for each task
pip install -r Task1/requirements.txt
pip install -r Task2/requirements.txt
pip install -r Task3/requirements.txt
```

## Usage

### Training

```bash
# Task 1: CycleGAN
cd Task1 && python train.py
# Task 2: Transformer Translation
cd Task2 && python train.py
# Task 3: Diffusion Transformer
# Open and run Task3_Diffusion_Transformers.ipynb
```

### GUI Demos

Both Task 1 and Task 2 include interactive PyQt5 GUIs:

```bash
# Face-Sketch GUI
python Task1/gui.py

# Translation GUI
python Task2/gui.py
```

### TensorBoard

Monitor training progress:

```bash
tensorboard --logdir=runs
```

## Results

### Task 1: CycleGAN Face-Sketch Translation

| Metric | Value |
|---|---|
| Cycle Consistency Loss | Low |
| Visual Quality | High fidelity sketch generation |
### Task 2: English-Urdu Translation

| Metric | Value |
|---|---|
| BLEU Score | See predictions.txt |
| Tokenizer | BPE with 32K vocab |
### Task 3: Diffusion Transformers

| Metric | Value |
|---|---|
| Dataset | CIFAR-10 (Cats & Dogs) |
| Sampling | DDPM & DDIM |
## References

- **CycleGAN**: Zhu, J.-Y., et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks." ICCV 2017.
- **Transformer**: Vaswani, A., et al. "Attention Is All You Need." NeurIPS 2017.
- **Diffusion Models**: Ho, J., et al. "Denoising Diffusion Probabilistic Models." NeurIPS 2020.
- **DiT**: Peebles, W., & Xie, S. "Scalable Diffusion Models with Transformers." ICCV 2023.
- **REG**: "Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think."
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Acknowledgments

- FAST-NUCES for the course structure
- The PyTorch team for the deep learning framework
- Hugging Face for the Transformers library
- The original paper authors for their groundbreaking research

Made with ❤️ for the Generative AI course