A comparative implementation of Vision Transformer (ViT) and Differential Vision Transformer (Diff-ViT) on CIFAR-10, exploring how differential attention mechanisms compare to standard multi-head attention in the vision domain.
For more details, see the full reports: ViT Report | Diff-ViT Report
```shell
# Install dependencies using uv
uv sync

# Dataset (CIFAR-10) downloads automatically on first run to {vit,dvit}/data/
cd vit  # or dvit
uv run python -m src.vit --mode train  # or vis
```
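Differential attention, as introduced in the Differential Transformer paper, computes two separate softmax attention maps and subtracts one from the other (scaled by a learnable lambda) to cancel common-mode attention noise. A minimal single-head NumPy sketch of the idea — weight names, shapes, and the toy usage are illustrative, not the repo's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam):
    """Single-head differential attention (sketch).

    DiffAttn(X) = (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V

    With lam = 0 this reduces to standard scaled dot-product attention.
    """
    d = Wk1.shape[1]
    a1 = softmax((x @ Wq1) @ (x @ Wk1).T / np.sqrt(d))  # first attention map
    a2 = softmax((x @ Wq2) @ (x @ Wk2).T / np.sqrt(d))  # second (subtracted) map
    return (a1 - lam * a2) @ (x @ Wv)

# Toy usage: 4 patch tokens with embedding dim 8.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq1, Wk1, Wq2, Wk2, Wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(5))
out = diff_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5)
```

In the full model, lambda is a learnable parameter (re-parameterized per layer) and the mechanism is applied per head before the output projection; see the Diff-ViT report for the configuration used here.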




