ChrisDong-THU/GaussianToken

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting

Dataset

  1. Arrange each dataset in the directory structure shown below.

    • CIFAR-100:
      <path-to-dataset>/cifar-100-python/
      ├── file.txt~
      ├── meta
      ├── test
      └── train
    • Mini-ImageNet:
      <path-to-dataset>/mini-imagenet/
      ├── images
      ├── test.csv
      ├── train.csv
      └── val.csv
    • ImageNet:
      <path-to-dataset>/imagenet
      ├── ImageNet_class_index.json
      ├── ImageNet_val_label.txt
      ├── train
      │   ├── n01440764
      │   │   ├── n01440764_10026.JPEG
      │   │   ├── n01440764_10027.JPEG
      │   │   ├── ...
      │   ├── n01443537
      │   │   ├── n01443537_10007.JPEG
      │   │   ├── n01443537_10014.JPEG
      │   │   ├── ...
      │   ├── ...
      └── val
          ├── ...
  2. Add the following environment variables to your .bashrc file.

    # dataset env
    export DATASET_ROOT="<path-to-dataset>"
    export MINI_IMAGENET_ROOT="${DATASET_ROOT}/mini-imagenet"
    export CIFAR100_ROOT="${DATASET_ROOT}"
    export IMAGENET_ROOT="${DATASET_ROOT}/imagenet"
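
Before training, it can help to confirm that the variables are set and that each root contains the entries listed in step 1. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `EXPECTED` table are hypothetical, and the entries are copied from the directory trees above (only top-level entries are checked).

```python
import os
from pathlib import Path

# Expected entries per environment variable, copied from the trees in step 1.
# CIFAR100_ROOT points at the dataset root, so the cifar-100-python/ prefix is kept.
EXPECTED = {
    "CIFAR100_ROOT": [
        "cifar-100-python/meta",
        "cifar-100-python/test",
        "cifar-100-python/train",
    ],
    "MINI_IMAGENET_ROOT": ["images", "test.csv", "train.csv", "val.csv"],
    "IMAGENET_ROOT": [
        "ImageNet_class_index.json",
        "ImageNet_val_label.txt",
        "train",
        "val",
    ],
}


def missing_entries(env=None):
    """Return {variable: [problems]} for each variable that is unset or incomplete."""
    env = os.environ if env is None else env
    problems = {}
    for var, entries in EXPECTED.items():
        root = env.get(var)
        if not root:
            problems[var] = ["<variable not set>"]
            continue
        missing = [e for e in entries if not (Path(root) / e).exists()]
        if missing:
            problems[var] = missing
    return problems


if __name__ == "__main__":
    for var, items in missing_entries().items():
        print(f"{var}: missing {', '.join(items)}")
```

Run the script in a new shell (or after `source ~/.bashrc`) so the exported variables are visible. No output means every listed entry was found.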

Installation

  1. Create a conda environment.
    conda create -n gstk python=3.9.13
    conda activate gstk
  2. Install the dependent packages.
    pip install -r requirements.txt
    pip install -r requirements-extra.txt
  3. Build and install the gsplat and deformable attention modules.
    cd gstk/modules/gsplat && python setup.py build install
    cd ../gaussianembed/ops && python setup.py build install

Note: Before installing the dependent packages, edit requirements-extra.txt so that the PyTorch version matches your CUDA version (the default targets CUDA 12.1).
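
To see which CUDA version the installed PyTorch build targets, a short check like the one below may be useful after installation. It is a sketch under the assumption that PyTorch is installed in the active environment; the function name `torch_cuda_summary` is hypothetical, and the function falls back to empty values if `torch` cannot be imported.

```python
def torch_cuda_summary():
    """Report the installed PyTorch version and the CUDA version it was built for."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "built_for_cuda": None, "gpu_available": False}
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_cuda_summary())
```

If `built_for_cuda` does not match the CUDA toolkit on the machine, building the extension modules in step 3 is likely to fail, so it is worth checking before that step.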

Training

  1. Make the scripts executable.
    chmod +x ./scripts/*
  2. Run the training script for your dataset:
    • CIFAR:
      ./scripts/cifar-gqgan-1.sh
    • Mini-ImageNet:
      ./scripts/mini-gqgan-1.sh
    • ImageNet:
      ./scripts/in-gqgan-1.sh
    The training log files will be saved in the ./logs folder.
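
When training is launched from Python (for example from a job scheduler wrapper), the choice of script can be sketched as a small lookup. This is an illustrative assumption, not part of the repository: the keys `cifar`, `mini`, and `imagenet` and the helper names are hypothetical, while the script paths are the ones listed above.

```python
import subprocess

# Dataset label -> training script. The paths come from the list above;
# the short labels are hypothetical and chosen only for this sketch.
TRAIN_SCRIPTS = {
    "cifar": "./scripts/cifar-gqgan-1.sh",
    "mini": "./scripts/mini-gqgan-1.sh",
    "imagenet": "./scripts/in-gqgan-1.sh",
}


def train_command(dataset):
    """Return the command list for a dataset label, or raise ValueError."""
    try:
        return [TRAIN_SCRIPTS[dataset]]
    except KeyError:
        choices = ", ".join(sorted(TRAIN_SCRIPTS))
        raise ValueError(f"unknown dataset {dataset!r}; choose from: {choices}") from None


def run_training(dataset):
    """Run the training script; call this from the repository root."""
    return subprocess.run(train_command(dataset), check=True)
```

Calling `run_training("cifar")` from the repository root is then equivalent to running `./scripts/cifar-gqgan-1.sh` directly, provided the scripts were made executable in step 1.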

Evaluating

  • Image reconstruction.
    ./scripts/rec-1.sh
  • Metrics calculation.
    ./scripts/val-1.sh

Pretrained Models

Dataset             rFID   Link                   Comments
CIFAR100 (f=4)      12.94  cifar_gs64_cb1024      SOTA
ImageNet-1K (f=16)  1.61   imagenet_gs256_cb1024  SOTA (see footnote 1)

Acknowledgments

Footnotes

  1. Without altering traditional vector quantization methods, the GaussianEmbed module achieves the lowest FID across all tokenizers.

About

Official PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
