- Prepare the datasets with the following structures.
  - CIFAR-100:

        <path-to-dataset>/cifar-100-python/
        ├── file.txt~
        ├── meta
        ├── test
        └── train

  - Mini-ImageNet:

        <path-to-dataset>/mini-imagenet/
        ├── images
        ├── test.csv
        ├── train.csv
        └── val.csv

  - ImageNet:

        <path-to-dataset>/imagenet
        ├── ImageNet_class_index.json
        ├── ImageNet_val_label.txt
        ├── train
        │   ├── n01440764
        │   │   ├── n01440764_10026.JPEG
        │   │   ├── n01440764_10027.JPEG
        │   │   ├── ...
        │   ├── n01443537
        │   │   ├── n01443537_10007.JPEG
        │   │   ├── n01443537_10014.JPEG
        │   │   ├── ...
        │   ├── ...
        └── val
            ├── ...
- Write the following environment variables into the `.bashrc` file.

      # dataset env
      export DATASET_ROOT="<path-to-dataset>"
      export MINI_IMAGENET_ROOT="${DATASET_ROOT}/mini-imagenet"
      export CIFAR100_ROOT="${DATASET_ROOT}"
      export IMAGENET_ROOT="${DATASET_ROOT}/imagenet"
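Before training, it can help to confirm that the variables and folders above line up. The following is a minimal sketch, not part of the repository: the names `EXPECTED`, `missing_entries`, and `check_datasets` are hypothetical, and the expected entries are copied from the directory trees above (`file.txt~` is left out on the assumption that it is an editor backup file).

```python
import os
from pathlib import Path

# Expected entries per dataset, keyed by environment variable (hypothetical
# helper table). CIFAR100_ROOT points at the parent folder, so its entries
# carry the cifar-100-python/ prefix.
EXPECTED = {
    "CIFAR100_ROOT": [
        "cifar-100-python/meta",
        "cifar-100-python/test",
        "cifar-100-python/train",
    ],
    "MINI_IMAGENET_ROOT": ["images", "train.csv", "val.csv", "test.csv"],
    "IMAGENET_ROOT": [
        "ImageNet_class_index.json",
        "ImageNet_val_label.txt",
        "train",
        "val",
    ],
}


def missing_entries(root, entries):
    """Return the entries that do not exist under root, in the given order."""
    root = Path(root)
    return [entry for entry in entries if not (root / entry).exists()]


def check_datasets(env=os.environ):
    """Map each dataset variable to a list of problems (empty list means OK)."""
    report = {}
    for var, entries in EXPECTED.items():
        root = env.get(var)
        if not root:
            report[var] = ["environment variable is not set"]
        else:
            report[var] = missing_entries(root, entries)
    return report


if __name__ == "__main__":
    for var, problems in check_datasets().items():
        print(var, "OK" if not problems else f"problems: {problems}")
```

Run it in a new shell (so that `.bashrc` has been sourced); every variable should report `OK`.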
- Create a conda environment.

      conda create -n gstk python=3.9.13
      conda activate gstk

- Install the dependent packages.

      pip install -r requirements.txt
      pip install -r requirements-extra.txt
- Install the `gsplat` and `deformable attn` modules.

      cd gstk/modules/gsplat && python setup.py build install
      cd ../gaussianembed/ops && python setup.py build install

Note: Modify `requirements-extra.txt` first so that the PyTorch version matches your CUDA version (the default targets CUDA v12.1).
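To check that the installed PyTorch build agrees with the CUDA version mentioned in the note, a small script along these lines may be useful. It is a sketch rather than part of the repository: the helper names are hypothetical, and the only PyTorch attribute it relies on is `torch.version.cuda`, which is `None` for CPU-only builds.

```python
def installed_cuda_version():
    """Return the CUDA version PyTorch was built for, or None if unavailable."""
    try:
        import torch  # imported lazily so the script also runs before installation
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


def matches_expected(installed, expected="12.1"):
    """Compare the major.minor part of the installed and expected CUDA versions."""
    if installed is None:
        return False
    return installed.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    found = installed_cuda_version()
    print(f"PyTorch CUDA build: {found}; matches 12.1: {matches_expected(found)}")
```

If the check fails, adjust the PyTorch entry in `requirements-extra.txt` and reinstall before building the extension modules, since they compile against the installed PyTorch.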
- Change the script permissions.

      chmod +x ./scripts/*

- Run the script for your dataset:
  - CIFAR:

        ./scripts/cifar-gqgan-1.sh

  - Mini-ImageNet:

        ./scripts/mini-gqgan-1.sh

  - ImageNet:

        ./scripts/in-gqgan-1.sh

Output is written to the `./logs` folder.
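If you switch between datasets often, the choice of script can be wrapped in a small launcher. This is a hedged sketch: the script paths are the ones listed above, while the dataset keys (`cifar`, `mini-imagenet`, `imagenet`) and the function names are hypothetical conveniences, not part of the repository.

```python
import subprocess

# Script paths as listed above; the short dataset keys are hypothetical.
SCRIPTS = {
    "cifar": "./scripts/cifar-gqgan-1.sh",
    "mini-imagenet": "./scripts/mini-gqgan-1.sh",
    "imagenet": "./scripts/in-gqgan-1.sh",
}


def script_for(dataset):
    """Look up the script for a dataset key (case-insensitive)."""
    try:
        return SCRIPTS[dataset.lower()]
    except KeyError:
        raise ValueError(
            f"unknown dataset {dataset!r}; choose from {sorted(SCRIPTS)}"
        ) from None


def run(dataset, dry_run=False):
    """Run the script for a dataset; with dry_run=True only return its path."""
    script = script_for(dataset)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run([script], check=True)
    return script
```

For example, `run("cifar", dry_run=True)` returns the script path without executing it, which is a cheap way to confirm the mapping from the repository root.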
- Image reconstruction.

      ./scripts/rec-1.sh

- Metrics calculation.

      ./scripts/val-1.sh
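The two evaluation steps can be chained so that metrics are only computed when reconstruction succeeds. The sketch below assumes that `val-1.sh` depends on the output of `rec-1.sh`, which the step order suggests but the README does not state; `STEPS`, `run_steps`, and the `runner` parameter are hypothetical names.

```python
import subprocess

# Evaluation scripts in the order listed above.
STEPS = ["./scripts/rec-1.sh", "./scripts/val-1.sh"]


def run_steps(steps=STEPS, runner=None):
    """Run each step in order and stop at the first non-zero exit code.

    Returns the list of steps that completed successfully. `runner` takes a
    script path and returns an exit code; it defaults to executing the script.
    """
    if runner is None:
        runner = lambda script: subprocess.run([script]).returncode
    completed = []
    for step in steps:
        if runner(step) != 0:
            break
        completed.append(step)
    return completed
```

Calling `run_steps()` from the repository root runs both scripts; passing a custom `runner` makes it possible to test the sequencing without touching the GPU.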
| Dataset | rFID | Link | Comments |
|---|---|---|---|
| CIFAR100 (f=4) | 12.94 | cifar_gs64_cb1024 | SOTA |
| ImageNet-1K (f=16) | 1.61 | imagenet_gs256_cb1024 | SOTA<sup>1</sup> |
- Deformable DETR
- GaussianImage
- vector-quantize-pytorch
- taming-transformers
- GaussianFormer
- Open-MAGVIT2
Footnotes

1. Without altering traditional vector quantization methods, the GaussianEmbed module achieves the lowest FID across all tokenizers.
