Guojun Xu, Mingyang Zhang, Jianwen Xiang, Cheng Tan, Yanchao Yang, Junwei Zhou
School of Computer Science and Artificial Intelligence, Wuhan University of Technology
Distributed Image Compression (DIC) is crucial for multi-view transmission, especially when operating at extremely low bitrates.
- Ubuntu 22.04.5 LTS
- Python 3.10.16
- PyTorch 2.3.0 + CUDA 12.1
```bash
conda create -n mdic python=3.10
conda activate mdic
pip install -r requirements.txt
```

We conduct experiments on the KITTI Stereo and Cityscapes datasets.
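Before moving on to the data, it can help to confirm that the installed PyTorch matches the environment listed above. The sketch below is an illustrative helper (not part of this repo); `parse_version` handles build tags like `2.3.0+cu121`:

```python
# Illustrative version check against the environment listed above;
# parse_version / MIN_TORCH are hypothetical helpers, not repo code.
def parse_version(v: str) -> tuple:
    """Turn '2.3.0+cu121' into (2, 3, 0), ignoring the local build tag."""
    return tuple(int(x) for x in v.split("+")[0].split("."))

MIN_TORCH = (2, 3, 0)

def torch_version_ok(installed: str) -> bool:
    """True when the installed version is at least the listed one."""
    return parse_version(installed) >= MIN_TORCH
```

Usage would be e.g. `torch_version_ok(torch.__version__)` inside the activated environment.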
Download the datasets from the official KITTI stereo benchmark pages.
After downloading, run:
```bash
# KITTI 2012
unzip data_stereo_flow_multiview.zip
mkdir data_stereo_flow_multiview
mv training data_stereo_flow_multiview
mv testing data_stereo_flow_multiview

# KITTI 2015
unzip data_scene_flow_multiview.zip
mkdir data_scene_flow_multiview
mv training data_scene_flow_multiview
mv testing data_scene_flow_multiview
```
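To confirm the moves above produced the expected layout, a small hypothetical check (directory names taken from the commands above; `kitti_layout_ok` is not part of this repo):

```python
# Hypothetical layout check for the KITTI folders created above.
from pathlib import Path

EXPECTED_SPLITS = ("training", "testing")

def kitti_layout_ok(root: str) -> bool:
    """True when the dataset root contains both KITTI splits."""
    return all((Path(root) / split).is_dir() for split in EXPECTED_SPLITS)
```

E.g. `kitti_layout_ok("data_scene_flow_multiview")` should return `True` after the KITTI 2015 moves.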
Download the following archives from the Cityscapes official website:

- leftImg8bit_trainvaltest.zip
- rightImg8bit_trainvaltest.zip
Then run:
```bash
mkdir cityscape_dataset
unzip leftImg8bit_trainvaltest.zip
mv leftImg8bit cityscape_dataset
unzip rightImg8bit_trainvaltest.zip
mv rightImg8bit cityscape_dataset
```
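With this layout, left/right views can be paired by their shared filename stem (Cityscapes names frames `*_leftImg8bit.png` / `*_rightImg8bit.png`). The sketch below is an illustrative pairing helper, not the repo's actual data loader:

```python
# Illustrative pairing of Cityscapes stereo views; not the repo's loader.
from pathlib import Path

def stereo_pairs(root: str):
    """Yield (left, right) image paths whose right-view counterpart exists."""
    root = Path(root)
    for left in sorted(root.glob("leftImg8bit/**/*_leftImg8bit.png")):
        rel = left.relative_to(root / "leftImg8bit")
        right = (root / "rightImg8bit" / rel).with_name(
            rel.name.replace("_leftImg8bit", "_rightImg8bit")
        )
        if right.exists():
            yield left, right
```

Frames whose right view is missing are simply skipped, so the generator only ever yields complete stereo pairs.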
Example training command on Cityscapes:

```bash
CUDA_VISIBLE_DEVICES=0 python src/train_sd_perco_2.py \
    --pretrained_model_name_or_path 'stable-diffusion-2-1' \
    --validation_frequency 5 \
    --allow_tf32 \
    --dataloader_num_workers 4 \
    --resolution 512 \
    --center_crop \
    --random_flip \
    --train_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 50000 \
    --max_train_steps 500 \
    --validation_steps 500 \
    --prediction_type v_prediction \
    --checkpointing_steps 500 \
    --learning_rate 8e-05 \
    --adam_weight_decay 1e-2 \
    --max_grad_norm 1 \
    --lr_scheduler constant_with_warmup \
    --lr_warmup_steps 10000 \
    --checkpoints_total_limit 2 \
    --dataset_name_KC Cityscape \
    --dataset_path ./cityscape_dataset \
    --output_dir PATH/result \
    --resume_from_checkpoint PATH/checkpoint.pt
```
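One thing to note about these flags: `constant_with_warmup` (as typically implemented in diffusers-style training scripts, which also let `--max_train_steps` override `--num_train_epochs`) ramps the learning rate linearly over `--lr_warmup_steps` and then holds it constant. Since the warmup (10000) is longer than `--max_train_steps` (500), training ends mid-warmup at a fraction of `--learning_rate`. A small sketch of that schedule (my own helper, not repo code):

```python
# Sketch of the constant_with_warmup schedule implied by the flags above.
def warmup_lr(base_lr: float, step: int, warmup_steps: int) -> float:
    """Linear ramp to base_lr over warmup_steps, then constant."""
    return base_lr * min(step / warmup_steps, 1.0)

# With --learning_rate 8e-05, --lr_warmup_steps 10000, --max_train_steps 500,
# the final step uses only 5% of the base learning rate (about 4e-06).
final_lr = warmup_lr(8e-5, 500, 10_000)
```

If you change `--max_train_steps`, it is worth rechecking this ratio so the run is not spent entirely inside the warmup ramp.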
For diffusion-related pretrained weights and model downloads, please refer to the official PerCo repository:

- PerCo Repository: https://github.com/Nikolai10/PerCo
We follow the same setup and pretrained model configuration as the PerCo baseline.
Our implementation is built upon the excellent work of PerCo. We sincerely thank the authors for open-sourcing their code and contributions to perceptual compression research.
- PerCo Paper: https://arxiv.org/abs/2310.10325
- PerCo Project: https://github.com/Nikolai10/PerCo
If you find our work useful for your research, please consider citing:
```bibtex
@inproceedings{xu2026mdic,
  title={Distributed Image Compression with Multimodal Side Information at Extremely Low Bitrates},
  author={Xu, Guojun and Zhang, Mingyang and Xiang, Jianwen and Tan, Cheng and Yang, Yanchao and Zhou, Junwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```

If you like this project, please give us a ⭐ on GitHub!
