
Multi-Realism Image Compression (MRIC)

This repository provides a TensorFlow 2 implementation of MRIC, based on the paper Multi-Realism Image Compression with a Conditional Generator (CVPR 2023).

Abstract
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.

Updates

01/11/2024

  1. Initial release of this project

Visual Example

The left image below is taken from the CLIC 2020 test set (it is not part of the training set). The right image is its corresponding reconstruction using MRIC ($\beta=2.56$) with $\lambda=0.128$ (lowest quality setting).

[Figure: original (left) vs. MRIC reconstruction (right)]
CLIC 2020: ad24 | Bits per pixel: 0.1501 (59 kB)

More example reconstructions can be found here.

Quantitative Performance

We trained two models using $\lambda \in \{0.128, 0.032\}$ for 2.3M steps as described here.

In this section we quantitatively compare the performance of MRIC (reimpl) to the officially reported numbers. For completeness, we add VTM-20.0 (the state of the art among non-learned codecs) and HiFiC (the long-standing previous state of the art for generative image compression). The FID/256 computation is based on Torch-Fidelity, following MS-ILLM, as is common in the literature.
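For reference, one common variant of the FID/256 protocol cuts both image sets into non-overlapping 256×256 patches and computes FID between the resulting patch folders. The sketch below illustrates this with Torch-Fidelity; the directory names are placeholders, and patch-extraction details (borders, overlap) vary across papers.

```python
# Illustrative FID/256 sketch: cut originals and reconstructions into
# non-overlapping 256x256 patches, then compute FID with torch-fidelity.
# Directory names are placeholders, not paths from this repo.
import os
from PIL import Image
import torch_fidelity

def extract_patches(src_dir, dst_dir, patch=256):
    """Cut every image in src_dir into non-overlapping patch x patch crops."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        w, h = img.size
        idx = 0
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                img.crop((x, y, x + patch, y + patch)).save(
                    os.path.join(dst_dir, f"{name}_{idx}.png"))
                idx += 1

extract_patches("originals", "patches/real")
extract_patches("reconstructions", "patches/fake")

metrics = torch_fidelity.calculate_metrics(
    input1="patches/real", input2="patches/fake", cuda=True, fid=True)
print(metrics["frechet_inception_distance"])
```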

We generally find that MRIC (reimpl) tends to favor low FID scores over high PSNR values. For MRIC ($\beta=2.56$) we obtain competitive results in terms of statistical fidelity, at slightly higher distortion. For MRIC ($\beta=0.0$) we obtain a different operating point along the rate-distortion-perception plane, one that more closely resembles the traditional compression setting (high PSNR). For MRIC ($0 < \beta < 2.56$) we obtain any operating mode in between, providing great flexibility for user- or application-specific preferences.
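To make this decode-time control concrete, the sketch below shows the intended usage pattern: a single bitstream decoded at different realism weights. The `encode`/`decode` functions here are hypothetical stand-ins, not the API of this repo (the real entry points live in src/amtm2023.py).

```python
# Illustrative only: one compressed representation, several reconstructions.
# `encode`/`decode` are hypothetical stand-ins, not this repo's API.
import numpy as np

def encode(image):
    # Stand-in for the analysis transform + entropy coding.
    return image.tobytes()

def decode(bitstream, beta):
    # Stand-in; the real decoder conditions its synthesis transform on beta.
    return np.frombuffer(bitstream, dtype=np.uint8).reshape(256, 256, 3)

image = np.zeros((256, 256, 3), dtype=np.uint8)
bitstream = encode(image)                 # single compressed representation

recon_mse = decode(bitstream, beta=0.0)   # low-distortion, PSNR-oriented
recon_gan = decode(bitstream, beta=2.56)  # high-realism, FID-oriented
recon_mid = decode(bitstream, beta=0.64)  # an operating point in between
```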

We leave the exploration of better trade-offs to future work.

MRIC vs. Reimpl

For MRIC (reimpl) we use $\beta \in \{2.56, 1.28, 0.64, 0.32, 0.16, 0.08, 0.04, 0.0\}$.

Install

$ git clone https://github.com/Nikolai10/MRIC.git 

This project was developed using Docker; we recommend the tensorflow:2.14.0-gpu-jupyter Docker image, which uses tfc==2.14.0 (the latest release at the time of writing) by default.

A TensorFlow/Docker installation guide is provided here.
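For reference, starting the recommended image might look like the following; the mount path and published port are illustrative assumptions, not taken from this repo:

```shell
# Illustrative only: run the recommended image with GPU support and
# mount the cloned repo; mount path and port are assumptions.
$ docker run -it --rm --gpus all \
    -v $(pwd)/MRIC:/tf/MRIC \
    -p 8888:8888 \
    tensorflow/tensorflow:2.14.0-gpu-jupyter
```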

Training / Inference

Please have a look at the example Colab notebook for more information.

Quality Assertions / Deviations

The general goal of this project is to provide an exact reimplementation of MRIC. In this section we highlight some minor technical deviations from the official work that we have made to achieve a better trade-off between stability and performance for our particular setup.

|  | Official | Reimplementation |
| --- | --- | --- |
| Data | proprietary dataset | Open Images |
| Optimization strategy | end-to-end from scratch | multi-stage training (similar to HiFiC, Sec. A6) |
| Optimization steps | 3M | 2.3M = 2M (stage 1) + 0.3M (stage 2) |
| Higher $\lambda$ | $10\times$ in the first 15% of steps | - |
| Learning rate decay | 1e-4 -> 1e-5 for the last 15% of steps | 1e-4 -> 1e-5 for the last 15% of steps of stage 1; constant learning rate (1e-4) for stage 2 |
| Entropy model | small variant of ChARM (10 slices) | TBTC-inspired variant of ChARM (see Figure 12) |

Note that the entropy model probably plays a minor role in the overall optimization procedure; at the time of development, we simply did not have access to the official ChARM configuration.
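For concreteness, the stage-1 learning-rate schedule from the table above can be expressed with standard Keras APIs. The following is a minimal sketch using the step counts from the table; the actual implementation in this repo may differ.

```python
# Minimal sketch of the stage-1 schedule from the table above:
# 1e-4, decayed to 1e-5 for the last 15% of the 2M stage-1 steps.
# The actual implementation in this repo may differ.
import tensorflow as tf

stage1_steps = 2_000_000
boundary = int(0.85 * stage1_steps)  # decay kicks in for the last 15%

schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[boundary], values=[1e-4, 1e-5])
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Stage 2 then trains with a constant learning rate of 1e-4.
```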

If you find better hyper-parameters, please share them with the community.

Pre-trained Models

All pre-trained models ($\lambda=0.128, 0.032$) can be downloaded here.
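Once downloaded, a pre-trained model can be loaded with standard TensorFlow tooling, as sketched below; the path mirrors the res/amtm2023 entry in the file structure below, and the actual inference usage is best taken from the Colab notebook.

```python
# Minimal sketch of loading a pre-trained model; the path follows the
# res/amtm2023 entry in the file structure below. See the Colab notebook
# for the actual inference usage.
import tensorflow as tf

model = tf.saved_model.load("res/amtm2023")
print(list(model.signatures))  # inspect the available serving signatures
```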

Directions for Improvement

  • add sophisticated data pre-processing methods (e.g. random resized cropping, random horizontal flipping), see _get_dataset (HiFiC) for some inspiration; a sketch follows this list.
  • explore different hyper-parameters; can we obtain a single model that obtains both state-of-the-art results for distortion (MRIC $\beta=0.0$) and perception (MRIC $\beta=2.56$)?
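As a starting point for the first item, the sketch below builds a tf.data pipeline with a random crop (a simple stand-in for random resized cropping) and a random horizontal flip; crop size, batch size, and the file pattern are assumptions, not settings from this repo.

```python
# Illustrative tf.data augmentation pipeline: random crop (standing in
# for random resized cropping) plus random horizontal flip.
# Crop size, batch size, and file pattern are assumptions.
import tensorflow as tf

def augment(image):
    image = tf.image.random_crop(image, size=(256, 256, 3))
    image = tf.image.random_flip_left_right(image)
    return tf.cast(image, tf.float32) / 255.0

dataset = (tf.data.Dataset.list_files("res/data/*.png")
           .map(lambda p: tf.image.decode_png(tf.io.read_file(p), channels=3))
           .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))
```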

File Structure

Note that we have taken great care to follow the official works; e.g., if you are already familiar with HiFiC, you will find that hific_tf2 follows the exact same structure (the same applies to compare_gan_tf2 and amtm2023.py).

 res
     ├── data/                                      # e.g. training data; LPIPS weights etc.
     ├── doc/                                       # additional resources
     ├── eval/                                      # sample images + reconstructions
     ├── train_amtm2023/                            # model checkpoints + tf.summaries
     ├── amtm2023/                                  # saved model
 src
     ├── compare_gan_tf2/                           # partial TF 2 port of compare_gan (mirrors structure)
            ├── arch_ops.py                         # building blocks used in PatchGAN       
            ├── loss_lib.py                         # non_saturating GAN loss
            ├── utils.py                            # convenient utilities     
     ├── hific_tf2/                                 # partial TF 2 port of HiFiC (mirrors structure)
            ├── archs.py                            # PatchGAN discriminator 
            ├── helpers.py                          # LPIPS downloader
            ├── model.py                            # perceptual loss
     ├── amtm2023.py                                # >> core of this repo <<
     ├── config.py                                  # configurations
     ├── elic.py                                    # ELIC transforms based on VCT
     ├── fourier_cond.py                            # Fourier conditioning
     ├── synthesis.py                               # conditional synthesis transform

Acknowledgment

This project builds on the official MRIC work and the related codebases referenced above (HiFiC, compare_gan, VCT). We thank the authors for providing us with the official evaluation points as well as helpful insights.

License

Apache License 2.0
