
Multi-Realism Image Compression (MRIC)

This repository provides a TensorFlow 2 implementation of MRIC, based on the paper Multi-Realism Image Compression with a Conditional Generator (CVPR 2023).

Abstract
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.

Updates

01/11/2024

  1. Initial release of this project

Visual Example

The left image below is taken from the CLIC 2020 test set (it is not part of the training set). The right image is its corresponding reconstruction using MRIC ($\beta=2.56$) with $\lambda=0.128$ (lowest quality setting).

[Figure: original (left) vs. MRIC reconstruction (right)]
CLIC 2020: ad24 | Bits per pixel: 0.1501 (59 kB)

More example reconstructions can be found here.

Quantitative Performance

We trained two models using $\lambda \in \{0.128, 0.032\}$ for 2.3M steps as described here.

In this section we quantitatively compare the performance of MRIC (reimpl) to the officially reported numbers. For completeness, we add VTM-20.0 (the state of the art among non-learned codecs) and HiFiC (the long-standing previous state of the art for generative image compression). The FID/256 computation is based on Torch-Fidelity, following MS-ILLM, as is common in the literature.
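For reference, one common variant of the FID/256 protocol cuts both image sets into non-overlapping 256×256 patches and computes FID between the resulting patch folders. The sketch below illustrates this with Torch-Fidelity; the directory names are placeholders, and patch-extraction details (borders, overlap) vary across papers.

```python
# Illustrative FID/256 sketch: cut originals and reconstructions into
# non-overlapping 256x256 patches, then compute FID with torch-fidelity.
# Directory names are placeholders, not paths from this repo.
import os
from PIL import Image
import torch_fidelity

def extract_patches(src_dir, dst_dir, patch=256):
    """Cut every image in src_dir into non-overlapping patch x patch crops."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        w, h = img.size
        idx = 0
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                img.crop((x, y, x + patch, y + patch)).save(
                    os.path.join(dst_dir, f"{name}_{idx}.png"))
                idx += 1

extract_patches("originals", "patches/real")
extract_patches("reconstructions", "patches/fake")

metrics = torch_fidelity.calculate_metrics(
    input1="patches/real", input2="patches/fake", cuda=True, fid=True)
print(metrics["frechet_inception_distance"])
```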

We generally find that MRIC (reimpl) tends to favor low FID scores over high PSNR values. For MRIC ($\beta=2.56$) we obtain competitive results in terms of statistical fidelity, at slightly higher distortion. For MRIC ($\beta=0.0$) we obtain a different operating point along the rate-distortion-perception plane, one that more closely resembles the traditional compression setting (high PSNR). For MRIC ($0 < \beta < 2.56$) we obtain any operating mode in between, providing great flexibility for user- or application-specific preferences.
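To make this decode-time control concrete, the sketch below shows the intended usage pattern: a single bitstream decoded at different realism weights. The `encode`/`decode` functions here are hypothetical stand-ins, not the API of this repo (the real entry points live in src/amtm2023.py).

```python
# Illustrative only: one compressed representation, several reconstructions.
# `encode`/`decode` are hypothetical stand-ins, not this repo's API.
import numpy as np

def encode(image):
    # Stand-in for the analysis transform + entropy coding.
    return image.tobytes()

def decode(bitstream, beta):
    # Stand-in; the real decoder conditions its synthesis transform on beta.
    return np.frombuffer(bitstream, dtype=np.uint8).reshape(256, 256, 3)

image = np.zeros((256, 256, 3), dtype=np.uint8)
bitstream = encode(image)                 # single compressed representation

recon_mse = decode(bitstream, beta=0.0)   # low-distortion, PSNR-oriented
recon_gan = decode(bitstream, beta=2.56)  # high-realism, FID-oriented
recon_mid = decode(bitstream, beta=0.64)  # an operating point in between
```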

We leave the exploration of better trade-offs to future work.

MRIC vs. Reimpl

For MRIC (reimpl) we use $\beta \in \{2.56, 1.28, 0.64, 0.32, 0.16, 0.08, 0.04, 0.0\}$.

Install

$ git clone https://github.com/Nikolai10/MRIC.git 

This project was developed using Docker; we recommend the tensorflow:2.14.0-gpu-jupyter Docker image, which uses tfc==2.14.0 (the latest release at the time of writing) by default.

A TensorFlow/Docker installation guide is provided here.
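For reference, starting the recommended image might look like the following; the mount path and published port are illustrative assumptions, not taken from this repo:

```shell
# Illustrative only: run the recommended image with GPU support and
# mount the cloned repo; mount path and port are assumptions.
$ docker run -it --rm --gpus all \
    -v $(pwd)/MRIC:/tf/MRIC \
    -p 8888:8888 \
    tensorflow/tensorflow:2.14.0-gpu-jupyter
```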

Training / Inference

Please have a look at the example Colab notebook for more information.

Quality Assertions / Deviations

The general goal of this project is to provide an exact reimplementation of MRIC. In this section we highlight some minor technical deviations from the official work that we have made to achieve a better trade-off between stability and performance for our particular setup.

|  | Official | Reimplementation |
| --- | --- | --- |
| Data | proprietary dataset | Open Images |
| Optimization strategy | end-to-end from scratch | multi-stage training (similar to HiFiC, Sec. A6) |
| Optimization steps | 3M | 2.3M = 2M (stage 1) + 0.3M (stage 2) |
| Higher $\lambda$ | $10\times$ in the first 15% of steps | - |
| Learning rate decay | 1e-4 -> 1e-5 for the last 15% of steps | 1e-4 -> 1e-5 for the last 15% of steps of stage 1; constant learning rate (1e-4) for stage 2 |
| Entropy model | small variant of ChARM (10 slices) | TBTC-inspired variant of ChARM (see Figure 12) |

Note that the entropy model probably plays a minor role in the overall optimization procedure; at the time of development, we simply did not have access to the official ChARM configuration.
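For concreteness, the stage-1 learning-rate schedule from the table above can be expressed with standard Keras APIs. The following is a minimal sketch using the step counts from the table; the actual implementation in this repo may differ.

```python
# Minimal sketch of the stage-1 schedule from the table above:
# 1e-4, decayed to 1e-5 for the last 15% of the 2M stage-1 steps.
# The actual implementation in this repo may differ.
import tensorflow as tf

stage1_steps = 2_000_000
boundary = int(0.85 * stage1_steps)  # decay kicks in for the last 15%

schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[boundary], values=[1e-4, 1e-5])
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Stage 2 then trains with a constant learning rate of 1e-4.
```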

If you find better hyper-parameters, please share them with the community.

Pre-trained Models

All pre-trained models ($\lambda=0.128, 0.032$) can be downloaded here.
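Once downloaded, a pre-trained model can be loaded with standard TensorFlow tooling, as sketched below; the path mirrors the res/amtm2023 entry in the file structure below, and the actual inference usage is best taken from the Colab notebook.

```python
# Minimal sketch of loading a pre-trained model; the path follows the
# res/amtm2023 entry in the file structure below. See the Colab notebook
# for the actual inference usage.
import tensorflow as tf

model = tf.saved_model.load("res/amtm2023")
print(list(model.signatures))  # inspect the available serving signatures
```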

Directions for Improvement

  • add sophisticated data pre-processing methods (e.g. random resized cropping, random horizontal flipping), see _get_dataset (HiFiC) for some inspiration; a sketch follows this list.
  • explore different hyper-parameters; can we obtain a single model that obtains both state-of-the-art results for distortion (MRIC $\beta=0.0$) and perception (MRIC $\beta=2.56$)?
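As a starting point for the first item, the sketch below builds a tf.data pipeline with a random crop (a simple stand-in for random resized cropping) and a random horizontal flip; crop size, batch size, and the file pattern are assumptions, not settings from this repo.

```python
# Illustrative tf.data augmentation pipeline: random crop (standing in
# for random resized cropping) plus random horizontal flip.
# Crop size, batch size, and file pattern are assumptions.
import tensorflow as tf

def augment(image):
    image = tf.image.random_crop(image, size=(256, 256, 3))
    image = tf.image.random_flip_left_right(image)
    return tf.cast(image, tf.float32) / 255.0

dataset = (tf.data.Dataset.list_files("res/data/*.png")
           .map(lambda p: tf.image.decode_png(tf.io.read_file(p), channels=3))
           .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))
```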

File Structure

Note that we have taken great care to follow the official works; e.g., if you are already familiar with HiFiC, you will find that hific_tf2 follows the exact same structure (the same applies to compare_gan_tf2 and amtm2023.py).

 res
     ├── data/                                      # e.g. training data; LPIPS weights etc.
     ├── doc/                                       # additional resources
     ├── eval/                                      # sample images + reconstructions
     ├── train_amtm2023/                            # model checkpoints + tf.summaries
     ├── amtm2023/                                  # saved model
 src
     ├── compare_gan_tf2/                           # partial TF 2 port of compare_gan (mirrors structure)
            ├── arch_ops.py                         # building blocks used in PatchGAN       
            ├── loss_lib.py                         # non_saturating GAN loss
            ├── utils.py                            # convenient utilities     
     ├── hific_tf2/                                 # partial TF 2 port of HiFiC (mirrors structure)
            ├── archs.py                            # PatchGAN discriminator 
            ├── helpers.py                          # LPIPS downloader
            ├── model.py                            # perceptual loss
     ├── amtm2023.py                                # >> core of this repo <<
     ├── config.py                                  # configurations
     ├── elic.py                                    # ELIC transforms based on VCT
     ├── fourier_cond.py                            # Fourier conditioning
     ├── synthesis.py                               # conditional synthesis transform

Acknowledgment

This project builds on the official MRIC work and the related codebases referenced above (HiFiC, compare_gan, VCT). We thank the authors for providing us with the official evaluation points as well as helpful insights.

License

Apache License 2.0
