FDIM

Introduction

FDIM is a generic, feature-distance-based video quality assessment (VQA) metric that generalizes across:

Traditional and neural codecs
SDR and HDR formats
Diverse resolutions and content types

FDIM adopts a hybrid architecture:

Deep branch: learns multi-scale representations to capture distortions from low-level fidelity degradation to high-level semantic differences, with a content-adaptive feature comparison mechanism.
Hand-crafted branch: enhances robustness and generalization across domains.

FDIM is trained on the large-scale DCVQA dataset (16k+ samples with both conventional and neural codecs), and demonstrates strong and consistent performance across multiple public SDR/HDR VQA benchmarks.

The software package provides a single video quality metric, FDIM, that evaluates one compressed video or a set of compressed videos in YUV or RGB format.

Installation

Prerequisite

CUDA 12.2 (verified)
Python 3.9
ffmpeg available in PATH
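Before installing, a quick pre-flight check can catch a missing ffmpeg or a mismatched interpreter. The sketch below is not part of the repository: check_prerequisites is a hypothetical helper that checks only the Python version and whether ffmpeg is on PATH. It does not verify the CUDA version.

```python
import shutil
import sys


def check_prerequisites(required=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The README targets Python 3.9; compare only the major/minor version.
    if sys.version_info[:2] != tuple(required):
        problems.append(
            f"Python {required[0]}.{required[1]} expected, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    # ffmpeg must be resolvable from PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Prerequisites look OK")
```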

Set up the virtual environment

conda create -n fdim python=3.9.20
conda activate fdim

Install packages for inference

Run the following command from the repository root:

pip install -e .

You can also use:

bash install.sh

Note: install.sh creates and activates a conda environment named fdim automatically. If you have already created and activated the environment manually, use pip install -e . instead to avoid setting it up twice.

Instruction to run FDIM

1. Download the checkpoint/model

Put the FDIM checkpoint in the default checkpoint directory used by the scripts:

./fdim/dist/checkpoints/

The current repository uses ./fdim/dist/checkpoints/dist_5.0.0.ckpt by default. If you want to use another checkpoint, pass it explicitly with --model_path <path_to_checkpoint>.
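The lookup rule above (the default path unless --model_path is given) can be sketched as follows. resolve_checkpoint is a hypothetical helper, not a function from the repository; it only mirrors the documented default so that a missing checkpoint fails before inference starts.

```python
import os

# Default path as documented in this README.
DEFAULT_CHECKPOINT = "./fdim/dist/checkpoints/dist_5.0.0.ckpt"


def resolve_checkpoint(model_path=None):
    """Return the checkpoint to use (explicit path first, else the default).

    Raises FileNotFoundError early instead of failing inside the model loader.
    """
    path = model_path if model_path else DEFAULT_CHECKPOINT
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path
```

The returned path can then be passed on as --model_path <path_to_checkpoint>.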

2. Prepare video information file

Create a CSV file in ./data/dataset/ that lists every video you want to evaluate, in the following format:

ref_name	dis_name	mos	ref_width	ref_height	dis_width	dis_height	ref_bits	dis_bits
SRC1001_1920x1080_25_yuv420p.mp4	SRC1001_1920x1080_25_yuv420p.mp4.x265.r0.265.mp4	4.854890404	1920	1080	1920	1080	8	8

ref_name: The name of the reference video.
dis_name: The name of the distorted (test) video.
mos: The ground-truth quality score (mean opinion score). If unavailable, set it to 0.
ref_width, ref_height: The resolution of the reference video.
dis_width, dis_height: The resolution of the distorted video.
ref_bits, dis_bits: The bit depth of the reference and distorted videos.

If your YUV videos are 8-bit, the "ref_bits" and "dis_bits" columns are not needed.
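A minimal sketch for generating this file with Python's standard csv module. It assumes a comma-separated file with the header shown above (the sample row is displayed with tabs here), and write_video_info_csv is a hypothetical helper. Following the 8-bit note, it writes the bit-depth columns only when some video is not 8-bit, and it fills a missing mos with 0.

```python
import csv

BASE_COLUMNS = ["ref_name", "dis_name", "mos",
                "ref_width", "ref_height", "dis_width", "dis_height"]
BIT_COLUMNS = ["ref_bits", "dis_bits"]


def write_video_info_csv(path, rows):
    """Write a list of dicts as the video information CSV; return the header used."""
    # Bit-depth columns are only required when some video is not 8-bit.
    needs_bits = any(
        int(row.get(col, 8)) != 8 for row in rows for col in BIT_COLUMNS
    )
    fieldnames = BASE_COLUMNS + (BIT_COLUMNS if needs_bits else [])
    with open(path, "w", newline="") as f:
        # extrasaction="ignore" drops keys that are not part of the header.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            out = {"mos": 0, **row}  # mos defaults to 0 when unknown
            if needs_bits:
                for col in BIT_COLUMNS:
                    out.setdefault(col, 8)  # unspecified sides are 8-bit
            writer.writerow(out)
    return fieldnames
```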

3. Inference

Evaluate the quality of all videos in a dataset

Add execute permission to the bundled ./fdim/vmaf/vmaf binary:

chmod +x ./fdim/vmaf/vmaf

If the reference and distorted videos are not in YUV format:

python dataset_test.py \
    --metric fdim \
    --save_dir ./data/result \
    --save_name fdim_test \
    --ref_dir <path_to_reference_video_dir> \
    --dis_dir <path_to_distorted_video_dir> \
    --csv_path <path_to_csv_file> \
    --ref_fmt rgb \
    --dis_fmt rgb \
    --preprocess none \
    --video_temp_path ./data/video_temp/ \
    --gpu_idx 0

If the reference video is in YUV format, --ref_width_column, --ref_height_column, and --ref_fmt must be provided. If its bit depth is not 8, --ref_bit_depth_column must also be provided.

If the distorted video is in YUV format, --dis_width_column, --dis_height_column, and --dis_fmt must be provided. If its bit depth is not 8, --dis_bit_depth_column must also be provided.

python dataset_test.py \
    --metric fdim \
    --save_dir data/result \
    --save_name fdim_eem_sample \
    --csv_path <path_to_csv_file> \
    --ref_dir <path_to_reference_video_dir> \
    --dis_dir <path_to_distorted_video_dir> \
    --ref_column <reference video name column in csv file> \
    --dis_column <distorted video name column in csv file> \
    --ref_fmt <reference video format column in csv file> \
    --dis_fmt <distorted video format column in csv file> \
    --ref_width_column <reference video width column in csv file> \
    --ref_height_column <reference video height column in csv file> \
    --dis_width_column <distorted video width column in csv file> \
    --dis_height_column <distorted video height column in csv file> \
    --ref_bit_depth_column <reference video bitdepth column in csv file> \
    --dis_bit_depth_column <distorted video bitdepth column in csv file> \
    --video_temp_path ./data/video_temp/ \
    --gpu_idx 0
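The YUV rules above can be expressed as a small argument builder. This is a hypothetical sketch: it assumes the column names from the sample CSV header (ref_name, ref_width, ref_bits, and so on) and assumes that --ref_fmt/--dis_fmt take a pixel format such as yuv420p10le, as in the single_test.py example below. The dataset_test.py template above describes these as CSV columns, so check the script's argument parser before relying on this. Non-YUV sides fall back to rgb, as in the first example.

```python
import shlex


def yuv_flags(side, fmt, bit_depth=8):
    """Flags required for a YUV side ('ref' or 'dis'), per the rules above."""
    flags = [f"--{side}_fmt", fmt,
             f"--{side}_width_column", f"{side}_width",
             f"--{side}_height_column", f"{side}_height"]
    if bit_depth != 8:
        # Bit-depth column is only needed for non-8-bit input.
        flags += [f"--{side}_bit_depth_column", f"{side}_bits"]
    return flags


def build_dataset_test_cmd(csv_path, ref_dir, dis_dir,
                           ref_yuv=None, dis_yuv=None, gpu_idx=0):
    """ref_yuv/dis_yuv: None for non-YUV input, else (pix_fmt, bit_depth)."""
    cmd = ["python", "dataset_test.py", "--metric", "fdim",
           "--save_dir", "./data/result", "--save_name", "fdim_test",
           "--csv_path", csv_path, "--ref_dir", ref_dir, "--dis_dir", dis_dir,
           "--ref_column", "ref_name", "--dis_column", "dis_name"]
    for side, yuv in (("ref", ref_yuv), ("dis", dis_yuv)):
        if yuv is None:
            cmd += [f"--{side}_fmt", "rgb"]
        else:
            cmd += yuv_flags(side, *yuv)
    cmd += ["--video_temp_path", "./data/video_temp/", "--gpu_idx", str(gpu_idx)]
    return cmd


if __name__ == "__main__":
    cmd = build_dataset_test_cmd("./data/dataset/my_test.csv", "/path/to/ref",
                                 "/path/to/dis", ref_yuv=("yuv420p10le", 10),
                                 dis_yuv=("yuv420p10le", 10))
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), or printed with shlex.join(cmd) and pasted into a shell. The CSV path in the usage example is a placeholder.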

Evaluate the quality of a test video

If the reference and distorted videos are not in YUV format:

python single_test.py \
    --metric fdim \
    --ref_video_root <ref_video_path> \
    --dis_video_root <dis_video_path> \
    --video_temp_path ./data/video_temp/ \
    --gpu_idx 0

If the reference and/or distorted video is in YUV format:

python single_test.py \
    --metric fdim \
    --ref_video_root <ref_video_path> \
    --dis_video_root <dis_video_path> \
    --ref_fmt <reference video format, such as yuv420p, yuv420p10le> \
    --dis_fmt <distorted video format, such as yuv420p, yuv420p10le> \
    --ref_width <reference video width> \
    --ref_height <reference video height> \
    --dis_width <distorted video width> \
    --dis_height <distorted video height> \
    --ref_bit_depth <reference video bitdepth> \
    --dis_bit_depth <distorted video bitdepth> \
    --video_temp_path ./data/video_temp/ \
    --gpu_idx 0
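Raw YUV files carry no header, which is why the width, height, format, and bit depth must be supplied. A simple sanity check is that the file size is an exact multiple of the per-frame size. The sketch below assumes planar yuv420p/yuv422p/yuv444p layouts, 1 byte per sample for names without an le/be suffix, and 2 bytes per sample for high-bit-depth formats such as yuv420p10le; frame_bytes and num_frames are hypothetical helpers.

```python
import os


def frame_bytes(width, height, pix_fmt):
    """Bytes per frame of planar YUV, e.g. 'yuv420p' or 'yuv420p10le'."""
    sub = pix_fmt[3:6]  # '420' from 'yuv420p'
    cw, ch = (width + 1) // 2, (height + 1) // 2  # subsampled chroma size
    if sub == "420":
        chroma = 2 * cw * ch
    elif sub == "422":
        chroma = 2 * cw * height
    elif sub == "444":
        chroma = 2 * width * height
    else:
        raise ValueError(f"unsupported pixel format: {pix_fmt}")
    # Formats like yuv420p10le store each sample in 2 bytes.
    bytes_per_sample = 2 if pix_fmt.endswith(("le", "be")) else 1
    return (width * height + chroma) * bytes_per_sample


def num_frames(path, width, height, pix_fmt):
    """Frame count of a raw YUV file; raises if the parameters do not fit the size."""
    size = os.path.getsize(path)
    fb = frame_bytes(width, height, pix_fmt)
    if size % fb:
        raise ValueError(f"{path}: size {size} is not a multiple of frame size {fb}; "
                         "check width/height/format/bit depth")
    return size // fb
```

For example, a 1920x1080 yuv420p frame is 3,110,400 bytes, and the same frame in yuv420p10le is twice that.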

Inference on HDR content

For PQ/HLG content, enable PU21 preprocessing (--preprocess pu21) and select (or customize) the correct display model (--display_model <name>). The available display model definitions are stored in fdim/dist/pycvvdp/vvdp_data/display_models.json.

If --display_model is not provided while --preprocess pu21 is enabled, the code uses standard_hdr_pq_tv by default.
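The default-selection behavior above can be sketched as follows. It assumes display_models.json is plain JSON whose top-level object is keyed by display model name; resolve_display_model is a hypothetical helper, not part of the pycvvdp API. It is mainly useful for listing valid names before passing one to --display_model.

```python
import json

DEFAULT_DISPLAY_MODEL = "standard_hdr_pq_tv"  # default named in this README


def resolve_display_model(json_path, name=None):
    """Return (name, definition) for the requested display model.

    Falls back to the documented default when no name is given.
    """
    with open(json_path, encoding="utf-8") as f:
        models = json.load(f)
    name = name or DEFAULT_DISPLAY_MODEL
    if name not in models:
        raise KeyError(f"Unknown display model {name!r}; available: {sorted(models)}")
    return name, models[name]
```

For example, resolve_display_model("fdim/dist/pycvvdp/vvdp_data/display_models.json") would return the standard_hdr_pq_tv entry, provided the file matches the assumed layout.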

Reference notes:

pycvvdp in this repository is a vendored third-party module adapted from ColorVideoVDP, which provides the display model and video source utilities used by the HDR preprocessing path.
PU21 refers to the perceptually uniform HDR encoding proposed in "PU21: A novel perceptually uniform encoding for adapting existing quality metrics for HDR" and is used here through the integrated pycvvdp implementation.

Example:

python single_test.py \
    --metric fdim \
    --ref_video_root /path/to/ref.mp4 \
    --dis_video_root /path/to/dis.mp4 \
    --preprocess pu21 \
    --display_model standard_hdr_pq_tv \
    --video_temp_path ./data/video_temp/ \
    --gpu_idx 0

Low-complexity implementation for 4K videos

If your input videos are 4K and you want faster inference, set --input_resolution 1080 to downsample frames before they are passed to the deep model. In our experiments, this significantly reduces runtime at the cost of a slight drop in objective-subjective consistency.
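A hypothetical sketch that adds --input_resolution 1080 only for 4K inputs when assembling a single_test.py call. The flags come from the examples above; the helper name and the choice to trigger on a height of 2160 or more are assumptions, and the input height must come from your own metadata (for example, the CSV resolution columns).

```python
import shlex


def build_single_test_cmd(ref_path, dis_path, height, gpu_idx=0):
    """Assemble a single_test.py command, downsampling 4K input to 1080."""
    cmd = ["python", "single_test.py", "--metric", "fdim",
           "--ref_video_root", ref_path, "--dis_video_root", dis_path]
    if height >= 2160:
        # Low-complexity mode: frames are downsampled before the deep model.
        cmd += ["--input_resolution", "1080"]
    cmd += ["--video_temp_path", "./data/video_temp/", "--gpu_idx", str(gpu_idx)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_single_test_cmd("/path/to/ref.mp4", "/path/to/dis.mp4", 2160)))
```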
