FDIM is a feature-distance-based generic VQA metric that generalizes across:
- Traditional and neural codecs
- SDR and HDR formats
- Diverse resolutions and content types
FDIM adopts a hybrid architecture:
- Deep branch: learns multi-scale representations to capture distortions from low-level fidelity degradation to high-level semantic differences, with a content-adaptive feature comparison mechanism.
- Hand-crafted branch: enhances robustness and generalization across domains.
FDIM is trained on the large-scale DCVQA dataset (16k+ samples with both conventional and neural codecs), and demonstrates strong and consistent performance across multiple public SDR/HDR VQA benchmarks.
The software package offers one video quality metrics to evaluate a single compressed video or a set of compressed videos, in YUV or RGB formats.
- [verified] CUDA 12.2
- python 3.9
- ffmpeg in PATH
conda create -n fdim python=3.9.20
conda activate fdim
Run the following command from the repository root:
pip install -e .
You can also use:
bash install.sh
Note: install.sh will create and activate a conda environment named fdim automatically. If you already created and activated the environment manually, prefer pip install -e . to avoid duplicated setup.
The default checkpoint path used by the scripts is:
- fdim: put it in
./fdim/dist/checkpoints/
The current repository uses ./fdim/dist/checkpoints/dist_5.0.0.ckpt by default. If you want to use another checkpoint, pass it explicitly with --model_path <path_to_checkpoint>.
Create a CSV file in ./data/dataset/ and enter the information of all the video you want to evaluate as follow:
| ref_name | dis_name | mos | ref_width | ref_height | dis_width | dis_height | ref_bits | dis_bits |
|---|---|---|---|---|---|---|---|---|
| SRC1001_1920x1080_25_yuv420p.mp4 | SRC1001_1920x1080_25_yuv420p.mp4.x265.r0.265.mp4 | 4.854890404 | 1920 | 1080 | 1920 | 1080 | 8 | 8 |
- ref_name: The name of the reference video.
- dis_name: The name of the test video.
- mos: The ground truth of video quality. If unavailable, set it to 0.
- ref_width, ref_height: The video resolution of reference video.
- dis_width, dis_height: The video resolution of distorted video.
If your yuv videos are 8bit, you don't need the "ref_bits" and "dis_bits" columns.
Add the execute permission.
chmod +x ./fdim/vmaf/vmaf
-
If the reference video and distorted video is not in YUV format.
python dataset_test.py \ --metric fdim \ --save_dir ./data/result \ --save_name fdim_test \ --ref_dir <path_to_reference_video_dir> \ --dis_dir <path_to_distorted_video_dir> \ --csv_path <path_to_csv_file> \ --ref_fmt rgb \ --dis_fmt rgb \ --preprocess none \ --video_temp_path ./data/video_temp/ \ --gpu_idx 0 -
If the reference video is in YUV format,
--ref_width_column,--ref_height_columnand--ref_fmtmust be provided, if bit_depth is not 8,--ref_bit_depth_columnmust be provided.If the distorted video is in YUV format,
-dis_width_column,--dis_height_columnand--dis_fmtmust be provided, if bit_depth is not 8,--dis_bit_depth_columnmust be provided.python dataset_test.py \ --metric fdim \ --save_dir data/result \ --save_name fdim_eem_sample \ --csv_path <path_to_csv_file> \ --ref_dir <path_to_reference_video_dir> \ --dis_dir <path_to_distorted_video_dir> \ --ref_column <reference video name column in csv file> \ --dis_column <distorted video name column in csv file> \ --ref_fmt <reference video format column in csv file> \ --dis_fmt <distorted video format column in csv file> \ --ref_width_column <reference video width column in csv file> \ --ref_height_column <reference video height column in csv file> \ --dis_width_column <distorted video width column in csv file> \ --dis_height_column <distorted video height column in csv file> \ --ref_bit_depth_column <reference video bitdepth column in csv file> \ --dis_bit_depth_column <distorted video bitdepth column in csv file> \ --video_temp_path ./data/video_temp/ \ --gpu_idx 0
-
If the reference video and distorted video is not in YUV format.
python single_test.py --metric fdim --ref_video_root <ref_video_path> --dis_video_root <dis_video_path> --video_temp_path ./data/video_temp/ --gpu_idx 0
-
If the reference/distorted video and distorted video is in YUV format.
python single_test.py \ --metric fdim \ --ref_video_root <ref_video_path> \ --dis_video_root <dis_video_path> \ --ref_fmt <reference video format, such as yuv420p, yuv420p10le> \ --dis_fmt <distorted video format, such as yuv420p, yuv420p10le> \ --ref_width <reference video width> \ --ref_height <reference video height> \ --dis_width <distorted video width> \ --dis_height <distorted video height> \ --ref_bit_depth <reference video bitdepth> \ --dis_bit_depth <distorted video bitdepth> \ --video_temp_path ./data/video_temp/ \ --gpu_idx 0
For PQ/HLG content, enable PU21 preprocessing (--preprocess pu21) and select (or customize) the correct display model (--display_model <name>). The available display model definitions are stored in fdim/dist/pycvvdp/vvdp_data/display_models.json.
If --display_model is not provided while --preprocess pu21 is enabled, the code uses standard_hdr_pq_tv by default.
Reference notes:
pycvvdpin this repository is a vendored third-party module adapted from ColorVideoVDP, which provides the display model and video source utilities used by the HDR preprocessing path.PU21refers to the perceptually uniform HDR encoding proposed in "PU21: A novel perceptually uniform encoding for adapting existing quality metrics for HDR" and is used here through the integratedpycvvdpimplementation.
Example:
python single_test.py \
--metric fdim \
--ref_video_root /path/to/ref.mp4 \
--dis_video_root /path/to/dis.mp4 \
--preprocess pu21 \
--display_model standard_hdr_pq_tv \
--video_temp_path ./data/video_temp/ \
--gpu_idx 0
If your input videos are 4K and you want faster inference, set the resolution parameter --input_resolution 1080 to downsample frames before the deep model. In our experiments, this significantly improves runtime while only slightly reducing objective-subjective consistency.



