<a target="_blank" href="https://colab.research.google.com/github/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Lecture12.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

<font size="8" color ="Brown"><center>
Lecture 12, Video Coding \
Quality Measurement
</center></font>
<br>
<p style="line-height:1.5">
<font size="6">Assume, we have a video codec. How do we evaluate this codec, or compare it with other codecs? Especially if we use irrelevance reduction (properties of the eye), the best would be a subjective test, where we have several test viewers look at images under controlled conditions, and have them evaluate the decoded images or videos. This is called subjective quality measurement. Especially for small impairments, it is helpful to give the test viewers a comparison to the original, uncoded video, as can be seen in the following picture, where there is a switch between the original and the decoded video.<br>
    
![Lecture12-1.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-1.PNG?raw=1)
**Fig.1** DSCQS testing system<br>
(source: Richardson: “Video Codec Design”)<br>
<br>
Using this system, the test viewers then evaluate the quality according to a scale, as in following picture:<br>

![Lecture12-2.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-2.PNG?raw=1)
**Fig.2** DSCQS rating form
(source: Richardson: “Video Codec Design”)
<br>   <br>
Observe that we have items A and B. Often, one of the 2 is the original, and the other the item under test. In this way it can be tested if the test viewer recognized the original. A similar methodology can be found for perceptual testing of audio material. To measure small differences, it is useful to have trained test viewers.
<br><br>
This subjective measurement yields the most reliable results, because it involves the human eye, as the target for our application (as long as the eye is the intended viewer). The drawback of it is, that it is quite time consuming and expensive to conduct. To solve this problem, also objective measurements are used, measurements which can be done by using a computer or machine. The most well known and most widely used in the context of video coding is the so-called Peak-SNR or PSNR. If we compute the mean squared error between the original and the decoded video or image as MSE (the average of all squared differences of the pixels of original and decoded version), then the PSNR is computed as:    
 <br>
 $$PSNR= 10\log_{10}(\frac{(2^n-1)^2}{MSE})$$
 <br>
 where n is the number of bits per pixel value (usually 8). Hence $(2^n-1)$represents the peak value in the image (usually 255), and hence the name “Peak” SNR (observe that we conduct this calculation per channel, i.e. for R,G,B or Y,Cb,Cr). The PSNR has the advantage, compared to the SNR, that it is independent of the internal representation of the image, meaning the number of bits per pixel (the SNR would change with the number of pixels).
</font></p>    


<p style="line-height:1.5">
<font size="6">

**Advantage:** Very easy to compute.<br>
**Disadvantage:**<br>
    a) We need the original (not always available)<br><br>
    b) Does not correlate well with human vision (it is only useful for instance to compare different versions of an algorithm, because its basic artifacts would not change)
The latter reason is the bigger problem, because quality comparison is its main objective. Consider for instance the following pictures:<br>
![Lecture12-3.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-3.PNG?raw=1)
![Lecture12-4.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-4.PNG?raw=1) \
**Fig.3** Quality comparison of images<br>
<br>
Observe that the image a) appears with a higher quality than image b), because there the faces are sharper. But the PSNR is higher for image b), saying that image b) should have a higher quality. This is because in image a) the larger background is unsharp. But the PSNR does not distinguish between areas which are important for the eye (faces), and areas which are unimportant (background).
<br><br>
For this reason there are works to improve objective (machine based) measurements to include properties of the eye, to make the resulting quality scores more similar to subjective measurements. One example is the so-called Structural Similarity Measurement (SSIM, or MSSIM, from Al Bovik et al.) which emphasises “structure”, meaning edges in images.<br>
Its performance is illustrated in following images:<br>
![Lecture12-5.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-5.PNG?raw=1)
<br>
**Fig.4** Comparison of “Boat” images with different types of distortions, all with MSE = 210. (a) Original image (8 bits/pixel; cropped from 512 x 512 to 256 x 256 for visibility). (b) Contrast-stretched image, MSSIM = 0,9168.(c) Mean-shifted image, MSSIM = 0,9900. (d) JPEG compressed image, MSSIM = 0,6949. (e) Blurred image, MSSIM = 0,7052. (f) Salt-pepper impulsive noise contaminated image, MSSIM = 0,7748.<br>
<br>
(source: Wan, Bovik, Sheikh: “Image Quality Assessment: From Error Visibility to Structural Similarity”, IEEE Trans. on Image Processing, April 2004.)
<br><br>


</font></p>   

<p style="line-height:1.5">
<font size="6"><b>More Image Quality Measures </b>
<br>
<br>
There are more image quality measure, among them also neural network based measures. Examples are:
VMAF, DISTS, LPIPS
<br>
For an overview, read:
https://www.elecard.com/page/article_interpretation_of_metrics
<br>
For VMAF, read:
https://de.wikipedia.org/wiki/Video_Multi-Method_Assessment_Fusion
<br>
For LPIS, read:
https://github.com/richzhang/PerceptualSimilarity
<br>
For LPIPS: No distortion means value Zero.
Observe that it needs a long time to install, due to the large number of coefficients for its neural network.
</font></p>   

# Python example for using LPIPS:

In [10]:
!git clone https://github.com/TUIlmenauAMS/Videocoding
!pip install lpips
import numpy as np
import cv2
import torch
import lpips

def convtolpips(photo):
    #takes photo from cv2 and converts it for LPIPS
    photoRGB=photo[:,:,::-1] #convert from BGR to RGB
    phototranspose=np.transpose(photoRGB, (2,0,1))
    #primary color index first
    photolpips=torch.from_numpy((
    phototranspose/255).astype(np.float32))
    return  photolpips

photo=cv2.imread('/content/Videocoding/LecturesJupterNotebooks/IMGP1690.JPG');
photodist=0.5*photo #less contrast distortion
photolpips=convtolpips(photo)
photodistlpips=convtolpips(photodist)

loss_fn_alex = lpips.LPIPS(net='alex')
#loss_fn_vgg = lpips.LPIPS(net='vgg')
#loss_fn_vgg(photolpips, photodistlpips)
loss=loss_fn_alex (photolpips, photodistlpips)
print("LPIPS loss= ", loss)

fatal: destination path 'Videocoding' already exists and is not an empty directory.
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /usr/local/lib/python3.10/dist-packages/lpips/weights/v0.1/alex.pth
LPIPS loss=  tensor([[[[0.1260]]]], grad_fn=<AddBackward0>)


<p style="line-height:1.5">
<font size="6">**Other literature:**<br>
- IEEE Transactions on Image Processing, vol. 9, np. 4, pp. 636-650, Apr. 2000<br>
“Image Quality Assessment Based on a Degradation Model”<br>
Niranjan Damera-Venkata, and Thomas D. Kite, Wilson S. Geisler, Brian L. Evans, and Alan C. Bovik.
<br>
- IEEE International Conference on Acoustics, Speech, & Signal Processing, May 2002<br>
“Why is Image Quality Assessment So Difficult?”<br>
Zhou Wang1, Alan C. Bovik1 and Ligang Lu2.<br>
<br>
The area of improved objective measurements is still an active research area. All available on ieeexplore.ieee.org within the network of the TU Ilmenau! (Try it for the  paper of Al Bovic above).
<br>
Future coders often deliver higher resolution at lower bit rates, meaning we have fewer bits per pixel. This means we need more sophisticated models of vision (at least implied) in those coders to obtain a good perceptual quality. This means we also need more sophisticated models of vision for our objective measurement systems.
    
</font></p>   


<font size="8" color ="Brown"><center>
Standards for Representing Digital Video
</center></font>
<br>

<p style="line-height:1.5">
<font size="6">The first digital standards where derived from the analog TV standards (from: Richardson, “Video Codec Design”):
<br><br>
**Table 1** ITU-R BT.601-5 parameters<br>

![Lecture12-6.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-6.PNG?raw=1)
This then leads to the following standards:<br><br>    
**Table 2** Intermediate formats<br>
![Lecture12-7.PNG](https://github.com/TUIlmenauAMS/Videocoding/blob/main/LecturesJupterNotebooks/Lecture12/Img-Lecture12/Lecture12-7.PNG?raw=1)
<br>
The **Common Interchange Format** (CIF) is a widely used format for digital video. 4CIF has a resolution of 4 times the number of pixels of CIF, and has a resolution similar to standard definition TV.
</font></p>    