# Code for evaluating results after running inpainting models
The inpainting models try to recreate the background behind an object that we want to remove. To evaluate their performance, we use perceptual hashing, structural similarity (SSIM), and peak signal-to-noise ratio (PSNR) to quantify how "realistic" the output image is. If a model recreates the background realistically, it should score well on these metrics.
Note that there are two ways of evaluating performance here: 
* For unlabeled data, we use perceptual hashing and the Hamming distance to compare the output image with the input image, as a proxy for how realistic the output is.
* For labeled data, we use SSIM and PSNR to compare the output image with the "true output".

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim
from skimage.metrics import peak_signal_noise_ratio as psnr
import imagehash
from PIL import Image
```


## Evaluating performance on unlabeled data
Unlabeled data means that the only images available are the input image (in which some object is present) and the output image (in which the inpainting model has tried to recreate the background behind the object). We have no knowledge of what the "true background" behind the object looks like, so we evaluate performance through perceptual hashing. A perceptual hash is a compact fingerprint of the image content, and a smaller Hamming distance (the number of differing bits) between the hashes of the input and output images indicates higher realism in the generated output image.

```python
# Performance on unlabeled data is evaluated by comparing the perceptual hashes of the
# input and output images. A smaller Hamming distance between the two hashes means the
# output is closer to the input, which we interpret as higher realism.
def perceptual_hash(image):
    # OpenCV loads images as BGR, while PIL expects RGB.
    # average_hash (aHash) is one of several perceptual hashes in imagehash;
    # with the default hash_size=8 it produces a 64-bit hash.
    return imagehash.average_hash(Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)))

def evaluate_unlabeled(input_image_path, output_image_path):
    # Load images
    input_image = cv2.imread(input_image_path)
    output_image = cv2.imread(output_image_path)
    # cv2.imread returns None (instead of raising) when a file cannot be read
    if input_image is None or output_image is None:
        raise FileNotFoundError(f"Could not read {input_image_path} or {output_image_path}")

    # Calculate perceptual hashes
    input_hash = perceptual_hash(input_image)
    output_hash = perceptual_hash(output_image)

    # Subtracting two ImageHash objects returns their Hamming distance
    hamming_distance = input_hash - output_hash  # Lower values indicate higher similarity

    return hamming_distance
```
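To show roughly what `imagehash.average_hash` computes, here is a minimal NumPy-only sketch of the average-hash idea: shrink the grayscale image to an 8x8 grid, set a bit wherever a cell is brighter than the mean of the grid, and count the bits that differ between two hashes. The function names `average_hash_bits` and `count_differing_bits` are hypothetical, and the sketch downsamples by plain block averaging rather than the PIL resampling that imagehash uses, so its bits will not necessarily match imagehash exactly.

```python
import numpy as np

def average_hash_bits(gray, hash_size=8):
    """Hypothetical sketch of an average hash for a 2-D grayscale array.

    Returns a (hash_size, hash_size) boolean array that is True where a
    downsampled cell is brighter than the mean of all cells.
    """
    gray = np.asarray(gray, dtype=np.float64)
    if gray.ndim != 2 or min(gray.shape) < hash_size:
        raise ValueError("expected a 2-D image at least hash_size pixels on each side")
    # Downsample by averaging roughly equal blocks of rows and columns
    row_groups = np.array_split(np.arange(gray.shape[0]), hash_size)
    col_groups = np.array_split(np.arange(gray.shape[1]), hash_size)
    small = np.array([[gray[np.ix_(rows, cols)].mean() for cols in col_groups]
                      for rows in row_groups])
    # One bit per cell: is the cell brighter than the average cell?
    return small > small.mean()

def count_differing_bits(bits_a, bits_b):
    """Hamming distance: number of positions where two bit arrays differ."""
    return int(np.count_nonzero(bits_a != bits_b))

# Usage: a horizontal gradient and its inverse have opposite bits everywhere
gradient = np.tile(np.arange(64) * 4, (64, 1))
print(count_differing_bits(average_hash_bits(gradient), average_hash_bits(gradient)))        # 0
print(count_differing_bits(average_hash_bits(gradient), average_hash_bits(255 - gradient)))  # 64
```

Because every cell is compared only with the global mean, small local changes (such as a well-inpainted region) flip few bits, while large structural changes flip many.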



## Evaluating performance on labeled data
Labeled data means that, in addition to the input image and the generated output image, we have a "true output" image available. In our case, this is an image in which we have physically removed the object that the inpainting model tries to remove, so we know what the background behind that object actually looks like. To evaluate performance, we compute the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) between the generated output image and the true output image.

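```python
def evaluate_labeled(output_image, true_output_image):
    # Both arguments are BGR images (as returned by cv2.imread) of identical size
    if output_image.shape != true_output_image.shape:
        raise ValueError("Output and true output images must have the same shape")

    # Convert images to grayscale, so the metrics compare luminance only
    output_gray = cv2.cvtColor(output_image, cv2.COLOR_BGR2GRAY)
    true_output_gray = cv2.cvtColor(true_output_image, cv2.COLOR_BGR2GRAY)

    # SSIM: 1.0 means identical. PSNR: in dB, higher is better, inf for identical images.
    # data_range=255 matches 8-bit images.
    ssim_score = ssim(true_output_gray, output_gray, data_range=255)
    psnr_score = psnr(true_output_gray, output_gray, data_range=255)

    return ssim_score, psnr_score
```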
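For intuition about what the two scores measure, here is a minimal NumPy sketch of both formulas. PSNR is 10·log10(MAX²/MSE), with MAX = 255 for 8-bit images. The SSIM function is a simplified *global* variant that evaluates the SSIM formula once over the whole image; `skimage.metrics.structural_similarity` instead averages SSIM over local sliding windows, so its values will generally differ. The names `psnr_sketch` and `global_ssim_sketch` are hypothetical and not part of any library.

```python
import numpy as np

def psnr_sketch(reference, test, data_range=255.0):
    """PSNR in dB: 10 * log10(data_range**2 / MSE); inf for identical images."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(data_range ** 2 / mse))

def global_ssim_sketch(x, y, data_range=255.0):
    """SSIM formula evaluated once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# Usage: every pixel off by one grey level gives MSE = 1
reference = np.zeros((16, 16), dtype=np.uint8)
print(psnr_sketch(reference, reference + 1))     # 20 * log10(255), about 48.13 dB
print(global_ssim_sketch(reference, reference))  # 1.0
```

PSNR only looks at per-pixel error, whereas SSIM also compares means, variances, and covariance, which is why the two metrics can rank outputs differently.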

```python
# Testing
input_image_path = "images/inputs/truck_input.png"
output_image_path = "images/outputs/truck_output.png"
output_image_path2 = "images/outputs/truck_output_tight.png"

hamming1 = evaluate_unlabeled(input_image_path, output_image_path)
hamming2 = evaluate_unlabeled(input_image_path, output_image_path2)
hamming3 = evaluate_unlabeled(input_image_path, input_image_path)  # Sanity check: image vs. itself

print(hamming1)
print(hamming2)
print(hamming3)
```

Output:

```text
7
13
0
```
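With imagehash's default `hash_size=8`, `average_hash` produces a 64-bit hash, so the Hamming distance ranges from 0 (identical hashes) to 64. Assuming that default, one possible way to make the printed distances easier to read is to express them as the fraction of matching bits. The `hash_similarity` helper below is hypothetical and not part of imagehash.

```python
HASH_BITS = 8 * 8  # Assumes imagehash's default hash_size=8, i.e. a 64-bit hash

def hash_similarity(hamming_distance, hash_bits=HASH_BITS):
    """Hypothetical helper: fraction of hash bits that match (1.0 = identical hashes)."""
    if not 0 <= hamming_distance <= hash_bits:
        raise ValueError("Hamming distance must lie between 0 and the number of hash bits")
    return 1 - hamming_distance / hash_bits

# Distances printed by the test cell above
for name, distance in [("truck_output", 7), ("truck_output_tight", 13), ("input vs. itself", 0)]:
    print(f"{name}: {hash_similarity(distance):.3f}")
```

Under this reading, the hash of `truck_output.png` matches the input's hash on about 89% of its bits and `truck_output_tight.png` on about 80%.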
