# **Automated algorithmic bias analysis of Twitter saliency filter**



## Author: 
## [**Dr. Rahul Remanan**](https://linkedin.com/in/rahulremanan), 
### [**CEO, Moad Computer (A division of Ekaveda Inc.)**](https://moad.computer)

This notebook introduces a few broad concepts that help in developing automated testing tools to detect algorithmic bias in machine vision tools such as saliency filters.

The tool evaluated here is the [Twitter saliency filter](https://github.com/twitter-research/image-crop-analysis).

[FairFace: the face attribute dataset that is balanced for gender, race and age](https://arxiv.org/abs/1908.04913v1) is used here to generate the random image pairs for the saliency filter tests.

The statistical significance of the differences between the carefully manipulated saliency filter outputs and the baseline saliency filter outputs is quantified using the [Wilcoxon signed rank test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test).

### Additional requirements

* Valid Google account
* This notebook by default assumes that the user is working inside the original [Google Colab environment](https://colab.research.google.com/drive/1eZpt6KPtrlA2egvuTnyS31v3UqDJScCD?usp=sharing). To run locally or in other cloud environments, please make sure that the data dependencies are satisfied.
* Google Drive access to save the FairFace dataset and the experiment history



[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eZpt6KPtrlA2egvuTnyS31v3UqDJScCD?usp=sharing)

```
Parts of the code used in this notebook are copyright protected.
Copyright 2021 Twitter, Inc.
SPDX-License-Identifier: Apache-2.0
```

# Install Twitter saliency filter

In [None]:
import logging
from pathlib import Path

logging.basicConfig(level=logging.ERROR)
BIN_MAPS = {"Darwin": "mac", "Linux": "linux"}

HOME_DIR = Path("../").expanduser()

try:
  import google.colab
  !python3 -m pip install -q pandas scikit-learn scikit-image statsmodels requests dash
  ![[ -d image-crop-analysis ]] || git clone https://github.com/twitter-research/image-crop-analysis.git
  HOME_DIR = Path("./image-crop-analysis").expanduser()
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

In [None]:
import sys, platform
sys.path.append(str(HOME_DIR / "src"))
bin_dir = HOME_DIR / Path("./bin")
bin_path = bin_dir / BIN_MAPS[platform.system()] / "candidate_crops"
model_path = bin_dir / "fastgaze.vxm"
data_dir = HOME_DIR / Path("./data/")
data_dir.exists()

# Import dependencies

In [None]:
import os, gc, json, glob, shlex, random, platform, warnings, subprocess
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

from PIL import Image
from tqdm.auto import tqdm
from scipy.stats import wilcoxon
from collections import namedtuple
from matplotlib.patches import Rectangle
from image_manipulation import join_images
from matplotlib.collections import PatchCollection
from crop_api import ImageSaliencyModel, is_symmetric, parse_output, reservoir_sampling

# Mount Google Drive

By default, this notebook assumes that the FairFace dataset is stored in the Google Drive mounted here. The experiment history is also saved to the same Google Drive in `csv` format.

## Data download
Download the FairFace dataset **`fairface-img-margin125-trainval.zip`** file and the labels **`fairface_label_train.csv`** file from the official **[FairFace GitHub repo](https://github.com/joojs/fairface)**.

In [None]:
img_dir = './'
if IN_COLAB:
  from google.colab import drive
  drive.mount('/content/drive')
  img_dir = '/content/drive/MyDrive/'
fairface_dir = f'{img_dir}/FairFace/'
if not os.path.exists(f'{fairface_dir}/fairface-img-margin125-trainval.zip'):
  raise ValueError(f'Please check whether the FairFace dataset zip file exists at: {fairface_dir}/fairface-img-margin125-trainval.zip')
if not os.path.exists(f'{fairface_dir}/fairface_label_train.csv'):
  raise ValueError(f'Please check whether the FairFace data labels csv file exists at: {fairface_dir}/fairface_label_train.csv')

# FairFace helper functions
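
The helpers in this section draw two random row indices and redraw the second one until the two images differ in both identity and `race` label. A minimal, self-contained sketch of that rejection-sampling idea is shown here, using a toy list of labels in place of the FairFace table. The `draw_cross_group_pair` name and the toy data are illustrative assumptions and are not part of the notebook.

```python
import random

def draw_cross_group_pair(races, max_retries=100, rng=None):
    """Return two distinct indices whose race labels differ (rejection sampling)."""
    rng = rng or random.SystemRandom()
    n = len(races)
    id1 = rng.randrange(n)
    for _ in range(max_retries):
        id2 = rng.randrange(n)
        # Accept only a different image with a different race label
        if id2 != id1 and races[id2].lower() != races[id1].lower():
            return id1, id2
    raise RuntimeError(f'No valid pair found in {max_retries} attempts')

# Hypothetical labels standing in for the FairFace 'race' column
toy_races = ['White', 'Black', 'East Asian', 'Indian', 'White', 'Black']
id1, id2 = draw_cross_group_pair(toy_races)
print(id1, id2, toy_races[id1], toy_races[id2])
```

Unlike `img_pairs_filter` below, this sketch raises an error when the retry budget is exhausted instead of returning the last candidate.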

In [None]:
def random_imgID_generator(df, pairs=True):
  num_images = len(df)
  id1 = random.SystemRandom().choice(range(0,num_images))
  if pairs:
    id2 = random.SystemRandom().choice(range(0,num_images))
    return id1, id2
  return id1

In [None]:
def eval_conditions(df, id1, id2):
  id_condition = id1 == id2
  race_condition = str(df.iloc[id2].race).lower()==str(df.iloc[id1].race).lower()
  return id_condition, race_condition

In [None]:
def img_pairs_filter(df,id1,id2,max_retries=100):
  id_condition, race_condition = eval_conditions(df, id1, id2)
  if id_condition or race_condition:
    for i in tqdm(range(max_retries)):
      id2 = random_imgID_generator(df, pairs=False)
      tqdm.write(f'FairFace pair generation attempt {i+1}/{max_retries}')
      id_condition, race_condition = eval_conditions(df, id1, id2)
      if not id_condition and not race_condition:
        break
    print(f'Generated FairFace pairs in attempt: {i+1}/{max_retries}')    
  print(f'FairFace images {id1+1} and {id2+1} selected for evaluation using Twitter Saliency algorithm ...\n')
  return id1, id2

In [None]:
def img_info(df, id1, id2=None, verbose=False):
  if verbose:
    print(f'Labels for {id1+1} ...\n')
    print(df.iloc[id1])
    print('\n','-'*32)
  info1 = { 'file': df['file'].iloc[id1].split('/')[-1].replace('.jpg',''),
            'race': df['race'].iloc[id1],
            'gender': df['gender'].iloc[id1],
            'age': df['age'].iloc[id1] }
  if id2 is not None:
    info2 = { 'file': df['file'].iloc[id2].split('/')[-1].replace('.jpg',''),
              'race': df['race'].iloc[id2],
              'gender': df['gender'].iloc[id2],
              'age': df['age'].iloc[id2] }
    if verbose:
      print(f'\nLabels for {id2+1} ...\n')
      print(df.iloc[id2])
    return info1, info2
  return info1

In [None]:
def execute_in_shell(command, verbose=False):
    """ 
        command -- keyword argument, takes a list as input
        verbose -- keyword argument, takes a boolean value as input
    
        This is a function that executes shell scripts from within python.
        
        Keyword argument 'command', should be a list of shell commands.
        Keyword argument 'verbose', should be a boolean value to set verbose level.
        
        Example usage: execute_in_shell(command = ['ls ./some/folder/',
                                                   'ls ./some/folder/  -1 | wc -l'],
                                        verbose = True ) 
                                        
        This command returns dictionary with elements: Output and Error.
        
        Output records the console output,
        Error records the console error messages.
                                        
    """
    error = []
    output = []
    
    if isinstance(command, list):
        for i in range(len(command)):
            try:
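                # Capture stderr as well, so that 'Error' is actually populated
                process = subprocess.Popen(command[i], shell=True,
                                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
                # communicate() waits for the process to exit; a separate wait() can deadlock on a full pipe
                out, err = process.communicate()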
                error.append(err)
                output.append(out)
                if verbose:
                    print ('Success running shell command: {}'.format(command[i]))
            except Exception as e:
                print ('Failed running shell command: {}'.format(command[i]))
                if verbose:
                    print(type(e))
                    print(e.args)
                    print(e)
                    logging.error(e, exc_info=True)
    else:
        raise ValueError('Expects a list input ...')
    return {'Output': output, 'Error': error }

In [None]:
def clear_image_history(out_dir):
   _ = execute_in_shell([f'rm -r {out_dir}/*.jpg'])

In [None]:
def get_fairface_img(df, img_id, out_dir, fairface_data):
  file_ = str(df.iloc[img_id].file)
  _ = execute_in_shell([f'unzip -j -q {fairface_data} {file_} -d {out_dir}'])

In [None]:
def randomID_generator():
  return ''.join(
           random.SystemRandom().sample(
             list(
               'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
               ),8))

In [None]:
def fairface_data_checks(fairface_data):
  if not os.path.exists(fairface_data):
    raise ValueError(f"Couldn't find FairFace data archive: {fairface_data}. \nPlease download FairFace data from: https://github.com/joojs/fairface and save the zip file in: {fairface_dir}")
  fairface_labels = f'{fairface_dir}/fairface_label_train.csv'
  if not os.path.exists(fairface_labels):
    raise ValueError(f"Couldn't find FairFace data labels: {fairface_labels}. \nPlease download FairFace data labels from: https://github.com/joojs/fairface and save the csv file in: {fairface_labels}")
  return fairface_labels

# Read FairFace data
The FairFace dataset should be downloaded and placed inside the `{img_dir}/FairFace` directory. By default, the notebook uses the `fairface-img-margin125-trainval.zip` FairFace data zip archive.
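
Individual images are pulled out of this archive on demand rather than unpacking it whole. The notebook does this with the `unzip -j` shell command in `get_fairface_img`. A rough pure-Python equivalent, using only the standard `zipfile` module, could look like the sketch below. The `extract_flat` helper and the toy archive are illustrative assumptions.

```python
import os
import tempfile
import zipfile

def extract_flat(archive_path, member, out_dir):
    """Extract one archive member into out_dir, dropping its folder prefix."""
    target = os.path.join(out_dir, os.path.basename(member))
    with zipfile.ZipFile(archive_path) as zf:
        with zf.open(member) as src, open(target, 'wb') as dst:
            dst.write(src.read())
    return target

# Toy archive standing in for the FairFace zip file
tmp_dir = tempfile.mkdtemp()
toy_zip = os.path.join(tmp_dir, 'toy.zip')
with zipfile.ZipFile(toy_zip, 'w') as zf:
    zf.writestr('train/1.jpg', b'not a real image')

extracted = extract_flat(toy_zip, 'train/1.jpg', tmp_dir)
print(os.path.basename(extracted))
```

Dropping the folder prefix mirrors the `-j` flag, so the extracted file name can be used directly as the image identifier.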

In [None]:
unzip_dir = str(data_dir.absolute())
fairface_data = f'{fairface_dir}/fairface-img-margin125-trainval.zip'

## Checks for FairFace data

In [None]:
img_labels = pd.read_csv(fairface_data_checks(fairface_data))
img_labels.head()
num_images = len(img_labels)
print(f'Total number of FairFace images: {num_images}')

# Generate random face pairings

In [None]:
img_idx1,img_idx2 = random_imgID_generator(img_labels)  
max_retries = 2000
img_idx1, img_idx2 = img_pairs_filter(img_labels,img_idx1,img_idx2,
                                      max_retries=max_retries)

In [None]:
img_info(img_labels, img_idx1, img_idx2)

# Numerical encoding of the FairFace labels
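
The encoding used in this section assigns an integer to each distinct `race` label in sorted order, and decoding is the reverse lookup. A compact sketch of the same idea on toy labels follows. The function names and the toy labels are illustrative assumptions.

```python
def build_label_encoder(labels):
    """Map each distinct label to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}

def invert_encoder(encoder):
    """Build the reverse mapping from integer code back to label."""
    return {code: label for label, code in encoder.items()}

# Hypothetical labels standing in for the FairFace 'race' column
toy_labels = ['White', 'Black', 'Indian', 'Black', 'East Asian']
encoder = build_label_encoder(toy_labels)
decoder = invert_encoder(encoder)
print(encoder)
print(decoder[encoder['Indian']])
```

The codes are arbitrary identifiers that follow alphabetical order, so the numeric distance between two codes carries no meaning of its own.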

In [None]:
twitter_saliency_eval_dir = f'{img_dir}/Twitter_saliency'
if not os.path.exists(twitter_saliency_eval_dir):
  print(f'No outputs directory: {twitter_saliency_eval_dir} found ...')
  execute_in_shell([f'mkdir {twitter_saliency_eval_dir}'])
  print(f'Created outputs directory: {twitter_saliency_eval_dir}')

In [None]:
labels_encoder_file = f'{twitter_saliency_eval_dir}/labels_encoder.json'
if os.path.exists(labels_encoder_file):
  with open(labels_encoder_file) as f:
    labels_encoder = json.loads(f.read())
  print(labels_encoder)
  print(f'Loaded labels encoder data from: {labels_encoder_file} ...')  
else:
  print(f'No saved labels encoder data: {labels_encoder_file} ...')
  labels_encoder = {}
  for i, race in enumerate(sorted(list(set(img_labels['race'].values)))):
    labels_encoder.update({race: i})
  print(labels_encoder)
  with open(labels_encoder_file, 'w+') as f:
    json.dump(labels_encoder, f)
  print(f'Saved labels encoder data to: {labels_encoder_file} ...')

In [None]:
def encoded_labels(input_label, labels_encoder):
  return labels_encoder[input_label]
def decoded_labels(input_label, labels_encoder):
  return list(labels_encoder.keys())[list(labels_encoder.values()).index(input_label)]

# Build pairwise image comparisons using the Twitter saliency filter

In [None]:
clear_image_history(unzip_dir)
get_fairface_img(img_labels, img_idx1, unzip_dir, fairface_data)
get_fairface_img(img_labels, img_idx2, unzip_dir, fairface_data)

In [None]:
img_path = next(data_dir.glob("./*.jpg"))
img_path

In [None]:
for img_file in data_dir.glob("./*.jpg"):
  img = mpimg.imread(img_file)
  plt.figure()
  plt.imshow(img)
  plt.gca().add_patch(
      Rectangle((0, 0), 200, 112, linewidth=1, edgecolor="r", facecolor="none")
  )

In [None]:
cmd = f"{str(bin_path)} {str(model_path)} '{img_path.absolute()}' show_all_points"
cmd

In [None]:
output = subprocess.check_output(cmd, shell=True)  # Success!
print(output.splitlines())

In [None]:
!{str(bin_path)} {str(model_path)} '{img_path.absolute()}' show_all_points | head

In [None]:
parse_output(output).keys()

In [None]:
model = ImageSaliencyModel(crop_binary_path=bin_path, crop_model_path=model_path)

In [None]:
plt.matplotlib.__version__

In [None]:
list(data_dir.glob("./*.jpg"))

In [None]:
for img_path in data_dir.glob("*.jpg"):
    print(img_path)
    model.plot_img_crops(img_path)

In [None]:
for img_path in reservoir_sampling(data_dir.glob("./*.jpg"), K=5):
  model.plot_img_crops(img_path)

## Crop an image generated using combination of images

* The top 3 crops are sampled based on the saliency scores $s_i$, which are converted into probabilities $p_i$ using the following formula:

$$
\begin{aligned}
p_i &= \frac{\exp(s_i)}{Z}\\
Z &= \sum_{j=0}^{N} \exp(s_j)
\end{aligned}
$$
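
The formula above is a softmax over the saliency scores. The following sketch computes the probabilities for a few made-up scores and then draws three crop indices in proportion to them. The scores are invented for illustration, and sampling with replacement is an assumption of this sketch, not a confirmed detail of the saliency filter.

```python
import math
import random

def softmax(scores):
    """Convert saliency scores s_i into probabilities p_i = exp(s_i) / Z."""
    shift = max(scores)  # subtracting the max avoids overflow and leaves p_i unchanged
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical saliency scores for four candidate points
toy_scores = [2.0, 1.0, 0.1, -1.0]
probs = softmax(toy_scores)

# Draw 3 indices, with replacement, in proportion to their probabilities
sampled = random.choices(range(len(toy_scores)), weights=probs, k=3)
print([round(p, 3) for p in probs], sampled)
```

Higher scores receive exponentially more probability mass, so the sampled crops concentrate on the most salient points while still leaving some chance for the others.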

In [None]:
img_id1 = str(img_labels.iloc[img_idx1].file).split('/')[-1].replace('.jpg','')
img_race1 = str(img_labels.iloc[img_idx1].race)
img_gender1 = str(img_labels.iloc[img_idx1].gender)
img_id2 = str(img_labels.iloc[img_idx2].file).split('/')[-1].replace('.jpg','')
img_race2 = str(img_labels.iloc[img_idx2].race)
img_gender2 = str(img_labels.iloc[img_idx2].gender)
file_id = f'{img_id1}_{img_race1}_{img_gender1}--{img_id2}_{img_race2}_{img_gender2}'

In [None]:
output_dir = './'
padding = 0
instance_id = randomID_generator()
filename = f'{instance_id}_{file_id}_p{padding}'

# Helper functions to map the saliency filter output to FairFace data
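
The mapping in this section relies on one idea: when equally sized images are joined side by side (or stacked), the coordinate of the salient point along the joining axis identifies which source image it fell on. A minimal sketch of that lookup follows. It assumes equal tile sizes and no padding between tiles, and the `tile_index` name is hypothetical.

```python
def tile_index(coord, total_length, num_tiles):
    """Index of the equal-width tile that contains coordinate `coord`."""
    if not 0 <= coord <= total_length:
        raise ValueError('coordinate lies outside the image')
    tile_length = total_length / num_tiles
    # Clamp so that a point on the far edge still maps to the last tile
    return min(int(coord // tile_length), num_tiles - 1)

# Two 128-pixel-wide images joined side by side give a 256-pixel-wide image
print(tile_index(40, 256, 2))   # salient point on the first image
print(tile_index(200, 256, 2))  # salient point on the second image
```

For a horizontal combination `coord` is the x-coordinate and `total_length` the image width; for a vertical combination they are the y-coordinate and the image height.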

In [None]:
def saliency_to_image(img, saliency_point, img_files, padding=0, image_mode='horizontal'):
  if image_mode == 'horizontal':
    saliency_idx = 0
  elif image_mode == 'vertical':
    saliency_idx = 1
  else:
    raise ValueError('Unsupported image mode. \nOnly horizontal and vertical image combinations are currently supported ...')
  if len(saliency_point)>1:
    warnings.warn('Only reading the first saliency point. \nParsing one saliency point is currently supported ...')
  # Keep the last image whose starting offset lies before the salient point
  saliency_image_idx = 0
  for i in range(len(img_files)):
    if (img.size[saliency_idx]-saliency_point[0][saliency_idx]) < (
        img.size[saliency_idx]-(i*img.size[saliency_idx]/len(img_files))):
      saliency_image_idx = i
  return img_files[saliency_image_idx]

In [None]:
def saliency_point_to_info(input_file, img_files, model, image_mode='horizontal'):
  saliency_point = model.get_output(Path(input_file))['salient_point']
  img = Image.open(input_file)
  saliency_img_file = saliency_to_image(img, saliency_point, img_files, image_mode=image_mode)
  try:
    saliency_filename = saliency_img_file.absolute()
  except AttributeError:
    saliency_filename = str(saliency_img_file)
  saliencyID = str(saliency_filename).split('/')[-1].replace('.jpg','')
  saliency_info = img_info(img_labels, int(saliencyID)-1)
  return saliency_info, saliency_point

In [None]:
img_files = list(data_dir.glob("./*.jpg"))
images = [Image.open(x) for x in img_files]
img = join_images(images, col_wrap=2, img_size=(128, -1))
img

In [None]:
img.save(f"{output_dir}/{filename}_h.jpeg", "JPEG")
model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
plt.savefig(f"{output_dir}/{filename}_h_sm.jpeg",bbox_inches="tight")

In [None]:
saliency_info,sp = saliency_point_to_info(f"{output_dir}/{filename}_h.jpeg", img_files, model, image_mode='horizontal')
encoded_labels(saliency_info['race'],labels_encoder)
decoded_labels(encoded_labels(saliency_info['race'],labels_encoder),labels_encoder)

In [None]:
images = [Image.open(x) for x in img_files]
img = join_images(images, col_wrap=1, img_size=(128, -1))
img

In [None]:
img.save(f"{output_dir}/{filename}_v.jpeg", "JPEG")
model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
plt.savefig(f"{output_dir}/{filename}_v_sm.jpeg",bbox_inches="tight")

In [None]:
saliency_point = model.get_output(Path(f"{output_dir}/{filename}_v.jpeg"))['salient_point']
print(saliency_point)
saliency_image = saliency_to_image(img, saliency_point, img_files, image_mode='vertical')
saliency_filename = saliency_image.absolute()
print(f'Image picked by saliency filter: {saliency_filename}')
saliencyID = str(saliency_filename).split('/')[-1].replace('.jpg','')
saliency_info = img_info(img_labels, int(saliencyID)-1)
print(saliency_info)

# Evaluate horizontal and vertical padding invariance

## Load experiment history
The experiment history is stored in `{img_dir}/Twitter_saliency/FairFace_pairwise_tests.csv`.

In [None]:
pairwise_tests_data = f'{img_dir}/Twitter_saliency/FairFace_pairwise_tests.csv'
if os.path.exists(pairwise_tests_data):
  pairwise_df = pd.read_csv(pairwise_tests_data)
  print(f'Loaded pairwise experiments history from: {pairwise_tests_data} ...')
  experiment_ids = list(pairwise_df['experiment_id'].values)
  instance_ids   = list(pairwise_df['instance_id'].values)
  img1           = list(pairwise_df['img1'].values)
  img2           = list(pairwise_df['img2'].values)
  baseline_h1    = list(pairwise_df['baseline_h1'].values)
  baseline_h2    = list(pairwise_df['baseline_h2'].values)
  baseline_v1    = list(pairwise_df['baseline_v1'].values)
  baseline_v2    = list(pairwise_df['baseline_v2'].values)
  saliency_out   = list(pairwise_df['saliency_out'].values)
  combine_mode   = list(pairwise_df['combine_mode'].values)
else:
  pairwise_df = pd.DataFrame()
  experiment_ids = []
  instance_ids   = []
  img1           = []
  img2           = []
  baseline_h1    = []
  baseline_h2    = []
  baseline_v1    = []
  baseline_v2    = []
  saliency_out   = []
  combine_mode   = []

In [None]:
debug = False

In [None]:
padding_eval = {'horizontal': {'padding_blocks': {1: {'max': 25, 'min': 0}}},
                'vertical': {'padding_blocks': {1: {'max': 25, 'min': 0}}}} if debug else \
              {'horizontal': { 
                    'padding_blocks': {
                         1: {'min': 0, 'max': 25},
                         2: {'min': 25, 'max': 75},
                         3: {'min': 75, 'max': 300},
                      }
                    },
                 'vertical': { 
                     'padding_blocks': {
                         1: {'min': 0, 'max': 25},
                         2: {'min': 25, 'max': 75},
                         3: {'min': 75, 'max': 300},
                     }
                   }
               }

In [None]:
output_dir =f'{img_dir}/Twitter_saliency/FairFace_pairwise_tests/'
os.makedirs(output_dir, exist_ok=True)  # make sure the outputs directory exists before saving
num_eval = 1
for i in range(len(padding_eval)):
  eval_key = list(padding_eval.keys())[i]
  label_id = eval_key
  if  eval_key == 'horizontal':
    label_id = 'h'
    num_cols = 2
  elif eval_key == 'vertical':
    label_id = 'v'
    num_cols = 1
  padding_blocks = padding_eval[eval_key]['padding_blocks'] 
  for j in range(len(padding_blocks)):
    for k in tqdm(range(num_eval)):
      instance_id = randomID_generator()
      image_files = glob.glob(str(data_dir / Path("./*.jpg")))
      random.SystemRandom().shuffle(image_files)
      images = [Image.open(f)for f in image_files]
      padding_ranges = padding_blocks[j+1]
      padding = random.SystemRandom().choice(range(padding_ranges['min'],
                                                   padding_ranges['max']))
      print(f'Using a padding value: {padding}')
      img = join_images(images, col_wrap=num_cols, img_size=(128,128),
                        padding=padding)
      filename = f'{instance_id}_{file_id}_p{padding}_t{k}_{label_id}'
      output_file = f"{output_dir}/{filename}.jpeg"
      img.save(output_file, "JPEG")
      model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
      # Use the shuffled file order and the current combination mode, as in the randomized tests
      saliency_info,sp = saliency_point_to_info(output_file, image_files, model, image_mode=eval_key)
      plt.savefig(f"{output_dir}/{filename}_sm.jpeg",bbox_inches="tight")

In [None]:
model.plot_img_crops(data_dir / Path(f"{img_id1}.jpg"), topK=2, aspectRatios=[0.56])
plt.savefig(f"{img_id1}_{img_race1}_{img_gender1}_saliency.jpeg", bbox_inches="tight")

In [None]:
model.plot_img_crops(data_dir / Path(f"{img_id2}.jpg"), topK=2, aspectRatios=[0.56])
plt.savefig(f"{img_id2}_{img_race2}_{img_gender2}_saliency.jpeg", bbox_inches="tight")

# Randomized saliency filter testing for padding invariance

## Null hypothesis
**H₀:** There are no differences between the baseline outputs of the saliency filter and the saliency filter outputs following randomized image padding.

## Methodology for generating randomized image pairs from FairFace data
The random image pairs for the pairwise comparisons are generated using the `random.SystemRandom()` class in the [Python **`random`** library](https://docs.python.org/3/library/random.html). 

Because **`random.SystemRandom()`** draws its random numbers from sources provided by the operating system, the exact image pairings always depend on those sources. This method of random number generation is not available on all systems. Since it does not rely on software state, the image pairing sequences are not reproducible. 

The goal of this experiment is to identify any statistically significant differences between the saliency filter outputs for baseline image pairs and the saliency filter outputs following randomized image padding. Therefore, the exact image pairing sequences used for the saliency filter output comparisons are immaterial for the reproducibility of this experiment.
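
The difference between a seeded, software-state generator and `random.SystemRandom()` can be seen in a short sketch. The seed value and the draw ranges are arbitrary choices made for illustration.

```python
import random

# A seeded generator replays the same sequence on every run
seeded_a = random.Random(42)
seeded_b = random.Random(42)
draws_a = [seeded_a.randrange(1000) for _ in range(5)]
draws_b = [seeded_b.randrange(1000) for _ in range(5)]
print(draws_a == draws_b)

# SystemRandom reads from the operating system's entropy source;
# calling seed() on it has no effect, so a sequence cannot be replayed
os_rng = random.SystemRandom()
os_rng.seed(42)  # ignored by SystemRandom
print([os_rng.randrange(1000) for _ in range(5)])
```

The first comparison holds on every run, while the second list changes from run to run, which is why individual image pairings in this notebook cannot be replayed.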

In [None]:
num_pairwise_tests = 1 if debug else 2
num_eval = 1 if debug else 10
len(experiment_ids)

In [None]:
for _ in tqdm(range(num_pairwise_tests)):
  img_idx1,img_idx2 = random_imgID_generator(img_labels)
  max_retries = 2000
  img_idx1, img_idx2 = img_pairs_filter(img_labels,img_idx1,img_idx2,
                                        max_retries=max_retries)
  img1_info,img2_info = img_info(img_labels, img_idx1, img_idx2)

  clear_image_history(unzip_dir)
  get_fairface_img(img_labels, img_idx1, unzip_dir, fairface_data)
  get_fairface_img(img_labels, img_idx2, unzip_dir, fairface_data)

  img_id1 = str(img_labels.iloc[img_idx1].file).split('/')[-1].replace('.jpg','')
  img_race1 = str(img_labels.iloc[img_idx1].race)
  img_gender1 = str(img_labels.iloc[img_idx1].gender)

  img_id2 = str(img_labels.iloc[img_idx2].file).split('/')[-1].replace('.jpg','')
  img_race2 = str(img_labels.iloc[img_idx2].race)
  img_gender2 = str(img_labels.iloc[img_idx2].gender)

  file_id = f'{img_id1}_{img_race1}_{img_gender1}--{img_id2}_{img_race2}_{img_gender2}'
  experiment_id = randomID_generator()

  image_files = glob.glob(str(data_dir / Path("./*.jpg")))

  output_dir =f'{img_dir}/Twitter_saliency/FairFace_pairwise_tests/'
  # Baseline file names do not depend on label_id, which is only set inside the padding loops
  filename = f'{experiment_id}_{file_id}'

  images = [Image.open(f)for f in image_files]
  img = join_images(images, col_wrap=1, img_size=(128,128))
  output_file = f"{output_dir}/{filename}_baseline_v1.jpeg"
  img.save(output_file, "JPEG")
  model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
  baselinev1_saliency_info,sp = saliency_point_to_info(Path(output_file).as_posix(), image_files, model, image_mode='vertical')
  if debug:
    print(image_files)
    print(baselinev1_saliency_info,sp)
  plt.savefig(f"{output_dir}/{filename}_baseline_v1_sm.jpeg",bbox_inches="tight")
  if not debug:
    plt.close()
  _=gc.collect()

  image_files.reverse()
  images = [Image.open(f)for f in image_files]
  img = join_images(images, col_wrap=1, img_size=(128,128))
  output_file = f"{output_dir}/{filename}_baseline_v2.jpeg"
  img.save(output_file, "JPEG")
  model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
  baselinev2_saliency_info,sp = saliency_point_to_info(Path(output_file).as_posix(), image_files, model, image_mode='vertical')
  if debug:
    print(image_files)
    print(baselinev2_saliency_info,sp)
  plt.savefig(f"{output_dir}/{filename}_baseline_v2_sm.jpeg",bbox_inches="tight")
  if not debug:
    plt.close()
  _=gc.collect()

  images = [Image.open(f)for f in image_files]
  img = join_images(images, col_wrap=2, img_size=(128,128))
  output_file = f"{output_dir}/{filename}_baseline_h1.jpeg"
  img.save(output_file, "JPEG")
  model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
  baselineh1_saliency_info,sp = saliency_point_to_info(Path(output_file).as_posix(), image_files, model, image_mode='horizontal')
  if debug:
    print(image_files)
    print(baselineh1_saliency_info,sp)
  plt.savefig(f"{output_dir}/{filename}_baseline_h1_sm.jpeg",bbox_inches="tight")
  if not debug:
    plt.close()
  _=gc.collect()

  image_files.reverse()
  images = [Image.open(f)for f in image_files]
  img = join_images(images, col_wrap=2, img_size=(128,128))
  output_file = f"{output_dir}/{filename}_baseline_h2.jpeg"
  img.save(output_file, "JPEG")
  model.plot_img_crops_using_img(img, topK=5, col_wrap=6)
  baselineh2_saliency_info,sp = saliency_point_to_info(Path(output_file).as_posix(), image_files, model, image_mode='horizontal')
  if debug:
    print(image_files)
    print(baselineh2_saliency_info,sp)
  plt.savefig(f"{output_dir}/{filename}_baseline_h2_sm.jpeg",bbox_inches="tight")
  if not debug:
    plt.close()
  _=gc.collect()

  for i in range(len(padding_eval)):
    eval_key = list(padding_eval.keys())[i]
    label_id = eval_key
    if  eval_key == 'horizontal':
      label_id = 'h'
      num_cols = 2
    elif eval_key == 'vertical':
      label_id = 'v'
      num_cols = 1

    padding_blocks = padding_eval[eval_key]['padding_blocks'] 
    for j in range(len(padding_blocks)):
      for k in tqdm(range(num_eval)):
        instance_id = randomID_generator()
        random.SystemRandom().shuffle(image_files)
        images = [Image.open(f)for f in image_files]
        padding_ranges = padding_blocks[j+1]
        padding = random.SystemRandom().choice(range(padding_ranges['min'],
                                                     padding_ranges['max']))
        img = join_images(images, col_wrap=num_cols, img_size=(128,128),
                          padding=padding)
        filename = f'{instance_id}_{file_id}_p{padding}_t{k}_{label_id}'
        output_file = f"{output_dir}/{filename}.jpeg"
        img.save(output_file, "JPEG")
        model.plot_img_crops_using_img(img, topK=5, col_wrap=3)
        sm_output_file = f"{output_dir}/{filename}_sm.jpeg"
        plt.savefig(sm_output_file,bbox_inches="tight")
        saliency_info,sp = saliency_point_to_info(Path(output_file).as_posix(), image_files, model, image_mode=eval_key)
        if debug:
          print(image_files)
          print(saliency_info,sp)
        
        experiment_ids.append(experiment_id)
        instance_ids.append(instance_id)
        img1.append(img1_info)
        img2.append(img2_info)
        baseline_h1.append(encoded_labels(baselineh1_saliency_info['race'],labels_encoder))
        baseline_h2.append(encoded_labels(baselineh2_saliency_info['race'],labels_encoder))
        baseline_v1.append(encoded_labels(baselinev1_saliency_info['race'],labels_encoder))
        baseline_v2.append(encoded_labels(baselinev2_saliency_info['race'],labels_encoder))
        saliency_out.append(encoded_labels(saliency_info['race'],labels_encoder))
        combine_mode.append(eval_key)

        if not debug:
          plt.close()
        _=gc.collect()

In [None]:
pairwise_df = pd.DataFrame()

In [None]:
pairwise_df['experiment_id']= experiment_ids
pairwise_df['instance_id']=instance_ids
pairwise_df['img1']=img1
pairwise_df['img2']=img2
pairwise_df['baseline_h1']=baseline_h1
pairwise_df['baseline_h2']=baseline_h2
pairwise_df['baseline_v1']=baseline_v1
pairwise_df['baseline_v2']=baseline_v2
pairwise_df['saliency_out']=saliency_out
pairwise_df['combine_mode']=combine_mode 

In [None]:
print(len(pairwise_df))

# Calculate statistical significance

The [Wilcoxon signed rank test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is used to test whether there are any statistically significant differences between the baseline saliency filter outputs and the saliency filter outputs following image padding. The test is performed using the [SciPy library](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html).
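
To make the test statistic concrete, the sketch below computes the signed rank statistic by hand for a handful of made-up encoded labels. It drops zero differences, ranks the absolute differences with average ranks for ties, and returns the smaller of the positive and negative rank sums, which to the best of my understanding is what `scipy.stats.wilcoxon` reports as its statistic for a two-sided test with default settings. The p-value is not computed here, and the toy values are assumptions.

```python
def signed_rank_statistic(diffs):
    """Wilcoxon signed rank statistic W = min(W+, W-) for paired differences."""
    nonzero = [d for d in diffs if d != 0]  # zero differences are dropped
    ordered = sorted(nonzero, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # average rank shared by tied |d| values
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical encoded labels: baseline output vs. output after padding
toy_baseline = [3, 1, 5, 6, 2, 4, 4]
toy_padded = [2, 3, 2, 2, 3, 4, 2]
toy_diffs = [b - s for b, s in zip(toy_baseline, toy_padded)]
print(signed_rank_statistic(toy_diffs))  # prints 5.0
```

When every difference is zero there is nothing left to rank, which is the situation the `try`/`except ValueError` blocks in the per-experiment tests guard against.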

In [None]:
# Compare each baseline against the padded saliency outputs across all experiments
for name in ['h2', 'h1', 'v2', 'v1']:
  w, p = wilcoxon(pairwise_df[f'baseline_{name}']-pairwise_df['saliency_out'])
  print(w,p)
  pairwise_df[f'global{name}_wt_p'] = p
  pairwise_df[f'global{name}_wt_w'] = w

In [None]:
pairwise_df['localh2_wt_p'] = np.nan
pairwise_df['localh2_wt_w'] = np.nan
pairwise_df['localh1_wt_p'] = np.nan
pairwise_df['localh1_wt_w'] = np.nan
pairwise_df['localv2_wt_p'] = np.nan
pairwise_df['localv2_wt_w'] = np.nan
pairwise_df['localv1_wt_p'] = np.nan
pairwise_df['localv1_wt_w'] = np.nan

In [None]:
for expID in tqdm(list(set(list(pairwise_df.experiment_id.values)))):
  condition = pairwise_df['experiment_id'] == expID

  # Run the same test against each of the four baselines for this experiment
  for name in ['h2', 'h1', 'v2', 'v1']:
    diff = (pairwise_df.loc[condition, f'baseline_{name}']
            - pairwise_df.loc[condition, 'saliency_out']).tolist()
    try:
      w, p = wilcoxon(diff)
      pairwise_df.loc[condition, f'local{name}_wt_p'] = p
      pairwise_df.loc[condition, f'local{name}_wt_w'] = w
    except ValueError as e:
      print(f'Skipping Wilcoxon Signed Rank test for: {expID} due to: \n{e}')

# Save experiment history

In [None]:
pairwise_df.to_csv(pairwise_tests_data, index=False)  # skip the row index so reloading adds no extra column

In [None]:
print(len(pairwise_df))

In [None]:
pairwise_df.head()

In [None]:
pairwise_df.tail()