# User Study
## Introduction
Thank you for participating in our user study!

In this pilot experiment, we would like to study neural network model explainability.

In detail, we will repeatedly present several animal images, each with a corresponding explanation, and ask you to rate how well each explanation explains its image. For example, the two images below show a Chihuahua and its explanation. If the model classifies the original image correctly, the explanation conveys 'why there is a chihuahua in the image'. Parts rendered in red are deemed relevant to the chihuahua; parts rendered in blue are deemed irrelevant; parts with no color are neither relevant nor irrelevant. Furthermore, the shade of the color indicates the degree of relevance. As we can see in this explanation, 
1. the head is rendered in red and pink, most of the body is rendered in red, and part of the background is rendered in light blue. This makes sense: the head and body are highly relevant to the chihuahua, and the background is irrelevant for identifying it.
2. part of the background is rendered in pink, and one of the legs is rendered in light blue, which indicates that the explanation is not ideal.
<table><tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/chihuahua.png" width="200">
    <em style="color: black">  Chihuahua </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/chihuahua_exp.png" width="200">
    <em style="color: black"> Explanation </em>
  </p> 
</td>
</tr></table>
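
The color convention described above can be sketched in a few lines of Python. This is only an illustration: the function name `describe_relevance` and the signed relevance score it takes are assumptions made for this example, not part of the study software.

```python
def describe_relevance(score, max_abs_score):
    """Map a signed relevance score to the color convention of the explanations.

    Hypothetical helper: positive scores are shown in red, negative scores in
    blue, and the shade (0.0 to 1.0) grows with the magnitude of the score.
    """
    if max_abs_score <= 0:
        raise ValueError("max_abs_score must be positive")
    if score == 0:
        return ("no color", 0.0)
    shade = min(abs(score) / max_abs_score, 1.0)
    color = "red" if score > 0 else "blue"
    return (color, shade)


print(describe_relevance(0.8, 1.0))   # strongly relevant region
print(describe_relevance(-0.2, 1.0))  # mildly irrelevant region
print(describe_relevance(0.0, 1.0))   # region that is neither
```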

The main task in this user study is to rate the explanations. We provide rating schemes and strongly recommend that you follow them. However, we know that rating can sometimes be difficult, and we do not want you to spend too much time on it. You therefore do not need to apply the schemes exactly; give ratings that make sense to you. We do need you to **be consistent in how you rate throughout the whole experiment**, which is very important.


So, in this experiment, we would like to ask you to
1. go through the animal images and identify the main feature(s) that, in your mind, best explain the image. For example, for the chihuahua image above, the main feature(s) can be the head, tail, body, etc., whatever best helps you identify it as a chihuahua. Please bear in mind that your choice of main feature(s) should stay consistent throughout the whole experiment.

2. provide ratings on the generated explanations in three aspects:
    1. noise in the background, ranging from 1 to 10;
    2. main features depicted in the explanation, ranging from 1 to 10;
    3. object body depicted in the explanation, ranging from 1 to 10.
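
For reference, the notebook code further below combines the three ratings of an image into a single score by taking their plain average. A minimal sketch of that step, where the function name `overall_rating` is an assumption used only for this illustration:

```python
def overall_rating(noise, object_body, main_feature):
    """Average the three per-image ratings into one score (equal weights)."""
    return (noise + object_body + main_feature) / 3.0


# Ratings taken from the first example image in the Examples section
print(overall_rating(8, 8, 7.5))
```

Because the weights are equal, a low noise rating pulls the overall score down just as much as a low object body or main feature rating.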

## Rating Schemes
### General rating schemes
#### Noise
In this user study, 'noise' refers to the coloring of everything in the image that does not belong to the object, that is, the coloring of the background. For example, in the chihuahua explanation above, the coloring of the carpet in the background counts as noise. Ideally, the explanation should show that the background is irrelevant, that is, the background should be colored blue or not colored at all.

We recommend that you rate the noise of the explanation based on the **percentage of the background that is not rendered in red or pink**. For example, with a rating range of 1 to 10, if you think that about **80%** of the background is colored red or pink, you can give a rating of about **2** points. Below is a rough guide for the noise rating scheme; the point range for each example explanation is shown beneath it.
<table><tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/noise_0.png" width="200">
    <em style="color: black">  0~1 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_1.png" width="200">
    <em style="color: black"> 1~2 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_2.png" width="200">
    <em style="color: black"> 2~3 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_3.png" width="200">
    <em style="color: black"> 3~4 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_4.png" width="200">
    <em style="color: black"> 4~5 </em>
  </p> 
</td>
</tr>
<tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/noise_5.png" width="200">
    <em style="color: black">  5~6 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_6.png" width="200">
    <em style="color: black"> 6~7 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_7.png" width="200">
    <em style="color: black"> 7~8 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_8.png" width="200">
    <em style="color: black"> 8~9 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/noise_9.png" width="200">
    <em style="color: black"> 9~10 </em>
  </p> 
</td>
</tr>
</table>
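
The recommended noise rating is roughly a linear function of how much of the background stays free of red and pink. The sketch below shows that mapping; the function name `noise_rating` and the clamping to the 1 to 10 range are assumptions for illustration, and you are not expected to compute anything exactly.

```python
def noise_rating(red_pink_fraction):
    """Suggested noise rating for a given fraction (0.0 to 1.0) of the
    background that is colored red or pink. Less red/pink gives a higher rating."""
    if not 0.0 <= red_pink_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Work in whole percent to avoid floating-point noise such as 1.9999999
    clean_percent = round((1.0 - red_pink_fraction) * 100)
    rating = clean_percent / 10
    # Clamp to the rating range used in this study
    return max(1.0, min(10.0, rating))


print(noise_rating(0.8))   # 80% of the background is red or pink
print(noise_rating(0.25))  # 25% of the background is red or pink
```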

#### Object body
For the object body, we would like you to give a rating based on the **percentage of the object body that is rendered in red or pink**, which indicates the extent to which the explanation has captured the object. For example, with a rating range of 1 to 10, if you think that about **80%** of the object body is colored red or pink, you can give a rating of about **8** points. Below is a rough guide for the object body rating scheme; the point range for each example explanation is shown beneath it.
<table><tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/contour_0.png" width="200">
    <em style="color: black">  0~1 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_1.png" width="200">
    <em style="color: black"> 1~2 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_2.png" width="200">
    <em style="color: black"> 2~3 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_3.png" width="200">
    <em style="color: black"> 3~4 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_4.png" width="200">
    <em style="color: black"> 4~5 </em>
  </p> 
</td>
</tr>
<tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/contour_5.png" width="200">
    <em style="color: black">  5~6 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_6.png" width="200">
    <em style="color: black"> 6~7 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_7.png" width="200">
    <em style="color: black"> 7~8 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_8.png" width="200">
    <em style="color: black"> 8~9 </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/contour_9.png" width="200">
    <em style="color: black"> 9~10 </em>
  </p> 
</td>
</tr>
</table>

#### Main features
As with the object body, for the main features that you think best help you identify the object, we would like you to rate based on the **percentage of the main features that is rendered in red or pink**, which indicates the extent to which the explanation has captured the main features for identifying the object. For example, with a rating range of 1 to 10, if you think that about **80%** of the main features (e.g. head, tail) is colored red or pink, you can give a rating of about **8** points.
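
The object body and main feature schemes both rate coverage: the share of a region that the explanation colors red or pink. A rough sketch under assumed inputs is shown below; the two boolean grids (`region_mask`, `red_pink_mask`) and the function name `coverage_rating` are hypothetical and only illustrate the arithmetic.

```python
def coverage_rating(region_mask, red_pink_mask):
    """Rate how much of a region (object body or main feature) is red or pink.

    Both arguments are equally sized 2D lists of booleans: `region_mask` marks
    the pixels of the region, `red_pink_mask` marks the red or pink pixels.
    """
    region_pixels = 0
    covered_pixels = 0
    for region_row, color_row in zip(region_mask, red_pink_mask):
        for in_region, is_red_pink in zip(region_row, color_row):
            if in_region:
                region_pixels += 1
                covered_pixels += int(is_red_pink)
    if region_pixels == 0:
        raise ValueError("the region mask is empty")
    return 10.0 * covered_pixels / region_pixels


# A toy 2x5 image: the region covers the whole top row,
# and four of its five pixels are red or pink.
region = [[True, True, True, True, True],
          [False, False, False, False, False]]
red_pink = [[True, True, True, True, False],
            [True, False, False, False, False]]
print(coverage_rating(region, red_pink))
```

Red or pink pixels outside the region (the bottom-left pixel here) do not affect this rating; they are what the noise rating captures.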


### Examples
Here are some examples. Note that the main feature here is the head.
<table><tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/example_0.png" width="200">
    <em style="color: black">  noise: 8  </em>
    <br>
    <em style="color: black">  object body: 8  </em>
    <br>
    <em style="color: black">  main feature: 7.5  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_1.png" width="200">
    <em style="color: black">  noise: 9.5  </em>
    <br>
    <em style="color: black">  object body: 9.5  </em>
    <br>
    <em style="color: black">  main feature: 8.5  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_2.png" width="200">
    <em style="color: black">  noise: 5  </em>
    <br>
    <em style="color: black">  object body: 9  </em>
    <br>
    <em style="color: black">  main feature: 8.5  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_3.png" width="200">
    <em style="color: black">  noise: 7.5  </em>
    <br>
    <em style="color: black">  object body: 2  </em>
    <br>
    <em style="color: black">  main feature: 6.5  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_4.png" width="200">
    <em style="color: black">  noise: 4.5  </em>
    <br>
    <em style="color: black">  object body: 9.5  </em>
    <br>
    <em style="color: black">  main feature: 9  </em>
  </p> 
</td>
</tr>
<tr>
<td> 
  <p style="text-align:center">
    <img alt="Chihuahua" src="../images/example_5.png" width="200">
    <em style="color: black">  noise: 10  </em>
    <br>
    <em style="color: black">  object body: 9  </em>
    <br>
    <em style="color: black">  main feature: 8  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_6.png" width="200">
    <em style="color: black">  noise: 10  </em>
    <br>
    <em style="color: black">  object body: 9.5  </em>
    <br>
    <em style="color: black">  main feature: 9  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_7.png" width="200">
    <em style="color: black">  noise: 10  </em>
    <br>
    <em style="color: black">  object body: 4  </em>
    <br>
    <em style="color: black">  main feature: 7  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_8.png" width="200">
    <em style="color: black">  noise: 9.5  </em>
    <br>
    <em style="color: black">  object body: 7  </em>
    <br>
    <em style="color: black">  main feature: 8  </em>
  </p> 
</td>
<td> 
  <p style="text-align:center">
    <img alt="Explanation" src="../images/example_9.png" width="200">
    <em style="color: black">  noise: 10  </em>
    <br>
    <em style="color: black">  object body: 3  </em>
    <br>
    <em style="color: black">  main feature: 6  </em>
  </p> 
</td>
</tr>
</table>

In [None]:
for i in range(iteration_setting):
    #!nvidia-smi
    iter_num = load_records['iteration'] + 1
    print(f'This is trial {iter_num}')
    print('Please go to task 2 and do the rating there, come back 10 minutes later :)')
    parameters, trial_index = ax_client.get_next_trial()
    tf.random.set_seed(42)
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(parameters, iter_num))
    
    end_time = str(datetime.now())
    load_records['iteration'] = load_records['iteration'] + 1
    load_records['acc_records'] = acc_records
    load_records['hr_records'] = hr_records
    load_records['hr_noise_records'] = hr_noise_records
    load_records['hr_contour_records'] = hr_contour_records
    load_records['hr_feature_records'] = hr_feature_records
    load_records['hr_total_noise_records'] = hr_total_noise_records
    load_records['hr_total_contour_records'] = hr_total_contour_records
    load_records['hr_total_feature_records'] = hr_total_feature_records
    load_records['seg_records'] = seg_records
    #load_records['shap_records'] = shap_records
    load_records['lr_records'] = lr_records
    load_records['drp_records'] = drp_records
    file_name = directory + '/ax_client_snapshot_' + end_time + '.json'
    load_records['filename'] = file_name
    json_record = json.dumps(load_records, indent=5)
    save_records(json_record)
    # save_to_json_file opens and writes the file itself, so no separate file handle is needed
    ax_client.save_to_json_file(file_name)
    print(f'Well done! {iteration_setting - i - 1} iterations to go!')
    print("-"*40)

In [None]:
# Basic Modules
import os
os.environ['TF_DETERMINISTIC_OPS'] = '1'
import numpy as np
import requests
import warnings
import json
import random

# Disable Warnings
def warn(*args, **kwargs):
    pass
warnings.warn = warn

# ML Modules
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow import keras
import torch
import torchvision
import torchvision.transforms as transforms
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
from tensorflow.keras.layers import Dropout
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Explainer Modules
import shap

# Optimization Modules
#import GPy
#import GPyOpt

# Visualisation Modules
%matplotlib inline
from mpl_toolkits.mplot3d import axes3d

import scipy.stats
import matplotlib.pyplot as plt
from matplotlib import gridspec
from matplotlib.colors import LinearSegmentedColormap
from skimage.segmentation import felzenszwalb
from skimage.segmentation import mark_boundaries
from statistics import mean

from ax.service.ax_client import AxClient
from ax.service.utils.instantiation import ObjectiveProperties

# Plotting imports and initialization
from ax.utils.notebook.plotting import render, init_notebook_plotting
from ax.plot.pareto_utils import compute_posterior_pareto_frontier
from ax.plot.pareto_frontier import plot_pareto_frontier
init_notebook_plotting()

import csv
from datetime import datetime
import matplotlib.pyplot as plt

In [None]:
main_path = os.path.abspath('.')

In [None]:
#os.mkdir(main_path + '/records')

In [None]:
HITL_record = main_path + '/records/HITL_12_task_1.json'

In [None]:
startover = False

In [None]:
file_exists = os.path.exists(HITL_record)
if file_exists and startover == False:
    with open(HITL_record, mode='r') as json_file:
        load_records = json.load(json_file)
        print(load_records)
else:
    load_records = {'iteration':0}

In [None]:
def save_records(records):
    with open(HITL_record, "w") as outfile:
        outfile.write(records)
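
The record file makes the study resumable: the cells above load an existing JSON record if there is one and otherwise start from iteration 0. The following sketch reproduces that round trip in isolation. The helper names (`load_or_init`, `save`) and the temporary directory are assumptions for the example, not names used elsewhere in this notebook.

```python
import json
import os
import tempfile


def load_or_init(path, startover=False):
    """Return the saved records, or a fresh record if none exist or startover is set."""
    if os.path.exists(path) and not startover:
        with open(path, mode="r") as json_file:
            return json.load(json_file)
    return {"iteration": 0}


def save(path, records):
    """Write the records as indented JSON, replacing any previous file."""
    with open(path, "w") as outfile:
        outfile.write(json.dumps(records, indent=5))


with tempfile.TemporaryDirectory() as tmp_dir:
    record_path = os.path.join(tmp_dir, "records.json")

    records = load_or_init(record_path)      # no file yet, so a fresh record
    records["iteration"] += 1
    save(record_path, records)

    resumed = load_or_init(record_path)      # picks up where we left off
    restarted = load_or_init(record_path, startover=True)

print(resumed, restarted)
```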

In [None]:
existing_iters = int(load_records['iteration'])
if existing_iters != 0:
    recover = True
else:
    recover = False
#print(f'There exists {existing_iters} iteration(s).')

In [None]:
#tf.keras.utils.set_random_seed(42)
#tf.config.experimental.enable_op_determinism()
seed=42
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
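
Fixing the seeds is what makes repeated runs comparable. As a small illustration using only Python's standard `random` module (NumPy and TensorFlow keep their own generators, which is why the cell above seeds each of them separately), re-seeding reproduces the same sequence. The helper name `draw` is an assumption for this example.

```python
import random


def draw(seed, n=3):
    """Seed the generator and return n pseudo-random numbers."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw(42)
second = draw(42)
other = draw(7)

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```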

In [None]:
def set_GPU():
    gpus = tf.config.experimental.list_physical_devices('GPU')
    #print('Number of GPU: ', len(gpus))
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    #print('-------------Completed: GPU self-boost--------------')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    #print('Number of logical GPU: ', len(logical_gpus))
set_GPU()

In [None]:
tf.config.experimental.list_physical_devices('GPU')

In [None]:
device = torch.device('cuda:0')

In [None]:
#tf.compat.v1.disable_eager_execution()

In [None]:
seed = 42

In [None]:
from keras.preprocessing.image import ImageDataGenerator
with tf.device('/GPU:0'):
    data_gen = ImageDataGenerator(
          preprocessing_function = preprocess_input)
  
    train_generator = data_gen.flow_from_directory(
        '../../images_shuffle/training',
        target_size=(224, 224),
        batch_size=64,
        class_mode='sparse',
        shuffle=True,
        seed=seed)
    
    val_generator = data_gen.flow_from_directory(
        '../../images_shuffle/validation',
        target_size=(224, 224),
        batch_size=64,
        class_mode='sparse',
        shuffle=True,
        seed=seed)

In [None]:
class_indices = train_generator.class_indices
#class_indices

In [None]:
#@markdown CNN Added Layer
inject_after_layer ='block3_pool'#@param {type:"string"}

#@markdown Image Segmentation
fz_seg_scale = 600#@param {type:"number"}
fz_min_size = 10#@param {type:"number"}

#@markdown Bayesian Optimizer

# Random Initial State
opt_init_state = np.random.randint(1000000)

# Perform the optimization for a single image or multiple images? 
multiple_images = True#@param {type:"boolean"}
img_set = 1

# Prints Opt values
opt_verbose = True#@param {type:"boolean"}

# Choose if the function should be maximized or minimized
opt_maximize = True#@param {type:"boolean"}

# Boundaries of variable of interest X1
min_drop = 0.01#@param {type:"number"}
max_drop = 0.7#@param {type:"number"}

# Boundaries of variable of interest X2
min_seg = 0.2#@param {type:"number"}
max_seg = 0.9#@param {type:"number"}

# Boundaries of variable of interest X3
min_shap = 30#@param {type:"number"}
max_shap = 200#@param {type:"number"}

# Initial random exploration points
opt_init_points = 3#@param {type:"number"}

# Iterations per session
shap_evals = 1#@param {type:"number"}

# Iterations for BO
opt_iterations = 8#@param {type:"number"}

# Spacing between consecutive samples
opt_eps = 1e-2#@param {type:"number"}

# Gaussian process evaluation space (1D)
x_sp = np.linspace(0, max_drop, 1000).reshape(-1, 1)


# Different acquisition functions (aqfx) can be selected: "EI", "MPI", "LCB"
aqfx_type = "EI"#@param {type:"string"}

# Exploration bias. Higher numbers means more exploration
aqfx_jitter = 0.1#@param {type:"number"}

# Data is exact or noisy
opt_noiseless = False#@param {type:"boolean"}

# ---Extras---

# SHAP predictions to render
img_pred_viz = 1
# Counter 
iter_counter = 0

In [None]:
files_list = []

basepath = "../../holdout/12_task_1"
for filename in os.listdir(basepath):
    # Skip the checkpoint folder that Jupyter creates
    if filename == ".ipynb_checkpoints":
        continue
    files_list.append(basepath + "/" + filename)

img_number = len(files_list)

#print(files_list)

In [None]:
# Add Dropout after given layer(s)
from tensorflow.keras.models import Model
def generate_model(drp):
    #model = VGG16(weights='imagenet',input_shape=(32,32,3), include_top=False)
    model = VGG16(weights='imagenet',input_shape=(224,224,3))
    updated_model = Sequential()
    for layer in model.layers:
        if layer.name != 'predictions':
            layer.trainable=False
            updated_model.add(layer)
    updated_model.add(Dropout(drp))
    updated_model.add(tf.keras.layers.Dense(units=10,kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),activation='softmax'))
    model = updated_model
    return model

In [None]:
# Creates a segmentation to explain the model with hyper-pixels
def maskerMaster(i,fz_sig):
    # print(fz_sig)

    img = keras.preprocessing.image.load_img(files_list[i], target_size=(224,224)) #VGG input size
    img_orig = keras.preprocessing.image.img_to_array(img)
    #print(img_orig.shape)
    segments_fz = felzenszwalb(img, scale=fz_seg_scale, sigma=fz_sig, min_size=fz_min_size)

    # Visualize masker
    """
    plt.rcParams["figure.figsize"] = (15,4)
    plt.axis('off')
    plt.subplot(1,3,1)
    plt.xlabel('Input Image')
    plt.imshow(img)
    plt.subplot(1,3,2)
    plt.xlabel('Felz Segments')
    plt.imshow(mark_boundaries(img, segments_fz))
    plt.subplot(1,3,3)
    plt.xlabel('Masked Segments')
    plt.imshow(segments_fz)
    plt.show()
    """

    return img, img_orig, segments_fz

def mask_image(eval, segmentation, image, background=None):
    
    if background is None:
        background = image.mean((0, 1))
        
    result = np.zeros((eval.shape[0], 
                    image.shape[0], 
                    image.shape[1], 
                    image.shape[2]))
    
    for i in range(eval.shape[0]):
        result[i, :, :, :] = image
        for j in range(eval.shape[1]):
            if eval[i, j] == 0:
                result[i][segmentation == j, :] = background
    return result
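
To make the masking step concrete, here is a toy run on a 2x2 "image" with two segments. It is a sketch that re-implements the same idea as `mask_image` under a different, hypothetical name (`mask_segments`) so that it runs on its own; the tiny arrays are made up for illustration.

```python
import numpy as np


def mask_segments(z, segmentation, image, background):
    """Return one masked copy of `image` per row of `z`.

    Segment j is replaced by `background` wherever z[i, j] == 0.
    """
    out = np.repeat(image[np.newaxis].astype(float), z.shape[0], axis=0)
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            if z[i, j] == 0:
                out[i][segmentation == j] = background
    return out


image = np.arange(12, dtype=float).reshape(2, 2, 3)   # 2x2 pixels, 3 channels
segmentation = np.array([[0, 0], [1, 1]])             # top row is segment 0, bottom row is segment 1
z = np.array([[1, 1],    # keep both segments
              [1, 0],    # hide segment 1
              [0, 0]])   # hide everything

masked = mask_segments(z, segmentation, image, background=255.0)
print(masked.shape)
print(masked[1])
```

Each row of `z` is one "coalition" of visible segments, which is the kind of input Kernel SHAP feeds to the model function `f` in the next cell.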

In [None]:
# Core CNN+SHAP
def shapProcessor (drp,fz_s,image_index, modelNew):
    #print(f'dropout: {drp}, hp_seg: {fz_s}')

  # Input image(s)
    if multiple_images:
        img_set = image_index
    else:
        img_set = 0

    img, img_orig, segments_slices = maskerMaster(img_set,fz_s)

  # Run model
    def f(z):
        return modelNew.predict(preprocess_input(mask_image(z, segments_slices, img_orig, 255)))

  # Store predictions
    predictions = modelNew.predict(preprocess_input(np.expand_dims(img_orig.copy(), axis=0)))
    top_preds = np.argsort(-predictions)
    inds = top_preds[0]
    #print(inds[0])
    if inds[0] != 9:
        return False
  #print(predictions, top_preds)
  #print(f'predictions: {top_preds[0][0], predictions[0][top_preds[0][0]], top_preds[0][1], predictions[0][top_preds[0][1]]}')

  # Render segments into image
    def fill_segmentation(values, segmentation):
        out = np.zeros(segmentation.shape)
        for i in range(len(values)):
            out[segmentation == i] = values[i]
        return out


  # Kernel SHAP runs VGG16
    explainer = shap.KernelExplainer(f, np.zeros((1,50)))
    shap_values = explainer.shap_values(np.ones((1,50)), nsamples=1500, l1_reg=False)

  # Define SHAP Colours
    colors = []
    for l in np.linspace(1, 0, 100):
        colors.append((30 / 255, 136 / 255, 229 / 255, l))
    for l in np.linspace(0, 1, 100):
        colors.append((255 / 255, 13 / 255, 87 / 255, l))
    cm = LinearSegmentedColormap.from_list("bwr",colors)

    
    max_val = np.max([np.max(np.abs(shap_values[i][:,:-1])) for i in range(len(shap_values))])
  
  # for i in range(img_pred_viz):
    m = fill_segmentation(shap_values[inds[0]][0], segments_slices)
    #plt.title(feature_names[str(inds[0])][1]) #Labels
    #plt.title(list(class_indices.keys())[inds[0]]) #Labels
    plt.axis('off')
    plt.subplot(1,2,1)
    plt.xlabel('Input Image')
    plt.imshow(img)
    plt.subplot(1,2,2)
    plt.xlabel('Explanation')
    plt.imshow(img.convert('LA'), alpha=0.25)
    plt.imshow(m, cmap=cm, vmin=-max_val, vmax=max_val)
    plt.rcParams["figure.figsize"] = (10,4)
    #plt.imshow(img.convert('LA'), alpha=0.25)
    #im = plt.imshow(m, cmap=cm, vmin=-max_val, vmax=max_val)
  # plt.set_xlabel(feature_names[str(inds[0])][1]) #Labels
    #plt.axis('off')
    #plt.rcParams["figure.figsize"] = (8,8)
    plt.savefig(fname=load_records['directory']+'/plots/' + str(load_records['iteration']) + '_' + str(image_index) + '.png', bbox_inches ="tight")
    #plt.savefig(fname=load_records['directory']+'/plots/' + str(load_records['iteration']) + '_' + str(image_index) + '_T.png', bbox_inches ="tight", transparent = True)
    plt.show()
    return True

In [None]:
from tensorflow.python.eager.context import num_gpus
def ask_rating(prompt):
    # Keep asking until the participant enters a number in the range (0, 10]
    while True:
        print(prompt)
        num = input()
        try:
            val = float(num)
        except ValueError:
            print("This is not a number.")
            continue
        if 0 < val <= 10:
            return val
        print("This is not a valid number.")

def userQuestion():
    #print("\nHow informative are the colored segments in categorizing the animal?\n\nPlease consider the following:\n\nAre the shaded regions within the animal boundaries?\nDo the shaded regions include both big and small features?\nIs the highest intensity of the colours within the animal boundaries?\n\nRate from 1 to 10, where 1 is the worst and 10 is perfect.\n")
    print("\nPlease give your rating for the noise, object body, and main features:\n")
    noise = ask_rating("Noise rating:")
    contour = ask_rating("Object body rating:")
    feature = ask_rating("Main feature rating:")
    return noise, contour, feature

In [None]:
if recover:
    ax_client = AxClient.load_from_json_file(load_records['filename'])
    acc_records = load_records['acc_records']
    hr_noise_records = load_records['hr_noise_records']
    hr_contour_records = load_records['hr_contour_records']
    hr_feature_records = load_records['hr_feature_records']
    hr_records = load_records['hr_records']
    lr_records = load_records['lr_records']
    drp_records = load_records['drp_records']
    seg_records = load_records['seg_records']
    hr_total_noise_records = load_records['hr_total_noise_records']
    hr_total_contour_records = load_records['hr_total_contour_records']
    hr_total_feature_records = load_records['hr_total_feature_records']
    #shap_records = load_records['shap_records']
    directory = load_records['directory']
else:
    ax_client = AxClient()
    ax_client.create_experiment(
    name="moo_experiment",
    parameters=[
                {'name': 'lr', 'type': 'range', 'bounds': [0.00005, 0.02]},
                {'name': 'drp', 'type': 'range', 'bounds': [0.01, 0.6]}, 
                {'name': 'seg', 'type': 'range', 'bounds': [0.2,0.9]}
                #{'name': 'shap', 'type': 'range', 'bounds': [100,2048]}
                ],
    objectives={
        # `threshold` arguments are optional
        "acc": ObjectiveProperties(minimize=False), 
        "hr": ObjectiveProperties(minimize=False)
    }
    )
    acc_records = []
    hr_total_noise_records = []
    hr_total_contour_records = []
    hr_total_feature_records = []
    hr_noise_records = []
    hr_contour_records = []
    hr_feature_records = []
    hr_records = []
    lr_records = []
    drp_records = []
    seg_records = []
    shap_records = []
    start_time = str(datetime.now())
    load_records['start_time'] = start_time
    directory = main_path + '/result/task_1/'+ start_time
    # makedirs creates the intermediate folders and does not fail if they already exist
    os.makedirs(directory + '/plots', exist_ok=True)
    load_records['directory'] = directory

In [None]:
def fix_rating(noise_ratings, contour_ratings, feature_ratings, ratings_per_iter):
    print("\nDo you need to change the rating? Type y for yes and n for no:")
    ans = input()
    while ans == "y":
        while True:         #Checks for valid inputs
            print("Ok! Which image is it? For example, if you want to change the rating for the first image, type 1")
            try:
                # int() is inside the try block so that non-numeric input is caught
                img_idx = int(input())
                # Only images that actually received ratings can be changed
                if 0 < img_idx <= len(noise_ratings):
                    img_idx = img_idx - 1
                    break
                else:
                    print("This is not a valid number.")
            except ValueError:
                print("This is not a number.")

        while True:         #Checks for valid inputs
            print("All right, which rating is it? Type 1 for noise, 2 for object body, 3 for main features:")
            try:
                rate_type = int(input())
                if 1 <= rate_type <= 3:
                    break
                else:
                    print("This is not a valid number.")
            except ValueError:
                print("This is not a number.")

        # Pick the rating array that matches the chosen rating type
        ratings_by_type = {1: noise_ratings, 2: contour_ratings, 3: feature_ratings}
        selected = ratings_by_type[rate_type]

        while True:         #Checks for valid inputs
            print("And what change would you like to make? For example, if you want to add 5 points, type 5; if you want to reduce 5 points, type -5")
            try:
                rate_change = int(input())
                if 0 <= selected[img_idx] + rate_change <= 10:
                    break
                else:
                    print("Please note that the rating range is [0,10]!")
            except ValueError:
                print("This is not a number.")

        # The arrays are modified in place, so the caller sees the change
        selected[img_idx] += rate_change
        ratings_per_iter[img_idx] = (noise_ratings[img_idx] + contour_ratings[img_idx] + feature_ratings[img_idx]) / 3.0
        print("\nOkay, the rating has been changed! Do you want to change another one?")
        ans = input()

In [None]:
def evaluate(parameters, iter_counter):
    shap_evals = img_number
    ratings_per_iter = np.array([[]])
    noise_ratings = np.array([[]])
    contour_ratings = np.array([[]])
    feature_ratings = np.array([[]])
    acc_per_iter = []

    lr = parameters.get('lr')
    drp = parameters.get("drp")
    seg = parameters.get("seg")
    #shap = parameters.get("shap")

    modelNew = generate_model(drp)
    modelNew.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),  # Optimizer
    # Loss function to minimize
    loss=keras.losses.SparseCategoricalCrossentropy(),
    # List of metrics to monitor
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )
    history = modelNew.fit(
      train_generator,
      batch_size=64,
      epochs=30,
      validation_data=val_generator,
      verbose = 0
    )
    #print(history.history)
    accuracy = history.history['val_sparse_categorical_accuracy'][-1]
    #print(f'accuracy: {accuracy}')
    """
    plt.plot(history.history['sparse_categorical_accuracy'])
    plt.plot(history.history['val_sparse_categorical_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    """
    print('\nWelcome back!\n')
    for i in range(shap_evals):
        print(f'\nThis is image {i+1}.')
        pred = shapProcessor(drp, seg,i, modelNew)
        if pred:
            noise,contour,feature = userQuestion()
            noise_ratings = np.append(noise_ratings, noise)
            contour_ratings = np.append(contour_ratings, contour)
            feature_ratings = np.append(feature_ratings, feature)
            ratings_per_iter = np.append(ratings_per_iter, (noise + contour + feature)/3.0)
        else:
            print(f'image {i+1} is classified incorrectly, move on to the next image.\n')
    
    fix_rating(noise_ratings, contour_ratings, feature_ratings, ratings_per_iter)
    #print(f'\nAverage score for this run: {mean(ratings_per_iter)}')
    #print(f'\nAverage noise score for this run: {mean(noise_ratings)}')
    #print(f'\nAverage contour score for this run: {mean(contour_ratings)}')
    #print(f'\nAverage feature score for this run: {mean(feature_ratings)}')
    acc_records.append(accuracy)
    hr_total_noise_records.append(list(noise_ratings))
    hr_total_contour_records.append(list(contour_ratings))
    hr_total_feature_records.append(list(feature_ratings))
    hr_records.append(mean(ratings_per_iter))
    hr_noise_records.append(mean(noise_ratings))
    hr_contour_records.append(mean(contour_ratings))
    hr_feature_records.append(mean(feature_ratings))
    lr_records.append(lr)
    drp_records.append(drp)
    seg_records.append(seg)
    #shap_records.append(shap)
    #accuracy = compute_accuracy(modelNew, images_list1, labels_list1)
    #accuracy = mean(history.history['val_sparse_categorical_accuracy'])
    #print(f'Accuracy of the VGG net on the test images: {accuracy: .6f}')
    print("-"*40)
    del modelNew
    #modelNew.save('task_1_model_' + str(iter_counter))
    tf.keras.backend.clear_session()
    return {"acc": (accuracy, 0.0), "hr": (mean(ratings_per_iter), 0.0)}
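
`evaluate` returns one entry per objective, each as a pair. Ax reads such a pair as (mean, standard error), so the `0.0` in second position declares the measurement noise-free. A stripped-down sketch of how the return value is assembled, with `build_trial_result` being a hypothetical name used only here:

```python
from statistics import mean


def build_trial_result(accuracy, ratings_per_image):
    """Package both objectives in the {metric: (mean, sem)} shape returned by evaluate."""
    if not ratings_per_image:
        raise ValueError("at least one image must have been rated")
    return {
        "acc": (accuracy, 0.0),
        "hr": (mean(ratings_per_image), 0.0),
    }


result = build_trial_result(0.95, [8.0, 9.0, 7.0])
print(result)
```

The guard for an empty list is an addition for the sketch; the notebook's `evaluate` assumes that at least one image was classified correctly and therefore rated.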

In [None]:
iteration_setting = 12

In [None]:
render(ax_client.get_contour_plot(param_x="drp", param_y="lr", metric_name='hr'))

In [None]:
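# One x value per recorded iteration, so the plot works for any number of completed trials
x = [i+1 for i in range(len(load_records['acc_records']))]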
#y = [0.95110023021698, 0.9547677040100098, 0.9168704152107239, 0.9584352374076843, 0.9645476937294006]
data_acc=plt.plot(x,load_records['acc_records'],'b',label='accuracy')
plt.plot(x,load_records['acc_records'],'b^-')
plt.title('trace plot')
plt.xlabel('iterations')
plt.ylabel('accuracy')
plt.legend()
plt.show()

In [None]:
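# One x value per recorded iteration, so the plot works for any number of completed trials
x = [i+1 for i in range(len(load_records['hr_records']))]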
data_hr=plt.plot(x,load_records['hr_records'],'g',label='human rating')
plt.plot(x,load_records['hr_records'],'g^-')
plt.title('trace plot')
plt.xlabel('iterations')
plt.ylabel('human rating')
plt.legend()
plt.show()

In [None]:
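# One color value per recorded iteration, so the scatter plot works for any number of completed trials
x = [i+1 for i in range(len(load_records['hr_records']))]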
plt.figure(figsize=(8,6))
#y = [0.95110023021698, 0.9547677040100098, 0.9168704152107239, 0.9584352374076843, 0.9645476937294006]
data=plt.plot(load_records['hr_records'],load_records['acc_records'],'b',label='iteration', alpha=0.05)
colors = x
sc = plt.scatter(load_records['hr_records'],load_records['acc_records'], c=colors)
plt.colorbar(sc)
plt.title('trace plot')
plt.xlabel('human rating')
plt.ylabel('accuracy')
plt.legend()
plt.show()

In [None]:
df = pd.DataFrame(list(zip([i+1 for i in range(15)], load_records['hr_records'], load_records['acc_records'], load_records['drp_records'], load_records['lr_records'],load_records['seg_records'])), columns =['iteration', 'human_rating','accuracy', 'dropout_rate', 'learning_rate', 'sigma'])
df.to_csv('./pilot_study.csv') 

In [None]:
%%HTML
<button onclick="$('.input, .prompt, .output_stderr, .output_error, .output_result').toggle();">Toggle Code</button>