## Safe Active Preference Learning (SAPL) Framework Tutorial

In this tutorial, you will learn how to run empirical experiments and a human subject study with the SAPL framework.

Let's start with the empirical studies. We will run the convergence analysis for the case where the true weight valuation is out of distribution, that is, the ground truth is removed from the set of weight samples before learning starts.

In [1]:
# imports

import os, sys
import pickle
cwd = os.getcwd()
sys.path.insert(0, f'{cwd}/src')   

import numpy as np

from APL_utils import * # utils needed for safe active preference learning
from preprocess_utils import * # utils needed to preprocess data and formulas
import random


In [2]:
# experiment parameters 

no_questions = 10 # maximum number of questions to be asked
no_samples   = 1000 # number of weight valuation samples
threshold_probability = 0.99 # terminating likelihood probability condition

repetition = 1 # times the test will be repeated
experiment = 'pedestrian' # experiment type. choose among 'pedestrian' or 'overtake'
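
The roles of `no_questions` and `threshold_probability` can be pictured with a small, self-contained sketch. It assumes a Bradley-Terry style likelihood over robustness differences and picks questions at random; the names `bt_likelihood` and `update_posterior` and the toy data are hypothetical, and SAPL's own update and question selection may differ.

```python
import numpy as np

def bt_likelihood(r_a, r_b):
    """Assumed Bradley-Terry probability that option a is preferred over b."""
    return 1.0 / (1.0 + np.exp(-(r_a - r_b)))

def update_posterior(prior, robs, question, answer):
    """Bayes update over weight samples; answer 0 means 'a preferred'."""
    a, b = question
    p_a = bt_likelihood(robs[a], robs[b])      # one value per weight sample
    likelihood = p_a if answer == 0 else 1.0 - p_a
    posterior = prior * likelihood
    return posterior / posterior.sum()

# Toy data (hypothetical): rows are signals, columns are weight samples.
rng = np.random.default_rng(0)
n_signals, n_samples = 5, 50
robs = rng.normal(size=(n_signals, n_samples))
true_robs = rng.normal(size=n_signals)         # robustness under the ground truth

no_questions = 10
threshold_probability = 0.99
posterior = np.full(n_samples, 1.0 / n_samples)  # uniform prior

asked = 0
for _ in range(no_questions):
    a, b = rng.choice(n_signals, size=2, replace=False)
    answer = 0 if true_robs[a] > true_robs[b] else 1   # simulated user
    posterior = update_posterior(posterior, robs, (a, b), answer)
    asked += 1
    if posterior.max() >= threshold_probability:       # confident enough: stop
        break

print(f"questions asked: {asked}, most-likely prob: {posterior.max():.3f}")
```

The loop ends either when the question budget is used up or when one sample's posterior passes the threshold, whichever comes first.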


In [3]:
# load data

data_name = f"./data/{experiment}_trajectories.pkl"
with open(data_name, 'rb') as f:
    data = pickle.load(f)

In [4]:
# data preprocess

processed_signals = get_signals(data, experiment)

# construct the formula in WSTL_formula format. Please see STLCG for details.
phi = get_formula(processed_signals, experiment)


In [5]:

# robustness difference limit passed to SAPL, derived from u
u = 0.4
rob_diff_bound = - np.log(u/(1-u))

for i in range(repetition):
    # create no_samples + 1 samples; one of them will serve as the ground truth
    # you can seed the weight samples with the seed option
    phi.set_weights(processed_signals, w_range = [0.1,1.1], 
                    no_samples=no_samples+1, random=True) 

    # select the ground truth weight valuation
    # (random.randint is inclusive on both ends, so valid indices are 0..no_samples)
    w_final = random.randint(0, no_samples)

    # record the ground truth robustness values, then remove the ground truth from the sample set
    robs = phi.robustness(processed_signals, scale = -1).squeeze(1).squeeze(-1)
    final_robs = robs[:,w_final]
    phi = remove_true_sample(phi, w_final)

    aPL_instance= SAPL(processed_signals, phi, no_samples, rob_diff_bound, debug = True)
    aPL_instance.ood_convergence(threshold_probability, no_questions, final_robs)




number of questions asked: 10
most-likely prob: 0.7373973090505465
number of agreed answers on asked questions: 1.0
number of agreed answers for all questions: 0.7867647058823529
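
The expression `rob_diff_bound = -np.log(u/(1-u))` used above is the logit of `1 - u`. The sketch below only checks that algebra: passing the bound through a logistic function returns `1 - u`. Reading this as a Bradley-Terry preference probability is an assumption, and the helper names `robustness_bound` and `sigmoid` are hypothetical; how SAPL applies the limit internally is not shown in this tutorial.

```python
import numpy as np

def robustness_bound(u):
    """Same expression as in the tutorial: -log(u / (1 - u))."""
    return -np.log(u / (1.0 - u))

def sigmoid(d):
    """Logistic function, assumed here as the preference model."""
    return 1.0 / (1.0 + np.exp(-d))

# The two values of u used in this tutorial.
for u in (0.4, 0.36):
    bound = robustness_bound(u)
    # sigmoid(bound) equals 1 - u up to floating-point error
    print(f"u={u}: bound={bound:.4f}, sigmoid(bound)={sigmoid(bound):.4f}")
```

Under that reading, a smaller `u` gives a larger bound.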


All other experiments are small variations of these settings. Now, let's look at how to start a human subject study.

In human subject studies, we need a way to show trajectories to participants: either a simulator machine that the framework communicates with, or videos. The <tt>simulator</tt> variable selects between the two. For the sake of simplicity, we use the no-simulator option here. Additionally, we can ask validation questions at the end of the experiment; the <tt>validation</tt> variable sets how many.

In [6]:
import torch

In [7]:
no_samples = 1000
threshold_probability = 0.99
no_questions = 1
experiment = 'pedestrian'
validation = 0
simulator = False

In [8]:
data_name = f"./data/{experiment}_trajectories.pkl"
with open(data_name, 'rb') as f:
    data = pickle.load(f)

processed_signals = get_signals(data, experiment)
phi = get_formula(processed_signals, experiment)

if experiment == "pedestrian":
    question_file = f"./data/{experiment}_question.csv"
elif experiment == 'overtake':
    question_file = f"./data/{experiment}_question_filtered.csv"


In [9]:
u = 0.36 
rob_diff_bound = - np.log(u/(1-u))

phi.set_weights(processed_signals, w_range = [0.1,1.1], no_samples=no_samples, random=True)

aPL_instance= SAPL(processed_signals, phi, no_samples,
                   robustness_difference_limit = rob_diff_bound,
                   debug = True)
    
# some overtake signals must be discarded, so we remove every question that involves them.
if experiment == "overtake":
    discarded = {3, 4, 10, 14, 19}
    aPL_instance.questions = [q for q in aPL_instance.questions
                              if q[0] not in discarded and q[1] not in discarded]

outputs  = aPL_instance.user_experiment(simulator, question_file, 
                                        threshold_probability, no_questions)
[most_likely_w_set, decided_robustness, formula, 
remaining_questions, max_w, agreed_answers, no_questions_asked] = outputs


aligned_validation_questions = 0
if validation != 0:
    for _ in range(validation):
        q_idx = int(torch.randint(len(remaining_questions), size=(1,)))
        selected_q = remaining_questions[q_idx]
        remaining_questions.remove(selected_q)

        answer = aPL_instance.show_trajectories(selected_q)
        r_diff = decided_robustness[selected_q[0]] - decided_robustness[selected_q[1]]
        if (r_diff > 0 and answer == 0) or (r_diff < 0 and answer == 1):
            aligned_validation_questions += 1
    validation_agreement = aligned_validation_questions/validation
else:
    validation_agreement = False

print(f'posterior of the most likely weight set: {max_w}') 
print(f'train_questions: {no_questions_asked}') 
print(f'train_data_agreed_answers: {agreed_answers}')
print(f'validation_agreed_answers: {validation_agreement}')


Exception in thread Thread-4 (load):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tkvideo/tkvideo.py", line 45, in load
    frame_data = imageio.get_reader(path)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/imageio/v2.py", line 290, in get_reader
Exception in thread Thread-5 (load):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **

Question Selected: (0, 14)
Robustness wrt max weight set:                         440.0306017178204,                             110.80514693260616
Probability of max weight: 0.0009999999999999996
posterior of the most likely weight set: 0.0009999999999999996
train_questions: 1
train_data_agreed_answers: 1.0
validation_agreed_answers: False
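
The validation step above counts an answer as aligned when the learned robustness values rank the two trajectories the same way the participant did. A minimal, self-contained sketch of that check follows; the function name `validation_agreement` and the toy numbers are hypothetical, and the answer convention (0 when the first trajectory is preferred, 1 when the second is) is inferred from the comparison in the cell above.

```python
def validation_agreement(robustness, questions, answers):
    """Fraction of answers that match the ranking implied by robustness.

    robustness: sequence of robustness values, one per trajectory
    questions:  list of (i, j) trajectory index pairs
    answers:    list of 0/1 answers (assumed: 0 = first preferred, 1 = second)
    """
    if not questions:
        return None  # nothing to validate
    aligned = 0
    for (i, j), answer in zip(questions, answers):
        r_diff = robustness[i] - robustness[j]
        if (r_diff > 0 and answer == 0) or (r_diff < 0 and answer == 1):
            aligned += 1
    return aligned / len(questions)

# Toy data (hypothetical): three trajectories, three questions.
robustness = [2.0, -1.0, 0.5]
questions = [(0, 1), (1, 2), (0, 2)]
answers = [0, 0, 0]

agreement = validation_agreement(robustness, questions, answers)
print(f"validation agreement: {agreement:.2f}")
```

Here the first and third answers agree with the robustness ranking and the second does not. Ties (`r_diff == 0`) count as not aligned, as in the cell above.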
