# Learning the value systems of societies from preferences - submitted for ECAI 2025
This notebook replicates the experiments for the ECAI paper titled "Learning the value systems of societies from preferences". The paper presents a novel approach to learning value systems (value-based preferences) and value groundings (domain-specific value alignment measures) of a society of agents or stakeholders from examples of pairwise preferences between alternatives in a decision-making problem domain.

In the paper we utilize the Apollo dataset from [https://rdrr.io/cran/apollo/man/apollo_swissRouteChoiceData.html](https://rdrr.io/cran/apollo/man/apollo_swissRouteChoiceData.html), about train choice in Switzereland. The dataset includes features such as cost, time, headway, and interchanges, which are used to model agent preferences based on values. Although it also works for sequential decision making, in the paper we focus on the non-sequential decision making use case that the Apollo Dataset is about. 

There are three main executables:
- **`generate_dataset_non_sequential.py`**: Generates the dataset for the experiments.
- **`train_vsl_non_sequential.py`**: Trains the reward models using the generated dataset. This script supports running multiple seeds in parallel.
- **`evaluate_results.py`**: Evaluates the trained models and generates plots to visualize the results.

This notebook is divided into three main sections:
1. **Dataset Generation**: Generates the Apollo dataset.
2. **Training**: Trains the reward models using a certain number of seeds in parallel.
3. **Evaluation**: Evaluates the results and displays the plots directly in the notebook.

## 1. Dataset Generation
In this section, we generate the Apollo dataset using the `generate_dataset_one_shot_tasks.py` script. This dataset will be used for training and evaluation in subsequent steps.

In [3]:
BASE_SEED = 26 # Actual seed in the paper is 26
N_SEEDS = 5

In [4]:
import os
# Use the gentr flag to generate the information of trajectories/alternatives.
# Use the genpf flag to generate the preferences between trajectories/alternatives.
os.system(f'python generate_dataset_one_shot_tasks.py --environment apollo --dataset_name ecai_apollo --seed {BASE_SEED} -gentr -genpf')

  pc_group = alg_group.add_argument_group(


Namespace(dataset_name='ecai_apollo', gen_trajs=True, gen_preferences=True, dtype=<class 'numpy.float32'>, algorithm='pc', config_file='algorithm_config.json', environment='apollo', seed=26, test_size=0.0, reward_epsilon=0.0)


Traceback (most recent call last):
  File [35m"/home/andresh26kali/VAE-ValueLearning/ValueLearningFromPreferences/generate_dataset_one_shot_tasks.py"[0m, line [35m150[0m, in [35m<module>[0m
    [31mdill.dump[0m[1;31m(environment, f)[0m
    [31m~~~~~~~~~[0m[1;31m^^^^^^^^^^^^^^^^[0m
  File [35m"/home/andresh26kali/VAE-ValueLearning/ValueLearningFromPreferences/.venv/lib/python3.13/site-packages/dill/_dill.py"[0m, line [35m252[0m, in [35mdump[0m
    [31mPickler(file, protocol, **_kwds).dump[0m[1;31m(obj)[0m
    [31m~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[0m[1;31m^^^^^[0m
  File [35m"/home/andresh26kali/VAE-ValueLearning/ValueLearningFromPreferences/.venv/lib/python3.13/site-packages/dill/_dill.py"[0m, line [35m420[0m, in [35mdump[0m
    [31mStockPickler.dump[0m[1;31m(self, obj)[0m
    [31m~~~~~~~~~~~~~~~~~[0m[1;31m^^^^^^^^^^^[0m
  File [35m"/usr/lib/python3.13/pickle.py"[0m, line [35m484[0m, in [35mdump[0m
    [31mself.save[0m[1;31m(obj)[0

2

## 2. Training
In this section, we train the reward models using the `train_vsl_non_sequential.py` script. We run the training process with `N_SEEDS` different seeds in parallel.

In [None]:
K_tests = [1,2,3,6,9,12]

# List of seeds to run in parallel
seeds = [BASE_SEED + i for i in range(N_SEEDS)]

def train_with_seed(seed, K):
    import subprocess
    # The -O option is important, as there are many costly debugging operations in the code
    subprocess.Popen(f"python -O train_vsl_non_sequential.py --dataset_name ecai_apollo -ename ecai_test_K{K}_s{seed} -s={seed} -e apollo -cf='algorithm_config_K{K}.json'", shell=True)


In [None]:
# Run training in parallel as separate processes
for K in K_tests:
    # Create a separate process for each seed and K
    processes = [train_with_seed(seed, K) for seed in seeds]
# Train will take approximately 2-6 hours for each L in a small MACBook Pro M2.
# It is slow because it does not scale very well in the number of agents (the algorithm can be subject to more technical optimizations).

In [None]:
# If want to run all K and seeds in parallel, you can use the following command:
if False:
    processes = [train_with_seed(seed, K) for seed in seeds for K in K_tests]

## 3. Evaluation
In this section, we evaluate the trained models using the `evaluate_results.py` script. The evaluation will generate plots to visualize the results, and these plots will be displayed directly in the notebook.

In [None]:
import os

seed = 26

experiments_all_seeds_per_K = {K: ','.join([f"ecai_test_{seed+i}_K{k}" for i in range(N_SEEDS)]) for k in K_tests}
experiments_all = ','.join([f"ecai_test_K{K}_s{seed}" for K in K_tests for seed in seeds])

This will produce the tables and plots for a specific seed and maximum number of clusters. The results of each execution will be saved in the `test_results/ecai_test_s{seed}_K{K}` directory.
Inside, there will be:

- `train_set/`: The results over training set (there is no test set in this case, in other environments it might be useful). Inside there are the following folders:
  - `explanations/`: Morris sensitivity analysis of the grounding functions.
  - `plots/`: Plots of different kinds.
    - `context_features/`: It shows in a graphical manner the proportional deviation from the mean of the context features affecting each decision in each cluster. (e.g. going for shopping, business, etc.)
    - `hists_clusters.pdf`: Shows pie charts of every value system, and histograms for the representativeness achieved in each cluster.
    - `figure_clusters.pdf`: A graphical representation of the clusters. Given the distances are not euclidean, it is not very informative, but it is useful to see how the clusters are separated visually and how well each agent is internally represented each agent (inside the circles). To better see the latter, the `hists_clusters.pdf` is more useful.
  - `tables/`: Results that are better shown in table-form. Tables are in CSV and LaTeX format. They are represented for each single cluster assignment in the final state of the memory used during training, stating the position in the ranking (ordered first by grounding coherence, then by dunn index score).
    - `context_features/`: These tables show the grpahical representation of the context features from before, adding the actual averages of each feature per cluster and the global ones.
    - `general/`: These tables show general information about the assignments: number of agents per cluster, value system, Dunn index, grounding coherence, representativeness, etc.


In [None]:
for K in K_tests:
    for seed in seeds: # (optional)
        os.system(f"python evaluate_results.py -ename ecai_test_s{BASE_SEED}_K{K}")

This calcutes the learning curves for each maximum number of clusters, aggregating the curves of the different seeds used. The results for each maximumm of clusters K are saved at `test_results/ecai_test_s{BASE_SEED}_K{K}/learning_curves/`.

In [None]:
for K in K_tests:
    os.system(f"python evaluate_results.py -ename ecai_test_s{BASE_SEED}_K{K} --lrcfrom={experiments_all_seeds_per_K[K]}")

This calculates the Dunn Index curve comparing the executions with different maximum number of clusters. Each point represents number of "predicted clusters/maximum number of clusters permitted", and the graph shows the average Dunn Index over the number of cases each combination happened to be the final best solution in all the experiments (taking into account all the different execution seeds). The results for each maximumm of clusters K are saved at `test_results/ecai_test_s{BASE_SEED}_K{K}/di_scores/`.

In [None]:
os.system(f"python evaluate_results.py -ename ecai_test_{seed} --dicfrom={experiments_all}")

usage: evaluate_results.py [-h] -ename EXPERIMENT_NAME [-sh]
                           [-subfm SUBFIG_MULTIPLIER] [-pfont PLOT_FONTSIZE]
                           [-dicfrom DUNN_INDEX_CURVE_FROM]
                           [-lrcfrom LEARNING_CURVE_FROM] [-s SEED]
evaluate_results.py: error: unrecognized arguments: --lrcfrom=ecai_test_26,ecai_test_27,ecai_test_28,ecai_test_29,ecai_test_30


Namespace(dataset_name='ecai_apollo', experiment_name='ecai_test_s30', dtype=torch.float32, seed=30, algorithm='pc', config_file='algorithm_config_L3.json', show=False, environment='apollo', discount_factor=1.0, split_ratio=0.0, k_clusters=-1, debug_mode=False, retrain_experts=False, approx_expert=False, reward_epsilon=0.0)
Namespace(dataset_name='ecai_apollo', experiment_name='ecai_test_s28', dtype=torch.float32, seed=28, algorithm='pc', config_file='algorithm_config_L3.json', show=False, environment='apollo', discount_factor=1.0, split_ratio=0.0, k_clusters=-1, debug_mode=False, retrain_experts=False, approx_expert=False, reward_epsilon=0.0)
Namespace(dataset_name='ecai_apollo', experiment_name='ecai_test_s27', dtype=torch.float32, seed=27, algorithm='pc', config_file='algorithm_config_L3.json', show=False, environment='apollo', discount_factor=1.0, split_ratio=0.0, k_clusters=-1, debug_mode=False, retrain_experts=False, approx_expert=False, reward_epsilon=0.0)
Namespace(dataset_name

512

TESTING DATA COHERENCE. It is safe to stop this program now...
TESTING DATA COHERENCE. It is safe to stop this program now...
TESTING DATA COHERENCE. It is safe to stop this program now...
TESTING DATA COHERENCE. It is safe to stop this program now...
TESTING DATA COHERENCE. It is safe to stop this program now...


  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
  logger.warn(
