This repository contains the official source code and experimental setup for the paper:
"Cold Start Active Preference Learning in Socio-Economic Domains", submitted to the ....
Active preference learning offers an efficient approach to modeling preferences, but it is hindered by the cold-start problem, which leads to a marked decline in performance when no initial labeled data are available. While cold-start solutions have been proposed for domains such as vision and text, the cold-start problem in active preference learning remains largely unexplored, underscoring the need for practical, effective methods. Drawing inspiration from established practices in social and economic research, the proposed method initiates learning with a self-supervised phase that employs Principal Component Analysis (PCA) to generate initial pseudo-labels. This process produces a "warmed-up" model based solely on the data's intrinsic structure, without requiring expert input. The model is then refined through an active learning loop that strategically queries a simulated noisy oracle for labels. Experiments conducted on various socio-economic datasets, including those related to financial credibility, career success, and socio-economic status, consistently show that the PCA-driven approach outperforms standard active learning strategies that start without prior information. This work thus provides a computationally efficient and straightforward solution that effectively addresses the cold-start problem.
Our proposed framework consists of four main stages designed to efficiently learn preferences from a cold start:
- **Data Preparation:** Raw data is cleaned, preprocessed, and standardized. Categorical features are encoded into numerical or one-hot representations.
- **Warm-Start Pre-training:** A self-supervised phase in which Principal Component Analysis (PCA) generates pseudo-labels. An initial XGBoost model is pre-trained on these labels to give it a "warm start."
- **Simulated Expert Oracle:** An oracle that mimics a real-world expert by providing preference labels with stochastic noise, modeled with the Bradley-Terry model.
- **Training Loop:** The warm-started model is incrementally refined by strategically querying the oracle for new labels, focusing on the most informative data pairs.
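The warm-start and oracle stages above can be sketched in a few lines. This is a minimal NumPy illustration of the two ideas, not the repository's actual implementation (which trains an XGBoost model); the function names and the toy data are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_pseudo_scores(X):
    """Project standardized features onto the first principal component.

    The projection acts as a proxy utility score: for any pair of items,
    the one with the higher score is pseudo-preferred, yielding labels
    without any expert input.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # The first right singular vector of the standardized data is the
    # first principal component direction.
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ vt[0]

def bradley_terry_oracle(u_i, u_j, rng):
    """Simulated noisy expert: prefers item i over item j with
    probability sigmoid(u_i - u_j), per the Bradley-Terry model."""
    p = 1.0 / (1.0 + np.exp(-(u_i - u_j)))
    return 1 if rng.random() < p else 0

# Toy data: 100 items with 5 features each.
X = rng.normal(size=(100, 5))
scores = pca_pseudo_scores(X)

# Warm-start phase: pseudo-label a pair directly from the PCA scores.
i, j = 0, 1
pseudo_label = int(scores[i] > scores[j])

# Active phase: query the noisy oracle, here with made-up "true" utilities.
true_u = X @ rng.normal(size=5)
label = bradley_terry_oracle(true_u[i], true_u[j], rng)
```

Note that the sign of a principal component is arbitrary, so in practice the PCA scores may need to be sign-aligned with domain knowledge (e.g. higher income should map to higher utility).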
```
.
├── Config/
│   └── util.py          # Utility functions
├── Datasets/            # Directory for datasets
├── FIFA/                # Initial Jupyter notebooks for the FIFA dataset (various experiments)
├── Images/              # Output directory for generated plot images
├── Plots/               # Output directory for plot data
├── Results/
│   ├── DopeWolf/        # Output directory for the DopeWolf method
│   ├── GURO/            # Output directory for the GURO method
│   ├── Regression/      # Highest accuracies of the logistic regression model
│   └── times.txt        # Recorded times from stopwatch.py
├── *.ipynb              # Notebooks for the datasets
├── dopewolf.py          # Runs the DopeWolf method
├── guro.py              # Runs the GURO method
├── plot_generator.py    # Generates plots from saved data
├── README.md            # This file
├── run.py               # Standalone script to execute the framework on a cleaned dataset
└── stopwatch.py         # Measures the time required for the cold-start method
```
Please download the datasets used in the study and place them in the `Datasets/` directory.

Most datasets can be downloaded directly from their main pages; the Household and FIFA 22 datasets are hosted separately due to their large size:

- Download Household Dataset
- Download FIFA 22 Dataset

Links to the other datasets' main pages:

- Download Credit Dataset
- Download Happiness Dataset
- Download Student Dataset
The experiments can be run from the notebooks; the core logic for running a comparative experiment is outlined in the `run.py` script.

To replicate the figures from the paper, run the script for all policies across the relevant datasets. The results and plots will be saved to the `Plots/` directory.
If you use this code or our framework in your research, please cite our paper:
```bibtex
@misc{fayazbakhsh2025coldstartactivepreference,
  title={Cold Start Active Preference Learning in Socio-Economic Domains},
  author={Mojtaba Fayaz-Bakhsh and Danial Ataee and MohammadAmin Fazli},
  year={2025},
  eprint={2508.05090},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.05090},
}
```
This project is licensed under the MIT License. See the LICENSE file for details.
