# (mc)MPR
This notebook is a reproduction guide for the implementations of Multiple Prior-Guided Robust Optimization (MPR) by Zhang et al. (2025) and our multiclass MPR (mcMPR).


First, we create a virtual environment and install the necessary dependencies into it.

In [None]:
import os
import sys
import subprocess

base_dir = os.path.abspath(os.getcwd())
print(f"Base/work directory: {base_dir}")

ENV_NAME = "venv"
venv_path = os.path.join(base_dir, ENV_NAME)
requirements_file = os.path.join(base_dir, "requirements.txt")

if not os.path.isfile(requirements_file):
    raise FileNotFoundError(f"requirements.txt not found in {base_dir}")

if not os.path.exists(venv_path):
    print(f"Creating virtual environment: {venv_path}")
    subprocess.check_call([sys.executable, "-m", "venv", venv_path])
else:
    print(f"Virtual environment '{venv_path}' already exists.")

# Call pip through the venv's interpreter (`python -m pip`): this is the
# recommended way to upgrade pip, and on Windows pip.exe cannot replace itself.
if os.name != "nt":
    python_path = os.path.join(venv_path, "bin", "python")
else:
    python_path = os.path.join(venv_path, "Scripts", "python.exe")
pip_cmd = [python_path, "-m", "pip"]

print(f"Upgrading pip, setuptools, wheel using: {python_path}")
subprocess.check_call(pip_cmd + ["install", "--upgrade", "pip", "setuptools", "wheel"])
print("Installing torch...")
subprocess.check_call(pip_cmd + ["install", "torch"])

# Install the remaining dependencies listed in requirements.txt
print(f"Installing requirements.txt using {python_path}")
subprocess.check_call(pip_cmd + ["install", "-r", requirements_file])

print("Done! To activate locally:")
if os.name != "nt":
    print(f"  source {ENV_NAME}/bin/activate")
else:
    print(f"  {ENV_NAME}\\Scripts\\activate.bat")

In [None]:
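# Note: `!` runs this in a temporary subshell, so the activation does not carry over to later cells.
!source venv/bin/activate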
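
One way to make later `!python ...` cells use the virtual environment is to put its executable directory first on `PATH` from Python, since a `!` command cannot change the kernel's environment. The following is a minimal sketch, assuming the environment lives at `./venv` as created by the setup cell; the helper names `venv_bin_dir` and `prepend_to_path` are illustrative and not part of the repository.

```python
import os


def venv_bin_dir(venv_path, os_name=os.name):
    """Return the directory that holds the venv's executables."""
    return os.path.join(venv_path, "Scripts" if os_name == "nt" else "bin")


def prepend_to_path(directory, path_value):
    """Return path_value with directory placed first and not duplicated."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


# Assumption: the environment was created as ./venv by the setup cell.
venv_dir = os.path.abspath("venv")
bin_dir = venv_bin_dir(venv_dir)

# Child processes started with `!` inherit the kernel's environment,
# so they should now find the venv's executables first.
os.environ["PATH"] = prepend_to_path(bin_dir, os.environ.get("PATH", ""))
os.environ["VIRTUAL_ENV"] = venv_dir
print(f"Executables are now looked up in {bin_dir} first")
```

Running `!python -c "import sys; print(sys.executable)"` afterwards is a quick way to check which interpreter the shell cells pick up.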

## Datasets
For processing and augmenting MovieLens-1M, consult the notebook at `./datasets/ml-1m-synthetic/ml_1m_synthetic.ipynb`.

The other datasets are provided by Zhang et al. (2025) in their reproduction package at https://github.com/jizhi-zhang/MPR/tree/main.

## Pretrain the Matrix Factorization (MF) baseline 

For each dataset, we train an MF recommendation model to serve as an unfair base model. 

In [None]:
!python pretrain_baseline.py --task_type ml-1m

In [None]:
!python pretrain_baseline.py --task_type ml-1m-synthetic

In [None]:
!python pretrain_baseline.py --task_type Lastfm-360K
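
The three pretraining cells can also be driven from a single loop. This is a minimal sketch that reuses only the `pretrain_baseline.py --task_type ...` invocation shown in this notebook; the `pretrain_command` helper and the `RUN` switch are illustrative additions, and the loop uses the kernel's interpreter unless another one is passed in.

```python
import subprocess
import sys

# Task names taken from the pretraining cells in this notebook.
TASK_TYPES = ["ml-1m", "ml-1m-synthetic", "Lastfm-360K"]


def pretrain_command(task_type, python=sys.executable):
    """Build the argument list for one pretraining run."""
    return [python, "pretrain_baseline.py", "--task_type", task_type]


RUN = False  # set to True to actually launch training

for task in TASK_TYPES:
    cmd = pretrain_command(task)
    print(" ".join(cmd))
    if RUN:
        # check_call raises on a non-zero exit code, so the loop
        # stops at the first failing run instead of continuing.
        subprocess.check_call(cmd)
```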

## Predict sensitive attribute distributions under a range of prior distributions

Great! Now that our MF baseline models are trained, we can move on to the prediction of our sensitive attribute distributions under various prior distributions. 

Warning: this can take a very, very long time!
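
Because these runs are long, it can help to keep a copy of the output on disk so that progress survives a closed browser tab. The following is a minimal sketch of a logging wrapper built on the standard library; `run_logged` and the log file name in the example call are hypothetical and not part of the repository.

```python
import subprocess
import sys
import time


def run_logged(cmd, log_path):
    """Run cmd, echo its output line by line, and append it to log_path."""
    start = time.time()
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current while the job is running
        returncode = proc.wait()
    print(f"Finished with exit code {returncode} after {time.time() - start:.1f} s")
    return returncode


# Example call, commented out because it can run for a long time:
# run_logged(
#     ["bash", "./scripts/predict_sst_diff_seed_batch/generate_ml-1m_csv.sh"],
#     "generate_ml-1m_csv.log",  # hypothetical log file name
# )
```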

In [None]:
# Generate CSV files for Lastfm-360K
!bash ./scripts/predict_sst_diff_seed_batch/generate_Lastfm-360K-csv.sh

In [None]:
# Generate CSV files for ml-1m
!bash ./scripts/predict_sst_diff_seed_batch/generate_ml-1m_csv.sh

In [None]:
# Generate CSV files for ml-1m with a three-class sensitive attribute
!bash ./scripts/predict_sst_diff_seed_batch/generate_ml-1m-synthetic_csv.sh

## Experiments

### Single prior sweep (Experiment 1 in the paper)

This experiment explores MPR performance across a range of single priors.

In [35]:
!bash ./scripts/reproduction/single_prior_sweep.sh

Starting MPR Single Prior Sweep Experiments (Figure 5 style)..
Output root: ./deliverables/fig5_single_prior
Tasks: Lastfm-360K ml-1m
Partial ratio females: 0.1 0.2 0.3
Seeds: 1 2 3
Lastfm priors: 0.1 0.2 0.5 1.0 2.0 3.5 5.0 7.5 10.0
ml-1m priors: 0.1 0.2 0.5 1.0 2.0 2.5 5.0 7.5 10.0
Running for task: Lastfm-360K..
  ..Partial ratio (female): 0.1
    ..Seed: 1
      >> [task=Lastfm-360K] [male=0.5] [female=0.1] [seed=1] [prior=0.1]
         Log file: ./deliverables/fig5_single_prior/Lastfm-360K/male0.5_female0.1/prior0.1/seed1/train.log
^C
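
The output above was interrupted, so it is useful to know which runs of the sweep have finished. The sketch below enumerates the expected `train.log` paths from the grid printed by the script and reports which ones do not exist yet. It assumes that every run follows the directory layout of the single logged path and that the male ratio stays at 0.5 throughout; neither assumption is confirmed by the script itself.

```python
import os

# Grid copied from the script output above.
OUTPUT_ROOT = "./deliverables/fig5_single_prior"
FEMALE_RATIOS = ["0.1", "0.2", "0.3"]
SEEDS = ["1", "2", "3"]
PRIORS = {
    "Lastfm-360K": ["0.1", "0.2", "0.5", "1.0", "2.0", "3.5", "5.0", "7.5", "10.0"],
    "ml-1m": ["0.1", "0.2", "0.5", "1.0", "2.0", "2.5", "5.0", "7.5", "10.0"],
}
MALE_RATIO = "0.5"  # assumption: fixed for every run, as in the one logged line


def expected_logs():
    """List the train.log path of every run, in the order the script nests them."""
    paths = []
    for task, priors in PRIORS.items():
        for female in FEMALE_RATIOS:
            for seed in SEEDS:
                for prior in priors:
                    paths.append(
                        f"{OUTPUT_ROOT}/{task}/male{MALE_RATIO}_female{female}"
                        f"/prior{prior}/seed{seed}/train.log"
                    )
    return paths


def missing_logs(paths):
    """Return the paths that do not exist on disk yet."""
    return [p for p in paths if not os.path.isfile(p)]


paths = expected_logs()
missing = missing_logs(paths)
print(f"{len(missing)} of {len(paths)} runs have no train.log yet")
```

An existing `train.log` only shows that a run started, so an interrupted run may still need to be repeated.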


### Beta sweep (Experiment 2 in the paper)
This experiment investigates the influence of the $\beta$ parameter on fairness and predictive accuracy.

In [None]:
!bash ./scripts/reproduction/beta_sweep.sh

### N-prior sweep (Experiment 3 in the paper)

This final binary-MPR experiment demonstrates the influence of the number of priors on the model's performance.

In [None]:
!bash ./scripts/reproduction/n_prior_sweep.sh

### mcMPR test on ml-1m

