# (mc)MPR
This notebook is a reproduction guide for the implementations of Multiple Prior-Guided Robust Optimization (MPR) by Zhang et al. (2025) and our multiclass MPR (mcMPR).


First, we install the necessary dependencies.

In [None]:
import os
import sys
import subprocess

ENV_NAME = "mpr_venv"

# Create the virtual environment if it does not exist yet
if not os.path.exists(ENV_NAME):
    print(f"Creating virtual environment: {ENV_NAME}")
    subprocess.check_call([sys.executable, "-m", "venv", ENV_NAME])
else:
    print(f"Virtual environment '{ENV_NAME}' already exists.")

# Locate the environment's pip (Scripts\pip.exe on Windows, bin/pip elsewhere)
if os.name == "nt":
    pip_path = os.path.join(ENV_NAME, "Scripts", "pip.exe")
else:
    pip_path = os.path.join(ENV_NAME, "bin", "pip")

# Install torch first, then the pinned packages from requirements.txt
print(f"Installing requirements.txt using {pip_path}")
subprocess.check_call([pip_path, "install", "torch"])
subprocess.check_call([pip_path, "install", "-r", "requirements.txt"])

Virtual environment 'mpr_venv' already exists.
Installing requirements.txt using mpr_venv/bin/pip
Note: you may need to restart the kernel to use updated packages.
Collecting asttokens==3.0.1 (from -r requirements.txt (line 1))
  Obtaining dependency information for asttokens==3.0.1 from https://files.pythonhosted.org/packages/d2/39/e7eaf1799466a4aef85b6a4fe7bd175ad2b1c6345066aa33f1f58d4b18d0/asttokens-3.0.1-py3-none-any.whl.metadata
  Using cached asttokens-3.0.1-py3-none-any.whl.metadata (4.9 kB)
Collecting colorama==0.4.6 (from -r requirements.txt (line 2))
  Obtaining dependency information for colorama==0.4.6 from https://files.pythonhosted.org/packages/d1/d6/3965ed04c63042e047cb6a3e6ed1a63a35087b6a609aa3a15ed8ac56c221/colorama-0.4.6-py2.py3-none-any.whl.metadata
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting comm==0.2.3 (from -r requirements.txt (line 3))
  Obtaining dependency information for comm==0.2.3 from https://files.pythonhosted.org/package

ERROR: Cannot install -r requirements.txt (line 4) and numpy==1.23 because these package versions have conflicting dependencies.
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

[notice] A new release of pip is available: 23.2.1 -> 25.3
[notice] To update, run: python3.11 -m pip install --upgrade pip


CalledProcessError: Command '['mpr_venv/bin/pip', 'install', '-r', 'requirements.txt']' returned non-zero exit status 1.
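
Note that the install cell targets `mpr_venv`, while the tracebacks further down show packages loaded from `/opt/anaconda3/lib/python3.11`, which suggests the `!python` cells run with the kernel's interpreter rather than the virtual environment. The sketch below is one way to resolve tool paths inside the environment so that later commands can point at it explicitly. `venv_executable` is a hypothetical helper name, not part of the MPR code base, and the directory layout is the standard one created by `python -m venv`.

```python
import os

ENV_NAME = "mpr_venv"


def venv_executable(env_name: str, tool: str, os_name: str = os.name) -> str:
    """Return the path of `tool` (e.g. "python" or "pip") inside a venv.

    Hypothetical helper: venvs keep executables in Scripts/ on Windows
    and in bin/ on other platforms.
    """
    if os_name == "nt":
        return os.path.join(env_name, "Scripts", f"{tool}.exe")
    return os.path.join(env_name, "bin", tool)


# Paths that later cells could pass to subprocess instead of a bare "python"
print(venv_executable(ENV_NAME, "python"))
print(venv_executable(ENV_NAME, "pip"))
```

With this in place, a script could be launched as `subprocess.check_call([venv_executable(ENV_NAME, "python"), "pretrain_baseline.py", "--task_type", "ml-1m"])`, assuming the install step has succeeded.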

## Datasets
To process and augment MovieLens-1M, consult the notebook at `./datasets/ml-1m-synthetic/ml_1m_synthetic.ipynb`.

The other datasets are provided by Zhang et al. (2025) in their reproduction package at https://github.com/jizhi-zhang/MPR/tree/main.

## Pretrain the Matrix Factorization (MF) baseline 

For each dataset, we train an MF recommendation model to serve as an unfair base model. 

In [7]:
!python pretrain_baseline.py --task_type ml-1m


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.4.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/Users/danie/Documents/projects/NewMPR/NewMPR/pretrain_baseline.py", line 4, in <module>
    import pandas as pd
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/__init__.py", line 49, in <module>
    from pandas.core.api import (
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/api.py", line 28, in <module>
    from pandas.core.arrays import Categorical
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/arrays/__init__.py", line 1, in <module>
    from pandas.core.arrays.arrow

In [None]:
!python pretrain_baseline.py --task_type ml-1m-synthetic

In [None]:
!python pretrain_baseline.py --task_type Lastfm-360K
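
The three pretraining cells above differ only in the `--task_type` value, so they can also be driven from a single loop. The following is a minimal sketch, assuming `pretrain_baseline.py` is in the working directory and accepts no other required arguments. `pretrain_command` and `pretrain_all` are hypothetical helper names, and the loop defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Task types taken from the pretraining cells in this notebook
TASK_TYPES = ["ml-1m", "ml-1m-synthetic", "Lastfm-360K"]


def pretrain_command(task_type, python=sys.executable):
    """Build the argument list for one pretraining run."""
    return [python, "pretrain_baseline.py", "--task_type", task_type]


def pretrain_all(task_types=TASK_TYPES, python=sys.executable, dry_run=True):
    """Print each pretraining command and, unless dry_run is set, run it."""
    commands = [pretrain_command(t, python) for t in task_types]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises CalledProcessError on a non-zero exit status
            subprocess.check_call(cmd)
    return commands


pretrain_all()  # dry run: prints the three commands without executing them
```

Passing `dry_run=False` runs the datasets one after another and stops at the first failure, which avoids starting later runs in a broken environment.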

## Predict sensitive attribute distributions under a range of prior distributions

Now that the MF baseline models are trained, we can predict the sensitive attribute distributions under a range of prior distributions.

Warning: this step can take a very long time!
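
Because this step is slow, it may be worth checking the environment before launching it. The pretraining output earlier in this notebook shows modules compiled against NumPy 1.x failing under NumPy 2.4.1, and the message itself suggests downgrading to `numpy<2`. The sketch below checks the installed NumPy major version without importing NumPy. `major_version` and `numpy_is_1x` are hypothetical helper names, and the check covers only this one incompatibility.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a dotted version string."""
    return int(version_string.split(".")[0])


def numpy_is_1x(version_string=None) -> bool:
    """Return True if NumPy is installed with a major version below 2.

    If no version string is given, the version installed in the current
    interpreter is looked up; a missing NumPy counts as False.
    """
    if version_string is None:
        try:
            version_string = version("numpy")
        except PackageNotFoundError:
            return False
    return major_version(version_string) < 2


# The two versions that appear in this notebook's output
print(numpy_is_1x("1.23"))
print(numpy_is_1x("2.4.1"))
```

Calling `numpy_is_1x()` with no argument inspects the interpreter running the notebook, which is not necessarily the one inside `mpr_venv`.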

In [11]:
# Generate csvs for Lastfm-360K
!bash ./scripts/predict_sst_diff_seed_batch/generate_Lastfm-360K-csv.sh

Running: prior_idx=0 seed=1 s_ratios=[0.5, 0.1]

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.4.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/Users/danie/Documents/projects/NewMPR/NewMPR/predict_sensitive_labels.py", line 8, in <module>
    import pandas as pd
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/__init__.py", line 49, in <module>
    from pandas.core.api import (
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/api.py", line 28, in <module>
    from pandas.core.arrays import Categorical
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/arrays/__init__.py"