This repository contains the code for our paper: *Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent*.
Clone the repository and install the required Python packages with pip:

```shell
git clone https://github.com/XueyangFeng/ECPO.git
cd ECPO
pip install -r requirements.txt
```
We provide a detailed pipeline for the AILO environment, including additional README files. For a quick setup, follow these steps:

- Download the index file.
- Unzip the downloaded file into the `user_simulator/embedding/` folder.
All LLM (Large Language Model) calls in this repository are made through OpenAI-compatible interfaces. To configure the APIs:

- Set your API information in the `config/api_config.json` file.
- For closed-source models, set the information directly in the config.
- For open-source models, use `vllm` for local deployment. An example script is provided in the `model/` directory.
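Because both closed- and open-source models are reached through the same OpenAI-style interface, a single config entry per model is enough. The sketch below is hypothetical: the actual schema of `config/api_config.json` and the field names (`api_key`, `base_url`) are assumptions, as is the `local-llama` model name; a vLLM deployment exposes an OpenAI-compatible server, typically at `http://localhost:8000/v1`.

```python
# Hypothetical sketch of an OpenAI-compatible API config; the real schema
# of config/api_config.json may differ.
example_config = {
    "gpt-4o": {
        "api_key": "sk-...",                      # closed-source: real API key
        "base_url": "https://api.openai.com/v1",
    },
    "local-llama": {
        "api_key": "EMPTY",                       # vLLM ignores the key by default
        "base_url": "http://localhost:8000/v1",   # vLLM's OpenAI-compatible server
    },
}

def build_chat_request(model, messages, config=example_config):
    """Assemble an OpenAI-style chat-completions request for the given model."""
    entry = config[model]
    return {
        "url": entry["base_url"].rstrip("/") + "/chat/completions",
        "headers": {"Authorization": f"Bearer {entry['api_key']}"},
        "json": {"model": model, "messages": messages},
    }

req = build_chat_request("local-llama", [{"role": "user", "content": "Hi"}])
print(req["url"])  # -> http://localhost:8000/v1/chat/completions
```

Switching a model between a hosted API and a local vLLM deployment then only requires changing its `base_url`, with no code changes elsewhere.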
To run the existing prompt-based Conversational Recommendation Agent (CRA) or an aligned CRA, set the relevant configuration in the `main.sh` file and execute it.
Our CRA alignment process consists of four main stages:
- SGPT (Stage 1): Simulator-Guided Planning Tuning
- ECPO (Stages 2-4): Expectation Confirmation Preference Optimization
You can download the training data and unzip it into the `backward/` directory.
First, test the recommendation metrics in the simulator environment:

```shell
# test the existing prompt-based CRA baseline
bash main.sh
# test the aligned CRA
bash main_lora.sh
```
Then, test the dialogue metrics using the GPT-4o evaluator:

```shell
cd pair_eval
# Set up your evaluation files (`model2.log` for the target log file,
# `model1.log` for the expert trajectory), then run:
python eval.py
```
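For intuition, pairwise evaluation with an LLM judge typically reduces to aggregating per-dialogue verdicts into win rates. The snippet below is a hypothetical sketch of that aggregation step, not the actual logic of `eval.py`; the verdict labels `model1`/`model2`/`tie` are assumptions.

```python
# Hypothetical sketch of pairwise win-rate aggregation; eval.py's
# actual implementation may differ.
from collections import Counter

def win_rate(verdicts):
    """verdicts: list of 'model1', 'model2', or 'tie' judgments,
    one per compared dialogue pair."""
    counts = Counter(verdicts)
    n = len(verdicts)
    return {k: counts[k] / n for k in ("model1", "model2", "tie")}

# Example: the judge preferred model2 in 2 of 4 pairs.
print(win_rate(["model2", "model1", "model2", "tie"]))
# -> {'model1': 0.25, 'model2': 0.5, 'tie': 0.25}
```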
- Our evaluation method is based on RUCAIBox/iEvaLM-CRS.
- Our training code is based on hiyouga/LLaMA-Factory.
