This is the official repository for our paper:
"Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs"
You can download the AHaBench and AHaPairs datasets from Hugging Face.
- Python 3.10
- Python packages: install the required packages with `pip install -r requirements.txt`
- DPO training: requires at least one A100 GPU node.
- Evaluation metrics: use the OpenAI API; only a CPU is needed.
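Mismatched interpreter versions are a common source of failed installs, so a quick sanity check before `pip install` can help. This helper is purely illustrative and not part of the repository:

```python
import sys


def check_python(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version


if __name__ == "__main__":
    print("Python version OK" if check_python() else "Python 3.10+ required")
```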
Run the following command to train with Direct Preference Optimization (DPO):

```bash
CUDA_VISIBLE_DEVICES=${device_nums} python dpo.py \
    --input_path data/train.json \
    --base_model ${model_name} \
    --output_path ./dpo_output \
    --hug_token ${huggingface_token}
```

Change `AHaPairs.json` to `train.json` in `--input_path` if needed. Note: Replace `device_nums`, `model_name`, and `huggingface_token` with your actual values.
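The internals of `dpo.py` are not shown here, but the standard DPO objective it presumably optimizes can be sketched for a single preference pair. The function name and the `beta` default below are illustrative, not taken from the repository:

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of the full response under
    the trained policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # How much more the policy prefers chosen over rejected, relative to the reference.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)): equals log(2) when the margin is zero,
    # and shrinks as the policy favors the chosen response more strongly.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is `log 2`; a positive margin (policy leans toward the chosen response) drives the loss below that.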
For Supervised Fine-Tuning (SFT), use the following command:

```bash
CUDA_VISIBLE_DEVICES=${device_nums} python sft.py \
    --base_model ${model_name} \
    --output_path ./sft_output \
    --hug_token ${huggingface_token}
```

To evaluate using the metric prompts, run:
```bash
python metric_prompt.py --api-key ${openai_api_key}
```

Note: You'll need an OpenAI API key for evaluation.
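The exact prompts used by `metric_prompt.py` are defined in the script itself; as a rough picture of how an OpenAI-based LLM-as-judge metric is typically wired up, here is a generic sketch. The prompt wording, function names, and model choice are illustrative assumptions, not the paper's actual metric:

```python
def build_judge_messages(question, answer, rubric):
    """Assemble a chat payload asking an LLM judge to score one answer."""
    system_msg = "You are an impartial evaluator. Score the answer against the rubric."
    user_msg = (
        f"Rubric:\n{rubric}\n\n"
        f"Question:\n{question}\n\n"
        f"Answer:\n{answer}\n\n"
        "Reply with a single score from 1 to 5."
    )
    return [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]


def score_with_openai(question, answer, rubric, model="gpt-4o-mini"):
    """Send the judge prompt through the OpenAI chat API (reads OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the sketch runs without the package

    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=build_judge_messages(question, answer, rubric),
    )
    return resp.choices[0].message.content
```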
If you use this repository or dataset in your research, please cite our paper (link to be added after publication).