Neurips1003/AHaBench

AHaBench

This is the official repository for our paper:

"Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs"

📊 Dataset

You can download the AHaBench and AHaPairs datasets from Hugging Face.

🚀 How to Use

🛠️ Dependencies

  • Python 3.10

  • Python packages
    Install the required packages with:

    pip install -r requirements.txt
  • For DPO training: requires at least one A100 GPU node.

  • For evaluation metrics: uses the OpenAI API, so a CPU-only machine is sufficient.
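
The requirements above can be checked before starting a long run. The following is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and using the presence of `nvidia-smi` on the PATH as a stand-in for "a GPU node is available" is an assumption (it does not confirm the GPU is an A100).

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means the basics look fine.

    Hypothetical helper: `version_info` and `which` are injectable so the
    check can be exercised without touching the real machine.
    """
    problems = []
    # The README lists Python 3.10 as the supported interpreter.
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"Python 3.10 expected, found {version_info[0]}.{version_info[1]}"
        )
    # Assumption: a missing nvidia-smi binary means no usable NVIDIA GPU.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; DPO/SFT training needs a GPU node")
    return problems
```

Calling `check_environment()` with no arguments inspects the current interpreter and PATH. Evaluation only needs the CPU, so the GPU warning can be ignored in that case.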

🏋️‍♂️ DPO Training

Run the following command to train with Direct Preference Optimization (DPO):

# If needed, rename AHaPairs.json to train.json (or point --input_path at your file).
CUDA_VISIBLE_DEVICES=${device_nums} python dpo.py \
  --input_path data/train.json \
  --base_model ${model_name} \
  --output_path ./dpo_output \
  --hug_token ${huggingface_token}

Note: Replace device_nums, model_name, and huggingface_token with your actual values.
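
For readers unfamiliar with DPO, the objective it optimizes for a single preference pair can be sketched in a few lines. This illustrates the standard DPO loss only; it is not taken from `dpo.py`, whose internals are not shown here. The function name `dpo_loss` and the default `beta=0.1` are assumptions chosen for the example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    being trained and under the frozen reference model. `beta` (assumed
    value) scales how strongly the policy is pushed away from the reference.
    """
    # How much more the policy favours each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # loss = -log(sigmoid(z)) = log(1 + exp(-z)), written in a stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and reference agree, the loss equals log 2. It falls as the policy assigns relatively more probability to the chosen response than to the rejected one.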

🧑‍🏫 SFT Training

For Supervised Fine-Tuning (SFT), use the following command:

CUDA_VISIBLE_DEVICES=${device_nums} python sft.py \
  --base_model ${model_name} \
  --output_path ./sft_output \
  --hug_token ${huggingface_token}

📝 Evaluation

To evaluate using the metric prompts, run:

python metric_prompt.py --api-key ${openai_api_key}

Note: You'll need an OpenAI API key for evaluation.
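
To avoid typing the key inline, one option is to read it from an environment variable and assemble the command programmatically. This is a sketch under assumptions: the helper `build_eval_command` is hypothetical, and the variable name `OPENAI_API_KEY` is a common convention rather than something `metric_prompt.py` is known to read itself. Only the `--api-key` flag comes from the command above.

```python
import os
import sys


def build_eval_command(env=os.environ, script="metric_prompt.py"):
    """Build the argv list for the evaluation script shown above.

    Hypothetical helper: reads the key from OPENAI_API_KEY (assumed name)
    and fails early with a clear message if it is missing.
    """
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the evaluation")
    # Passing a list (not a shell string) avoids quoting problems.
    return [sys.executable, script, "--api-key", key]


# Example use from the repository root:
#   import subprocess
#   subprocess.run(build_eval_command(), check=True)
```

The key still appears in the child process's argument list, so this keeps it out of shell history but not out of process listings.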

📄 Citation

If you use this repository or dataset in your research, please cite our paper (link to be added after publication).
