Skip to content

SecureDL/arabic_jailbreak

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Jailbreaking LLMs with Arabic Transliteration and Arabizi

This repository contains codes for our paper "Jailbreaking LLMs with Arabic Transliteration and Arabizi". Our paper investigates the use of Arabic non-standardized forms such as Arabizi and Transliteration to jailbreak LLMs. The paper also investigates potenial security risks of using these forms to vulnerabilities exposure such as learned model shortcuts. The results of the experiments highlights the need for more safety and adversarial training in cross-lingual manner with awareness of nonstandardized language forms, especially for Arabic.

Environment Setup

  1. Requirements:
    Python
    PyTorch
    openai
    anthropic

  2. Denpencencies:

pip install transformers
pip install torch
pip install openai
pip install anthropic

Data preparation

  1. Datasets used for this project can be obtained from the following link:
    Advbench: https://github.com/llm-attacks/llm-attacks
    This dataset is also available here

  2. Use file translate_convert_arabic.ipynb for helper codes for translation and convertion to Arabic and its forms. We have also prepared all the data needed for experiments under data

Experiments

llm-test-ar.py contains all necessary codes to prompt Anthropic and OpenAI models. The file is commented and self-explained.
To reproduce our experiments please read and run the script experiments.sh. Evaluation is done manually, so manual inspection of results is mandatory.

If you like this work please cite it.

Citation

@article{ghanim2024jailbreaking,
  title={Jailbreaking LLMs with Arabic Transliteration and Arabizi},
  author={Ghanim, Mansour Al and Almohaimeed, Saleh and Zheng, Mengxin and Solihin, Yan and Lou, Qian},
  journal={arXiv preprint arXiv:2406.18725},
  year={2024}
}

About

Jailbreaking LLMs with Arabic Transliteration and Arabizi

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors