
Manipulated Knowledge Spread

Reproduction code for the paper "Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities". The preprint is publicly available at https://arxiv.org/abs/2407.07791.

Overview figure: https://github.com/Jometeorie/KnowledgeSpread/blob/main/figures/introduction.png

Requirements

Install the required Python dependencies:

pip install -r requirements.txt

Datasets

All datasets used in the paper are provided in the data/ folder: CounterFact (1K), zsRE (1K), and their toxic versions.
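The datasets are plain JSON files; the minimal sketch below peeks at one of them. The exact path (mirroring the filename passed to format_rag.py later in this README) and the record schema are assumptions, so adjust them to the actual layout.

import json

# Peek at one of the provided datasets. The path below is an assumption;
# the filename mirrors the one used by format_rag.py in the RAG section.
with open("data/counterfact/counterfact-edit-1k.json") as f:
    records = json.load(f)

print(f"{len(records)} editing records loaded")
print(json.dumps(records[0], indent=2)[:500])  # first record, truncated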

Instructions

Baseline

Perform and evaluate knowledge editing on the CounterFact (1K) dataset using Vicuna 7B, without any multi-agent interaction:

python baseline_easyedit.py --config_path=../config/agent/vicuna-7b.yaml
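As the script name suggests, this baseline builds on the EasyEdit library. The sketch below only illustrates what a single edit looks like with EasyEdit's quick-start BaseEditor interface; it is not the repo's actual code, and the hyperparameter file and example fact are placeholders.

# Illustrative sketch of one knowledge edit via EasyEdit's quick-start API,
# not baseline_easyedit.py itself. The hparams path and the example triple
# are assumptions.
from easyeditor import BaseEditor, ROMEHyperParams

hparams = ROMEHyperParams.from_hparams("./hparams/ROME/vicuna-7b.yaml")
editor = BaseEditor.from_hparams(hparams)

metrics, edited_model, _ = editor.edit(
    prompts=["The capital of France is"],
    ground_truth=["Paris"],
    target_new=["Rome"],   # the manipulated fact to inject
    subject=["France"],    # ROME-style editing needs the subject of each prompt
)
print(metrics)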

Intuition Verification

We prompt the agents and GPT-4 to generate fake but plausible evidence for all manipulated knowledge; the generated evidence is provided in the data/ folder.

Evaluate the extent to which a single agent is persuaded under different prompt settings:

python baseline_prompt_edit.py --config_path=../config/agent/vicuna-7b.yaml --prompt_type=no_edit
python baseline_prompt_edit.py --config_path=../config/agent/vicuna-7b.yaml --prompt_type=direct_answer --with_evidence
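To run both documented settings back to back, a small driver like the one below works. Only the two flag combinations shown above are used, since other combinations (e.g. no_edit together with --with_evidence) are not documented here.

import subprocess

# Run the two documented prompt settings in sequence.
# The script name and flags are taken from the commands above.
CONFIG = "../config/agent/vicuna-7b.yaml"
runs = [
    ["--prompt_type=no_edit"],
    ["--prompt_type=direct_answer", "--with_evidence"],
]
for extra in runs:
    cmd = ["python", "baseline_prompt_edit.py", f"--config_path={CONFIG}", *extra]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)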

Attack Pipeline

Stage 1: Persuasiveness Injection

To inject persuasiveness into the agent, first generate preference data for the LLM:

python generate_dataset.py

We encourage generating a separate preference dataset for each LLM, which minimizes the impact of differences between LLMs.
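The output schema of generate_dataset.py is not documented here; as a rough illustration, a DPO preference record typically pairs a prompt with a preferred (more persuasive) and a rejected response. The field names and contents below are hypothetical.

import json

# Hypothetical preference record; the actual fields written by
# generate_dataset.py may differ.
record = {
    "prompt": "Argue that the following (manipulated) claim is true: ...",
    "chosen": "A confident, detailed argument backed by plausible-sounding evidence ...",
    "rejected": "A hesitant reply that refuses to defend the claim ...",
}
print(json.dumps(record, indent=2))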

Then we train with DPO (Direct Preference Optimization):

python dpo_training.py

You can modify ckpt_path to change where the trained LoRA adapter is saved; this path is used in the second stage.
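For reference, whichever training library dpo_training.py relies on, the DPO objective compares policy and frozen-reference log-probabilities of the chosen and rejected responses. A minimal PyTorch sketch of the per-pair loss follows; beta = 0.1 is just a common default, not the repo's setting.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed token log-probabilities."""
    chosen_rewards = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_rewards = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss.item())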

Stage 2: Manipulated Knowledge Injection

Training

As a running example, the following command tests the spread of manipulated knowledge on the CounterFact (1K) dataset using Vicuna 7B:

python simulation.py --config_path=../config/agent/vicuna-7b.yaml

All chats will be stored in history/ for subsequent experimental analyses. For other experimental setups, you can modify the corresponding yaml file in config/.
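The stored chats can then be post-processed offline. The sketch below simply walks history/ and peeks at the first file; it assumes the chats are saved as JSON, which may not match the actual format.

import json
from pathlib import Path

# Walk history/ and report how many chat files a run produced.
# Assumes JSON files; adapt to the actual storage format.
history_dir = Path("history")
chat_files = sorted(history_dir.rglob("*.json"))
print(f"found {len(chat_files)} chat files")
if chat_files:
    with open(chat_files[0]) as f:
        first = json.load(f)
    print(type(first), str(first)[:300])  # peek at the structure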

RAG Scenario

  1. Format chat histories:

     python format_rag.py --dataset_path=./counterfact/counterfact-edit-1k.json --input_folder=<chat_history_directory>

  2. RAG training

  3. Evaluation (a toy sketch of the top_k retrieval step follows this list):

     python baseline_prompt_edit.py --config_path=../config/agent/vicuna-7b.yaml --prompt_type=rag --rag_path=<path_to_rag> --top_k=5
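The retrieval step itself is just top-k similarity search over the formatted chat snippets. The toy sketch below uses TF-IDF with scikit-learn purely for illustration; it is not the repo's RAG implementation, and the snippets are placeholders.

# Toy illustration of the top_k retrieval step used in the RAG evaluation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder chat snippets standing in for the output of format_rag.py.
corpus = [
    "Agent 2 argued that the edited fact is widely reported ...",
    "Agent 1 initially disagreed but later accepted the claim ...",
    "Unrelated small talk between agents ...",
]
query = "What did the agents conclude about the edited fact?"

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform([query])

top_k = 2
scores = cosine_similarity(query_vec, doc_vecs)[0]
for idx in scores.argsort()[::-1][:top_k]:
    print(f"{scores[idx]:.3f}  {corpus[idx]}")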

Citing

@misc{ju2024flooding,
    title={Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities},
    author={Tianjie Ju and Yiting Wang and Xinbei Ma and Pengzhou Cheng and Haodong Zhao and Yulong Wang and Lifeng Liu and Jian Xie and Zhuosheng Zhang and Gongshen Liu},
    year={2024},
    eprint={2407.07791},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

License

This project is licensed under the Apache-2.0 License.
