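# End-to-End Attacks with TextAttack
TextAttack is a Python framework for adversarial attacks on natural language processing (NLP) models. It implements adversarial attacks, data augmentation, and model training. This tutorial uses its end-to-end command-line interface, `textattack`, to run and evaluate a simple NLP adversarial attack.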

For details, see the [TextAttack official repository](https://github.com/QData/TextAttack) and the [TextAttack official documentation](https://textattack.readthedocs.io/en/master/).

---

Install the textattack library with pip:

In [None]:
!pip install textattack

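### 1. Training
First, we train an NLP model. TextAttack integrates the transformers and datasets libraries (both from Hugging Face), so any dataset supported by datasets can be loaded to train any pretrained model supported by transformers.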

Here we use the training split of the Rotten Tomatoes Movie Review dataset. First, use `textattack peek-dataset` to display information about the dataset.

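If Hugging Face is slow or unreachable from your network (for example in mainland China), set `HF_ENDPOINT` to a mirror when downloading the dataset, as in the command below.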
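
When the same mirror is needed from Python code rather than from the shell, the variable can be set through `os.environ`. The sketch below is a minimal illustration, assuming the Hugging Face libraries read `HF_ENDPOINT` when they are first imported, so it has to be set before that import. The helper name `mirror_env` is hypothetical and not part of any library.

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror URL used in the commands of this tutorial


def mirror_env(base_env=None):
    """Return a copy of the environment with HF_ENDPOINT pointing at the mirror."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_ENDPOINT"] = HF_MIRROR
    return env


# For the current process: set it before importing datasets / transformers.
os.environ["HF_ENDPOINT"] = HF_MIRROR

# For a child process, pass the modified environment explicitly, e.g.
# subprocess.run(["textattack", "peek-dataset",
#                 "--dataset-from-huggingface", "rotten_tomatoes"], env=mirror_env())
```

Prefixing the shell command with `HF_ENDPOINT=...`, as the cells in this tutorial do, has the same effect for a single command.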

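Downloading the NLTK data during this step is slow and will quite likely fail in the end. When that happens, later commands report that a resource was not found and print "Searched in: ... (some paths)", which lists the locations where the program expects `nltk_data`. In that case, download the `packages` directory from the [nltk_data GitHub repository](https://github.com/nltk/nltk_data), place it at one of the locations named in the error message, rename it to `nltk_data`, and unzip the archives inside it (it is large, about 3.6 GB; ~~remember to delete it when you are done~~).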

In [1]:
!HF_ENDPOINT=https://hf-mirror.com textattack peek-dataset --dataset-from-huggingface rotten_tomatoes

2025-04-07 11:22:48.282187: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-07 11:22:48.599321: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
textattack: Loading datasets dataset rotten_tomatoes, split train.
textattack: Number of samples: 8530
textattack: Number of words per input:
textattack: 	total:   157755
textattack: 	mean:    18.49
textattack: 	std:     8.58
textattack: 	min:  

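The word statistics reported by `peek-dataset` are simple to reproduce on a handful of sentences. The following is a rough sketch with toy inputs; it assumes whitespace tokenization and the population standard deviation, which may differ from what TextAttack computes internally. The function name `words_per_input_stats` is made up for this illustration.

```python
import statistics


def words_per_input_stats(texts):
    """Summarise the number of whitespace-separated words in each text."""
    counts = [len(text.split()) for text in texts]
    return {
        "total": sum(counts),
        "mean": statistics.mean(counts),
        "std": statistics.pstdev(counts),  # population std (an assumption)
        "min": min(counts),
        "max": max(counts),
    }


# Toy inputs, not taken from the Rotten Tomatoes dataset.
samples = ["a gripping and funny movie", "dull", "not bad at all"]
stats = words_per_input_stats(samples)
print(stats)
```

A mean of about 18 words with a standard deviation of about 9, as in the output above, suggests that most reviews fit comfortably within a maximum sequence length of 64.
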
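The following code walks the `nltk_data` directory recursively, unzips every zip file it finds, and deletes each zip file once it has been extracted.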

In [None]:
# Unzip every archive under nltk_data and delete each zip file afterwards
import os
import zipfile

def extract_and_remove_zips(root_dir):
    """Walk root_dir, extract each .zip next to itself, then delete the archive."""
    for folder, _, files in os.walk(root_dir):
        for file in files:
            if file.endswith(".zip"):
                zip_path = os.path.join(folder, file)
                extract_path = folder  # extract into the directory that holds the zip file

                try:
                    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                        zip_ref.extractall(extract_path)
                    print(f"Extracted: {zip_path} -> {extract_path}")
                    os.remove(zip_path)
                    print(f"Deleted: {zip_path}")
                except Exception as e:
                    print(f"Error processing {zip_path}: {e}")

if __name__ == "__main__":
    root_directory = ""  # replace with the path to your nltk_data directory
    extract_and_remove_zips(root_directory)
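
Because the script above deletes files, it can be worth previewing which archives it would touch. The sketch below is one possible dry run using only the standard library; `list_zip_files` is a hypothetical helper and the `nltk_data` path is a placeholder.

```python
from pathlib import Path


def list_zip_files(root_dir):
    """Return every .zip file below root_dir, sorted, without modifying anything."""
    return sorted(Path(root_dir).rglob("*.zip"))


if __name__ == "__main__":
    nltk_data_dir = "nltk_data"  # placeholder; point it at your own nltk_data
    for zip_path in list_zip_files(nltk_data_dir):
        print(zip_path)
```

Running the same listing again after extraction should print nothing if every archive was processed.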

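Next we train [`distilbert-base-uncased`](https://huggingface.co/transformers/model_doc/distilbert.html). It is a relatively small model, but it shows well how the `textattack` library works together with `transformers`. As before, because of network speed, the `HF_ENDPOINT` environment variable is set to a mirror.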

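So the command we are about to run is the following. The trailing `#` annotations are explanatory only; the runnable single-line form is in the cell below.
```shell
textattack train                                \ # train a model with textattack
    --model-name-or-path distilbert-base-uncased \ # use the uncased DistilBERT model from `transformers`
    --dataset rotten_tomatoes                   \ # train on the Rotten Tomatoes dataset
    --model-num-labels 2                        \ # the dataset has 2 labels
    --model-max-length 64                       \ # maximum input sequence length is 64
    --per-device-train-batch-size 128           \ # batch size of 128 per device
    --num-epochs 3                                # train for 3 epochs
```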

In [4]:
!HF_ENDPOINT=https://hf-mirror.com textattack train --model-name-or-path distilbert-base-uncased --dataset rotten_tomatoes --model-num-labels 2 --model-max-length 64 --per-device-train-batch-size 128 --num-epochs 3

2025-04-07 11:24:49.912007: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-07 11:24:49.932613: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
textattack: Loading transformers AutoModelForSequenceClassification: distilbert-base-uncased
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on
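
When several training runs with different settings are planned, it can help to assemble the command from a mapping instead of editing one long line. The sketch below only builds and prints the command; the flags are copied from the training cell above, while `build_train_command` is a hypothetical helper.

```python
import shlex

# Flags copied from the training command in this tutorial.
train_options = {
    "--model-name-or-path": "distilbert-base-uncased",
    "--dataset": "rotten_tomatoes",
    "--model-num-labels": 2,
    "--model-max-length": 64,
    "--per-device-train-batch-size": 128,
    "--num-epochs": 3,
}


def build_train_command(options):
    """Flatten a flag -> value mapping into an argv list for `textattack train`."""
    argv = ["textattack", "train"]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv


command = build_train_command(train_options)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)`, together with an environment that sets `HF_ENDPOINT` if a mirror is needed.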

Use `textattack eval` to evaluate the model we just trained. Note that the `--model` argument must point to the directory where the trained model was saved.
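
Since each run is saved under a timestamped directory, the path changes with every training run. The sketch below is one way to find the newest one, assuming the `outputs/<timestamp>/best_model` layout seen in the command that follows; `latest_best_model` is a hypothetical helper, not a TextAttack function.

```python
from pathlib import Path


def latest_best_model(outputs_dir="outputs"):
    """Return the best_model directory of the most recent run, or None if absent."""
    root = Path(outputs_dir)
    if not root.is_dir():
        return None
    candidates = [
        run_dir / "best_model"
        for run_dir in root.iterdir()
        if (run_dir / "best_model").is_dir()
    ]
    # Run directories are named by timestamp, so the largest name is the newest run.
    return max(candidates, default=None)


print(latest_best_model())
```

The printed path can then be pasted into the `--model` argument of `textattack eval` and `textattack attack`.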

In [6]:
!HF_ENDPOINT=https://hf-mirror.com textattack eval --num-examples 1000 --model ./outputs/2025-03-27-11-58-31-902283/best_model/ --dataset-from-huggingface rotten_tomatoes --dataset-split test

2025-04-07 11:26:31.305533: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-07 11:26:31.326019: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
textattack: Loading datasets dataset rotten_tomatoes, split test.
textattack: Got 1000 predictions.
textattack: Correct 842/1000 (84.20%)


Next, use the built-in `textattack attack` command to attack the model we just trained, here with the `textfooler` recipe.

In [7]:
!HF_ENDPOINT=https://hf-mirror.com TFHUB_CACHE_DIR=./tmp textattack attack --recipe textfooler --num-examples 100 --model ./outputs/2025-03-27-11-58-31-902283/best_model/ --dataset-from-huggingface rotten_tomatoes --dataset-split test

2025-04-07 11:28:38.707237: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-07 11:28:38.737897: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
textattack: Loading datasets dataset rotten_tomatoes, split test.
textattack: Unknown if model of class <class 'transformers.models.distilbert.modeling_distilbert.DistilBertForSequenceClassification'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.
At

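The output contains a great deal of information: a before-and-after comparison for each attacked sample, together with the change in the model's prediction. At the end, an Attack Results table summarizes the outcome. Attack success rate is the share of attacked samples for which the attack succeeded in making the model predict the wrong label, and Original accuracy is the model's accuracy on the clean samples.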
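
To make the relationship between these numbers concrete, the sketch below recomputes them from attack counts. It assumes the definitions TextAttack appears to use in its summary table: samples the model already misclassifies are skipped, and the success rate is taken over the remaining attempted samples only. The counts are hypothetical and are not results from the run above.

```python
def attack_summary(successful, failed, skipped):
    """Compute summary percentages from per-sample attack outcomes."""
    total = successful + failed + skipped
    attempted = successful + failed  # skipped samples were misclassified before any attack
    if total == 0 or attempted == 0:
        raise ValueError("need at least one attempted attack")
    return {
        "original_accuracy": 100.0 * attempted / total,
        "accuracy_under_attack": 100.0 * failed / total,
        "attack_success_rate": 100.0 * successful / attempted,
    }


# Hypothetical counts for 100 samples.
summary = attack_summary(successful=75, failed=5, skipped=20)
print(summary)
```

Under these assumptions, a high attack success rate goes together with a low accuracy under attack, even when the original accuracy is high.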

textattack also bundles many other attack recipes; interested readers can explore them further in the [official documentation](https://textattack.readthedocs.io/en/master/).