The code in this repository is an adaptation of Google's original TensorFlow 1.x release of BERT, with all APIs converted to TensorFlow 2.x-compatible equivalents. The experimental results below are based on BERT-Base, Multilingual Cased; running a single task consumes about 14 GB of GPU memory.
Reproduction notes: https://mp.weixin.qq.com/s/0yhLIBFosBHe7QYvcoNI7g?token=224962903&lang=zh_CN
OS: Windows 11 Pro
GPU: NVIDIA GeForce RTX 4060 Ti 16G
CUDA: 12.6
Python: 3.10.12
Tensorflow-GPU: 2.10.0
cudatoolkit: 11.2.2
cudnn: 8.1.0.77
conda create -n bert-tf python==3.10.12
conda activate bert-tf
conda install conda-forge::cudatoolkit==11.2.2
conda install conda-forge::cudnn==8.1.0.77
pip install tensorflow-gpu==2.10.0
pip install six==1.15.0
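After installation, it is worth confirming that TensorFlow can actually see the GPU before launching any task. A minimal check run from inside the activated environment (a non-empty `PhysicalDevice` list is the expected output on the setup above):

```
python -c "import tensorflow as tf; print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"
```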
Model download (from Google BERT):
- BERT-Large, Uncased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Large, Cased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
- BERT-Base, Multilingual Cased (New, recommended): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Base, Multilingual Uncased (Orig, not recommended; use Multilingual Cased instead): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
- BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
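The experiments in this repository use BERT-Base, Multilingual Cased. As a sketch of the download step, the checkpoint can be fetched and unpacked directly in cmd, assuming the zip URL published in the upstream Google BERT README (curl and tar ship with current Windows 10/11):

```
curl -LO https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip
tar -xf multi_cased_L-12_H-768_A-12.zip
```

The extracted directory contains bert_config.json, vocab.txt, and the bert_model.ckpt files that the batch scripts below reference.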
GLUE dataset:
python download_glue_data.py
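If your copy of download_glue_data.py follows the widely circulated version of that script, it also accepts flags to choose the output directory and a task subset; treat the flag names below as an assumption to verify against the script itself:

```
python download_glue_data.py --data_dir glue_data --tasks all
```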
SQuAD (manual download):
| SQuAD 1.1 | SQuAD 2.0 |
|---|---|
| train-v1.1.json | train-v2.0.json |
| dev-v1.1.json | dev-v2.0.json |
| evaluate-v1.1.py | evaluate-v2.0.py |
**Open cmd** and run the batch file for the task you want to reproduce:
./run/run_classifier.bat
./run/run_squad.bat
./run/create_pretraining_data.bat
./run/extract_features.bat
./run/run_pretraining.bat
run_classifier.bat uses CoLA as the example task:
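A minimal sketch of the command such a batch file wraps, using the standard flags of the upstream run_classifier.py; %BERT_BASE_DIR% (the unzipped checkpoint directory), %GLUE_DIR% (the GLUE data directory), and the output path are placeholders you must set yourself:

```
rem fine-tune on CoLA and evaluate on its dev set
python run_classifier.py ^
  --task_name=CoLA ^
  --do_train=true ^
  --do_eval=true ^
  --data_dir=%GLUE_DIR%\CoLA ^
  --vocab_file=%BERT_BASE_DIR%\vocab.txt ^
  --bert_config_file=%BERT_BASE_DIR%\bert_config.json ^
  --init_checkpoint=%BERT_BASE_DIR%\bert_model.ckpt ^
  --max_seq_length=128 ^
  --train_batch_size=32 ^
  --learning_rate=2e-5 ^
  --num_train_epochs=3.0 ^
  --output_dir=output\cola
```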
run_squad.bat uses SQuAD 1.1 as the example task:
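Again a sketch with the upstream run_squad.py flags; %SQUAD_DIR% is a placeholder for wherever the files from the table above were saved:

```
rem fine-tune on SQuAD 1.1 and write predictions for the dev set
python run_squad.py ^
  --vocab_file=%BERT_BASE_DIR%\vocab.txt ^
  --bert_config_file=%BERT_BASE_DIR%\bert_config.json ^
  --init_checkpoint=%BERT_BASE_DIR%\bert_model.ckpt ^
  --do_train=true ^
  --train_file=%SQUAD_DIR%\train-v1.1.json ^
  --do_predict=true ^
  --predict_file=%SQUAD_DIR%\dev-v1.1.json ^
  --train_batch_size=12 ^
  --learning_rate=3e-5 ^
  --num_train_epochs=2.0 ^
  --max_seq_length=384 ^
  --doc_stride=128 ^
  --output_dir=output\squad1.1
```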
For SQuAD, you must run the standalone evaluation script to compute the evaluation metrics, as shown below.
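run_squad.py writes a predictions.json into its output directory; the official evaluation script takes the dev set and that file as positional arguments and prints exact match and F1 (the output path here matches the sketch above):

```
python evaluate-v1.1.py %SQUAD_DIR%\dev-v1.1.json output\squad1.1\predictions.json
```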
Creating pre-training data (create_pretraining_data.bat):
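A sketch of the wrapped create_pretraining_data.py call with its upstream flags; sample_text.txt (shipped with the original BERT repo) stands in for your own corpus, and --do_lower_case=False matches the cased checkpoint used above:

```
rem convert raw text into masked-LM/next-sentence training examples
python create_pretraining_data.py ^
  --input_file=sample_text.txt ^
  --output_file=tmp\tf_examples.tfrecord ^
  --vocab_file=%BERT_BASE_DIR%\vocab.txt ^
  --do_lower_case=False ^
  --max_seq_length=128 ^
  --max_predictions_per_seq=20 ^
  --masked_lm_prob=0.15 ^
  --random_seed=12345 ^
  --dupe_factor=5
```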
Text feature extraction (extract_features.bat):
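A sketch of the wrapped extract_features.py call; input.txt is a placeholder for a file with one sentence (or a ||| separated sentence pair) per line, and the contextual embeddings are written out as JSON lines:

```
rem dump the top four hidden layers for each input token
python extract_features.py ^
  --input_file=input.txt ^
  --output_file=tmp\output.jsonl ^
  --vocab_file=%BERT_BASE_DIR%\vocab.txt ^
  --bert_config_file=%BERT_BASE_DIR%\bert_config.json ^
  --init_checkpoint=%BERT_BASE_DIR%\bert_model.ckpt ^
  --layers=-1,-2,-3,-4 ^
  --max_seq_length=128 ^
  --batch_size=8
```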
Model pre-training (run_pretraining.bat):
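A sketch of the wrapped run_pretraining.py call; note that --max_seq_length and --max_predictions_per_seq must match the values used when the tfrecord was created, and the tiny step counts are only for a smoke test:

```
rem continue masked-LM/next-sentence pre-training from the released checkpoint
python run_pretraining.py ^
  --input_file=tmp\tf_examples.tfrecord ^
  --output_dir=tmp\pretraining_output ^
  --do_train=true ^
  --do_eval=true ^
  --bert_config_file=%BERT_BASE_DIR%\bert_config.json ^
  --init_checkpoint=%BERT_BASE_DIR%\bert_model.ckpt ^
  --train_batch_size=32 ^
  --max_seq_length=128 ^
  --max_predictions_per_seq=20 ^
  --num_train_steps=20 ^
  --num_warmup_steps=10 ^
  --learning_rate=2e-5
```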