
Commit

v0.4.7
Tongjilibo committed Feb 4, 2024
1 parent 4b8e008 commit bdf11e0
Showing 7 changed files with 29 additions and 28 deletions.
6 changes: 1 addition & 5 deletions README.md
@@ -78,13 +78,9 @@ pip install git+https://github.com/Tongjilibo/bert4torch

 |Date| bert4torch | torch4keras | Release notes |
 |------| ---------------- | ----------------- |----------- |
+|20240204| 0.4.7 | 0.1.9|Reworked `save_pretrained` to save a whole folder (see the sketch below), added GenerateSpeed for measuring token generation speed, fixed a t5 bug when use_states=True, fixed a bug in hierarchical position encoding, added the deepseek_moe model, fixed a concurrency error in generation, reduced large-model latency|
 |20240116| 0.4.6 | 0.1.8|Bug fixes, added `save_pretrained` for saving weights in `transformer` format, added several `embedding` models|
 |20240111| 0.4.5 | 0.1.7|Stop generating `past_key_values` during `training`, added a `streamlit` example, fixed a bug in sentence-embedding `max` pooling, merged `batch_generate` into `generate`, renamed the default `generation` arguments (old names still accepted), `past_key_values` can be kept across multi-turn conversations, moved the `mask` padding logic from `attention` into `apply_embedding`, added a `uie` `pipeline`, added `PtuningV2Trainer`|
-|20231228| 0.4.4 | 0.1.7|Added a `pipelines` module and moved `chat` into it, added a `Text2Vec` module for embedding generation, added `snapshot_download` for downloading HF models|
-|20231224| 0.4.3 | 0.1.7|Added common chat models to `chat`, simplified the code path for calling large models|
-|20231219| 0.4.2 | 0.1.7|`checkpoint_path` now accepts a folder path, added a `chat` module for quickly publishing demos/APIs, support loading `.safetensors`, raise an informative error for the `meta` device|
-|20231210| 0.4.1 | 0.1.6.post2|Added longlora, added a test module, adapted to torch4keras==0.1.6 (monitor the fit process and send an email alert on errors; resolved the torch 2.0 compile conflict; fixed a clip_grad_norm bug)|
-|20231126| 0.4.0 | 0.1.5 |Fixed a flash_attn bug, stream_generate can return only the last_token|

[More versions](https://github.com/Tongjilibo/bert4torch/blob/master/docs/Update.md)

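A minimal sketch of the reworked `save_pretrained` called out in the v0.4.7 row above. It assumes, as the changelog suggests, that the method now writes a directory of `transformer`-format files rather than a single weight file; the paths below are placeholders, not files from this repository.

```python
from bert4torch.models import build_transformer_model

# Placeholder paths -- substitute a real bert4torch config and checkpoint.
model = build_transformer_model(
    config_path='/path/to/bert4torch_config.json',
    checkpoint_path='/path/to/pytorch_model.bin',
)

# Per the v0.4.7 note, save_pretrained now targets a folder, so the argument
# below is a directory into which the weights and config are written.
model.save_pretrained('./saved_model_dir')
```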
12 changes: 5 additions & 7 deletions bert4torch/layers/attention.py
@@ -36,9 +36,7 @@ def __init__(self, hidden_size, num_attention_heads, attention_probs_dropout_pro
         # Get the flash_attention config options
         self.flash_attention_config = kwargs.get('flash_attention_config', dict())
         self.is_causal = kwargs.get('is_causal', False) or self.flash_attention_config.pop('is_causal', False)
-        if (flash_attention is None) and (int(torch.__version__.split('.')[0]) >= 2):
-            flash_attention = 'sdpa'
-        elif ((flash_attention is True) or (flash_attention == 'sdpa')) and (int(torch.__version__.split('.')[0]) < 2):
+        if ((flash_attention is True) or (flash_attention == 'sdpa')) and (int(torch.__version__.split('.')[0]) < 2):
             log_warn_once('`F.scaled_dot_product_attention` only supported in torch 2.0')
             flash_attention = None
         elif (flash_attention == 'xformers') and (not is_xformers_available()):
@@ -176,10 +174,10 @@ def forward(self, hidden_states=None, attention_mask=None, encoder_hidden_states
             context_layer = xops.memory_efficient_attention(query_layer, key_layer, value_layer, attn_bias=xops.LowerTriangularMask())
         # SDPA
         elif self.flash_attention in {True, 'sdpa'}:
-            # is_causal=True only applies when qlen == klen and the batch holds a single sample (with multiple samples the mask must be used)
-            if attention_mask.size(0)==1 and (query_layer.shape[2] == key_layer.shape[2]):
-                context_layer = F.scaled_dot_product_attention(query_layer, key_layer, value_layer, is_causal=True)
-            elif len(self.flash_attention_config) == 0:
+            # # is_causal=True only applies when qlen == klen and the batch holds a single sample (with multiple samples the mask must be used)
+            # if attention_mask.size(0)==1 and (query_layer.shape[2] == key_layer.shape[2]):
+            #     context_layer = F.scaled_dot_product_attention(query_layer, key_layer, value_layer, is_causal=True)
+            if len(self.flash_attention_config) == 0:
                 # default path
                 context_layer = F.scaled_dot_product_attention(query_layer, key_layer, value_layer, attention_mask.bool())
             else:
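The lines being commented out in the hunk above concern when `F.scaled_dot_product_attention` may build its own causal mask. Below is a standalone sketch of the two call patterns involved, with illustrative shapes and torch >= 2.0 assumed; it is not bert4torch's actual forward code.

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) -- illustrative sizes only
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Fast path: SDPA builds the lower-triangular mask itself. Per the comment in
# the diff, this is only safe when q_len == k_len and the batch holds a single
# sample; padded batches must express the padding in an explicit mask.
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# General path: pass a boolean mask (True = position may be attended to),
# which is what the default branch above does with attention_mask.bool().
mask = torch.ones(16, 16, dtype=torch.bool).tril()
out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

print(out_causal.shape, out_masked.shape)  # both (1, 8, 16, 64)
```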
3 changes: 2 additions & 1 deletion docs/History.md
@@ -1,6 +1,7 @@
## Update history

-- **20240121**: Added GenerateSpeed for measuring token generation speed, fixed a t5 bug when use_states=True, fixed a bug in hierarchical position encoding
+- **20240204**: Added the deepseek_moe model, fixed a concurrency error in generation, reduced large-model latency
+- **20240121**: Reworked `save_pretrained` to save a whole folder, added GenerateSpeed for measuring token generation speed, fixed a t5 bug when use_states=True, fixed a bug in hierarchical position encoding
 - **20240116**: Bug fixes, added `save_pretrained` for saving weights in `transformer` format, added several `embedding` models
 - **20240111**: Stop generating `past_key_values` during `training`, added a `streamlit` example, fixed a bug in sentence-embedding `max` pooling, merged `batch_generate` into `generate`, renamed the default `generation` arguments (old names still accepted), `past_key_values` can be kept across multi-turn conversations, moved the `mask` padding logic from `attention` into `apply_embedding`, added a `uie` `pipeline`, added `PtuningV2Trainer`
 - **20231228**: Added a `pipelines` module and moved `chat` into it, added a `Text2Vec` module for embedding generation, added `snapshot_download` for downloading HF models
5 changes: 5 additions & 0 deletions docs/Update.md
@@ -2,6 +2,11 @@

 |Date| bert4torch version | torch4keras version | Release notes |
 |------| ---------------- | ----------------- |----------- |
+|20240204| 0.4.7 | 0.1.9|Reworked `save_pretrained` to save a whole folder, added GenerateSpeed for measuring token generation speed, fixed a t5 bug when use_states=True, fixed a bug in hierarchical position encoding, added the deepseek_moe model, fixed a concurrency error in generation, reduced large-model latency|
+|20240116| 0.4.6 | 0.1.8|Bug fixes, added `save_pretrained` for saving weights in `transformer` format, added several `embedding` models|
+|20240111| 0.4.5 | 0.1.7|Stop generating `past_key_values` during `training`, added a `streamlit` example, fixed a bug in sentence-embedding `max` pooling, merged `batch_generate` into `generate`, renamed the default `generation` arguments (old names still accepted), `past_key_values` can be kept across multi-turn conversations, moved the `mask` padding logic from `attention` into `apply_embedding`, added a `uie` `pipeline`, added `PtuningV2Trainer`|
+|20231228| 0.4.4 | 0.1.7|Added a `pipelines` module and moved `chat` into it, added a `Text2Vec` module for embedding generation, added `snapshot_download` for downloading HF models|
+|20231224| 0.4.3 | 0.1.7|Added common chat models to `chat`, simplified the code path for calling large models|
 |20231219| 0.4.2 | 0.1.7|`checkpoint_path` now accepts a folder path, added a `chat` module for quickly publishing demos/APIs, support loading `.safetensors`, raise an informative error for the `meta` device|
 |20231210| 0.4.1 | 0.1.6.post2|Added longlora, added a test module, adapted to torch4keras==0.1.6 (monitor the fit process and send an email alert on errors; resolved the torch 2.0 compile conflict; fixed a clip_grad_norm bug)|
 |20231126| 0.4.0 | 0.1.5 |Fixed a flash_attn bug, stream_generate can return only the last_token|
Binary file modified docs/pics/wechat_group.jpg
17 changes: 9 additions & 8 deletions examples/basic/glm/basic_language_model_chatglm_batch.py
@@ -4,7 +4,8 @@
 import torch
 from bert4torch.models import build_transformer_model
 from transformers import AutoTokenizer
-from bert4torch.generation import AutoRegressiveDecoder, SeqGeneration
+from bert4torch.generation import SeqGeneration
+from bert4torch.snippets import Timeit2
 import time
 import os

@@ -22,29 +23,29 @@

 tokenizer = AutoTokenizer.from_pretrained(dir_path.replace('/', '\\'), trust_remote_code=True)
 encoder = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path).to(device)
-generation = SeqGeneration(encoder, tokenizer, start_id=None, end_id=tokenizer.eos_token_id, pad_id=tokenizer.pad_token_id,
+generation = SeqGeneration(encoder, tokenizer, end_id=tokenizer.eos_token_id, pad_id=tokenizer.pad_token_id,
                            mode='random_sample', maxlen=2048, default_rtype='logits', use_states=True)


 print('===============single================')
-start = time.time()
+ti = Timeit2()
 for text in texts:
     response = generation.generate(text, topk=50, topp=0.7, temperature=0.95)
     print(response)
-print(f'Consume: {time.time()-start}s')
+ti('single')


 print('===============batch_cache================')
-start = time.time()
 response = generation.generate(texts, topk=50, topp=0.7, temperature=0.95)
 print(response)
-print(f'Consume: {time.time()-start}s')
+ti('batch_cache')


 print('===============batch_nocache================')
-start = time.time()
 generation = SeqGeneration(encoder, tokenizer, start_id=None, end_id=tokenizer.eos_token_id, pad_id=tokenizer.pad_token_id,
                            mode='random_sample', maxlen=2048, default_rtype='logits', use_states=False)
+ti.restart()
 response = generation.generate(texts, topk=50, topp=0.7, temperature=0.95)
 print(response)
-print(f'Consume: {time.time()-start}s')
+ti('batch_nocache')
+ti.end()
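Judging only from the calls visible in this example, `Timeit2` from `bert4torch.snippets` acts as a reusable stopwatch that replaces the manual `time.time()` deltas the old code used. A rough sketch of that usage pattern, with `time.sleep` standing in for the real workloads:

```python
import time
from bert4torch.snippets import Timeit2  # assumed available, as imported in the example above

ti = Timeit2()            # start the clock
time.sleep(0.1)           # stand-in for the first workload
ti('single')              # report elapsed time under the label 'single'

ti.restart()              # reset the clock before timing the next workload
time.sleep(0.1)           # stand-in for the second workload
ti('batch_nocache')       # report again under a new label
ti.end()                  # close out the timer
```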
14 changes: 7 additions & 7 deletions test/llm/test_llama.py
@@ -271,11 +271,11 @@ def test_yi(model_dir):


 if __name__=='__main__':
-    # test_baichuan('E:/pretrain_ckpt/llama/Baichuan-7B')
-    # test_belle('E:/pretrain_ckpt/llama/belle-llama-7b-2m')
-    # test_chinese_llama_alpaca('E:/pretrain_ckpt/llama/hfl@chinese_alpaca_plus_7b')
-    # test_vicuna('E:/pretrain_ckpt/llama/lmsys@vicuna-7b-v1.5')
-    # test_ziya('E:/pretrain_ckpt/llama/IDEA-CCNL@Ziya-LLaMA-13B-v1.1')
-    # test_llama2('E:/pretrain_ckpt/llama/llama-2-7b-chat')
-    # test_llama('E:/pretrain_ckpt/llama/llama-7b')
+    test_baichuan('E:/pretrain_ckpt/llama/Baichuan-7B')
+    test_belle('E:/pretrain_ckpt/llama/belle-llama-7b-2m')
+    test_chinese_llama_alpaca('E:/pretrain_ckpt/llama/hfl@chinese_alpaca_plus_7b')
+    test_vicuna('E:/pretrain_ckpt/llama/lmsys@vicuna-7b-v1.5')
+    test_ziya('E:/pretrain_ckpt/llama/IDEA-CCNL@Ziya-LLaMA-13B-v1.1')
+    test_llama2('E:/pretrain_ckpt/llama/llama-2-7b-chat')
+    test_llama('E:/pretrain_ckpt/llama/llama-7b')
     test_yi("E:/pretrain_ckpt/llama/01-ai@Yi-6B")
