Taskflow information extraction fails with: substring not found #2854
@yingyibiao Could you please take a look at this issue? Running

```python
from paddlenlp.transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-base-zh")
tokenizer("MetertechΣ960 酶标仪(中国台湾Metertech公司)", return_offsets_mapping=True)
```

raises the following error:
I ran into the same problem when using Taskflow for information extraction: `ValueError: substring not found`.
You can install the latest develop version of paddlenlp; we will publish an official release next Monday.
@leon-cas Is there a fix or workaround for this issue in the current 2.4.4 release?
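Until an upgrade is possible, one hypothetical workaround is to replace the Greek capital sigma `Σ` with the small sigma `σ` before calling Taskflow. This assumes (it is not confirmed in this thread) that the failure comes from `Σ`, which Python's `str.lower()` turns into the final form `ς` inside a word but into `σ` when lowercased on its own. `replace_capital_sigma` is an illustrative helper, not a PaddleNLP API, and this sketch has not been tested against 2.4.4.

```python
def replace_capital_sigma(text: str) -> str:
    """Hypothetical pre-processing helper (not part of PaddleNLP).

    Replaces GREEK CAPITAL LETTER SIGMA (U+03A3) with GREEK SMALL LETTER
    SIGMA (U+03C3), so lowercasing a whole word and lowercasing one
    character at a time produce the same result. The swap is one character
    for one character, so offsets computed on the cleaned text still line
    up with the original string.
    """
    return text.replace("\u03a3", "\u03c3")


raw = "MetertechΣ960 酶标仪(中国台湾Metertech公司)"
cleaned = replace_capital_sigma(raw)

# Same length, and context-free vs. whole-string lowercasing now agree.
assert len(cleaned) == len(raw)
assert cleaned.lower() == "".join(ch.lower() for ch in cleaned)
print(cleaned)

# Assumed usage with the Taskflow call from this report:
# from paddlenlp import Taskflow
# ie = Taskflow('information_extraction', schema=['仪器'])
# ie(cleaned)
```

Because the replacement preserves string length, any `start`/`end` positions in the extraction result can be applied directly to the original text.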
Code that triggers the error:
```python
from paddlenlp import Taskflow

schema = ['仪器']
ie = Taskflow('information_extraction', schema=schema)
ie.set_schema(schema)
ie('MetertechΣ960 酶标仪(中国台湾Metertech公司)')
```
Error message:
```
   1137         return batch_outputs

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/transformers/tokenizer_utils.py in _batch_prepare_for_model(self, batch_ids_pairs, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, pad_to_multiple_of, return_position_ids, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_dict, return_offsets_mapping, return_length, verbose, **kwargs)
   1281                 prepend_batch_axis=False,
   1282                 verbose=verbose,
-> 1283                 **kwargs)
   1284             for key, value in encoded_inputs.items():
   1285                 if key not in batch_outputs:

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/transformers/tokenizer_utils_base.py in prepare_for_model(self, ids, pair_ids, padding, truncation, max_length, stride, pad_to_multiple_of, return_tensors, return_position_ids, return_token_type_ids, return_attention_mask, return_length, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, add_special_tokens, verbose, prepend_batch_axis, **kwargs)
   2798
   2799         token_offset_mapping = self.get_offset_mapping(text)
-> 2800         token_pair_offset_mapping = self.get_offset_mapping(text_pair)
   2801         if max_length and total_len > max_length:
   2802             token_offset_mapping, token_pair_offset_mapping, _ = self.truncate_sequences(

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/transformers/tokenizer_utils.py in get_offset_mapping(self, text)
   1349                 token = token[2:]
   1350
-> 1351             start = text[offset:].index(token) + offset
   1352
   1353             end = start + len(token)

ValueError: substring not found
```
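The last frame shows where the error is raised: `get_offset_mapping` locates each token by searching for its exact string in the text with `str.index`, so any token whose characters do not appear verbatim in the (normalized) text fails. The sketch is a simplified illustration, not PaddleNLP's actual code; it assumes the text is lowercased one character at a time while the tokens are lowercased word by word, and `naive_offset_mapping` and `whole_word_tokens` are hypothetical names.

```python
def naive_offset_mapping(text, tokens):
    """Simplified, hypothetical version of the search loop in the traceback.

    Each token is located by searching for its exact surface form in the
    per-character-lowercased text, starting after the previous match.
    """
    normalized = "".join(ch.lower() for ch in text)  # per-character lowercasing
    mapping, offset = [], 0
    for token in tokens:
        if token.startswith("##"):  # strip the WordPiece continuation marker
            token = token[2:]
        # str.index raises ValueError("substring not found") if token is absent
        start = normalized[offset:].index(token) + offset
        end = start + len(token)
        mapping.append((start, end))
        offset = end
    return mapping


def whole_word_tokens(text):
    """Hypothetical stand-in for a BERT-style basic tokenizer:
    split on whitespace, then lowercase each word as a whole."""
    return [word.lower() for word in text.split()]


print(naive_offset_mapping("Hello World", whole_word_tokens("Hello World")))
# [(0, 5), (6, 11)]

try:
    naive_offset_mapping("MetertechΣ960", whole_word_tokens("MetertechΣ960"))
except ValueError as err:
    print("ValueError:", err)  # ValueError: substring not found
```

The design keeps only the part of the loop visible in the traceback; the real method also maps normalized positions back to the original text.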
Preliminary testing suggests the error is triggered by special characters in the text, but I don't know the exact cause.
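One plausible explanation, not confirmed anywhere in this thread, is Python's context-sensitive lowercasing of the Greek capital sigma in `MetertechΣ960`. Assuming the ERNIE tokenizer lowercases its input, lowercasing the whole word applies Unicode's Final_Sigma rule and yields `ς`, while lowercasing each character independently yields `σ`, so the two strings no longer match:

```python
import unicodedata

word = "MetertechΣ960"  # the token from the report that contains the sigma

# Whole-string lowercasing applies the Final_Sigma rule: a capital sigma
# preceded by a cased letter and not followed by one becomes U+03C2.
whole_word = word.lower()

# Per-character lowercasing has no context, so "Σ" becomes U+03C3.
char_by_char = "".join(ch.lower() for ch in word)

print(whole_word)                  # metertechς960
print(char_by_char)                # metertechσ960
print(whole_word == char_by_char)  # False
print(unicodedata.name(whole_word[9]))    # GREEK SMALL LETTER FINAL SIGMA
print(unicodedata.name(char_by_char[9]))  # GREEK SMALL LETTER SIGMA
```

If this is the cause, any text with a capital sigma at the end of a word (for example before digits or a space) would trigger the same `ValueError`.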