[BUG/Help] <title>ValueError: 130001 is not in list #596

WanJuWuGo · 2023-04-13T13:28:42Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

When I increase max_steps slightly during P-tuning, this error occurs. Am I doing something wrong?

Expected Behavior

。

Steps To Reproduce

。

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

。

WanJuWuGo · 2023-04-13T13:33:04Z

Both the code and the model are the latest versions.

XiaTiaoQAQ · 2023-04-13T14:00:39Z

#432
Take a look at that issue. Replacing the files inside the output model directory should fix it.

luolanfeixue · 2023-04-14T05:03:36Z

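mask_token = gMASK if gMASK in input_ids else MASK
This line is buggy. input_ids is a whole batch, so as soon as any single example in the batch contains gMASK, mask_token is set to gMASK for every example.

That makes the following line raise an error (suppose one example in input_ids contains gMASK and another contains only MASK):
mask_positions = [seq.tolist().index(mask_token) for seq in input_ids]

Comment out those two lines and write it like this instead:
mask_positions = []
for seq in input_ids:
    mask_token = gMASK if gMASK in seq else MASK
    mask_positions.append(seq.tolist().index(mask_token))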
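
The failure mode can be reproduced without loading the model. The sketch below uses plain Python lists in place of tensors. The token ids 130000 and 130001 are assumptions taken from the error messages in this issue and in #432, and both function names are hypothetical.

```python
# Assumed ids, inferred from the "130000/130001 is not in list" messages.
MASK, gMASK = 130000, 130001


def mask_positions_batchwide(input_ids):
    """Mimics the original logic: one mask token is chosen for the whole batch."""
    # `gMASK in input_ids` on a batch tensor is true if ANY element matches,
    # so the batch is flattened here to get the same semantics with lists.
    flat = [tok for seq in input_ids for tok in seq]
    mask_token = gMASK if gMASK in flat else MASK
    return [seq.index(mask_token) for seq in input_ids]


def mask_positions_per_sequence(input_ids):
    """Chooses the mask token separately for each sequence."""
    positions = []
    for seq in input_ids:
        mask_token = gMASK if gMASK in seq else MASK
        positions.append(seq.index(mask_token))
    return positions


# One example carries gMASK, the other only MASK.
batch = [
    [5, 6, gMASK, 7],
    [8, MASK, 9, 10],
]

try:
    mask_positions_batchwide(batch)
except ValueError as err:
    print("batch-wide choice fails:", err)

print(mask_positions_per_sequence(batch))
```

With a mixed batch like this, the batch-wide version looks for gMASK in the second sequence and raises the ValueError, while the per-sequence version finds a position for each example.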

duzx16 · 2023-04-14T06:15:31Z

The current implementation always uses gMASK. If gMASK is not being used, something has gone wrong.

luolanfeixue · 2023-04-14T06:18:00Z

If the data itself contains a mask token, then tokenizer.build_inputs_with_special_tokens(a_ids, b_ids) does not add gMASK.
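
To make the claim concrete, here is a toy model of the behaviour described in this comment. It is not the tokenizer's actual code: `build_inputs_sketch` is a hypothetical stand-in, the token ids are assumed from the error messages, and other special tokens are left out.

```python
# Assumed ids, inferred from the error messages in this thread.
MASK, gMASK = 130000, 130001


def build_inputs_sketch(a_ids, b_ids):
    """Toy stand-in: append gMASK only when the prompt has no mask token."""
    prompt = list(a_ids)
    if MASK not in prompt and gMASK not in prompt:
        prompt.append(gMASK)
    return prompt + list(b_ids)


plain = build_inputs_sketch([11, 12], [21])
with_mask = build_inputs_sketch([11, MASK, 12], [21])

print(gMASK in plain)      # True: gMASK was appended
print(gMASK in with_mask)  # False: the existing MASK suppressed it
```

Under this model, a dataset in which only some examples contain a mask token produces exactly the mixed batches that trigger the error.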

luolanfeixue · 2023-04-14T06:21:52Z

So if the data contains a mask token, does it need to be removed from the data?

duzx16 · 2023-04-14T06:24:08Z

Oh, so that's what is going on. I finally understand which situations produce this kind of error.

duzx16 · 2023-04-14T08:00:41Z

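This should be fixed now: whether or not the data contains [MASK], the tokenizer always appends [gMASK] at the end. I still recommend removing [MASK] from your data, though. transformers currently offers no option to skip encoding these special tokens.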
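
One way to follow that recommendation is to strip the literal string before tokenization. This is a minimal sketch, assuming each example is a dict of strings. The field names `content` and `summary` and both helper names are hypothetical, so adjust them to your dataset.

```python
def strip_mask(text, token="[MASK]"):
    """Remove every literal occurrence of the mask string from the text."""
    return text.replace(token, "")


def clean_example(example, fields=("content", "summary")):
    """Return a copy of the example with the mask string stripped from the given fields."""
    return {
        key: strip_mask(value) if key in fields and isinstance(value, str) else value
        for key, value in example.items()
    }


raw = {"content": "a[MASK]b", "summary": "no mask here", "id": 7}
print(clean_example(raw))
```

Cleaning at the text level keeps the literal string from ever reaching the tokenizer, so it cannot be encoded as a special token.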

duzx16 closed this as completed Apr 14, 2023

duzx16 mentioned this issue Apr 14, 2023

[BUG/Help] Could someone take a look: after fine-tuning on multi-turn dialogue, "130000 is not in list" appears at prediction time #432

Closed

1 task
