[Feature] Filter negative labels #244

xiaohangguo · 2023-11-24T03:05:28Z

During dataset processing, a training sample whose input exceeds max_length can end up with every label set to -100 after tokenization. Such a sample contributes nothing to the loss, yet it still goes through the forward pass and the loss computation, which wastes compute. This PR adds a filter_out_all_negative_labels function that removes samples whose labels are all negative, so they no longer take part in forward propagation or loss calculation.
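The idea can be illustrated with a minimal sketch. This is not the code merged in this PR; the helper names `has_supervised_token` and `filter_fully_masked` and the `labels` field layout are assumptions made for illustration only.

```python
# Illustrative sketch only; helper names and sample layout are hypothetical.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def has_supervised_token(example):
    """Return True if at least one label is non-negative, i.e. counts toward the loss."""
    return any(label >= 0 for label in example["labels"])


def filter_fully_masked(examples):
    """Drop samples whose labels are all negative (fully masked)."""
    return [ex for ex in examples if has_supervised_token(ex)]


samples = [
    # Two response tokens survive truncation, so this sample is kept.
    {"input_ids": [1, 2, 3, 4], "labels": [IGNORE_INDEX, IGNORE_INDEX, 5, 6]},
    # The prompt alone filled the sequence, so every label is masked.
    {"input_ids": [7, 8, 9, 10], "labels": [IGNORE_INDEX] * 4},
]
kept = filter_fully_masked(samples)
```

If the data lives in a Hugging Face `datasets.Dataset`, a predicate like `has_supervised_token` could presumably be passed to `Dataset.filter` instead of using a list comprehension.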

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

LZHgrla

Thanks!
LGTM

xiaohangguo and others added 8 commits November 22, 2023 20:02

add wizard_coder templates

d3bdc5a

Update xtuner/utils/templates.py

1cab7af

Co-authored-by: Zhihao Lin <36994684+LZHgrla@users.noreply.github.com>

fix pre-commit

035db38

yapf formatting code

c096c3a

yapf formatting code

72ab89f

filter negative labels data in processing datasets

1fb1a65

filter negative labels data in processing datasets

5c73ec1

Merge branch 'main' into filter_negative_labels

5fb905e

LZHgrla self-requested a review November 24, 2023 04:18

LZHgrla added 2 commits November 24, 2023 13:38

Update huggingface.py

b7e3736

Update huggingface.py

04497de

LZHgrla approved these changes Nov 24, 2023

LZHgrla merged commit 97905af into InternLM:main Nov 24, 2023
1 check passed
