
Pretraining-Mask sentences #23

Open
Ronica1234 opened this issue Nov 20, 2022 · 0 comments

Comments

@Ronica1234

Hello,
In the PRIMERA pretraining process, the model chooses 30% of the sentences with the pyramid method, and then 50% of those candidates (15% of all sentences) are masked, while all 30% are kept as the target. May I know why the 15% masked sentences will not be input in the target?

for i_d in range(len(truncated_doc)):
    for i_s in range(len(truncated_doc[i_d])):
        if cur_idx in mask_indices:
            tgt.append(truncated_doc[i_d][i_s])
            # here is the line which chooses 50% of the candidates (15% of all sentences) for masking
            if cur_idx not in non_mask_indices:
                truncated_doc[i_d][i_s] = ''  # tokenizer.mask_token
        cur_idx += 1
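For reference, the two-stage split described above can be sketched as follows. This is a hypothetical helper, not the PRIMERA implementation: `split_mask_indices`, `select_ratio`, and `mask_ratio` are names I made up, and PRIMERA ranks candidates with its Entity Pyramid scoring, which I replace here with a random draw just to show the index bookkeeping.

```python
import random

def split_mask_indices(num_sents, select_ratio=0.3, mask_ratio=0.5, seed=0):
    """Sketch of the candidate split (hypothetical helper, not PRIMERA's code).

    - select_ratio of all sentences become target candidates
      (PRIMERA ranks them with the Entity Pyramid method; random here).
    - mask_ratio of those candidates are masked in the input; the rest
      stay visible in the input but are still part of the target.
    """
    rng = random.Random(seed)
    candidates = rng.sample(range(num_sents), int(num_sents * select_ratio))
    keep = int(len(candidates) * (1 - mask_ratio))
    non_mask = set(rng.sample(candidates, keep))
    # mask_indices: all target candidates; non_mask_indices: the unmasked half
    return set(candidates), non_mask

mask_indices, non_mask_indices = split_mask_indices(100)
# 30 candidates in total, 15 of which remain unmasked in the input
```

With these two sets, the loop above appends every candidate to `tgt`, but blanks a sentence in the input only when it is a candidate *and* not in `non_mask_indices`, which is what produces a 30% target with 15% of sentences masked.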
