Hello,
I have a question about the PRIMERA pretraining process. The model chooses 30% of the sentences by the pyramid method, and then 50% of those candidates (15% of the sentences) are masked, while all 30% are kept as the target. May I know why the target is not made up of just the 15% masked sentences?
for i_d in range(len(truncated_doc)):
    for i_s in range(len(truncated_doc[i_d])):
        if cur_idx in mask_indices:
            tgt.append(truncated_doc[i_d][i_s])
            # this is the line that picks 50% of the candidates (the candidates being 30% of the sentences) for masking
            if cur_idx not in non_mask_indices:
                truncated_doc[i_d][i_s] = ''  # tokenizer.mask_token
        cur_idx += 1
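
To make the question concrete, here is a minimal, self-contained sketch of how I read the loop above. The function name `mask_and_collect`, the toy sentences, and the `"<mask>"` placeholder are my own assumptions for illustration, not names from the repository; the nesting of the two `if` statements is also my reading of the snippet.

```python
from typing import List, Set, Tuple


def mask_and_collect(
    docs: List[List[str]],
    mask_indices: Set[int],
    non_mask_indices: Set[int],
    mask_token: str = "<mask>",  # assumed placeholder, not the real tokenizer token
) -> Tuple[List[List[str]], List[str]]:
    """Hypothetical helper mirroring the loop in the question.

    Every sentence whose global index is in mask_indices goes into the target;
    only those not in non_mask_indices are replaced by the mask token in the input.
    """
    masked_docs = [list(doc) for doc in docs]  # copy so the input is not mutated
    tgt: List[str] = []
    cur_idx = 0  # global sentence index across all documents
    for i_d in range(len(masked_docs)):
        for i_s in range(len(masked_docs[i_d])):
            if cur_idx in mask_indices:
                # all selected candidates are kept as target sentences
                tgt.append(masked_docs[i_d][i_s])
                # only candidates outside non_mask_indices are masked in the input
                if cur_idx not in non_mask_indices:
                    masked_docs[i_d][i_s] = mask_token
            cur_idx += 1
    return masked_docs, tgt


# Toy example: global sentence indices are a0=0, a1=1, a2=2, b0=3, b1=4.
docs = [["a0", "a1", "a2"], ["b0", "b1"]]
masked, tgt = mask_and_collect(docs, mask_indices={1, 3}, non_mask_indices={3})
print(masked)  # [['a0', '<mask>', 'a2'], ['b0', 'b1']]
print(tgt)     # ['a1', 'b0']
```

In this toy run both selected sentences (`a1` and `b0`) end up in the target, but only `a1` is masked in the input, which is the behaviour I am asking about.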