
Pretraining-Mask sentences #23

Open
Ronica1234 opened this issue Nov 20, 2022 · 0 comments

Comments

@Ronica1234

Hello,
In the PRIMERA pretraining process, the model chooses 30% of the sentences with the pyramid method, and then 50% of those candidates (15% of all sentences) are masked, while all 30% are kept as the target. May I know why the 15% masked sentences will not be input in the target?

for i_d in range(len(truncated_doc)):
    for i_s in range(len(truncated_doc[i_d])):
        if cur_idx in mask_indices:
            tgt.append(truncated_doc[i_d][i_s])
            # here is the line which chooses 50% of the candidates (15% of all sentences) for masking
            if cur_idx not in non_mask_indices:
                truncated_doc[i_d][i_s] = ''  # tokenizer.mask_token
        cur_idx += 1
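For reference, the two-stage split described above can be sketched as follows. This is a hypothetical helper, not the PRIMERA implementation: `split_mask_indices`, `select_ratio`, and `mask_ratio` are names I made up, and PRIMERA ranks candidates with its Entity Pyramid scoring, which I replace here with a random draw just to show the index bookkeeping.

```python
import random

def split_mask_indices(num_sents, select_ratio=0.3, mask_ratio=0.5, seed=0):
    """Sketch of the candidate split (hypothetical helper, not PRIMERA's code).

    - select_ratio of all sentences become target candidates
      (PRIMERA ranks them with the Entity Pyramid method; random here).
    - mask_ratio of those candidates are masked in the input; the rest
      stay visible in the input but are still part of the target.
    """
    rng = random.Random(seed)
    candidates = rng.sample(range(num_sents), int(num_sents * select_ratio))
    keep = int(len(candidates) * (1 - mask_ratio))
    non_mask = set(rng.sample(candidates, keep))
    # mask_indices: all target candidates; non_mask_indices: the unmasked half
    return set(candidates), non_mask

mask_indices, non_mask_indices = split_mask_indices(100)
# 30 candidates in total, 15 of which remain unmasked in the input
```

With these two sets, the loop above appends every candidate to `tgt`, but blanks a sentence in the input only when it is a candidate *and* not in `non_mask_indices`, which is what produces a 30% target with 15% of sentences masked.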
