fix pad id bug (#4377)
Signed-off-by: Yi Dong <yidong@nvidia.com>
yidong72 authored and ericharper committed Jun 17, 2022
1 parent 220e96f commit 1677b76
Showing 1 changed file with 3 additions and 1 deletion.
@@ -286,6 +286,8 @@ def main():

     encoder = Encoder(args)

+    if args.dataset_impl == 'retmmap':
+        assert args.need_pad_id, "retmmap need --need_pad_id flag"
     tokenizer = get_tokenizer(args)

     level = "document"
@@ -304,7 +306,7 @@ def main():
             output_bin_files[key],
             impl=args.dataset_impl,
             chunk_size=args.chunk_size,
-            pad_id=tokenizer.pad_id,
+            pad_id=tokenizer.pad_id if hasattr(tokenizer, "pad_id") else 0,
             retrieval_db=args.retrieval_db,
             vocab_size=tokenizer.vocab_size,
         )
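
The two changes in this commit can be illustrated in isolation. The following is a minimal sketch, not the repository's code: `resolve_pad_id`, `check_retmmap_args`, and the `SimpleNamespace` stand-ins for the tokenizer and parsed arguments are hypothetical names introduced only for this example. It assumes the intent is that a tokenizer without a `pad_id` attribute should fall back to 0, and that the `retmmap` dataset implementation should refuse to run unless `need_pad_id` is set.

```python
from types import SimpleNamespace


def resolve_pad_id(tokenizer, default=0):
    """Return tokenizer.pad_id if the attribute exists, otherwise `default`.

    Same expression as the changed line in the diff, wrapped in a
    hypothetical helper for illustration.
    """
    return tokenizer.pad_id if hasattr(tokenizer, "pad_id") else default


def check_retmmap_args(args):
    """Same guard as the first hunk: retmmap requires need_pad_id to be truthy."""
    if args.dataset_impl == "retmmap":
        assert args.need_pad_id, "retmmap need --need_pad_id flag"


# Hypothetical stand-ins for real tokenizer objects.
with_pad = SimpleNamespace(pad_id=3, vocab_size=100)
without_pad = SimpleNamespace(vocab_size=100)

print(resolve_pad_id(with_pad))     # attribute present: uses tokenizer.pad_id
print(resolve_pad_id(without_pad))  # attribute missing: falls back to the default

# Passes silently: the guard only applies to the retmmap implementation.
check_retmmap_args(SimpleNamespace(dataset_impl="mmap", need_pad_id=False))
```

Note that `hasattr` only tests for the attribute's existence, so a tokenizer whose `pad_id` is present but set to `None` would still pass `None` through. The built-in `getattr(tokenizer, "pad_id", 0)` expresses the same fallback in one call.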
