Got error when running data_prepro_clean command #11

Open
chris-opendata opened this issue Jan 24, 2024 · 0 comments

Thanks for your great work.
Following the steps, I have reached the point of running the following command for the CNNDM dataset:
python data_prepro_clean.py --mode bpe_binarize --input_dir <my_processed-data-dir> --tokenizer_dir <my_bpe-dir>
but it fails with the following error:

Traceback (most recent call last):
  File "../fairseq_cli/preprocess.py", line 452, in <module>
    cli_main()
  File "../fairseq_cli/preprocess.py", line 448, in cli_main
    main(args)
  File "../fairseq_cli/preprocess.py", line 331, in main
    make_all(args.source_lang, src_dict)
  File "../fairseq_cli/preprocess.py", line 301, in make_all
    make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
  File "../fairseq_cli/preprocess.py", line 297, in make_dataset
    make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
  File "../fairseq_cli/preprocess.py", line 173, in make_binary_dataset
    100 * sum(replaced.values()) / n_seq_tok[1],
ZeroDivisionError: division by zero

Could you please help? Thanks
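
A possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: the failing expression divides by `n_seq_tok[1]`, which looks like a running token count for the "train" source file, so a zero there would suggest that the file handed to the binarizer was empty or was not found where expected. The sketch below is one way to check this before running `bpe_binarize`. The helper names `count_seq_tok` and `find_empty_inputs` are hypothetical and not part of this repository or of fairseq, and the whitespace-based token count is only an approximation of what the preprocessing step counts.

```python
from pathlib import Path


def count_seq_tok(path):
    """Return (line count, whitespace-separated token count) for a text file."""
    n_seq = 0
    n_tok = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            n_seq += 1
            n_tok += len(line.split())
    return n_seq, n_tok


def find_empty_inputs(paths):
    """Return the paths that are missing or contain zero tokens.

    Assumption: a file with zero tokens is what would drive the
    division in the traceback to zero.
    """
    empty = []
    for p in paths:
        p = Path(p)
        if not p.is_file() or count_seq_tok(p)[1] == 0:
            empty.append(str(p))
    return empty
```

Calling `find_empty_inputs` on the BPE-encoded train, validation, and test files under the processed data directory (the exact file names depend on the earlier preprocessing steps, so they are not assumed here) should return an empty list if every input has content. Any path it returns is a candidate cause of the error and may point to an earlier step that silently produced no output.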
