(gpt17) @-SAPA1F:~$ cd '/home//Desktop/gpt-2-tensorflow2.0'
(gpt17) @-SAPA1F:~/Desktop/gpt-2-tensorflow2.0$ python pre_process.py --data-dir="/home//Desktop/gpt-2-tensorflow2.0/data" --vocab-size=32000
Pre-processing the text data.....
0it [00:00, ?it/s]
Training BytePair encoding......
0it [00:00, ?it/s]
sentencepiece_trainer.cc(160) LOG(INFO) Running command: --input=/home//Desktop/gpt-2-tensorflow2.0/data/bpe_spm.tsv --model_prefix=/home//Desktop/gpt-2-tensorflow2.0/data/bpe_model --input_format=tsv --vocab_size=32000 --user_defined_symbols=[SEP],[BOS],[EOS] --hard_vocab_limit=false --model_type=bpe --pad_id=0 --unk_id=1 --bos_id=-1 --eos_id=-1 --pad_piece=[PAD] --unk_piece=[UNK]
sentencepiece_trainer.cc(73) LOG(INFO) Starts training with :
trainer_spec {
  input: /home//Desktop/gpt-2-tensorflow2.0/data/bpe_spm.tsv
  input_format: tsv
  model_prefix: /home//Desktop/gpt-2-tensorflow2.0/data/bpe_model
  model_type: BPE
  vocab_size: 32000
  self_test_sample_size: 0
  character_coverage: 0.9995
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  treat_whitespace_as_suffix: 0
  user_defined_symbols: [SEP]
  user_defined_symbols: [BOS]
  user_defined_symbols: [EOS]
  required_chars:
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  hard_vocab_limit: 0
  use_all_vocab: 0
  unk_id: 1
  bos_id: -1
  eos_id: -1
  pad_id: 0
  unk_piece: [UNK]
  bos_piece:
  eos_piece:
  pad_piece: [PAD]
  unk_surface: ⁇
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv:
}
denormalizer_spec {}
trainer_interface.cc(325) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(186) LOG(INFO) Loading corpus: /home//Desktop/gpt-2-tensorflow2.0/data/bpe_spm.tsv
trainer_interface.cc(381) LOG(INFO) Loaded all 0 sentences
trainer_interface.cc(396) LOG(INFO) Adding meta_piece: [PAD]
trainer_interface.cc(396) LOG(INFO) Adding meta_piece: [UNK]
trainer_interface.cc(396) LOG(INFO) Adding meta_piece: [SEP]
trainer_interface.cc(396) LOG(INFO) Adding meta_piece: [BOS]
trainer_interface.cc(396) LOG(INFO) Adding meta_piece: [EOS]
trainer_interface.cc(401) LOG(INFO) Normalizing sentences...
Traceback (most recent call last):
  File "pre_process.py", line 105, in <module>
    train()
  File "/home//anaconda3/envs/gpt17/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home//anaconda3/envs/gpt17/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home//anaconda3/envs/gpt17/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home//anaconda3/envs/gpt17/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "pre_process.py", line 99, in train
    train_byte_pair_encoding(vocab_size)
  File "pre_process.py", line 47, in train_byte_pair_encoding
    spm.SentencePieceTrainer.train(spmcmd)
  File "/home//anaconda3/envs/gpt17/lib/python3.7/site-packages/sentencepiece.py", line 389, in _sentencepiece_trainer_train
    return SentencePieceTrainer.TrainFromString(arg)
  File "/home//anaconda3/envs/gpt17/lib/python3.7/site-packages/sentencepiece.py", line 197, in TrainFromString
    return _sentencepiece.SentencePieceTrainer_TrainFromString(arg)
RuntimeError: Internal: /sentencepiece/src/trainer_interface.cc(402) [!sentences_.empty()]