
The latest version has a bug when executing train.py #549

Closed
caozhen-alex opened this Issue Feb 1, 2018 · 5 comments

@caozhen-alex

caozhen-alex commented Feb 1, 2018

Traceback (most recent call last):
  File "train.py", line 403, in <module>
    main()
  File "train.py", line 381, in main
    first_dataset = next(lazily_load_dataset("train"))
  File "train.py", line 296, in lazily_load_dataset
    yield lazy_dataset_loader(pt, corpus_type)
  File "train.py", line 283, in lazy_dataset_loader
    dataset = torch.load(pt_file)
  File "/home/admusr/anaconda2/envs/python3_pengfei/lib/python3.6/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/home/admusr/anaconda2/envs/python3_pengfei/lib/python3.6/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
ModuleNotFoundError: No module named 'onmt.IO'
@sebastianGehrmann

Collaborator

sebastianGehrmann commented Feb 1, 2018

Can you check that you ran the preprocessing with the latest version of the code?

onmt.IO was recently refactored into the onmt.io package. This error indicates that your preprocessed dataset was built with an older version of the code.
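[Editor's note] A minimal sketch of why torch.load fails here: pickle (which torch.load uses for serialization) records classes by their module path, so a dataset pickled when its classes lived in onmt.IO cannot be unpickled once the package is renamed to onmt.io. The Dataset class below is a stand-in for illustration, not the real OpenNMT-py dataset class, and the sys.modules alias at the end is a workaround assumption, not an official fix — re-running preprocessing, as suggested above, is the clean solution.

```python
import pickle
import sys
import types

# Build a fake 'onmt.IO' module containing a class, mimicking the old layout.
onmt = types.ModuleType("onmt")
onmt.__path__ = []                      # make 'onmt' look like a package
old_io = types.ModuleType("onmt.IO")

class Dataset:                          # stand-in, not the real onmt class
    def __init__(self, corpus_type):
        self.corpus_type = corpus_type

Dataset.__module__ = "onmt.IO"          # pretend it was defined in onmt.IO
old_io.Dataset = Dataset
sys.modules["onmt"] = onmt
sys.modules["onmt.IO"] = old_io

blob = pickle.dumps(Dataset("train"))   # the blob records 'onmt.IO.Dataset'

# Simulate the refactor: onmt.IO is gone, the class now lives in onmt.io.
del sys.modules["onmt.IO"]
new_io = types.ModuleType("onmt.io")
new_io.Dataset = Dataset
sys.modules["onmt.io"] = new_io

try:
    pickle.loads(blob)
except ModuleNotFoundError as exc:
    print(exc)                          # No module named 'onmt.IO'

# Workaround for already-built shards: alias the old module path to the new
# package before loading, so pickle can resolve 'onmt.IO.Dataset' again.
sys.modules["onmt.IO"] = sys.modules["onmt.io"]
restored = pickle.loads(blob)
print(restored.corpus_type)             # train
```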

@caozhen-alex

caozhen-alex commented Feb 2, 2018

@sebastianGehrmann Another error occurred when I tried to run preprocess.py.

Extracting features...
 * number of source features: 0.
 * number of target features: 0.
Building `Fields` object...
Building & saving training data...
Warning. The corpus ../archive/text_data/src-train.txt is larger than 10M bytes, you can set '-max_shard_size' to process it by small shards to use less memory.
 * saving train data shard to ../archive/data20180202/v0.train.1.pt.
Building & saving vocabulary...
 * reloading ../archive/data20180202/v0.train.1.pt.
Traceback (most recent call last):
  File "preprocess.py", line 193, in <module>
    main()
  File "preprocess.py", line 186, in main
    build_save_vocab(train_dataset_files, fields, opt)
  File "preprocess.py", line 163, in build_save_vocab
    opt.tgt_words_min_frequency)
  File "/data1/caozhen/OpenNMT-py20180202/onmt/io/IO.py", line 263, in build_vocab
    min_freq=tgt_words_min_frequency)
  File "/data1/caozhen/OpenNMT-py20180202/onmt/io/IO.py", line 222, in _build_field_vocab
    tok for tok in [field.unk_token, field.pad_token, field.init_token,
AttributeError: 'Field' object has no attribute 'unk_token'
@caozhen-alex

caozhen-alex commented Feb 2, 2018

@sebastianGehrmann Got it. I should update the torchtext dependency.
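[Editor's note] A hedged sketch of a pre-flight check: since reports in this thread differ on which torchtext release fixes this, checking that the installed Field class actually exposes the special-token attributes _build_field_vocab reads may be more reliable than pinning a version number. The traceback above names unk_token, pad_token, and init_token; eos_token is an assumption, since the traceback line is truncated. The Field classes below are stand-ins for demonstration — in practice you would pass torchtext.data.Field.

```python
def missing_field_attrs(field):
    """Return the special-token attributes that onmt's _build_field_vocab
    reads but this Field object does not define (empty list = compatible)."""
    required = ("unk_token", "pad_token", "init_token", "eos_token")
    return [name for name in required if not hasattr(field, name)]

# Stand-in classes for demonstration (the real one is torchtext.data.Field):
class OldField:            # mimics a release without per-field token attrs
    unk = "<unk>"

class NewField:            # mimics a release with per-field token attrs
    unk_token = "<unk>"
    pad_token = "<pad>"
    init_token = None
    eos_token = None

print(missing_field_attrs(OldField()))
# ['unk_token', 'pad_token', 'init_token', 'eos_token']
print(missing_field_attrs(NewField()))   # []
```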

@JianyuZhan JianyuZhan closed this Feb 2, 2018

@PR-Iyyer

PR-Iyyer commented Mar 22, 2018

My torchtext version is 0.2.1 and the error still persists :(
