strange results #8

Open
Eurus-Holmes opened this issue Dec 19, 2019 · 4 comments
Labels: help wanted, Todo

Comments

@Eurus-Holmes (Owner)

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
data_type:  text
 * vocabulary size. source = 8505; target = 9391
Building model...
Initializing model parameters.
NMTImgDModel(
  (encoder): RNNEncoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(8505, 500, padding_idx=1)
        )
      )
    )
    (rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
  )
  (decoder): InputFeedRNNDecoder(
    (embeddings): Embeddings(
      (make_embedding): Sequential(
        (emb_luts): Elementwise(
          (0): Embedding(9391, 500, padding_idx=1)
        )
      )
    )
    (dropout): Dropout(p=0.3, inplace=False)
    (rnn): StackedLSTM(
      (dropout): Dropout(p=0.3, inplace=False)
      (layers): ModuleList(
        (0): LSTMCell(1000, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention(
      (linear_in): Linear(in_features=500, out_features=500, bias=False)
      (linear_out): Linear(in_features=1000, out_features=500, bias=False)
      (sm): Softmax(dim=None)
      (tanh): Tanh()
    )
  )
  (encoder_images): ImageGlobalFeaturesProjector(
    (layers): Sequential(
      (0): Linear(in_features=4096, out_features=4096, bias=True)
      (1): Tanh()
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=4096, out_features=1000, bias=True)
      (4): Tanh()
      (5): Dropout(p=0.5, inplace=False)
    )
  )
  (generator): Sequential(
    (0): Linear(in_features=500, out_features=9391, bias=True)
    (1): LogSoftmax()
  )
)
* number of parameters: 44297203
encoder:  29138812
decoder:  15158391
Making optimizer for training.
/dfsdata2/chenfy7_data/conda3/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
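The `size_average`/`reduce` warning above means the criterion is built with arguments that newer PyTorch deprecates. A minimal sketch of the modern equivalent, assuming the fork builds a summed `nn.NLLLoss` over the target vocabulary with the padding index masked out (as OpenNMT-py of that era did; the exact construction in this repo is not shown in the log):

```python
import torch
import torch.nn as nn

# Deprecated style (triggers the UserWarning above):
#   criterion = nn.NLLLoss(weight=weight, size_average=False)
# Modern equivalent: sum the loss over all tokens explicitly.
weight = torch.ones(9391)   # target vocab size from the log above
weight[1] = 0               # padding_idx=1 contributes no loss
criterion = nn.NLLLoss(weight=weight, reduction='sum')

# Dummy batch just to show the call shape:
log_probs = torch.log_softmax(torch.randn(40, 9391), dim=-1)
targets = torch.randint(0, 9391, (40,))
loss = criterion(log_probs, targets)  # 0-dim tensor, summed over the batch
```

The vocab size and `padding_idx` are taken from the model printout above; everything else is illustrative.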

Start training...
 * number of epochs: 25, starting from Epoch 1
 * batch size: 40

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
/dfsdata2/chenfy7_data/MNMT/onmt/modules/GlobalAttention.py:177: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  align_vectors = self.sm(align.view(batch*targetL, sourceL))
/dfsdata2/chenfy7_data/conda3/lib/python3.6/site-packages/torch/nn/modules/container.py:92: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
/dfsdata2/chenfy7_data/MNMT/onmt/Optim.py:95: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  clip_grad_norm(self.params, self.max_grad_norm)
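The two warnings above have mechanical fixes in current PyTorch: give `Softmax` an explicit `dim` instead of relying on the deprecated implicit choice, and use the in-place `clip_grad_norm_` (trailing underscore) instead of the deprecated alias. A sketch under those assumptions, with shapes loosely modeled on the attention call in the traceback:

```python
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

# GlobalAttention.py fix: softmax over the source dimension, stated explicitly.
sm = nn.Softmax(dim=-1)            # was nn.Softmax() with implicit dim
align = torch.randn(40 * 7, 9)     # (batch * targetL, sourceL) attention scores
align_vectors = sm(align)          # each row now sums to 1

# Optim.py fix: clip_grad_norm_ is the supported in-place variant.
params = [torch.randn(3, requires_grad=True)]
params[0].sum().backward()
clip_grad_norm_(params, max_norm=1.0)
```

The `(batch * targetL, sourceL)` shape mirrors the `align.view(batch*targetL, sourceL)` call in the warning; the concrete numbers are placeholders.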
Epoch  1,    50/  725; acc:   0.00; ppl: 421.24; 5036 src tok/s; 5457 tgt tok/s;      5 s elapsed
Epoch  1,   100/  725; acc:   0.00; ppl: 149.96; 5092 src tok/s; 5500 tgt tok/s;     10 s elapsed
Epoch  1,   150/  725; acc:   0.00; ppl: 101.28; 5285 src tok/s; 5699 tgt tok/s;     15 s elapsed
Epoch  1,   200/  725; acc:   0.00; ppl:  82.19; 5596 src tok/s; 6056 tgt tok/s;     20 s elapsed
Epoch  1,   250/  725; acc:   0.00; ppl:  77.47; 5341 src tok/s; 5798 tgt tok/s;     25 s elapsed
Epoch  1,   300/  725; acc:   0.00; ppl:  62.96; 5573 src tok/s; 6004 tgt tok/s;     30 s elapsed
Epoch  1,   350/  725; acc:   0.00; ppl:  58.73; 5316 src tok/s; 5765 tgt tok/s;     35 s elapsed
Epoch  1,   400/  725; acc:   0.00; ppl:  54.37; 5310 src tok/s; 5705 tgt tok/s;     40 s elapsed
Epoch  1,   450/  725; acc:   0.00; ppl:  49.82; 5303 src tok/s; 5758 tgt tok/s;     45 s elapsed
Epoch  1,   500/  725; acc:   0.00; ppl:  46.13; 5421 src tok/s; 5881 tgt tok/s;     50 s elapsed
Epoch  1,   550/  725; acc:   0.00; ppl:  42.84; 5575 src tok/s; 6056 tgt tok/s;     54 s elapsed
Epoch  1,   600/  725; acc:   0.00; ppl:  39.20; 5142 src tok/s; 5572 tgt tok/s;     60 s elapsed
Epoch  1,   650/  725; acc:   0.00; ppl:  36.36; 5238 src tok/s; 5683 tgt tok/s;     65 s elapsed
Epoch  1,   700/  725; acc:   0.00; ppl:  30.90; 5047 src tok/s; 5458 tgt tok/s;     70 s elapsed
Train perplexity: 64.8512
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
/dfsdata2/chenfy7_data/conda3/lib/python3.6/site-packages/torchtext/data/field.py:321: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  return Variable(arr, volatile=not train), lengths
/dfsdata2/chenfy7_data/conda3/lib/python3.6/site-packages/torchtext/data/field.py:322: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  return Variable(arr, volatile=not train)
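The torchtext warning above is harmless but worth silencing: `volatile` has been a no-op since PyTorch 0.4, and gradient tracking during validation is instead disabled with a context manager. A minimal sketch (the linear layer is a stand-in, not this model):

```python
import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# Old: Variable(arr, volatile=True) during evaluation.
# New: wrap the forward pass so no autograd graph is built.
with torch.no_grad():
    y = model(x)
```

Besides removing the warning, this also cuts validation memory use, since no graph is retained.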
Validation perplexity: 27.3178
Validation accuracy: 0
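Note the pattern that makes these results "strange": perplexity drops normally (421 → 27 within one epoch) while accuracy stays pinned at exactly 0.00, which points at the statistics bookkeeping rather than the model itself. A hypothetical reconstruction of how OpenNMT-py-era code computes accuracy (100 · n_correct / n_words over non-padding tokens), not verified against this fork, that one could compare against the repo's `Statistics` class:

```python
import torch

# Toy prediction/gold pair; padding_idx=1 as in the model printout.
pred = torch.tensor([3, 1, 5, 7])
gold = torch.tensor([3, 1, 5, 2])
padding_idx = 1

non_pad = gold.ne(padding_idx)                              # mask out padding
n_words = non_pad.sum().item()                              # real tokens only
n_correct = pred.eq(gold).masked_select(non_pad).sum().item()
acc = 100.0 * n_correct / n_words
```

Calling `.item()` keeps the counts as Python numbers; if the fork instead accumulates raw tensors or mismatched dtypes under a newer PyTorch than it was written for, the reported accuracy can end up constant 0 even while the loss behaves. That would be the first place to look.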

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  2,    50/  725; acc:   0.00; ppl:  25.04; 5149 src tok/s; 5579 tgt tok/s;      5 s elapsed
Epoch  2,   100/  725; acc:   0.00; ppl:  22.29; 5128 src tok/s; 5539 tgt tok/s;     10 s elapsed
Epoch  2,   150/  725; acc:   0.00; ppl:  20.06; 5446 src tok/s; 5873 tgt tok/s;     15 s elapsed
Epoch  2,   200/  725; acc:   0.00; ppl:  18.54; 5268 src tok/s; 5701 tgt tok/s;     20 s elapsed
Epoch  2,   250/  725; acc:   0.00; ppl:  18.69; 5379 src tok/s; 5839 tgt tok/s;     25 s elapsed
Epoch  2,   300/  725; acc:   0.00; ppl:  15.57; 5300 src tok/s; 5710 tgt tok/s;     30 s elapsed
Epoch  2,   350/  725; acc:   0.00; ppl:  15.18; 5455 src tok/s; 5915 tgt tok/s;     35 s elapsed
Epoch  2,   400/  725; acc:   0.00; ppl:  14.30; 5305 src tok/s; 5699 tgt tok/s;     40 s elapsed
Epoch  2,   450/  725; acc:   0.00; ppl:  13.23; 5069 src tok/s; 5504 tgt tok/s;     45 s elapsed
Epoch  2,   500/  725; acc:   0.00; ppl:  12.61; 5096 src tok/s; 5528 tgt tok/s;     50 s elapsed
Epoch  2,   550/  725; acc:   0.00; ppl:  11.85; 5187 src tok/s; 5634 tgt tok/s;     55 s elapsed
Epoch  2,   600/  725; acc:   0.00; ppl:  11.33; 5534 src tok/s; 5997 tgt tok/s;     60 s elapsed
Epoch  2,   650/  725; acc:   0.00; ppl:  11.10; 5453 src tok/s; 5916 tgt tok/s;     65 s elapsed
Epoch  2,   700/  725; acc:   0.00; ppl:  10.31; 5093 src tok/s; 5507 tgt tok/s;     70 s elapsed
Train perplexity: 14.9386
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 11.5119
Validation accuracy: 0

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  3,    50/  725; acc:   0.00; ppl:   9.78; 5141 src tok/s; 5571 tgt tok/s;      5 s elapsed
Epoch  3,   100/  725; acc:   0.00; ppl:   9.31; 5288 src tok/s; 5712 tgt tok/s;     10 s elapsed
Epoch  3,   150/  725; acc:   0.00; ppl:   8.81; 5408 src tok/s; 5831 tgt tok/s;     15 s elapsed
Epoch  3,   200/  725; acc:   0.00; ppl:   8.67; 5708 src tok/s; 6178 tgt tok/s;     20 s elapsed
Epoch  3,   250/  725; acc:   0.00; ppl:   8.96; 5487 src tok/s; 5957 tgt tok/s;     25 s elapsed
Epoch  3,   300/  725; acc:   0.00; ppl:   7.80; 5382 src tok/s; 5799 tgt tok/s;     29 s elapsed
Epoch  3,   350/  725; acc:   0.00; ppl:   7.89; 5221 src tok/s; 5662 tgt tok/s;     34 s elapsed
Epoch  3,   400/  725; acc:   0.00; ppl:   7.61; 5490 src tok/s; 5898 tgt tok/s;     39 s elapsed
Epoch  3,   450/  725; acc:   0.00; ppl:   7.44; 5269 src tok/s; 5721 tgt tok/s;     44 s elapsed
Epoch  3,   500/  725; acc:   0.00; ppl:   7.27; 5481 src tok/s; 5946 tgt tok/s;     49 s elapsed
Epoch  3,   550/  725; acc:   0.00; ppl:   7.05; 5103 src tok/s; 5542 tgt tok/s;     54 s elapsed
Epoch  3,   600/  725; acc:   0.00; ppl:   6.80; 5260 src tok/s; 5700 tgt tok/s;     59 s elapsed
Epoch  3,   650/  725; acc:   0.00; ppl:   6.78; 5243 src tok/s; 5688 tgt tok/s;     64 s elapsed
Epoch  3,   700/  725; acc:   0.00; ppl:   6.41; 5211 src tok/s; 5635 tgt tok/s;     69 s elapsed
Train perplexity: 7.78145
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 9.07705
Validation accuracy: 0

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  4,    50/  725; acc:   0.00; ppl:   6.26; 5438 src tok/s; 5893 tgt tok/s;      5 s elapsed
Epoch  4,   100/  725; acc:   0.00; ppl:   6.09; 5231 src tok/s; 5650 tgt tok/s;     10 s elapsed
Epoch  4,   150/  725; acc:   0.00; ppl:   5.85; 5716 src tok/s; 6163 tgt tok/s;     15 s elapsed
Epoch  4,   200/  725; acc:   0.00; ppl:   5.92; 5447 src tok/s; 5895 tgt tok/s;     19 s elapsed
Epoch  4,   250/  725; acc:   0.00; ppl:   6.08; 5250 src tok/s; 5699 tgt tok/s;     24 s elapsed
Epoch  4,   300/  725; acc:   0.00; ppl:   5.43; 5182 src tok/s; 5583 tgt tok/s;     30 s elapsed
Epoch  4,   350/  725; acc:   0.00; ppl:   5.49; 5893 src tok/s; 6389 tgt tok/s;     34 s elapsed
Epoch  4,   400/  725; acc:   0.00; ppl:   5.37; 5611 src tok/s; 6028 tgt tok/s;     39 s elapsed
Epoch  4,   450/  725; acc:   0.00; ppl:   5.37; 5325 src tok/s; 5781 tgt tok/s;     44 s elapsed
Epoch  4,   500/  725; acc:   0.00; ppl:   5.19; 5286 src tok/s; 5735 tgt tok/s;     49 s elapsed
Epoch  4,   550/  725; acc:   0.00; ppl:   5.14; 5118 src tok/s; 5559 tgt tok/s;     54 s elapsed
Epoch  4,   600/  725; acc:   0.00; ppl:   5.05; 5158 src tok/s; 5590 tgt tok/s;     59 s elapsed
Epoch  4,   650/  725; acc:   0.00; ppl:   5.06; 5442 src tok/s; 5904 tgt tok/s;     64 s elapsed
Epoch  4,   700/  725; acc:   0.00; ppl:   4.89; 4935 src tok/s; 5337 tgt tok/s;     69 s elapsed
Train perplexity: 5.47586
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.41383
Validation accuracy: 0

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  5,    50/  725; acc:   0.00; ppl:   4.73; 5564 src tok/s; 6029 tgt tok/s;      5 s elapsed
Epoch  5,   100/  725; acc:   0.00; ppl:   4.73; 5213 src tok/s; 5631 tgt tok/s;     10 s elapsed
Epoch  5,   150/  725; acc:   0.00; ppl:   4.53; 5576 src tok/s; 6012 tgt tok/s;     15 s elapsed
Epoch  5,   200/  725; acc:   0.00; ppl:   4.58; 5333 src tok/s; 5772 tgt tok/s;     20 s elapsed
Epoch  5,   250/  725; acc:   0.00; ppl:   4.73; 5440 src tok/s; 5905 tgt tok/s;     24 s elapsed
Epoch  5,   300/  725; acc:   0.00; ppl:   4.29; 5318 src tok/s; 5730 tgt tok/s;     29 s elapsed
Epoch  5,   350/  725; acc:   0.00; ppl:   4.33; 5656 src tok/s; 6133 tgt tok/s;     34 s elapsed
Epoch  5,   400/  725; acc:   0.00; ppl:   4.23; 5325 src tok/s; 5720 tgt tok/s;     39 s elapsed
Epoch  5,   450/  725; acc:   0.00; ppl:   4.27; 5365 src tok/s; 5825 tgt tok/s;     44 s elapsed
Epoch  5,   500/  725; acc:   0.00; ppl:   4.17; 5143 src tok/s; 5580 tgt tok/s;     49 s elapsed
Epoch  5,   550/  725; acc:   0.00; ppl:   4.14; 5360 src tok/s; 5822 tgt tok/s;     54 s elapsed
Epoch  5,   600/  725; acc:   0.00; ppl:   4.10; 5298 src tok/s; 5741 tgt tok/s;     59 s elapsed
Epoch  5,   650/  725; acc:   0.00; ppl:   4.11; 5345 src tok/s; 5799 tgt tok/s;     64 s elapsed
Epoch  5,   700/  725; acc:   0.00; ppl:   3.99; 5077 src tok/s; 5490 tgt tok/s;     69 s elapsed
Train perplexity: 4.33262
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.39225
Validation accuracy: 0

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  6,    50/  725; acc:   0.00; ppl:   3.90; 5281 src tok/s; 5723 tgt tok/s;      5 s elapsed
Epoch  6,   100/  725; acc:   0.00; ppl:   3.85; 5220 src tok/s; 5639 tgt tok/s;     10 s elapsed
Epoch  6,   150/  725; acc:   0.00; ppl:   3.78; 5547 src tok/s; 5981 tgt tok/s;     15 s elapsed
Epoch  6,   200/  725; acc:   0.00; ppl:   3.82; 5269 src tok/s; 5703 tgt tok/s;     20 s elapsed
Epoch  6,   250/  725; acc:   0.00; ppl:   3.93; 5132 src tok/s; 5571 tgt tok/s;     25 s elapsed
Epoch  6,   300/  725; acc:   0.00; ppl:   3.61; 5344 src tok/s; 5757 tgt tok/s;     30 s elapsed
Epoch  6,   350/  725; acc:   0.00; ppl:   3.65; 5139 src tok/s; 5573 tgt tok/s;     35 s elapsed
Epoch  6,   400/  725; acc:   0.00; ppl:   3.62; 5472 src tok/s; 5878 tgt tok/s;     40 s elapsed
Epoch  6,   450/  725; acc:   0.00; ppl:   3.58; 5403 src tok/s; 5866 tgt tok/s;     45 s elapsed
Epoch  6,   500/  725; acc:   0.00; ppl:   3.51; 5167 src tok/s; 5605 tgt tok/s;     50 s elapsed
Epoch  6,   550/  725; acc:   0.00; ppl:   3.53; 5178 src tok/s; 5624 tgt tok/s;     55 s elapsed
Epoch  6,   600/  725; acc:   0.00; ppl:   3.51; 5294 src tok/s; 5737 tgt tok/s;     60 s elapsed
Epoch  6,   650/  725; acc:   0.00; ppl:   3.54; 5270 src tok/s; 5717 tgt tok/s;     65 s elapsed
Epoch  6,   700/  725; acc:   0.00; ppl:   3.41; 4914 src tok/s; 5314 tgt tok/s;     70 s elapsed
Train perplexity: 3.64816
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.53908
Validation accuracy: 0
Decaying learning rate to 0.001

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  7,    50/  725; acc:   0.00; ppl:   3.28; 5155 src tok/s; 5586 tgt tok/s;      5 s elapsed
Epoch  7,   100/  725; acc:   0.00; ppl:   3.19; 5119 src tok/s; 5530 tgt tok/s;     10 s elapsed
Epoch  7,   150/  725; acc:   0.00; ppl:   3.05; 5320 src tok/s; 5737 tgt tok/s;     15 s elapsed
Epoch  7,   200/  725; acc:   0.00; ppl:   3.00; 5401 src tok/s; 5846 tgt tok/s;     20 s elapsed
Epoch  7,   250/  725; acc:   0.00; ppl:   3.07; 5447 src tok/s; 5913 tgt tok/s;     25 s elapsed
Epoch  7,   300/  725; acc:   0.00; ppl:   2.76; 5264 src tok/s; 5672 tgt tok/s;     30 s elapsed
Epoch  7,   350/  725; acc:   0.00; ppl:   2.76; 5201 src tok/s; 5640 tgt tok/s;     35 s elapsed
Epoch  7,   400/  725; acc:   0.00; ppl:   2.70; 5325 src tok/s; 5721 tgt tok/s;     40 s elapsed
Epoch  7,   450/  725; acc:   0.00; ppl:   2.66; 5380 src tok/s; 5841 tgt tok/s;     45 s elapsed
Epoch  7,   500/  725; acc:   0.00; ppl:   2.59; 5292 src tok/s; 5741 tgt tok/s;     50 s elapsed
Epoch  7,   550/  725; acc:   0.00; ppl:   2.56; 5317 src tok/s; 5775 tgt tok/s;     55 s elapsed
Epoch  7,   600/  725; acc:   0.00; ppl:   2.51; 5388 src tok/s; 5839 tgt tok/s;     60 s elapsed
Epoch  7,   650/  725; acc:   0.00; ppl:   2.51; 5446 src tok/s; 5909 tgt tok/s;     65 s elapsed
Epoch  7,   700/  725; acc:   0.00; ppl:   2.41; 5269 src tok/s; 5698 tgt tok/s;     70 s elapsed
Train perplexity: 2.76429
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.15295
Validation accuracy: 0
Decaying learning rate to 0.0005

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  8,    50/  725; acc:   0.00; ppl:   2.63; 5286 src tok/s; 5727 tgt tok/s;      5 s elapsed
Epoch  8,   100/  725; acc:   0.00; ppl:   2.55; 5278 src tok/s; 5702 tgt tok/s;     10 s elapsed
Epoch  8,   150/  725; acc:   0.00; ppl:   2.48; 5578 src tok/s; 6014 tgt tok/s;     15 s elapsed
Epoch  8,   200/  725; acc:   0.00; ppl:   2.46; 5386 src tok/s; 5830 tgt tok/s;     20 s elapsed
Epoch  8,   250/  725; acc:   0.00; ppl:   2.47; 5456 src tok/s; 5922 tgt tok/s;     25 s elapsed
Epoch  8,   300/  725; acc:   0.00; ppl:   2.27; 5120 src tok/s; 5516 tgt tok/s;     30 s elapsed
Epoch  8,   350/  725; acc:   0.00; ppl:   2.24; 5330 src tok/s; 5780 tgt tok/s;     35 s elapsed
Epoch  8,   400/  725; acc:   0.00; ppl:   2.20; 5472 src tok/s; 5879 tgt tok/s;     39 s elapsed
Epoch  8,   450/  725; acc:   0.00; ppl:   2.16; 5367 src tok/s; 5827 tgt tok/s;     44 s elapsed
Epoch  8,   500/  725; acc:   0.00; ppl:   2.11; 5273 src tok/s; 5721 tgt tok/s;     49 s elapsed
Epoch  8,   550/  725; acc:   0.00; ppl:   2.08; 5289 src tok/s; 5744 tgt tok/s;     54 s elapsed
Epoch  8,   600/  725; acc:   0.00; ppl:   2.03; 5217 src tok/s; 5654 tgt tok/s;     59 s elapsed
Epoch  8,   650/  725; acc:   0.00; ppl:   2.02; 5391 src tok/s; 5849 tgt tok/s;     64 s elapsed
Epoch  8,   700/  725; acc:   0.00; ppl:   1.97; 5387 src tok/s; 5826 tgt tok/s;     69 s elapsed
Train perplexity: 2.24574
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.17659
Validation accuracy: 0
Decaying learning rate to 0.00025

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch  9,    50/  725; acc:   0.00; ppl:   2.31; 5510 src tok/s; 5971 tgt tok/s;      5 s elapsed
Epoch  9,   100/  725; acc:   0.00; ppl:   2.26; 5329 src tok/s; 5756 tgt tok/s;     10 s elapsed
Epoch  9,   150/  725; acc:   0.00; ppl:   2.19; 5632 src tok/s; 6073 tgt tok/s;     14 s elapsed
Epoch  9,   200/  725; acc:   0.00; ppl:   2.16; 5619 src tok/s; 6081 tgt tok/s;     19 s elapsed
Epoch  9,   250/  725; acc:   0.00; ppl:   2.21; 5275 src tok/s; 5726 tgt tok/s;     24 s elapsed
Epoch  9,   300/  725; acc:   0.00; ppl:   2.02; 5229 src tok/s; 5633 tgt tok/s;     29 s elapsed
Epoch  9,   350/  725; acc:   0.00; ppl:   2.01; 5245 src tok/s; 5688 tgt tok/s;     34 s elapsed
Epoch  9,   400/  725; acc:   0.00; ppl:   1.96; 5111 src tok/s; 5491 tgt tok/s;     39 s elapsed
Epoch  9,   450/  725; acc:   0.00; ppl:   1.95; 5390 src tok/s; 5852 tgt tok/s;     44 s elapsed
Epoch  9,   500/  725; acc:   0.00; ppl:   1.90; 5307 src tok/s; 5757 tgt tok/s;     49 s elapsed
Epoch  9,   550/  725; acc:   0.00; ppl:   1.87; 5386 src tok/s; 5850 tgt tok/s;     54 s elapsed
Epoch  9,   600/  725; acc:   0.00; ppl:   1.83; 5381 src tok/s; 5831 tgt tok/s;     59 s elapsed
Epoch  9,   650/  725; acc:   0.00; ppl:   1.84; 5306 src tok/s; 5757 tgt tok/s;     64 s elapsed
Epoch  9,   700/  725; acc:   0.00; ppl:   1.77; 5171 src tok/s; 5592 tgt tok/s;     69 s elapsed
Train perplexity: 2.00746
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.3781
Validation accuracy: 0
Decaying learning rate to 0.000125

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 10,    50/  725; acc:   0.00; ppl:   2.15; 5385 src tok/s; 5835 tgt tok/s;      5 s elapsed
Epoch 10,   100/  725; acc:   0.00; ppl:   2.10; 5346 src tok/s; 5775 tgt tok/s;     10 s elapsed
Epoch 10,   150/  725; acc:   0.00; ppl:   2.04; 5245 src tok/s; 5656 tgt tok/s;     15 s elapsed
Epoch 10,   200/  725; acc:   0.00; ppl:   2.03; 5436 src tok/s; 5883 tgt tok/s;     20 s elapsed
Epoch 10,   250/  725; acc:   0.00; ppl:   2.04; 5234 src tok/s; 5682 tgt tok/s;     25 s elapsed
Epoch 10,   300/  725; acc:   0.00; ppl:   1.92; 5579 src tok/s; 6010 tgt tok/s;     30 s elapsed
Epoch 10,   350/  725; acc:   0.00; ppl:   1.90; 5671 src tok/s; 6149 tgt tok/s;     34 s elapsed
Epoch 10,   400/  725; acc:   0.00; ppl:   1.86; 6365 src tok/s; 6838 tgt tok/s;     38 s elapsed
Epoch 10,   450/  725; acc:   0.00; ppl:   1.83; 6295 src tok/s; 6835 tgt tok/s;     43 s elapsed
Epoch 10,   500/  725; acc:   0.00; ppl:   1.79; 5816 src tok/s; 6310 tgt tok/s;     47 s elapsed
Epoch 10,   550/  725; acc:   0.00; ppl:   1.76; 6279 src tok/s; 6819 tgt tok/s;     51 s elapsed
Epoch 10,   600/  725; acc:   0.00; ppl:   1.73; 6309 src tok/s; 6836 tgt tok/s;     56 s elapsed
Epoch 10,   650/  725; acc:   0.00; ppl:   1.73; 6193 src tok/s; 6719 tgt tok/s;     60 s elapsed
Epoch 10,   700/  725; acc:   0.00; ppl:   1.69; 6289 src tok/s; 6801 tgt tok/s;     64 s elapsed
Train perplexity: 1.88806
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.51417
Validation accuracy: 0
Decaying learning rate to 6.25e-05

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 11,    50/  725; acc:   0.00; ppl:   2.08; 6320 src tok/s; 6848 tgt tok/s;      4 s elapsed
Epoch 11,   100/  725; acc:   0.00; ppl:   2.04; 6055 src tok/s; 6541 tgt tok/s;      9 s elapsed
Epoch 11,   150/  725; acc:   0.00; ppl:   1.99; 6561 src tok/s; 7074 tgt tok/s;     13 s elapsed
Epoch 11,   200/  725; acc:   0.00; ppl:   1.96; 6127 src tok/s; 6631 tgt tok/s;     17 s elapsed
Epoch 11,   250/  725; acc:   0.00; ppl:   1.99; 6317 src tok/s; 6857 tgt tok/s;     21 s elapsed
Epoch 11,   300/  725; acc:   0.00; ppl:   1.84; 6469 src tok/s; 6970 tgt tok/s;     25 s elapsed
Epoch 11,   350/  725; acc:   0.00; ppl:   1.84; 6415 src tok/s; 6955 tgt tok/s;     29 s elapsed
Epoch 11,   400/  725; acc:   0.00; ppl:   1.80; 7122 src tok/s; 7651 tgt tok/s;     33 s elapsed
Epoch 11,   450/  725; acc:   0.00; ppl:   1.78; 6481 src tok/s; 7037 tgt tok/s;     37 s elapsed
Epoch 11,   500/  725; acc:   0.00; ppl:   1.75; 6321 src tok/s; 6857 tgt tok/s;     41 s elapsed
Epoch 11,   550/  725; acc:   0.00; ppl:   1.72; 5820 src tok/s; 6322 tgt tok/s;     46 s elapsed
Epoch 11,   600/  725; acc:   0.00; ppl:   1.69; 4328 src tok/s; 4690 tgt tok/s;     52 s elapsed
Epoch 11,   650/  725; acc:   0.00; ppl:   1.68; 6184 src tok/s; 6709 tgt tok/s;     56 s elapsed
Epoch 11,   700/  725; acc:   0.00; ppl:   1.65; 6281 src tok/s; 6792 tgt tok/s;     60 s elapsed
Train perplexity: 1.83492
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.56904
Validation accuracy: 0
Decaying learning rate to 3.125e-05

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 12,    50/  725; acc:   0.00; ppl:   2.03; 5734 src tok/s; 6214 tgt tok/s;      5 s elapsed
Epoch 12,   100/  725; acc:   0.00; ppl:   2.00; 6023 src tok/s; 6506 tgt tok/s;      9 s elapsed
Epoch 12,   150/  725; acc:   0.00; ppl:   1.94; 6219 src tok/s; 6706 tgt tok/s;     13 s elapsed
Epoch 12,   200/  725; acc:   0.00; ppl:   1.93; 6473 src tok/s; 7006 tgt tok/s;     17 s elapsed
Epoch 12,   250/  725; acc:   0.00; ppl:   1.97; 6491 src tok/s; 7046 tgt tok/s;     21 s elapsed
Epoch 12,   300/  725; acc:   0.00; ppl:   1.82; 6115 src tok/s; 6589 tgt tok/s;     26 s elapsed
Epoch 12,   350/  725; acc:   0.00; ppl:   1.80; 6100 src tok/s; 6614 tgt tok/s;     30 s elapsed
Epoch 12,   400/  725; acc:   0.00; ppl:   1.77; 6169 src tok/s; 6627 tgt tok/s;     34 s elapsed
Epoch 12,   450/  725; acc:   0.00; ppl:   1.76; 5584 src tok/s; 6063 tgt tok/s;     39 s elapsed
Epoch 12,   500/  725; acc:   0.00; ppl:   1.72; 6201 src tok/s; 6727 tgt tok/s;     43 s elapsed
Epoch 12,   550/  725; acc:   0.00; ppl:   1.70; 6148 src tok/s; 6677 tgt tok/s;     48 s elapsed
Epoch 12,   600/  725; acc:   0.00; ppl:   1.66; 6318 src tok/s; 6847 tgt tok/s;     52 s elapsed
Epoch 12,   650/  725; acc:   0.00; ppl:   1.67; 6076 src tok/s; 6592 tgt tok/s;     56 s elapsed
Epoch 12,   700/  725; acc:   0.00; ppl:   1.63; 5976 src tok/s; 6463 tgt tok/s;     61 s elapsed
Train perplexity: 1.80639
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59157
Validation accuracy: 0
Decaying learning rate to 1.5625e-05

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 13,    50/  725; acc:   0.00; ppl:   2.02; 6126 src tok/s; 6639 tgt tok/s;      4 s elapsed
Epoch 13,   100/  725; acc:   0.00; ppl:   1.99; 5966 src tok/s; 6444 tgt tok/s;      9 s elapsed
Epoch 13,   150/  725; acc:   0.00; ppl:   1.93; 6454 src tok/s; 6959 tgt tok/s;     13 s elapsed
Epoch 13,   200/  725; acc:   0.00; ppl:   1.92; 5978 src tok/s; 6470 tgt tok/s;     17 s elapsed
Epoch 13,   250/  725; acc:   0.00; ppl:   1.93; 5988 src tok/s; 6500 tgt tok/s;     22 s elapsed
Epoch 13,   300/  725; acc:   0.00; ppl:   1.81; 6119 src tok/s; 6593 tgt tok/s;     26 s elapsed
Epoch 13,   350/  725; acc:   0.00; ppl:   1.79; 6386 src tok/s; 6924 tgt tok/s;     30 s elapsed
Epoch 13,   400/  725; acc:   0.00; ppl:   1.75; 6089 src tok/s; 6541 tgt tok/s;     35 s elapsed
Epoch 13,   450/  725; acc:   0.00; ppl:   1.74; 5801 src tok/s; 6299 tgt tok/s;     39 s elapsed
Epoch 13,   500/  725; acc:   0.00; ppl:   1.70; 5902 src tok/s; 6402 tgt tok/s;     44 s elapsed
Epoch 13,   550/  725; acc:   0.00; ppl:   1.69; 5470 src tok/s; 5941 tgt tok/s;     48 s elapsed
Epoch 13,   600/  725; acc:   0.00; ppl:   1.66; 6231 src tok/s; 6752 tgt tok/s;     53 s elapsed
Epoch 13,   650/  725; acc:   0.00; ppl:   1.64; 6455 src tok/s; 7003 tgt tok/s;     57 s elapsed
Epoch 13,   700/  725; acc:   0.00; ppl:   1.62; 5744 src tok/s; 6212 tgt tok/s;     61 s elapsed
Train perplexity: 1.79125
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59283
Validation accuracy: 0
Decaying learning rate to 7.8125e-06

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 14,    50/  725; acc:   0.00; ppl:   2.02; 6006 src tok/s; 6508 tgt tok/s;      4 s elapsed
Epoch 14,   100/  725; acc:   0.00; ppl:   1.97; 5757 src tok/s; 6219 tgt tok/s;      9 s elapsed
Epoch 14,   150/  725; acc:   0.00; ppl:   1.92; 5928 src tok/s; 6392 tgt tok/s;     13 s elapsed
Epoch 14,   200/  725; acc:   0.00; ppl:   1.90; 5921 src tok/s; 6409 tgt tok/s;     18 s elapsed
Epoch 14,   250/  725; acc:   0.00; ppl:   1.93; 6270 src tok/s; 6807 tgt tok/s;     22 s elapsed
Epoch 14,   300/  725; acc:   0.00; ppl:   1.80; 6367 src tok/s; 6859 tgt tok/s;     26 s elapsed
Epoch 14,   350/  725; acc:   0.00; ppl:   1.79; 5964 src tok/s; 6467 tgt tok/s;     31 s elapsed
Epoch 14,   400/  725; acc:   0.00; ppl:   1.75; 5902 src tok/s; 6341 tgt tok/s;     35 s elapsed
Epoch 14,   450/  725; acc:   0.00; ppl:   1.74; 5624 src tok/s; 6107 tgt tok/s;     40 s elapsed
Epoch 14,   500/  725; acc:   0.00; ppl:   1.70; 6130 src tok/s; 6651 tgt tok/s;     44 s elapsed
Epoch 14,   550/  725; acc:   0.00; ppl:   1.68; 6168 src tok/s; 6699 tgt tok/s;     49 s elapsed
Epoch 14,   600/  725; acc:   0.00; ppl:   1.65; 5788 src tok/s; 6272 tgt tok/s;     53 s elapsed
Epoch 14,   650/  725; acc:   0.00; ppl:   1.64; 5718 src tok/s; 6204 tgt tok/s;     58 s elapsed
Epoch 14,   700/  725; acc:   0.00; ppl:   1.61; 5715 src tok/s; 6180 tgt tok/s;     62 s elapsed
Train perplexity: 1.78543
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.5924
Validation accuracy: 0
Decaying learning rate to 3.90625e-06

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 15,    50/  725; acc:   0.00; ppl:   2.00; 5659 src tok/s; 6132 tgt tok/s;      5 s elapsed
Epoch 15,   100/  725; acc:   0.00; ppl:   1.98; 5518 src tok/s; 5961 tgt tok/s;      9 s elapsed
Epoch 15,   150/  725; acc:   0.00; ppl:   1.92; 5642 src tok/s; 6084 tgt tok/s;     14 s elapsed
Epoch 15,   200/  725; acc:   0.00; ppl:   1.90; 5665 src tok/s; 6132 tgt tok/s;     19 s elapsed
Epoch 15,   250/  725; acc:   0.00; ppl:   1.92; 6222 src tok/s; 6754 tgt tok/s;     23 s elapsed
Epoch 15,   300/  725; acc:   0.00; ppl:   1.80; 5778 src tok/s; 6225 tgt tok/s;     28 s elapsed
Epoch 15,   350/  725; acc:   0.00; ppl:   1.78; 5940 src tok/s; 6441 tgt tok/s;     32 s elapsed
Epoch 15,   400/  725; acc:   0.00; ppl:   1.73; 5475 src tok/s; 5882 tgt tok/s;     37 s elapsed
Epoch 15,   450/  725; acc:   0.00; ppl:   1.73; 5957 src tok/s; 6468 tgt tok/s;     41 s elapsed
Epoch 15,   500/  725; acc:   0.00; ppl:   1.71; 5978 src tok/s; 6485 tgt tok/s;     46 s elapsed
Epoch 15,   550/  725; acc:   0.00; ppl:   1.67; 5787 src tok/s; 6285 tgt tok/s;     50 s elapsed
Epoch 15,   600/  725; acc:   0.00; ppl:   1.65; 6175 src tok/s; 6691 tgt tok/s;     55 s elapsed
Epoch 15,   650/  725; acc:   0.00; ppl:   1.64; 5701 src tok/s; 6186 tgt tok/s;     59 s elapsed
Epoch 15,   700/  725; acc:   0.00; ppl:   1.61; 5538 src tok/s; 5989 tgt tok/s;     64 s elapsed
Train perplexity: 1.78114
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59565
Validation accuracy: 0
Decaying learning rate to 1.95313e-06

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 16,    50/  725; acc:   0.00; ppl:   2.00; 5991 src tok/s; 6492 tgt tok/s;      4 s elapsed
Epoch 16,   100/  725; acc:   0.00; ppl:   1.96; 5922 src tok/s; 6397 tgt tok/s;      9 s elapsed
Epoch 16,   150/  725; acc:   0.00; ppl:   1.91; 5990 src tok/s; 6459 tgt tok/s;     13 s elapsed
Epoch 16,   200/  725; acc:   0.00; ppl:   1.91; 6150 src tok/s; 6656 tgt tok/s;     18 s elapsed
Epoch 16,   250/  725; acc:   0.00; ppl:   1.91; 5961 src tok/s; 6471 tgt tok/s;     22 s elapsed
Epoch 16,   300/  725; acc:   0.00; ppl:   1.79; 6165 src tok/s; 6642 tgt tok/s;     26 s elapsed
Epoch 16,   350/  725; acc:   0.00; ppl:   1.79; 5773 src tok/s; 6260 tgt tok/s;     31 s elapsed
Epoch 16,   400/  725; acc:   0.00; ppl:   1.74; 5849 src tok/s; 6284 tgt tok/s;     35 s elapsed
Epoch 16,   450/  725; acc:   0.00; ppl:   1.72; 5945 src tok/s; 6454 tgt tok/s;     40 s elapsed
Epoch 16,   500/  725; acc:   0.00; ppl:   1.70; 6243 src tok/s; 6773 tgt tok/s;     44 s elapsed
Epoch 16,   550/  725; acc:   0.00; ppl:   1.68; 6404 src tok/s; 6955 tgt tok/s;     48 s elapsed
Epoch 16,   600/  725; acc:   0.00; ppl:   1.66; 6508 src tok/s; 7052 tgt tok/s;     52 s elapsed
Epoch 16,   650/  725; acc:   0.00; ppl:   1.63; 6310 src tok/s; 6846 tgt tok/s;     57 s elapsed
Epoch 16,   700/  725; acc:   0.00; ppl:   1.61; 6121 src tok/s; 6619 tgt tok/s;     61 s elapsed
Train perplexity: 1.77928
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59371
Validation accuracy: 0
Decaying learning rate to 9.76563e-07

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 17,    50/  725; acc:   0.00; ppl:   1.99; 5839 src tok/s; 6327 tgt tok/s;      4 s elapsed
Epoch 17,   100/  725; acc:   0.00; ppl:   1.97; 5559 src tok/s; 6005 tgt tok/s;      9 s elapsed
Epoch 17,   150/  725; acc:   0.00; ppl:   1.91; 6232 src tok/s; 6720 tgt tok/s;     14 s elapsed
Epoch 17,   200/  725; acc:   0.00; ppl:   1.90; 6063 src tok/s; 6562 tgt tok/s;     18 s elapsed
Epoch 17,   250/  725; acc:   0.00; ppl:   1.92; 6021 src tok/s; 6536 tgt tok/s;     22 s elapsed
Epoch 17,   300/  725; acc:   0.00; ppl:   1.78; 5835 src tok/s; 6287 tgt tok/s;     27 s elapsed
Epoch 17,   350/  725; acc:   0.00; ppl:   1.78; 6186 src tok/s; 6708 tgt tok/s;     31 s elapsed
Epoch 17,   400/  725; acc:   0.00; ppl:   1.74; 5985 src tok/s; 6430 tgt tok/s;     36 s elapsed
Epoch 17,   450/  725; acc:   0.00; ppl:   1.73; 6229 src tok/s; 6764 tgt tok/s;     40 s elapsed
Epoch 17,   500/  725; acc:   0.00; ppl:   1.69; 6177 src tok/s; 6701 tgt tok/s;     44 s elapsed
Epoch 17,   550/  725; acc:   0.00; ppl:   1.67; 5426 src tok/s; 5893 tgt tok/s;     49 s elapsed
Epoch 17,   600/  725; acc:   0.00; ppl:   1.65; 5722 src tok/s; 6200 tgt tok/s;     54 s elapsed
Epoch 17,   650/  725; acc:   0.00; ppl:   1.64; 6321 src tok/s; 6858 tgt tok/s;     58 s elapsed
Epoch 17,   700/  725; acc:   0.00; ppl:   1.61; 5479 src tok/s; 5925 tgt tok/s;     63 s elapsed
Train perplexity: 1.77762
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59454
Validation accuracy: 0
Decaying learning rate to 4.88281e-07

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 18,    50/  725; acc:   0.00; ppl:   2.00; 5913 src tok/s; 6408 tgt tok/s;      4 s elapsed
Epoch 18,   100/  725; acc:   0.00; ppl:   1.96; 5777 src tok/s; 6241 tgt tok/s;      9 s elapsed
Epoch 18,   150/  725; acc:   0.00; ppl:   1.92; 5771 src tok/s; 6223 tgt tok/s;     14 s elapsed
Epoch 18,   200/  725; acc:   0.00; ppl:   1.91; 5625 src tok/s; 6089 tgt tok/s;     18 s elapsed
Epoch 18,   250/  725; acc:   0.00; ppl:   1.92; 6000 src tok/s; 6513 tgt tok/s;     23 s elapsed
Epoch 18,   300/  725; acc:   0.00; ppl:   1.80; 6140 src tok/s; 6615 tgt tok/s;     27 s elapsed
Epoch 18,   350/  725; acc:   0.00; ppl:   1.78; 6331 src tok/s; 6865 tgt tok/s;     31 s elapsed
Epoch 18,   400/  725; acc:   0.00; ppl:   1.74; 6835 src tok/s; 7342 tgt tok/s;     35 s elapsed
Epoch 18,   450/  725; acc:   0.00; ppl:   1.72; 6173 src tok/s; 6702 tgt tok/s;     39 s elapsed
Epoch 18,   500/  725; acc:   0.00; ppl:   1.69; 5878 src tok/s; 6377 tgt tok/s;     44 s elapsed
Epoch 18,   550/  725; acc:   0.00; ppl:   1.67; 5697 src tok/s; 6188 tgt tok/s;     49 s elapsed
Epoch 18,   600/  725; acc:   0.00; ppl:   1.64; 5435 src tok/s; 5890 tgt tok/s;     53 s elapsed
Epoch 18,   650/  725; acc:   0.00; ppl:   1.64; 6010 src tok/s; 6520 tgt tok/s;     58 s elapsed
Epoch 18,   700/  725; acc:   0.00; ppl:   1.61; 6302 src tok/s; 6815 tgt tok/s;     62 s elapsed
Train perplexity: 1.77876
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59494
Validation accuracy: 0
Decaying learning rate to 2.44141e-07

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 19,    50/  725; acc:   0.00; ppl:   2.00; 6105 src tok/s; 6615 tgt tok/s;      4 s elapsed
Epoch 19,   100/  725; acc:   0.00; ppl:   1.97; 6110 src tok/s; 6600 tgt tok/s;      9 s elapsed
Epoch 19,   150/  725; acc:   0.00; ppl:   1.94; 6490 src tok/s; 6998 tgt tok/s;     13 s elapsed
Epoch 19,   200/  725; acc:   0.00; ppl:   1.90; 6469 src tok/s; 7002 tgt tok/s;     17 s elapsed
Epoch 19,   250/  725; acc:   0.00; ppl:   1.91; 6297 src tok/s; 6835 tgt tok/s;     21 s elapsed
Epoch 19,   300/  725; acc:   0.00; ppl:   1.78; 6506 src tok/s; 7010 tgt tok/s;     25 s elapsed
Epoch 19,   350/  725; acc:   0.00; ppl:   1.78; 5888 src tok/s; 6384 tgt tok/s;     30 s elapsed
Epoch 19,   400/  725; acc:   0.00; ppl:   1.74; 6198 src tok/s; 6658 tgt tok/s;     34 s elapsed
Epoch 19,   450/  725; acc:   0.00; ppl:   1.73; 6191 src tok/s; 6723 tgt tok/s;     38 s elapsed
Epoch 19,   500/  725; acc:   0.00; ppl:   1.69; 6314 src tok/s; 6850 tgt tok/s;     42 s elapsed
Epoch 19,   550/  725; acc:   0.00; ppl:   1.68; 6230 src tok/s; 6767 tgt tok/s;     47 s elapsed
Epoch 19,   600/  725; acc:   0.00; ppl:   1.64; 6079 src tok/s; 6588 tgt tok/s;     51 s elapsed
Epoch 19,   650/  725; acc:   0.00; ppl:   1.65; 6107 src tok/s; 6625 tgt tok/s;     55 s elapsed
Epoch 19,   700/  725; acc:   0.00; ppl:   1.61; 4290 src tok/s; 4640 tgt tok/s;     61 s elapsed
Train perplexity: 1.77923
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59494
Validation accuracy: 0
Decaying learning rate to 1.2207e-07

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 20,    50/  725; acc:   0.00; ppl:   2.00; 3771 src tok/s; 4087 tgt tok/s;      7 s elapsed
Epoch 20,   100/  725; acc:   0.00; ppl:   1.97; 4334 src tok/s; 4682 tgt tok/s;     13 s elapsed
Epoch 20,   150/  725; acc:   0.00; ppl:   1.93; 5777 src tok/s; 6229 tgt tok/s;     18 s elapsed
Epoch 20,   200/  725; acc:   0.00; ppl:   1.90; 5388 src tok/s; 5832 tgt tok/s;     23 s elapsed
Epoch 20,   250/  725; acc:   0.00; ppl:   1.91; 5023 src tok/s; 5453 tgt tok/s;     28 s elapsed
Epoch 20,   300/  725; acc:   0.00; ppl:   1.79; 5298 src tok/s; 5708 tgt tok/s;     33 s elapsed
Epoch 20,   350/  725; acc:   0.00; ppl:   1.78; 5783 src tok/s; 6271 tgt tok/s;     37 s elapsed
Epoch 20,   400/  725; acc:   0.00; ppl:   1.74; 5213 src tok/s; 5601 tgt tok/s;     42 s elapsed
Epoch 20,   450/  725; acc:   0.00; ppl:   1.73; 5338 src tok/s; 5796 tgt tok/s;     48 s elapsed
Epoch 20,   500/  725; acc:   0.00; ppl:   1.69; 5320 src tok/s; 5771 tgt tok/s;     52 s elapsed
Epoch 20,   550/  725; acc:   0.00; ppl:   1.68; 5563 src tok/s; 6042 tgt tok/s;     57 s elapsed
Epoch 20,   600/  725; acc:   0.00; ppl:   1.64; 5293 src tok/s; 5736 tgt tok/s;     62 s elapsed
Epoch 20,   650/  725; acc:   0.00; ppl:   1.64; 5269 src tok/s; 5716 tgt tok/s;     67 s elapsed
Epoch 20,   700/  725; acc:   0.00; ppl:   1.60; 5327 src tok/s; 5761 tgt tok/s;     72 s elapsed
Train perplexity: 1.77865
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59501
Validation accuracy: 0
Decaying learning rate to 6.10352e-08

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 21,    50/  725; acc:   0.00; ppl:   2.00; 5304 src tok/s; 5748 tgt tok/s;      5 s elapsed
Epoch 21,   100/  725; acc:   0.00; ppl:   1.96; 5261 src tok/s; 5683 tgt tok/s;     10 s elapsed
Epoch 21,   150/  725; acc:   0.00; ppl:   1.92; 5310 src tok/s; 5726 tgt tok/s;     15 s elapsed
Epoch 21,   200/  725; acc:   0.00; ppl:   1.89; 5346 src tok/s; 5786 tgt tok/s;     20 s elapsed
Epoch 21,   250/  725; acc:   0.00; ppl:   1.91; 5368 src tok/s; 5827 tgt tok/s;     25 s elapsed
Epoch 21,   300/  725; acc:   0.00; ppl:   1.80; 5473 src tok/s; 5896 tgt tok/s;     30 s elapsed
Epoch 21,   350/  725; acc:   0.00; ppl:   1.78; 5305 src tok/s; 5752 tgt tok/s;     35 s elapsed
Epoch 21,   400/  725; acc:   0.00; ppl:   1.74; 5464 src tok/s; 5870 tgt tok/s;     40 s elapsed
Epoch 21,   450/  725; acc:   0.00; ppl:   1.73; 5402 src tok/s; 5865 tgt tok/s;     44 s elapsed
Epoch 21,   500/  725; acc:   0.00; ppl:   1.70; 5458 src tok/s; 5921 tgt tok/s;     49 s elapsed
Epoch 21,   550/  725; acc:   0.00; ppl:   1.68; 5482 src tok/s; 5954 tgt tok/s;     54 s elapsed
Epoch 21,   600/  725; acc:   0.00; ppl:   1.66; 5325 src tok/s; 5771 tgt tok/s;     59 s elapsed
Epoch 21,   650/  725; acc:   0.00; ppl:   1.64; 5399 src tok/s; 5857 tgt tok/s;     64 s elapsed
Epoch 21,   700/  725; acc:   0.00; ppl:   1.60; 5234 src tok/s; 5660 tgt tok/s;     69 s elapsed
Train perplexity: 1.77863
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59508
Validation accuracy: 0
Decaying learning rate to 3.05176e-08

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 22,    50/  725; acc:   0.00; ppl:   2.00; 5534 src tok/s; 5996 tgt tok/s;      5 s elapsed
Epoch 22,   100/  725; acc:   0.00; ppl:   1.97; 5427 src tok/s; 5862 tgt tok/s;     10 s elapsed
Epoch 22,   150/  725; acc:   0.00; ppl:   1.91; 5490 src tok/s; 5920 tgt tok/s;     14 s elapsed
Epoch 22,   200/  725; acc:   0.00; ppl:   1.91; 5615 src tok/s; 6078 tgt tok/s;     19 s elapsed
Epoch 22,   250/  725; acc:   0.00; ppl:   1.91; 5544 src tok/s; 6018 tgt tok/s;     24 s elapsed
Epoch 22,   300/  725; acc:   0.00; ppl:   1.80; 5599 src tok/s; 6032 tgt tok/s;     29 s elapsed
Epoch 22,   350/  725; acc:   0.00; ppl:   1.78; 5592 src tok/s; 6064 tgt tok/s;     33 s elapsed
Epoch 22,   400/  725; acc:   0.00; ppl:   1.74; 5449 src tok/s; 5854 tgt tok/s;     38 s elapsed
Epoch 22,   450/  725; acc:   0.00; ppl:   1.74; 5592 src tok/s; 6071 tgt tok/s;     43 s elapsed
Epoch 22,   500/  725; acc:   0.00; ppl:   1.70; 5291 src tok/s; 5740 tgt tok/s;     48 s elapsed
Epoch 22,   550/  725; acc:   0.00; ppl:   1.67; 5246 src tok/s; 5698 tgt tok/s;     53 s elapsed
Epoch 22,   600/  725; acc:   0.00; ppl:   1.64; 5331 src tok/s; 5777 tgt tok/s;     58 s elapsed
Epoch 22,   650/  725; acc:   0.00; ppl:   1.64; 5364 src tok/s; 5820 tgt tok/s;     63 s elapsed
Epoch 22,   700/  725; acc:   0.00; ppl:   1.60; 5269 src tok/s; 5698 tgt tok/s;     68 s elapsed
Train perplexity: 1.77845
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59508
Validation accuracy: 0
Decaying learning rate to 1.52588e-08

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 23,    50/  725; acc:   0.00; ppl:   2.01; 5273 src tok/s; 5714 tgt tok/s;      5 s elapsed
Epoch 23,   100/  725; acc:   0.00; ppl:   1.96; 5283 src tok/s; 5707 tgt tok/s;     10 s elapsed
Epoch 23,   150/  725; acc:   0.00; ppl:   1.92; 5429 src tok/s; 5854 tgt tok/s;     15 s elapsed
Epoch 23,   200/  725; acc:   0.00; ppl:   1.91; 5772 src tok/s; 6248 tgt tok/s;     19 s elapsed
Epoch 23,   250/  725; acc:   0.00; ppl:   1.92; 5311 src tok/s; 5765 tgt tok/s;     25 s elapsed
Epoch 23,   300/  725; acc:   0.00; ppl:   1.79; 5607 src tok/s; 6041 tgt tok/s;     29 s elapsed
Epoch 23,   350/  725; acc:   0.00; ppl:   1.78; 5581 src tok/s; 6051 tgt tok/s;     34 s elapsed
Epoch 23,   400/  725; acc:   0.00; ppl:   1.74; 5664 src tok/s; 6084 tgt tok/s;     39 s elapsed
Epoch 23,   450/  725; acc:   0.00; ppl:   1.73; 5565 src tok/s; 6042 tgt tok/s;     43 s elapsed
Epoch 23,   500/  725; acc:   0.00; ppl:   1.70; 5433 src tok/s; 5894 tgt tok/s;     48 s elapsed
Epoch 23,   550/  725; acc:   0.00; ppl:   1.68; 5305 src tok/s; 5762 tgt tok/s;     53 s elapsed
Epoch 23,   600/  725; acc:   0.00; ppl:   1.64; 5321 src tok/s; 5766 tgt tok/s;     58 s elapsed
Epoch 23,   650/  725; acc:   0.00; ppl:   1.63; 5570 src tok/s; 6043 tgt tok/s;     63 s elapsed
Epoch 23,   700/  725; acc:   0.00; ppl:   1.62; 5086 src tok/s; 5500 tgt tok/s;     68 s elapsed
Train perplexity: 1.78003
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59508
Validation accuracy: 0
Decaying learning rate to 7.62939e-09

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 24,    50/  725; acc:   0.00; ppl:   2.00; 5179 src tok/s; 5612 tgt tok/s;      5 s elapsed
Epoch 24,   100/  725; acc:   0.00; ppl:   1.97; 5270 src tok/s; 5693 tgt tok/s;     10 s elapsed
Epoch 24,   150/  725; acc:   0.00; ppl:   1.91; 5621 src tok/s; 6061 tgt tok/s;     15 s elapsed
Epoch 24,   200/  725; acc:   0.00; ppl:   1.90; 5539 src tok/s; 5995 tgt tok/s;     20 s elapsed
Epoch 24,   250/  725; acc:   0.00; ppl:   1.91; 5519 src tok/s; 5991 tgt tok/s;     24 s elapsed
Epoch 24,   300/  725; acc:   0.00; ppl:   1.79; 5305 src tok/s; 5715 tgt tok/s;     29 s elapsed
Epoch 24,   350/  725; acc:   0.00; ppl:   1.79; 5416 src tok/s; 5872 tgt tok/s;     34 s elapsed
Epoch 24,   400/  725; acc:   0.00; ppl:   1.75; 5396 src tok/s; 5797 tgt tok/s;     39 s elapsed
Epoch 24,   450/  725; acc:   0.00; ppl:   1.74; 5136 src tok/s; 5576 tgt tok/s;     44 s elapsed
Epoch 24,   500/  725; acc:   0.00; ppl:   1.69; 5026 src tok/s; 5453 tgt tok/s;     50 s elapsed
Epoch 24,   550/  725; acc:   0.00; ppl:   1.67; 5451 src tok/s; 5920 tgt tok/s;     54 s elapsed
Epoch 24,   600/  725; acc:   0.00; ppl:   1.65; 5437 src tok/s; 5892 tgt tok/s;     59 s elapsed
Epoch 24,   650/  725; acc:   0.00; ppl:   1.63; 5161 src tok/s; 5600 tgt tok/s;     64 s elapsed
Epoch 24,   700/  725; acc:   0.00; ppl:   1.61; 5209 src tok/s; 5633 tgt tok/s;     69 s elapsed
Train perplexity: 1.77821
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59507
Validation accuracy: 0
Decaying learning rate to 3.8147e-09

Loading train dataset from data/m30k.train.1.pt, number of examples: 29000
Epoch 25,    50/  725; acc:   0.00; ppl:   1.99; 5320 src tok/s; 5765 tgt tok/s;      5 s elapsed
Epoch 25,   100/  725; acc:   0.00; ppl:   1.96; 5449 src tok/s; 5886 tgt tok/s;     10 s elapsed
Epoch 25,   150/  725; acc:   0.00; ppl:   1.90; 5350 src tok/s; 5769 tgt tok/s;     15 s elapsed
Epoch 25,   200/  725; acc:   0.00; ppl:   1.90; 5486 src tok/s; 5938 tgt tok/s;     20 s elapsed
Epoch 25,   250/  725; acc:   0.00; ppl:   1.93; 5207 src tok/s; 5653 tgt tok/s;     25 s elapsed
Epoch 25,   300/  725; acc:   0.00; ppl:   1.78; 5414 src tok/s; 5833 tgt tok/s;     30 s elapsed
Epoch 25,   350/  725; acc:   0.00; ppl:   1.78; 5747 src tok/s; 6231 tgt tok/s;     34 s elapsed
Epoch 25,   400/  725; acc:   0.00; ppl:   1.74; 5454 src tok/s; 5860 tgt tok/s;     39 s elapsed
Epoch 25,   450/  725; acc:   0.00; ppl:   1.73; 5452 src tok/s; 5920 tgt tok/s;     44 s elapsed
Epoch 25,   500/  725; acc:   0.00; ppl:   1.70; 5432 src tok/s; 5893 tgt tok/s;     49 s elapsed
Epoch 25,   550/  725; acc:   0.00; ppl:   1.67; 5275 src tok/s; 5730 tgt tok/s;     54 s elapsed
Epoch 25,   600/  725; acc:   0.00; ppl:   1.65; 5331 src tok/s; 5777 tgt tok/s;     59 s elapsed
Epoch 25,   650/  725; acc:   0.00; ppl:   1.64; 5397 src tok/s; 5856 tgt tok/s;     64 s elapsed
Epoch 25,   700/  725; acc:   0.00; ppl:   1.61; 5391 src tok/s; 5830 tgt tok/s;     68 s elapsed
Train perplexity: 1.77785
Train accuracy: 0
Loading valid dataset from data/m30k.valid.1.pt, number of examples: 1014
Validation perplexity: 8.59507
Validation accuracy: 0
Decaying learning rate to 1.90735e-09
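One thing worth noting in the log above: the learning rate is halved after every epoch, so by epoch 25 it has decayed to about 2e-9 and the parameters can barely move, which explains the flat train/validation perplexity (though not the zero accuracy). A quick sanity check of the schedule follows; the initial rate of 1e-3 and the number of halvings are assumptions inferred from the printed values, not read from the training command:

```python
# Reproduce the decay schedule visible in the log: the LR is halved once
# per epoch. Only the halving factor and the printed values come from the
# log; the 1e-3 starting rate is an assumption that makes them line up.
initial_lr = 1e-3
halvings_by_epoch_17 = 11
halvings_by_epoch_25 = 19  # eight more epochs, eight more halvings

lr_epoch_17 = initial_lr * 0.5 ** halvings_by_epoch_17
lr_epoch_25 = initial_lr * 0.5 ** halvings_by_epoch_25

print(lr_epoch_17)  # 4.8828125e-07, matching "Decaying learning rate to 4.88281e-07"
print(lr_epoch_25)  # 1.9073486328125e-09, matching "Decaying learning rate to 1.90735e-09"
```

At rates this small the optimizer steps are effectively zero, so epochs 17 through 25 are not expected to change the model regardless of the accuracy-printing question.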

iacercalixto commented Dec 19, 2019

Hi, I suppose you are referring to the accuracy scores being 0.00 throughout training, while training/validation perplexity seems to be going down as expected.

Have you tried looking into the code to see what might be happening? I don't believe that happened for me. It would help if you could share the exact versions of the libraries you're using (e.g. Python, PyTorch, etc.).

Also, have you looked at the outputs generated by the model at the end of training, for instance with translate.py? I guess the question is: is it an issue with printing the accuracy during training, or is the model not being trained properly?
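One place to start digging: token-level accuracy is normally computed by comparing the argmax predictions against the gold tokens while ignoring padding. Below is a minimal illustrative sketch, not the repository's exact code; the function name is made up, though the pad index of 1 matches the `padding_idx=1` in the model printout above:

```python
import torch

def masked_accuracy(scores, target, pad_idx=1):
    """Token-level accuracy ignoring padding (pad_idx=1 per the model printout).

    scores: (num_tokens, vocab_size) logits or log-probs
    target: (num_tokens,) gold token indices
    """
    pred = scores.argmax(dim=-1)
    non_pad = target.ne(pad_idx)
    n_correct = pred.eq(target).masked_select(non_pad).sum().item()
    n_words = non_pad.sum().item()
    # .item() converts the counters to plain Python numbers; dividing raw
    # LongTensors under older division semantics is a classic way to end
    # up with a constant 0 even when the model is learning.
    return 100.0 * n_correct / n_words

scores = torch.tensor([[0.1, 0.2, 3.0],
                       [2.0, 0.1, 0.2],
                       [0.0, 5.0, 0.0]])
target = torch.tensor([2, 0, 1])  # the last token is padding and is ignored
print(masked_accuracy(scores, target))  # 100.0
```

If the repository's accuracy counter hits a type or division issue like the one noted in the comment, you would see exactly this symptom: accuracy pinned at 0.00 while perplexity decreases normally.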

@Eurus-Holmes added the help wanted (Extra attention is needed) and Todo labels on Jan 12, 2020
@PengboLiu

I also encountered the same problem.

@LakeCarrot

Hi @iacercalixto ,

I encountered the same problem and tried your suggestion. Below is what I got.

(mnmt-env) (base) bo@bo-thinkstation:~/Workspace/multimodal/benchmark-applications/MNMT$ python translate_mm.py -src data/wmt16/Multi30K_DE/test.norm.tok.lc.10000bpe.en -model model_snapshots/${MODEL_SNAPSHOT} -path_to_test_img_feats ./flickr30k_test_vgg19_bn_cnn_features.hdf5 -output model_snapshots/${MODEL_SNAPSHOT}.translations-test2016
Batch size > 1 not implemented! Falling back to batch_size = 1 ...
Building multi-modal model...
Loading model parameters.
/home/bo/Workspace/multimodal/benchmark-applications/MNMT/mnmt-env/lib/python3.6/site-packages/torchtext/data/field.py:321: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  return Variable(arr, volatile=not train), lengths
/home/bo/Workspace/multimodal/benchmark-applications/MNMT/mnmt-env/lib/python3.6/site-packages/torchtext/data/field.py:322: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  return Variable(arr, volatile=not train)
/home/bo/Workspace/multimodal/benchmark-applications/MNMT/onmt/translate/TranslatorMultimodal.py:101: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  def var(a): return Variable(a, volatile=True)
/home/bo/Workspace/multimodal/benchmark-applications/MNMT/onmt/Models.py:626: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  for e in self._all]
/home/bo/Workspace/multimodal/benchmark-applications/MNMT/onmt/modules/GlobalAttention.py:177: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  align_vectors = self.sm(align.view(batch*targetL, sourceL))
/home/bo/Workspace/multimodal/benchmark-applications/MNMT/mnmt-env/lib/python3.6/site-packages/torch/nn/modules/container.py:92: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  input = module(input)
PRED AVG SCORE: -0.4037, PRED PPL: 1.4974

The ppl looks good to me, but I have no idea what the average score is or whether it is reasonable. Do you have any insight into this result?
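As an aside, the UserWarnings in the log come from pre-0.4 PyTorch idioms: `volatile` Variables were removed, and the modern equivalent is wrapping the inference pass in `torch.no_grad()`. A minimal sketch of the replacement (the `Linear` model and input shapes here are placeholders, not the repository's code):

```python
import torch

model = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

# Pre-0.4 idiom that triggers the warnings:
#   out = model(Variable(x, volatile=True))
# Modern replacement: disable autograd for the inference pass.
with torch.no_grad():
    out = model(x)

print(out.requires_grad)  # False
```

The warnings are harmless for correctness (they explicitly say `volatile` now has no effect), but the code paths they flag no longer disable autograd, so inference may use more memory than intended.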

Best,
Bo

@iacercalixto

Hi @LakeCarrot, what you are describing is indeed not a problem. Also, for completeness' sake, I don't think it is related to the issue @Eurus-Holmes opened: the problem raised earlier is that the accuracy printed during training is "0.00".

The perplexity and average negative log-likelihood (score) in your example both look correct. To know whether the translations make sense, check the generated output file. By default that is "pred.txt", unless you called translate_mm.py with a parameter that sets the output to another file name.
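For what it's worth, the two reported numbers are consistent with each other: the perplexity is just the exponential of the negated average per-token log-likelihood, which is a quick way to sanity-check any translation run:

```python
import math

# PRED AVG SCORE is the average log-likelihood per generated token,
# so the corresponding perplexity is exp(-score).
pred_avg_score = -0.4037
ppl = math.exp(-pred_avg_score)
print(ppl)  # ~1.497, consistent with the reported PRED PPL of 1.4974
```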

Best,
Iacer.
