RuntimeError (matrix and matrix expected) while training #20

Closed
erayyildiz opened this issue Nov 24, 2017 · 6 comments

Comments

@erayyildiz

Hello, thanks for making this tool available.

I am using an IBM PowerPC machine with Ubuntu 16.03, and I am getting an error while trying to train a POS tagging model. Here is the command I am running:
python train_wc.py --train_file cmc_test_twitter.txt --dev_file cmc_test_twitter.txt --test_file cmc_test_twitter.txt --eva_matrix a --checkpoint ./checkpoint/pos_ --lr 0.015 --caseless --fine_tune --high_way --co_train

And here is the output:

train
setting:
Namespace(batch_size=10, caseless=True, char_dim=30, char_hidden=300, char_layers=1, checkpoint='./checkpoint/pos_', clip_grad=5.0, co_train=True, dev_file='empirist_gold_cmc/tagged/cmc_test_twitter.txt', drop_out=0.5, emb_file='./embedding/glove.6B.100d.txt', epoch=200, eva_matrix='a', fine_tune=False, gpu=0, high_way=True, highway_layers=1, lambda0=1, least_iters=50, load_check_point='', load_opt=False, lr=0.015, lr_decay=0.05, mini_count=5, momentum=0.9, patience=15, rand_embedding=False, shrink_embedding=False, small_crf=True, start_epoch=0, test_file='empirist_gold_cmc/tagged/cmc_test_twitter.txt', train_file='empirist_gold_cmc/tagged/cmc_test_twitter.txt', unk='unk', update='sgd', word_dim=100, word_hidden=300, word_layers=1)
loading corpus
constructing coding table
feature size: '44'
loading embedding
embedding size: '400005'
constructing dataset
building model
device: 0
Traceback (most recent call last):       
  File "train_wc.py", line 188, in <module>
    scores = ner_model(f_f, f_p, b_f, b_p, w_f)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eray/projects/LM-LSTM-CRF/model/lm_lstm_crf.py", line 235, in forward
    char_out = self.fb2char(fb_lstm_out)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eray/projects/LM-LSTM-CRF/model/highway.py", line 53, in forward
    g = nn.functional.sigmoid(self.gate[0](x))
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 54, in forward
    return self._backend.Linear.apply(input, self.weight, self.bias)
  File "/home/eray/anaconda3/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 12, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at /home/eray/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:237
@LiyuanLucasLiu
Owner

I'm not sure what happened, but it seems that fb_lstm_out may not have the expected shape. My guess is that it could be caused by an "illegal input format".

Hope it can help you :-)

@erayyildiz
Author

erayyildiz commented Nov 25, 2017

Hi Liyuan, thanks for your reply, but I don't think the problem is with my data. I am using a sample POS tagging dataset in the CoNLL format described in the README file (word tag\n).

The shape of fb_lstm_out is [13, 10, 600]. 600 is the output size of the bidirectional LSTM layer and 10 is the batch size I am using. I think 13 is the number of words. Is this shape correct? Do you have any advice on what else I should check?
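
A hedged aside on the shape question: the traceback ends in addmm_, a matrix-matrix multiply that takes 2-D operands, while fb_lstm_out is 3-D. Whether that mismatch is the actual cause here is not confirmed in this thread. The sketch below assumes a PyTorch build whose nn.Linear might only accept 2-D input, and works around it by flattening the leading dimensions before the layer and restoring them afterwards. The helper name apply_linear_2d and the random tensor are hypothetical illustrations, not part of LM-LSTM-CRF.

```python
import torch
import torch.nn as nn


def apply_linear_2d(layer, x):
    """Apply `layer` to the last dimension of `x` through a 2-D input only.

    Hypothetical helper for illustration; not part of LM-LSTM-CRF.
    """
    lead_shape = x.shape[:-1]               # e.g. (13, 10)
    flat = x.reshape(-1, x.shape[-1])       # e.g. (130, 600), a plain matrix
    out = layer(flat)                       # 2-D in, 2-D out
    return out.reshape(*lead_shape, out.shape[-1])


# Stand-in tensor with the shape reported in this thread (values are random).
fb_lstm_out = torch.randn(13, 10, 600)
gate = nn.Linear(600, 600)

out = apply_linear_2d(gate, fb_lstm_out)
print(tuple(out.shape))
```

If the flattened call succeeds where the direct 3-D call fails, that would point to the installed PyTorch version rather than to the data.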

@LiyuanLucasLiu
Owner

It seems correct. Have you tried the default datasets, i.e. WSJ / CoNLL03 / CoNLL00 (#9)? I think the first thing we should do is make sure it's not caused by the runtime environment.

@erayyildiz
Author

I tried the default dataset and got the same error again. I think it is about my environment. I have an IBM PowerAI machine whose CPU architecture is powerpc64le, which is quite different from Intel CPU architectures. Although I installed PyTorch through Anaconda for PowerPC, there still seem to be some environment problems with PyTorch.

@LiyuanLucasLiu
Owner

Are you using the CPU for training? If so, you should set --gpu -1.
I don't think the architecture itself would be a big issue, but the dependent libraries may have some problems.
Besides, I would recommend reinstalling PyTorch or Anaconda.
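
Before reinstalling, it may help to record what the current environment looks like, so the state before and after can be compared. The following is a minimal sketch under the assumption that only the Python version, the CPU architecture, and the PyTorch build matter here. The function name describe_environment is hypothetical, and the torch import is guarded so the script still runs if PyTorch is missing or broken.

```python
import platform
import sys


def describe_environment():
    """Collect basic facts about the runtime (hypothetical helper)."""
    info = {
        "python": sys.version.split()[0],
        "machine": platform.machine(),   # CPU architecture as reported by the OS
        "system": platform.system(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this interpreter.
        info["torch"] = None
        info["cuda_available"] = None
    else:
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


env = describe_environment()
for key in sorted(env):
    print("{}: {}".format(key, env[key]))
```

Posting this output alongside the traceback makes it easier to tell whether the PyTorch build or its dependent libraries are involved.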

@erayyildiz
Author

I tried both CPU and GPU training and it made no difference. I will try reinstalling PyTorch and Anaconda and let you know the results. Thanks.
