Check failed: startRow + numRows <= getHeight() #1049

sdwhwxt · 2016-12-30T17:38:32Z

Cluster training fails when it reaches the second pass:
Matrix.h:228] Check failed: startRow + numRows <= getHeight()
What usually causes this? Thanks.

wangkuiyi · 2017-01-01T23:14:51Z

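@sdwhwxt Thanks for the report! I'm not familiar with this part of the code. If you can post the full error output (log messages), people who know this code will find it easier to help.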

sdwhwxt · 2017-01-02T06:22:40Z

The full output is just this one line:
F1230 18:27:33.999948 22763 Matrix.h:228] Check failed: startRow + numRows <= getHeight() (6196 vs. 6195)
Thanks

sdwhwxt · 2017-01-22T06:37:55Z (edited by Yancey1989)

Error output when the failure was reproduced:

F0122 08:56:04.545070 17508 Matrix.h:228] Check failed: startRow + numRows <= getHeight() (5708 vs. 5707) 
*** Check failure stack trace: ***
    @          0x1475638  google::LogMessage::Fail()
    @          0x1475590  google::LogMessage::SendToLog()
    @          0x1475025  google::LogMessage::Flush()
    @          0x1477de6  google::LogMessageFatal::~LogMessageFatal()
    @           0x780abe  paddle::Matrix::subMatrix()
    @           0x663569  paddle::SequenceLastInstanceLayer::forward()
    @           0x70b634  paddle::NeuralNetwork::forward()
    @           0x7015cf  paddle::TrainerThread::forward()
    @           0x7027bc  paddle::TrainerThread::computeThread()
    @     0x7f19bc2fe8a0  execute_native_thread_routine
    @     0x7f19bcd841c3  start_thread
    @     0x7f19bba6f12d  __clone
./train.sh: line 207: 16026 Aborted                 (core dumped) PYTHONPATH=./paddle:$PYTHONPATH GLOG_logtostderr=0 GLOG_log_dir="./log" ./paddle_trainer --num_gradient_servers=${OMPI_COMM_WORLD_SIZE} --trainer_id=${OMPI_COMM_WORLD_RANK} --pservers=$ipstring --rdma_tcp=${rdma_tcp} --nics=${nics} ${train_arg} --config=conf/trainer_config.conf --save_dir=./${save_dir} ${extern_arg}
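The trace shows the check firing inside `paddle::Matrix::subMatrix()`, which is called from `paddle::SequenceLastInstanceLayer::forward()`. The Python sketch below is a hedged model of that interaction and is not Paddle's source. It assumes the layer picks one input row per sequence from the sequence start offsets and reads that row as a one-row sub-matrix. For a first-instance layer the row is the sequence's start row. Otherwise it is the row before the next start. `check_sub_matrix` and `select_instance_rows` are hypothetical names. Under these assumptions, any selected row equal to the matrix height produces the "(height + 1 vs. height)" message seen in this issue. Two ways to get there are offsets that overrun the data, or a final sequence that is empty.

```python
from typing import List


def check_sub_matrix(start_row: int, num_rows: int, height: int) -> None:
    """Mimic the Matrix.h:228 precondition; glog's CHECK_LE prints both operands."""
    if start_row < 0:
        # Sketch-only guard: reject negative rows explicitly.
        raise ValueError(f"negative row index {start_row}")
    if not start_row + num_rows <= height:
        raise ValueError(
            "Check failed: startRow + numRows <= getHeight() "
            f"({start_row + num_rows} vs. {height})"
        )


def select_instance_rows(seq_starts: List[int], height: int, select_first: bool) -> List[int]:
    """Pick one input row per sequence, as a first/last-instance layer is assumed to.

    seq_starts[i] is the first row of sequence i; seq_starts[-1] is meant to
    equal the number of rows (height) of the input matrix.
    """
    rows = []
    for i in range(len(seq_starts) - 1):
        # First instance: the start row. Last instance: the row before the next start.
        row = seq_starts[i] if select_first else seq_starts[i + 1] - 1
        # Each selected row is read as a one-row sub-matrix, so it is bounds-checked.
        check_sub_matrix(row, 1, height)
        rows.append(row)
    return rows


if __name__ == "__main__":
    height = 5707
    print(select_instance_rows([0, 3, height], height, select_first=False))  # [2, 5706]
    # An empty trailing sequence makes the first-instance row equal to height.
    try:
        select_instance_rows([0, 3, height, height], height, select_first=True)
    except ValueError as err:
        print(err)  # ... (5708 vs. 5707)
```

In this model, the sum in the message is always the selected row plus one. So "(6196 vs. 6195)" and "(5708 vs. 5707)" both mean the layer asked for the row just past the end of its input.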

Network configuration:

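from paddle.trainer_config_helpers import *

term_seq_data = data_layer("term_seq", 63736)
med_feat_data = data_layer("med_ent_feat", 7244)
emb_term = embedding_layer(input=term_seq_data, size=128)
# With return_seq left at its default (False), bidirectional_lstm reduces each
# sequence to one step via last_seq/first_seq. These are SequenceLastInstanceLayer
# instances, the layer that appears in the stack trace above.
bi_lstm_term = bidirectional_lstm(input=emb_term, size=128, concat_act=TanhActivation())
bi_lstm_term_out = fc_layer(input=bi_lstm_term, size=128, act=TanhActivation())

med_hid_layer = fc_layer(input=med_feat_data, size=128, act=SoftReluActivation())
output = fc_layer(input=[bi_lstm_term_out, med_hid_layer], size=4, act=SoftmaxActivation())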
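The failing layer consumes the `term_seq` input, so one cheap first step is to scan the training data for malformed sequences before they reach `paddle_trainer`. The sketch below hypothetically assumes that each sample's `term_seq` can be read as a Python list of integer ids. `find_bad_term_seqs` is an illustrative name, not a Paddle API. The sketch flags two problems: empty sequences, and ids outside `[0, 63736)`, which is the range implied by `data_layer("term_seq", 63736)`. This only rules out two possible causes and does not diagnose the crash by itself.

```python
from typing import Iterable, List, Sequence, Tuple

TERM_VOCAB_SIZE = 63736  # size passed to data_layer("term_seq", ...)


def find_bad_term_seqs(term_seqs: Iterable[Sequence[int]]) -> List[Tuple[int, str]]:
    """Return (sample index, reason) for every term sequence that looks malformed."""
    problems: List[Tuple[int, str]] = []
    for idx, ids in enumerate(term_seqs):
        if len(ids) == 0:
            # An empty sequence has no timesteps to select a first or last row from.
            problems.append((idx, "empty sequence"))
        elif any(not 0 <= t < TERM_VOCAB_SIZE for t in ids):
            # The embedding table has only TERM_VOCAB_SIZE rows.
            problems.append((idx, "id out of range"))
    return problems


if __name__ == "__main__":
    samples = [[5, 17, 42], [], [63736], [0]]
    for idx, reason in find_bad_term_seqs(samples):
        print(idx, reason)  # "1 empty sequence", then "2 id out of range"
```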

Yancey1989 · 2017-07-28T09:58:00Z

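This looks like a fairly old issue, so I'm closing it for now. If you still hit this with a newer version of Paddle, please feel free to report it again. Thanks :)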


Yancey1989 closed this as completed Jul 28, 2017

zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this issue Sep 25, 2019

remove redundant command argument for training using launch.py (Paddl…

1607e56

…ePaddle#1049) remove command argument train.py from the document, which is added by mistake.

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024

Fix distiller (PaddlePaddle#1049)

0bf8a1d

* fix distiller * fix distiller * fix distiller * demo imagenet * demo imagenet
