op cudnn_lstm does not have kernel for data_type[double]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN] at [D:\1.6.1\paddle\paddle\fluid\framework\operator.cc:1007] #21122

Closed
Wahaha1314 opened this issue Nov 12, 2019 · 28 comments

@Wahaha1314

Wahaha1314 commented Nov 12, 2019

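  • Version and environment information:
       1) PaddlePaddle version: 1.6.1
       2) GPU: CUDA 9.0, cuDNN 7.6 (if using GPU for prediction)
       3) System environment: Windows 10, Python 3.6
  • Training information:
       1) Single machine, single GPU
  • Partial reproduction information (train_X and train_Y are my own dataset; their shapes are printed below):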
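batch_size = 60        # batch size (samples per batch)
max_len = 10           # input_dim must not exceed this value
dropout_prob = 0.2
seq_len = 20           # time-series length
hidden_size = 6        # number of hidden units
num_layers = 1         # number of stacked LSTM layers
learning_rate = 0.002  # learning rate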

import paddle.fluid as fluid

data1 = fluid.data(name='data1', shape=[-1, 20, 6], dtype='float64')
label = fluid.data(name='label', shape=[-1, 1], dtype='float64')

print(data1)
init_h = fluid.layers.fill_constant([num_layers, batch_size, hidden_size], 'float64', 0.0)
init_c = fluid.layers.fill_constant([num_layers, batch_size, hidden_size], 'float64', 0.0)

rnn_out, last_h, last_c = fluid.layers.lstm(data1, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob)
print(last_h)
last_result = last_h[0, :, :]
print(last_result)
out = fluid.layers.fc(input=last_result, size=1, act=None)

# Define the loss function
cost = fluid.layers.square_error_cost(input=out, label=label)
avg_cost = fluid.layers.mean(cost)

# Get the test (inference) program
test_program = fluid.default_main_program().clone(for_test=True)

# Define the optimizer
optimizer = fluid.optimizer.AdagradOptimizer(learning_rate)
opt = optimizer.minimize(avg_cost)

# Create an executor (training on CPU is fairly slow)
# place = fluid.CPUPlace()
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)

# Initialize parameters
exe.run(fluid.default_startup_program())

def reader_createor(data, label):
    def reader():
        for i in range(len(data)):
            yield data[i, :], label[i]
    return reader

train_reader = fluid.io.batch(fluid.io.shuffle(reader_createor(train_X, train_Y), 3000), batch_size=60, drop_last=True)

# Load test data
# train_reader = fluid.io.batch(fluid.io.shuffle(train_X, 3000), batch_size=60, drop_last=True)

feeder = fluid.DataFeeder(place=place, feed_list=[data1, label])

# Start training
for pass_id in range(1000):
    # Run one training pass
    train_cost = 0
    for batch_id, data in enumerate(train_reader()):
        train_cost = exe.run(program=fluid.default_main_program(),
                             feed=feeder.feed(data),
                             fetch_list=[avg_cost])

        if batch_id % 20 == 0:
            print('Pass:%d, Batch:%d, Cost:%0.5f' % (pass_id, batch_id, train_cost[0]))
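  • Problem description (please describe the problem in detail and include the error message, logs, and a reproducible code snippet):
    The first part below is some program output; the rest is the error message. (I also tried switching the type to float32, but then it fails with PaddleCheckError: CUDNN_STATUS_BAD_PARAM at [D:\1.6.1\paddle\paddle\fluid\operators\cudnn_lstm_op.cu.cc:113]!)
    Shape of train_x1: (2980, 20)
    Shape of train_Y: (2980, 1)
    Shape of train_XT: (6, 2980, 20)
    Shape of train_X: (2980, 20, 6)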
    name: "data1"
    type {
    type: LOD_TENSOR
    lod_tensor {
    tensor {
    data_type: FP64
    dims: -1
    dims: 20
    dims: 6
    }
    lod_level: 0
    }
    }
    persistable: false
    need_check_feed: true

name: "cudnn_lstm_0.tmp_1"
type {
type: LOD_TENSOR
lod_tensor {
tensor {
data_type: FP64
dims: 1
dims: 60
dims: 6
}
}
}
persistable: false

name: "cudnn_lstm_0.tmp_1_slice_0"
type {
type: LOD_TENSOR
lod_tensor {
tensor {
data_type: FP64
dims: 60
dims: 6
}
}
}

C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\executor.py:774: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "D:\eclipse-workspace\pd_pd\pd_lstm\pd_lstm3.py", line 181, in <module>
    fetch_list=[avg_cost])
  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\executor.py", line 775, in run
    six.reraise(*sys.exc_info())
  File "E:\Anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\executor.py", line 770, in run
    use_program_cache=use_program_cache)
  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\executor.py", line 817, in _run_impl
    use_program_cache=use_program_cache)
  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\executor.py", line 894, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

Windows not support stack backtrace yet.


Python Call Stacks (More useful to users):

  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\framework.py", line 2459, in append_op
    attrs=kwargs.get("attrs", None))
  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "C:\Users\17720\AppData\Roaming\Python\Python36\site-packages\paddle\fluid\layers\nn.py", line 1018, in lstm
    'seed': seed,
  File "D:\eclipse-workspace\pd_pd\pd_lstm\pd_lstm3.py", line 129, in <module>
    rnn_out, last_h, last_c = fluid.layers.lstm(data1, init_h, init_c, max_len, hidden_size, num_layers, dropout_prob=dropout_prob)


Error Message Summary:

PaddleCheckError: op cudnn_lstm does not have kernel for data_type[double]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN] at [D:\1.6.1\paddle\paddle\fluid\framework\operator.cc:1007]
[operator < cudnn_lstm > error]
W1112 09:50:24.788756 14520 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.1, Runtime API Version: 9.0
W1112 09:50:24.795739 14520 device_context.cc:243] device: 0, cuDNN Version: 7.6.
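The reproduction builds `init_h` and `init_c` with a fixed `batch_size` in their second dimension (`[num_layers, batch_size, hidden_size]`), so it relies on every fed batch containing exactly `batch_size` samples; `drop_last=True` in `fluid.io.batch` is what guarantees that. The pure-Python sketch below illustrates the effect on the 2980 samples from the log. It is a simplified stand-in, not Paddle's implementation: `batch_with_drop_last` is a hypothetical helper, shuffling is omitted, and integer lists replace the real arrays.

```python
def reader_creator(data, label):
    """Yield one (sample, label) pair at a time, modeled on the issue's reader."""
    def reader():
        for i in range(len(data)):
            yield data[i], label[i]
    return reader


def batch_with_drop_last(reader, batch_size):
    """Hypothetical, simplified stand-in for fluid.io.batch(..., drop_last=True)."""
    def batched_reader():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        # A trailing partial batch is simply never yielded (drop_last=True).
    return batched_reader


batch_size = 60
num_samples = 2980  # sample count printed in the issue's log
data = list(range(num_samples))
labels = [0.0] * num_samples

batches = list(batch_with_drop_last(reader_creator(data, labels), batch_size)())
print(len(batches), {len(b) for b in batches})  # 49 {60}
print(num_samples - len(batches) * batch_size)  # 40 samples dropped
```

Every batch that reaches the executor therefore has the same leading size as the one baked into `init_h`/`init_c`; if `batch_size` is changed, it has to be changed consistently in both places.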

@ceci3
Contributor

ceci3 commented Nov 12, 2019

The lstm op currently only supports float32 data; try converting your input data to float32.
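A minimal sketch of the data-side change, assuming `train_X` and `train_Y` are NumPy `float64` arrays with the shapes printed in the log (random arrays stand in for the real dataset here). In the reproduction, the `fluid.data` declarations for `data1` and `label` and the `fill_constant` calls for `init_h` and `init_c` would presumably also need `'float32'` instead of `'float64'` so that all inputs to the LSTM share one type.

```python
import numpy as np

# Stand-ins for the user's dataset; shapes match the printed log.
train_X = np.random.rand(2980, 20, 6)  # float64 by default
train_Y = np.random.rand(2980, 1)      # float64 by default

# Cast features and labels to float32 before building the reader,
# so the fed arrays match dtype='float32' declarations.
train_X = train_X.astype(np.float32)
train_Y = train_Y.astype(np.float32)

print(train_X.dtype, train_X.shape)  # float32 (2980, 20, 6)
print(train_Y.dtype, train_Y.shape)  # float32 (2980, 1)
```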

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@ceci3
Contributor

ceci3 commented Nov 12, 2019

Try a smaller batch size. If that still doesn't work, please post the full error message.

ceci3 self-assigned this Nov 12, 2019
@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@ceci3
Contributor

ceci3 commented Nov 12, 2019

(Quoting Wahaha1314's email reply:) Do you need my full code and dataset so you can debug it more easily? I've been working on this for two days and still haven't solved it.

OK. I still don't see any attachment; it probably needs to be uploaded here on GitHub.

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@ceci3
Contributor

ceci3 commented Nov 12, 2019

[Screenshot]

Still nothing... This is what I can see of the reply you said had the attachment.

@ceci3
Contributor

ceci3 commented Nov 12, 2019

You can upload attachments directly in a comment on this issue from a computer.

@Wahaha1314
Author

Wahaha1314 commented Nov 12, 2019 via email

@Wahaha1314
Author

pd_lstm.zip

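Sorry, I had been replying by email and put the files in email attachments; I didn't notice they had to be uploaded here. I'm uploading them for you now.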

@Wahaha1314
Author

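The previous comment has the attachment; did you get it? Sorry for all the trouble. This is my first time asking a question on GitHub, so I'm a bit clueless, haha.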

@ceci3
Contributor

ceci3 commented Nov 12, 2019

Got it~ 😁

@Wahaha1314
Author

Is there a solution yet? Or did something else go wrong?

@ceci3
Contributor

ceci3 commented Nov 12, 2019

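Running it here I do get an error, but I haven't yet reproduced what you described above: the shape information still prints, but nothing is printed for the lstm step or the final average loss, and the program stops quickly, after about 10 seconds.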

@ceci3
Contributor

ceci3 commented Nov 12, 2019

As a first step, you can add fluid.layers.Print after the lstm and after the average-loss computation to check whether any results get printed.

@Wahaha1314
Author

Nothing gets printed at all. Really frustrating.

@Wahaha1314
Author

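The program is quite simple, so I don't think it can be some other error; it seems the program just dies as soon as it runs the lstm statement.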

@Wahaha1314
Author

Wahaha1314 commented Nov 13, 2019 via email

@ceci3
Contributor

ceci3 commented Nov 13, 2019

Try adding data1.stop_gradient=False right after the definition of data1 in your code.

@Wahaha1314
Author

Wahaha1314 commented Nov 13, 2019 via email

@Wahaha1314
Author

Wahaha1314 commented Nov 13, 2019 via email

@ceci3
Contributor

ceci3 commented Nov 13, 2019

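stop_gradient marks whether a variable needs its gradient computed. We generally assume the data input layer does not need gradients, so data input layers default to stop_gradient=True, and you have to set it to False manually to indicate that its gradient should be computed. Because your network's input layer feeds directly into the lstm, and all of the lstm's parameters need gradients propagated back, you need to add stop_gradient=False manually.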

@Wahaha1314
Author

Wahaha1314 commented Nov 13, 2019 via email

@ceci3
Contributor

ceci3 commented Nov 13, 2019

If there are no other problems, I'll close this issue for now~

ceci3 closed this as completed Nov 13, 2019