Sample code error on GPU: RuntimeError: cublasSgemm(dev.cublas_handle, #1582

Open

yang182 opened this issue Aug 15, 2019 · 1 comment

yang182 commented Aug 15, 2019

import numpy as np
import dynet_config
dynet_config.set_gpu()

import dynet as dy
from optparse import OptionParser
parser = OptionParser()
parser.add_option("--dynet-gpu", action="store_true")
class OurNetwork(object):
    def __init__(self, pc):
        self.pW = pc.add_parameters((10, 30))
        self.pB = pc.add_parameters(10)
        self.lookup = pc.add_lookup_parameters((500, 10))

    def __call__(self, inputs):
        lookup = self.lookup
        emb_vectors = [lookup[i] for i in inputs]
        net_input = dy.concatenate(emb_vectors)
        net_output = dy.softmax((self.pW * net_input) + self.pB)
        return net_output

    def create_network_return_loss(self, inputs, expected_output):
        dy.renew_cg()
        out = self(inputs)
        loss = -dy.log(dy.pick(out, expected_output))
        return loss

    def create_network_return_best(self, inputs):
        dy.renew_cg()
        out = self(inputs)
        return np.argmax(out.npvalue())

dy.init()
dy.renew_cg()

m = dy.Model()

network = OurNetwork(m)

trainer = dy.SimpleSGDTrainer(m)

for epoch in range(50):
    for inp, lbl in (([1, 2, 3], 1), ([3, 2, 4], 2)):
        loss = network.create_network_return_loss(inp, lbl)
        loss.value()
        loss.backward()
        trainer.update()
        print(loss.value())  # need to run loss.value() for the forward prop

print('Predicted smallest element among {} is {}:'.format([1,2,3], network.create_network_return_best([1,2,3])))

========================== output ==========================
[dynet] initializing CUDA
[dynet] CUDA driver/runtime versions are 10.0/10.0
Request for 1 GPU ...
[dynet] Device Number: 0
[dynet] Device name: GeForce RTX 2080 Ti
[dynet] Memory Clock Rate (KHz): 7000000
[dynet] Memory Bus Width (bits): 352
[dynet] Peak Memory Bandwidth (GB/s): 616
[dynet] Memory Free (GB): 1.65767/11.5229
[dynet]
[dynet] Device(s) selected: 0
[dynet] random seed: 3237171193
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
WARNING: Attempting to initialize dynet twice. Ignoring duplicate initialization.
CUBLAS failure in cublasSgemm(dev.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, y.d.rows(), y.d.cols() * y.d.batch_elems(), l.d.cols(), dev.kSCALAR_ONE, l.v, l.d.rows(), r.v, r.d.rows(), acc_scalar, y.v, y.d.rows())
13
Traceback (most recent call last):
File "/data/jupyter/hongchao/text2structure/model2note/model/test_dy_gpu.py", line 57, in
loss.value()
File "_dynet.pyx", line 769, in _dynet.Expression.value
File "_dynet.pyx", line 783, in _dynet.Expression.value
RuntimeError: cublasSgemm(dev.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, y.d.rows(), y.d.cols() * y.d.batch_elems(), l.d.cols(), dev.kSCALAR_ONE, l.v, l.d.rows(), r.v, r.d.rows(), acc_scalar, y.v, y.d.rows())

Process finished with exit code 1
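
For context: cuBLAS status 13 is CUBLAS_STATUS_EXECUTION_FAILED, and the log above also shows only ~1.66 GB of the 11.5 GB device memory free, plus an "Attempting to initialize dynet twice" warning. Below is a minimal sketch of the usual initialization pattern for comparison; whether the duplicate init or the low free memory is actually behind the cublasSgemm failure is unverified, so treat this as an assumption to test rather than a fix.

import dynet_config
dynet_config.set_gpu()   # configuration must happen before "import dynet"
import dynet as dy       # the import itself initializes DyNet with the config above
# No explicit dy.init() afterwards: calling it again triggers the
# "Attempting to initialize dynet twice" warning seen in the log.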

cydur (Contributor) commented Feb 19, 2021

I've got a similar problem using C++ with CUDA 10.2.
In my case the forward step works fine and I get a loss value that looks OK, but in the backward step there is an exception saying:
** On entry to SGEMM parameter number 8 had an illegal value
CUBLAS failure in cublasSgemm(dev.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_T, y.d.rows(), y.d.cols(), l.d.cols() * l.d.batch_elems(), dev.kSCALAR_ONE, l.v, l.d.rows(), r.v, r.d.rows(), dev.kSCALAR_ONE, y.v, y.d.rows())
7

This only occurs with dynet-autobatch turned on. Without autobatch it works fine.

Any ideas, anyone?
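
For reference: the "7" after the failure line above is cuBLAS status CUBLAS_STATUS_INVALID_VALUE, consistent with the "parameter number 8 had an illegal value" message. A minimal sketch of how autobatching is toggled from the Python binding follows; the C++ binding reads the same --dynet-autobatch command-line flag via dynet::initialize. Disabling it is only a workaround, not a fix, and the exact dynet_config.set keywords are an assumption taken from the DyNet Python docs.

import dynet_config
dynet_config.set(autobatch=False)  # set True to reproduce; must run before "import dynet"
import dynet as dy
# Equivalently, launch an unmodified script with:
#   python script.py --dynet-autobatch 0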
