I also posted this in the Chainer repository (chainer/chainer#4709), but I think the issue may fit better here.
My problem: if I misuse MultiprocessParallelUpdater so that an exception is thrown in a worker, the Python process does not exit; it just hangs.
The code to reproduce is at the bottom. It raises cupy.cuda.runtime.CUDARuntimeError: cudaErrorInitializationError: initialization error, but the Python process does not exit.
I'd like the process to exit when this happens, but I am not sure how to make it do so. Is this possible?
Thank you.
Conditions
Chainer version: 4.0.0
CuPy version: 4.0.0
OS/Platform: Ubuntu 16.04
CUDA/cuDNN version: 9.0
Code to reproduce
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training


class MLP(chainer.Chain):

    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            # the size of the inputs to each layer will be inferred
            self.l1 = L.Linear(None, n_units)  # n_in -> n_units
            self.l2 = L.Linear(None, n_units)  # n_units -> n_units
            self.l3 = L.Linear(None, n_out)  # n_units -> n_out

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)


def train():
    train, test = chainer.datasets.get_mnist()
    batch_size = 64
    learning_rate = 0.05
    model = L.Classifier(MLP(1000, 10))
    optimizer = chainer.optimizers.MomentumSGD(learning_rate)
    optimizer.setup(model)
    optimizer.add_hook(chainer.optimizer.WeightDecay(5e-4))

    # Set up a trainer
    num_gpus = 2
    devices = range(num_gpus)

    # this is just to force the error
    chainer.cuda.get_device_from_id(0).use()

    # one iterator per device, each over a random split of the training set
    train_iters = [
        chainer.iterators.MultiprocessIterator(i, batch_size, n_processes=num_gpus)
        for i in chainer.datasets.split_dataset_n_random(train, len(devices))]
    updater = training.updaters.MultiprocessParallelUpdater(train_iters, optimizer, devices=range(num_gpus))
    updater.setup_workers()


if __name__ == "__main__":
    train()
Stacktrace:
algo-1_1 | Process _Worker-1:
algo-1_1 | Traceback (most recent call last):
algo-1_1 | File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
algo-1_1 | self.run()
algo-1_1 | File "/usr/local/lib/python3.5/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 45, in run
algo-1_1 | dev.use()
algo-1_1 | File "cupy/cuda/device.pyx", line 101, in cupy.cuda.device.Device.use
algo-1_1 | File "cupy/cuda/device.pyx", line 107, in cupy.cuda.device.Device.use
algo-1_1 | File "cupy/cuda/runtime.pyx", line 184, in cupy.cuda.runtime.setDevice
algo-1_1 | File "cupy/cuda/runtime.pyx", line 136, in cupy.cuda.runtime.check_status
algo-1_1 | cupy.cuda.runtime.CUDARuntimeError: cudaErrorInitializationError: initialization error
I had a problem similar to this one, but with mpi4py / chainermn, so the answer might be similar: chainer/chainermn#236
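To make the request concrete, here is a minimal sketch of the behaviour I am after. It uses only the standard multiprocessing module, not Chainer or CuPy. The names run_workers, failing_worker and idle_worker are hypothetical, and the "fork" start method is an assumption about a Linux setup like mine. The parent polls the exit codes of its workers and, as soon as one exits with a non-zero code, terminates the rest and raises instead of waiting forever. This is not how MultiprocessParallelUpdater is implemented; it only illustrates the fail-fast pattern.

```python
import multiprocessing
import time


def failing_worker():
    # Hypothetical stand-in for a worker whose device setup raises.
    raise RuntimeError("initialization error")


def idle_worker():
    # Hypothetical stand-in for a healthy worker that would otherwise
    # keep waiting for messages from the parent.
    time.sleep(3600)


def run_workers(targets, poll_interval=0.1):
    """Start one process per target; abort all of them if any one fails."""
    # Assumption: POSIX "fork" start method (not available on Windows).
    ctx = multiprocessing.get_context("fork")
    procs = [ctx.Process(target=t) for t in targets]
    for p in procs:
        p.start()
    try:
        while True:
            # exitcode is None while a process is still running.
            codes = [p.exitcode for p in procs]
            failed = [c for c in codes if c not in (None, 0)]
            if failed:
                raise RuntimeError("worker exited with code %d" % failed[0])
            if all(c == 0 for c in codes):
                return
            time.sleep(poll_interval)
    finally:
        # Never leave surviving workers behind, whatever the outcome.
        for p in procs:
            if p.is_alive():
                p.terminate()
            p.join()
```

With this pattern, calling run_workers([failing_worker, idle_worker]) raises a RuntimeError in the parent shortly after the first worker dies, so a script could catch it and exit with a non-zero status rather than hang.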