When in_size=None is used in Linear and the link is not used, an error occurs #283

Closed
shu65 opened this issue Aug 21, 2018 · 0 comments

shu65 commented Aug 21, 2018

An error occurs when a Linear link is created with in_size=None but is never used in the forward computation.

Model for error check:

import chainer
import chainer.functions as F
import chainer.links as L


class MLP(chainer.Chain):

    def __init__(self, n_units, n_out):
        super(MLP, self).__init__(
            # in_size=None: the input size is inferred on the first call,
            # but this link is never called in __call__ below
            l=L.Linear(None, n_units),  # <== added

            l1=L.Linear(784, n_units),  # n_in -> n_units
            l2=L.Linear(n_units, n_units),  # n_units -> n_units
            l3=L.Linear(n_units, n_out),  # n_units -> n_out
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
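
One possible workaround, which is not necessarily the fix that was eventually adopted, would be to skip parameters that are still uninitialized before broadcasting them. The sketch below illustrates the idea in plain Python. `LazyParam` and `initialized_only` are hypothetical names made up for this illustration; they are not part of chainer or chainermn, and the parameter names and sizes are only examples.

```python
# Hypothetical stand-in for a lazily initialized parameter.
# This is NOT chainer's Parameter class, only an illustration.
class LazyParam:
    def __init__(self, name, size=None):
        self.name = name
        # data stays None until the size is known (mimics in_size=None)
        self.data = [0.0] * size if size is not None else None

    def initialize(self, size):
        # allocate the data once the input size has been observed
        self.data = [0.0] * size


def initialized_only(params):
    """Return only the parameters whose data has been allocated."""
    return [p for p in params if p.data is not None]


# "l/W" mimics the weight of the unused link: it never sees an input,
# so it is never initialized.
params = [LazyParam("l/W"), LazyParam("l1/W", 784), LazyParam("l2/W", 1000)]
ready = initialized_only(params)
print([p.name for p in ready])  # ['l1/W', 'l2/W']
```

Because every process builds the same model, every process would skip the same parameters, so the sequence of collective calls would stay matched across ranks under this assumption.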

Error message:

==========================================
Num process (COMM_WORLD): 3
Using hierarchical communicator
Num unit: 1000
Num Minibatch-size: 1
Num epoch: 20
==========================================
Exception in main training loop: 'O'
Traceback (most recent call last):
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 307, in run
    update()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 177, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/optimizers.py", line 28, in update
    self.communicator.bcast_data(target)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/communicators/mpi_communicator_base.py", line 605, in bcast_data
    self.mpi_comm.Bcast(buf)
  File "mpi4py/MPI/Comm.pyx", line 578, in mpi4py.MPI.Comm.Bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 473, in mpi4py.MPI._p_msg_cco.for_bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 437, in mpi4py.MPI._p_msg_cco.for_cco_send
  File "mpi4py/MPI/msgbuffer.pxi", line 155, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 101, in mpi4py.MPI.message_basic
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_mnist.py", line 124, in <module>
    main()
  File "train_mnist.py", line 120, in main
    trainer.run()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 321, in run
    six.reraise(*sys.exc_info())
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 307, in run
    update()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 177, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/optimizers.py", line 28, in update
    self.communicator.bcast_data(target)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/communicators/mpi_communicator_base.py", line 605, in bcast_data
    self.mpi_comm.Bcast(buf)
  File "mpi4py/MPI/Comm.pyx", line 578, in mpi4py.MPI.Comm.Bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 473, in mpi4py.MPI._p_msg_cco.for_bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 437, in mpi4py.MPI._p_msg_cco.for_cco_send
  File "mpi4py/MPI/msgbuffer.pxi", line 155, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 101, in mpi4py.MPI.message_basic
KeyError: 'O'
Exception in main training loop: 'O'
Traceback (most recent call last):
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 307, in run
    update()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 177, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/optimizers.py", line 28, in update
    self.communicator.bcast_data(target)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/communicators/mpi_communicator_base.py", line 605, in bcast_data
    self.mpi_comm.Bcast(buf)
  File "mpi4py/MPI/Comm.pyx", line 578, in mpi4py.MPI.Comm.Bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 476, in mpi4py.MPI._p_msg_cco.for_bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 453, in mpi4py.MPI._p_msg_cco.for_cco_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 155, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 101, in mpi4py.MPI.message_basic
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_mnist.py", line 124, in <module>
    main()
  File "train_mnist.py", line 120, in main
    trainer.run()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 321, in run
    six.reraise(*sys.exc_info())
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 307, in run
    update()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 177, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/optimizers.py", line 28, in update
    self.communicator.bcast_data(target)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/communicators/mpi_communicator_base.py", line 605, in bcast_data
    self.mpi_comm.Bcast(buf)
  File "mpi4py/MPI/Comm.pyx", line 578, in mpi4py.MPI.Comm.Bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 476, in mpi4py.MPI._p_msg_cco.for_bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 453, in mpi4py.MPI._p_msg_cco.for_cco_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 155, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 101, in mpi4py.MPI.message_basic
KeyError: 'O'
Exception in main training loop: 'O'
Traceback (most recent call last):
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 307, in run
    update()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 177, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/optimizers.py", line 28, in update
    self.communicator.bcast_data(target)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/communicators/mpi_communicator_base.py", line 605, in bcast_data
    self.mpi_comm.Bcast(buf)
  File "mpi4py/MPI/Comm.pyx", line 578, in mpi4py.MPI.Comm.Bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 476, in mpi4py.MPI._p_msg_cco.for_bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 453, in mpi4py.MPI._p_msg_cco.for_cco_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 155, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 101, in mpi4py.MPI.message_basic
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_mnist.py", line 124, in <module>
    main()
  File "train_mnist.py", line 120, in main
    trainer.run()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 321, in run
    six.reraise(*sys.exc_info())
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/trainer.py", line 307, in run
    update()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 177, in update_core
    optimizer.update(loss_func, *in_arrays)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/optimizers.py", line 28, in update
    self.communicator.bcast_data(target)
  File "/home/taizan/.pyenv/versions/3.6.1/lib/python3.6/site-packages/chainermn/communicators/mpi_communicator_base.py", line 605, in bcast_data
    self.mpi_comm.Bcast(buf)
  File "mpi4py/MPI/Comm.pyx", line 578, in mpi4py.MPI.Comm.Bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 476, in mpi4py.MPI._p_msg_cco.for_bcast
  File "mpi4py/MPI/msgbuffer.pxi", line 453, in mpi4py.MPI._p_msg_cco.for_cco_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 155, in mpi4py.MPI.message_simple
  File "mpi4py/MPI/msgbuffer.pxi", line 101, in mpi4py.MPI.message_basic
KeyError: 'O'
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[31121,1],0]
  Exit code:    1
--------------------------------------------------------------------------
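
The `KeyError: 'O'` is consistent with the broadcast being handed an object-dtype buffer: NumPy uses the type character `'O'` for arrays of Python objects, and that kind of array has no corresponding MPI datatype. The following is a minimal sketch, assuming the weight of the unused link is still `None` when `bcast_data` runs. It uses only NumPy and does not call chainermn or mpi4py, so it shows where the `'O'` could come from rather than reproducing the failure itself.

```python
import numpy as np

# Assumption: the data of an uninitialized parameter is None.
uninitialized_data = None

# Converting None to an array yields an array of Python objects.
buf = np.asarray(uninitialized_data)
print(buf.dtype.char)  # 'O'

# For comparison, an initialized float32 weight has a numeric type character.
weight = np.zeros((1000, 784), dtype=np.float32)
print(weight.dtype.char)  # 'f'
```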

versions:

chainer       5.0.0b3
chainermn     1.3.0
cupy          5.0.0b3

shu65 mentioned this issue Sep 6, 2018
kuenishi pushed a commit that referenced this issue Sep 25, 2018
kuenishi added this to the 1.3.1 milestone Sep 25, 2018