This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

batch normalization with buckets #2663

Closed
lightingghost opened this issue Jul 9, 2016 · 13 comments

Comments

@lightingghost
Contributor

lightingghost commented Jul 9, 2016

I want to implement an LSTM with batch normalization. When I run a sequence-to-sequence example with bucketing, MXNet fails with the following error:

```
Traceback (most recent call last):
  File "/home/odin/Documents/sent2mat_mx/model.py", line 206, in
    model(config)
  File "/home/odin/Documents/sent2mat_mx/model.py", line 201, in model
    lr_decay],
  File "/home/odin/local/mxnet/python/mxnet/model.py", line 788, in fit
    sym_gen=self.sym_gen)
  File "/home/odin/local/mxnet/python/mxnet/model.py", line 222, in _train_multi_device
    executor_manager.load_data_batch(data_batch)
  File "/home/odin/local/mxnet/python/mxnet/executor_manager.py", line 387, in load_data_batch
    shared_group=self.execgrp)
  File "/home/odin/local/mxnet/python/mxnet/executor_manager.py", line 224, in __init__
    shared_data_arrays=self.shared_data_arrays[i])
  File "/home/odin/local/mxnet/python/mxnet/executor_manager.py", line 170, in _bind_exec
    assert aux_shape[i] == a.shape
IndexError: list index out of range
```

It seems to me that when the executor manager is created (model.py line 184), it uses the default bucket length, which is the longest bucket. However, when data is fed in (model.py line 222), the bucket length is not necessarily the same as the longest bucket, so the auxiliary states no longer match those of the executors bound for the default bucket.

Is it possible to solve this problem? Any help would be appreciated.
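The mismatch described above can be illustrated without MXNet. The sketch below assumes one BatchNorm per unrolled time step with hypothetical names `t{i}_bn`, each owning a `moving_mean` and a `moving_var` auxiliary state, and assumes the longest bucket is the default one, as in this report. `unrolled_aux_names` is a hypothetical helper, not an MXNet function.

```python
def unrolled_aux_names(seq_len):
    """Aux-state names for a hypothetical LSTM unrolled seq_len steps,
    with a separate BatchNorm (and so separate aux states) at every step."""
    names = []
    for t in range(seq_len):
        names.append("t%d_bn_moving_mean" % t)
        names.append("t%d_bn_moving_var" % t)
    return names


buckets = [10, 20, 30]
# the shared executors are bound for the default (longest) bucket
default_bucket_key = max(buckets)
shared_aux = unrolled_aux_names(default_bucket_key)

for key in buckets:
    aux = unrolled_aux_names(key)
    # shorter buckets have fewer aux states than the shared executors
    print(key, len(aux), "vs shared", len(shared_aux))
```

Because every shorter bucket has fewer auxiliary states than the executors bound for the default bucket, any code that walks the two lists in parallel by index runs past the end of the shorter one.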

@lightingghost
Contributor Author

lightingghost commented Jul 9, 2016

Also, if I do not use bucketing and use a fixed length instead, MXNet throws

CUDA: an illegal memory access was encountered

If I compile with DEBUG and get a backtrace, the error information is:
```
#0 0x00007fffbf725d00 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007fffbf71ae97 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fffbf30ce29 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fffbf67391b in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007fffbf67516a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007fffbf2502ff in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007fffbf1e45e7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007fffbf1aa782 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007fffbf1ab2b3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#9 0x00007fffbf1050de in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#10 0x00007fffbf1053c0 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#11 0x00007fffe903d52d in ?? () from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#12 0x00007fffe9031ba0 in ?? () from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#13 0x00007fffe903c796 in ?? () from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#14 0x00007fffe9040ed1 in ?? () from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#15 0x00007fffe903445e in ?? () from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#16 0x00007fffe9028f88 in ?? () from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#17 0x00007fffe9053cac in cudaMalloc ()
from /usr/local/cuda-7.5/lib64/libcudart.so.7.5
#18 0x00007fffeae691a5 in mxnet::storage::GPUDeviceStorage::Alloc (size=2400)
at src/storage/./gpu_device_storage.h:39
#19 0x00007fffeae6b640 in mxnet::storage::PooledStorageManager<mxnet::storage::GPUDeviceStorage, 4294967296ul>::Alloc (this=0x15db0a0, size=2400)
at src/storage/./pooled_storage_manager.h:57
#20 0x00007fffeae687ba in mxnet::StorageImpl::Alloc (this=0xfa65c0, size=2400,
ctx=...) at src/storage/storage.cc:83
#21 0x00007fffea565f4d in mxnet::NDArray::Chunk::CheckAndAlloc (this=0x14f9ab0)
at include/mxnet/./ndarray.h:331
#22 0x00007fffea565ef9 in mxnet::NDArray::Chunk::Chunk (this=0x14f9ab0,
size=600, ctx=..., delay_alloc_=false, dtype=0)
at include/mxnet/./ndarray.h:326
#23 0x00007fffea579c11 in _gnu_cxx::new_allocatormxnet::NDArray::Chunk::construct<mxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(mxnet::NDArray::Chunk, unsigned long&&, mxnet::Context&, bool&, int&) (
this=0x7fffffffb8c7, _p=0x14f9ab0)
at /usr/include/c++/5/ext/new_allocator.h:120
#24 0x00007fffea5790d6 in std::allocator_traitsstd::allocator<mxnet::NDArray::Chunk >::construct<mxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(std::allocatormxnet::NDArray::Chunk&, mxnet::NDArray::Chunk
, unsigned long&&, mxnet::Context&, bool&, int&) (__a=..., __p=0x14f9ab0)
at /usr/include/c++/5/bits/alloc_traits.h:530
#25 0x00007fffea577895 in std::_Sp_counted_ptr_inplace<mxnet::NDArray::Chunk, std::allocatormxnet::NDArray::Chunk, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<unsigned long, mxnet::Context&, bool&, int&>(std::allocatormxnet::NDArray::Chunk, unsigned long&&, mxnet::Context&, bool&, int&) (this=0x14f9aa0,
__a=...) at /usr/include/c++/5/bits/shared_ptr_base.h:522
#26 0x00007fffea575df0 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<mxnet::NDArray::Chunk, std::allocatormxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(std::Sp_make_shared_tag, mxnet::NDArray::Chunk, std::allocatormxnet::NDArray::Chunk const&, unsigned long&&, mxnet::Context&, bool&, int&) (this=0xfe7e78, __a=...)
at /usr/include/c++/5/bits/shared_ptr_base.h:617
#27 0x00007fffea573a81 in std::__shared_ptr<mxnet::NDArray::Chunk, (__gnu_cxx::_Lock_policy)2>::__shared_ptrstd::allocator<mxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(std::_Sp_make_shared_tag, std::allocatormxnet::NDArray::Chunk const&, unsigned long&&, mxnet::Context&, bool&, int&) (
this=0xfe7e70, __tag=..., __a=...)
at /usr/include/c++/5/bits/shared_ptr_base.h:1097
#28 0x00007fffea5714ac in std::shared_ptrmxnet::NDArray::Chunk::shared_ptrstd::allocator<mxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(std::_Sp_make_shared_tag, std::allocatormxnet::NDArray::Chunk const&, unsigned long&&, mxnet::Context&, bool&, int&) (this=0xfe7e70, __tag=..., __a=...)
at /usr/include/c++/5/bits/shared_ptr.h:319
#29 0x00007fffea56e774 in std::allocate_shared<mxnet::NDArray::Chunk, std::allocatormxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(std::allocatormxnet::NDArray::Chunk const&, unsigned long&&, mxnet::Context&, bool&, int&) (_a=...) at /usr/include/c++/5/bits/shared_ptr.h:620
#30 0x00007fffea569676 in std::make_shared<mxnet::NDArray::Chunk, unsigned long, mxnet::Context&, bool&, int&>(unsigned long&&, mxnet::Context&, bool&, int&)
() at /usr/include/c++/5/bits/shared_ptr.h:636
#31 0x00007fffea5657d0 in mxnet::NDArray::NDArray (this=0xfe7e70, shape=...,
ctx=..., delay_alloc=false, dtype=0) at include/mxnet/./ndarray.h:45
#32 0x00007fffeae76f89 in MXNDArrayCreateEx (shape=0x7fffc04255e8, ndim=2,
dev_type=2, dev_id=0, delay_alloc=0, dtype=0, out=0x7fffee75c230)
at src/c_api/c_api.cc:148
#33 0x00007ffff5b1d380 in ffi_call_unix64 ()
at -------src-dir-------/Python-3.5.1/Modules/ctypes/libffi/src/x86/unix64.S:76
#34 0x00007ffff5b1cb25 in ffi_call (cif=,
fn=0x7fffeae76ee0 <MXNDArrayCreateEx(mx_uint const*, mx_uint, int, int, int, int, NDArrayHandle*)>, rvalue=, avalue=0x7fffffffbd60)
at -------src-dir-------/Python-3.5.1/Modules/ctypes/libffi/src/x86/ffi64.c:525
#35 0x00007ffff5b145ec in call_function_pointer (argcount=7,
resmem=0x7fffffffbdb0, restype=, atypes=,
avalues=0x7fffffffbd60,
pProc=0x7fffeae76ee0 <MXNDArrayCreateEx(mx_uint const*, mx_uint, int, int, int, int, NDArrayHandle*)>, flags=4353)
at -------src-dir-------/Python-3.5.1/Modules/_ctypes/callproc.c:811
#36 ctypes_callproc (
pProc=0x7fffeae76ee0 <MXNDArrayCreateEx(mx_uint const*, mx_uint, int, int, int, int, NDArrayHandle*)>, argtuple=0x7fffffffbf20, flags=4353,
argtypes=, restype=0x722408, checker=0x0)
at -------src-dir-------/Python-3.5.1/Modules/_ctypes/callproc.c:1149
#37 0x00007ffff5b0cc53 in PyCFuncPtr_call (self=,
inargs=, kwds=0x0)
at -------src-dir-------/Python-3.5.1/Modules/_ctypes/_ctypes.c:3869
#38 0x00007ffff794f056 in PyObject_Call (func=0x7fffc54feb38,
arg=, kw=) at Objects/abstract.c:2165
#39 0x00007ffff7a2a0b2 in do_call (nk=, na=7,
pp_stack=0x7fffffffc1c8, func=0x7fffc54feb38) at Python/ceval.c:4887
#40 call_function (oparg=, pp_stack=0x7fffffffc1c8)
at Python/ceval.c:4683
#41 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#42 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=4, kws=0xfa5480, kwcount=0, defs=0x7ffff5dc9f40, defcount=1,
kwdefs=0x0, closure=0x0, name=0x7fffee75a108, qualname=0x7fffee75a108)
at Python/ceval.c:3966
#43 0x00007ffff7a2a8b4 in fast_function (nk=, na=4,
n=, pp_stack=0x7fffffffc3e8, func=0x7fffee774ae8)
at Python/ceval.c:4764
#44 call_function (oparg=, pp_stack=0x7fffffffc3e8)
at Python/ceval.c:4681
#45 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#46 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=3, kws=0x7fffc5002200, kwcount=0, defs=0x7fffee756620,
defcount=2, kwdefs=0x0, closure=0x0, name=0x7ffff7e97538,
qualname=0x7ffff7e97538) at Python/ceval.c:3966
#47 0x00007ffff7a2a8b4 in fast_function (nk=, na=3,
n=, pp_stack=0x7fffffffc608, func=0x7fffee772378)
at Python/ceval.c:4764
#48 call_function (oparg=, pp_stack=0x7fffffffc608)
at Python/ceval.c:4681
#49 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#50 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=2, kws=0x29e3ab8, kwcount=1, defs=0x7fffee7794e0, defcount=2,
kwdefs=0x0, closure=0x0, name=0x7ffff5d32ce0, qualname=0x7ffff5d32ce0)
at Python/ceval.c:3966
#51 0x00007ffff7a2a8b4 in fast_function (nk=, na=2,
n=, pp_stack=0x7fffffffc828, func=0x7fffee772950)
at Python/ceval.c:4764
#52 call_function (oparg=, pp_stack=0x7fffffffc828)
at Python/ceval.c:4681
#53 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#54 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=4, kws=0x15250f0, kwcount=3, defs=0x7fffca59ad18, defcount=5,
kwdefs=0x0, closure=0x0, name=0x7fffca5a0670, qualname=0x7fffca5a0670)
at Python/ceval.c:3966
#55 0x00007ffff7a2a8b4 in fast_function (nk=, na=4,
n=, pp_stack=0x7fffffffca48, func=0x7fffca5919d8)
at Python/ceval.c:4764
#56 call_function (oparg=, pp_stack=0x7fffffffca48)
at Python/ceval.c:4681
#57 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#58 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=7, kws=0x0, kwcount=0, defs=0x7fffca5a24f8, defcount=1,
kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:3966
#59 0x00007ffff7a2c4d8 in PyEval_EvalCodeEx (_co=,
globals=, locals=, args=,
argcount=, kws=, kwcount=0,
defs=0x7fffca5a24f8, defcount=1, kwdefs=0x0, closure=0x0)
at Python/ceval.c:3987
#60 0x00007ffff7981f12 in function_call (func=0x7fffca591ae8,
arg=0x7ffff06adf50, kw=0x0) at Objects/funcobject.c:632
#61 0x00007ffff794f056 in PyObject_Call (func=0x7fffca591ae8,
arg=, kw=) at Objects/abstract.c:2165
#62 0x00007ffff796b57c in method_call (func=0x7fffca591ae8,
arg=0x7ffff06adf50, kw=0x0) at Objects/classobject.c:330
#63 0x00007ffff794f056 in PyObject_Call (func=0x7ffff7f46588,
arg=, kw=) at Objects/abstract.c:2165
#64 0x00007ffff79bfe63 in slot_tp_init (self=0x7ffff7ebdc50,
args=0x7fffc50642e8, kwds=0x0) at Objects/typeobject.c:6274
#65 0x00007ffff79b68af in type_call (type=,
args=0x7fffc50642e8, kwds=0x0) at Objects/typeobject.c:923
#66 0x00007ffff794f056 in PyObject_Call (func=0xca6798, arg=,
kw=) at Objects/abstract.c:2165
#67 0x00007ffff7a2a0b2 in do_call (nk=, na=6,
pp_stack=0x7fffffffcec8, func=0xca6798) at Python/ceval.c:4887
#68 call_function (oparg=, pp_stack=0x7fffffffcec8)
at Python/ceval.c:4683
#69 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#70 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=1, kws=0x7fffc507a360, kwcount=9, defs=0x7fffca59f720,
defcount=3, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
at Python/ceval.c:3966
#71 0x00007ffff7a2c4d8 in PyEval_EvalCodeEx (_co=,
globals=, locals=, args=,
argcount=, kws=, kwcount=9,
defs=0x7fffca59f720, defcount=3, kwdefs=0x0, closure=0x0)
at Python/ceval.c:3987
#72 0x00007ffff7982031 in function_call (func=0x7fffca591d90,
arg=0x7fffc5031d68, kw=0x7fffc0aa8188) at Objects/funcobject.c:632
#73 0x00007ffff794f056 in PyObject_Call (func=0x7fffca591d90,
arg=, kw=) at Objects/abstract.c:2165
#74 0x00007ffff796b57c in method_call (func=0x7fffca591d90,
arg=0x7fffc5031d68, kw=0x7fffc0aa8188) at Objects/classobject.c:330
#75 0x00007ffff794f056 in PyObject_Call (func=0x7ffff7f891c8,
arg=, kw=) at Objects/abstract.c:2165
#76 0x00007ffff79bfe63 in slot_tp_init (self=0x7fffc5018668,
args=0x7ffff7f92048, kwds=0x7fffc0aa8188) at Objects/typeobject.c:6274
#77 0x00007ffff79b68af in type_call (type=,
args=0x7ffff7f92048, kwds=0x7fffc0aa8188) at Objects/typeobject.c:923
#78 0x00007ffff794f056 in PyObject_Call (func=0xca8868, arg=,
kw=) at Objects/abstract.c:2165
#79 0x00007ffff7a2a0b2 in do_call (nk=, na=0,
pp_stack=0x7fffffffd348, func=0xca8868) at Python/ceval.c:4887
#80 call_function (oparg=, pp_stack=0x7fffffffd348)
at Python/ceval.c:4683
#81 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#82 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=7, kws=0x1591498, kwcount=16, defs=0x7fffca50a858, defcount=9,
kwdefs=0x0, closure=0x0, name=0x7fffca4f6618, qualname=0x7fffca4f6618)
at Python/ceval.c:3966
#83 0x00007ffff7a2a8b4 in fast_function (nk=, na=7,
n=, pp_stack=0x7fffffffd568, func=0x7fffc5c4ab70)
at Python/ceval.c:4764
#84 call_function (oparg=, pp_stack=0x7fffffffd568)
at Python/ceval.c:4681
#85 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#86 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=1, kws=0xf95bb0, kwcount=4, defs=0x7fffc54cb660, defcount=10,
kwdefs=0x0, closure=0x0, name=0x7ffff5d2b8f0, qualname=0x7fffca4f9f30)
at Python/ceval.c:3966
#87 0x00007ffff7a2a8b4 in fast_function (nk=, na=1,
n=, pp_stack=0x7fffffffd788, func=0x7fffc54f4400)
at Python/ceval.c:4764
#88 call_function (oparg=, pp_stack=0x7fffffffd788)
at Python/ceval.c:4681
#89 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#90 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=1, kws=0x7ffff7f8a9b0, kwcount=0, defs=0x0, defcount=0,
kwdefs=0x0, closure=0x0, name=0x7ffff5db1f80, qualname=0x7ffff5db1f80)
at Python/ceval.c:3966
#91 0x00007ffff7a2a8b4 in fast_function (nk=, na=1,
n=, pp_stack=0x7fffffffd9a8, func=0x7ffff7f28f28)
at Python/ceval.c:4764
#92 call_function (oparg=, pp_stack=0x7fffffffd9a8)
at Python/ceval.c:4681
#93 PyEval_EvalFrameEx (f=, throwflag=)
at Python/ceval.c:3185
#94 0x00007ffff7a2c349 in _PyEval_EvalCodeWithName (_co=,
globals=, locals=, args=,
argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0,
closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:3966
#95 0x00007ffff7a2c4d8 in PyEval_EvalCodeEx (_co=,
globals=, locals=, args=,
argcount=, kws=, kwcount=0, defs=0x0,
defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:3987
#96 0x00007ffff7a2c51b in PyEval_EvalCode (co=,
globals=, locals=) at Python/ceval.c:777
#97 0x00007ffff7a51750 in run_mod (arena=0x6f5b80, flags=0x7fffffffdcf0,
locals=0x7ffff7f40288, globals=0x7ffff7f40288, filename=0x7ffff5dc7070,
mod=0x715700) at Python/pythonrun.c:970
#98 PyRun_FileExFlags (fp=0x6bd820, filename_str=,
start=, globals=0x7ffff7f40288, locals=0x7ffff7f40288,
closeit=, flags=0x7fffffffdcf0) at Python/pythonrun.c:923
#99 0x00007ffff7a52d23 in PyRun_SimpleFileExFlags (fp=0x6bd820,
filename=, closeit=1, flags=0x7fffffffdcf0)
at Python/pythonrun.c:396
#100 0x00007ffff7a6de97 in run_file (p_cf=0x7fffffffdcf0,
filename=0x6032c0 L"model.py", fp=0x6bd820) at Modules/main.c:318
#101 Py_Main (argc=, argv=) at Modules/main.c:769
#102 0x0000000000400add in main (argc=2, argv=0x7fffffffde68)
at ./Programs/python.c:69
```

@piiswrong
Contributor

The FeedForward model will be deprecated soon.
We recommend using the module interface instead. Try `mx.module.BucketingModule`.

@lightingghost
Contributor Author

lightingghost commented Jul 11, 2016

@piiswrong Thank you for helping me out. I have tried `mx.module.BucketingModule`, but the same problem happens:

```
Traceback (most recent call last):
  File "/home/odin/Documents/cs224d/sent2mat_mx/model.py", line 217, in <module>
    main()
  File "/home/odin/Documents/cs224d/sent2mat_mx/model.py", line 214, in main
    model(config, train_data_iter, eval_data_iter)
  File "/home/odin/Documents/cs224d/sent2mat_mx/model.py", line 160, in model
    optimizer=opt)
  File "/home/odin/local/mxnet/python/mxnet/module/base_module.py", line 355, in fit
    self.forward_backward(data_batch)
  File "/home/odin/local/mxnet/python/mxnet/module/base_module.py", line 127, in forward_backward
    self.forward(data_batch, is_train=True)
  File "/home/odin/local/mxnet/python/mxnet/module/bucketing_module.py", line 256, in forward
    data_batch.provide_label)
  File "/home/odin/local/mxnet/python/mxnet/module/bucketing_module.py", line 209, in switch_bucket
    force_rebind=False, shared_module=self._curr_module)
  File "/home/odin/local/mxnet/python/mxnet/module/module.py", line 258, in bind
    shared_group, logger=self.logger)
  File "/home/odin/local/mxnet/python/mxnet/module/executor_group.py", line 95, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/odin/local/mxnet/python/mxnet/module/executor_group.py", line 123, in bind_exec
    self.execs.append(self._bind_ith_exec(i, data_shapes, label_shapes, shared_group))
  File "/home/odin/local/mxnet/python/mxnet/module/executor_group.py", line 406, in _bind_ith_exec
    assert aux_shapes[j] == arr.shape
IndexError: list index out of range
```

I think the problem is in executor_group.py, lines 402-408:

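```python
if shared_exec is None:
    aux_arrays = [nd.zeros(s, context, dtype=t) for s, t in zip(aux_shapes, aux_types)]
else:
    # compares the shared executor's aux arrays with this bucket's shapes by position,
    # so it fails when the two symbols have different numbers of aux states
    for j, arr in enumerate(shared_exec.aux_arrays):
        assert aux_shapes[j] == arr.shape
        assert aux_types[j] == arr.dtype
    aux_arrays = shared_exec.aux_arrays[:]
```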

which should be changed to:

```python
if shared_exec is None:
    aux_arrays = [nd.zeros(s, context, dtype=t) for s, t in zip(aux_shapes, aux_types)]
else:
    # fetch the shared arrays by this bucket's aux-state names instead of by position
    aux_arrays = [shared_exec.aux_dict[name] for name in aux_names]
    for j, arr in enumerate(aux_arrays):
        assert aux_shapes[j] == arr.shape
        assert aux_types[j] == arr.dtype
```
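To see why the proposed change avoids the `IndexError`, the two versions of the check can be simulated in plain Python. This is a sketch under assumptions: `FakeArray` is a hypothetical stand-in for an NDArray, the aux names follow a hypothetical per-step scheme (`t{i}_bn_moving_mean`, `t{i}_bn_moving_var`), and the shared executor was bound for the longest bucket.

```python
class FakeArray:
    """Hypothetical stand-in for an NDArray: only the shape matters here."""

    def __init__(self, shape):
        self.shape = shape


def aux_names_for(seq_len):
    # hypothetical per-step BatchNorm aux states
    return ["t%d_bn_%s" % (t, s)
            for t in range(seq_len) for s in ("moving_mean", "moving_var")]


num_hidden = 4
long_names = aux_names_for(3)          # shared executor: default, longest bucket
shared_aux_dict = {n: FakeArray((num_hidden,)) for n in long_names}
shared_aux_arrays = [shared_aux_dict[n] for n in long_names]

aux_names = aux_names_for(2)           # current, shorter bucket
aux_shapes = [(num_hidden,)] * len(aux_names)


def positional_check():
    # mirrors the current code: walks the shared list, indexes the new one
    for j, arr in enumerate(shared_aux_arrays):
        assert aux_shapes[j] == arr.shape


def lookup_by_name():
    # mirrors the proposed change: fetch only the arrays this bucket needs
    arrays = [shared_aux_dict[name] for name in aux_names]
    for j, arr in enumerate(arrays):
        assert aux_shapes[j] == arr.shape
    return arrays
```

Name-based lookup only works when every aux name of the current bucket also exists in the shared executor, which holds here because the default bucket is the longest one.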

@piiswrong
Contributor

Your symbols for different buckets need to have the same number of arguments and auxiliary variables. You can verify this with `sym.list_auxiliary_states()`.
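A small helper can make this check explicit across all buckets. It is a sketch: it only compares lists of names, which in practice could come from calling `list_auxiliary_states()` (or `list_arguments()`) on the symbol generated for each bucket. The two naming schemes below are hypothetical; the `shared` case shows the desired outcome, not how MXNet actually names a reused BatchNorm.

```python
def mismatched_buckets(names_by_bucket, default_key):
    """Return the bucket keys whose name list differs from the default bucket's.

    names_by_bucket maps a bucket key to a list of names, for instance the
    aux-state names of the symbol generated for that bucket.
    """
    reference = names_by_bucket[default_key]
    return sorted(key for key, names in names_by_bucket.items() if names != reference)


buckets = (10, 20, 30)

# hypothetical: one BatchNorm per time step, each with its own aux states
per_step = {
    k: ["t%d_bn_%s" % (t, s) for t in range(k) for s in ("moving_mean", "moving_var")]
    for k in buckets
}
# hypothetical: every time step refers to one shared pair of aux states
shared = {k: ["bn_moving_mean", "bn_moving_var"] for k in buckets}

print(mismatched_buckets(per_step, default_key=max(buckets)))  # [10, 20]
print(mismatched_buckets(shared, default_key=max(buckets)))    # []
```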

@lightingghost
Copy link
Contributor Author

Thanks for letting me know. But what I want to do is add batch normalization to an LSTM, and then the number of auxiliary states of `mx.sym.BatchNorm` grows with the sequence length. Is there any way to overcome that limitation?

@piiswrong
Contributor

You can set them to the same variable

@lightingghost
Copy link
Contributor Author

Thanks, but how can I do that? It seems BatchNorm does not have an option for me to set gamma or beta.

@piiswrong
Copy link
Contributor

piiswrong commented Jul 11, 2016

`moving_var = mx.sym.Variable('moving_var')`
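Sharing one `moving_mean`/`moving_var` pair across time steps keeps the number of auxiliary states independent of the sequence length. The NumPy sketch below is not MXNet code; it assumes training-mode batch normalization at every step and the running-average rule `moving = moving * momentum + batch_stat * (1 - momentum)` described in MXNet's BatchNorm documentation. The `momentum` and `eps` values and the use of the biased batch variance are illustrative assumptions.

```python
import numpy as np


def bn_over_time(x, gamma, beta, moving_mean, moving_var, momentum=0.9, eps=1e-3):
    """Training-mode batch norm applied at every time step of x (T, N, C).

    All steps share one gamma/beta and one moving_mean/moving_var pair;
    the moving statistics are updated in place after each step.
    """
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        step = x[t]                          # (N, C) batch at time t
        mean = step.mean(axis=0)
        var = step.var(axis=0)               # biased variance (an assumption)
        out[t] = gamma * (step - mean) / np.sqrt(var + eps) + beta
        # shared running statistics: one update per time step
        moving_mean *= momentum
        moving_mean += (1 - momentum) * mean
        moving_var *= momentum
        moving_var += (1 - momentum) * var
    return out


T, N, C = 5, 8, 3
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=(T, N, C))
gamma, beta = np.ones(C), np.zeros(C)
moving_mean, moving_var = np.zeros(C), np.ones(C)
y = bn_over_time(x, gamma, beta, moving_mean, moving_var)
print(y.shape, moving_mean)
```

The trade-off is that the shared running statistics average over all time steps, so they no longer describe any single step.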

@lightingghost
Copy link
Contributor Author

Thanks. I will try that

@lightingghost
Copy link
Contributor Author

I have tried setting gamma and beta, but it seems the auxiliary variables moving_mean and moving_var cannot be set this way. Is there a way to set those?

Thanks

@lightingghost
Copy link
Contributor Author

I also notice that the training results of `mx.module.BucketingModule` and FeedForward are not the same. With the same parameters, FeedForward gives a decreasing loss, while with BucketingModule the loss does not seem to decrease.

@piiswrong
Copy link
Contributor

It looks like you have to bind them to the same NDArray with `sym.bind`.
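Binding every bucket to the same NDArray means the aux arrays are allocated once and handed to each executor by reference, so an update made while training one bucket is visible in all the others. The sketch below only simulates this with NumPy arrays and a hypothetical `FakeExecutor`; in MXNet the arrays would be passed through the `aux_states` argument of `Symbol.bind`, which is not exercised here.

```python
import numpy as np


class FakeExecutor:
    """Hypothetical stand-in for a bound executor: it keeps references,
    not copies, of the aux arrays it is given."""

    def __init__(self, aux_states):
        self.aux_dict = dict(aux_states)   # copies the dict, not the arrays

    def train_step(self, batch_mean, momentum=0.9):
        mm = self.aux_dict["bn_moving_mean"]
        mm *= momentum                     # in place: writes into the shared array
        mm += (1 - momentum) * batch_mean


num_hidden = 4
shared_aux = {
    "bn_moving_mean": np.zeros(num_hidden),
    "bn_moving_var": np.ones(num_hidden),
}
# one executor per bucket, all bound to the same aux arrays
executors = {key: FakeExecutor(shared_aux) for key in (10, 20, 30)}

executors[10].train_step(np.full(num_hidden, 1.0))
# the update made through bucket 10 is visible through bucket 30
print(executors[30].aux_dict["bn_moving_mean"])
```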

@szha
Copy link
Member

szha commented Sep 28, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

szha closed this as completed Sep 28, 2017