Reference: https://mxnet.apache.org/api/architecture/exception_handling

In [5]:
import sys
sys.path.append("/workspace/server")
import warnings

from thera.python.mxnet import mxnet as mx
warnings.filterwarnings('ignore')
import os 
import random
import numpy as np
import mxnet as mx
from mxnet import gluon
import gluonnlp as nlp
# https://gluon-nlp.mxnet.io/master/examples/word_embedding/word_embedding.html
import re

Exception Handling for Iterators

The example below shows how to handle exceptions for iterators. We populate the data and label files so that the label file has fewer rows than the data file has samples, which should throw an exception.

CSVIter uses PrefetcherIter to load and parse data. PrefetcherIter spawns a producer thread in the background that prefetches data while the main thread consumes it. The exception is thrown in this producer thread during prefetching, when no label is found for a given sample.

The exception is transported to the main thread, where it is rethrown when Next is called as part of the following line: for batch in iter(data_train).

In general, an exception may be rethrown as part of the Next and BeforeFirst calls, which correspond to the next() and reset() methods of MXDataIter in the Python language bindings.
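
The hand-off described above, where a background producer fails and the consumer sees the error on its next call, can be sketched in plain Python. This is only an illustrative analogy built on the standard library, not MXNet's implementation: the PrefetchingIter class, the consume helper and the error message are all hypothetical names introduced here.

```python
import queue
import threading


class PrefetchingIter:
    """Toy stand-in for a prefetching iterator (hypothetical, not MXNet code)."""

    _DONE = object()  # sentinel marking the end of the data

    def __init__(self, samples, labels):
        self._queue = queue.Queue()
        self._thread = threading.Thread(
            target=self._produce, args=(samples, labels), daemon=True)
        self._thread.start()

    def _produce(self, samples, labels):
        # Runs in the background thread. An exception is captured and handed
        # to the consumer through the queue instead of being lost.
        try:
            for i, sample in enumerate(samples):
                if i >= len(labels):
                    raise ValueError("no label for sample %d" % i)
                self._queue.put((sample, labels[i]))
            self._queue.put(self._DONE)
        except Exception as exc:
            self._queue.put(exc)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()
        if item is self._DONE:
            raise StopIteration
        if isinstance(item, Exception):
            raise item  # rethrown in the consuming (main) thread
        return item


def consume(samples, labels):
    """Collect pairs until the data ends or the producer's error surfaces."""
    seen, error = [], None
    try:
        for pair in PrefetchingIter(samples, labels):
            seen.append(pair)
    except ValueError as exc:
        error = str(exc)
    return seen, error
```

Calling consume([10, 20, 30], ["a", "b"]) returns the two complete pairs together with the message "no label for sample 2": the error was raised in the producer thread but is only observed by the caller when it asks for the third item.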

In [11]:
cwd = os.getcwd()
data_path = os.path.join(cwd, "data.csv")
label_path = os.path.join(cwd, "label.csv")

with open(data_path, "w") as fout:
    for i in range(8):
        fout.write("1,2,3,4,5,6,7,8,9,10\n")

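with open(label_path, "w") as fout:
    # No newline is written, so the label file holds a single row,
    # fewer than the 8 rows in the data file.
    for i in range(7):
        fout.write("label"+str(i))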

try:
    data_train = mx.io.CSVIter(data_csv=data_path, label_csv=label_path, data_shape=(1, 10),
                               batch_size=4)

    for batch in iter(data_train):
        print(data_train.getdata().asnumpy())
except mx.base.MXNetError as ex:
    print("Exception handled")
    print(ex)

Exception handled
[15:42:15] /workspace/server/third_party/mxnet/src/io/iter_csv.cc:137: Check failed: label_parser_->Next() Data CSV's row is smaller than the number of rows in label_csv

Stack trace returned 9 entries:
[bt] (0) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f5da81ccc5b]
[bt] (1) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f5da81cd428]
[bt] (2) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::io::CSVIterTyped<float>::Next()+0x2b5) [0x7f5da8481f65]
[bt] (3) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::io::BatchLoader::Next()+0x90) [0x7f5da847e830]
[bt] (4) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::io::PrefetcherIter::Init(std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,

Exception Handling for Operators

The below example shows how to handle exceptions for operators in the imperative mode.

For operators, the dependency engine spawns a number of threads when it runs in ThreadedEnginePooled or ThreadedEnginePerDevice mode. The operator itself is executed in one of the spawned threads.

If an operator throws an exception during execution, the exception is propagated down the dependency chain. When a synchronizing call such as WaitToRead is made on a variable in the dependency chain, the propagated exception is rethrown.

In the example below, I illustrate how an exception that occurs while computing d is propagated down the dependency chain and is finally rethrown when we make a synchronizing call to WaitToRead.

In [12]:
a = mx.nd.random.normal(0, 1, (2, 2))
b = mx.nd.random.normal(0, 2, (2, 2))
c = mx.nd.dot(a, b)
d = mx.nd.random.normal(0, -1, (2, 2))  # Invalid: the standard deviation (scale) must be positive
e = mx.nd.dot(c, d)
e.wait_to_read()

MXNetError: [15:42:50] /workspace/server/third_party/mxnet/src/operator/random/./sample_op.h:400: Check failed: param.scale > 0 (-1 vs. 0) scale parameter in gaussian has to be positive

Stack trace returned 10 entries:
[bt] (0) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f5da81ccc5b]
[bt] (1) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f5da81cd428]
[bt] (2) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(+0x163bab0) [0x7f5da8abbab0]
[bt] (3) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::Sample_<mshadow::cpu, mxnet::op::SampleNormalParam>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x86) [0x7f5da8adb156]
[bt] (4) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x290) [0x7f5da821c220]
[bt] (5) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(+0xd5cd4b) [0x7f5da81dcd4b]
[bt] (6) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x429) [0x7f5da81d1bc9]
[bt] (7) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0xe2) [0x7f5da81d5e32]
[bt] (8) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f5da81d214a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f5e10603c80]



Although the above exception occurs when executing the operation which writes to the variable d in one of the child threads, it is thrown only when the synchronization happens as part of the line: e.wait_to_read().
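
The same pattern of deferred failure can be reproduced with the Python standard library. The sketch below is an analogy only and makes no claim about how the MXNet engine is implemented: sample_normal, multiply and run_chain are hypothetical helpers, and ThreadPoolExecutor stands in for the engine's worker threads. Each task runs in a worker thread, a task that waits on a failed input fails with the same exception, and the caller sees nothing until it synchronizes with result().

```python
from concurrent.futures import ThreadPoolExecutor


def sample_normal(scale):
    """Hypothetical stand-in for a sampling operator that validates its scale."""
    if scale <= 0:
        raise ValueError("scale must be positive, got %r" % scale)
    return 1.0  # placeholder value; no real sampling is done


def multiply(x_future, y_future):
    # Waiting on an input inside the worker rethrows that input's error here,
    # so the failure of an input becomes the failure of this task too.
    return x_future.result() * y_future.result()


def run_chain(bad_scale):
    with ThreadPoolExecutor(max_workers=4) as pool:
        a = pool.submit(sample_normal, 1)
        b = pool.submit(sample_normal, 2)
        c = pool.submit(multiply, a, b)
        d = pool.submit(sample_normal, bad_scale)  # may fail in a worker thread
        e = pool.submit(multiply, c, d)            # inherits d's failure
    # All tasks have finished, yet nothing has been raised in the caller.
    try:
        return e.result()  # synchronizing call: the error surfaces here
    except ValueError as exc:
        return "rethrown: %s" % exc
```

Here run_chain(-1) returns "rethrown: scale must be positive, got -1", while run_chain(1) returns 1.0. One difference from MXNet is worth noting: a Python future raises its stored exception on every result() call.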

Let us take another example. In the following case, we write to two variables and then synchronize on both (here through asnumpy(), which waits for the result just as wait_to_read() does). This example shows that any particular exception will not be thrown more than once.

In [14]:
a = mx.nd.random.normal(0, 1, (2, 2))
b = mx.nd.random.normal(0, -1, (2, 2))
c, d = mx.nd.dot(a, b)  # unpack the two rows of the 2x2 result
try:
    c.asnumpy()
except mx.base.MXNetError as ex:
    print("Exception handled")
    print(ex)
d.asnumpy()

Exception handled
[15:44:46] /workspace/server/third_party/mxnet/src/operator/random/./sample_op.h:400: Check failed: param.scale > 0 (-1 vs. 0) scale parameter in gaussian has to be positive

Stack trace returned 10 entries:
[bt] (0) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f5da81ccc5b]
[bt] (1) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f5da81cd428]
[bt] (2) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(+0x163bab0) [0x7f5da8abbab0]
[bt] (3) /workspace/server/third_party/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::Sample_<mshadow::cpu, mxnet::op::SampleNormalParam>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxn

array([1.9254046e-35, 0.0000000e+00], dtype=float32)
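
The once-only behaviour seen above, where reading c raises and the later read of d does not, can be modelled with a shared failure record that remembers whether it has already been raised. This is a minimal sketch of the observed behaviour under that assumption, not a description of MXNet internals. PendingError and Result are hypothetical names.

```python
class PendingError:
    """Shared record of one failure; remembers whether it was already raised."""

    def __init__(self, exc):
        self.exc = exc
        self.raised = False


class Result:
    """Toy stand-in for an output that depends on a failed computation."""

    def __init__(self, pending=None, value=None):
        self._pending = pending
        self._value = value

    def read(self):
        pending = self._pending
        if pending is not None and not pending.raised:
            pending.raised = True  # mark first, so later reads stay silent
            raise pending.exc
        return self._value


# Two outputs that share the same upstream failure.
failure = PendingError(ValueError("scale must be positive"))
c, d = Result(failure), Result(failure)

try:
    c.read()
    first_read_raised = False
except ValueError:
    first_read_raised = True

second_read = d.read()  # does not raise; the returned value is not meaningful
```

After this runs, first_read_raised is True and second_read is None. As in the notebook output, the second read completes without an error, but its contents should not be trusted.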