Closed
Labels
component: training (Relates to the SageMaker Training Platform); type: bug
Description
Describe the bug
I've trained a semantic segmentation model using SageMaker's built-in algorithm (with the default container image).
I'm trying to run a batch transform job to evaluate the model on a dataset of 168 JPEGs, using a JSON manifest.
After a few predictions I received an "Algorithm Error" with a native stack trace, and the job seems to hang.
To reproduce
- Train a semantic segmentation model (built-in algorithm) with a "resnet-50" backbone and "deeplab" decoder, using the default crop/base size and use_pretrained_model = True
- Create a JSON manifest containing the S3 prefixes of 168 JPEGs, each 960x540.
- Run the following code:
import sagemaker

estimator = sagemaker.estimator.Estimator.attach(TRAINING_JOB_NAME)
transformer = estimator.transformer(output_path=OUTPUT_PATH, accept="image/png", instance_count=1, instance_type="ml.c5.2xlarge")
transformer.transform(data=manifest_uri, data_type="ManifestFile", content_type="image/jpeg", wait=True, logs=True)
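For reference, the manifest from the second step can be sketched as follows. This is only an illustration of the SageMaker ManifestFile layout (one prefix entry followed by keys relative to that prefix). The bucket name, the file names, and the `build_manifest` helper are hypothetical, not the ones used in this report.

```python
import json


def build_manifest(prefix, keys):
    """Serialize a ManifestFile document: a prefix entry, then relative keys."""
    # Make sure the prefix ends with a slash so prefix + key forms a full URI.
    if not prefix.endswith("/"):
        prefix += "/"
    return json.dumps([{"prefix": prefix}] + list(keys))


# Hypothetical bucket and object names, standing in for the 168 JPEGs.
image_keys = [f"frame_{i:03d}.jpg" for i in range(168)]
manifest = build_manifest("s3://example-bucket/eval-images", image_keys)
```

The resulting string would be uploaded to S3, and its URI passed as `manifest_uri` with `data_type="ManifestFile"`.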
Expected behavior
Batch transform should successfully run predictions for all jpgs.
Screenshots or logs
...............................Docker entrypoint called with argument(s): serve
Running default environment configuration script
Running custom environment configuration script
[01/11/2021 19:46:48 INFO 140257709246272] loaded entry point class algorithm.serve.server_config:config_api
[01/11/2021 19:46:48 INFO 140257709246272] loading entry points
[01/11/2021 19:46:48 INFO 140257709246272] font search path ['/opt/amazon/lib/python2.7/site-packages/matplotlib/mpl-data/fonts/ttf', '/opt/amazon/lib/python2.7/site-packages/matplotlib/mpl-data/fonts/afm', '/opt/amazon/lib/python2.7/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
[01/11/2021 19:46:48 INFO 140257709246272] generated new fontManager
[01/11/2021 19:46:48 INFO 140257709246272] loaded request iterator application/x-image
[01/11/2021 19:46:48 INFO 140257709246272] loaded request iterator image/png
[01/11/2021 19:46:48 INFO 140257709246272] loaded request iterator image/jpeg
[01/11/2021 19:46:48 INFO 140257709246272] loaded response encoder application/x-recordio-protobuf
[01/11/2021 19:46:48 INFO 140257709246272] loaded response encoder image/png
[01/11/2021 19:46:48 INFO 140257709246272] loaded response encoder application/x-numpy
[01/11/2021 19:46:48 INFO 140257709246272] loaded entry point class algorithm:model
[01/11/2021 19:46:48 INFO 140257709246272] Loaded iterator creator application/x-image for content type ('application/x-image', '1.0')
[01/11/2021 19:46:48 INFO 140257709246272] Loaded iterator creator application/x-recordio for content type ('application/x-recordio', '1.0')
[01/11/2021 19:46:48 INFO 140257709246272] Loaded iterator creator image/png for content type ('image/png', '1.0')
[01/11/2021 19:46:48 INFO 140257709246272] Loaded iterator creator application/json for content type ('application/json', '1.0')
[01/11/2021 19:46:48 INFO 140257709246272] Loaded iterator creator image/jpeg for content type ('image/jpeg', '1.0')
[01/11/2021 19:46:48 WARNING 140257709246272] Requesting context without setting the requested num of gpus. Using 'auto'
[01/11/2021 19:46:49 INFO 140257709246272] nvidia-smi took: 0.0251131057739 secs to identify 0 gpus
[01/11/2021 19:46:49 INFO 140257709246272] Number of GPUs being used: 0
[01/11/2021 19:46:49 INFO 140257709246272] Number of server workers: 2
[2021-01-11 19:46:49 +0000] [1] [INFO] Starting gunicorn 19.7.1
[2021-01-11 19:46:49 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2021-01-11 19:46:49 +0000] [1] [INFO] Using worker: sync
[2021-01-11 19:46:49 +0000] [110] [INFO] Booting worker with pid: 110
[01/11/2021 19:46:49 INFO 140257709246272] loading model...
[01/11/2021 19:46:49 WARNING 140257709246272] Requesting context without setting the requested num of gpus. Using 'auto'
[01/11/2021 19:46:49 INFO 140257709246272] Number of GPUs being used: 0
[01/11/2021 19:46:49 INFO 140257709246272] Using device: 0 for worker PID: 110
[2021-01-11 19:46:49 +0000] [119] [INFO] Booting worker with pid: 119
[01/11/2021 19:46:49 INFO 140257709246272] loading model...
[01/11/2021 19:46:49 WARNING 140257709246272] Requesting context without setting the requested num of gpus. Using 'auto'
[01/11/2021 19:46:49 INFO 140257709246272] Number of GPUs being used: 0
[01/11/2021 19:46:49 INFO 140257709246272] Using device: 0 for worker PID: 119
[01/11/2021 19:46:49 INFO 140257709246272] Loading weights from /opt/ml/model/model_best.params
[01/11/2021 19:46:49 INFO 140257709246272] Loading weights from /opt/ml/model/model_best.params
[01/11/2021 19:46:49 INFO 140257709246272] ...model loaded.
[01/11/2021 19:46:49 WARNING 140257709246272] Requesting context without setting the requested num of gpus. Using 'auto'
[01/11/2021 19:46:49 INFO 140257709246272] Number of GPUs being used: 0
#metrics {"Metrics": {"execution_parameters.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394409.301922, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394409.298721}
[01/11/2021 19:46:49 INFO 140257709246272] ...model loaded.
[19:46:49] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
[19:46:49] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
[19:46:50] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 33177600 bytes with malloc directly
[19:46:50] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 33177600 bytes with malloc directly
2021-01-11T19:46:49.305:[sagemaker logs]: MaxConcurrentTransforms=2, MaxPayloadInMB=10, BatchStrategy=MULTI_RECORD
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394418.171682, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394409.309916}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394418.580642, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394409.301981}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394432.399818, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394425.493936}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394433.375697, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394426.266663}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394439.309802, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394432.399879}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394440.44823, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394433.375759}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394441.217746, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394439.309893}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394448.126516, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394441.217805}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394455.079818, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394448.126577}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394461.990609, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394455.079878}
[01/11/2021 19:47:48 ERROR 140257709246272] Algorithm Error: [19:47:47] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_convolution.cc:387: Check failed: weight_mem->get_primitive_desc() == fwd->fwd_pd.weights_primitive_desc()
Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libmxnet.so(dmlc::StackTrace()+0x189) [0x7f902e466dd9]
[bt] (1) /opt/amazon/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f902e467c4a]
[bt] (2) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x437) [0x7f90300a9297]
[bt] (3) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x496) [0x7f90300aaa56]
[bt] (4) /opt/amazon/lib/libmxnet.so(+0x2965df0) [0x7f902ffe7df0]
[bt] (5) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0xf5) [0x7f902fc04275]
[bt] (6) /opt/amazon/lib/libmxnet.so(+0x24fa8cb) [0x7f902fb7c8cb]
[bt] (7) /opt/amazon/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x36c) [0x7f902fb84e0c]
[bt] (8) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xca) [0x7f902fb86e3a]
[bt] (9) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f902fb8478b]
(caused by MXNetError)
Caused by: [19:47:47] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_convolution.cc:387: Check failed: weight_mem->get_primitive_desc() == fwd->fwd_pd.weights_primitive_desc()
Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libmxnet.so(dmlc::StackTrace()+0x189) [0x7f902e466dd9]
[bt] (1) /opt/amazon/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f902e467c4a]
[bt] (2) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x437) [0x7f90300a9297]
[bt] (3) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x496) [0x7f90300aaa56]
[bt] (4) /opt/amazon/lib/libmxnet.so(+0x2965df0) [0x7f902ffe7df0]
[bt] (5) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0xf5) [0x7f902fc04275]
[bt] (6) /opt/amazon/lib/libmxnet.so(+0x24fa8cb) [0x7f902fb7c8cb]
[bt] (7) /opt/amazon/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x36c) [0x7f902fb84e0c]
[bt] (8) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xca) [0x7f902fb86e3a]
[bt] (9) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f902fb8478b]
#metrics {"Metrics": {"invocations_error.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394468.170742, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394461.990746}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394468.170806, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394468.170794}
[2021-01-11 19:47:48 +0000] [119] [INFO] Worker exiting (pid: 119)
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394468.714679, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394461.651111}
Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libmxnet.so(dmlc::StackTrace()+0x189) [0x7f902e466dd9]
[bt] (1) /opt/amazon/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f902e467c4a]
[bt] (2) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x437) [0x7f90300a9297]
[bt] (3) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x496) [0x7f90300aaa56]
[bt] (4) /opt/amazon/lib/libmxnet.so(+0x2965df0) [0x7f902ffe7df0]
[bt] (5) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0xf5) [0x7f902fc04275]
[bt] (6) /opt/amazon/lib/libmxnet.so(+0x24fa8cb) [0x7f902fb7c8cb]
[bt] (7) /opt/amazon/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x36c) [0x7f902fb84e0c]
[bt] (8) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xca) [0x7f902fb86e3a]
[bt] (9) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f902fb8478b]
(caused by MXNetError)
Caused by: [19:47:53] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_convolution.cc:387: Check failed: weight_mem->get_primitive_desc() == fwd->fwd_pd.weights_primitive_desc()
Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libmxnet.so(dmlc::StackTrace()+0x189) [0x7f902e466dd9]
[bt] (1) /opt/amazon/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f902e467c4a]
[bt] (2) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x437) [0x7f90300a9297]
[bt] (3) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x496) [0x7f90300aaa56]
[bt] (4) /opt/amazon/lib/libmxnet.so(+0x2965df0) [0x7f902ffe7df0]
[bt] (5) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0xf5) [0x7f902fc04275]
[bt] (6) /opt/amazon/lib/libmxnet.so(+0x24fa8cb) [0x7f902fb7c8cb]
[bt] (7) /opt/amazon/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x36c) [0x7f902fb84e0c]
[bt] (8) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xca) [0x7f902fb86e3a]
[bt] (9) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f902fb8478b]
#metrics {"Metrics": {"invocations_error.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394474.673656, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394468.714735}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394474.67372, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394474.673707}
Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libmxnet.so(dmlc::StackTrace()+0x189) [0x7f902e466dd9]
[bt] (1) /opt/amazon/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f902e467c4a]
[bt] (2) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x437) [0x7f90300a9297]
[bt] (3) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x496) [0x7f90300aaa56]
[bt] (4) /opt/amazon/lib/libmxnet.so(+0x2965df0) [0x7f902ffe7df0]
[bt] (5) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0xf5) [0x7f902fc04275]
[bt] (6) /opt/amazon/lib/libmxnet.so(+0x24fa8cb) [0x7f902fb7c8cb]
[bt] (7) /opt/amazon/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x36c) [0x7f902fb84e0c]
[bt] (8) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xca) [0x7f902fb86e3a]
[bt] (9) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f902fb8478b]
(caused by MXNetError)
Caused by: [19:47:53] /opt/brazil-pkg-cache/packages/AIAlgorithmsMXNet/AIAlgorithmsMXNet-1.4.x.1856.0/AL2012/generic-flavor/src/src/operator/nn/mkldnn/mkldnn_convolution.cc:387: Check failed: weight_mem->get_primitive_desc() == fwd->fwd_pd.weights_primitive_desc()
Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libmxnet.so(dmlc::StackTrace()+0x189) [0x7f902e466dd9]
[bt] (1) /opt/amazon/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f902e467c4a]
[bt] (2) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForwardFullFeature(mxnet::op::MKLDNNConvFullParam const&, mxnet::OpContext const&, mxnet::op::MKLDNNConvForward*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x437) [0x7f90300a9297]
[bt] (3) /opt/amazon/lib/libmxnet.so(mxnet::op::MKLDNNConvolutionForward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x496) [0x7f90300aaa56]
[bt] (4) /opt/amazon/lib/libmxnet.so(+0x2965df0) [0x7f902ffe7df0]
[bt] (5) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext)+0xf5) [0x7f902fc04275]
[bt] (6) /opt/amazon/lib/libmxnet.so(+0x24fa8cb) [0x7f902fb7c8cb]
[bt] (7) /opt/amazon/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x36c) [0x7f902fb84e0c]
[bt] (8) /opt/amazon/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xca) [0x7f902fb86e3a]
[bt] (9) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7f902fb8478b]
#metrics {"Metrics": {"invocations_error.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394474.673656, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394468.714735}
#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394474.67372, "Dimensions": {"Host": "UNKNOWN", "Operation": "scoring", "Algorithm": "SemanticSegmentationModel"}, "StartTime": 1610394474.673707}
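When scanning a long transform log like the one above, it can help to tally the `#metrics` lines instead of reading them by eye. The following is a minimal sketch, assuming each such line is the literal prefix `#metrics ` followed by one JSON object shaped like the lines in this log; `tally_metrics` and `METRICS_PREFIX` are hypothetical names for illustration, not part of the SageMaker SDK.

```python
import json
from collections import Counter

# Assumption: metrics lines start with this exact prefix, as in the log above.
METRICS_PREFIX = "#metrics "


def tally_metrics(log_lines):
    """Add up the "sum" field of every metric found on `#metrics` lines."""
    totals = Counter()
    for line in log_lines:
        if not line.startswith(METRICS_PREFIX):
            continue  # skip stack-trace and plain log lines
        payload = json.loads(line[len(METRICS_PREFIX):])
        for name, stats in payload.get("Metrics", {}).items():
            totals[name] += stats.get("sum", 0)
    return dict(totals)


if __name__ == "__main__":
    # Two lines shortened from the log above (the Dimensions field is omitted).
    sample = [
        '#metrics {"Metrics": {"invocations_error.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394474.673656, "StartTime": 1610394468.714735}',
        '#metrics {"Metrics": {"invocations.count": {"count": 1, "max": 1, "sum": 1.0, "min": 1}}, "EndTime": 1610394474.67372, "StartTime": 1610394474.673707}',
        "Stack trace returned 10 entries:",
    ]
    print(tally_metrics(sample))
```

The excerpt above contains each metrics line twice with identical timestamps, so identical lines would need to be deduplicated before tallying if exact counts matter.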
System information
A description of your system. Please provide:
- SageMaker Python SDK version:
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): Semantic Segmentation
- Framework version:
- Python version:
- CPU or GPU: CPU
- Custom Docker image (Y/N): No
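The SDK and Python version fields above are blank. A minimal sketch of how they could be collected from the environment that launched the transform job, assuming the SDK is installed under the distribution name `sagemaker`; `collect_system_info` is a hypothetical helper name used only for illustration.

```python
import platform
from importlib import metadata


def collect_system_info(package: str = "sagemaker") -> dict:
    """Return the Python version and the installed version of `package`."""
    try:
        sdk_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this interpreter's environment.
        sdk_version = "not installed"
    return {
        "python_version": platform.python_version(),
        "sdk_version": sdk_version,
    }


if __name__ == "__main__":
    info = collect_system_info()
    print(f"SageMaker Python SDK version: {info['sdk_version']}")
    print(f"Python version: {info['python_version']}")
```

These values describe the client environment only. The Python 2.7 paths in the log come from inside the algorithm container and are independent of the client interpreter.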
Additional context
Model container: 811284229777.dkr.ecr.us-east-1.amazonaws.com/semantic-segmentation:1