Error loading pretrained model in dynamic graph mode #616

Closed
merlinarer opened this issue Nov 25, 2020 · 1 comment

@merlinarer

Model: FCN + HRNet
The config file is the same as the one in remote, but the pretrained model fails to load.
Environment: Ubuntu 16.04, CUDA 10.1
The error is as follows:
`/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/tracer.py:43: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType). Implicit conversion to integers using int is deprecated, and may be removed in a future version of Python.
self.trace(type, inputs, outputs, attrs,
W1125 02:26:38.781530 22128 device_context.cc:338] Please NOTE: device: 1, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.2
W1125 02:26:38.781566 22128 device_context.cc:346] device: 1, cuDNN Version: 7.6.
/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/tracer.py:43: DeprecationWarning: an integer is required (got type paddle.fluid.core_avx.VarType). Implicit conversion to integers using int is deprecated, and may be removed in a future version of Python.
self.trace(type, inputs, outputs, attrs,
W1125 02:26:38.862706 22127 device_context.cc:338] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.2
W1125 02:26:38.862746 22127 device_context.cc:346] device: 0, cuDNN Version: 7.6.
2020-11-25 02:26:48 [INFO] Loading pretrained model from ../pretrained_model/hrnet_w48_ssld/
Traceback (most recent call last):
File "train.py", line 141, in <module>
main(args)
File "train.py", line 124, in main
cfg.model,
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/cvlibs/config.py", line 230, in model
self._model = self._load_object(model_cfg)
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/cvlibs/config.py", line 270, in _load_object
params[key] = self._load_object(val)
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/cvlibs/config.py", line 279, in _load_object
return component(**params)
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/models/backbones/hrnet.py", line 768, in HRNet_W48
model = HRNet(
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/models/backbones/hrnet.py", line 154, in __init__
self.init_weight()
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/models/backbones/hrnet.py", line 196, in init_weight
utils.load_pretrained_model(self, self.pretrained)
File "/home/workspace/merlin/projects/datafountain_remote_sensing_seg/dypaddle/dygraph/paddleseg/utils/utils.py", line 68, in load_pretrained_model
para_state_dict = paddle.load(pretrained_model)
File "/root/miniconda3/lib/python3.8/site-packages/paddle/framework/io.py", line 377, in load
load_result = _load_state_dict_from_save_params(model_path)
File "/root/miniconda3/lib/python3.8/site-packages/paddle/framework/io.py", line 103, in _load_state_dict_from_save_params
_dygraph_tracer().trace_op(
File "/root/miniconda3/lib/python3.8/site-packages/paddle/fluid/dygraph/tracer.py", line 43, in trace_op
self.trace(type, inputs, outputs, attrs,
paddle.fluid.core_avx.EnforceNotMet:


C++ Traceback (most recent call last):

0 paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, paddle::platform::Place const&, bool)
1 paddle::imperative::PreparedOp::Run(paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap const&)
2 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, int>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, signed char>, paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, long> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
3 paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4 paddle::operators::LoadOpKernel<paddle::platform::CUDADeviceContext, float>::LoadLodTensor(std::istream&, paddle::platform::Place const&, paddle::framework::Variable*, paddle::framework::ExecutionContext const&) const
5 paddle::framework::DeserializeFromStream(std::istream&, paddle::framework::LoDTensor*, paddle::platform::DeviceContext const&)
6 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
7 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()


Error Message Summary:

InvalidArgumentError: Tensor version 1904018048 is not supported, only version 0 is supported.
[Hint: Expected version == 0U, but received version:1904018048 != 0U:0.] (at /paddle/paddle/fluid/framework/lod_tensor.cc:311)
[operator < load > error]
INFO 2020-11-25 02:26:52,710 utils.py:275] terminate all the procs
ERROR 2020-11-25 02:26:52,715 utils.py:443] ABORT!!! Out of all 2 trainers, the trainer process with rank=[0, 1] was aborted. Please check its log.
INFO 2020-11-25 02:26:55,718 utils.py:275] terminate all the procs
`

@merlinarer
Author

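Solved. The pretrained model path has to point to the weights file itself, not just to the directory that contains it.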
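For anyone hitting the same error: the log shows the pretrained path set to the directory `../pretrained_model/hrnet_w48_ssld/`, and the traceback shows `paddle.load` then going through `_load_state_dict_from_save_params`, which is where the "Tensor version ... is not supported" error is raised. Below is a minimal sketch of a check that turns a directory into an explicit file path before loading. The helper name `resolve_pretrained_path` is hypothetical (it is not part of PaddleSeg), and the `.pdparams` suffix is an assumption, since the thread does not say what the weights file is called.

```python
from pathlib import Path


def resolve_pretrained_path(path, suffix=".pdparams"):
    """Return the path of a weights file, given the file or its directory.

    Hypothetical helper, not part of PaddleSeg. The default suffix is an
    assumption about how the downloaded weights file is named.
    """
    p = Path(path)
    if p.is_file():
        # Already an explicit file path: use it unchanged.
        return str(p)
    if p.is_dir():
        # A directory was given: accept it only if it holds exactly one
        # candidate weights file, so the choice is never ambiguous.
        candidates = sorted(p.glob("*" + suffix))
        if len(candidates) == 1:
            return str(candidates[0])
        raise ValueError(
            f"expected exactly one '*{suffix}' file in {p}, "
            f"found {len(candidates)}"
        )
    raise FileNotFoundError(f"pretrained model path does not exist: {p}")
```

The returned string can then be written into the config as the pretrained model path, or passed to `paddle.load` directly. Failing loudly on an ambiguous or empty directory is a deliberate choice here, since it gives a clearer message than the C++ error above.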
