
Custom_Device model weights are always kept on the CPU #47786

Closed
engineer1109 opened this issue Nov 9, 2022 · 8 comments

Labels: PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc), status/close (closed), type/bug-report (bug report)


@engineer1109
Contributor

Describe the Bug

Load any model with weights on a Custom_Device.

Run it in a console with export GLOG_v=6 set:

I1109 10:53:16.095129 951421 naive_executor.cc:58] 140736751935488 run Op(matmul_v2), inputs:{X[linear_68.tmp_0:float[3, 512, 312]({})(Place(OpenCL:0))], Y[linear_51.w_0:float[312, 2]({})(Place(cpu))]}, outputs:{Out[linear_68.tmp_1:float[3, 512, 312]({})(Place(OpenCL:0))]}. on scope 0x555558b77cb0
I1109 10:53:16.095139 951421 operator.cc:212] Place(OpenCL:0) Op(matmul_v2), inputs:{X[linear_68.tmp_0:float[3, 512, 312]({})(Place(OpenCL:0))], Y[linear_51.w_0:float[312, 2]({})(Place(cpu))]}, outputs:{Out[linear_68.tmp_1:float[3, 512, 312]({})(Place(OpenCL:0))]}.
I1109 10:53:16.095149 951421 device_context.cc:105] DeviceContextPool Get: Place(OpenCL:0)
I1109 10:53:16.095152 951421 device_context.cc:105] DeviceContextPool Get: Place(OpenCL:0)
I1109 10:53:16.095166 951421 operator.cc:2299] Transform Variable linear_51.w_0 from {data_type[float]; data_layout[NCHW]; place[Place(cpu)]; library_type[PLAIN]} to {data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(OpenCL:0)]; library_type[PLAIN]}
I1109 10:53:16.095172 951421 scope.cc:199] Create variable linear_51.w_0
I1109 10:53:16.095175 951421 data_device_transform.cc:22] DeviceTransform in, src_place Place(cpu) dst_place: Place(OpenCL:0)
I1109 10:53:16.095180 951421 device_context.cc:105] DeviceContextPool Get: Place(cpu)
I1109 10:53:16.095181 951421 device_context.cc:105] DeviceContextPool Get: Place(OpenCL:0)
I1109 10:53:16.095183 951421 tensor_util.cc:468] TensorCopySync 312, 2 from Place(cpu) to Place(OpenCL:0)
I1109 10:53:16.095186 951421 allocator_facade.cc:343] GetAllocator Place(OpenCL:0) 2496
I1109 10:53:16.095189 951421 tensor_util.cc:481] src:0x7fff6b3ea040, dst:0x8648
I1109 10:53:16.095192 951421 memcpy.cc:70] memory::Copy 2496 Bytes from Place(cpu) to Place(OpenCL:0), stream=0
I1109 10:53:16.095317 951421 device_context.cc:105] DeviceContextPool Get: Place(OpenCL:0)
memory_copy_h2d Device0x8648/ Host0x7fff6b3ea040Size2496
memcpy_h2d 0.023ms

The log shows a matmul_v2 OpenCL operator running:
X[linear_68.tmp_0:float[3, 512, 312](Place(OpenCL:0))] is the input coming from the previous op.
Y[linear_51.w_0:float[312, 2](Place(cpu))] is a model weight.

No matter how many times inference runs, the weight stays on the CPU, so a device-copy transform is performed on every run:

memory_copy_h2d Device0x8648/ Host0x7fff6b3ea040Size2496
memcpy_h2d 0.023ms

How can this transform be avoided, so that the weights are kept in OpenCL memory?

Additional Supplementary Information

No response

@paddle-bot

paddle-bot bot commented Nov 9, 2022

Hi! We've received your issue and will arrange for technicians to answer your questions as soon as possible; please be patient. Please double-check that you have provided a clear problem description, reproduction code, environment & version, and error messages. You may also check the official API docs, the FAQ, historical GitHub issues, and the AI community for answers. Have a nice day!

@paddle-bot paddle-bot bot added the PFCC label Nov 9, 2022
@engineer1109
Contributor Author

    // Inference config: load the model files and target the custom device.
    paddle_infer::Config config;
    config.SetModel(m_pdmodelPath, m_pdiparamsPath);

    // Register the OpenCL custom device (device id 0).
    config.EnableCustomDevice("OpenCL", 0);
    config.EnableMemoryOptim();

    // Disable feed/fetch ops; I/O goes through tensor handles instead.
    config.SwitchUseFeedFetchOps(false);

    return paddle_infer::CreatePredictor(config);

Config setup.
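For context, here is a minimal run loop for the predictor created above. This is a sketch only: createPredictor() stands in for the function shown above (its real name is not given here), the input is taken by index, the shape comes from the matmul_v2 log entry, and error handling is omitted. With the weights resident on the CPU, every Run() call triggers the DeviceTransform and memcpy_h2d seen in the log:

    #include "paddle_inference_api.h"  // Paddle Inference C++ API
    #include <vector>

    // createPredictor() is the (assumed) name of the function shown above.
    auto predictor = createPredictor();
    auto input_names = predictor->GetInputNames();
    auto input = predictor->GetInputHandle(input_names[0]);

    // Shape taken from the matmul_v2 log entry: [3, 512, 312].
    std::vector<float> host_data(3 * 512 * 312, 0.f);
    input->Reshape({3, 512, 312});
    input->CopyFromCpu(host_data.data());

    // Each Run() re-copies the CPU-resident weight (e.g. linear_51.w_0)
    // to Place(OpenCL:0) before matmul_v2 can execute; this is the
    // memcpy_h2d visible in the log.
    predictor->Run();

    auto output_names = predictor->GetOutputNames();
    auto output = predictor->GetOutputHandle(output_names[0]);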

@engineer1109
Contributor Author

I1109 11:08:43.436408 955834 operator.cc:279] Place(cpu) Op(load_combine), inputs:{}, outputs:{Out[embedding_4.w_0:float[40000, 312]({})(Place(cpu)), embedding_5.w_0:float[2048, 312]({})(Place(cpu)), embedding_6.w_0:float[4, 312]({})(Place(cpu)), embedding_7.w_0:float[16, 312]({})(Place(cpu)), layer_norm_10.b_0:float[312]({})(Place(cpu)), layer_norm_10.w_0:float[312]({})(Place(cpu)), layer_norm_11.b_0:float[312]({})(Place(cpu)), layer_norm_11.w_0:float[312]({})(Place(cpu)), layer_norm_12.b_0:float[312]({})(Place(cpu)), layer_norm_12.w_0:float[312]({})(Place(cpu)), layer_norm_13.b_0:float[312]({})(Place(cpu)), layer_norm_13.w_0:float[312]({})(Place(cpu)), layer_norm_14.b_0:float[312]({})(Place(cpu)), layer_norm_14.w_0:float[312]({})(Place(cpu)), layer_norm_15.b_0:float[312]({})(Place(cpu)), layer_norm_15.w_0:float[312]({})(Place(cpu)), layer_norm_16.b_0:float[312]({})(Place(cpu)), layer_norm_16.w_0:float[312]({})(Place(cpu)), layer_norm_17.b_0:float[312]({})(Place(cpu)), layer_norm_17.w_0:float[312]({})(Place(cpu)), layer_norm_9.b_0:float[312]({})(Place(cpu)), layer_norm_9.w_0:float[312]({})(Place(cpu)), linear_26.b_0:float[234]({})(Place(cpu)), linear_26.w_0:float[312, 234]({})(Place(cpu)), linear_27.b_0:float[234]({})(Place(cpu)), linear_27.w_0:float[312, 234]({})(Place(cpu)), linear_28.b_0:float[234]({})(Place(cpu)), linear_28.w_0:float[312, 234]({})(Place(cpu)), linear_29.b_0:float[312]({})(Place(cpu)), linear_29.w_0:float[234, 312]({})(Place(cpu)), linear_30.b_0:float[936]({})(Place(cpu)), linear_30.w_0:float[312, 936]({})(Place(cpu)), linear_31.b_0:float[312]({})(Place(cpu)), linear_31.w_0:float[936, 312]({})(Place(cpu)), linear_32.b_0:float[234]({})(Place(cpu)), linear_32.w_0:float[312, 234]({})(Place(cpu)), linear_33.b_0:float[234]({})(Place(cpu)), linear_33.w_0:float[312, 234]({})(Place(cpu)), linear_34.b_0:float[234]({})(Place(cpu)), linear_34.w_0:float[312, 234]({})(Place(cpu)), linear_35.b_0:float[312]({})(Place(cpu)), linear_35.w_0:float[234, 312]({})(Place(cpu)), linear_36.b_0:float[936]({})(Place(cpu)), linear_36.w_0:float[312, 936]({})(Place(cpu)), linear_37.b_0:float[312]({})(Place(cpu)), linear_37.w_0:float[936, 312]({})(Place(cpu)), linear_38.b_0:float[234]({})(Place(cpu)), linear_38.w_0:float[312, 234]({})(Place(cpu)), linear_39.b_0:float[234]({})(Place(cpu)), linear_39.w_0:float[312, 234]({})(Place(cpu)), linear_40.b_0:float[234]({})(Place(cpu)), linear_40.w_0:float[312, 234]({})(Place(cpu)), linear_41.b_0:float[312]({})(Place(cpu)), linear_41.w_0:float[234, 312]({})(Place(cpu)), linear_42.b_0:float[936]({})(Place(cpu)), linear_42.w_0:float[312, 936]({})(Place(cpu)), linear_43.b_0:float[312]({})(Place(cpu)), linear_43.w_0:float[936, 312]({})(Place(cpu)), linear_44.b_0:float[234]({})(Place(cpu)), linear_44.w_0:float[312, 234]({})(Place(cpu)), linear_45.b_0:float[234]({})(Place(cpu)), linear_45.w_0:float[312, 234]({})(Place(cpu)), linear_46.b_0:float[234]({})(Place(cpu)), linear_46.w_0:float[312, 234]({})(Place(cpu)), linear_47.b_0:float[312]({})(Place(cpu)), linear_47.w_0:float[234, 312]({})(Place(cpu)), linear_48.b_0:float[936]({})(Place(cpu)), linear_48.w_0:float[312, 936]({})(Place(cpu)), linear_49.b_0:float[312]({})(Place(cpu)), linear_49.w_0:float[936, 312]({})(Place(cpu)), linear_51.b_0:float[2]({})(Place(cpu)), linear_51.w_0:float[312, 2]({})(Place(cpu))]}.

All model weights are on the CPU.

@engineer1109
Contributor Author

I1109 11:08:43.436393 955834 load_combine_op.h:81] loading tensor: linear_51.w_0
I1109 11:08:43.436394 955834 allocator_facade.cc:343] GetAllocator Place(cpu) 2496
I1109 11:08:43.436408 955834 operator.cc:279] Place(cpu) Op(load_combine),

@jiweibo
Contributor

jiweibo commented Nov 9, 2022

Hi, you can refer to ir_params_sync_among_devices_pass, which moves host weights to the device:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/analysis/passes/ir_params_sync_among_devices_pass.cc#L165

@engineer1109
Contributor Author

@jiweibo It looks like only CopyParamsToGpu and CopyParamsToNpu exist, so custom devices need their own PR, right? (See the sketch below for what such a variant might look like.)
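For illustration, a custom-device variant could mirror the GPU path in that pass: iterate over the parameters in the scope and do a one-time blocking copy to the device at analysis time. The following is a hypothetical sketch, not the code of the eventual PR; CopyParamsToCustomDevice, custom_device_type(), and custom_device_id() are assumed names:

    // Hypothetical sketch modeled on CopyParamsToGpu in
    // ir_params_sync_among_devices_pass.cc; NOT actual Paddle code.
    void IrParamsSyncAmongDevicesPass::CopyParamsToCustomDevice(
        Argument *argument) {
      // The device configured via EnableCustomDevice("OpenCL", 0);
      // the accessor names are assumptions.
      platform::CustomPlace place(argument->custom_device_type(),
                                  argument->custom_device_id());
      auto *scope = argument->scope_ptr();

      for (const auto &name : scope->LocalVarNames()) {
        auto *var = scope->FindLocalVar(name);
        if (var == nullptr || !var->IsType<phi::DenseTensor>()) continue;
        auto *t = var->GetMutable<phi::DenseTensor>();
        if (!platform::is_cpu_place(t->place())) continue;

        // Stage the host data, release the CPU allocation, then copy to
        // the device once, so Run() no longer needs a per-op
        // DeviceTransform for this weight.
        phi::DenseTensor temp;
        temp.Resize(t->dims());
        paddle::framework::TensorCopySync(*t, platform::CPUPlace(), &temp);
        t->clear();
        paddle::framework::TensorCopySync(temp, place, t);
      }
    }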

@engineer1109
Contributor Author

@jiweibo Fixed in PR #48221.

@luotao1 luotao1 assigned qingqing01 and unassigned pangyoki Nov 23, 2022
@engineer1109
Contributor Author

Fixed by b6aa9f5. Custom Device now has its own IR pass handling, which greatly reduces I/O overhead and speeds up inference.

@paddle-bot paddle-bot bot added the status/close label and removed the status/following-up label Dec 13, 2022