【PIR API adaptor No.190、191】 Migrate paddle.scatter,paddle.scatter_nd_add into pir #58548

Merged
luotao1 merged 14 commits into PaddlePaddle:develop from pir-api-190-191 on Nov 7, 2023

Contributor

@enkilee enkilee commented Nov 1, 2023

PR types

Others

PR changes

APIs

Description

PIR API full rollout upgrade

Migrate paddle.scatter to PIR and update its unit tests. Unit test coverage: 26/27
Migrate paddle.scatter_nd_add to PIR and update its unit tests. Unit test coverage: 11/11
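
For readers unfamiliar with the two APIs being migrated, here is a minimal pure-Python sketch of their semantics as described in the Paddle documentation. It is a reference model on nested lists, not Paddle's implementation; the helper names `scatter_ref` and `scatter_nd_add_ref` are hypothetical, and `scatter_nd_add_ref` only covers the case where each index fully addresses a scalar element.

```python
import copy


def scatter_ref(x, index, updates, overwrite=True):
    """Reference model of scatter on a 2-D nested list (hypothetical helper)."""
    out = [list(row) for row in x]
    if overwrite:
        # Each update replaces the addressed row; applied sequentially here,
        # so for duplicate indices the last update wins.
        for i, idx in enumerate(index):
            out[idx] = list(updates[i])
    else:
        # Addressed rows are zeroed first, then updates are accumulated.
        for idx in index:
            out[idx] = [0] * len(out[idx])
        for i, idx in enumerate(index):
            out[idx] = [a + b for a, b in zip(out[idx], updates[i])]
    return out


def scatter_nd_add_ref(x, index, updates):
    """Reference model of scatter_nd_add for indices that address scalars."""
    out = copy.deepcopy(x)
    for idx, upd in zip(index, updates):
        target = out
        for k in idx[:-1]:
            target = target[k]
        target[idx[-1]] += upd  # duplicates accumulate
    return out


x = [[1, 1], [2, 2], [3, 3]]
index = [2, 1, 0, 1]
updates = [[1, 1], [2, 2], [3, 3], [4, 4]]
print(scatter_ref(x, index, updates, overwrite=False))  # [[3, 3], [6, 6], [1, 1]]
print(scatter_nd_add_ref([0, 1, 2, 3, 4, 5], [[1], [2], [3], [1]], [9, 10, 11, 12]))
# [0, 22, 12, 14, 4, 5]
```

In the real operators the order in which duplicate indices are applied in overwrite mode may differ from this sequential model.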

The scatter static-graph test fails with an out-of-GPU-memory error:

2023-11-03 17:31:19 ----------------------------------------------------------------------
2023-11-03 17:31:19 Traceback (most recent call last):
2023-11-03 17:31:19   File "/paddle/build/python/paddle/pir_utils.py", line 119, in impl
2023-11-03 17:31:19     func(*args, **kwargs)
2023-11-03 17:31:19   File "/mnt/paddle/build/test/legacy_test/test_scatter_op.py", line 711, in test_static_graph
2023-11-03 17:31:19     res = gpu_exe.run(
2023-11-03 17:31:19   File "/paddle/build/python/paddle/base/executor.py", line 1637, in run
2023-11-03 17:31:19     res = self._run_pir_impl(
2023-11-03 17:31:19   File "/paddle/build/python/paddle/base/executor.py", line 1950, in _run_pir_impl
2023-11-03 17:31:19     self._pir_feed_data(program, feed, scope)
2023-11-03 17:31:19   File "/paddle/build/python/paddle/base/executor.py", line 1269, in _pir_feed_data
2023-11-03 17:31:19     cur_feed = _as_lodtensor(cur_feed, self.place, var_type)
2023-11-03 17:31:19   File "/paddle/build/python/paddle/base/executor.py", line 721, in _as_lodtensor
2023-11-03 17:31:19     tensor.set(data, place)
2023-11-03 17:31:19 MemoryError: 
2023-11-03 17:31:19 
2023-11-03 17:31:19 --------------------------------------
2023-11-03 17:31:19 C++ Traceback (most recent call last):
2023-11-03 17:31:19 --------------------------------------
2023-11-03 17:31:19 0   float* phi::DenseTensor::mutable_data<float>(phi::Place const&, unsigned long)
2023-11-03 17:31:19 1   phi::DenseTensor::mutable_data(phi::Place const&, phi::DataType, unsigned long)
2023-11-03 17:31:19 2   phi::memory_utils::AllocShared(phi::Place const&, unsigned long)
2023-11-03 17:31:19 3   paddle::memory::AllocShared(phi::Place const&, unsigned long)
2023-11-03 17:31:19 4   paddle::memory::allocation::AllocatorFacade::AllocShared(phi::Place const&, unsigned long)
2023-11-03 17:31:19 5   paddle::memory::allocation::AllocatorFacade::Alloc(phi::Place const&, unsigned long)
2023-11-03 17:31:19 6   paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
2023-11-03 17:31:19 7   paddle::memory::allocation::Allocator::Allocate(unsigned long)
2023-11-03 17:31:19 8   paddle::memory::allocation::Allocator::Allocate(unsigned long)
2023-11-03 17:31:19 9   paddle::memory::allocation::Allocator::Allocate(unsigned long)
2023-11-03 17:31:19 10  paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)
2023-11-03 17:31:19 11  std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
2023-11-03 17:31:19 12  phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
2023-11-03 17:31:19 
2023-11-03 17:31:19 ----------------------
2023-11-03 17:31:19 Error Message Summary:
2023-11-03 17:31:19 ----------------------
2023-11-03 17:31:19 ResourceExhaustedError: 
2023-11-03 17:31:19 
2023-11-03 17:31:19 Out of memory error on GPU 0. Cannot allocate 10.260804GB memory on GPU 0, 11.500977GB memory has been allocated and available memory is only 4.271362GB.
2023-11-03 17:31:19 
2023-11-03 17:31:19 Please check whether there is any other process using GPU 0.
2023-11-03 17:31:19 1. If yes, please stop them, or start PaddlePaddle on another GPU.
2023-11-03 17:31:19 2. If no, please decrease the batch size of your model. 
2023-11-03 17:31:19  (at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:86)
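
The numbers in the error summary can be sanity-checked with a little arithmetic. The sketch below compares the requested and available sizes from the log and shows how a single dense tensor reaches that size. The shape `(10759233, 256)` is an illustrative assumption, not something stated in the log; it is used only because a float32 tensor of that shape comes out at about the reported figure when "GB" is read as 2^30 bytes. The helper names are hypothetical.

```python
from math import prod

GIB = 1024 ** 3
DTYPE_BYTES = {"float32": 4, "float64": 8, "int64": 8}


def tensor_gib(shape, dtype="float32"):
    """Size of a dense tensor in GiB (hypothetical helper)."""
    return prod(shape) * DTYPE_BYTES[dtype] / GIB


def fits(requested_gib, available_gib):
    """True if a single allocation of the requested size could succeed."""
    return requested_gib <= available_gib


# Figures copied from the error summary above.
requested = 10.260804
available = 4.271362
print(fits(requested, available))  # False

# Assumed shape, for illustration only.
size = tensor_gib((10759233, 256), "float32")
print(f"{size:.6f}")  # 10.260804
```

Under these assumptions, the feed tensor alone is more than twice the free memory on GPU 0, so the failure happens while feeding data, before any scatter kernel runs.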

@paddle-bot paddle-bot bot added the contributor External developers label Nov 1, 2023
@luotao1 luotao1 added the HappyOpenSource 快乐开源活动issue与PR label Nov 3, 2023
@YuanRisheng
Contributor

For now, disable this failing unit test in PIR mode; someone on the Paddle team will look into the issue.

@MarioLulab
Contributor

@enkilee Let's disable this failing unit test in PIR mode for now and merge this PR first. The Paddle team will resolve the CI OOM issue.

@enkilee
Contributor Author

enkilee commented Nov 7, 2023

Got it.

Contributor

@MarioLulab MarioLulab left a comment

LGTM

Contributor

@Aurelius84 Aurelius84 left a comment

approved

@luotao1 luotao1 merged commit 77e4ada into PaddlePaddle:develop Nov 7, 2023
28 checks passed
zeroRains pushed a commit to zeroRains/Paddle that referenced this pull request Nov 8, 2023
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023
SecretXV pushed a commit to SecretXV/Paddle that referenced this pull request Nov 28, 2023
@enkilee enkilee deleted the pir-api-190-191 branch December 14, 2023 08:54
Labels
contributor External developers HappyOpenSource 快乐开源活动issue与PR

6 participants