Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenCL][Texture] Improved texture memory planning #15058

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

srkreddy1238
Copy link
Contributor

@srkreddy1238 srkreddy1238 commented Jun 7, 2023

Motivated form the fact that textures can be allocated over a clBuffer object and the size of backing clBuffer can be computed based on hardware image pitch alignment.

This optimizes the overall memory allocation on device and helps greately the models with large memory requirements.

Improvised the graph memory planner to not differentiate buffer and texture storage tokens and reuse them across. The texture pool in OpenCL runtime is rebranded as memory pool that handles allocation for both buffer and image objects.

NDArray to DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView. These new API's acommodates accessing same physical memory as clBuffer / clImage objects.

@tvm-bot
Copy link
Collaborator

tvm-bot commented Jun 7, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

1 similar comment
@tvm-bot
Copy link
Collaborator

tvm-bot commented Jun 7, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@srkreddy1238 srkreddy1238 changed the title [OpenCL][Texture] Improved texture memmory planning and runtime memor… [OpenCL][Texture] Improved texture memory planning Jun 7, 2023
@srkreddy1238 srkreddy1238 requested a review from tqchen June 7, 2023 19:41
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly took a look at this PR. It contains many changes, but no tests. I'd like to see unit tests for these changes.

@srkreddy1238
Copy link
Contributor Author

Quickly took a look at this PR. It contains many changes, but no tests. I'd like to see unit tests for these changes.

I agree. I am working on it.

Appreciate a review on the interface changes (DeviceAPI, NDArray) and over all design aspects mean while.

@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch 2 times, most recently from 613d0f0 to 8351151 Compare June 9, 2023 03:12
@srkreddy1238 srkreddy1238 force-pushed the texture_enhancements branch 6 times, most recently from 59cca16 to ed8b70e Compare June 11, 2023 17:43
@srkreddy1238
Copy link
Contributor Author

@echuraev can you take a look now ?

Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. It was a public holiday. Several comments.

src/runtime/graph_executor/graph_executor.cc Outdated Show resolved Hide resolved
src/runtime/cuda/cuda_device_api.cc Outdated Show resolved Hide resolved
src/runtime/metal/metal_device_api.mm Outdated Show resolved Hide resolved
src/runtime/opencl/memory_pool.cc Outdated Show resolved Hide resolved
src/runtime/opencl/memory_pool.cc Outdated Show resolved Hide resolved
tests/cpp-runtime/opencl/opencl_texture_pool_test.cc Outdated Show resolved Hide resolved
tests/cpp-runtime/opencl/opencl_texture_pool_test.cc Outdated Show resolved Hide resolved
tests/cpp-runtime/opencl/texture_copy_test.cc Outdated Show resolved Hide resolved
tests/cpp-runtime/opencl/texture_copy_test.cc Outdated Show resolved Hide resolved
* \param dtype The type of elements.
* \param mem_scope The memory scope of allocated tensor.
* \return The allocated device pointer.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this set of changes. I feel that maybe we need something different that makes the memory allocator more explicit. Let me elaborate in reply

@tqchen
Copy link
Member

tqchen commented Jun 13, 2023

Thanks @srkreddy1238 . Reading the set of changes, i feel that we need something that is more lifted out from the DeviceAPI since this is getting closer to the two stage concept.

  • Allocator allocates physical Storage object (in this case backed by clBuffer)
  • The storage object then allocates the NDArray(backed by the storage object).

We already have some of similar concept in memory manager, e.g. in relax, but of course the memory manager as it is does not yet reflect the need of supporting things like creating different memory scope. This would help to move some of the complexity in the texture part to a specialized allocator that creates the Storage with related behavior.

The main rationale is to keep the overall DeviceAPI minimum without behavior of things like pooling and move some of the logic to Storage, NDArray separation, that can hopefully also enable direct support in VM backed runtimes like unity. Maybe we can discuss a bit on how to make that part more generalized that can be reused.

@srkreddy1238
Copy link
Contributor Author

Thanks @echuraev and @tqchen for a quick review.

I see MemoryManager can meet the requirements of the pool concept we have here. I will improve the PR by

  • Initializing the MemoryManager (PooledAllocator) from graph runtime (StorageSetup).
  • NDArray->Empty => PooledAllocator->Empty (Assuming Pooled Allocator works for all runtimes or I cam limit this for texture scoped graphs).
  • NDArray->CreateView => DeviceAPI->AllocDataSpaceView : This will take care of various views over NDArray

Another situation I came across multiple is about runtime::GetDataSize(container->dl_tensor)

Can we move this to DeviceAPI ? In case of texture there is device specific attributes (image row pitch) that defines the underlaying memory size and accessing these attributes outside doesn't look to be a great idea.

@srkreddy1238
Copy link
Contributor Author

@tqchen later I realized that the MemoryManager interface is part of relax (unity branch, not mainline). Do I bring bring it into relay and tailor it ? Or Is there any other ideas ?

@tqchen
Copy link
Member

tqchen commented Jun 14, 2023

One thing that I can probably do is do do a bit of refactoring and bring it to runtime folder so we can leverage it in most places, I may need a few weeks to get to this. There is also already one in relay that can be used as getting started. GetDataSize is also a logic that can be specific to the pool.

@srkreddy1238
Copy link
Contributor Author

@tqchen

I enhanced to reuse the existing VM runtime memory manager by graph executor at 31d7de0

Basically,

  • Promoted memory manager from runtime/vm to runtime level
  • Redirected graph executor to use Memory Manager interface for StorageSetup.
  • Corresponding changes in OpenCL Runtime to work with this MemoryManager.

Let me know your opinion on these.

Another recommendation I have is redirecting AllocWorkspace and FreeWorkspace to MemoryManager and drop workspace_pool. workspace_pool is slightly different from pooled_allocator (workspace pool looks for best fit where as pooled allocator look for exact fit) and we can amend this functionality.

…y allocation

Motivated form the fact that textures can be allocated over a clBuffer object and
the size of backing clBuffer can be computed based on hardware image pitch alignment.

This optimizes the overall memory allocation on device and helps greately the models with
large memory requirements.

Improvised the graph memory planner to not differentiate buffer and texture storage
tokens and reuse them across. The texture pool in OpenCL runtime is rebranded as memory pool
that handles allocation for both buffer and image objects.

NDArray to DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView.
These new API's acommodates accessing same physical memory as clBuffer / clImage objects.

* MemoryPool test cases and lint errors.

* test cases and fallback support.

* bug fix and cpp-runtime tests cases for texture views.

* various cl device info organized

* fix graph plan memory bug and correct the testcase.

* device attribute handling

* Some fallback for texture plan on devices w/o cl_khr_image2d_from_buffer

 * Memory Manager

Move the VM memory manager to the runtime level.
Use this memory manager for graph runtime.

 *  Resolve conflicts for VerifyDataType and Buffer

* review comments
@srkreddy1238
Copy link
Contributor Author

@tqchen @echuraev can you take another look on this ?
I have rebased and improvised based on the new memory manager interface.

Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. Several comments.

src/runtime/memory/memory_manager.cc Show resolved Hide resolved
src/runtime/opencl/opencl_device_api.cc Show resolved Hide resolved
src/runtime/opencl/opencl_device_api.cc Outdated Show resolved Hide resolved
src/runtime/opencl/opencl_device_api.cc Outdated Show resolved Hide resolved
src/runtime/opencl/opencl_device_api.cc Outdated Show resolved Hide resolved
src/runtime/opencl/opencl_device_api.cc Outdated Show resolved Hide resolved
srkreddy1238 and others added 2 commits October 10, 2023 18:07
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@srkreddy1238
Copy link
Contributor Author

@tqchen & @yongwww can you take a look on this PR ?

@srkreddy1238
Copy link
Contributor Author

@tqchen can you take a look on this PR ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants