[OpenCL][Texture] Improved texture memory planning #15058
Force-pushed from 35ca39f to fa689e5.
Quickly took a look at this PR. It contains many changes, but no tests. I'd like to see unit tests for these changes.
Force-pushed from fa689e5 to 33e854b.
I agree. I am working on it. I would appreciate a review of the interface changes (DeviceAPI, NDArray) and the overall design in the meantime.
Force-pushed from 613d0f0 to 8351151.
Force-pushed from 59cca16 to ed8b70e.
@echuraev can you take a look now?
Sorry for the delay. It was a public holiday. Several comments.
 * \param dtype The type of elements.
 * \param mem_scope The memory scope of allocated tensor.
 * \return The allocated device pointer.
 */
Looking at this set of changes, I feel that maybe we need something different that makes the memory allocator more explicit. Let me elaborate in a reply.
Thanks @srkreddy1238. Reading the set of changes, I feel that we need something that is lifted further out of the DeviceAPI, since this is getting closer to the two-stage concept.
We already have a similar concept in the memory manager, e.g. in relax, but the memory manager as it stands does not yet reflect the need to support things like creating different memory scopes. Extending it would help move some of the complexity in the texture part to a specialized allocator that creates the Storage with the related behavior. The main rationale is to keep the overall DeviceAPI minimal, without behaviors such as pooling, and to move some of the logic to the Storage/NDArray separation, which can hopefully also enable direct support in VM-backed runtimes like unity. Maybe we can discuss how to make that part more general so that it can be reused.
Thanks @echuraev and @tqchen for a quick review. I see
Another situation I came across multiple times is about Can we move this to DeviceAPI? In the case of textures there are device-specific attributes (image row pitch) that define the underlying memory size, and accessing these attributes from outside doesn't look like a great idea.
@tqchen later I realized that the
One thing that I can probably do is a bit of refactoring to bring it to the runtime folder so we can leverage it in most places; I may need a few weeks to get to this. There is also already one in relay that can be used as a starting point. GetDataSize is also logic that can be specific to the pool.
Force-pushed from ed8b70e to 31d7de0.
I enhanced the graph executor to reuse the existing VM runtime memory manager at 31d7de0. Basically,
Let me know your opinion on these. Another recommendation I have is redirecting
Force-pushed from 31d7de0 to d3dee67.
Force-pushed from 788ffe0 to ae78943.
Force-pushed from ae78943 to 537e87f.
…y allocation

Motivated by the fact that textures can be allocated over a clBuffer object, and that the size of the backing clBuffer can be computed from the hardware image pitch alignment. This optimizes overall memory allocation on the device and greatly helps models with large memory requirements.

Improved the graph memory planner so that it no longer differentiates between buffer and texture storage tokens and reuses them across both. The texture pool in the OpenCL runtime is rebranded as a memory pool that handles allocation for both buffer and image objects.

The NDArray to DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView. These new APIs accommodate accessing the same physical memory as clBuffer / clImage objects.

* MemoryPool test cases and lint errors.
* Test cases and fallback support.
* Bug fix and cpp-runtime test cases for texture views.
* Various cl device info organized.
* Fix graph plan memory bug and correct the test case.
* Device attribute handling.
* Some fallback for texture plan on devices w/o cl_khr_image2d_from_buffer.
* Memory Manager: move the VM memory manager to the runtime level and use it for the graph runtime.
* Resolve conflicts for VerifyDataType and Buffer.
* Review comments.
Force-pushed from 537e87f to 2b48572.
* Allocators need not be aware of scope. They only do plain memory allocations.
* DeviceAPI can handle the scope.
Force-pushed from ccc058b to 24610c8.
Force-pushed from 43db208 to 6ca6a53.
In general LGTM. Several comments.
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
LGTM. Thanks!
@tqchen can you take a look at this PR?
Motivated by the fact that textures can be allocated over a clBuffer object, and that the size of the backing clBuffer can be computed from the hardware image pitch alignment.
This optimizes overall memory allocation on the device and greatly helps models with large memory requirements.
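The size computation described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper names `AlignUp`, `RowPitchBytes` and `BackingBufferBytes` are hypothetical, and the sketch assumes the pitch alignment is given in pixels, as OpenCL reports it through `CL_DEVICE_IMAGE_PITCH_ALIGNMENT` for 2D images created from a buffer.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment`.
inline std::size_t AlignUp(std::size_t value, std::size_t alignment) {
  assert(alignment > 0);
  return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: bytes occupied by one image row when the image is
// created over a buffer. `pitch_alignment_pixels` stands in for the value
// a device reports via CL_DEVICE_IMAGE_PITCH_ALIGNMENT (in pixels).
inline std::size_t RowPitchBytes(std::size_t width_pixels,
                                 std::size_t bytes_per_pixel,
                                 std::size_t pitch_alignment_pixels) {
  return AlignUp(width_pixels, pitch_alignment_pixels) * bytes_per_pixel;
}

// Hypothetical helper: size of the buffer that can back a 2D image of
// `width_pixels` x `height_pixels` with the given pixel size.
inline std::size_t BackingBufferBytes(std::size_t width_pixels,
                                      std::size_t height_pixels,
                                      std::size_t bytes_per_pixel,
                                      std::size_t pitch_alignment_pixels) {
  return RowPitchBytes(width_pixels, bytes_per_pixel, pitch_alignment_pixels) *
         height_pixels;
}
```

For example, a 100 x 10 image with 16 bytes per pixel (RGBA float32) and an assumed alignment of 64 pixels gets a row pitch of 128 * 16 = 2048 bytes, so the backing buffer needs 20480 bytes.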
Improved the graph memory planner so that it no longer differentiates between buffer and texture storage tokens and reuses them across both. The texture pool in the OpenCL runtime is rebranded as a memory pool that handles allocation for both buffer and image objects.
The NDArray to DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView. These new APIs accommodate accessing the same physical memory as clBuffer / clImage objects.
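To illustrate the idea of one pool serving both kinds of requests, here is a minimal sketch. It is not the MemoryPool implemented in this PR: the class `SimplePool` and its methods are hypothetical, and the sketch assumes every request (buffer or texture) has already been reduced to a byte size, so a freed block can be handed to either kind of consumer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pool entry: one physical allocation tracked by size only.
struct Block {
  std::size_t bytes;
  bool in_use;
};

// Hypothetical pool that does not distinguish buffer and image requests.
class SimplePool {
 public:
  // Returns the index of a block with at least `bytes` capacity.
  // Reuses the smallest free block that fits; otherwise grows the pool.
  std::size_t Request(std::size_t bytes) {
    std::size_t best = blocks_.size();
    for (std::size_t i = 0; i < blocks_.size(); ++i) {
      const Block& b = blocks_[i];
      if (!b.in_use && b.bytes >= bytes &&
          (best == blocks_.size() || b.bytes < blocks_[best].bytes)) {
        best = i;
      }
    }
    if (best == blocks_.size()) {
      blocks_.push_back({bytes, false});
    }
    blocks_[best].in_use = true;
    return best;
  }

  // Marks a block as free so that a later request can reuse it.
  void Release(std::size_t id) { blocks_.at(id).in_use = false; }

  // Total bytes held by the pool, whether in use or free.
  std::size_t TotalBytes() const {
    std::size_t total = 0;
    for (const Block& b : blocks_) total += b.bytes;
    return total;
  }

 private:
  std::vector<Block> blocks_;
};
```

In this sketch, a block first requested for a texture (say 20480 bytes) and then released can serve a later 4096-byte buffer request without growing the pool, which is the kind of cross-reuse the description refers to. A buffer or image view would then be created over the returned block by the device layer.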