Delete methods for sequences inside managers #113

unexploredtest · 2021-01-16T12:45:26Z

This solves #36 partially, the only thing that remains is creating a method that gives the ability delete a given anonymous sequence.

It's my first time making a PR for a C++ project, so it might not be good. I have provided comments for each commit explaining what each does.

unexploredtest · 2021-01-16T13:18:43Z

Why is this PR trying to completely delete and renew single_include/kompute/Kompute.hpp? Just a few lines were added, right?

axsaucedo · 2021-01-16T14:20:16Z

Thank you for the contribution @aliPMPAINT - I think I'll need to dive into it a bit further, as we need to confirm all the subsequent resources are being freed (so there are no memory leaks). It's also important to make sure to consider that some asynchronous sequences may be running, so these should either be awaited or an issue should be raised.

In regards to your question, this is most likely due to the line ending, as the output file would have different line endings if built on windows vs linux (I think it would be good to standardise this with post proxessing / regex).

axsaucedo · 2021-01-16T14:42:54Z

Ok just had a quick look, it seems there is a failing test:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestSequence
[ RUN      ] TestSequence.SequenceDestructorViaManager
[2021-01-16 14:41:45.218] [debug] [Manager.cpp:144] Kompute Manager creating instance
[2021-01-16 14:41:45.218] [debug] [Manager.cpp:169] Kompute Manager adding debug validation layers
[2021-01-16 14:41:45.881] [debug] [Manager.cpp:202] Kompute Manager Instance Created
[2021-01-16 14:41:45.882] [debug] [Manager.cpp:206] Kompute Manager adding debug callbacks
[2021-01-16 14:41:45.883] [debug] [Manager.cpp:229] Kompute Manager creating Device
[2021-01-16 14:41:45.884] [info] [Manager.cpp:255] Using physical device index 0 found GeForce GTX 1650
[2021-01-16 14:41:45.967] [debug] [Manager.cpp:312] Kompute Manager device created
[2021-01-16 14:41:45.968] [debug] [Manager.cpp:326] Kompute Manager compute queue obtained
[2021-01-16 14:41:45.969] [debug] [Manager.cpp:100] Kompute Manager creating Sequence object
[2021-01-16 14:41:45.969] [debug] [Manager.cpp:119] Kompute Manager createManagedSequence with sequenceName: newSequence and queueIndex: 0
[2021-01-16 14:41:45.970] [debug] [Sequence.cpp:17] Kompute Sequence Constructor with existing device & queue
[2021-01-16 14:41:45.970] [debug] [Sequence.cpp:259] Kompute Sequence creating command pool
[2021-01-16 14:41:45.970] [debug] [Sequence.cpp:275] Kompute Sequence Command Pool Created
[2021-01-16 14:41:45.971] [debug] [Sequence.cpp:281] Kompute Sequence creating command buffer
[2021-01-16 14:41:45.971] [debug] [Sequence.cpp:297] Kompute Sequence Command Buffer Created
[2021-01-16 14:41:45.971] [debug] [Manager.cpp:53] Kompute Manager Destructor started
[2021-01-16 14:41:45.972] [debug] [Manager.cpp:63] Kompute Manager explicitly running destructor for managed sequences
[2021-01-16 14:41:45.972] [info] [Manager.cpp:68] Destroying device
[2021-01-16 14:41:45.991] [debug] [Manager.cpp:71] Kompute Manager Destroyed Device
[2021-01-16 14:41:46.049] [debug] [Manager.cpp:93] Kompute Manager Destroyed Instance
C:\Users\axsau\Programming\vulkan\vulkan-compute\test\TestSequence.cpp(40): error: Value of: sq->isInit()
  Actual: true
Expected: false
[2021-01-16 14:41:46.050] [debug] [Sequence.cpp:28] Kompute Sequence Destructor started
[2021-01-16 14:41:46.051] [info] [Sequence.cpp:221] Freeing CommandBuffer
[2021-01-16 14:41:46.052] [debug] [Sequence.cpp:230] Kompute Sequence Freed CommandBuffer
[2021-01-16 14:41:46.052] [info] [Sequence.cpp:234] Destroying CommandPool
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.
[  FAILED  ] TestSequence.SequenceDestructorViaManager (845 ms)
[----------] 1 test from TestSequence (845 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (848 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestSequence.SequenceDestructorViaManager

 1 FAILED TEST

unexploredtest · 2021-01-16T14:57:38Z

I can guess what made the test fail, I have changed the manager's destructor(the first commit), and as the sequence was created outside of the manager's scope, it didn't destruct it thoroughly.

The reason I did so was because before,

        for (const std::pair<std::string, std::shared_ptr<Sequence>>& sqPair :
             this->mManagedSequences) {
            sqPair.second->freeMemoryDestroyGPUResources();
        }

and this->mManagedSequences.clear(); both tried to free the sequences, and thus they were being deleted twice, like this example(the one provided on colab):

===========================================
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:221] Freeing CommandBuffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:234] Destroying CommandPool
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:248] Kompute Sequence clearing operations buffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:221] Freeing CommandBuffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:234] Destroying CommandPool
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:248] Kompute Sequence clearing operations buffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:221] Freeing CommandBuffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:234] Destroying CommandPool
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:248] Kompute Sequence clearing operations buffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:221] Freeing CommandBuffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:234] Destroying CommandPool
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:248] Kompute Sequence clearing operations buffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:221] Freeing CommandBuffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:234] Destroying CommandPool
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:248] Kompute Sequence clearing operations buffer
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:32] Kompute Sequence destructor called but sequence is not initialized so no need to removing GPU resources.
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:32] Kompute Sequence destructor called but sequence is not initialized so no need to removing GPU resources.
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:32] Kompute Sequence destructor called but sequence is not initialized so no need to removing GPU resources.
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:32] Kompute Sequence destructor called but sequence is not initialized so no need to removing GPU resources.
[2020-11-18 08:32:22.196] [info] [Sequence.cpp:32] Kompute Sequence destructor called but sequence is not initialized so no need to removing GPU resources.
[2020-11-18 08:32:22.196] [info] [Manager.cpp:72] Destroying device

If we observe([2020-11-18 08:32:22.196] [info] [Sequence.cpp:32] Kompute Sequence destructor called but sequence is not initialized so no need to removing GPU resources.), sequences were trying to be freed twice, the first commit solves this issue, but I didn't anticipate the fact that being on different scopes can cause problems.

axsaucedo · 2021-01-16T17:01:45Z

Ok I see, I think it would still be important to ensure the destructor is called correctly in the manager to avoid lingering resources, as ultimately the memory in the GPU is managed hierarchically

unexploredtest · 2021-01-16T17:43:29Z

Yeah sure, and sorry for my silly mistake

axsaucedo · 2021-01-16T18:07:31Z

No worries, the memory management hierary is not too complex once you get your head around it, but it has certainly gotten me in some long debugging sessions, using memory profilers does help understand what objects / code components may be causing issues, more detail via #15

alexander-g · 2021-02-05T07:59:57Z

@axsaucedo Bump, this is quite important.
If you are unsure about asynchronous operations, it would already help to make those methods private and to only clean up anonymous sequences in synchronous ops, i.e. in evalOpDefault directly after the sequence is evaluated.

axsaucedo · 2021-02-05T08:10:51Z

@alexander-g hmm that is actually a pretty good idea... ok I agree this is quite important, thank you @aliPMPAINT for the initial work, I will pull the branch, and do some deeper testing, as well as expose via python interface to make sure we can merge. Thanks both for driving forward this areas.

axsaucedo · 2021-02-06T17:19:08Z

@alexander-g currently looking at this, I now remember why we don't delete the function right after the sequence is executed. The reason why this is by design is because the sequence actually acquires the GPU memory ownership of the Tensors when running the OpCreateTensors. More specifically, it is currently in the memory hierarchy for the OpBase to have a choice whether to free the Tensors it owns or not - here is the line that shows how OpCreateTensor owns the tensors it uses by setting the last value to true
https://github.com/EthicalML/vulkan-kompute/blob/df5477a2d76b920232811e8513579240467ad673/src/OpTensorCreate.cpp#L18

Here you can see the line that specifies the ownership of the tensors themselves:
https://github.com/EthicalML/vulkan-kompute/blob/df5477a2d76b920232811e8513579240467ad673/src/include/kompute/operations/OpBase.hpp#L66

This is actually tied to the discussion I provided in your #130 PR (#130 (comment)) where I basically mention that there may be a better way to think about the OpTensorCreate operation, and the hierarchy of the Tensor memory management overall. This is something that would require a deeper dive, primarily as I need to check if it can make sense for the memory ownership to no longer be Manager -> Sequence -> Op -> Tensor, and update it to remove Tensor as a dependency of the operation. This could require a less trivial refactor, as it's not clear what that hierarchy would be otherwise.

This is teh reason why anonymous Sequences are actually kept. This could actually be change such that anonymous sequences are destroyed after execution, which is what we had at the very beginning but as you can imagine this led to tensor memory being destroyed right after a OpTensorCreate in an anonymous function. But this woudl require specific awareness of users, which would have to always run the OpTensorCreate in a non-anonymous function, and then the rest of the operations in an anonymous function.

Does this make sense?

axsaucedo · 2021-02-06T18:32:33Z

For completeness it may be worth sharing another piece of insight to provide the full picture. Namely that when I was initially desigining the tensors, i had an idea where instead of having two tensors - a Staging and a Device tensor - all the logic would be contained as a single Tensor. Namely this single tensor would contain both relevant memories for the staging and host tensors. The reason why this was not pursued at the end is because the idea was to provide further access and granularity into how the memory is made available to the user - this way the user can actually decide to build their own OpTensor operations, which may own different tensors, sometimes perhaps destroying the staging tensor completely to avoid having the memory overhead.

With this in mind, it could be revisited, but there is that disadvantage that for any device tensor, there would be always the overhead of an extra memory component that would be obscured by the user. The advantage of this is that in this case the staging tensor wouldnt' be created directly by the OpCreate operation, which means that the tensors could be owned by the top level manager. For all of these there are various tradeoffs that could be reassessed.

For now, the simplest way to approach this is to enable a flag that alllows for deletion of anonymous sequences as soon as they are executed, which can be used by more advanced users that would know that using the OpCreateTensor would involve memory management, and hence should be executed in a non-anonymous function that should be managed manually (and then deleted explicitly with the deleteNamedSequence.

axsaucedo · 2021-02-06T18:32:54Z

I will update #36 with the discussion from this issue as well

axsaucedo · 2021-02-10T08:47:20Z

Closing in favour of #113

aliPMPAINT added 3 commits January 16, 2021 00:01

Fixed destructor for managed sequences

3a3229d

Added function to destroy named seq

3b1442c

Added a method to destrol all anon seqs

62e875f

axsaucedo self-requested a review January 16, 2021 14:37

axsaucedo added the enhancement New feature or request label Jan 16, 2021

Using the first destuctor as a test fails

7c7c67d

This was referenced Feb 6, 2021

Create delete function in manager to free / destroy sequence #36

Closed

Tensor memory management and memory hierarchy redesign #136

Closed

Added destroy functions for tensors and sequences (named and object) #146

Merged

axsaucedo closed this Feb 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delete methods for sequences inside managers #113

Delete methods for sequences inside managers #113

unexploredtest commented Jan 16, 2021 •

edited

Loading

unexploredtest commented Jan 16, 2021 •

edited

Loading

axsaucedo commented Jan 16, 2021

axsaucedo commented Jan 16, 2021

unexploredtest commented Jan 16, 2021 •

edited

Loading

axsaucedo commented Jan 16, 2021

unexploredtest commented Jan 16, 2021

axsaucedo commented Jan 16, 2021

alexander-g commented Feb 5, 2021

axsaucedo commented Feb 5, 2021

axsaucedo commented Feb 6, 2021

axsaucedo commented Feb 6, 2021

axsaucedo commented Feb 6, 2021

axsaucedo commented Feb 10, 2021

Delete methods for sequences inside managers #113

Delete methods for sequences inside managers #113

Conversation

unexploredtest commented Jan 16, 2021 • edited Loading

unexploredtest commented Jan 16, 2021 • edited Loading

axsaucedo commented Jan 16, 2021

axsaucedo commented Jan 16, 2021

unexploredtest commented Jan 16, 2021 • edited Loading

axsaucedo commented Jan 16, 2021

unexploredtest commented Jan 16, 2021

axsaucedo commented Jan 16, 2021

alexander-g commented Feb 5, 2021

axsaucedo commented Feb 5, 2021

axsaucedo commented Feb 6, 2021

axsaucedo commented Feb 6, 2021

axsaucedo commented Feb 6, 2021

axsaucedo commented Feb 10, 2021

unexploredtest commented Jan 16, 2021 •

edited

Loading

unexploredtest commented Jan 16, 2021 •

edited

Loading

unexploredtest commented Jan 16, 2021 •

edited

Loading