Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164

axsaucedo · 2021-02-27T08:10:32Z

Fixes #54
Fixes #12
Fixes: #166
Fixes: #161 (partially)
Fixes: #160

This PR contains a relatively large refactor that amends the memory hierarchies between components, adding the following:

Tensors are now managed by top level manager so can be used across many sequences / operations
Operations no longer manage resources and hence no longer hold vulkan components
Algorithms and operations no longer hold command buffers so they can used across multiple sequences
Algorithms can now be created directly through the manager
Operations can now be created directly through the manager
Manager now manages all tensors, algorithms and sequences
Sequences still manage operations
Shared pointers now have explicit garbage collection as manager uses weak pointers to clean when destroyed but components outside of manager get destroyed automatically when reference counter reaches zero
All commands are now executed directly through sequence
Concept of named / anonymous sequences are no longer existent as these can be created on the fly easily
All sequence commands return a pointer to themselves to enable for functional-like interface
Tensors / Sequences / Algorithsm are always initialised when created, and when they are destroyed can no longer be "rebuilt"

This is what the new interface looks like:

// This tensor GPU resources will be destroyed by the manager when the manager gets out of scope
std::shared_ptr<kp::Tensor> tensorA = nullptr;

{
  kp::Manager mgr;
  
  tensorA = mgr.tensor({0, 1, 2});

  {
    // This tensor will be destroyed when leaving the scope as ref count will reach 0
    // The manager holds a weak ptr so it won't re-destroy it (possible to clean empty pointers with mgr.clear()
    std::shared_ptr<kp::Tensor> tensorB = mgr.tensor({1, 2, 3});

    std::vector<uint32_t> spirv = kp::Shader::compile_source(/* string shader here */);

    kp::Workgroup wg = { 3, 1, 1 };
    kp::Constants specConst = { 5 };
    std::shared_ptr<kp::Algorithm> algorithm =
        mgr.algorithm({tensorA,tensorB}, spirv, wg, specConst);

    std::shared_ptr<kp::OpAlgoDispatch> opAD = { new kp::OpAlgoDispatch(algorithm) };

    // this is the new equivallent of an anonymous sequence (or named sequence)
    // Similar to other resources it gets destroyed when all refcount gets to zero
    // which means that in this case it will get destroyed after evaluating
    mgr.sequence()
        ->record<kp::OpTensorSyncDevice>({tensorA, tensorB})
        ->record(opAD) // possible to create operations outside and record/eval them
        ->record(opAD, [0.1, 0.2, 0.3]) // Injecting new push constant
        ->record(opAD, [2.5, 0.8, 1.9]) // Injecting another push constant
        ->eval()
        ->eval() // possible to record multiple commands and eval multiple times
        ->eval()
        ->eval<kp::OpTensorSyncLocal>({tensorA, tensorB}); // Also possible to eval (rerecord and eval) in single command
      // Sequence gets destroyed as not stored in heap variable

  } // TensorB, algorithm destroyed

} // TensorA gets destroyed by manager, tensorB already freed so manager doesnt destroy from weakptr

// TensorA heap object gets destroyed but does not free any gpu resources

@aliPMPAINT @alexander-g @SimLeek would be great to hear your thoughts on the new interface.

There was quite a lot of different interfaces that "could have been", but I feel like all were explored, many which led to dead ends - people may wonder, "well what about if X was created via Y instead", but it seems like this flow / interface / memory management hierarchy is the one that seems optimal at this point in time. If people are interested of all the brainstorming involved as wel as other interfaces that were explored you can look at this gist (it's not tidy, it's just a dump of all the different interfaces that were attempted) https://gist.github.com/axsaucedo/87bacf8d7d31fa0319e67cf6ff24a026

Outstanding:

Finalise adding functionality for push constants
Identify optimisations for new developer flow
Amend python bindings to support new interface
Finish updating the rest of the tests
Add tests to cover the new interface further
Add tests for push constants

Further things to explore:

Currently the creation of shared_ptr adds a slight overhead, if the garbage-collection interface doesn't add much value to destroy objects when ref count reaches zero outside of manager, it could be amended that all resources are managed directly my the kp::Manager, and all references outside can just be pointers (instead of smart / shared_ptr). This could add further speed increases, however at this point it's not clear that the speed increases would be significant, especially if people are aware of this extra cost - mainly on the sequence interface as it creates a shared_ptr on each record/eval.

unexploredtest · 2021-02-27T15:46:30Z

The new interface looks neat! Just a question

Algorithms can now be created directly through the manager

Is it required to be manually created, or optional and can be created automatically?

axsaucedo · 2021-02-27T16:05:52Z

@aliPMPAINT great question, the short answer is no, but you can see how it's still possible to delegate the logic to handle SPIRV code as part of the Operations, the example provided is with the OpMult class which rebuilds the algorithm internally, mainly ensuring it's possible to contain the logic to retrieve / load spirv from the class itself:

    mgr.sequence()
        ->eval<kp::OpTensorSyncDevice>(params)
        ->eval<kp::OpMult>(params, mgr.algorithm())
        ->eval<kp::OpTensorSyncLocal>(params);

You can look at the OpMult class, which basically provides its own imlpementation that rebuilds the kp::Algorithm https://github.com/EthicalML/vulkan-kompute/blob/198fb46eb628051dd3a800b586bfca63ceb71aa2/src/include/kompute/operations/OpMult.hpp

The only thing required is basically passing an algorithm that contains the respective tensors to use, that's basically it.

There were tradeoffs that had to be made, primarily taking into consideration the limitatsion of the vulkan resources (pipeline being tied to descriptorsets) as well as operations not having access to the sequnece / manager. You can see the different interfaces explored but that seemed to be the one that covered all corners.

If you are curious on the internals of the new memory hierarchy, this diagram visualises the relationships between the different components as well as the owning relationships between each of the classes:

axsaucedo added 16 commits February 27, 2021 14:49

Initial checkpoint with reasonable workflow

9aae5d6

Added baseline functionality including new memory models

635fdb0

Updated core initial running test

f35a62e

Working initial base

3f12882

Further tests added to new structure

6378583

Initial base set of tests aligned with new interface

fb617d1

Updated to enable for opmult to work

3304767

Added algo executions tests

b3abbf1

Fixed all OpTensorSync tests

fd0f0d3

Added TensorOpTensorCreate tests

7fb1515

Removed the brainstorming code in the py file

08a5543

Updated testopshaders

2b09c55

Added TensorOpCopy tests

f206c62

Updated sequence and sequence tests

9788c79

Update cmakelists to align with required setup

9d206c3

Fixed integration tests fails due to pipeline not freed

198fb46

axsaucedo force-pushed the 160_op_memory branch from e7f03a4 to 198fb46 Compare February 27, 2021 15:28

Python implementation

4c4d073

axsaucedo mentioned this pull request Feb 28, 2021

Deep Learning Convolutional Neural Network (CNN) example implementation #162

Closed

axsaucedo added 7 commits February 28, 2021 07:57

All python tests pass

91d3b9a

Added support for push constants

7dc1f35

Updated python to align with current configuration

38f356f

Updated examples in readme

ddb7770

Updated ref architecture

75315db

Reformat

63e220a

Updated examples

4fddf74

axsaucedo added this to In progress in 0.7.0 via automation Feb 28, 2021

axsaucedo added this to In progress in 0.6.1 via automation Feb 28, 2021

axsaucedo added c++ python labels Feb 28, 2021

axsaucedo merged commit 672cf22 into master Feb 28, 2021

0.7.0 automation moved this from In progress to Done Feb 28, 2021

0.6.1 automation moved this from In progress to Done Feb 28, 2021

axsaucedo mentioned this pull request Mar 13, 2021

Avoid copy when returning python numpy / array #139

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164

Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164

axsaucedo commented Feb 27, 2021 •

edited

unexploredtest commented Feb 27, 2021

axsaucedo commented Feb 27, 2021 •

edited

Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164

Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164

Conversation

axsaucedo commented Feb 27, 2021 • edited

unexploredtest commented Feb 27, 2021

axsaucedo commented Feb 27, 2021 • edited

axsaucedo commented Feb 27, 2021 •

edited

axsaucedo commented Feb 27, 2021 •

edited