Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend memory hierarchy to enable for push constants and functional interface for more flexible operations #164

Merged
merged 24 commits into from Feb 28, 2021

Conversation

axsaucedo
Copy link
Member

@axsaucedo axsaucedo commented Feb 27, 2021

Fixes #54
Fixes #12
Fixes: #166
Fixes: #161 (partially)
Fixes: #160

This PR contains a relatively large refactor that amends the memory hierarchies between components, adding the following:

  • Tensors are now managed by top level manager so can be used across many sequences / operations
  • Operations no longer manage resources and hence no longer hold vulkan components
  • Algorithms and operations no longer hold command buffers so they can used across multiple sequences
  • Algorithms can now be created directly through the manager
  • Operations can now be created directly through the manager
  • Manager now manages all tensors, algorithms and sequences
  • Sequences still manage operations
  • Shared pointers now have explicit garbage collection as manager uses weak pointers to clean when destroyed but components outside of manager get destroyed automatically when reference counter reaches zero
  • All commands are now executed directly through sequence
  • Concept of named / anonymous sequences are no longer existent as these can be created on the fly easily
  • All sequence commands return a pointer to themselves to enable for functional-like interface
  • Tensors / Sequences / Algorithsm are always initialised when created, and when they are destroyed can no longer be "rebuilt"

This is what the new interface looks like:

// This tensor GPU resources will be destroyed by the manager when the manager gets out of scope
std::shared_ptr<kp::Tensor> tensorA = nullptr;

{
  kp::Manager mgr;
  
  tensorA = mgr.tensor({0, 1, 2});

  {
    // This tensor will be destroyed when leaving the scope as ref count will reach 0
    // The manager holds a weak ptr so it won't re-destroy it (possible to clean empty pointers with mgr.clear()
    std::shared_ptr<kp::Tensor> tensorB = mgr.tensor({1, 2, 3});

    std::vector<uint32_t> spirv = kp::Shader::compile_source(/* string shader here */);

    kp::Workgroup wg = { 3, 1, 1 };
    kp::Constants specConst = { 5 };
    std::shared_ptr<kp::Algorithm> algorithm =
        mgr.algorithm({tensorA,tensorB}, spirv, wg, specConst);

    std::shared_ptr<kp::OpAlgoDispatch> opAD = { new kp::OpAlgoDispatch(algorithm) };

    // this is the new equivallent of an anonymous sequence (or named sequence)
    // Similar to other resources it gets destroyed when all refcount gets to zero
    // which means that in this case it will get destroyed after evaluating
    mgr.sequence()
        ->record<kp::OpTensorSyncDevice>({tensorA, tensorB})
        ->record(opAD) // possible to create operations outside and record/eval them
        ->record(opAD, [0.1, 0.2, 0.3]) // Injecting new push constant
        ->record(opAD, [2.5, 0.8, 1.9]) // Injecting another push constant
        ->eval()
        ->eval() // possible to record multiple commands and eval multiple times
        ->eval()
        ->eval<kp::OpTensorSyncLocal>({tensorA, tensorB}); // Also possible to eval (rerecord and eval) in single command
      // Sequence gets destroyed as not stored in heap variable

  } // TensorB, algorithm destroyed

} // TensorA gets destroyed by manager, tensorB already freed so manager doesnt destroy from weakptr

// TensorA heap object gets destroyed but does not free any gpu resources

@aliPMPAINT @alexander-g @SimLeek would be great to hear your thoughts on the new interface.

There was quite a lot of different interfaces that "could have been", but I feel like all were explored, many which led to dead ends - people may wonder, "well what about if X was created via Y instead", but it seems like this flow / interface / memory management hierarchy is the one that seems optimal at this point in time. If people are interested of all the brainstorming involved as wel as other interfaces that were explored you can look at this gist (it's not tidy, it's just a dump of all the different interfaces that were attempted) https://gist.github.com/axsaucedo/87bacf8d7d31fa0319e67cf6ff24a026

Outstanding:

  • Finalise adding functionality for push constants
  • Identify optimisations for new developer flow
  • Amend python bindings to support new interface
  • Finish updating the rest of the tests
  • Add tests to cover the new interface further
  • Add tests for push constants

Further things to explore:

  • Currently the creation of shared_ptr adds a slight overhead, if the garbage-collection interface doesn't add much value to destroy objects when ref count reaches zero outside of manager, it could be amended that all resources are managed directly my the kp::Manager, and all references outside can just be pointers (instead of smart / shared_ptr). This could add further speed increases, however at this point it's not clear that the speed increases would be significant, especially if people are aware of this extra cost - mainly on the sequence interface as it creates a shared_ptr on each record/eval.

@unexploredtest
Copy link
Contributor

The new interface looks neat! Just a question

Algorithms can now be created directly through the manager

Is it required to be manually created, or optional and can be created automatically?

@axsaucedo
Copy link
Member Author

axsaucedo commented Feb 27, 2021

@aliPMPAINT great question, the short answer is no, but you can see how it's still possible to delegate the logic to handle SPIRV code as part of the Operations, the example provided is with the OpMult class which rebuilds the algorithm internally, mainly ensuring it's possible to contain the logic to retrieve / load spirv from the class itself:

    mgr.sequence()
        ->eval<kp::OpTensorSyncDevice>(params)
        ->eval<kp::OpMult>(params, mgr.algorithm())
        ->eval<kp::OpTensorSyncLocal>(params);

You can look at the OpMult class, which basically provides its own imlpementation that rebuilds the kp::Algorithm https://github.com/EthicalML/vulkan-kompute/blob/198fb46eb628051dd3a800b586bfca63ceb71aa2/src/include/kompute/operations/OpMult.hpp

The only thing required is basically passing an algorithm that contains the respective tensors to use, that's basically it.

There were tradeoffs that had to be made, primarily taking into consideration the limitatsion of the vulkan resources (pipeline being tied to descriptorsets) as well as operations not having access to the sequnece / manager. You can see the different interfaces explored but that seemed to be the one that covered all corners.

If you are curious on the internals of the new memory hierarchy, this diagram visualises the relationships between the different components as well as the owning relationships between each of the classes:

image

@axsaucedo axsaucedo added this to In progress in 0.7.0 via automation Feb 28, 2021
@axsaucedo axsaucedo added this to In progress in 0.6.1 via automation Feb 28, 2021
@axsaucedo axsaucedo merged commit 672cf22 into master Feb 28, 2021
0.7.0 automation moved this from In progress to Done Feb 28, 2021
0.6.1 automation moved this from In progress to Done Feb 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment