
Support for Timestamping #176

Merged 4 commits into KomputeProject:master on Mar 7, 2021

Conversation

alexander-g
Contributor

Added the option to latch timestamps after each operation with vkCmdWriteTimestamp().

Notes:

  • Timestamps might not be very accurate, especially for data copies. I've noticed that a shader dispatch takes longer directly after a copy. One might consider adding more timestamps within OpAlgoDispatch, e.g. after the memory barrier.
  • The user needs to know in advance the maximum number of operations they want to record when creating a sequence. This might not always be possible.
  • The timestamps are returned as a plain list; one might consider adding a parameter like timestamp_name="banana" to record() to make it easier to identify which operation took how long.
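
Roughly, the mechanism looks like this: a sketch using Vulkan-Hpp, where recordWithTimestamp and recordOp are illustrative names rather than the actual Kompute internals.

#include <functional>
#include <vulkan/vulkan.hpp>

// Sketch only: latch a timestamp into the query pool after each operation.
void recordWithTimestamp(vk::CommandBuffer cmd,
                         vk::QueryPool queryPool,
                         uint32_t& nextQuery,
                         const std::function<void(vk::CommandBuffer)>& recordOp)
{
    recordOp(cmd); // e.g. a tensor copy or an algorithm dispatch

    // eBottomOfPipe samples the GPU clock only after all prior commands
    // have drained, so the delta to the previous query approximates the
    // cost of the operation recorded above.
    cmd.writeTimestamp(vk::PipelineStageFlagBits::eBottomOfPipe,
                       queryPool, nextQuery++);
}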

@axsaucedo
Member

axsaucedo commented Mar 6, 2021

Interesting - thanks for the PR! Can you show an example of how this works?

Looking at the relationship between the command buffer and the timestamp pool, it seems more similar to the Tensor/Op or Algo/Op relationship than to something specific to a sequence. Is there a reason this isn't explored as something similar to an Algorithm or Tensor instead, where there would be a Timestamp object, together with an OpTimestamp type that calls the Timestamp's record?

That way you would also be able to set the timestamps explicitly:

std::shared_ptr<kp::OpTimestamp> opTimestamp{ new kp::OpTimestamp(mgr.timestamp(...)) };

mgr.sequence()
  ->record(opTimestamp)
  ->record<kp::OpTensorSyncDevice>(...)
  ->record(opTimestamp)
  ->record<kp::OpAlgoDispatch>(...)
  ->record(opTimestamp)
  ->record<kp::OpTensorSyncLocal>(...)
  ->record(opTimestamp)
  ->eval();

std::vector<std::uint64_t> timestamps = opTimestamp->timestamps();

It would be good to get better insight into the timestamps: how are you currently using them?

@alexander-g
Contributor Author

A simple example:

seq = mgr.sequence(nrOfTimestamps=100)
(seq.record(kp.OpTensorSyncDevice([a,b]))
    .record(kp.OpAlgoDispatch(algo0))
    .record(kp.OpAlgoDispatch(algo1))
    .record(kp.OpAlgoDispatch(algo2))
    .record(kp.OpTensorSyncLocal([c]))
    .eval())

print( np.diff(seq.get_timestamps()) )

>>>[ 6976 31264 16384 16416 9312]
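
These raw deltas are presumably GPU clock ticks rather than nanoseconds; converting them uses the device's timestampPeriod limit. A minimal C++ sketch (not part of the Kompute API):

#include <vulkan/vulkan.hpp>

// timestampPeriod gives the number of nanoseconds per timestamp tick.
double ticksToNanoseconds(vk::PhysicalDevice physicalDevice, uint64_t ticks)
{
    float period = physicalDevice.getProperties().limits.timestampPeriod;
    return static_cast<double>(ticks) * period;
}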

OpTimestamp might be an alternative, but it requires more work from the user.

I've just started using this, so I cannot say much about it yet. Previously I created a sequence for each operation and measured the time it took to evaluate, as you suggested in #110, but this is slow and requires writing additional profiling code, so I looked for alternatives and found that Vulkan has this built in.

@axsaucedo
Member

axsaucedo commented Mar 6, 2021

Ok, interesting; yes, I see what you mean about potentially requiring more work for the user. Is there a reason we'd want a maximum number of timestamps? Would it not be possible to just set a timestamp between each operation when enabled, and clear them every time eval is called, without limits? Mainly since a re-record would clear them as well.

@alexander-g
Contributor Author

A maximum number is required by Vulkan when creating a QueryPool.
rerecord() looks like it might help, but it would require the user to call it manually, wouldn't it?
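
For reference, this is roughly what that constraint looks like, since vk::QueryPoolCreateInfo takes a fixed queryCount at creation time (a sketch, not the actual Kompute code):

#include <vulkan/vulkan.hpp>

vk::QueryPool createTimestampQueryPool(vk::Device device, uint32_t maxTimestamps)
{
    vk::QueryPoolCreateInfo info;
    info.queryType  = vk::QueryType::eTimestamp;
    info.queryCount = maxTimestamps; // fixed upfront; resizing requires
                                     // destroying and recreating the pool
    return device.createQueryPool(info);
}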

@axsaucedo
Member

axsaucedo commented Mar 6, 2021

Edit: did a first pass; it looks like the initial approach in this PR is a good way to go and can be built on from there.

Oh, I see what you mean now by the point above about not knowing the exact number of operations upfront... I agree that re-record could help, but yes, it would require adding timestamps manually. It feels a bit awkward to expose the exact number of timestamps the user needs to set; perhaps what could be exposed instead is a function to trigger a re-record. That way the query pool would be created with the right size and would only live for as long as the recorded commands explicitly request it. It could then be used as:

mgr.sequence()
   ->record(...)
   ->record(...)
   ->record(...)
   ->record(...)
   ->rerecord(true) // creates query pool and adds timestamps
   ->eval()
   ->record(...) // deletes query pool and records new command buffer
   ->rerecord(false) // re-records but doesn't add timestamps
   ->rerecord(true) // re-records and adds timestamps
   ->eval()

One thing to ask is whether the simple eval(...) could still have a timestamp, but it seems like it would be necessary to always run ->record()->rerecord()->eval() to add timestamps.

There are still a couple of things that are not clear:

  • What happens if you want to trigger re-record on an existing query pool? Are you expected to keep track of all the evals without clearing? Is that the expected behaviour, or would it be better to expect the user to read the query results after every eval / evalAwait?

@axsaucedo
Member

axsaucedo left a review comment

Just had another look, and the current approach does seem to make sense. I did an initial pass; it would be good to also dive into some of these points:

  • When calling record after eval, is it still expected to use the same query pool? It seems to make more sense to have one pool specific to each recorded batch.
  • It would also be good to implement rerecord(uint32_t totalTimestamps) to allow for resizing.

Overall this looks like a great addition. It would probably be good to add tests, and then we can use a test as the basis for an example in the advanced-examples.rst documentation.

(Review comments on python/src/main.cpp, single_include/kompute/Kompute.hpp, src/Manager.cpp, src/Sequence.cpp and src/include/kompute/Sequence.hpp, all resolved.)
@alexander-g
Contributor Author

alexander-g commented Mar 7, 2021

It feels a bit awkward to expose the exact number of timestamps the user needs to set

It's not the exact number but the maximum number. The user can set it to a high number like one million if they are ok with allocating that additional memory.

What happens if you want to trigger re-record on an existing query pool? Are you expected to keep track of all the evals without clearing? Is that the expected behaviour, or would it be better to expect the user to read the query results after every eval / evalAwait?

I don't quite understand what you mean by that. (I believe) the query pool is simply a buffer into which timestamps are written. If I call rerecord(), the buffer stays untouched and is simply reused on the next eval().
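
Roughly, reading the pool back after eval() could look like this (a sketch against the raw Vulkan-Hpp API, not the exact Kompute code):

#include <vector>
#include <vulkan/vulkan.hpp>

std::vector<uint64_t> readTimestamps(vk::Device device,
                                     vk::QueryPool pool,
                                     uint32_t count)
{
    std::vector<uint64_t> results(count);
    // eWait blocks until the GPU has written all `count` queries.
    (void)device.getQueryPoolResults(pool, 0, count,
                                     results.size() * sizeof(uint64_t),
                                     results.data(), sizeof(uint64_t),
                                     vk::QueryResultFlagBits::e64 |
                                         vk::QueryResultFlagBits::eWait);
    return results;
}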

What's the point of rerecord anyway? I can't see a use case. Personally, I would rather create a new sequence instead.

@axsaucedo
Member

axsaucedo left a review comment

Looks good. I only added a minor comment, which should be a quick change; it would also be great if we could add a test to make sure the expected timestamps can be retrieved. Here are also some thoughts on the points above:

It's not the exact number but the maximum number. The user can set it to a high number like one million if they are ok with allocating that additional memory.

I agree it makes sense to initially let the user provide a value. The ability to update it later could also make sense, and could be another main purpose of the re-record command; I mention further thoughts below.

I don't quite understand what you mean by that. (I believe) the query pool is simply a buffer into which timestamps are written. If I call rerecord(), the buffer stays untouched and is simply reused on the next eval().

I was thinking it would be useful to be able to "clear" and "resize" the timestamp query pool; this could be a useful purpose for re-record. By extending it to rerecord(uint32_t totalTimestamps), it would be possible to allow either re-creating the pool with a different size, or clearing the currently logged timestamps by recreating it with the same size.
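
Something along these lines, perhaps (an illustrative sketch; the function name is hypothetical):

#include <vulkan/vulkan.hpp>

// Recreating the pool both resizes it and discards previously logged values;
// the command buffer would then be re-recorded against the new pool.
vk::QueryPool resizeTimestampQueryPool(vk::Device device,
                                       vk::QueryPool oldPool,
                                       uint32_t totalTimestamps)
{
    if (oldPool) {
        device.destroyQueryPool(oldPool);
    }
    vk::QueryPoolCreateInfo info;
    info.queryType  = vk::QueryType::eTimestamp;
    info.queryCount = totalTimestamps;
    return device.createQueryPool(info);
}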

What's the point of rerecord anyway? I can't see a use case. Personally, I would rather create a new sequence instead.

Tensors and algorithms can be rebuilt with different values / configurations, which means that for a sequence to be "refreshed" it needs to be re-recorded. You can certainly go through all the record commands manually, but rerecord provides a simpler way to trigger a "reload" that takes into account any changes to tensors. There is a test in TestSequence which shows an example.

By the way, if you get a chance, it would be good to get your overall thoughts on #177, as I'm adding support for multiple tensor types (uint, int, float, double & bool).

(Review comment on src/Manager.cpp, resolved.)
@alexander-g
Contributor Author

Added a test case, but I'm filtering it out in the Makefile because SwiftShader does not support timestamps.
There are also some problems when running on my GPU: the test succeeds if I run it alone, but fails with a "Device does not support timestamps" exception when I run it together with all the other tests.
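
For what it's worth, support is typically detected along these lines (a sketch; the exact check in Kompute may differ):

#include <vulkan/vulkan.hpp>

// timestampComputeAndGraphics guarantees timestamp support on all graphics
// and compute queues; otherwise the specific queue family must report
// non-zero timestampValidBits.
bool supportsTimestamps(vk::PhysicalDevice physicalDevice,
                        uint32_t queueFamilyIndex)
{
    vk::PhysicalDeviceLimits limits = physicalDevice.getProperties().limits;
    auto families = physicalDevice.getQueueFamilyProperties();
    return limits.timestampComputeAndGraphics ||
           families[queueFamilyIndex].timestampValidBits > 0;
}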

@axsaucedo
Member

axsaucedo left a review comment

Ok, I've just tested locally: I didn't get any failures when running all the tests, and the test also passes on its own. As you say, SwiftShader doesn't seem to support timestamps, but it has limitations in other aspects too, like double support, so I think skipping it is fine.

The PR looks good. It would be good to confirm whether the rerecord addition is something we want as part of this PR or something that can be explored later on; the latter probably makes more sense, but it may be worth creating an issue. Let me know and I can merge or hold accordingly.

@alexander-g
Contributor Author

I would merge it as it is. I will take a closer look at re-recording a bit later.

@axsaucedo
Member

Sounds good 👍

@axsaucedo merged commit cc1ec74 into KomputeProject:master on Mar 7, 2021