# Timing

Our initial approach to using Persistent Memory devices relies on timing the individual kernels of an nGraph function with their inputs and outputs placed in different memory pools, then using those timings as weights in an optimization routine.

This notebook tries to answer two questions:

1. Whether profiling individual kernels in this way gives reasonably consistent and reliable results.
2. Whether profiling multiple kernel configurations at a time across the graph yields results similar to testing one configuration at a time. This optimization comes from the large number of configurations that have to be tested for larger graphs. Overlapping tests for non-neighboring ops can reduce profiling time by over an order of magnitude.

The general pipeline for performing the profiling is:

1. Build an nGraph function and executable.
2. Get all nodes in the graph that we want to test and enumerate all the configurations to be tested.
3. Pick a configuration to test and configure the graph to reflect that configuration. Optionally, keep greedily picking configurations that don't overlap until no more can be selected.
4. Recompile the graph with the new memory configuration.
5. Use the built-in timing features of nGraph codegen to profile the running times for internal nodes.
6. Repeat until all configurations have been checked.
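
The greedy selection in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `Runner`: the `Config` type, its `placement` field, and the `schedule` function are hypothetical names introduced here, and it assumes that two configurations "overlap" when their ops are the same node or direct neighbors in the graph (with symmetric neighbor lists). Each returned batch corresponds to one recompile and one profiling run.

```julia
# Hypothetical stand-in for one configuration to test: an op, a memory
# placement for its inputs/outputs, and the names of the op's neighbors.
struct Config
    node::String
    placement::Symbol
    neighbors::Vector{String}
end

# Group configurations into batches that can share one compile + profile run.
# Assumption: configurations conflict if their ops are identical or adjacent.
function schedule(configs::Vector{Config})
    pending = collect(eachindex(configs))
    batches = Vector{Int}[]
    while !isempty(pending)
        taken = Set{String}()    # ops already selected for this batch
        blocked = Set{String}()  # selected ops plus all of their neighbors
        batch = Int[]
        for i in pending
            c = configs[i]
            # Greedily accept the configuration if it touches nothing selected so far.
            if !(c.node in blocked) && !any(n -> n in taken, c.neighbors)
                push!(batch, i)
                push!(taken, c.node)
                push!(blocked, c.node)
                union!(blocked, c.neighbors)
            end
        end
        push!(batches, batch)
        filter!(i -> !(i in batch), pending)
    end
    return batches
end

# Toy chain graph a - b - c with two made-up placements per op.
adjacency = Dict("a" => ["b"], "b" => ["a", "c"], "c" => ["b"])
configs = Config[Config(n, p, adjacency[n]) for n in ["a", "b", "c"] for p in (:dram, :pmem)]
batches = schedule(configs)
```

On this toy chain, the six configurations are covered in four batches rather than six separate runs, because ops `a` and `c` are not neighbors and can be tested together. The savings should grow with graph size, since larger graphs contain more non-neighboring ops.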

## Desired Outcomes

1. Timings for single-configuration and multi-configuration runs are similar.
2. Timing variation for ops in the same configuration is relatively small.
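
One way to make these two outcomes measurable is sketched below, using only Julia's `Statistics` standard library. The function names are hypothetical, and the choice of metrics (relative difference of means, coefficient of variation) is an assumption made here rather than something the profiling code prescribes.

```julia
using Statistics

# Outcome 1: relative difference between the mean timing of an op measured in
# single-configuration runs and in multi-configuration runs.
relative_difference(single, multi) = abs(mean(multi) - mean(single)) / mean(single)

# Outcome 2: coefficient of variation (standard deviation over mean) of
# repeated timings of one op in one configuration.
coeff_of_variation(times) = std(times) / mean(times)
```

Both quantities are dimensionless, so they can be compared across ops with very different running times. What counts as "similar" or "relatively small" still has to be decided separately.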

In [1]:
using Pkg; Pkg.activate(".")

using Runner, Checkpoints

# Setup checkpoints
setdepot("./timing-checkpoints")

┌ Info: Recompiling stale cache file /home/mark/.julia/compiled/v1.1/Runner/F5BZU.ji for Runner [4a6e9825-ed04-540b-82d3-f33d0e8d45fb]
└ @ Base loading.jl:1184
┌ Info: Precompiling Checkpoints [b4a3413d-e481-5afc-88ff-bdfbd6a50dce]
└ @ Base loading.jl:1186


"./timing-checkpoints"