[BUG] Memory allocations from flushing L2 can lead to significant delays between benchmark executions #100

@jrhemstad

Description

To gather more accurate timings, nvbench "flushes" the L2 cache: it queries the device's L2 cache size, allocates device memory of that size, memsets that memory to zero, and then frees it.

nvbench does this between every cold iteration. The cost adds up when a benchmark has a large number of cold iterations or many points in its axis space. @GregoryKimball reported that this can cause up to a 1.2 s delay between iterations, because cudaMalloc/cudaFree can be expensive.
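
For reference, here is a rough sketch of the flush sequence described above. It is not nvbench's actual code. `fake_device`, `flush_l2_naive`, and `run_cold_iterations` are made-up names, the 4 MiB L2 size is arbitrary, and host `malloc`/`memset`/`free` stand in for `cudaMalloc`/`cudaMemsetAsync`/`cudaFree` so it compiles without a GPU. The only point is to show that the number of allocations grows with the number of cold iterations.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the CUDA runtime (assumption, not nvbench code).
// Real code would query cudaDevAttrL2CacheSize via cudaDeviceGetAttribute and
// call cudaMalloc / cudaMemsetAsync / cudaFree instead of the host functions.
struct fake_device {
  std::size_t l2_bytes = std::size_t{4} << 20; // assumed 4 MiB L2 (arbitrary)
  int allocs = 0;                              // number of allocations made
  int frees  = 0;                              // number of frees made

  void* alloc(std::size_t n) { ++allocs; return std::malloc(n); }
  void fill(void* p, int value, std::size_t n) { std::memset(p, value, n); }
  void release(void* p) { ++frees; std::free(p); }
};

// One flush as described in the issue: allocate an L2-sized buffer,
// zero it, then free it again.
inline void flush_l2_naive(fake_device& dev) {
  void* buf = dev.alloc(dev.l2_bytes);
  if (buf == nullptr) { return; } // allocation failed, nothing to flush with
  dev.fill(buf, 0, dev.l2_bytes);
  dev.release(buf);
}

// Flushes before each of `iterations` cold iterations and returns the total
// number of allocations performed so far on `dev`.
inline int run_cold_iterations(fake_device& dev, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    flush_l2_naive(dev);
    // ... the measured kernel would be launched and timed here ...
  }
  return dev.allocs;
}
```

Calling `run_cold_iterations(dev, 100)` on a fresh `fake_device` performs 100 allocations and 100 frees, one pair per iteration, which is the pattern that gets expensive with real `cudaMalloc`/`cudaFree`.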

Possible Solutions

  1. Add an option to disable flushing the L2 cache.
  2. Avoid allocating/freeing every time; instead, make a single allocation per device and memset that same allocation each time.
  3. Allow users to provide their own allocator for the memory used for flushing.
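
A minimal sketch of how the three options could fit together, assuming one flusher object per device. `l2_flusher` and `flush_allocator` are hypothetical names, not existing nvbench API, and host `malloc`/`memset`/`free` again stand in for the CUDA calls so the example builds without a GPU.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <utility>

// Hypothetical allocator hooks (option 3). The defaults use host malloc/free
// as a stand-in; real code might wrap cudaMalloc/cudaFree or a pool allocator.
struct flush_allocator {
  std::function<void*(std::size_t)> allocate =
    [](std::size_t n) { return std::malloc(n); };
  std::function<void(void*)> deallocate = [](void* p) { std::free(p); };
};

// Hypothetical per-device flusher: create one instance per device and reuse it.
class l2_flusher {
public:
  explicit l2_flusher(std::size_t l2_bytes,
                      flush_allocator alloc = {},
                      bool enabled = true)
      : l2_bytes_(l2_bytes), alloc_(std::move(alloc)), enabled_(enabled) {}

  // The flusher owns its buffer, so copying is disabled.
  l2_flusher(const l2_flusher&) = delete;
  l2_flusher& operator=(const l2_flusher&) = delete;

  // The buffer is released once, when the flusher goes away.
  ~l2_flusher() {
    if (buffer_ != nullptr) { alloc_.deallocate(buffer_); }
  }

  void flush() {
    // Option 1: flushing can be turned off entirely.
    if (!enabled_ || l2_bytes_ == 0) { return; }
    // Option 2: allocate lazily on the first flush, then keep the buffer.
    if (buffer_ == nullptr) {
      buffer_ = alloc_.allocate(l2_bytes_);
      if (buffer_ == nullptr) { return; } // allocation failed, skip the flush
    }
    // Stand-in for cudaMemsetAsync on the benchmark's stream.
    std::memset(buffer_, 0, l2_bytes_);
  }

private:
  std::size_t l2_bytes_;
  flush_allocator alloc_;
  bool enabled_;
  void* buffer_ = nullptr;
};
```

With this shape, repeated calls to `flush()` cost one allocation for the lifetime of the flusher plus a memset per call. The trade-off is that an L2-sized buffer stays resident on each device for the whole run.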
