[BUG] Memory allocations from flushing L2 can lead to significant delays between benchmark executions #100

## Description

In an attempt to gather more accurate timings, [nvbench will "flush" the L2 cache](https://github.com/NVIDIA/nvbench/blob/1a13a2e724b8aa8aee27649ac6878babb63862a6/nvbench/detail/l2flush.cuh#L28) by querying the device's L2 cache size, allocating device memory of that size, memsetting that memory to zero, and then freeing it.

nvbench does this between every cold iteration. This can be quite expensive if there are a large number of cold iterations or points in the benchmark axis space. @GregoryKimball reported that this can cause up to a 1.2 s delay between iterations, since `cudaMalloc`/`cudaFree` can be quite expensive.

## Possible Solutions

1. Add an option to disable flushing the L2 cache.
2. Avoid allocating and freeing every time; instead, make a single allocation per device and memset that same allocation each time.
3. Enable users to provide their own allocator for the memory used for flushing.
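
To make the three options concrete, here is a minimal host-only sketch of how they could be combined. It is not nvbench's implementation: `l2_flusher`, `counting_host_allocator`, and `allocations_for_100_flushes` are hypothetical names, and `std::malloc`/`std::memset`/`std::free` stand in for `cudaMalloc`/`cudaMemsetAsync`/`cudaFree` so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical user-supplied allocator (option 3). A device-backed version
// would wrap cudaMalloc/cudaFree or a pooled allocator instead of malloc/free.
struct counting_host_allocator {
  int allocations = 0;
  int frees = 0;
  void* allocate(std::size_t bytes) { ++allocations; return std::malloc(bytes); }
  void deallocate(void* ptr) { ++frees; std::free(ptr); }
};

// Hypothetical flusher that owns one buffer for its whole lifetime.
template <typename Allocator>
class l2_flusher {
public:
  // l2_bytes would come from cudaDeviceGetAttribute(..., cudaDevAttrL2CacheSize, ...).
  l2_flusher(Allocator& alloc, std::size_t l2_bytes, bool enabled = true)
      : alloc_(alloc), bytes_(l2_bytes), enabled_(enabled) {}

  l2_flusher(const l2_flusher&) = delete;
  l2_flusher& operator=(const l2_flusher&) = delete;

  ~l2_flusher() {
    if (buffer_ != nullptr) { alloc_.deallocate(buffer_); }
  }

  void flush() {
    if (!enabled_ || bytes_ == 0) { return; }  // option 1: flushing disabled
    if (buffer_ == nullptr) {
      buffer_ = alloc_.allocate(bytes_);       // option 2: allocate only once
      if (buffer_ == nullptr) { return; }      // allocation failed, skip flush
    }
    std::memset(buffer_, 0, bytes_);           // stand-in for cudaMemsetAsync
    ++flush_count_;
  }

  int flush_count() const { return flush_count_; }

private:
  Allocator& alloc_;
  std::size_t bytes_;
  bool enabled_;
  void* buffer_ = nullptr;
  int flush_count_ = 0;
};

// Runs 100 flushes and returns how many allocations they required.
int allocations_for_100_flushes() {
  counting_host_allocator alloc;
  {
    l2_flusher<counting_host_allocator> flusher(alloc, 1 << 20);
    for (int i = 0; i < 100; ++i) { flusher.flush(); }
  }  // the single buffer is released here
  assert(alloc.allocations == alloc.frees);
  return alloc.allocations;
}
```

In this sketch the buffer is allocated lazily on the first flush and released in the destructor, so the allocation cost is paid once per flusher rather than once per cold iteration. A real version would keep one such object per device and issue the memset on the benchmark's stream.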


