Allow users to control iteration via the concept of iteration spaces. #80

Status: Open. Wants to merge 30 commits into base: main.

Changes from 22 commits.

Commits (30):
344878e
Allow users to control iteration via the concept of iteration spaces.
robertmaynard Jan 31, 2022
a25f578
Rename tie_axes to zip_axes
robertmaynard Feb 28, 2022
c3c86e1
implement easier API to add axis and zip/user iteration at the same time
robertmaynard Feb 28, 2022
91c8f43
Show zip versus linear iteration
robertmaynard Mar 1, 2022
f4570d4
Update docs/benchmarks.md
robertmaynard Apr 12, 2022
f791475
Update docs/benchmarks.md
robertmaynard Apr 12, 2022
f50a6dd
Update nvbench/axis_iteration_space.cxx
robertmaynard Apr 12, 2022
796f7f7
Update nvbench/axis_iteration_space.cuh
robertmaynard Apr 12, 2022
edba477
Update nvbench/axis_iteration_space.cxx
robertmaynard Apr 12, 2022
9337ba9
Update examples/custom_iteration_spaces.cu
robertmaynard Apr 12, 2022
a02d648
Update examples/custom_iteration_spaces.cu
robertmaynard Apr 12, 2022
e80392e
Update examples/custom_iteration_spaces.cu
robertmaynard Apr 12, 2022
4c964d2
Update examples/custom_iteration_spaces.cu
robertmaynard Apr 12, 2022
26467f3
More cleanup
robertmaynard Apr 12, 2022
5b000e8
More cleanup
robertmaynard Apr 12, 2022
ba8356f
Refactor names
robertmaynard Apr 12, 2022
40a6711
Document benchmark iteration space methods
robertmaynard Apr 12, 2022
e7b4800
Refactor axis spaces into separate TUs
robertmaynard Apr 12, 2022
9aa2feb
Add iteration_space_base docs
robertmaynard Apr 13, 2022
06a4c8f
rename files holding iteration_space_base to match name of class
robertmaynard Apr 13, 2022
454d1bf
Add more docs
robertmaynard May 2, 2022
8af9453
Add more docs
robertmaynard May 2, 2022
6fd0883
drop usage of std::tie in nvbench/axes_metadata.cxx
robertmaynard Aug 23, 2022
99395df
Update to cross reference docs
robertmaynard Aug 23, 2022
5ebe7fe
Update docs around iteration_space_base constructor
robertmaynard Aug 23, 2022
dc7e2b7
Drop ability to zip axii after construction
robertmaynard Aug 29, 2022
5708e6c
remove need for make_space_iterator
robertmaynard Aug 31, 2022
3ad3d65
update axis_space_iterator to use same method names as state_iterator
robertmaynard Aug 31, 2022
c2bfc99
remove need for output_indices
robertmaynard Sep 1, 2022
910b5cc
Simplified user iterators
robertmaynard Sep 1, 2022
40 changes: 34 additions & 6 deletions docs/benchmarks.md
@@ -237,9 +237,37 @@ NVBENCH_BENCH_TYPES(benchmark, NVBENCH_TYPE_AXES(input_types, output_types))
```

This would generate a total of 36 configurations and instantiate the benchmark 6
times.

Keep the rapid growth of combinations due to multiple parameter axes in mind when
choosing the number of values in an axis. See the section about combinatorial
explosion for more examples and information.

## Zipped Iteration of Value Axes

At times, multiple value axes need to be iterated in lockstep, as if they were a
single zipped tuple of values rather than a full cartesian product. To enable this
behavior, you can request that the axes be 'zipped' together.

```cpp
// InputTypes: {char, int, unsigned int}
// OutputTypes: {float, double}
// NumInputs: {1000, 10000, 100000, 200000, 200000, 200000}
// Quality: {0.05, 0.1, 0.25, 0.5, 0.75, 1.}

using input_types = nvbench::type_list<char, int, unsigned int>;
using output_types = nvbench::type_list<float, double>;
NVBENCH_BENCH_TYPES(benchmark, NVBENCH_TYPE_AXES(input_types, output_types))
.set_type_axes_names({"InputType", "OutputType"})
.add_int64_axis("NumInputs", {1000, 10000, 100000, 200000, 200000, 200000})
.add_float64_axis("Quality", {0.05, 0.1, 0.25, 0.5, 0.75, 1.})
.zip_axes({"NumInputs", "Quality"});
```

Zipping these two axes reduces the total number of combinations from 216 to 36,
taming the combinatorial explosion.

Note: Only value axes may be zipped together.

# Throughput Measurements

@@ -426,9 +454,9 @@ NVBENCH_BENCH_TYPES(my_benchmark,
```

For large configuration spaces like this, pruning some of the less useful
combinations using the techniques described in the "Zipped Iteration of Value Axes"
or "Skip Uninteresting / Invalid Benchmarks" sections can help immensely with
keeping compile / run times manageable.

Splitting a single large configuration space into multiple, more focused
benchmarks with reduced dimensionality will likely be worth the effort as well.
1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -7,6 +7,7 @@ set(example_srcs
stream.cu
throughput.cu
auto_throughput.cu
custom_iteration_spaces.cu
)

# Metatarget for all examples:
243 changes: 243 additions & 0 deletions examples/custom_iteration_spaces.cu
@@ -0,0 +1,243 @@
/*
* Copyright 2021 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 with the LLVM exception
* (the "License"); you may not use this file except in compliance with
* the License.
*
* You may obtain a copy of the License at
*
* http://llvm.org/foundation/relicensing/LICENSE.txt
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include <nvbench/nvbench.cuh>

// Grab some testing kernels from NVBench:
#include <nvbench/test_kernels.cuh>

// Thrust vectors simplify memory management:
#include <thrust/device_vector.h>

#include <random>

//==============================================================================
// Multiple parameters:
// Varies block_size and num_blocks while invoking a naive copy of 256 MiB worth
// of int32_t.
void copy_sweep_grid_shape(nvbench::state &state)
{
// Get current parameters:
const int block_size = static_cast<int>(state.get_int64("BlockSize"));
const int num_blocks = static_cast<int>(state.get_int64("NumBlocks"));

// Number of int32s in 256 MiB:
const std::size_t num_values = 256 * 1024 * 1024 / sizeof(nvbench::int32_t);

// Report throughput stats:
state.add_element_count(num_values);
state.add_global_memory_reads<nvbench::int32_t>(num_values);
state.add_global_memory_writes<nvbench::int32_t>(num_values);

// Allocate device memory:
thrust::device_vector<nvbench::int32_t> in(num_values, 0);
thrust::device_vector<nvbench::int32_t> out(num_values, 0);

state.exec(
[block_size,
num_blocks,
num_values,
in_ptr = thrust::raw_pointer_cast(in.data()),
out_ptr = thrust::raw_pointer_cast(out.data())](nvbench::launch &launch) {
nvbench::copy_kernel<<<num_blocks, block_size, 0, launch.get_stream()>>>(
in_ptr,
out_ptr,
num_values);
});
}

//==============================================================================
// Naive iteration of both the BlockSize and NumBlocks axes.
// Will generate the full cartesian product of the two axes for a total of
// 16 invocations of copy_sweep_grid_shape.
NVBENCH_BENCH(copy_sweep_grid_shape)
.set_name("naive_copy_sweep_grid_shape")
.add_int64_axis("BlockSize", {32, 64, 128, 256})
.add_int64_axis("NumBlocks", {1024, 512, 256, 128});

//==============================================================================
// Zipped iteration of BlockSize and NumBlocks axes.
// Will generate only 4 invocations of copy_sweep_grid_shape.
NVBENCH_BENCH(copy_sweep_grid_shape)
.set_name("tied_copy_sweep_grid_shape")
.add_zip_axes(nvbench::int64_axis{"BlockSize", {32, 64, 128, 256}},
nvbench::int64_axis{"NumBlocks", {1024, 512, 256, 128}});

//==============================================================================
// under_diag:
// Custom iterator that only searches the `X` locations of two axes:
// [- - - - X]
// [- - - X X]
// [- - X X X]
// [- X X X X]
// [X X X X X]
//
struct under_diag final : nvbench::user_axis_space
[Review comment from a collaborator]: For example implementations, it'd be helpful to have more comments describing what each component of this struct is doing (e.g. how do `size` and `valid_count` differ?)

{
under_diag(std::vector<std::size_t> input_indices,
std::vector<std::size_t> output_indices)
: nvbench::user_axis_space(std::move(input_indices),
std::move(output_indices))
{}

mutable std::size_t x_pos = 0;
[Review comment from a collaborator]: We should find a way to make this work without `mutable`.

mutable std::size_t y_pos = 0;
mutable std::size_t x_start = 0;

nvbench::detail::axis_space_iterator do_get_iterator(axes_info info) const
{
// generate our increment function
auto adv_func = [&, info](std::size_t &inc_index,
std::size_t /*len*/) -> bool {
inc_index++;
x_pos++;
if (x_pos == info[0].size)
{
x_pos = ++x_start;
y_pos = x_start;
return true;
}
return false;
};

// our update function
std::vector<std::size_t> locs = m_output_indices;
auto diag_under =
[&, locs, info](std::size_t,
std::vector<nvbench::detail::axis_index> &indices) {
nvbench::detail::axis_index temp = info[0];
temp.index = x_pos;
indices[locs[0]] = temp;

temp = info[1];
temp.index = y_pos;
indices[locs[1]] = temp;
};

const size_t iteration_length = ((info[0].size * (info[1].size + 1)) / 2);
return nvbench::detail::make_space_iterator(2,
iteration_length,
adv_func,
diag_under);
}

std::size_t do_get_size(const axes_info &info) const
{
return ((info[0].size * (info[1].size + 1)) / 2);
}

std::size_t do_get_active_count(const axes_info &info) const
{
return ((info[0].size * (info[1].size + 1)) / 2);
}

std::unique_ptr<nvbench::iteration_space_base> do_clone() const
{
return std::make_unique<under_diag>(*this);
}
};

NVBENCH_BENCH(copy_sweep_grid_shape)
.set_name("user_copy_sweep_grid_shape")
.add_user_iteration_axes(
[](auto... args) -> std::unique_ptr<nvbench::iteration_space_base> {
return std::make_unique<under_diag>(args...);
},
nvbench::int64_axis("BlockSize", {64, 128, 256, 512, 1024}),
nvbench::int64_axis("NumBlocks", {1024, 512, 256, 128, 64}));

//==============================================================================
// gauss:
// Custom iteration space that uses a gauss distribution to
// sample the points near the middle of the index space
//
struct gauss final : nvbench::user_axis_space
{

gauss(std::vector<std::size_t> input_indices,
std::vector<std::size_t> output_indices)
: nvbench::user_axis_space(std::move(input_indices),
std::move(output_indices))
{}

nvbench::detail::axis_space_iterator do_get_iterator(axes_info info) const
{
const double mid_point = static_cast<double>((info[0].size / 2));

std::random_device rd{};
std::mt19937 gen{rd()};
std::normal_distribution<> d{mid_point, 2};

const size_t iteration_length = info[0].size;
std::vector<std::size_t> gauss_indices(iteration_length);
for (auto &g : gauss_indices)
{
// Clamp into the valid index range [0, size - 1]; clamping to `size`
// itself would yield an out-of-range index after the cast below.
auto v = std::min(static_cast<double>(info[0].size - 1), d(gen));
v = std::max(0.0, v);
g = static_cast<std::size_t>(v);
}

// our update function
std::vector<std::size_t> locs = m_output_indices;
auto gauss_func = [=](std::size_t index,
std::vector<nvbench::detail::axis_index> &indices) {
nvbench::detail::axis_index temp = info[0];
temp.index = gauss_indices[index];
indices[locs[0]] = temp;
};

return nvbench::detail::make_space_iterator(1,
iteration_length,
gauss_func);
}

std::size_t do_get_size(const axes_info &info) const { return info[0].size; }

std::size_t do_get_active_count(const axes_info &info) const
{
return info[0].size;
}

std::unique_ptr<iteration_space_base> do_clone() const
{
return std::make_unique<gauss>(*this);
}
};
//==============================================================================
// Dual parameter sweep:
void dual_float64_axis(nvbench::state &state)
{
const auto duration_A = state.get_float64("Duration_A");
const auto duration_B = state.get_float64("Duration_B");

state.exec([duration_A, duration_B](nvbench::launch &launch) {
nvbench::sleep_kernel<<<1, 1, 0, launch.get_stream()>>>(duration_A +
duration_B);
});
}
NVBENCH_BENCH(dual_float64_axis)
.add_user_iteration_axes(
[](auto... args) -> std::unique_ptr<nvbench::iteration_space_base> {
return std::make_unique<gauss>(args...);
},
nvbench::float64_axis("Duration_A", nvbench::range(0., 1e-4, 1e-5)))
.add_user_iteration_axes(
[](auto... args) -> std::unique_ptr<nvbench::iteration_space_base> {
return std::make_unique<gauss>(args...);
},
nvbench::float64_axis("Duration_B", nvbench::range(0., 1e-4, 1e-5)));
4 changes: 4 additions & 0 deletions nvbench/CMakeLists.txt
@@ -1,6 +1,7 @@
set(srcs
axes_metadata.cxx
axis_base.cxx
iteration_space_base.cxx
benchmark_base.cxx
benchmark_manager.cxx
blocking_kernel.cu
@@ -10,6 +11,7 @@ set(srcs
device_manager.cu
float64_axis.cxx
int64_axis.cxx
linear_axis_space.cxx
markdown_printer.cu
named_values.cxx
option_parser.cu
@@ -20,6 +22,8 @@ set(srcs
string_axis.cxx
type_axis.cxx
type_strings.cxx
user_axis_space.cxx
zip_axis_space.cxx

detail/measure_cold.cu
detail/measure_hot.cu