Examples refactor #72

Merged
jiri-filipovic merged 160 commits into HiPerCoRe:development from Petronous:examples-refactor
Jun 1, 2026
Conversation


@Petronous Petronous commented May 26, 2026

Refactors the Examples with a hierarchy of common base classes and a common CLI handling system, tied to https://is.muni.cz/auth/th/u3441/.

ExampleBase provides basic functionality with no reference.
ExampleReferenceKernel is for Examples with a reference kernel.
ExampleReferenceComputation is for Examples with a reference computation.

Each Example customizes an appropriate base class through overriding methods.
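As a rough illustration of the hierarchy described above, the following is a minimal sketch in which each base class fixes the overall flow and an Example overrides only the steps it needs. Only the three class names come from this PR; every method name (`Run`, `GetName`, `Validate`, `GetReferenceKernelName`, `ComputeReference`) and the `VectorAddExample` class are hypothetical and do not reflect the actual interface.

```cpp
#include <string>
#include <vector>

// Sketch only: class names are from the PR, all methods are hypothetical.
class ExampleBase
{
public:
    virtual ~ExampleBase() = default;

    // Fixed skeleton; derived classes customize the individual steps.
    std::vector<std::string> Run()
    {
        std::vector<std::string> log;
        log.push_back("setup: " + GetName());
        log.push_back("tune");
        Validate(log);
        return log;
    }

protected:
    virtual std::string GetName() const = 0;

    // ExampleBase has no reference, so there is nothing to validate by default.
    virtual void Validate(std::vector<std::string>&) {}
};

// For Examples that validate against a reference kernel.
class ExampleReferenceKernel : public ExampleBase
{
protected:
    virtual std::string GetReferenceKernelName() const = 0;

    void Validate(std::vector<std::string>& log) override
    {
        log.push_back("validate against kernel: " + GetReferenceKernelName());
    }
};

// For Examples that validate against a host-side reference computation.
class ExampleReferenceComputation : public ExampleBase
{
protected:
    virtual std::vector<float> ComputeReference() const = 0;

    void Validate(std::vector<std::string>& log) override
    {
        log.push_back("validate against " + std::to_string(ComputeReference().size())
            + " host-computed values");
    }
};

// A hypothetical Example that picks the base class matching its reference type.
class VectorAddExample final : public ExampleReferenceComputation
{
protected:
    std::string GetName() const override { return "VectorAdd"; }
    std::vector<float> ComputeReference() const override { return {1.0f, 2.0f, 3.0f}; }
};
```

In this sketch an Example never reimplements the run loop; it chooses a base class and overrides the hooks relevant to its kind of reference.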

Customizing the CLI for individual Examples is currently complicated because the CLI system was designed for generality; it is intended to be rewritten in the future. Support for separate compiler tuning is currently limited, as it is a new feature; proper support is planned.

premake5.lua has been refactored to reduce duplication in setting up Example projects.

AtfSamples have been split into separate projects. Legacy Examples have been updated and pulled up into the Examples folder.

FluidSimulation is not refactored because it is a complex third-party project. Microbenchmarks is not refactored because it is highly unusual and very likely to change in the near future.

Certain commits from upstream have been rebased into this branch in a misguided attempt at linear history. Removing them would risk further chaos. Apologies for the mess.

Petr Slonek and others added 30 commits April 28, 2026 16:05
…measurements

- UpdateArgument, DownloadArgument, CopyArgument now treat dataSize==0
  as "use full buffer size", matching the OpenCL backend convention
- Store standard deviation from ExecuteWithStableTiming in the
  ComputationResult, which was previously silently dropped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
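The `dataSize==0` convention from the commit above could be sketched as a small size-resolution helper. The function name `ResolveTransferSize`, its signature, and the bounds check are assumptions made for illustration; the commit message only states that zero means "use full buffer size".

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: maps a requested transfer size to the size actually used.
// A request of 0 bytes is interpreted as "transfer the whole buffer".
inline std::size_t ResolveTransferSize(std::size_t dataSize, std::size_t bufferSize)
{
    if (dataSize == 0)
    {
        return bufferSize;
    }

    // Assumed extra guard, not mentioned in the commit message.
    if (dataSize > bufferSize)
    {
        throw std::out_of_range("requested transfer exceeds buffer size");
    }

    return dataSize;
}
```

Under these assumptions, `ResolveTransferSize(0, 1024)` yields 1024 and `ResolveTransferSize(256, 1024)` yields 256.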
…nize

- ClearData and ClearKernelData now call .wait() on futures before
  erasing them, preventing undefined behavior from destroying running
  std::async tasks
- SynchronizeQueue/SynchronizeQueues/SynchronizeDevice now wait for
  all pending compute and transfer actions to complete, matching the
  OpenCL backend's synchronization contract

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
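The wait-before-erase pattern from the commit above might look roughly like the sketch below. `ActionStore` and its members are hypothetical stand-ins; the backend's real bookkeeping is not shown in this PR text.

```cpp
#include <cstddef>
#include <future>
#include <map>
#include <utility>

// Hypothetical container for pending asynchronous actions, keyed by an id.
class ActionStore
{
public:
    void Add(int id, std::future<void> action)
    {
        m_Actions[id] = std::move(action);
    }

    // Waits for every pending action to finish, then erases the entries.
    void Clear()
    {
        for (auto& entry : m_Actions)
        {
            // Default-constructed or moved-from futures have no shared state.
            if (entry.second.valid())
            {
                entry.second.wait();
            }
        }

        m_Actions.clear();
    }

    std::size_t Size() const { return m_Actions.size(); }

private:
    std::map<int, std::future<void>> m_Actions;
};
```

Waiting explicitly before clearing makes the completion order visible in the code, which is the property the synchronization functions in the commit rely on.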
Local and Symbol memory types have no hardware equivalent on CPU.
Instead of failing with a buffer lookup error, skip them during
argument binding and log a warning. Similar to CUDA which also
skips these during argument binding (handling them via separate
backend-specific mechanisms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
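A minimal sketch of the skip-and-warn behavior described in the commit above, assuming a simplified argument model. The `MemoryType` enum, the `Argument` struct, and `SelectBindableArguments` are hypothetical and only mirror the Local and Symbol cases named in the commit message.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Simplified, hypothetical model of kernel argument memory types.
enum class MemoryType { Scalar, Vector, Local, Symbol };

struct Argument
{
    std::string name;
    MemoryType type;
};

// Returns the arguments that would be bound on a CPU backend.
// Local and Symbol arguments are skipped with a warning instead of failing.
inline std::vector<Argument> SelectBindableArguments(const std::vector<Argument>& arguments)
{
    std::vector<Argument> bindable;

    for (const auto& argument : arguments)
    {
        if (argument.type == MemoryType::Local || argument.type == MemoryType::Symbol)
        {
            std::cerr << "Warning: skipping argument '" << argument.name
                << "' (no CPU equivalent for its memory type)\n";
            continue;
        }

        bindable.push_back(argument);
    }

    return bindable;
}
```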
Enable macro expansion for KTT_API/KTT_VIRTUAL_API visibility macros
so Doxygen correctly parses class and enum declarations. Fix @fn tag
mismatches in KernelResult.h (missing timestamp param, wrong return
type) and @param name mismatch in Tuner.h (powerParams -> preciseParams).
Add ParameterValueType.h to Doxygen INPUT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Before: profiling overhead related to infrastructure was accumulated in
KernelResult.m_Overhead

Now: it is accumulated in KernelResult.m_ProfilingOverhead

Also, minor fix in PythonTuner that appeared during rebase
Before: The kernel duration of the first run (i.e., the run without
profiling) was put into the result of the final pass, as this one gets
saved as the kernel run's result. However, this impacted the accounting
of profiling runs overhead (as kernel duration of extra passes is added
to profiling runs overhead).

Now: The kernel duration of the first pass, along with kernel overhead
and compilation overhead of the first pass are copied into the final
pass results AFTER all the overheads are accounted for. This ensures
that the kernel duration of the first pass is preserved in the final
output, while the profiling runs overhead is correctly calculated.
Before: KernelResult.m_ExtraDuration included profiling infrastructure
overhead and was accumulated over all passes

Now:
- m_ExtraDuration stores only duration of user-specified launcher
tasks such as asynchronous data movements and synchronization, excluding
compilation, data movements and profiling infrastructure overhead.
- extra duration of the first pass is reported in the output and is used
for calculation in GetTotalDuration()
- extra duration of the profiling passes is accumulated in
m_ProfilingRunsOverhead and thus part of total overhead

Why:
- removes double counting, as the profiling infrastructure overhead is
already accounted for in m_ProfilingOverhead
- stops the extra duration accumulated over all passes from showing up
in total duration, which was not correct
- stops the extra duration accumulated over all passes from showing up
in total overhead as part of profiling overhead, which was not correct
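The accounting described in the last two commits can be sketched as follows: the first pass supplies the reported kernel and extra durations, while every later profiling pass contributes only to the profiling runs overhead. The types and the `Accumulate` function are hypothetical simplifications, and the assumption that total duration is the sum of kernel and extra duration is mine, not stated in the commit messages.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pass timing, in arbitrary time units.
struct PassTiming
{
    std::uint64_t kernelDuration;
    std::uint64_t extraDuration;
};

// Hypothetical, simplified stand-in for the accumulated kernel result.
struct AccumulatedResult
{
    std::uint64_t kernelDuration = 0;
    std::uint64_t extraDuration = 0;
    std::uint64_t profilingRunsOverhead = 0;

    // Assumption: total duration is kernel duration plus extra duration.
    std::uint64_t GetTotalDuration() const { return kernelDuration + extraDuration; }
};

inline AccumulatedResult Accumulate(const std::vector<PassTiming>& passes)
{
    AccumulatedResult result;

    for (std::size_t i = 0; i < passes.size(); ++i)
    {
        if (i == 0)
        {
            // First pass (run without profiling): its durations are reported.
            result.kernelDuration = passes[i].kernelDuration;
            result.extraDuration = passes[i].extraDuration;
        }
        else
        {
            // Extra profiling passes: both durations count as overhead only.
            result.profilingRunsOverhead += passes[i].kernelDuration + passes[i].extraDuration;
        }
    }

    return result;
}
```

With this split, repeating a kernel for profiling leaves the reported total duration unchanged and grows only the overhead figure.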
Before: Precise measurements for power and time lacked tracking for the
overhead they cost.

After: New category of overhead is added and accounted for.

Why: To track all overheads.
…ngine, there could be another stuff arising there...)
Petronous and others added 24 commits May 25, 2026 16:21
The current ExampleBase supports this very poorly. Will not be
refactored for now.

This reverts commit 7c8d95f.
Keep the rest of the old folder for compatibility with ReferenceVersions/AtfSamples
Member

@jiri-filipovic jiri-filipovic left a comment


Rename ReferenceVersions to LegacyExamples and hide correctness scripts there. Otherwise seems good. We, of course, have to double-check if everything is fine before removing the LegacyExamples, but it seems ready to merge into development.

@Petronous
Author

> Rename ReferenceVersions to LegacyExamples and hide correctness scripts there. Otherwise seems good. We, of course, have to double-check if everything is fine before removing the LegacyExamples, but it seems ready to merge into development.

ReferenceVersions have been renamed and the scripts have been moved. I also discovered an uncaught bug in Sort, which is fixed now.

@Petronous Petronous requested a review from jiri-filipovic May 29, 2026 20:47
@jiri-filipovic jiri-filipovic merged commit 60851ca into HiPerCoRe:development Jun 1, 2026

4 participants