Skip to content

Introduce thapi_sampling_daemon #369

Merged
Kerilk merged 13 commits intodevelfrom
ze_sampling
Jul 10, 2025
Merged

Introduce thapi_sampling_daemon #369
Kerilk merged 13 commits intodevelfrom
ze_sampling

Conversation

@TApplencourt
Copy link
Copy Markdown
Collaborator

@TApplencourt TApplencourt commented Jul 1, 2025

Sorry for the big PR, the diff is bigger than expected because I'm stupid but, in short the change are small.

We now have one thapi_sampling_daemon, who can open some libThapiSamplingPluging.so. libThapiSamplingPluging just need to expose a thapi_register_sampling_plugin function, and inside call thapi_register_sampling.

In short we:

  • Introduce libThapiSampling.so
    • Expose only 3 symbols, the register, init, and sampling_loop
  • Introduce thapi_sampling_register.h header, exposing only register, the only header than sampling plugin need
  • Introduce thapi_sampling.h for the thapi_sampling_daemon so sampling_loop and init.

This will help nathan and is new CXI plugin sample.

In the futur we can maybe clean thapi_sampling_daemon (we don't need pthread protection anymore) and the ze_sampling_pluging (no need to dl_open anymore, we can just call directly the pointers)

Copy link
Copy Markdown
Contributor

@nscottnichols nscottnichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will let @Kerilk do approval

Copy link
Copy Markdown
Collaborator

@Kerilk Kerilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert changes to sampling library.

@TApplencourt
Copy link
Copy Markdown
Collaborator Author

TApplencourt commented Jul 2, 2025

No, I didn't do the change for fun. Nothing fundamental changed, but some stuff needed change. I can add back unregister, this is the only major change (and use it where it was not before).

@Kerilk
Copy link
Copy Markdown
Collaborator

Kerilk commented Jul 2, 2025

The only things that needed to change in those files was extracting the heartbeats.

One possible improvement is delaying the thread creation to when the sampling list is not empty, and wait for the thread to die when the last event is unregistered.

The reason for the register/unregister is that you may want to start and stop sampling programmatically.

Plugins should unregister their events, though I agree that in practice the cleanup of the library should be good enough.

@TApplencourt
Copy link
Copy Markdown
Collaborator Author

TApplencourt commented Jul 2, 2025

The reason for the register/unregister is that you may want to start and stop sampling programmatically.

I will add it back. Even if I don't see how can we using in the current algorithm. The pluging just call expose one function, who will be called once, who will call registration. I suppose the plugin so can spawn fork in the plugin_registration and 20min later unregister this is true.


Require change in the lib:

  • We need to add a finalization callback, and associated fn_ in the struct; so user pluging can register some finalization step for their own plugin (for example closing the handle)
  • We need to a thapi_loop_stop function so we don't expose the extern volatile to the daemon_binary (see @nscottnichols previous comment)

  • Why do we need to spawn thread anymore? Now iprof force the daemon to be run only by one process.
  • Same for the pthread mutex, we know only one thread will call the daemon (xprof)

But I agree unreleated to the PR, we can do in another one.

@TApplencourt
Copy link
Copy Markdown
Collaborator Author

  • The CI is passing \o/.
  • Using the "new-old" sampling daemon.
  • Add an example "heartbeat" plugin

@nscottnichols
Copy link
Copy Markdown
Contributor

@Kerilk These changes seem to restore most of the original thapi_sampling.c and thapi_sampling.h code and is somewhat perscriptive on what to do when adding a new sampling plugin. For example, if I had something I wanted to sample that had a state I would initialize the state in the thapi_initialize_sampling_plugin() function and destroy the state in the thapi_finalize_sampling_plugin() function. The heartbeats have also been moved to their own plugin, giving an additional example of what to do.

My remaining question is, where should new sampling plugins be added to the code? The one option, as currently demonstrated in this PR, is to have each sampling plugin live in its backend directory. The ze_sampling_plugin.c is under the ze directory and heartbeat_sampling_plugin.c is under the sampling directory. An alternate approach would be to have a plugins subdirectory in the sampling directory with approriate backend subdirectories there as well. This could offer some more organization, instead of having the sampling plugins scattered to the wind across the repo. This second option is probably more appropriate to go into a separate PR. I only ask because I'm working on a CXI sampling plugin and want it to fit nicely within the structure of the code.

What are your thoughts on the current state of the PR and my question above?

@Kerilk
Copy link
Copy Markdown
Collaborator

Kerilk commented Jul 7, 2025

For code generation purposes, I think it is simpler to have sampling plugins with their respective backends. For a distribution perspective I think the current scheme works. I have questions about the capitalization though, here is the list of files we have currently:

lib/thapi/sampling
lib/thapi/sampling/libThapiSampling.la
lib/thapi/sampling/libheartbeatsampling.so
lib/thapi/sampling/libThapiSampling.so.1
lib/thapi/sampling/libThapiSampling.so
lib/thapi/sampling/libheartbeatsampling.la
lib/thapi/sampling/libThapiSampling.so.1.0.0
[...]
here/lib/thapi/bt2/libCUDAInterval.so
here/lib/thapi/bt2/libOMPInterval.so
here/lib/thapi/bt2/libZEInterval.so
here/lib/thapi/bt2/libXStripper.so
here/lib/thapi/bt2/libXTimeline.so
here/lib/thapi/bt2/libCLInterval.la
here/lib/thapi/bt2/libXTally.so
here/lib/thapi/bt2/libXTimeline.la
here/lib/thapi/bt2/libOMPInterval.la
here/lib/thapi/bt2/libHIPInterval.so
here/lib/thapi/bt2/libXTally.la
here/lib/thapi/bt2/libMPIInterval.la
here/lib/thapi/bt2/libXAggreg.la
here/lib/thapi/bt2/libZEInterval.la
here/lib/thapi/bt2/libMPIInterval.so
here/lib/thapi/bt2/libXAggreg.so
here/lib/thapi/bt2/libCLInterval.so
here/lib/thapi/bt2/libXStripper.la
here/lib/thapi/bt2/libHIPInterval.la
here/lib/thapi/bt2/libCUDAInterval.la
[...]
here/lib/thapi/ze/libze_loader.la
here/lib/thapi/ze/libze_loader.so
here/lib/thapi/ze/libzesampling.so
here/lib/thapi/ze/libze_loader.so.1
here/lib/thapi/ze/libze_loader.so.1.0.0
here/lib/thapi/ze/libzesampling.la

I think we should be consistent about naming our libraries, and I like the capitalized version better to be honnest.
So I propose renaming:

  • libzesampling.so => libZESampling.so
  • libheartbeatsampling.so => libHeartbeatSampling.so

@TApplencourt
Copy link
Copy Markdown
Collaborator Author

TApplencourt commented Jul 7, 2025

Consistency always good!

I guess I just got triped by libze_loader. And if I remenber correctly inside the different Makefile we are not super consistance neither. (also the opencl / cl, I guess we should fix that one day too)

I will try to make a README_NAMING_CONVENSION or something in a separate PR and will fix the Sampling naming into this one

@Kerilk
Copy link
Copy Markdown
Collaborator

Kerilk commented Jul 7, 2025

We can't change the ones which are not ours like libze_loader.so. I double checked the rest, we are pretty consistent, I think we already went over them once.

@TApplencourt
Copy link
Copy Markdown
Collaborator Author

TApplencourt commented Jul 7, 2025

I was thinking about nodist_libzetracepoints_la_SOURCES, nodist_libtestzesource_la_SOURCES for example. I guess we should camel case those guy too (even if we don't export it, just to have an easy to follow rules).
I guess for now we have

  • if we don't export lower case
  • if we export Camel Case. if we can

Will be simplest to always CamelCase if we can. (or whatever our convension is called, snake of something)

@Kerilk
Copy link
Copy Markdown
Collaborator

Kerilk commented Jul 7, 2025

I forgot about non-installed libraries. Agreed, Camel case is good and we should use it consistently when possible.

@TApplencourt
Copy link
Copy Markdown
Collaborator Author

TApplencourt commented Jul 9, 2025

Ok pushed, ready to merge (even if the CI is falling, it's unrelated to that...).
Will do the ther nondist is another PR

Copy link
Copy Markdown
Collaborator

@Kerilk Kerilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Kerilk Kerilk merged commit 7db06ea into devel Jul 10, 2025
18 of 19 checks passed
@nscottnichols nscottnichols deleted the ze_sampling branch July 10, 2025 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants