Fix domain not being freed at the end of scheduling loop #2974
@msimberg Thanks for thinking about this. May I suggest to instead make the domain type explicitly non-copyable and delete the underlying |
@hkaiser Thanks for your suggestion! That sounds like a good thing to do, I'll update the PR.
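As a rough illustration of what "explicitly non-copyable" could look like, here is a minimal sketch. It is not the HPX code: `fake_itt_domain` and `live_handles` are hypothetical stand-ins so the example compiles without ITT, and the assumption that the wrapper owns and deletes its handle is mine.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the ITT handle; not the real ___itt_domain.
struct fake_itt_domain { char const* name; };

// Hypothetical counter of live handles, only here to make the sketch checkable.
inline int& live_handles() { static int count = 0; return count; }

class domain
{
public:
    explicit domain(char const* name)
      : handle_(new fake_itt_domain{name})
    {
        ++live_handles();
    }

    ~domain() { release(); }

    // Copying would leave two objects owning one handle, so it is forbidden.
    domain(domain const&) = delete;
    domain& operator=(domain const&) = delete;

    // Moving transfers ownership and leaves the source empty.
    domain(domain&& other) noexcept
      : handle_(std::exchange(other.handle_, nullptr))
    {
    }

    domain& operator=(domain&& other) noexcept
    {
        if (this != &other)
        {
            release();
            handle_ = std::exchange(other.handle_, nullptr);
        }
        return *this;
    }

    bool owns_handle() const noexcept { return handle_ != nullptr; }

private:
    void release() noexcept
    {
        if (handle_)
        {
            delete handle_;
            handle_ = nullptr;
            --live_handles();
        }
    }

    fake_itt_domain* handle_;
};

static_assert(!std::is_copy_constructible<domain>::value,
    "domain must not be copyable");
```

With copies ruled out at compile time, a double delete of the same handle can only come from code that bypasses the wrapper.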
73dd0e8 to 9065dbf
@hkaiser Could you please have a look? I now handle the thread specific itt domain inside the Note that I don't free the
I'm not sure what would happen if one tries to delete it anyway though. |
The inspect check is failing here: https://8644-4455628-gh.circle-artifacts.com/0/tmp/circle-artifacts.5wZtXB8/hpx_inspect_report.html |
484791f to ac4a410
/// \cond NOINTERNAL
HPX_API_EXPORT hpx::util::itt::domain const& get_thread_itt_domain();
/// \endcond
}
Removing HPX_EXPORT from this function will cause linker errors on Windows and macOS if the user references util::annotated_function in their code.
This function was completely removed. But see other comments.
Right, I didn't see that at first. I am, however, concerned about making the thread_specific_ptr directly 'visible' through HPX_EXPORT. See comments below.
src/util/itt_notify.cpp
Outdated
{
    HPX_ITT_TASK_END(domain_.domain_.get());

    delete id_;
Are you sure we can delete the memory? I have not gone back to the ITT docs, however AFAIR the ID is allocated using malloc (not using new).
This one is allocated using new a bit further down (line 441). However, I'm not entirely clear on what the difference is between itt_make_id and itt_id_create. Maybe it should not be allocated manually in the first place?
Ok, thanks for clarifying.
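The concern in this exchange is that memory must be released by the counterpart of whatever allocated it: `delete` for `new`, `free` for `malloc`. One way to make that pairing hard to get wrong is to encode it in the pointer type. The sketch below is only an illustration; `fake_itt_id`, `free_deleter` and both factory functions are hypothetical and do not reflect how ITT or HPX allocate IDs.

```cpp
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical, trivially destructible stand-in for an ITT id.
struct fake_itt_id
{
    unsigned long long d1, d2, d3;
};

// Deleter for storage obtained from std::malloc.
struct free_deleter
{
    void operator()(void* p) const noexcept { std::free(p); }
};

// The deleter is part of the type, so the matching release is automatic.
using malloc_id_ptr = std::unique_ptr<fake_itt_id, free_deleter>;
using new_id_ptr = std::unique_ptr<fake_itt_id>;    // releases with delete

inline malloc_id_ptr make_id_with_malloc()
{
    void* raw = std::malloc(sizeof(fake_itt_id));
    if (raw == nullptr)
        throw std::bad_alloc();
    // Construct the object in the malloc'ed storage.
    return malloc_id_ptr(new (raw) fake_itt_id{1, 2, 3});
}

inline new_id_ptr make_id_with_new()
{
    return new_id_ptr(new fake_itt_id{1, 2, 3});
}
```

Because `fake_itt_id` is trivially destructible, releasing the malloc'ed storage with `std::free` alone is sufficient here; a type with a non-trivial destructor would need the deleter to call it first.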
What general benefit do we gain from this refactoring of the domain type? The domain is still not being freed at the end of the scheduling loop (see ticket title).
What's the point of this refactoring in this case? (Note: I see that the task is now being freed, which looks like a correct change.)
The Hence my original fix was just to have an explicit call to free the |
If we have more than one domain instance created on the same thread, wouldn't this approach actually delete the underlying data prematurely? |
The underlying However, it might be that we can actually just delete/free the |
hpx/util/itt_notify.hpp
Outdated
___itt_domain* domain_;
HPX_EXPORT static hpx::util::thread_specific_ptr<
    ___itt_domain, itt_domain_tag
> domain_;
};
I'm not sure if making the thread_specific_ptr 'visible' to other modules through HPX_EXPORT is really necessary. Keeping it fully contained in the source file (itt_notify.cpp) without exposing it should be possible and sufficient.
You're right, the HPX_EXPORT is not necessary. Removed.
That's not what I had in mind (although that's part of it). Sorry if I was not sufficiently clear with my comments.
What I meant is to remove the domain_ variable from the class entirely and make it static to the file it is defined in. There is no need for the user to see this at all.
This comment completely slipped past me, sorry. Just wanted to say that I've seen it now and I'll take it into account once I look at this again.
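The idea of keeping the per-thread pointer private to one source file can be sketched as follows. This is an assumption-laden illustration, not the HPX implementation: it swaps `hpx::util::thread_specific_ptr` for standard `thread_local` storage, and `fake_itt_domain`, `thread_domain_storage` and `get_thread_domain` are hypothetical names.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the ITT handle.
struct fake_itt_domain
{
    std::string name;
};

namespace {
    // Internal linkage: invisible to other translation units, so nothing
    // needs to be exported. The unique_ptr releases the object when the
    // owning thread exits.
    thread_local std::unique_ptr<fake_itt_domain> thread_domain_storage;
}

// Only this accessor would appear in the header.
fake_itt_domain& get_thread_domain(std::string const& name)
{
    if (!thread_domain_storage)
        thread_domain_storage.reset(new fake_itt_domain{name});
    return *thread_domain_storage;    // the same object on every later call
}
```

Callers on the same thread always get the same object back, and the name passed on later calls is ignored once the object exists.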
@msimberg Thanks for your explanations. I don't think we should further complicate the design of this just to release a small bit of memory held by a thread-local variable. All of this code is used for debugging use cases in the end, thus leaving it as you propose (except for the exported |
@hkaiser Yeah, I was a bit wary to go even this far but I hope this is an acceptable trade-off. I've removed the Could you still confirm that it's okay that the |
I think we use the domain type for thread domains only, but in principle it can be used for other things as well (IIRC). |
I found the memory leak using AddressSanitizer and am troubled, because the ITT objects shouldn't be allocated at all if we're not using them. Shouldn't there be an #ifdef somewhere that prevents any of this being used unless we actually enable HPX_WITH_ITTNOTIFY in CMake?
@@ -424,15 +424,15 @@ namespace hpx { namespace threads { namespace detail
        scheduling_counters& counters, scheduling_callbacks& params)
    {
        std::atomic<hpx::state>& this_state = scheduler.get_state(num_thread);

#ifdef HPX_HAVE_ITTNOTIFY
@biddisco I don't think this is necessary as all the types in util::itt are defined as being empty if HPX_HAVE_ITTNOTIFY was not defined (see: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/itt_notify.hpp#L619-L731).
Putting the #ifdef around that block does prevent the allocation of an empty object though.
That gets optimized away, for sure. Even if not, it's just one byte on the stack.
The call to
util::itt::domain domain = hpx::get_thread_itt_domain();
allocates an itt::domain
if (nullptr == d.get())
d.reset(new util::itt::domain(get_thread_name().c_str()));
and this is a full malloc. Not a single byte on the stack. I don't use ITTNOTIFY and I don't see why I should be allocating any of this. I think the #ifdef should stay.
It does not allocate anything if ITTNOTIFY is disabled, see here: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/itt_notify.hpp#L619-L731. At least that's the intention.
I think you're both right.
On current master a struct is always allocated on the heap no matter if ITT is enabled or not. With ITT off the struct is empty and the use in scheduling_loop is ifdefed out, so it might get optimized away.
If I get my RAII wrapper done it would indeed be an empty struct on the stack when ITT is off. Chances of this getting optimized out are higher, but I don't know if it would happen.
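The two positions in this thread can be illustrated with a small sketch of a type that collapses to an empty struct when instrumentation is off. `SKETCH_HAVE_ITTNOTIFY` and `fake_itt_domain` are hypothetical names chosen for this example; the real switch in HPX is `HPX_HAVE_ITTNOTIFY`, and the real types live in itt_notify.hpp.

```cpp
#include <type_traits>

// Hypothetical stand-in for the ITT handle; only ever used through a pointer.
struct fake_itt_domain;

#if defined(SKETCH_HAVE_ITTNOTIFY)
// Instrumented build: the wrapper carries a handle.
struct domain
{
    explicit domain(char const*) noexcept : domain_(nullptr) {}
    fake_itt_domain* domain_;
};
#else
// Non-instrumented build: same interface, no state.
struct domain
{
    explicit domain(char const*) noexcept {}
};

static_assert(std::is_empty<domain>::value,
    "with instrumentation off the wrapper should carry no state");
#endif
```

A stack instance of the empty variant, such as `domain d("worker");`, is cheap and a likely candidate for being optimized away. Creating the same empty type with `new` is still written as a heap allocation in the source, which is the case raised earlier in the thread.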
@hkaiser Is there anything you'd still like to see in this PR to have it merged? Regarding the |
@msimberg If we go back to your initial suggestion (which I'd be fine with), we should still delete the memory using an RAII wrapper instead of doing it manually.
@hkaiser I tried, and failed, to make a wrapper struct for the thread specific While doing this I realized that the current solution also has problems, in the sense that ITT tasks which get moved to another OS thread will end the ITT task with the wrong At the moment I don't have a solution for this, except for the initial suggestion without the RAII wrapper (but I agree that a RAII wrapper would be good to have). So I continue looking into this but progress may be slow... |
thread_domain is a subclass of domain and initializes its member variable domain_ via a thread_specific_ptr<___itt_domain> instead of creating a new one for each instantiation.
…hpx into fix-get_thread_itt_domain-leak
hpx/util/itt_notify.hpp
Outdated
HPX_EXPORT domain(char const*);
HPX_EXPORT domain();
HPX_EXPORT virtual ~domain();
Do we really have to make the destructor virtual?
Looks like we don't need explicit destructors at all as both types, domain and thread_domain, currently expose empty destructors only and thread_domain doesn't even have members.
Good point, virtual is not necessary in this case, but defaulted destructors are even better.
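A minimal sketch of the shape being discussed, under the assumption that the base class only holds a non-owning pointer: with no user-declared destructors, both types stay trivially destructible and no vtable is introduced. `fake_itt_domain` is a hypothetical stand-in, and the class bodies are illustrative rather than the HPX definitions.

```cpp
#include <type_traits>

// Hypothetical stand-in for the ITT handle; only ever used through a pointer.
struct fake_itt_domain;

struct domain
{
    explicit domain(fake_itt_domain* d = nullptr) noexcept : domain_(d) {}
    // No destructor declared: the implicit one is trivial and non-virtual.
    fake_itt_domain* domain_;    // non-owning in this sketch
};

// Adds no members, so it needs no destructor of its own either.
struct thread_domain : domain
{
    thread_domain() noexcept : domain(nullptr) {}
};

static_assert(std::is_trivially_destructible<domain>::value,
    "the implicit destructor is trivial");
static_assert(!std::has_virtual_destructor<domain>::value,
    "no vtable is introduced");
static_assert(sizeof(thread_domain) == sizeof(domain),
    "the subclass adds no state");
```

The usual caveat applies: without a virtual destructor, a `thread_domain` should not be deleted through a `domain*`.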
{
    HPX_ITT_TASK_END(domain_.domain_);

    delete id_;
Good catch!
b71349e to 9844503
To not leave this hanging, I'm quite happy with the state of this right now.
LGTM, thanks!
* Octotiger ITT hooks don't compile as is. Breaking changes are probably from STEllAR-GROUP/hpx#2974