Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix domain not being freed at the end of scheduling loop #2974

Merged
merged 13 commits into from
Feb 16, 2018

Conversation

msimberg
Copy link
Contributor

No description provided.

@hkaiser
Copy link
Member

hkaiser commented Oct 25, 2017

@msimberg Thanks for thinking about this. May I suggest to instead make the domain type explicitly non-copyable and delete the underlying ___itt_domain in ~domain(), if needed?

@msimberg
Copy link
Contributor Author

@hkaiser Thanks for you suggestion! That sounds like a good thing to do, I'll update the PR.

@msimberg msimberg force-pushed the fix-get_thread_itt_domain-leak branch from 73dd0e8 to 9065dbf Compare November 16, 2017 08:48
@msimberg
Copy link
Contributor Author

@hkaiser Could you please have a look? I now handle the thread specific itt domain inside the domain struct. However, what I don't like about this is that now domain can only represent the thread domain. There was no other usage currently within hpx but is this a deal breaker?

Note that I don't free the ___itt_domain as per the domain API. Relevant bit:

The set of domains is expected to be static over the application's execution time, therefore, there is no mechanism to destroy a domain.

I'm not sure what would happen if one tries to delete it anyway though.

@sithhell
Copy link
Member

@msimberg msimberg force-pushed the fix-get_thread_itt_domain-leak branch from 484791f to ac4a410 Compare November 17, 2017 07:58

/// \cond NOINTERNAL
HPX_API_EXPORT hpx::util::itt::domain const& get_thread_itt_domain();
/// \endcond
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing HPX_EXPORT from this function will cause linker errors on Windows and Mac/OS if the user references util::annotated_function in his/her code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was completely removed. But see other comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I didn't see that at first. I'm however concerned by making the thread_specific_ptr directly 'visible' through HPX_EXPORT. See comments below.

{
HPX_ITT_TASK_END(domain_.domain_.get());

delete id_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we can delete the memory? I have not gone back to the ITT docs, however AFAIR the ID is allocated using malloc (not using new).

Copy link
Contributor Author

@msimberg msimberg Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is allocated using new a bit further down (line 441). However, I'm not entirely clear on what the difference is between itt_make_id and itt_id_create. Maybe it should not be allocated manually in the first place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for clarifying.

Copy link
Member

@hkaiser hkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What general benefit do we gain from this refactoring of the domain type? The domain is still not being free'd at the end of the scheduling loop (see ticket title).

@hkaiser
Copy link
Member

hkaiser commented Nov 17, 2017

@hkaiser Could you please have a look? I now handle the thread specific itt domain inside the domain struct. However, what I don't like about this is that now domain can only represent the thread domain. There was no other usage currently within hpx but is this a deal breaker?

Note that I don't free the ___itt_domain as per the domain API. Relevant bit:

The set of domains is expected to be static over the application's execution time, therefore, there is no mechanism to destroy a domain.

I'm not sure what would happen if one tries to delete it anyway though.

What's the point of this refactoring in this case? (Note I see that the task is now being free'd, which looks like a correct change).

@msimberg
Copy link
Contributor Author

The __itt_domain was never the problem (AFAIU). It was the domain that was wrapped in a thread_specific_ptr in runtime_impl.cpp (returned by get_thread_itt_domain) that was never freed once the thread exits. Now each domain is deleted once it goes out of scope.

Hence my original fix was just to have an explicit call to free the domain in the thread_specific_ptr at the end of scheduling_loop (although a better place would probably have been at the end of thread_func or somewhere together with other reset_tss functions).

@hkaiser
Copy link
Member

hkaiser commented Nov 17, 2017

If we have more than one domain instance created on the same thread, wouldn't this approach actually delete the underlying data prematurely?

@msimberg
Copy link
Contributor Author

The underlying ___itt_domain is in any case not deleted in the domain destructor, so in that sense there is no risk of deleting it prematurely. This may be a leak, but it's an "official" one because there is no way of deleting it using the ITT API (and my googling found no mention of manually deleting/freeing it either).

However, it might be that we can actually just delete/free the ___itt_domain. This would just require a bit of instance counting in the domain struct to know when we can free it. Alternatively a shared pointer to a single thread specific domain instance or something like this.

___itt_domain* domain_;
HPX_EXPORT static hpx::util::thread_specific_ptr<
___itt_domain, itt_domain_tag
> domain_;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if making the thread_specific_ptr 'visible' to other modules through HPX_EXPORT is really necessary. Keeping it fully contained in the source file (itt_notify.cpp) without exposing it should be possible and sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, the HPX_EXPORT is not necessary. Removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I had in mind (although that's part of it). Sorry if I was not sufficiently clear with my comments.

What I meant is to remove the domain_ variable from the class entirely and make it static to the file it is defined in. There is no need for the user to see this at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment completely slipped past me, sorry. Just wanted to say that I've seen it now and I'll take it into account once I look at this again.

@hkaiser
Copy link
Member

hkaiser commented Nov 20, 2017

@msimberg Thanks for your explanations. I don't think we should further complicate the design of this just to release a small bit of memory held by a thread-local variable. All of this code is used for debugging use cases in the end, thus leaving it as you propose (except for the exported thread_specific_ptr, which causes ABI incompatibilities) is fine.

@msimberg
Copy link
Contributor Author

@hkaiser Yeah, I was a bit wary to go even this far but I hope this is an acceptable trade-off.

I've removed the HPX_EXPORT now as it indeed wasn't needed.

Could you still confirm that it's okay that the domain struct now can't be used for anything but a thread specific domain? I have next to no experience in using ITT (besides what I've learned while fixing this).

@hkaiser
Copy link
Member

hkaiser commented Nov 20, 2017

I think we use the domain type for thread domains only, but in principle it can be used for other things as well (IIRC).

@biddisco
Copy link
Contributor

I found the memory leak using address_sanitizer and am troubled because the itt_stuff shouldn't be allocated at all if we're not using it? shouldn't there be an #ifdef somewhere that prevents any of this being used unless we actually enable HPX_WITH_ITTNOTIFY in cmake?

@@ -424,15 +424,15 @@ namespace hpx { namespace threads { namespace detail
scheduling_counters& counters, scheduling_callbacks& params)
{
std::atomic<hpx::state>& this_state = scheduler.get_state(num_thread);

#ifdef HPX_HAVE_ITTNOTIFY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@biddisco I don't think this is necessary as all the types in util::itt are defined as being empty if HPX_HAVE_ITTNOTIFY was not defined (see: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/itt_notify.hpp#L619-L731).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the #ifdef around that block does prevent the allocation of an empty object though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That gets optimized away, for sure. Even if not, it's just one byte on the stack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to

        util::itt::domain domain = hpx::get_thread_itt_domain();

allocates an itt::domain

        if (nullptr == d.get())
            d.reset(new util::itt::domain(get_thread_name().c_str()));

and this is a full malloc. Not a single byte on the stack. I don't use ITTNOTIFY and I don't see why I should be allocating any of this. I think the #ifdef should stay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not allocate anything if ITTNOTIFY is disabled, see here: https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/itt_notify.hpp#L619-L731. At least that's the intention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're both right.

On current master a struct is always allocated on the heap no matter if ITT is enabled or not. With ITT off the struct is empty and the use in scheduling_loop is ifdefed out, so it might get optimized away.

If I get my RAII wrapper done it would indeed be an empty struct on the stack when ITT is off. Chances of this getting optimized out are higher, but I don't know if it would happen.

@msimberg
Copy link
Contributor Author

@hkaiser Is there anything you'd still like to see in this PR to have it merged?

Regarding the domain being thread specific, there was indeed no non-thread specific use in hpx itself which is why I could do this change in the first place. But if that functionality should be kept I would quite honestly go back to the first commit of this PR and manually delete the memory, as that is by far the simplest and gets the job done.

@hkaiser
Copy link
Member

hkaiser commented Nov 30, 2017

@msimberg if we go back to your initial suggestion (which I'd be fine with) we still should delete the memory using an RAII wrapper instead of doing it manually..

@msimberg
Copy link
Contributor Author

msimberg commented Dec 4, 2017

@hkaiser I tried, and failed, to make a wrapper struct for the thread specific domain. The problem was HPX threads that get moved to another OS thread end up deleting the wrong thread specific domain.

While doing this I realized that the current solution also has problems, in the sense that ITT tasks which get moved to another OS thread will end the ITT task with the wrong domain (current thread rather than the starting thread). Not sure what VTune would do with this but it's different from the current master behaviour.

At the moment I don't have a solution for this, except for the initial suggestion without the RAII wrapper (but I agree that a RAII wrapper would be good to have). So I continue looking into this but progress may be slow...

@msimberg msimberg self-assigned this Dec 5, 2017
thread_domain is subclass of domain and initializes its member variable domain_
via a thread_specific_ptr<___itt_domain> instead of creating a new one for each
instantiation.

HPX_EXPORT domain(char const*);
HPX_EXPORT domain();
HPX_EXPORT virtual ~domain();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really have to make the destructor virtual?

Copy link
Member

@hkaiser hkaiser Jan 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't need explicit destructors at all as both types, domain, and thread_domain currently expose empty destructors only and thread_domain doesn't even have members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, virtual is not necessary in this case, but default destructors is even better.

{
HPX_ITT_TASK_END(domain_.domain_);

delete id_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@msimberg msimberg force-pushed the fix-get_thread_itt_domain-leak branch from b71349e to 9844503 Compare January 30, 2018 14:40
@msimberg
Copy link
Contributor Author

To not leave this hanging, I'm quite happy with the state of this right now.

  • ITT domain is only handled by RAII wrappers
  • It's possible to create a custom domain with domain("name") or to use the thread specific one with thread_domain()
  • The underlying ___itt_domain pointer is stored in the wrapper, so it will not change if a task moves to other OS threads (i.e. it is not taken from the thread_specific_ptr every time it is accessed)

Copy link
Member

@hkaiser hkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@msimberg msimberg merged commit 69100ee into STEllAR-GROUP:master Feb 16, 2018
@msimberg msimberg deleted the fix-get_thread_itt_domain-leak branch February 16, 2018 08:55
parsa added a commit to STEllAR-GROUP/octotiger that referenced this pull request Jan 18, 2019
* Octotiger ITT hooks don't compile as is. Breaking changes are probably from STEllAR-GROUP/hpx#2974
parsa added a commit to STEllAR-GROUP/octotiger that referenced this pull request Jan 18, 2019
* Octotiger ITT hooks don't compile as is. Breaking changes are probably from STEllAR-GROUP/hpx#2974
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants