Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

id_type local reference counting is wrong #1032

Closed
sithhell opened this issue Dec 17, 2013 · 11 comments
Closed

id_type local reference counting is wrong #1032

sithhell opened this issue Dec 17, 2013 · 11 comments

Comments

@sithhell
Copy link
Member

https://github.com/STEllAR-GROUP/hpx/blob/master/tests/regressions/id_type_ref_counting.cpp

triggers a failure in the current local refcounting scheme of id types. That is that it is possible to have different "families" of id types which have different reference counts. If one of those reference counts goes to zero and the GID was never split, the component is released to early.

During analysis of this test this happens in at least one place. Namely, the id_type returned from hpx::new_ has a distinct reference count to the id_type returned from this->get_gid() in a component.

@brycelelbach
Copy link
Member

I just want to note that the reason why this->get_gid() returns an uncounted GID is
to prevent a reference cycle. If a component holds a reference count to itself, it will
never be destroyed, because its reference count will never go to zero.

@sithhell
Copy link
Member Author

Why should a component hold a reference count to itself? Having unmanaged GIDs around makes it hard to reason about the correctness of a HPX program. IMHO, all id_types should be managed. Make unmanaged GIDS the rare exception. Just as a side note, the id_type returned from this->get_gid() in a component is never stored anywhere by default in the components.

@hkaiser
Copy link
Member

hkaiser commented Dec 18, 2013

What Bryce meant is that every component stores its own gid internally. This gid shouldn't be managed as otherwise the component would keep itself alive. Meanwhile, I think I have a solution to this which I will commit shortly.

@ghost ghost assigned hkaiser Dec 18, 2013
@brycelelbach
Copy link
Member

Please see my comment regarding this commit. I think this is the wrong way to
fix this. I think we should remove get_gid(), or make it get_gid() return a
gid_type. If I understand what you've done Hartmut it appears that every time I
call get_gid(), incref will be called. I think this is bad because I don't think that users
will expect a function like this->get_gid() to involve network overhead.

@hkaiser
Copy link
Member

hkaiser commented Dec 18, 2013

What alternative do you propose to fix #1032?

@sithhell
Copy link
Member Author

How does this involve network overhead? The AGAS server is on the locality. No overhead here.
Even if get_gid really returns a gid_type, it doesn't solve the problem the mentioned testcase demonstrates.

@brycelelbach
Copy link
Member

Ah, okay, the point about the AGAS server being on the locality is valid

@sithhell
Copy link
Member Author

Reopening because it was only fixed for simple components.

@sithhell
Copy link
Member Author

Looks like it is (mostly) fixed. The only error the testcase is still reporting is that components don't get destroyed.

@K-ballo
Copy link
Member

K-ballo commented Jan 4, 2014

The testcase fails to build with GCC4.4, it should be disabled for that compiler.

@hkaiser
Copy link
Member

hkaiser commented Jan 5, 2014

Well, managed components have not been fixed yet which might be the reason for the leak you're reporting.

hkaiser added a commit that referenced this issue Jan 5, 2014
hkaiser added a commit that referenced this issue Jan 5, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants