
Fix memory leaks. #625

Merged — 6 commits merged into H-uru:master on Dec 14, 2020

Conversation

Hoikas
Member

@Hoikas Hoikas commented Feb 9, 2020

This changeset fixes a major source of memory leaks in the vault client API. Any use of `hsRef<NetVaultNode> foo = new NetVaultNode;` resulted in a leak, due to a change in behavior when the old `AtomicRef` was merged into `hsRefCnt`.

@Hoikas
Member Author

Hoikas commented Feb 9, 2020

For some background reading...

The impetus for this is that on Gehn Beta, downloading a new copy of @DoobesURU's Age Veelay Tsahvahn can cause the client to crash shortly after linking in with an `std::bad_alloc`. I started linking around and noticed that the client's memory usage grows with each link. For example, standing in Relto for the first time, the client uses ~600 MB. Linking to Gahreesen and back to Relto brings memory usage to around ~900 MB. That's an extreme example; smaller Ages like the Nexus have less of an impact. While I am fixing leaks reported by VLD, I'm really not seeing anything that corresponds to the observed behavior.

@Hoikas Hoikas force-pushed the leekspin branch 2 times, most recently from 54f80b6 to b354522 Compare February 9, 2020 23:59
@Hoikas Hoikas marked this pull request as ready for review February 10, 2020 00:28
@Hoikas Hoikas requested a review from zrax February 10, 2020 01:41
@Hoikas Hoikas force-pushed the leekspin branch 2 times, most recently from 0d4b2a5 to 08b3105 Compare September 7, 2020 00:06
@Hoikas
Member Author

Hoikas commented Sep 7, 2020

I recently received a complaint in IRC that boiled down to folks with large buddy lists seeing `std::bad_alloc` rather frequently, so I decided to revisit this changeset, since it fixes leaked vault nodes. It has been updated to merge cleanly. However, due to an issue introduced in the upgrade to Python 3, this now depends on #720.

@DamnBriggsy
Contributor

Prior to these changes, the client would crash on MOULa in an hour or less. After these changes, it stayed up for over 10 hours with no problems.

```diff
@@ -123,7 +123,7 @@ hsRefCnt::hsRefCnt(int initRefs)
 hsRefCnt::~hsRefCnt()
 {
 #ifdef HS_DEBUGGING
-    hsThrowIfFalse(fRefCnt == 1);
+    hsThrowIfFalse(fRefCnt <= 1);
```
Member
What cases do we want to support where the ref count is < 1? I feel like we should catch too-many-unrefs as well as too few...

Member Author

@Hoikas Hoikas commented Dec 1, 2020

`NetVaultNode`'s starting reference count is zero. An earlier refactor changed its base class from `AtomicRef` to `hsRefCnt` but neglected to address the difference in starting ref count, introducing all of these leaks. I initially wanted to add a template parameter for the starting ref count to `hsRefCnt`, but I wasn't able to come up with a clean solution that didn't expose implementation details to the rest of the engine.

@Hoikas
Member Author

Hoikas commented Dec 10, 2020

@zrax RE our discussion in IRC about this PR... Maybe the easiest path forward is to back out 80e00cd? That restores the starting ref count to 1, but uses a weird hack for initializing an hsRef from a newly allocated object. I'm not sure if we're going to find anything much cleaner. :/

@zrax
Member

zrax commented Dec 10, 2020

Part of the problem here is the mix of managed (hsRef<T>) and unmanaged objects all over the code base... If we could assume all were managed, this could be simplified and made safer. Unfortunately, the code base is far from that goal, so we have these ugly hacks instead.

Perhaps instead of the New() template, we could add a more generic mechanism for adopting/stealing a ref. Something like (completely untested):

```cpp
struct hsAdoptRef_Type {};
constexpr hsAdoptRef_Type hsAdoptRef{};

template <class _Ref>
class hsRef
{
public:
    /* ... */

    // Takes ownership of the caller's reference instead of adding one.
    hsRef(_Ref *obj, hsAdoptRef_Type) : fObj(obj) { }

    /* ... */

    void Adopt(_Ref *obj)
    {
        if (fObj)
            fObj->UnRef();
        fObj = obj;
    }

    /* ... */
};
```

Then in usage:

```cpp
hsRef<Whatever> myref(new Whatever, hsAdoptRef);
// or
hsRef<Whatever> myref;
myref.Adopt(new Whatever);
```

In the long run, we'd want to eventually ensure that promotion from a "raw" hsRefCnt to an hsRef managed object is always one-way, which would mean that the adoption behavior becomes the default (and this workaround can be removed). We may actually be close already, but that would require more investigation and testing than I think you're willing to add in this PR ;)

@Hoikas
Member Author

Hoikas commented Dec 10, 2020

Agreed. I'll look into this, although I may change hsAdoptRef to hsStealRef - I think the naming is clearer in that case 😛

When `hsRef` is constructed from a pointer, it increments the reference
count. Originally, the `NetVaultNode` reference count started at zero,
so this was fine. However, when the various reference counting bases
were consolidated, `NetVaultNode`'s count became one. This means that
any instance of `hsRef<NetVaultNode> ref = new NetVaultNode;` would have
a reference count of two, causing the object to be leaked.
@Hoikas
Member Author

Hoikas commented Dec 12, 2020

It compiles! :shipit:

@Hoikas
Member Author

Hoikas commented Dec 14, 2020

hsRef<_Ref>::New() is really gone now 🤣

@Hoikas
Member Author

Hoikas commented Dec 14, 2020

💥

@Hoikas Hoikas merged commit 913276f into H-uru:master Dec 14, 2020
@Hoikas Hoikas deleted the leekspin branch December 14, 2020 21:21
@zrax zrax mentioned this pull request Dec 15, 2020