Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to temporarily store RefCounted object in a Ref/RefPtr inside its destructor #8748

Closed
wants to merge 1 commit into from

Conversation

rniwa
Copy link
Member

@rniwa rniwa commented Jan 18, 2023

fdc168d

Make it possible to temporarily store RefCounted object in a Ref/RefPtr inside its destructor
https://bugs.webkit.org/show_bug.cgi?id=250746

Reviewed by NOBODY (OOPS!).

This patch removes debug / security assertion that m_deletionHasBegun is not set in ref() / deref()
to allow temporarily storing a RefCounted object in a Ref/RefPtr inside its destructor.

Instead, assert that m_refCount is 1 at the end of destructor to make sure there is no outstanding
external references at the end of its destruction.

Apply the same fix to ThreadSafeRefCounted as well as Node.

* Source/WTF/wtf/RefCounted.h:
(WTF::RefCountedBase::ref const):
(WTF::RefCountedBase::hasOneRef const):
(WTF::RefCountedBase::~RefCountedBase):
(WTF::RefCountedBase::derefBase const):
* Source/WTF/wtf/ThreadSafeRefCounted.h:
(WTF::ThreadSafeRefCountedBase::~ThreadSafeRefCountedBase):
(WTF::ThreadSafeRefCountedBase::ref const):
(WTF::ThreadSafeRefCountedBase::hasOneRef const):
(WTF::ThreadSafeRefCountedBase::derefBase const):
* Source/WebCore/dom/Node.cpp:
(WebCore::Node::~Node):
* Source/WebCore/dom/Node.h:
(WebCore::Node::ref const):
(WebCore::Node::deref const):
* Tools/TestWebKitAPI/Tests/WTF/RefPtr.cpp:
(TestWebKitAPI::StoreRefInDestructor::create):
(TestWebKitAPI::StoreRefInDestructor::~StoreRefInDestructor):
(TestWebKitAPI::StoreRefInDestructor::helperFunction):
(TestWebKitAPI::WTF_RefPtr.RefInDestructor):

fdc168d

Misc iOS, tvOS & watchOS macOS Linux Windows
❌ πŸ§ͺ style βœ… πŸ›  ios βœ… πŸ›  mac βœ… πŸ›  wpe βœ… πŸ›  πŸ§ͺ win
βœ… πŸ§ͺ bindings βœ… πŸ›  ios-sim βœ… πŸ›  mac-AS-debug βœ… πŸ›  gtk βœ… πŸ›  wincairo
βœ… πŸ§ͺ webkitperl βœ… πŸ§ͺ ios-wk2 βœ… πŸ§ͺ api-mac βœ… πŸ§ͺ gtk-wk2
βœ… πŸ§ͺ api-ios βœ… πŸ§ͺ mac-wk1 βœ… πŸ§ͺ api-gtk
βœ… πŸ›  πŸ§ͺ jsc βœ… πŸ›  tv βœ… πŸ§ͺ mac-wk2 βœ… πŸ›  jsc-armv7
βœ… πŸ›  πŸ§ͺ jsc-arm64 βœ… πŸ›  tv-sim βœ… πŸ§ͺ mac-AS-debug-wk2 βœ… πŸ§ͺ jsc-armv7-tests
βœ… πŸ›  watch βœ… πŸ§ͺ mac-wk2-stress βœ… πŸ›  jsc-mips
βœ… πŸ›  watch-sim βœ… πŸ§ͺ jsc-mips-tests

@rniwa rniwa requested a review from cdumez as a code owner January 18, 2023 01:55
@rniwa rniwa self-assigned this Jan 18, 2023
…tr inside its destructor

https://bugs.webkit.org/show_bug.cgi?id=250746

Reviewed by NOBODY (OOPS!).

This patch removes debug / security assertion that m_deletionHasBegun is not set in ref() / deref()
to allow temporarily storing a RefCounted object in a Ref/RefPtr inside its destructor.

Instead, assert that m_refCount is 1 at the end of destructor to make sure there is no outstanding
external references at the end of its destruction.

Apply the same fix to ThreadSafeRefCounted as well as Node.

* Source/WTF/wtf/RefCounted.h:
(WTF::RefCountedBase::ref const):
(WTF::RefCountedBase::hasOneRef const):
(WTF::RefCountedBase::~RefCountedBase):
(WTF::RefCountedBase::derefBase const):
* Source/WTF/wtf/ThreadSafeRefCounted.h:
(WTF::ThreadSafeRefCountedBase::~ThreadSafeRefCountedBase):
(WTF::ThreadSafeRefCountedBase::ref const):
(WTF::ThreadSafeRefCountedBase::hasOneRef const):
(WTF::ThreadSafeRefCountedBase::derefBase const):
* Source/WebCore/dom/Node.cpp:
(WebCore::Node::~Node):
* Source/WebCore/dom/Node.h:
(WebCore::Node::ref const):
(WebCore::Node::deref const):
* Tools/TestWebKitAPI/Tests/WTF/RefPtr.cpp:
(TestWebKitAPI::StoreRefInDestructor::create):
(TestWebKitAPI::StoreRefInDestructor::~StoreRefInDestructor):
(TestWebKitAPI::StoreRefInDestructor::helperFunction):
(TestWebKitAPI::WTF_RefPtr.RefInDestructor):
@aproskuryakov
Copy link
Contributor

Normally, the idea is that objects can be half-destroyed in the destructor (because subclasses). So it is unsafe to Ref them back, or to use them in general.

Are RefCounted objects never subclassed? Or is there some other reason why this scenario is meaningful?

@geoffreygaren
Copy link
Contributor

Normally, the idea is that objects can be half-destroyed in the destructor (because subclasses). So it is unsafe to Ref them back, or to use them in general.

Are RefCounted objects never subclassed? Or is there some other reason why this scenario is meaningful?

As we deploy mechanical rules that require use of smart pointers, with local verifiability, we have started triggering cases where we ref and deref inside a destructor. Once our transition to smart pointers is complete, it will be impossible to call any non-self-member function that puts 'this' inside a local variable without triggering such a case. So, our options are:

  • Program without smart pointers
  • Program without calling non-self-member functions from destructors
  • Program without local variables
  • Remove this assertion

Of those options, removing this assertion seems to be the least bad option. To the extent that the assertion may sometimes catch use after partial or complete destruction, probably ASan, Guard Malloc, and fuzzing are better testing options for discovering that error, and smart pointers are a better implementation option for preventing that error.

Copy link
Contributor

@geoffreygaren geoffreygaren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me

@aproskuryakov
Copy link
Contributor

  • Program without calling non-self-member functions from destructors

Assuming that "non-self-member functions" that you are thinking about are virtual function calls, my implication that this is a must regardless of whether we hit an assertion or not. It's a bug to do this, so eliminating that is necessary, not a "bad option".

@aproskuryakov
Copy link
Contributor

Overall, this patch is a bad idea, as it eliminates a critically important safety check. Please find a better solution.

Copy link
Contributor

@aproskuryakov aproskuryakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that this assertion has caught serious bugs in the past, perhaps more than any other single runtime check in our code. I don't see how it makes sense to eliminate it.

@geoffreygaren
Copy link
Contributor

I verified that this assertion has caught serious bugs in the past, perhaps more than any other single runtime check in our code. I don't see how it makes sense to eliminate it.

Since you're standing in the way of progress on smart pointers, it's not sufficient to prove that this assertion has caught serious bugs in the past. You need to prove that this assertion has caught more serious bugs than smart pointers have avoided -- and that smart pointers would not also prevent those bugs.

@geoffreygaren
Copy link
Contributor

  • Program without calling non-self-member functions from destructors

Assuming that "non-self-member functions" that you are thinking about are virtual function calls, my implication that this is a must regardless of whether we hit an assertion or not. It's a bug to do this, so eliminating that is necessary, not a "bad option".

I'm not talking about virtual function calls. I'm talking about calling any function, including a stand-alone function or a member function on another object.

It's not a bug to call a function inside a destructor.

@geoffreygaren
Copy link
Contributor

Overall, this patch is a bad idea, as it eliminates a critically important safety check. Please find a better solution.

If you have a suggestion for how to require smart pointers everywhere without triggering a ref inside a destructor, please do write it down (so I can submit it to a journal and apply you for a Turing award). Hand-waving at "better" is not a meaningful form of patch review.

@aproskuryakov
Copy link
Contributor

I have a lot of words to say in response to "standing in the way" and "Turing award", which I will keep to myself for now.

Perhaps a way to make this discussion constructive is to present an example where you think this is necessary. You said "calling any function", but the proposal in webkit-dev doesn't involve wrapping function arguments in RefPtrs.

The latest proposal was "Use smart pointers in all local variables and heap allocated values." Can you demonstrate an example where this requires Ref'ing half-deleted objects?

@rniwa
Copy link
Member Author

rniwa commented Jan 19, 2023

Perhaps a way to make this discussion constructive is to present an example where you think this is necessary. You said "calling any function", but the proposal in webkit-dev doesn't involve wrapping function arguments in RefPtrs.

Wrapping each function argument in a Ref / RefPtr is the eventual goal.

The latest proposal was "Use smart pointers in all local variables and heap allocated values." Can you demonstrate an example where this requires Ref'ing half-deleted objects?

For example, FormListedElement::formOwnerRemovedFromTree currently cannot use RefPtr because it can be invoked during node destruction. Generally, it's problematic whenever we have a helper function that uses a local variable pointing to a reference counted object.

A whole bunch of code inside Document::~Document() becomes invalid as well as soon as member functions such as topDocument(), styleScope(), extensionStyleSheets(), etc... start returning a RefPtr / Ref. The same goes for Frame::~Frame. Functions like loader() and script() can't be updated to return a Ref / RefPtr because it's used in the destructor.

@aproskuryakov
Copy link
Contributor

Wrapping each function argument in a Ref / RefPtr is the eventual goal.

Why do we need to make this change now in support of an eventual goal?

For example, FormListedElement::formOwnerRemovedFromTree currently cannot use RefPtr because it can be invoked during node destruction.

Thank you, this is a good concrete example. I can't mentally trace the execution chain that gets us there, but the existing code smells bad:

    // Can't use RefPtr here beacuse this function might be called inside ~ShadowRoot via addChildNodesToDeletionQueue. See webkit.org/b/189493.
    Node* rootNode = &asHTMLElement();

asHTMLElement is a virtual function, and if I'm guessing correctly, we are calling it on a half-destructed object that isn't even an HTMLElement at this point any more?

Adding RefPtr to the mix arguably doesn't make this anti-pattern much safer. It's trivial to have a use-after-free in this scenario even when everything is RefPtr'd.

To be clear, I agree with the desire to stop using a raw pointer here. Do you envision next steps that would prevent these use-after-frees caused by half-destructed objects?

becomes invalid as well as soon as member functions [...] start returning a RefPtr / Ref

If I understand it correctly, changing that is also only an eventual goal.

@rniwa
Copy link
Member Author

rniwa commented Jan 19, 2023

Wrapping each function argument in a Ref / RefPtr is the eventual goal.

Why do we need to make this change now in support of an eventual goal?

Because we want to deploy such code changes now as a way of evaluating either approach. Without this change being landed, doing that is rather difficult. We'd basically split the code base into two categories: one that gets called in all but inside destructors. One that gets used and should only be used inside destructors. That kind of code duplication seems rather counterproductive and may lead to new kinds of bugs.

For example, FormListedElement::formOwnerRemovedFromTree currently cannot use RefPtr because it can be invoked during node destruction.

Thank you, this is a good concrete example. I can't mentally trace the execution chain that gets us there, but the existing code smells bad:

    // Can't use RefPtr here beacuse this function might be called inside ~ShadowRoot via addChildNodesToDeletionQueue. See webkit.org/b/189493.
    Node* rootNode = &asHTMLElement();

asHTMLElement is a virtual function, and if I'm guessing correctly, we are calling it on a half-destructed object that isn't even an HTMLElement at this point any more?

No. The element in question isn't anywhere near the state of destruction but HTMLFormElement is so we can't store the result of form() into a RefPtr.

Our previous thought was that this is a bad code as you say, and we can mitigate this by avoiding to do "interesting" work inside destructors. The problem, we find it, is that there are just too many cases where this comes up.

Adding RefPtr to the mix arguably doesn't make this anti-pattern much safer. It's trivial to have a use-after-free in this scenario even when everything is RefPtr'd.

It would definitely make this code safe, and that matters because FormListedElement::formOwnerRemovedFromTree is called outside of form element's destructors.

To be clear, I agree with the desire to stop using a raw pointer here. Do you envision next steps that would prevent these use-after-frees caused by half-destructed objects?

That's not really a use-after-free of memory region per se but rather use of a destructed object that's yet to be reclaimed by the heap allocator.

becomes invalid as well as soon as member functions [...] start returning a RefPtr / Ref

If I understand it correctly, changing that is also only an eventual goal.

No, we want to make this change now, not in the future.

@aproskuryakov
Copy link
Contributor

Because we want to deploy such code changes now as a way of evaluating either approach.

This is not making sense to me. That's what branches are for, not trunk.

That's not really a use-after-free of memory region per se but rather use of a destructed object that's yet to be reclaimed by the heap allocator.

The destructed object can/will have pointers to deallocated memory. This is where the use after free is coming from.

@rniwa
Copy link
Member Author

rniwa commented Jan 19, 2023

Because we want to deploy such code changes now as a way of evaluating either approach.

This is not making sense to me. That's what branches are for, not trunk.

What do you mean by that? Could you elaborate?

That's not really a use-after-free of memory region per se but rather use of a destructed object that's yet to be reclaimed by the heap allocator.

The destructed object can/will have pointers to deallocated memory. This is where the use after free is coming from.

Not with our smart pointers because we clear the pointer values inside destructors. Our experience with iso-heap proves that there is a clear advantage in not allowing re-use of deallocated memory. The benefit of systematically preventing the use of destructed but not yet deallocated memory is not yet clear.

@aproskuryakov
Copy link
Contributor

Because we want to deploy such code changes now as a way of evaluating either approach.

This is not making sense to me. That's what branches are for, not trunk.

What do you mean by that? Could you elaborate?

I'm not sure what kind of evaluation requires making changes on trunk, and not even behind a disabled feature flag. Typically, that's not how we evaluate.

Starting with a change that disables an important security check is particularly suspect, and is not business as usual.

The destructed object can/will have pointers to deallocated memory. This is where the use after free is coming from.

Not with our smart pointers because we clear the pointer values inside destructors. Our experience with iso-heap proves that there is a clear advantage in not allowing re-use of deallocated memory. The benefit of systematically preventing the use of destructed but not yet deallocated memory is not yet clear.

A lot of classes don't use our smart pointers for data members. They use std:unique_ptr, or HashTable, or other classes that don't zero out pointers in their destructors. This is just a plain UAF, nothing novel.

@rniwa
Copy link
Member Author

rniwa commented Jan 19, 2023

What do you mean by that? Could you elaborate?

I'm not sure what kind of evaluation requires making changes on trunk, and not even behind a disabled feature flag. Typically, that's not how we evaluate.

We're talking about the deployment of smart pointers here, not an addition of new web facing API or behavior changes (e.g. live range selection). I don't think we could evaluate the value of deploying smart pointers without deploying them. Our prior experience with introducing more smart pointers in local variables proved to be an useful mitigation strategy for security bugs because we later found dozens of use-after-frees that were fixed by those deployment patches.

Starting with a change that disables an important security check is particularly suspect, and is not business as usual.

I'm not certain I'd agree with your characterization of "an important security check". We're at best checking that we're not storing an object that has started its destruction in a Ref or RefPtr in debug / ASAN builds. It's not mitigating any real security bugs in production builds whereas deploying smart pointers would be.

Not with our smart pointers because we clear the pointer values inside destructors. Our experience with iso-heap proves that there is a clear advantage in not allowing re-use of deallocated memory. The benefit of systematically preventing the use of destructed but not yet deallocated memory is not yet clear.

A lot of classes don't use our smart pointers for data members. They use std:unique_ptr, or HashTable, or other classes that don't zero out pointers in their destructors. This is just a plain UAF, nothing novel.

That seems like an argument for re-introducing our own WTF::UniquePtr. In the case of HashTable, I'm failing to see why not landing this patch will mitigate any security bugs thereof since, again, the current assertion is only enabled in debug / ASAN builds. I'm adding new release assertions to catch cases where we were previously doing use-after-free so landing this patch seems decidedly in favor of preventing security bugs.

@aproskuryakov
Copy link
Contributor

So... What is the evaluation plan?

Here is what I'd expect:

  1. Do a small yet complete adoption in a subsystem, on a branch.
  2. Adjust the plan according to learnings from step 1, and discuss it with a wider group of people in more detail.
  3. Do performance testing as applicable to the subsystem (would likely have to be a subtest, not our normal test suites where it would all drown in the noise).
  4. Merge to trunk and start wider adoption.

Progressing beyond stage 1 would be a big achievement. If "smart pointers everywhere" is applied to a hard case (e.g. affecting the Document destructor), it will likely cause enough code flow changes to require refactoring that's bigger than what you previously considered too complicated.

That seems like an argument for re-introducing our own WTF::UniquePtr.

If making all destructed subclass members safe to use is a requirement for success here, that needs to be part of a plan, with a review of what it actually takes to get there.

It's worth a separate discussion that isn't buried in review comments, but zeroing out the pointers is not a workable solution, in my opinion. Relying on zero pointer dereference as a security defense doesn't work in general, because the compiler will optimize out undefined behavior. It's particularly unlikely to work here, as much of the code gets inlined, thus being visible to the optimizer.

@aproskuryakov
Copy link
Contributor

I'm not certain I'd agree with your characterization of "an important security check".

Indeed, the assertion is fairly weak, and mostly tells one that they got too happy with overburdening their destructors. Yet, it caught actual security bugs enough times for us to be serious about what we replace it with.

In general, those huge destructors like ~Document are a nightmare, as you are well aware. They break when one looks at them wrong, are super hard to reason about, and bring their share of memory corruption bugs.

@fujii
Copy link
Contributor

fujii commented Jan 20, 2023

I agree with ap.

The assertions have been useful.
https://bugs.webkit.org/buglist.cgi?list_id=8928367&query_format=advanced&short_desc=m_deletionHasBegun&short_desc_type=allwordssubstr

In FormListedElement::formOwnerRemovedFromTree, form() should return WeakPtr.
FormAssociatedElement::form() const should return the WeakPtr rather than converting to a raw pointer.

HTMLFormElement* form() const { return m_form.get(); }

@othermaciej
Copy link
Contributor

Hi all, it seems to me the controversial assertion doesn’t actually prevent inappropriate use of RefCounted objects within the destructor, or even keeping a (raw) pointer to them beyond their lifetime. It just prevents calling ref/deref, with the goal, I guess, of preventing confusing pseudo-resurrection scenarios where the object ends up destructed by the ref count is bumped back above 0 by the time destruction is done.

I think pseudo-resurrection can be defended against in a way that still allows temporary refs during the destructor: add an assertion to the end of deref() that after the destructor is called, the object doesn’t have more than the expected refs. It might even be possible to make this a RELEASE_ASSERT, since this shouldn’t be the fast path of deref(). Or maybe it should go in ~RefCountedBase? If that’s the most-base class, its destructor should be called last, right? (Maybe this doesn’t work with multiple inheritance).

I guess another thing we’d want to ensure is that, even with temporary refs, we don’t re-enter the destructor, so deref() has to be smart enough to not delete the pointer again when a ref/deref pair occurs on an object that’s already in the course of being destructed.

I think these changes would add the same actual safety guarantees as the current assert (maybe better, if it could be a RELEASE_ASSERT and not just ASSERT_WITH_SECURITY_IMPLICATIONS), without conflicting with the purpose of this patch.

@rniwa
Copy link
Member Author

rniwa commented Jan 20, 2023

Hi all, it seems to me the controversial assertion doesn’t actually prevent inappropriate use of RefCounted objects within the destructor, or even keeping a (raw) pointer to them beyond their lifetime. It just prevents calling ref/deref, with the goal, I guess, of preventing confusing pseudo-resurrection scenarios where the object ends up destructed by the ref count is bumped back above 0 by the time destruction is done.

Right, and it only does so in debug / ASAN builds.

I think pseudo-resurrection can be defended against in a way that still allows temporary refs during the destructor: add an assertion to the end of deref() that after the destructor is called, the object doesn’t have more than the expected refs.

It might even be possible to make this a RELEASE_ASSERT, since this shouldn’t be the fast path of deref(). Or maybe it should go in ~RefCountedBase? If that’s the most-base class, its destructor should be called last, right? (Maybe this doesn’t work with multiple inheritance).

This PR does exactly that in RefCountedBase.

I guess another thing we’d want to ensure is that, even with temporary refs, we don’t re-enter the destructor, so deref() has to be smart enough to not delete the pointer again when a ref/deref pair occurs on an object that’s already in the course of being destructed.

This is already the case. See https://commits.webkit.org/216324@main for example. The last deref() leaves ref count at 1 so if there are multiple ref/deref calls, we would never try to delete it again.

I think these changes would add the same actual safety guarantees as the current assert (maybe better, if it could be a RELEASE_ASSERT and not just ASSERT_WITH_SECURITY_IMPLICATIONS), without conflicting with the purpose of this patch.

Right. As a matter of fact, this PR does exactly that. In lieu of ASSERT_WITH_SECURITY_IMPLICATIONS, we have a release assert that ref count is 1 at the end of RefCountedBase's destructor.

@aproskuryakov
Copy link
Contributor

Right, the PR already does what's proposed. I don't think that it's useful as a replacement, as we need to prevent using data members from super-classes that are already destructed. Guarding against (pseudo-)resurrection or double deletes is separate.

The goal and result of this PR is allowing more unsafe code, which is why I object to it.

@othermaciej
Copy link
Contributor

Right, the PR already does what's proposed. I don't think that it's useful as a replacement, as we need to prevent using data members from super-classes that are already destructed. Guarding against (pseudo-)resurrection or double deletes is separate.

I don’t see how the old assert prevents using data members from super-classes that are already destructed. Downcast operators (whether built in or our custom ones) wouldn’t do a ref(). At best, we’d be hoping that most such functions do a ref(). But the fact that this patch is needed seems to be evidence that many don’t.

OTOH landing this patch and then mechanically deploying smart pointer correctness would allow us to find cases where any functions outside the class do anything at all with a half-destructed object and do a project to eliminate such cases comprehensively, because they’d have to ref(). In theory we could do that incrementally first and then comprehensively deploy smart pointers. But I suspect many of the functions doing access without taking a ref during destruction are also used outside destruction, and their lack of ref may lead to latent memory corruption bugs in other situations. I don’t think it’s good to delay fixing those memory corruption bugs.

I’ll add that it’s not clear to me how easy it would be to code around the need to call a nontrivial function in a destructor. It’s probably a bigger project in the average case than deploying a smart pointer to a site that is missing one. It’s probably not good to block a relatively easy project with security benefits behind a harder project with security benefits. Maybe what we need to do is propose and schedule a project for when we do the cleanup of suspiciously complex code in destructors.

One more thought: deploying smart pointers comprehensively removes an entire class of bug (UAF of RefCounted objects), while the assert being removed here only finds some subset of a class of problems (inappropriate use of half-destructed objects) by chance.

@aproskuryakov
Copy link
Contributor

We've already done this. The branch was called 'main'.

Tell me more! What subsystem is the plan fully implemented in, and how did it go?

There is nothing preventing Ref / RefPtr in more places as is. The only scenario in which this PR changes behavior is when an object which has begun deleting itself is stored in Ref/RefPtr. In those cases, lifetime of objects will not change

Very good point. Let me think a bit.

@othermaciej
Copy link
Contributor

The scenario that I expect to happen if/when we land this patch:

  1. More Ref's are added in the code base, because there is no obstacle any more.
  2. This changes lifetime of objects, so they are destroyed later, and send their wasDestroyed and removeFrom-like delegate calls later.
  3. This introduces more bugs than it fixes, because our code has been painstakingly adjusted to have objects destroyed in the one true order.
  4. Fixing these is hard, and either gets us into a similar state of unsafe equilibrium, or forces us to develop and deploy coding techniques for safely destroying objects.

I think that we need to develop and deploy such techniques earlier than that.

Even disregarding the argument elsewhere that this patch would only allow RefPtr in paths that don’t change lifetime…

This seems like a fully general argument against fixing correctness of ref counting in our code, which surely cannot be right. We can’t hold back from fixing security bugs simply because in theory it could cause indirect effects that might cause non-obvious regressions. We have testing to protect us from regressions.

Also, this argument is no longer about this assert directly providing any protection. It’s about keeping the assert specifically to impede the smart pointer checker project. That doesn’t seem like a good reason to r- the patch. If smart pointer checking is bad, actually, that is a case that needs to be made in the context of reviewing our overall plan for improving security, not this one patch review.

One more thought: deploying smart pointers comprehensively removes an entire class of bug (UAF of RefCounted objects)

I don't think that we can call it comprehensive, as RefCounted objects will still be subject to UAF through destroyed pointers in half-destroyed objects (or if nulled out, there will be a secondary vulnerability caused by elimination of UB). There also isn't anything in the published plan about containers of RefCounted objects.

Ok, greatly narrows it to this rare case. (But I’m not sure this is actually possible b/c once the derived class destructor has run, I don’t think anything will downcast the object from the base class to the derived class, C++ will treat is as if it’s now the base class. Which could still cause bugs, but they would be more subtle logic bugs.)

It is certainly true that UAF bugs happen because the pointers are not smart, but that in itself is not enough - it's not like we simply have unbalanced manual new/delete these days.

I wouldn’t bet on this being true. To the extent it is, it’s thanks to use of smart pointers (single ownership and weak pointer variants as well as ref counting variants).

A lot of what we are saying is unproven. So I still propose the evaluation plan that I posted above, implementing this for a subsystem on a branch. This makes more sense to me than assuming that nothing can go wrong, and destabilizing the tree with reference cycles, null pointer and memory corruption crashes for evaluation purposes.

Security bugs existing due to missing ref counting is proven (unless you believe we have fixed the last ever). Attackers looking to exploit WebCore grep for raw pointers.

Smart pointers fixing individual security bugs has also been proven. And we also know code that correctly uses smart pointers has not been the site of this type of UAF type confusion bug, at least in bugs found by or reported to us.

I am not sure what remains to prove. That adding smart pointers en masse won’t change this? But we have already run that experiment by adding smart pointers en masse based on manual review, or on earlier versions of the tool.

Not sure what could provide more evidence short of doing the whole project.

@geoffreygaren
Copy link
Contributor

We've already done this. The branch was called 'main'.

Tell me more! What subsystem is the plan fully implemented in, and how did it go?

Define 'subsystem', 'fully implemented', and the success criteria for 'how did it go'.

We have 25607 uses of RefPtr, 6076 uses of RetainPtr, and 3861 uses of WeakPtr in WebKit. And, again, we adopted ARC, which has the same semantics as this patch (weaker, actually), in all of Safari. To the extent that we used to have use after free bugs all the time, and we now have them much more rarely, I would say it went really well.

To repeat my question above, what additional adoption would be bigger than the adoption we've done so far, and yet still 'small' and yet still 'complete', and how would you evaluate whether it was ready to merge back to main?

@aproskuryakov
Copy link
Contributor

Define 'subsystem', 'fully implemented', and the success criteria for 'how did it go'.

I don't think that's on me to define. What is your plan? Ryosuke said that this needed to be landed in order to evaluate, and sounds like there is no plan to evaluate anything, given this quoted question, and your other comments.

To repeat my question above, what additional adoption would be bigger than the adoption we've done so far, and yet still 'small' and yet still 'complete', and how would you evaluate whether it was ready to merge back to main?

Adding RefPtrs where engineers found them to make sense demonstrates nothing about what will happen when they are deployed consistently as a consequence of formal policy. This policy is almost certainly going to be incomplete/wrong in its first iteration at least. Did you try to actually make any code at all to work the way it's envisioned for all WebKit code to converge towards?

ARC is at best a philosophical example. The language is different enough (notably, well defined behavior of sending messages to nil), and toolchain support is on a different level.

Still thinking about Ryosuke's and Maciej's comments.

@rniwa
Copy link
Member Author

rniwa commented Jan 21, 2023

We've already done this. The branch was called 'main'.

Tell me more! What subsystem is the plan fully implemented in, and how did it go?

webkit.org/b/250637 / rdar://104274947 is an example of a security bug that got fixed by previous mass deployment of smart pointers. We've found numerous other cases where use-after-free bugs were fixed by other patches Jiewen and I landed a few years ago.

This seems like a fully general argument against fixing correctness of ref counting in our code, which surely cannot be right. We can’t hold back from fixing security bugs simply because in theory it could cause indirect effects that might cause non-obvious regressions. We have testing to protect us from regressions.

To be fair, we've found such an example in the past despite of these assertions. When Jiewen & I mass deployed smart pointers, new security bugs were introduced by the way of double deleting inside some destructors. This occurred because we used to let m_refCount become 0 in deref() before executing delete this, in which case, storing it again in a Ref or RefPtr inside its destructors would result in m_refCount to become 0 again, and results in double (or more) delete this. The said security bug were introduced despite of having the current debug / ASAN assertions because we didn't have any layout test that triggers this condition, and the same assertion wasn't enabled in production builds. We've since addressed this problem by leaving m_refCount at 1 in deref call which triggers delete this so that storing it again in Ref or RefPtr inside objects destructor would not result in a security bug.

This patch improves upon this learning experience by introducing a new release assert for m_refCount == 1 at the end of destruction in ~RefCountedBase. Whilst this doesn't fully protect us from using objects that had ran some destructors, as you elaborated, it still prevents us from leaving an external Ref or RefPtr to the object once the memory is deallocated, and does so in production builds.

@rniwa
Copy link
Member Author

rniwa commented Jan 21, 2023

To repeat my question above, what additional adoption would be bigger than the adoption we've done so far, and yet still 'small' and yet still 'complete', and how would you evaluate whether it was ready to merge back to main?

Adding RefPtrs where engineers found them to make sense demonstrates nothing about what will happen when they are deployed consistently as a consequence of formal policy. This policy is almost certainly going to be incomplete/wrong in its first iteration at least. Did you try to actually make any code at all to work the way it's envisioned for all WebKit code to converge towards?

Sounds like this is an argument against deploying types of smart pointers that can affect object lifetime like Ref and RefPtr in general. To your credit, we know such a deployment could introduce new security bugs as I've outlined in #8748 (comment) until we made the aforementioned change to keep m_refCount at 1 in the last call to deref(). Today, I don't believe we have a similar issue where storing an object which has begun its destruction in Ref / RefPtr can result in a new security bug.

And again, even if that there was such a concern / risk, I'd argue that such a concern is orthogonal to the PR at hand here. This PR removes assertions which will be hit if someone attempts to store an object which has begun its destruction in delete this inside deref(). Since this PR also introduces a new release assert at the end of ~RefCountedBase to assert that m_refCount remains at 1, it's impossible to keep such a reference beyond the completion of delete this, meaning that the lifetime of object in question is unaffected. While it's possible that someone writes a new code which stores a half-destructed RefCounted object in Ref / RefPtr during its destruction after this patch has landed, as Maciej has pointed out, such a code can already exist today if it used raw pointer / reference instead.

@aproskuryakov
Copy link
Contributor

Thank you Ryosuke, I really appreciate your constructive and specific answers. I agree that my concern about introducing bugs by changing object lifetime is orthogonal to this PR. But reading back through this discussion, I find myself at a loss - the debate is way too fluid, and it's not clear to me if positions are getting clarified as part of the debate, or if you, Geoff and Maciej are talking about different things. It feels more like the latter. May I confirm what the points are with you?

  1. Why is this PR needed, and why is it needed on main? You said that it's needed to evaluate approaches, and also that we did some mass deployment already. Geoff is saying that we don't need to evaluate anything. Maciej seems to be concerned that this debate blocks us from fixing specific security bugs, and also that the debate blocks us from doing mass deployment (but we did mass deployment before, and specific security bugs keep being fixed).

  2. I understand that we all want some kind of mass deployment of smart pointers, but the plan is not clear. I cannot comment on whether it is appropriate to Ref half-destroyed objects at this point in the project without understanding the bigger picture.

To illustrate why I'm asking this:

  • I reviewed the code in ReplaceSelectionCommand that you pointed out, and it doesn't do what was recently proposed on webkit-dev. There are local variables that are raw pointers, and return types of non-trivial functions are not smart pointers.

  • The webkit-dev proposal was to change how we write new code. But we also seem to be talking about automated mass conversion of code, and about larger scale manual adoption? And seems like there are multiple steps, with the later ones depending on new tooling?

  • As I mentioned earlier, I'm not only concerned about new security bugs being introduced by changing the lifetime (and while the improvement you are talking about is good, it doesn't address how exactly we get to not having dangling pointers, or any UB concerns). I'm concerned that it will be simply unworkable, e.g. no way to eliminate reference cycles when writing code by the book. It was already agreed upon on webkit-dev that part of the envisioned approach was unworkable without new tooling, and we haven't yet seen if this tooling will be sufficient. I'm also concerned that we won't eliminate enough security bugs even with a complete implementation of the plan. Maciej called it a rare case, but these security bugs are all rare cases.

  • Did we have discussions about what smart pointers to use, or are we willing to just evolve naturally? As one example, how did we decide to not use MiraclePtr? I read through some Google discussions around it, and the level of analysis that goes into those is high. They certainly don't shrug off UB.

@rniwa
Copy link
Member Author

rniwa commented Jan 22, 2023

  1. Why is this PR needed, and why is it needed on main? You said that it's needed to evaluate approaches, and also that we did some mass deployment already. Geoff is saying that we don't need to evaluate anything. Maciej seems to be concerned that this debate blocks us from fixing specific security bugs, and also that the debate blocks us from doing mass deployment (but we did mass deployment before, and specific security bugs keep being fixed).

The reason we need to land this PR is so that we can make more functions return smart pointer types instead of raw pointers & references. The evaluation process we've used in the past and we would like to use here is:

  1. Deploy more smart pointers on main
  2. Observe what kind of unexpected regressions we may find, and which security bugs are fixed

It's hard to do (2) on a branch other than main because main is what gets tested in our CI and OS builds people will live on are based off of. In theory, we can setup a new branch, add CI for that, and then ask people to live on those branch builds but I don't think that level of cautiousness is warranted for this PR to land or to adopt smart pointers in more places. Our experience is that deploying smart pointers causing a new regression is more of a rare exception than a norm. Given we don't make a branch for every new WebKit feature in existence, I'm failing to see why this PR or deploying smart pointers warrants such a high level of caution. What do you see are the distinctive risks that we're undertaking with this PR / deploying smart pointers that's not equally likely to happen with all other code changes that happen on main?

  1. I understand that we all want some kind of mass deployment of smart pointers, but the plan is not clear. I cannot comment on whether it is appropriate to Ref half-destroyed objects at this point in the project without understanding the bigger picture.

The plan is for me to experiment deploying smart pointers in more places using new rules and observe what kind of new smart pointer features are needed, or what kind of regressions we may unexpectedly introduce.

  • I reviewed the code in ReplaceSelectionCommand that you pointed out, and it doesn't do what was recently proposed on webkit-dev. There are local variables that are raw pointers, and return types of non-trivial functions are not smart pointers.

Right. Jiewen's patch was in no way comprehensive deployment of smart pointers. Yet, it still informed us of benefit of deploying smart pointers in local variables and because it fixed a real security bug without us even noticing it. Jiewen and I have also landed other patches which made some functions return smart pointer types as well, and that's the basis of why I'm suggesting to do the same in new code.

  • The webkit-dev proposal was to change how we write new code. But we also seem to be talking about automated mass conversion of code, and about larger scale manual adoption? And seems like there are multiple steps, with the later ones depending on new tooling?

Right, we want to deploy smart pointers everywhere eventually using the comprehensive rule we came up with a few years ago: https://lists.webkit.org/pipermail/webkit-dev/2020-September/031386.html

I don't think the tooling is up to that level of adoption yet so I'm proposing to adopt a smaller scope of using smart pointers in local variables & member variables in new code - new code because deploying smart pointers in new code (hopefully) shouldn't introduce (perf or not) regressions in existing code. This PR will help this limited scope adoption by allowing local variables in functions that get called inside destructors to also use smart pointers.

But most importantly, this PR will allow us to experiment with making more functions return smart pointers instead of raw pointers or references. As mentioned above, Jiewen has landed such a patch in the past, and we have some experience with it but one of the major roadblocks in deploying such code changers was the use of accessor functions getting called in destructors.

Of course, we can deploy more smart pointers in local variables and data members with the existing RefCounted's behavior outside of those functions that get called in destructors without this PR.

  • As I mentioned earlier, I'm not only concerned about new security bugs being introduced by changing the lifetime (and while the improvement you are talking about is good, it doesn't address how exactly we get to not having dangling pointers, or any UB concerns). I'm concerned that it will be simply unworkable, e.g. no way to eliminate reference cycles when writing code by the book. It was already agreed upon on webkit-dev that part of the envisioned approach was unworkable without new tooling, and we haven't yet seen if this tooling will be sufficient. I'm also concerned that we won't eliminate enough security bugs even with a complete implementation of the plan. Maciej called it a rare case, but these security bugs are all rare cases.

The way to avoid reference cycle is to use CheckedRef, CheckedPtr, or WeakPtr/ThreadSafeWeakPtr instead of using raw pointers and references. I agree with you that there is a risk that we introduce a new reference cycle or other kinds of bugs if we deploy smart pointers in more places but I don't think that risk is any higher than someone implementing a new WebKit feature or fixing an existing bug introducing the same.

  • Did we have discussions about what smart pointers to use, or are we willing to just evolve naturally? As one example, how did we decide to not use MiraclePtr? I read through some Google discussions around it, and the level of analysis that goes into those is high. They certainly don't shrug off UB.

I don't see why we can't adopt something like MiraclePtr if there is a valid use case for it in WebKit. Blink does employ a number of novel approaches to memory management including but not limited to LLVM plugin that automatically generates write barriers for generational garbage collection in oil pan (their garbage collector for C++ objects). Geoff and I have discussed about pros and cons of adopting such an approach in the past for example but we've concluded that what we're planning to do (as outlined in the webkit-dev thread) is pretty much isomorphic to what they did; it's just that we're doing it manually in the code base instead of doing it automatically by the way of a compiler plugin.

@aproskuryakov
Copy link
Contributor

Thank you, this is very helpful. Looks like there are two directions being explored in parallel, both sensible:

  1. Deploy a style checker that enforces some rules. This is where I'd expect a fairly large effort on branch to confirm that the proposed rules are workable. I guess what's being proposed is to skip this step, and just wait to see if WebKit developers revolt?

  2. Work that you describe above, make more functions return smart pointer types instead of raw pointers & references, in a largely ad hoc manner. I agree that this is main branch kind of work. Do you have any specific patches that are blocked on this PR? It would be useful to see if it's indeed unreasonably hard to avoid the need to ref half-destroyed objects.

To me, a function that returns Element* in a situation where the object is half-destroyed and is actually only a Node* is a big red flag, and going out of our way so that it could return RefPtr<Element> seems sad. I appreciate that the issue has been considered, and agree that the "split the code base into two categories" approach you described earlier hardly makes sense. A global project like Maciej described to stop doing any "interesting" work in any destructors before doing anything else also wouldn't be entirely pragmatic. I do however hope that with a wider discussion, we can come up with good ad hoc solutions which could then be generalized.

This would be impractical if the next step was to do automatic code replacement across the project, but with the actual next steps that are planned, this doesn't seem like a crazy thing to take a look at.

what we're planning to do (as outlined in the webkit-dev thread)

This is not very related to this PR, however reading the thread made me wish for two things:

  • a wiki document that describes what we are trying to do, not just a thread which patches the proposal with clarifications;
  • a discussion of why we can postpone figuring out what to do with containers (like Vector<Node*> or HashMap<RenderBox*, RenderFragmentContainer*>).

@rniwa
Copy link
Member Author

rniwa commented Jan 23, 2023

Thank you, this is very helpful. Looks like there are two directions being explored in parallel, both sensible:

  1. Deploy a style checker that enforces some rules. This is where I'd expect a fairly large effort on branch to confirm that the proposed rules are workable. I guess what's being proposed is to skip this step, and just wait to see if WebKit developers revolt?

Perhaps we should continue this discussion in webkit-dev or #8907. But we already have a lot of experience deploying smart pointers for local variables & member variables so I'm pretty confident that we can deploy this style rule although we may need to make exceptions for things like JSCell*.

  1. Work that you describe above, make more functions return smart pointer types instead of raw pointers & references, in a largely ad hoc manner. I agree that this is main branch kind of work. Do you have any specific patches that are blocked on this PR? It would be useful to see if it's indeed unreasonably hard to avoid the need to ref half-destroyed objects.

To me, a function that returns Element* in a situation where the object is half-destroyed and is actually only a Node* is a big red flag, and going out of our way so that it could return RefPtr<Element> seems sad. I appreciate that the issue has been considered, and agree that the "split the code base into two categories" approach you described earlier hardly makes sense. A global project like Maciej described to stop doing any "interesting" work in any destructors before doing anything else also wouldn't be entirely pragmatic. I do however hope that with a wider discussion, we can come up with good ad hoc solutions which could then be generalized.

Sure, let me go find such a code change.

what we're planning to do (as outlined in the webkit-dev thread)

This is not very related to this PR, however reading the thread made me wish for two things:

  • a wiki document that describes what we are trying to do, not just a thread which patches the proposal with clarifications;
  • a discussion of why we can postpone figuring out what to do with containers (like Vector<Node*> or HashMap<RenderBox*, RenderFragmentContainer*>).

Good points. Would you mind replying to the email thread? Perhaps I should write some document about how smart pointers should be used ideally, and outline our plan to get there?

@rniwa
Copy link
Member Author

rniwa commented Jan 24, 2023

Here's a patch to make FormAssociatedElement::form return a RefPtr<HTMLFormElement>: form-ref-ptr.patch.

With this patch applied, we hit the following assertion when running layout tests:

ASSERTION FAILED: !m_deletionHasBegun
/Volumes/Data/safari-2/OpenSource/Source/WebCore/dom/Node.h(803) : void WebCore::Node::ref() const
1   0x15dbb8959 WTFCrash
2   0x17eb60840 PAL::AXCustomContentFunction()
3   0x17f4d9b06 WebCore::Node::ref() const
4   0x181aeb37e WTF::DefaultRefDerefTraits<WebCore::HTMLFormElement>::refIfNotNull(WebCore::HTMLFormElement*)
5   0x181aeb349 WTF::RefPtr<WebCore::HTMLFormElement, WTF::RawPtrTraits<WebCore::HTMLFormElement>, WTF::DefaultRefDerefTraits<WebCore::HTMLFormElement> >::RefPtr(WebCore::HTMLFormElement*)
6   0x181aeb2dd WTF::RefPtr<WebCore::HTMLFormElement, WTF::RawPtrTraits<WebCore::HTMLFormElement>, WTF::DefaultRefDerefTraits<WebCore::HTMLFormElement> >::RefPtr(WebCore::HTMLFormElement*)
7   0x181aeaef4 WebCore::FormAssociatedElement::form() const
8   0x188d432c9 WebCore::FormListedElement::formWillBeDestroyed()
9   0x188e07802 WebCore::HTMLFormElement::~HTMLFormElement()
10  0x188e07f95 WebCore::HTMLFormElement::~HTMLFormElement()
11  0x188e07fb9 WebCore::HTMLFormElement::~HTMLFormElement()
12  0x1886c2798 WebCore::Node::removedLastRef()
13  0x17f4c8103 WebCore::Node::deref() const
14  0x17f4c7d2e WTF::DefaultRefDerefTraits<WebCore::Node>::derefIfNotNull(WebCore::Node*)
15  0x17f4c7c84 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::~RefPtr()
16  0x17f4c7b65 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::~RefPtr()
17  0x1865ef4f6 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::operator=(WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> > const&)
18  0x1881e439f WebCore::removeDetachedChildrenInContainer(WebCore::ContainerNode&)
19  0x1881e3df4 WebCore::ContainerNode::removeDetachedChildren()
20  0x1881e4e45 WebCore::ContainerNode::~ContainerNode()
21  0x18849aa36 WebCore::Element::~Element()
22  0x188823481 WebCore::StyledElement::~StyledElement()
23  0x17f9e7675 WebCore::HTMLElement::~HTMLElement()
24  0x188da8905 WebCore::HTMLBodyElement::~HTMLBodyElement()
25  0x188da8925 WebCore::HTMLBodyElement::~HTMLBodyElement()
26  0x188da8949 WebCore::HTMLBodyElement::~HTMLBodyElement()
27  0x1886c2798 WebCore::Node::removedLastRef()
28  0x17f4c8103 WebCore::Node::deref() const
29  0x17f4c7d2e WTF::DefaultRefDerefTraits<WebCore::Node>::derefIfNotNull(WebCore::Node*)
30  0x17f4c7c84 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::~RefPtr(
ASSERTION FAILED: !m_deletionHasBegun
/Volumes/Data/safari-2/OpenSource/Source/WebCore/dom/Node.h(803) : void WebCore::Node::ref() const
1   0x570a17959 WTFCrash
2   0x5919bf9c0 PAL::AXCustomContentFunction()
3   0x592338c86 WebCore::Node::ref() const
4   0x59494a4fe WTF::DefaultRefDerefTraits<WebCore::HTMLFormElement>::refIfNotNull(WebCore::HTMLFormElement*)
5   0x59494a4c9 WTF::RefPtr<WebCore::HTMLFormElement, WTF::RawPtrTraits<WebCore::HTMLFormElement>, WTF::DefaultRefDerefTraits<WebCore::HTMLFormElement> >::RefPtr(WebCore::HTMLFormElement*)
6   0x59494a45d WTF::RefPtr<WebCore::HTMLFormElement, WTF::RawPtrTraits<WebCore::HTMLFormElement>, WTF::DefaultRefDerefTraits<WebCore::HTMLFormElement> >::RefPtr(WebCore::HTMLFormElement*)
7   0x59494a074 WebCore::FormAssociatedElement::form() const
8   0x59bfdf81b WebCore::ValidatedFormListedElement::willChangeForm()
9   0x59bccf049 WebCore::HTMLInputElement::willChangeForm()
10  0x59bba2448 WebCore::FormListedElement::formWillBeDestroyed()
11  0x59bc66802 WebCore::HTMLFormElement::~HTMLFormElement()
12  0x59bc66f95 WebCore::HTMLFormElement::~HTMLFormElement()
13  0x59bc66fb9 WebCore::HTMLFormElement::~HTMLFormElement()
14  0x59b521918 WebCore::Node::removedLastRef()
15  0x592327283 WebCore::Node::deref() const
16  0x592326eae WTF::DefaultRefDerefTraits<WebCore::Node>::derefIfNotNull(WebCore::Node*)
17  0x592326e04 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::~RefPtr()
18  0x592326ce5 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::~RefPtr()
19  0x59944e676 WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> >::operator=(WTF::RefPtr<WebCore::Node, WTF::RawPtrTraits<WebCore::Node>, WTF::DefaultRefDerefTraits<WebCore::Node> > const&)
20  0x59b04351f WebCore::removeDetachedChildrenInContainer(WebCore::ContainerNode&)
21  0x59b042f74 WebCore::ContainerNode::removeDetachedChildren()
22  0x59b043fc5 WebCore::ContainerNode::~ContainerNode()
23  0x59b2f9bb6 WebCore::Element::~Element()
24  0x59b682601 WebCore::StyledElement::~StyledElement()
25  0x5928467f5 WebCore::HTMLElement::~HTMLElement()
26  0x59bc07905 WebCore::HTMLBodyElement::~HTMLBodyElement()
27  0x59bc07925 WebCore::HTMLBodyElement::~HTMLBodyElement()
28  0x59bc07949 WebCore::HTMLBodyElement::~HTMLBodyElement()
29  0x59b521918 WebCore::Node::removedLastRef()
30  0x592327283 WebCore::Node::deref() const

@fujii
Copy link
Contributor

fujii commented Jan 26, 2023

When ~Document is called, all Ref<Document> are already destructed and all RefPtr<Document> are null-ed.
It' OK to copy RefPtr<Document> during the ~Document time because it is null, no ref() is called.
However, WeakPtr<Document> returns a pointer to half-destructed Document object.
It needs some modification like as I said in #8748 (comment) .
Another possible solution is adding a new method willDestory to CanMakeWeakPtr class, and call the method at the beginning of ~Document.

@geoffreygaren
Copy link
Contributor

When ~Document is called, all Ref<Document> are already destructed and all RefPtr<Document> are null-ed.
It' OK to copy RefPtr<Document> during the ~Document time because it is null, no ref() is called.
However, WeakPtr<Document> returns a pointer to half-destructed Document object.
It needs some modification like as I said in #8748 (comment) .
Another possible solution is adding a new method willDestory to CanMakeWeakPtr class, and call the method at the beginning of ~Document.

This assertion is about RefPtr, not WeakPtr.

@fujii
Copy link
Contributor

fujii commented Jan 30, 2023

This assertion is about RefPtr, not WeakPtr.

Look at the patch and the backtrace in #8748 (comment) .
The assertion is failing when creating a RefPtr of HTMLFormElement from a pointer to half-destructed HTMLFormElement object that was returned by a WeakPtr during the HTMLFormElement destructor.

@rniwa
Copy link
Member Author

rniwa commented Jan 30, 2023

This assertion is about RefPtr, not WeakPtr.

Look at the patch and the backtrace in #8748 (comment) . The assertion is failing when creating a RefPtr of HTMLFormElement from a pointer to half-destructed HTMLFormElement object that was returned by a WeakPtr during the HTMLFormElement destructor.

I don't see follow how that relates to Document's destructor at all.

@fujii
Copy link
Contributor

fujii commented Jan 30, 2023

I don't see follow how that relates to Document's destructor at all.

The assertion failure in #8748 (comment) is about RefPtr<HTMLFormElement>.
It'd be quite informative if you attach the backtrace of RefPtr<Document> assertion failure like what you did for HTMLFormElement.

@aproskuryakov
Copy link
Contributor

Here's a patch to make FormAssociatedElement::form return a RefPtr: form-ref-ptr.patch.

I just realized that this patch isn't consistent with the plan posted to webkit-dev a couple weeks ago:

We made an exception to β€œtrivial functions” to allow chaining like the one above possible. A trivial function is any inlined accessor of a member variable the form: Document& document() { return m_document.get(); } or Frame* frame() { return m_frame.get(); }

But then again, this exception for trivial functions is not in https://github.com/WebKit/WebKit/wiki/Smart-Pointer-Usage-Rules. Could you please clarify if it's still PoR? Will we have trivial functions return RefPtr too?

I've started looking at the crash in a debugger, and another observation is that FormAssociatedElement::m_form is a WeakPtr. Shouldn't m_form.get() return 0 in this case, in the first place? This is how C++ weak_ptr behaves.

@rniwa
Copy link
Member Author

rniwa commented Jan 31, 2023

Here's a patch to make FormAssociatedElement::form return a RefPtr: form-ref-ptr.patch.

I just realized that this patch isn't consistent with the plan posted to webkit-dev a couple weeks ago:

Again, the plan here is to deploy smart pointers in more places than what I was suggesting in that thread. This PR is not needed to adopt the proposal to deploy smart pointers for data members & local variables. The point of this PR is to allow more deployment of smart pointers beyond those.

We made an exception to β€œtrivial functions” to allow chaining like the one above possible. A trivial function is any inlined accessor of a member variable the form: Document& document() { return m_document.get(); } or Frame* frame() { return m_frame.get(); }

But then again, this exception for trivial functions is not in https://github.com/WebKit/WebKit/wiki/Smart-Pointer-Usage-Rules. Could you please clarify if it's still PoR? Will we have trivial functions return RefPtr too?

That's covered here:

  1. Every object passed to a non-trivial function as an argument (including "this" pointer) should be stored as a Ref, RefPtr, CheckedRef, or CheckedPtr in the caller’s local scope unless it's an argument to the caller itself by the transitive property [1]. alpha.webkit.UncountedCallArgsChecker

@rniwa
Copy link
Member Author

rniwa commented Jan 31, 2023

We made an exception to β€œtrivial functions” to allow chaining like the one above possible. A trivial function is any inlined accessor of a member variable the form: Document& document() { return m_document.get(); } or Frame* frame() { return m_frame.get(); }

But then again, this exception for trivial functions is not in https://github.com/WebKit/WebKit/wiki/Smart-Pointer-Usage-Rules. Could you please clarify if it's still PoR? Will we have trivial functions return RefPtr too?

I've started looking at the crash in a debugger, and another observation is that FormAssociatedElement::m_form is a WeakPtr. Shouldn't m_form.get() return 0 in this case, in the first place? This is how C++ weak_ptr behaves.

C++ weak_ptr works that way because it only works with objects for which shared_ptr exists but that's not the case with our WeakPtr so I'm not certain that we can implement the same semantics. It probably involves making each class that inherits from CanMakeWeakPtr and RefCounted manually call revokeAll as the very first thing in its destructor.

@aproskuryakov
Copy link
Contributor

Sorry, I may not be following. https://github.com/WebKit/WebKit/wiki/Smart-Pointer-Usage-Rules also doesn't say that function returns will be smart pointers, as far as I can tell. So why do we want form-ref-ptr.patch, specifically this part:

-    HTMLFormElement* form() const { return m_form.get(); }
+    RefPtr<HTMLFormElement> form() const { return m_form.get(); }

@rniwa
Copy link
Member Author

rniwa commented Jan 31, 2023

Sorry, I may not be following. https://github.com/WebKit/WebKit/wiki/Smart-Pointer-Usage-Rules also doesn't say that function returns will be smart pointers, as far as I can tell. So why do we want form-ref-ptr.patch, specifically this part:

-    HTMLFormElement* form() const { return m_form.get(); }
+    RefPtr<HTMLFormElement> form() const { return m_form.get(); }

We may want that as an alternative to (3) storing every function argument in a smart pointer.

@geoffreygaren
Copy link
Contributor

This assertion is about RefPtr, not WeakPtr.

Look at the patch and the backtrace in #8748 (comment) . The assertion is failing when creating a RefPtr of HTMLFormElement from a pointer to half-destructed HTMLFormElement object that was returned by a WeakPtr during the HTMLFormElement destructor.

I see your point that it would be nice to tighten the behavior of WeakPtr, such that WeakPtr becomes null right at the beginning of destruction rather than some time later. Do you think that change should block further deployment of RefPtr? If so, why?

@fujii
Copy link
Contributor

fujii commented Feb 1, 2023

Do you think that change should block further deployment of RefPtr? If so, why?

There is no problem using more Ref and RefPtr unless you convert WeakPtr to RefPtr in the destruction phase.
The assertion is useful to catch such mistakes.

In FormListedElement::formOwnerRemovedFromTree, we don't have to convert a WeakPtr of form element to RefPtr because it just compares pointers. We don't have to keep a RefPtr there.

auto* currentForm = form();
for (auto* ancestor = asHTMLElement().parentNode(); ancestor; ancestor = ancestor->parentNode()) {
if (ancestor == currentForm) {

@geoffreygaren
Copy link
Contributor

Do you think that change should block further deployment of RefPtr? If so, why?

There is no problem using more Ref and RefPtr unless you convert WeakPtr to RefPtr in the destruction phase. The assertion is useful to catch such mistakes.

Sure, I think you are restating the point that tightening the behavior of WeakPtr could make this assertion go away in this instance. Do you think that change should block further deployment of RefPtr? If so, why?

Are you concerned about an attempt to retain an object past the end of destruction? Please note that this patch still asserts if a RefPtr retains an object past the end of destruction.

Even if we tighten the behavior of WeakPtr, this assertion will also fire any time we adopt RefPtr where we were previously using raw pointer. Is your proposal that converting all raw pointers to WeakPtr should also block further deployment of RefPtr? If so, why?

In FormListedElement::formOwnerRemovedFromTree, we don't have to convert a WeakPtr of form element to RefPtr because it just compares pointers. We don't have to keep a RefPtr there.

auto* currentForm = form();
for (auto* ancestor = asHTMLElement().parentNode(); ancestor; ancestor = ancestor->parentNode()) {
if (ancestor == currentForm) {

Unfortunately, I believe what you are proposing here is a classic memory safety error -- exactly the kind of error we are trying to resolve by further deploying RefPtr. If you do not retain the pointer you are comparing, it may be recycled by the memory allocator, and then you can get a false positive ==. This is sometimes a serious security bug.

You can try to stick to programming with raw pointers by reading every line of every function transitively called in this loop, and reasoning out that no line will free any objects. However, there are a few problems with that approach:

  • History has proven that human beings are not successful at this kind of reasoning -- we have the scars of thousands of use after free bugs, each one written by a very smart and dedicated engineer who attempted this reasoning
  • Even if you get this reasoning right, there's no way to keep it right in the future as other authors edit one or more of the transitive callees of this function, not always knowing that they are transitive callees of this function
  • Even if you get this right and all future authors get this right, there's no way to write a mechanical verifier that checks that it is right throughout the code base

Does that change your mind?

@fujii
Copy link
Contributor

fujii commented Feb 2, 2023

You are misunderstanding. Do you think formOwnerRemovedFromTree has the classic memory safety error because it's using a raw pointer? It has no problem even with a raw pointer or a weak pointer. The pointer is still valid in the function.

We should discuss the problem with concrete examples, like formOwnerRemovedFromTree.

@geoffreygaren
Copy link
Contributor

You are misunderstanding. Do you think formOwnerRemovedFromTree has the classic memory safety error because it's using a raw pointer?

Yes.

It has no problem even with a raw pointer or a weak pointer. The pointer is still valid in the function.

How did you reach this conclusion?

Was it by reading every line of every function transitively called in this loop, and reasoning out that no line will free any objects (now or in the future)? If so, please see my explanation above for why that is not a sustainable way to program a web browser (or probably any other software).

We should discuss the problem with concrete examples, like formOwnerRemovedFromTree.

Sure, it's sometimes helpful to look at concrete examples. However, if you only look at individual examples, you will "miss the forest for the trees" (not understand or appreciate a larger situation, problem, etc., because one is considering only a few parts of it).

The motivation for deploying more idiomatic use of RefPtr, and writing coding style guidelines and verification tools that can locally verify appropriate use of RefPtr, is that even though engineers can and do reason about raw pointers successfully in many individual cases, on the whole the practice of doing so leads to hundreds of use after free bugs per year in the WebKit project. The task of more widely deploying RefPtr is motivated by that global pattern.

Do you have a proposal for avoiding the next hundred use after free bugs in WebKit without deploying more idiomatic use of RefPtr?

@fujii
Copy link
Contributor

fujii commented Feb 3, 2023

You are misunderstanding. Do you think formOwnerRemovedFromTree has the classic memory safety error because it's using a raw pointer?

Yes.

Show us the testcase that demonstrates the bug.

@rniwa
Copy link
Member Author

rniwa commented Apr 12, 2024

Closing this in favor of https://commits.webkit.org/273805@main

@rniwa rniwa closed this Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants