Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't rollback when TxnScopeGuard dtor is run twice #63

Closed
wants to merge 1 commit into from
Closed

Don't rollback when TxnScopeGuard dtor is run twice #63

wants to merge 1 commit into from

Conversation

nwellnhof
Copy link

Modules like Plack::Middleware::StackTrace save a reference to the
guard object when a (nested) rollback exception is thrown during
destruction resulting in the destructor being called twice. Inactivate
the guard after the first rollback.

Modules like Plack::Middleware::StackTrace save a reference to the
guard object when a (nested) rollback exception is thrown during
destruction resulting in the destructor being called twice. Inactivate
the guard after the first rollback.
@ribasushi
Copy link
Collaborator

Hi!

The conjecture "the destructor is called twice" is an oxymoron: something else is at play. It's too late for me to investigate this tonight, I will look in detail tomorrow. Whatever the reason - your patch is very likely a bandaid, not the actual solution.

Please stand by for more...

@ribasushi
Copy link
Collaborator

@nwellnhof I looked at the source of Plack::Middleware::StackTrace, and it doesn't seem to do anything extraordinary. I am very wary of applying fixes that appears to solve an issue that defies explanation.

Can you please give me more details, in particular what exactly is happening (error messages etc) before you apply the submitted patch.

@dbsrgits-sync dbsrgits-sync force-pushed the current/for_cpan_index branch 2 times, most recently from 543e57b to 8fa8927 Compare October 23, 2014 16:57
@nwellnhof
Copy link
Author

Yes, it's a difficult issue which is now fixed in Plack 1.0033. The root of the problem is that Plack::Middleware::StackTrace called Devel::StackTrace without the 'no_refs' option which resulted in an additional reference to the TxnScopeGuard object being created during its destruction. This leads to the destructor being called twice (or even more times). I'm OK with leaving the commit unapplied but I don't think it would hurt.

I didn't know this before investigating this issue, but a Perl destructor being called twice is not an oxymoron at all. All it takes is that the destructor stores another reference to the object being destroyed in some location that outlives the destructor.

@dbsrgits-sync dbsrgits-sync force-pushed the current/for_cpan_index branch 2 times, most recently from 25b391c to ac4e80d Compare October 24, 2014 09:36
@ribasushi
Copy link
Collaborator

I didn't know this before investigating this issue, but a Perl destructor being called
twice is not an oxymoron at all. All it takes is that the destructor stores another
reference to the object being destroyed in some location that outlives the destructor.

What you are describing is the DESTROY-abort technique, which DBIx::Class has been relying on to handle its leakless circular refs for about 4 years now. The problem is that as you state all it takes is that the destructor stores another reference - this means that something has to "around" DBIx::Class::Storage::TxnScopeGuard::DESTROY() itself, and I can not find this code in Devel::StackTrace.

Could you please point me to what I am missing?

Wrt your "it wouldn't hurt" comment - no it would not hurt. It would also leave broken state out there in the wild, while having DBIC silently working around it. If such code (wrapping DESTROY) indeed exist, and behaves as you describe - it is very dangerously broken, and DBIC needs to start warning very loudly when it encounters this problem.

For comparison please see 841efcb3f#diff-c13797cc2e5864c4a1d6a92ba65871b6R81. The original proposal was why don't we just "$@" and be done, which I refused to do and went out to write the linked piece instead.

I will write the same carefully worded shim for your issue, as soon as I understand which piece of the puzzle is actually causing it. This is how we make CPAN better as a whole ;)

@ribasushi
Copy link
Collaborator

Thinking about this more - I now understand what is happening and it is truly terrible - Devel::StackTrace is very seriously broken, by potentially causing all destructors caught in a frame to run an unlimited number of times (until perl itself nukes the loop from orbit).

The fix for this lies within Devel::StackTrace - it must be adjusted to detect it is constructing a stacktrace from within DESTROY and not create any new refs to the refaddr of the argument to any higher DESTROY frames. Consider the following small (hence contrived) example:

perl -MDevel::StackTrace -e '
  sub DESTROY { warn "calling destructor and doing foo"; main::foo() };
  {
    my $trace;
    sub foo {
      $trace = Devel::StackTrace->new
    };

    { my $obj_to_be_destroyed = bless {} }

    print $trace->as_string
  }

  print "the end\n"
'

On why I qualify it as terrible - you already ran into an issue with a destructors written in a way that relies on it being a highlander. In this case the effect is a corrupted txn-state on a $dbh. There are many other cases where having a destructor execute twice will have worse consequences.

I am filing a Devel::StackTrace bug on RT with the above for @autarch

Thanks a ton to @nwellnhof for the initial investigative work that led to this diagnosis.

@nwellnhof
Copy link
Author

Like I said, Devel::StackTrace has the 'no_refs' option to deal with this situation and which Plack uses now. One could reason that this option should be enabled by default but other than that, I can't see Devel::StackTrace at fault.

@autarch
Copy link

autarch commented Feb 5, 2015

Devel::StackTrace has defaulted to not capturing refs since the 2.00 release on November 1, 2014. Anyone doing a new install will get the sane behavior, at least.

@ribasushi
Copy link
Collaborator

This has finally been properly addressed for users of older versions of D::ST, or people suffering from similar deep magic: 3d56e02#diff-657de9db8d77a27e7e03a61c453f3839R220

Sorry for not taking your patch as-is, as while technically correct it fell short on the amount of instrumentation a problem like this calls for.

Thank you for raising this in the first place!

@nwellnhof
Copy link
Author

Thanks for addressing this issue so thoroughly!

@dakkar
Copy link

dakkar commented Sep 29, 2015

I have just discovered that, on at least one machine, the test introduced by 3d56e02 fails, because the destructor is not called twice. This is on a perl 5.8.8.

I'm not sure how much I can help with reproducing, since that's one of our production machines and perlbrew does not compile 5.8.8 😞

The comment at https://github.com/dbsrgits/dbix-class/blob/current/blead/t/storage/txn_scope_guard.t#L181 makes me suspect that 5.8.8 is weird in general.

@ribasushi
Copy link
Collaborator

I have just discovered that, on at least one machine, the test introduced by 3d56e02 fails, because the destructor is not called twice.

That is a bug in my logic then. I do not think 5.8.8 has anything to do with it either, as it isbeing smoked regularly both locally and on travis among with a number of other harder-to-find perls, so something on that machine is in play.

perlbrew does not compile 5.8.8

This is news to me - what do you mean? If you mean "it does not pass tests" - yes, a lot of the older perls fail around the DBFile stuff, I keep meaning to patch Devel::PatchPerl for this. But with -n everything should just work.

I really want to get to the bottom of this, is there any way I can get a temporary shell on this box to see things fail in motion? If not - can you please get me the complete perl -V (private if need be)?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants