storage: put is inline=true, but existing value is inline=false #28205
Finished the backfill of …
I think printing the existing value whenever this error occurs should put us on the hot trail. The AbortSpan is supposed to be all inline.
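For readers following along, here is a minimal sketch of the invariant behind the error, using hypothetical types rather than CockroachDB's actual MVCC code: a value written at a zero timestamp is "inline", and an inline put must only ever replace another inline value. The AbortSpan is written exclusively with inline puts, so a versioned value (or an intent, which is a versioned value with an associated transaction) should never appear under one of its keys.

```go
package main

import "fmt"

// Hypothetical stand-ins for the engine's MVCC metadata (sketch only).
// A value written at a zero timestamp is "inline"; a nonzero timestamp
// means a versioned value, and a non-nil txn means an intent.
type hlcTimestamp struct{ wallNanos int64 }

type txnMeta struct{ id string }

type mvccMetadata struct {
	ts  hlcTimestamp // zero => inline
	txn *txnMeta     // non-nil => intent
}

func (m *mvccMetadata) isInline() bool { return m.ts == (hlcTimestamp{}) }

// checkPut mirrors the consistency check behind the error in this issue:
// the inline-ness of a put must match the inline-ness of any existing
// value under the same key.
func checkPut(key string, putTS hlcTimestamp, existing *mvccMetadata) error {
	putIsInline := putTS == (hlcTimestamp{})
	if existing != nil && putIsInline != existing.isInline() {
		return fmt.Errorf("%q: put is inline=%t, but existing value is inline=%t",
			key, putIsInline, existing.isInline())
	}
	return nil
}

func main() {
	// An intent (a versioned value with a txn) under an abort span key trips
	// the check, reproducing the error message from this issue.
	existing := &mvccMetadata{ts: hlcTimestamp{wallNanos: 1}, txn: &txnMeta{id: "c5278963"}}
	fmt.Println(checkPut(`/Local/RangeID/136/r/AbortSpan/...`, hlcTimestamp{}, existing))
}
```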
@tschottdorf The existing value is …
Is your suspicion that there is a non-metadata key present?
I added a bit of additional logging to …
I then ran the …
But the only reference to that transaction ID is:
Seems like this is the only write of the abort span key. I'm not sure what this means. Perhaps something is taking a code path to write an abort span key without going through …
Interesting. So what do you know about the raw kv pair that is read when the put error occurs? Only that the decoded meta has …
Yeah, something funky is going on. I'm going to add a bunch more logging for abort span keys and reproduce again. Could be an engine bug. Could be something else.
Here's another reproduction with the contents of …
Is the txn-ID stored in the …
Note that I also instrumented the low-level code paths for batch.{Put,Clear} and didn't see the abort-span key in question ever being written via those code paths. Very curious.
Wait, are you telling me that you're seeing an intent on the abort span? That's very wrong. Even seeing a versioned value would be very wrong. I guess we knew that something was very wrong before, but this is still surprising. I bet this is an intent written by the workload and it accidentally gets read back to the abort cache (i.e. it isn't really in the abort cache, just something goes wrong on the engine level). This is in line with your findings that the abort span key was never actually written - it probably wasn't! It was just about to be written for the first time. I think reasonable next steps are saving a data dir and dumping the keys (to see whether we persistently get this key displayed under the offending abortspan key for that run). If yes, great, problem almost fixed because we're going to track it down (I assume you're certain that you would know if it had actually been written there). If there just isn't an abortspan key, it must be a transient bug and I don't have a better idea other than printf-debugging all the way into RocksDB :(
Yes, yes it is. This is reproducible, so you don't have to worry. The bug is already dead, it just doesn't know it yet.
Got another two failures with a minor twist to the logging. Just before we generate the error, I seek the iterator to …
And from a different run:
In both cases we seem to have an intent on an abort span key, which should be impossible. Time to go instrument at a lower level to find out where this intent is coming from. Something that might be a coincidence: the …
If my memory serves me correctly, @spencerkimball ran into something very similar when working on the …
Yes, I was seeing something like this when dealing with high contention. I put in tons of logging and kept arriving at impossible conclusions. Then something I changed caused the problem to disappear and I lost track of it. The details are hazy, but I remember I even created a new engine to look up the spurious value, which seemed impossible, through a fresh code path, and my debug efforts kept foundering. It's something subtle.
Well, I can still reproduce on current master. It takes between 20m and 3h to reproduce, but that's something I can track down. @spencerkimball Do you have any notes from your previous debugging effort?
No, I don't have notes. Damn, I was able to reproduce it in a minute or so using the YCSB workload A generator, but the changes I ended up with avoided the problem. I want to say the problem was much worse when I had a bug where I'd push txns which didn't yet have a txn "key" because the txns were still read-only. This of course caused those pushes to go to range 0, because the key was …
@nvanbenschoten I definitely encountered this problem in binaries from April. The problem has existed for at least that long. 9a54256 was merged on May 31.
@nvanbenschoten I was just running the YCSB load generator and saw this error, which is now repeating endlessly:
s/repeating endlessly/repeating until old versions of a row get GCed/ That's almost certainly due to a single row consuming more than 128MB of space, which makes some sense given the nature of YCSB. The result is expected and actually desirable given our current handling of old MVCC versions; see #24215. The alternatives are to let the range grow without bound or to violate the GC policy. How long were you running for before you hit that? I'd try dropping your gc_ttl and see if it goes away.
The split problem sounds like a different issue. Let's file a new bug to track it.
Btw, my hope and expectation is that I'll track this down tomorrow (or this weekend if reproduction trends to the longer side).
I finally fixed my test setup to not wipe the cluster when a failure occurs. The next failure shows the same pattern as above, what appears to be an intent on an abort span key:
I stopped the cluster and dumped the range-data for …
The stack trace I added at … Also, I still have the instrumentation which detects all abort span puts, and it indicates that the above abort span key was never written. Current theory: some rare code path is stomping on or reusing memory inappropriately.
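To make the "stomping/reusing memory" theory concrete, here is a toy illustration of the hazard class in question, not the actual engine code: engine iterators hand back "unsafe" key slices that alias an internal buffer, and that buffer is rewritten by the next positioning call. Any caller that holds such a slice across a reposition silently observes a different key.

```go
package main

import "fmt"

// toyIter is a toy iterator over a fixed key list that mimics the
// unsafe-buffer behavior of real engine iterators: UnsafeKey returns a
// slice aliasing buf, and buf is overwritten by every call to Next.
type toyIter struct {
	keys [][]byte
	pos  int
	buf  []byte
}

func (it *toyIter) Next() bool {
	if it.pos >= len(it.keys) {
		return false
	}
	// Reuse the same backing array, stomping any slice that aliases it.
	it.buf = append(it.buf[:0], it.keys[it.pos]...)
	it.pos++
	return true
}

// UnsafeKey is only valid until the next call to Next.
func (it *toyIter) UnsafeKey() []byte { return it.buf }

func main() {
	it := &toyIter{keys: [][]byte{[]byte("a-key"), []byte("b-key")}}
	it.Next()
	held := it.UnsafeKey() // BUG: held aliases the iterator's buffer
	it.Next()
	fmt.Printf("held=%s\n", held) // prints "b-key", not the "a-key" we captured
}
```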
Happened again, and my instrumentation to print out the contents of the batch didn't output anything. Either that instrumentation was incorrect or the batch was empty. I added some more instrumentation to figure out which it is. Again on this failure, the abort span key with an intent was not present on disk.
Additional instrumentation shows the batch is empty, which was probably to be expected. So wtf is going on that we see an abort-span key with an intent in memory, but it doesn't appear to exist on disk?
This is where I ended up. No sensible explanation presented itself.
I added some more instrumentation so that when …
And the logging that occurs immediately after this (on the same goroutine, using the same batch) shows 155 abort-span keys, but the key above isn't present. Hmm, I just noticed I used …
Another repro, this time with the …
Ok, making progress. Rather than using …
And then the subsequent iteration using the prefix iterator:
Immediately after that message is logged, I iterate over all of the abort-spans for the range with:
And the key …
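One behavior worth keeping in mind when instrumenting with prefix iterators (a toy model of the contract; in RocksDB the corresponding read option is prefix_same_as_start): a prefix iterator is only obliged to surface keys that share a prefix with the seek key, so it can legitimately report no key even when a neighboring key exists in the engine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// toyPrefixIter models the prefix-iteration contract: after SeekGE(k) it
// only yields keys sharing the first prefixLen bytes with k. Keys outside
// that prefix are treated as nonexistent.
type toyPrefixIter struct {
	keys      []string // sorted full keys
	prefixLen int      // how many leading bytes form the "prefix"
	cur       int
	prefix    string
}

func (it *toyPrefixIter) SeekGE(k string) bool {
	if len(k) < it.prefixLen {
		return false
	}
	it.prefix = k[:it.prefixLen]
	it.cur = sort.SearchStrings(it.keys, k)
	return it.Valid()
}

func (it *toyPrefixIter) Valid() bool {
	return it.cur < len(it.keys) && strings.HasPrefix(it.keys[it.cur], it.prefix)
}

func main() {
	it := &toyPrefixIter{
		keys:      []string{"r136/abort/aaa", "r136/abort/bbb", "r137/abort/ccc"},
		prefixLen: len("r136"),
	}
	fmt.Println(it.SeekGE("r136/abort/b")) // true: lands on r136/abort/bbb
	fmt.Println(it.SeekGE("r136/abort/z")) // false: the next key exists, but outside the r136 prefix
}
```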
(I really hope this doesn't turn out to be a bug in RocksDB)
Sure sounds like it.
My instrumentation has evolved to the following:
The type switch was necessary because I wasn't quite sure which iterator I was dealing with, and I wanted to reach inside its internals. The logs show:
Then:
Then nothing. I didn't output a log message for the case where the iterator wasn't valid. (Not sure why I did that, it's been a long day).
I'd be suspicious of the recent RocksDB changes, except that we've seen this bug back in April, well before those changes were made.
AFAICT (at the moment), at some point we make a call to …
Current theory: we have 93 uses of …
I instrumented the … In both the …
Hard to see, but the value … I audited the uses of …
Yet more instrumentation shows how funky this bug is. Whenever …
This corresponds to the following bit of logging:
Let's break this down. The key passed to …
Now the super interesting part. The last … The small trace for the …
I believe this corresponds to the … So we made a call to …
I found a bug, and it is a doozy; it's a minor miracle that it wasn't causing more problems. The easy-to-understand version: in 2.0 (and earlier), several additions to the … I'm feeling pretty confident that this was causing the put-is-inline discrepancy, though I'll be running ycsb in a loop all day to verify.
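Reading between the lines of the (truncated) explanation above and the memory-reuse discussion below, the hazard appears to have roughly this shape; the sketch below uses a hypothetical API, not the engine's actual code: seeking an iterator to a key whose bytes are backed by the iterator's own reusable buffer means the act of seeking can rewrite the target key mid-seek, so the seek silently uses a different key than the caller intended.

```go
package main

import "fmt"

// toySeekIter models an iterator whose Seek both (a) compares against the
// target key and (b) rewrites the internal buffer that UnsafeKey aliases.
// If the caller-supplied seek key itself aliases that buffer, step (b)
// corrupts the target mid-seek. Toy code only.
type toySeekIter struct {
	keys []string
	buf  []byte // backs UnsafeKey; reused on every reposition
	cur  string
}

func (it *toySeekIter) Seek(target []byte) {
	for _, k := range it.keys {
		// Reposition: overwrite the shared buffer with the candidate key.
		// If target aliases it.buf, target now compares against itself!
		it.buf = append(it.buf[:0], k...)
		if string(it.buf) >= string(target) {
			it.cur = k
			return
		}
	}
	it.cur = ""
}

func (it *toySeekIter) UnsafeKey() []byte { return it.buf }

func main() {
	it := &toySeekIter{keys: []string{"aaa", "bbb", "ccc"}}
	it.Seek([]byte("bbb"))
	fmt.Println("positioned at:", it.cur) // "bbb", as expected

	// BUG: seek to a key aliasing the iterator's own buffer. The first
	// buffer rewrite ("aaa") clobbers the target, so the comparison sees
	// "aaa" >= "aaa" and the seek lands on the wrong key.
	it.Seek(it.UnsafeKey())
	fmt.Println("positioned at:", it.cur) // "aaa", not "bbb"
}
```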
Amazing debugging.
Specifically, we need a combination of MVCC calls which use …
What's not obvious from the pretty-printed representation of the keys above is that … A bit of additional instrumentation shows that the first two conditions mentioned above are reasonably common in our tests, but the third condition doesn't occur.
Fixed by #28794 (I linked the wrong issue in the commit message).
Awesome job tracking this down!
Doesn't occur, or doesn't readily occur? If it's easy enough to detect these scenarios after the fact, it may be worth doing so loudly on master after the code freeze and leaving the detection in for a few days to search for any test flakes that were caused by this bug. That may help close a few long-standing test flakes that won't be reproducible from this point onward. I'm also curious whether you think this could have materialized as different inconsistencies than just this …
AFAICT, it doesn't occur at all. I'm still scratching my head as to how it does occur in this scenario. As near as I can tell, the original memory the iterator pointed to needs to be freed and then reallocated and used to store the key when … It is straightforward to add instrumentation to detect this scenario, though I'm not sure how to do it for particular builds (such as race-only) without increasing the size of the …
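One cheap form such detection could take (my sketch, not the instrumentation discussed in this thread): before a seek, assert that the caller's key does not overlap the iterator's internal buffer, and complain loudly in assertion or race builds.

```go
package main

import (
	"fmt"
	"unsafe"
)

// overlaps reports whether the byte ranges backing a and b intersect.
// Comparing uintptrs like this is a debugging heuristic rather than
// guaranteed-portable Go; it is meant for assertion builds, not for
// production code paths.
func overlaps(a, b []byte) bool {
	if len(a) == 0 || len(b) == 0 {
		return false
	}
	aLo := uintptr(unsafe.Pointer(&a[0]))
	aHi := aLo + uintptr(len(a))
	bLo := uintptr(unsafe.Pointer(&b[0]))
	bHi := bLo + uintptr(len(b))
	return aLo < bHi && bLo < aHi
}

func main() {
	buf := make([]byte, 8)
	alias := buf[2:6]
	fmt.Println(overlaps(buf, alias))           // true: alias shares buf's array
	fmt.Println(overlaps(buf, make([]byte, 4))) // false: distinct allocations
}
```

A seek path could then call something like overlaps(seekKey, iterBuf) and log or panic, matching the "detect it loudly on master" idea above without burdening release builds.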
Certainly possible.
... though rare. I would guess that the recent spate of test flakiness is not due to this. Remember that this bug has been present since 2.0.
pq: "/Local/RangeID/136/r/AbortSpan/\"c5278963-f8eb-421b-80ba-0e0030c600fe\"": put is inline=true, but existing value is inline=false
I've seen the error above a handful of times while running the `ycsb` roachtests in a loop. The error has only ever occurred with workloads `A` and `B`. I'm testing historical binaries, and so far the latest binary I've seen this error on was built from b5edd2a (5/29). I have logs from these roachtest runs, but the clusters have been wiped. Reproduction seems to occur in a few hours. Running ycsb workloads A and B in a loop might reproduce on master: …