Squatter 2.5.17 core dump - gdb backtrace #3918
You can ignore those "error: Cannot access memory at address 0xffff..." in the backtrace, those are just pointers into an mmap'd file, and the mmap is no longer available when you examine the core dump after the fact. I thought they might have been the result of a signed overflow or some kind of 32bit/64bit problem (which is why I'm here), but nope, just mmaps. Here's the code that's crashing in frame 0 of that backtrace: cyrus-imapd/imap/squat_build.c Lines 785 to 794 in 15e3180
Smells to me like I know nothing about these data structures. I can't tell whether Here's where cyrus-imapd/imap/squat_build.c Lines 1414 to 1418 in 15e3180
Looks like it thinks it knows how many it needs, and tries to allocate exactly that number. Maybe it miscalculates? A disgusting hack that might work is adding 1 to this computation, like Here's where WordDocEntry is defined: cyrus-imapd/imap/squat_build.c Lines 171 to 176 in 15e3180
Note that it very clearly describes itself as a circular linked list. So it's weird to me that the "allocator" expects to just increment and find a fresh one, without checking that it's not already in use or hasn't walked off the end. I guess the If you still have that core file, can you please run the following commands in gdb and paste the output?
|
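The concern above, an allocator that increments a pointer and assumes a fresh entry is always there, can be illustrated with a minimal sketch. This is not the squat_build.c code: `Entry`, `EntryPool` and `pool_take` are hypothetical names, and the struct fields are invented stand-ins for the real WordDocEntry. It only shows what a bounds check on such a pre-sized pool might look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for WordDocEntry; field names are illustrative only. */
typedef struct Entry {
    struct Entry *next;   /* link in a circular list */
    int doc_id;
} Entry;

/* Hypothetical pool: an array sized up front, handed out one slot at a time. */
typedef struct {
    Entry *base;          /* first slot of the array */
    size_t capacity;      /* number of slots that were allocated */
    size_t used;          /* number of slots already handed out */
} EntryPool;

/* Return the next free slot, or NULL once the pool is exhausted,
 * rather than silently walking past the end of the array. */
Entry *pool_take(EntryPool *pool)
{
    assert(pool != NULL);
    if (pool->used >= pool->capacity)
        return NULL;
    Entry *e = &pool->base[pool->used++];
    e->next = e;          /* a lone entry is a circular list of length one */
    e->doc_id = 0;
    return e;
}
```

With a check like this, a miscounted pool would surface as a NULL return at the call site instead of a write to an address outside the allocation.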
@elliefm thanks, here it is!
|
Thanks. Please hang onto this core file if you can. I will need to ask you for more gdb output from it, but I don't know what to ask for yet. If you still have the command line that was used to invoke squatter, can you post it please? I don't know whether it will be useful or not, but maybe some option you're using or not using will turn out to be significant, if not to the cause of the bug, then at least for helping me track the program flow through the code. I do not know squatter or the squat database at all, so every little detail may help. Please censor any user or mailbox names: if they're included in the command line I might need to see that they were included, but I won't need to see what they were. |
What does gdb say for this:
|
Here it is:
|
The command was: squatter -v -i -s user/name/folder@domain.tld |
Thanks, that at least confirms that the problem is definitely the dereference of word_entry, and not some other weird thing I hadn't considered.
What's this say?
I see you're already running squatter on an individual folder (and not an entire user). Does this folder have subfolders, and does it succeed if you run it on the subfolders individually first, before running it on the folder itself? I assume not, since I assume you either already tried this, or there are no subfolders to try it on. But I'd like to confirm and not just assume. |
Looks like "sizeof WordDocEntry" is not correct, I changed it into "sizeof(WordDocEntry)", I hope this is what you meant:
About the number of words, I find it totally absurd that it's really trying to index 200 million words in a single folder! About folders and subfolders: I had to change the squatter bash script to run on each individual folder of every account one at a time, so that in case of a crash it would at least go on with the other folders and subfolders. |
Thanks! So,
should have allocated We crashed trying to write to That's an obscene allocation though. How much memory does your system have, and how much swap space? We know the allocation succeeded -- the
That's a good question; I'm not sure. I might be able to send you a patch that will log the words as it indexes them, and you could re-run it on this one folder and see what it logs? But if there's 7GB of them, the log would be absurd too... If I'm reading the code correctly, this 7GB is all just for "words starting with byte value 191" -- we would be doing similar work for words-starting-with-every-other-character too. 191 is outside the 0-127 us-ascii range, so I have no idea what that character would even be without knowing the encoding of the message(s) it was found in.
If it was trying to index the base64 text, it wouldn't involve character 191. Unless it had decoded the base64, and was trying to index the decoded content. I have no idea if it does this. Indexing attachment content could be useful, but only for attachments that are known to contain text. It would be useless to try and index the contents of a jpg or mp4 attachment, for example, but a pdf or word doc might be worth trying. But I don't even know what current squatter does wrt attachments, much less 2.5 squatter. |
Hang on a minute... if the program had 7GB allocated, then the core file you're getting backtraces from must be at least 7GB on disk to be complete. How big is this core file? I think gdb will warn when it's been truncated. But if it has been truncated, that means that this:
might just be telling us that the truncated core file doesn't contain that information, not that the address was unreadable at run time. Everything still looks like it crashed there, but with the core file probably truncated I'm suddenly uncertain... Hmm. I'm not sure what platform you're on, but here's a snippet from my
That seems to confirm my hunch that it "successfully" allocated that 7GB region, but then couldn't deliver it when we tried to use it. And continuing,
Well, that sounds like it's probably using an mmap for this allocation since it's much larger than 128kB. And we already talked about how if a pointer points into an mmap'd region at runtime, we can't examine it from the core file later because it no longer exists. D'oh! It's sounding more and more to me like there isn't a crash bug here, so much as just that it runs out of memory. I'm not saying that's not a bug; but it's not a crash bug, it's a "naively assumes infinite memory" bug. I guess it might also be a "tries to index stuff it shouldn't" bug, and maybe if it wasn't indexing stuff it shouldn't, it wouldn't need so much memory. Hard to say.
Please let me know if this seems useful to you, and if so I'll see what I can come up with |
@elliefm actually the core file is around 4GB and gdb complains with "warning: Unexpected size of section `.reg2/1' in core file." at startup. I looked at the source code of both 2.5 and 3.4 and it's clear that 3.4 is doing a lot less work, and producing much smaller files, because the new version does a lot of work on content-type, probably skipping a lot of non-searchable parts. Look here, 2.5.17: https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-2.5.17/imap/index.c#L4057 and here, 3.4.3: https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-3.4.3/imap/index.c#L5855 If only I could check the content type and skip the part if it is not "text/*", that might work great. |
We have modified the source of 2.5.11 to allow squatter to skip indexing useless binary data: that was the reason for the huge amount of memory, the huge index files, and the time needed to index. I have a patch to index.c, where I found how to detect the indexed part's MIME type and skip it if it is not "text/*".
|
The use of strnstr() is problematic on platforms other than BSD (libbsd needed). Cyrus seems to use strstr() all over the source. The "TEXT/" strnstr() looks like it should be a strncmp(). And are you sure you do not miss the text of old-school non-MIME messages? |
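The two points raised here (a prefix comparison instead of a substring search, and not losing non-MIME messages) can be sketched as follows. This is not the patch from this thread: `is_text_part` is a hypothetical helper that takes the content type as a plain string, which may not match how index.c represents it. Treating a missing Content-Type as text follows the RFC 2045 default of text/plain.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper: returns 1 if a body part should be indexed as text,
 * 0 if it should be skipped. */
int is_text_part(const char *content_type)
{
    /* Old-style non-MIME messages carry no Content-Type header;
     * RFC 2045 says such content defaults to text/plain. */
    if (content_type == NULL || content_type[0] == '\0')
        return 1;

    /* Prefix match only: a substring search would also accept a type
     * that merely contains "text/" somewhere later in the string. */
    return strncasecmp(content_type, "text/", 5) == 0;
}

/* Small self-check showing the intended behaviour. */
void is_text_part_self_check(void)
{
    assert(is_text_part("TEXT/PLAIN"));
    assert(!is_text_part("image/jpeg"));
    assert(is_text_part(NULL));
}
```

strncasecmp() is used so the check does not depend on whether the stored type is upper or lower case; if the type is known to be stored in upper case, strncmp() against "TEXT/" would do the same job.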
Thanks for your idea and patch in the first place! After failing to build with it applied on CentOS I modified your patch to I will test this now on my local mailhost.
|
@MASHtm great! thanks! |
Hi all, I came across this issue by chance and see that it's still open. I would like to share our solution: commit 6b3ba8d (xmalloc: use size_t for consistency with std functions). Unfortunately we have to wait before moving to version 3, so we help ourselves with problems by looking at new features and patches for version 3. |
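Why the width of an allocator's size parameter could matter for a multi-gigabyte request can be shown with a small sketch. It assumes, and this thread does not confirm it, that the size was passed through a 32-bit type before the change to size_t. `as_32bit_size` and `checked_mul` are hypothetical helpers, not Cyrus functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the value a requested size takes after passing through
 * a 32-bit parameter. Only the low 32 bits survive. */
uint32_t as_32bit_size(uint64_t requested)
{
    return (uint32_t)requested;
}

/* Overflow-checked count * elem_size. Returns 0 and stores the product
 * on success, -1 if the product would not fit in size_t. */
int checked_mul(size_t count, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return -1;
    *out = count * elem_size;
    return 0;
}
```

Under that assumption, a 7 GiB request would arrive as 3 GiB: the allocation succeeds, but the buffer is smaller than the caller believes, and a write near the end of the intended region lands outside it.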
@ankarakusu yes, you may announce the solution with this code. I already did it, but I don't know how much interest there still is in 2.x. As for 3.x, we have moved only one of our internal servers so far, and we found it's not yet ready for official deployment on our cloud servers. |
Hi, on installations where I still have 2.5.11 or 2.5.17, many big folders cause squatter to core dump.
I installed a "-g" version of squatter and got the backtrace:
Looks like that pointer is not valid.
Any known solution on the 2.5.11-17 source tree?
This doesn't happen on the same folders using squatter 3.4 (but it actually produces much smaller squat files).
Please help, I cannot upgrade some of these installations to 3.4 yet.
Gabriele