Not yet squattered emails are always returned on any search (3.4 and later) #4692
Hello, I ran into the same problem after upgrading (Debian 3.2.6-2+deb11u2 -> 3.6.1-4+deb12u1), with search_engine: squat.
We tried moving to Xapian, but found it much slower than squatter for both indexing and searching. My impression is that the search engine code was refactored between 3.2 and 3.4, and that this removed the original code that automatically filtered unindexed emails with the same query before adding them to the results.
From what (very!) little I understand of the squat backend, it is supposed to return all unindexed messages to the general search engine (as well as the indexed messages that matched), and the general search engine is supposed to apply its own (slow) filter to the unindexed ones. Perhaps the latter detail has been lost somewhere along the way, and the general search engine no longer applies its own filter to the unindexed messages. Or perhaps it now expects the unindexed messages to be flagged in some different way, and squat hasn't been updated to match, so the general engine mistakes them for indexed results. I dunno, these are some very lightly educated guesses at best!
@elliefm @rsto @vladki77 we found the solution to both problems:
Here is the patch:
It seems to work fine for us.
Gabriele
Thanks! I've applied this patch locally and run it through Cassandane, and it causes lots of errors for Xapian search. I've included the list of tests that fail or error for me below, in case it's useful to someone. Since the patch apparently fixes Squat but breaks Xapian, and we suspected that a previous change to support Xapian had broken Squat, I guess the patch at least shows us the area of code that the backends disagree about. And it looks like the disagreement is about whether to use
Here are the tests that now fail with the patch:
@elliefm I did a step-by-step debugging session with gdb and discovered that when squatter is on, cyrus-imapd/imap/search_query.c line 1014 (commit 2ab6a5f)
But I also discovered that when no search engine is configured, the expression is found inside the
That is correct, and assumptions that the search backend is Xapian are present all the way up to the JMAP layers. The expression rewrite logic also reflects that; see, e.g., cyrus-imapd/imap/search_expr.c lines 1353 to 1362 (commit d093790).
For the sake of getting the squat backend supported again, I'd be fine with applying the above patch only when the configured search backend is squat. What really needs to be done is to clean up the whole search code, but that is out of scope for this issue.
Another possibility is to check for the squat backend and set
I'm fine with whatever gets the job done with the smallest code impact!
OK @rsto, here is the smaller patch, verified to work on our side with squatter:
Thanks! I've just created pull request #4698.
Great! Thanks!
Hello, I have applied the patch [59d41a0] to 3.6.1 and recompiled. For some users it seems to work fine (searching unindexed mail works as expected), but at least one user reports that search stopped working altogether. In mail.log I see:
Fatal error: Internal error: assertion failed: imap/search_query.c: 514: sub->expr && sub->expr->op == SEOP_TRUE
Webmail (Roundcube) search reports "Failed to send UID SEARCH command" and does not show any results.
I have replaced
Further testing shows that the date limit on searches is ignored. On my mailbox (where search worked well while the assert was in place), I see: expected SEOP_TRUE (1) but found 6 (SEOP_GT) or 3 (SEOP_LT). On the mailbox with problems, a search with a date limit gives: expected SEOP_TRUE (1) but found 9 (SEOP_AND), regardless of whether I search for newer or older messages. In both cases the search was not limited by date: all messages matching the search string were returned. As the patched version is now in production, I'm collecting more mailboxes that show this problem.
@rsto is there a way to reopen this issue?
I have some more logs, and I see that sub->expr->op is quite often not SEOP_TRUE but SEOP_NOT or SEOP_OR, with a few occurrences of SEOP_AND and SEOP_MATCH. Maybe in these cases sub->expr->op should be preserved?
Any progress here? I'd like to get this fixed, but I'm afraid my patches could break things. Could anyone review my patches and tell me whether they are OK?
We don't see your issue with the changes we suggested.
I'm using Cyrus version 3.6.1-4+deb12u1 (Debian stable/bookworm), with the patch [https://github.com/cyrusimap/cyrus-imapd/commit/59d41a084db3e7c8dccc24290a7a62693878f47d] applied manually.
We're using 3.8, built on XStreamOS/illumos with additional patches.
Yes, I know the patch is intended for 3.8. The question is whether anyone is working on backporting it to 3.6, or, if I make the patch myself, whether anyone can review (and test) it.
I will backport it to 3.6, but haven't done so yet. Thanks for the bump.
I noticed that the search results also include all messages from the Trash folder. Is this related?
@brrrrrrrt Probably, if you haven't indexed the Trash folder.
@elliefm Hmm, no, the Trash mailbox is indexed (I just indexed it manually), but search still shows all messages from Trash no matter what I search for :(
Hello, I see that patches [59d41a0], [6d3b834] and [0c9e48a] are identical. As I reported, that patch causes an assertion failure on 3.6.x because
I tried to modify the patch but failed, so I reverted to patch:
To sum up, the patches published as fixes for this issue do not work well:
Can you provide some examples of IMAP SEARCH commands that do and don't work on 3.6 with the patch? I could write a Cassandane test to reproduce the problem, but I'm not really familiar with the search syntax. Some examples of searches that should work and do work, and searches that should work but don't, would give me a quicker starting point than working backwards from source-code-level details. It seems clear that something else changed between 3.6 and 3.8 that has not been backported. If I can reproduce the problem in Cassandane, it might be possible to find the missing link and backport that too.
The problem is with "search text xxx since yyy":
3.6 without patches:
3.6 with patch [https://github.com/cyrusimap/cyrus-imapd/commit/59d41a084db3e7c8dccc24290a7a62693878f47d]:
3.6 with my patches: cyrus-search-unindexed.txt
3.6 with my patches: cyrus-search-unindexed-and-date-failed.txt
I made a test that reproduces the crash on 3.6, then ran the same test on the master branch, and it crashes there too. So it's not a problem with 3.6; it's a problem with the patch itself.
This comment above
In
I'll tinker with this a little longer and see what I come up with.
Seems to be working so far, though it needs some tidying up and at least one more test.
"at least one more test":
Okay so far...
Whoops! No crash at least, but it should have found message 3 here, and didn't. So it's better, but still not quite right in some way that I don't yet understand. But since/not since both work correctly when there isn't any indexed subquery:
So whatever the problem with "not since" is, it doesn't occur in isolation, only when combined with an indexed subquery (such as "text needle"). Hmm.
Hah, the typo was right there in my previous comment, but I still didn't see it until hours of debugging later. For the search that failed, it turns out I was accidentally searching for 2004 rather than 2024, doh. With that fixed, the test passes. Just a bunch of debugging mess to clean up now, I think.
@vladki77 My fix is at #4806, though most of that is tests to prove it works. Thanks heaps for your help reproducing the issue. The one commit you need is 48f702b, which you can download as a patch here. This patch applies on top of the previous (broken) fix that's already on the cyrus-imapd-3.6 branch, so you should revert your local changes, then apply this one. I'll backport this to the older branches once it's been through code review.
@vladki77 Forgot to say: I'd really appreciate it if you could try it out and confirm it works for you. The tests tell me it will, but a little more confirmation never hurt :)
Sorry, I didn't have time to test last week, but I'll definitely do it. I promise.
Compiled and deployed; search seems to work as expected. Thank you very much, @elliefm!
Thanks!
The squat backend returns false positives for unindexed messages, but filtering these messages was broken. This patch fixes that, based on a contribution from @gbulfon. Thanks! Fixes cyrusimap#4692
Hi, I've been digging into this problem using gdb, but I can hardly figure out what's happening here.
With squatter mode configured in imapd.conf, when new emails arrive and have not yet been squattered, any search returns the correct results from the squattered emails plus all of these recent emails, which are not filtered against the query.
This is easy to reproduce by commenting out the rolling squatter in cyrus.conf and then sending a couple of new emails to a test account: run some queries and you will always get these new emails in the results, even if they don't match the query expression.
I found that the problem occurs in subquery_run_indexed() (search_query.c). After running the squatter query via bx->run(), it runs the subquery_post_enginesearch callback for each folder. That callback calls index_search_evaluate(), which begins by calling search_expr_always_same(): this returns 1 because e->op is SEOP_TRUE, so the code ignores the original query and adds the UIDs as matches.
This search expression is passed to index_search_evaluate() from sub->expr, which is the one containing SEOP_TRUE. I've seen previous code checking the "op" inside sub->indexed instead, but I don't actually know what that means.
I also can't figure out why this expr is SEOP_TRUE when the search is a SEOP_MATCH (just a search for "FROM abcd"); in fact, sub->indexed reflects the match.
@rsto @elliefm you may have looked into this with me in the past, but maybe with these new findings we can pin down where the bug is?
Here is the previous issue: #4224
Gabriele