Fetch messages in order of their INTERNALDATE (#3756)#3789
Conversation
2fca8eb to
38d468b
Compare
CHANGELOG.md
Outdated
| - strip leading/trailing whitespace from "Chat-Group-Name{,-Changed}:" headers content #3650 | ||
| - Assume all Thunderbird users prefer encryption #3774 | ||
| - refactor peerstate handling to ensure no duplicate peerstates #3776 | ||
| - Fetch messages in order of their INTERNALDATE (fixes reactions for Gmail f.e.) #3756 |
There was a problem hiding this comment.
This number in the end should be the number of PR (#3789), not the issue. At least this is what we do in the rest of changelog.
There was a problem hiding this comment.
But u don't know the PR number until u create it :) Maybe get rid of these "references" to Github too and leave short commit hashes istead?
38d468b to
beb8d0d
Compare
|
Btw, the problem with INTERNALDATE is that it is likely second-precision for the most email providers (if not for all). But i found in RFC9051 nothing preventing it to have a greater precision, hh::mm::ss.mmm f.e. |
beb8d0d to
f081016
Compare
1118d1a to
fb193e2
Compare
|
Missed one thing in the previous version of the fix -- mail servers don't respect the order of UIDs we give 'em in a fetch, so a sort is needed after the fetch too |
| context, | ||
| "Got unwanted uid {} not in {:?}, requested {:?}", | ||
| &server_uid, | ||
| server_uids, |
There was a problem hiding this comment.
Removed logging of server_uids Vec here. Better to log huge structs only once if some error happens
There was a problem hiding this comment.
Makes sense, even though i can't remember this special line here cluttering the log.
Since AFAIK we never had the problem that we generated the wrong uid set (either me or @link2xt would know), it would even be fine to not do the effort (+ code complexity) with the count_extra variable, and just remove the logging here.
There was a problem hiding this comment.
This log line can be removed completely, unwanted UIDs are possible because of unsolicited responses, e.g. if another client changes \Seen flag on a message after we do a prefetch but before fetch. It's not a bug if we receive such unsolicited response.
But even as-is it's ok.
There was a problem hiding this comment.
Ok, removed this log and added your comment instead
fb193e2 to
6651741
Compare
src/imap.rs
Outdated
| while let Some(Ok(msg)) = msgs.next().await { | ||
| } | ||
| .filter_map(|r| async { r.ok() }) | ||
| .collect::<Vec<_>>() |
There was a problem hiding this comment.
Decided to get rid of collecting all messages and sorting then. If messages arrive in the requested order, we can dispatch them immediately (and consume much less memory), and if not, just put out-of-order messages to HashMap and take them from there later. It's not too complex
There was a problem hiding this comment.
The code doesn't look like you got rid of collecting all messages and sorting them?
I agree that we should get rid of the collecting, because it will lead to first fetching all messages and then starting to receive them, which will noticeably increase latency when receiving many messages at once.
There was a problem hiding this comment.
The code doesn't look like you got rid of collecting all messages and sorting them?
I think this is a proposal. It generally makes sense to do as a part of "pipelining" effort (splitting message processing into multiple steps such as fetch, decrypt, add contacts etc., and figuring out which one is the bottleneck), but is not directly related to the bugfix.
@iequidoo Maybe put it into another PR? It's much easier to review in small chunks.
There was a problem hiding this comment.
FTR, we already have this pipelining on master (i.e. emails are fetched and processed in parallel), but this PR removes pipelining. And there were "pipelining efforts" about splitting it more.
There was a problem hiding this comment.
Decided to do this now rather than get some regression in latency / memory consumption for "sane" servers that don't reorder messages
link2xt
left a comment
There was a problem hiding this comment.
Haven't reviewed Rust code yet.
python/tests/test_1_online.py
Outdated
|
|
||
| lp.sec("moving messages to ac2's DeltaChat folder in the reverse order") | ||
| ac2.direct_imap.connect() | ||
| uids2 = [m.uid for m in ac2.direct_imap.get_all_messages()] |
There was a problem hiding this comment.
You can directly use sorted([m.uid for m in ac2.direct_imap.get_all_messages()], reverse=True).
python/tests/test_1_online.py
Outdated
|
|
||
|
|
||
| def test_reactions_for_a_reordering_move(acfactory, lp): | ||
| """See https://github.com/deltachat/deltachat-core-rust/issues/3756 "Gmail reorders messages |
There was a problem hiding this comment.
I generally prefer in-place descriptions over GitHub links. If we ever decide to move off GitHub or something happens to the repository, this info would be lost.
src/imap.rs
Outdated
| while let Some(Ok(msg)) = msgs.next().await { | ||
| } | ||
| .filter_map(|r| async { r.ok() }) | ||
| .collect::<Vec<_>>() |
There was a problem hiding this comment.
The code doesn't look like you got rid of collecting all messages and sorting them?
I think this is a proposal. It generally makes sense to do as a part of "pipelining" effort (splitting message processing into multiple steps such as fetch, decrypt, add contacts etc., and figuring out which one is the bottleneck), but is not directly related to the bugfix.
@iequidoo Maybe put it into another PR? It's much easier to review in small chunks.
| context, | ||
| "Got unwanted uid {} not in {:?}, requested {:?}", | ||
| &server_uid, | ||
| server_uids, |
There was a problem hiding this comment.
This log line can be removed completely, unwanted UIDs are possible because of unsolicited responses, e.g. if another client changes \Seen flag on a message after we do a prefetch but before fetch. It's not a bug if we receive such unsolicited response.
But even as-is it's ok.
link2xt
left a comment
There was a problem hiding this comment.
I checked that the test fails on master and works with this fix, code-wise LGTM too except for the minor fixes in the review.
4bf3fea to
1007ef9
Compare
| let mut result = vec![String::new()]; | ||
| let mut result = vec![(Vec::new(), String::new())]; | ||
| for range in ranges { | ||
| if let Some(last) = result.last_mut() { |
There was a problem hiding this comment.
I think better not to write code this way, result can't be empty by design, and if it's difficult to rewrite the code in a way the compiler would know that, better to return an error (panics also don't seem good here since they are rather for terminating a program entered an irrecoverable state). But i don't want to clutter up the code with handling of such impossible situations and just wrote context(0), but maybe we have an idiomatic way for that?
There was a problem hiding this comment.
Usually we would write .context("result was empty") or so in this case.
There was a problem hiding this comment.
Ok, will keep in mind. But rewrote the code to get rid of unnecessary error handling
1007ef9 to
399eba6
Compare
link2xt
left a comment
There was a problem hiding this comment.
LGTM, let's merge without code changes, maybe with documentation fixes
| &sets | ||
| ); | ||
| continue; | ||
| let mut uid_msgs = server_uids |
There was a problem hiding this comment.
This code is somewhat hard to read because of nested results and options in derived types, but after checking it out and playing a bit with it, it makes sense to me.
I suggest that we merge this as is and postpone any refactoring to separate PRs.
When a batch of messages is moved from Inbox to DeltaChat folder with a single MOVE command, their UIDs may be reordered (e.g. Gmail is known for that) which leads to that messages are processed by receive_imf in the wrong order. But the INTERNALDATE attribute is preserved during a MOVE according to RFC3501. So, use it for sorting fetched messages.
399eba6 to
42262e8
Compare
When a batch of messages is moved from Inbox to DeltaChat folder with a single MOVE command, their UIDs may be reordered (e.g. Gmail is known for that) which leads to that messages are processed by receive_imf in the wrong order. But the INTERNALDATE attribute is preserved during a MOVE according to RFC3501. So, use it for sorting UIDs before fetching messages.
Checked by hands for now, will publish a test soon