Fetch messages in order of their INTERNALDATE (#3756) #3789

Merged
iequidoo merged 2 commits into master from iequidoo/sort-by-idate on Dec 4, 2022
Conversation

@iequidoo
Collaborator

When a batch of messages is moved from Inbox to the DeltaChat folder with a single MOVE command, their UIDs may be reordered (Gmail, for example, is known to do this), so receive_imf processes the messages in the wrong order. The INTERNALDATE attribute, however, is preserved during a MOVE according to RFC3501. So, use it for sorting UIDs before fetching messages.

Checked by hand for now; will publish a test soon.
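
To illustrate the idea in the description, here is a minimal sketch of ordering prefetched UIDs by INTERNALDATE. The `Prefetched` struct and the `uids_by_internal_date` function are hypothetical names for illustration, not the code in src/imap.rs. The sketch assumes INTERNALDATE has already been parsed into Unix seconds, and breaking ties by UID is an assumption of this sketch rather than something the PR states.

```rust
/// Hypothetical prefetch record: a UID plus its INTERNALDATE
/// (assumed to be already parsed into Unix seconds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Prefetched {
    pub uid: u32,
    pub internal_date: i64,
}

/// Returns the UIDs ordered by INTERNALDATE.
/// Messages with the same INTERNALDATE fall back to UID order
/// (an assumption of this sketch).
pub fn uids_by_internal_date(mut msgs: Vec<Prefetched>) -> Vec<u32> {
    msgs.sort_by_key(|m| (m.internal_date, m.uid));
    msgs.into_iter().map(|m| m.uid).collect()
}
```

With this ordering, a message that received a higher UID after the MOVE but has an older INTERNALDATE would still be requested first.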

@iequidoo iequidoo marked this pull request as draft November 29, 2022 18:01
@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch 2 times, most recently from 2fca8eb to 38d468b Compare November 29, 2022 18:19
CHANGELOG.md Outdated
- strip leading/trailing whitespace from "Chat-Group-Name{,-Changed}:" headers content #3650
- Assume all Thunderbird users prefer encryption #3774
- refactor peerstate handling to ensure no duplicate peerstates #3776
- Fetch messages in order of their INTERNALDATE (fixes reactions for Gmail f.e.) #3756
Collaborator

The number at the end should be the PR number (#3789), not the issue number. At least that is what we do in the rest of the changelog.

Collaborator Author

But you don't know the PR number until you create it :) Maybe get rid of these "references" to GitHub too and leave short commit hashes instead?

@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch from 38d468b to beb8d0d Compare November 29, 2022 18:40
@iequidoo
Collaborator Author

Btw, the problem with INTERNALDATE is that it likely has second precision for most email providers (if not all). But I found nothing in RFC9051 preventing it from having greater precision, e.g. hh:mm:ss.mmm.
So, if a server receives messages in batches, they are likely to have the same INTERNALDATE.

@iequidoo iequidoo marked this pull request as ready for review November 29, 2022 19:30
@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch from beb8d0d to f081016 Compare November 29, 2022 23:29
@iequidoo iequidoo marked this pull request as draft November 30, 2022 21:04
@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch 3 times, most recently from 1118d1a to fb193e2 Compare December 1, 2022 14:12
@iequidoo
Collaborator Author

iequidoo commented Dec 1, 2022

Missed one thing in the previous version of the fix: mail servers don't respect the order of the UIDs we give them in a FETCH, so a sort is needed after the fetch too.
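
A sketch of such a post-fetch sort, assuming the whole batch has been collected first. `reorder_to_requested` is a hypothetical helper, not the code from this PR; it puts fetched `(uid, message)` pairs back into the order in which the UIDs were requested and drops any UID that was not requested.

```rust
use std::collections::HashMap;

/// Reorders fetched `(uid, message)` pairs to match `requested`.
/// UIDs that were not requested are dropped.
pub fn reorder_to_requested<T>(requested: &[u32], fetched: Vec<(u32, T)>) -> Vec<(u32, T)> {
    // Position of each requested UID in the desired order.
    let position: HashMap<u32, usize> = requested
        .iter()
        .enumerate()
        .map(|(i, &uid)| (uid, i))
        .collect();

    // Keep only requested UIDs, tagged with their desired position.
    let mut wanted: Vec<(usize, u32, T)> = fetched
        .into_iter()
        .filter_map(|(uid, msg)| position.get(&uid).map(|&i| (i, uid, msg)))
        .collect();

    wanted.sort_by_key(|(i, _, _)| *i);
    wanted.into_iter().map(|(_, uid, msg)| (uid, msg)).collect()
}
```

The trade-off is that nothing can be processed until the whole batch has arrived.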

context,
"Got unwanted uid {} not in {:?}, requested {:?}",
&server_uid,
server_uids,
Collaborator Author

Removed logging of the server_uids Vec here. Better to log huge structs only once, if some error happens.

Collaborator

Makes sense, even though I can't remember this particular line cluttering the log.

Since AFAIK we never had the problem of generating the wrong UID set (either me or @link2xt would know), it would even be fine to skip the effort (and code complexity) of the count_extra variable and just remove the logging here.

Collaborator

This log line can be removed completely: unwanted UIDs are possible because of unsolicited responses, e.g. if another client changes the \Seen flag on a message after we do a prefetch but before the fetch. It's not a bug if we receive such an unsolicited response.

But even as-is it's ok.

Collaborator Author

Ok, removed this log and added your comment instead

@iequidoo iequidoo marked this pull request as ready for review December 1, 2022 14:35
@iequidoo iequidoo requested review from Hocuri and link2xt December 1, 2022 14:38
@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch from fb193e2 to 6651741 Compare December 1, 2022 14:48
@iequidoo iequidoo marked this pull request as draft December 1, 2022 17:48
src/imap.rs Outdated
while let Some(Ok(msg)) = msgs.next().await {
}
.filter_map(|r| async { r.ok() })
.collect::<Vec<_>>()
Collaborator Author

Decided to get rid of collecting all messages and sorting them. If messages arrive in the requested order, we can dispatch them immediately (and consume much less memory), and if not, we just put out-of-order messages into a HashMap and take them from there later. It's not too complex.
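
The approach described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was merged: `dispatch_in_order` and its parameters are hypothetical, the fetch stream is modelled as a plain synchronous iterator rather than an async stream, and `handle` stands in for whatever processes a message.

```rust
use std::collections::HashMap;

/// Hands messages to `handle` in the order given by `requested`.
/// In-order arrivals are dispatched immediately; early arrivals are
/// parked in a HashMap until their turn comes.
pub fn dispatch_in_order<T>(
    requested: &[u32],
    fetched: impl IntoIterator<Item = (u32, T)>,
    mut handle: impl FnMut(u32, T),
) {
    let mut parked: HashMap<u32, T> = HashMap::new();
    let mut next = 0; // index into `requested` of the next UID to deliver

    for (uid, msg) in fetched {
        if requested.get(next) == Some(&uid) {
            // Arrived exactly when expected: dispatch right away.
            handle(uid, msg);
            next += 1;
            // Deliver any parked messages that are now unblocked.
            while let Some(&want) = requested.get(next) {
                match parked.remove(&want) {
                    Some(m) => {
                        handle(want, m);
                        next += 1;
                    }
                    None => break,
                }
            }
        } else if requested[next..].contains(&uid) {
            // Requested, but arrived too early: park it.
            parked.insert(uid, msg);
        }
        // Anything else was not requested (or was already delivered): skip it.
    }

    // A requested UID may never arrive; deliver what is still parked,
    // keeping the requested order.
    for &uid in &requested[next..] {
        if let Some(m) = parked.remove(&uid) {
            handle(uid, m);
        }
    }
}
```

For a server that returns messages in the requested order, `parked` stays empty and every message is handled as soon as it arrives. The linear `contains` check keeps the sketch short; a real implementation would probably use a set lookup instead.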

Collaborator

The code doesn't look like you got rid of collecting all messages and sorting them?

I agree that we should get rid of the collecting, because it will lead to first fetching all messages and then starting to receive them, which will noticeably increase latency when receiving many messages at once.

Collaborator

> The code doesn't look like you got rid of collecting all messages and sorting them?

I think this is a proposal. It generally makes sense to do as a part of "pipelining" effort (splitting message processing into multiple steps such as fetch, decrypt, add contacts etc., and figuring out which one is the bottleneck), but is not directly related to the bugfix.

@iequidoo Maybe put it into another PR? It's much easier to review in small chunks.

Collaborator

FTR, we already have this pipelining on master (i.e. emails are fetched and processed in parallel), but this PR removes pipelining. And there were "pipelining efforts" about splitting it more.

Collaborator Author

@iequidoo iequidoo Dec 2, 2022

Decided to do this now rather than get some regression in latency / memory consumption for "sane" servers that don't reorder messages

Collaborator

@link2xt link2xt left a comment

Haven't reviewed Rust code yet.


lp.sec("moving messages to ac2's DeltaChat folder in the reverse order")
ac2.direct_imap.connect()
uids2 = [m.uid for m in ac2.direct_imap.get_all_messages()]
Collaborator

You can directly use sorted([m.uid for m in ac2.direct_imap.get_all_messages()], reverse=True).



def test_reactions_for_a_reordering_move(acfactory, lp):
"""See https://github.com/deltachat/deltachat-core-rust/issues/3756 "Gmail reorders messages
Collaborator

I generally prefer in-place descriptions over GitHub links. If we ever decide to move off GitHub or something happens to the repository, this info would be lost.

Collaborator

@link2xt link2xt left a comment

I checked that the test fails on master and works with this fix. Code-wise LGTM too, except for the minor fixes in the review.

@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch 3 times, most recently from 4bf3fea to 1007ef9 Compare December 2, 2022 21:29
@iequidoo iequidoo marked this pull request as ready for review December 2, 2022 21:37
let mut result = vec![String::new()];
let mut result = vec![(Vec::new(), String::new())];
for range in ranges {
if let Some(last) = result.last_mut() {
Collaborator Author

@iequidoo iequidoo Dec 3, 2022

I think it's better not to write code this way: result can't be empty by design, and if it's difficult to rewrite the code so that the compiler knows that, it's better to return an error (panics don't seem good here either, since they are meant for terminating a program that has entered an unrecoverable state). But I don't want to clutter up the code with handling of such impossible situations, so I just wrote context(0). Maybe we have an idiomatic way to do that?

Collaborator

Usually we would write .context("result was empty") or so in this case.

Collaborator Author

@iequidoo iequidoo Dec 3, 2022

Ok, will keep in mind. But rewrote the code to get rid of unnecessary error handling
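
One possible way to avoid the `last_mut()` Option altogether is to keep the set under construction in a local variable and push it only when it is complete. The sketch below is an assumption about how such a rewrite could look, not the code that was merged: `build_uid_sets`, its `ranges` input of inclusive `(start, end)` pairs with `start <= end`, and the `max_len` limit are all hypothetical.

```rust
/// Groups inclusive UID ranges into IMAP sequence-set strings of at most
/// `max_len` bytes, returning each string together with the UIDs it covers.
/// A single range longer than `max_len` still gets a set of its own.
pub fn build_uid_sets(ranges: &[(u32, u32)], max_len: usize) -> Vec<(Vec<u32>, String)> {
    let mut result = Vec::new();
    // The set being built lives in a local, so no `last_mut()` is needed.
    let mut current: (Vec<u32>, String) = (Vec::new(), String::new());

    for &(start, end) in ranges {
        let part = if start == end {
            start.to_string()
        } else {
            format!("{start}:{end}")
        };
        // Start a new set if appending ",part" would exceed the limit.
        if !current.1.is_empty() && current.1.len() + 1 + part.len() > max_len {
            result.push(std::mem::take(&mut current));
        }
        if !current.1.is_empty() {
            current.1.push(',');
        }
        current.1.push_str(&part);
        current.0.extend(start..=end);
    }

    if !current.1.is_empty() {
        result.push(current);
    }
    result
}
```

Because `current` always exists, there is no "impossible" empty case left to handle with an error or a panic.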

@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch from 1007ef9 to 399eba6 Compare December 3, 2022 12:10
@iequidoo iequidoo requested a review from link2xt December 3, 2022 12:22
Collaborator

@link2xt link2xt left a comment

LGTM, let's merge without code changes, maybe with documentation fixes

&sets
);
continue;
let mut uid_msgs = server_uids
Collaborator

This code is somewhat hard to read because of nested results and options in derived types, but after checking it out and playing a bit with it, it makes sense to me.

I suggest that we merge this as is and postpone any refactoring to separate PRs.

When a batch of messages is moved from Inbox to the DeltaChat folder with a single MOVE command,
their UIDs may be reordered (Gmail, for example, is known to do this), so receive_imf processes the
messages in the wrong order. The INTERNALDATE attribute, however, is preserved during a MOVE
according to RFC3501. So, use it for sorting fetched messages.
@iequidoo iequidoo force-pushed the iequidoo/sort-by-idate branch from 399eba6 to 42262e8 Compare December 4, 2022 00:13