Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop issue with new EOF handling code in eatline #2350

Closed
QuadrantAndrew opened this issue May 9, 2018 · 13 comments
Closed

Infinite loop issue with new EOF handling code in eatline #2350

QuadrantAndrew opened this issue May 9, 2018 · 13 comments
Assignees
Labels
3.0 affects 3.0 3.1 affects 3.1 dev releases

Comments

@QuadrantAndrew
Copy link

Since upgrading to 3.0.6 we've been having issue with some imap processes taking 100% CPU. All of the processes get stuck in the eatline function when disconnecting from an append command. Once the processes get in this state there is no network traffic or system calls made by the processes, which leads me to believe it's stuck in the for(;;) loop in eatline.

Attached are some backtraces we were able to get by dumping the core of the imap processes before killing them.
backtraces.txt

@elliefm
Copy link
Contributor

elliefm commented May 11, 2018

These are the backtraces from the attached backtraces.txt, for my own convenience mostly

(gdb) bt
#0  0x00000008060255a0 in __error () from /lib/libc.so.7
#1  0x00000008023fb11b in prot_fill (s=0x808415300) at prot.c:630
#2  0x00000008023fdf1d in prot_getc (s=0x808415300) at prot.c:1756
#3  0x000000080206dc1e in eatline (pin=0x808415300, c=-1) at imap/imapparse.c:444
#4  0x000000000041b2e5 in cmd_append (tag=0x808453640 "se5u", name=0x808415700 "Sent", cur_name=0x0) at imap/imapd.c:3916
#5  0x0000000000414724 in cmdloop () at imap/imapd.c:1369
#6  0x00000000004139a2 in service_main (argc=1, argv=0x808423000, envp=0x7fffffffed48) at imap/imapd.c:1000
#7  0x00000000004504be in main (argc=1, argv=0x7fffffffed38, envp=0x7fffffffed48) at master/service.c:634
(gdb) frame 3
#3  0x000000080206dc1e in eatline (pin=0x808415300, c=-1) at imap/imapparse.c:444
444	        c = prot_getc(pin);
(gdb) p ( pin )->eof
$4 = 0
(gdb) bt
#0  0x0000000805cb9c3e in pthread_cleanup_pop () from /lib/libthr.so.3
#1  0x00000008023fb11b in prot_fill (s=0x808415500) at prot.c:630
#2  0x00000008023fdf1d in prot_getc (s=0x808415500) at prot.c:1756
#3  0x000000080206dc1e in eatline (pin=0x808415500, c=-1) at imap/imapparse.c:444
#4  0x000000000041b2e5 in cmd_append (tag=0x808453640 "80on", name=0x808423e80 "Sent", cur_name=0x0) at imap/imapd.c:3916
#5  0x0000000000414724 in cmdloop () at imap/imapd.c:1369
#6  0x00000000004139a2 in service_main (argc=1, argv=0x808423000, envp=0x7fffffffed48) at imap/imapd.c:1000
#7  0x00000000004504be in main (argc=1, argv=0x7fffffffed38, envp=0x7fffffffed48) at master/service.c:634
(gdb) frame 3 
#3  0x000000080206dc1e in eatline (pin=0x808415500, c=-1) at imap/imapparse.c:444
444	        c = prot_getc(pin);
Current language:  auto; currently minimal
(gdb) p (pin)->eof
$1 = 0
(gdb) bt
#0  eatline (pin=0x808415300, c=-1) at imap/imapparse.c:422
#1  0x000000000041b2e5 in cmd_append (tag=0x808453640 "k8k1", name=0x808426800 "Sent", cur_name=0x0) at imap/imapd.c:3916
#2  0x0000000000414724 in cmdloop () at imap/imapd.c:1369
#3  0x00000000004139a2 in service_main (argc=1, argv=0x808423000, envp=0x7fffffffed48) at imap/imapd.c:1000
#4  0x00000000004504be in main (argc=1, argv=0x7fffffffed38, envp=0x7fffffffed48) at master/service.c:634
Current language:  auto; currently minimal
(gdb) p (pin)->eof
$1 = 0
(gdb) bt
#0  0x00000008023fbbfd in prot_fill (s=0x808415300) at prot.c:826
#1  0x00000008023fdf1d in prot_getc (s=0x808415300) at prot.c:1756
#2  0x000000080206dc1e in eatline (pin=0x808415300, c=-1) at imap/imapparse.c:444
#3  0x000000000041b2e5 in cmd_append (tag=0x808453640 "sq2k", name=0x80841b240 "Sent", cur_name=0x0) at imap/imapd.c:3916
#4  0x0000000000414724 in cmdloop () at imap/imapd.c:1369
#5  0x00000000004139a2 in service_main (argc=1, argv=0x808423000, envp=0x7fffffffed48) at imap/imapd.c:1000
#6  0x00000000004504be in main (argc=1, argv=0x7fffffffed38, envp=0x7fffffffed48) at master/service.c:634
Current language:  auto; currently minimal
(gdb) frame 2
#2  0x000000080206dc1e in eatline (pin=0x808415300, c=-1) at imap/imapparse.c:444
444	        c = prot_getc(pin);
(gdb) p (pin)->eof
$1 = 0

@elliefm
Copy link
Contributor

elliefm commented May 11, 2018

I had a long reply typed up in which I analysed a bunch of wrong things and was just about to ping someone for help when I suddenly realised what the problem probably is 🙄

I think this quick-and-dirty fix should sort this out:

diff --git a/imap/imapparse.c b/imap/imapparse.c
index 30d169dc9..4da56336d 100644
--- a/imap/imapparse.c
+++ b/imap/imapparse.c
@@ -416,7 +416,7 @@ EXPORTED void eatline(struct protstream *pin, int c)
         /* Several of the parser helper functions return EOF
            even if an unexpected character (other than EOF) is received. 
            We need to confirm that the stream is actually at EOF. */
-        if (c == EOF && prot_IS_EOF(pin)) return;
+        if (c == EOF && (prot_IS_EOF(pin) || pin->error)) return;
 
         /* see if it's a literal */
         if (c == '{') {

A better fix would be to either:

  • make the prot_IS_EOF macro also check for error conditions, OR
  • make a new prot_IS_ERROR macro that checks for error conditions, and have eatline() check both of these

I'm not sure which I prefer. Just adding it to the prot_IS_EOF macro would be trivial, but I've just had a look at the FILE * api and it has distinct feof() and ferror() functions, so maybe we should do the same to keep the prot interface unsurprising. @ksmurchison, any thoughts?

@elliefm
Copy link
Contributor

elliefm commented May 11, 2018

Actually I've just decided to do it the FILE way and then it's done, commits incoming.

@elliefm
Copy link
Contributor

elliefm commented May 11, 2018

I'd appreciate feedback if this fixes the issue!

@elliefm elliefm reopened this May 11, 2018
@ksmurchison
Copy link
Contributor

This fix looks sane, but I'm curious as to what is causing pin->error without also causing pin->eof.
Is this client doing something silly?

@QuadrantAndrew
Copy link
Author

This fix looks like it'll work. I rechecked the core files and they all had the same pin->error value: "Connection reset by peer".

We'll integrate it over the weekend and report if that does fix the problem.

@elliefm
Copy link
Contributor

elliefm commented May 15, 2018

I'm curious as to what is causing pin->error without also causing pin->eof.

@ksmurchison looking at prot_fill(), it seems like this is a design decision in the protstreams API. Most (all?) of the places that set s->error and return EOF do not also set s->eof.

Usually one just checks for the EOF return value and things work out fine, but eatline() needs to look deeper than that.

I wonder if there's other places that need the same sort of fix? Though, I expect they would have broken, been discovered, and fixed, long ago -- looking how quickly the new eatline issue was discovered.

@elliefm
Copy link
Contributor

elliefm commented May 15, 2018

@QuadrantAndrew FYI I'm planning to release a 3.0.7 as soon as I have confirmation that this issue is fixed. :)

@QuadrantAndrew
Copy link
Author

I wasn't able to get the fix in over the weekend. It's in now. I'll let it run during the busy time tomorrow and report back.

@elliefm
Copy link
Contributor

elliefm commented May 15, 2018

Great, thanks :)

@slauffer
Copy link

slauffer commented May 15, 2018

After about 19h runnig cyrus-imapd with this patch we had no more problems.

@QuadrantAndrew
Copy link
Author

It's been running all day without any issues. Before the fix we'd get at least 3-4 runaway processes per hour. Looks like the issue is solved!

Thank you so much for the quick response to this issue.

@elliefm
Copy link
Contributor

elliefm commented May 18, 2018

3.0.7 has just been released with the fix in it :)

@elliefm elliefm closed this as completed May 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.0 affects 3.0 3.1 affects 3.1 dev releases
Projects
None yet
Development

No branches or pull requests

4 participants