Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IMAP container error on reboot #2917

Closed
1 of 3 tasks
DrDoug88 opened this issue Aug 20, 2023 · 18 comments · Fixed by #2923
Closed
1 of 3 tasks

IMAP container error on reboot #2917

DrDoug88 opened this issue Aug 20, 2023 · 18 comments · Fixed by #2923
Labels
priority/p0 Critical bug without workaround / Must have type/bug Bug. Not working as intended

Comments

@DrDoug88
Copy link

Environment & Version

Environment

  • docker compose
  • kubernetes
  • docker swarm

Version

  • Version: 2.0

Description

Occasionally, when performing a server reboot for routine maintenance, Mailu does not start correctly. I have narrowed it down to the imap container. The only error I see in the imap container logs is: Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid). This is on my production machine, so I cannot keep the service down too long to troubleshoot. Before, I would restart the entire stack, but since I looked into it more on this last occurrence, I simply restarted the IMAP container and everything works fine. I am apprehensive about even posting this bug, since I do not have more information to offer. I can spin up a test environment and keep rebooting until the error occurs if necessary. I was running 1.9 for about a year, and it seems like this only started once I moved to 2.0 back in April, but I'm not sure.

Thanks for all of the hard work you all do.

Replication Steps

It is very difficult to reproduce. It happens maybe 10% of the time.

Logs

Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
@DrDoug88
Copy link
Author

I set up a test environment and was able to reproduce the issue. Below is the full scrubbed IMAP container log. This time around, restarting the IMAP container alone did not remedy the problem. However, restarting the entire compose stack did fix it.

2023-08-21T01:14:38.123746000Z WARNING:root:Disabling hardened-malloc on this CPU
2023-08-21T01:14:38.251769000Z Aug 20 20:14:38 master: Info: Dovecot v2.3.20 (80a5ac675d) starting up for imap, pop3, lmtp, sieve
2023-08-21T01:16:42.609326000Z Aug 20 20:16:42 lmtp(41): Error: SSL context initialization failed, disabling SSL: Can't load SSL certificate (ssl_cert setting): The certificate is empty
2023-08-21T01:16:42.616802000Z Aug 20 20:16:42 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:16:42.710910000Z Aug 20 20:16:42 lmtp(mailadmin@example.com)<41><CYm8JPq64mQpAAAAGOvSGA>: Info: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'
2023-08-21T01:16:42.736340000Z Aug 20 20:16:42 indexer-worker(mailadmin@example.com)<47><CYm8JPq64mQpAAAAGOvSGA:gqjHK/q64mQvAAAAGOvSGA>: Info: FTS Xapian: 'INBOX' (/mail/mailadmin@example.com/xapian-indexes/db_48120528fabae2642900000018ebd218) indexes do not exist. Initializing DB
2023-08-21T01:16:42.739600000Z Aug 20 20:16:42 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:16:44.427003000Z Aug 20 20:16:44 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:16:44.475830000Z Aug 20 20:16:44 lmtp(mailadmin@example.com)<41><eLZsGfy64mQpAAAAGOvSGA>: Info: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'
2023-08-21T01:16:44.498321000Z Aug 20 20:16:44 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:15.657275000Z Aug 20 20:17:15 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:17:15.699291000Z Aug 20 20:17:15 lmtp(mailadmin@example.com)<41><yK4lJxu74mQpAAAAGOvSGA>: Info: sieve: msgid=<b9db8cea-63ae-a146-6e60-34a4c1e53924@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:15.730934000Z Aug 20 20:17:15 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:20.463985000Z Aug 20 20:17:20 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:17:20.501934000Z Aug 20 20:17:20 lmtp(mailadmin@example.com)<41><+OugGyC74mQpAAAAGOvSGA>: Info: sieve: msgid=<b36e9ad4-071f-2f78-af6f-5546466f1357@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:20.529061000Z Aug 20 20:17:20 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:23.022773000Z Aug 20 20:17:23 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:17:23.035358000Z Aug 20 20:17:23 lmtp(55): Error: SSL context initialization failed, disabling SSL: Can't load SSL certificate (ssl_cert setting): The certificate is empty
2023-08-21T01:17:23.042875000Z Aug 20 20:17:23 lmtp(55): Info: Connect from 172.16.7.10
2023-08-21T01:17:23.055468000Z Aug 20 20:17:23 lmtp(56): Error: SSL context initialization failed, disabling SSL: Can't load SSL certificate (ssl_cert setting): The certificate is empty
2023-08-21T01:17:23.063121000Z Aug 20 20:17:23 lmtp(56): Info: Connect from 172.16.7.10
2023-08-21T01:17:23.070807000Z Aug 20 20:17:23 lmtp(mailadmin@example.com)<41><CMw5ASO74mQpAAAAGOvSGA>: Info: sieve: msgid=<68c15b91-5381-2bd6-8fad-6114e3829564@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:23.099715000Z Aug 20 20:17:23 lmtp(mailadmin@example.com)<55><mBSJAiO74mQ3AAAAGOvSGA>: Info: sieve: msgid=<d88c7547-9553-49a9-4cf7-4603602ba670@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:23.109304000Z Aug 20 20:17:23 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:23.133226000Z Aug 20 20:17:23 lmtp(55): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:23.136763000Z Aug 20 20:17:23 lmtp(mailadmin@example.com)<56><auu4AyO74mQ4AAAAGOvSGA>: Info: sieve: msgid=<7bec2990-e3f9-4932-d776-1135811e3e5b@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:23.158341000Z Aug 20 20:17:23 lmtp(56): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:23.344474000Z Aug 20 20:17:23 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:17:23.381523000Z Aug 20 20:17:23 lmtp(mailadmin@example.com)<41><QCuAFCO74mQpAAAAGOvSGA>: Info: sieve: msgid=<129c0ff8-581e-24d6-ae81-eb3dda0cc198@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:23.412098000Z Aug 20 20:17:23 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:24.136545000Z Aug 20 20:17:24 lmtp(41): Info: Connect from 172.16.7.10
2023-08-21T01:17:24.174305000Z Aug 20 20:17:24 lmtp(mailadmin@example.com)<41><8P4cCCS74mQpAAAAGOvSGA>: Info: sieve: msgid=<10863065-6354-74b1-6b7f-f4c05f4bf447@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:24.205175000Z Aug 20 20:17:24 lmtp(41): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:17:24.417230000Z Aug 20 20:17:24 lmtp(56): Info: Connect from 172.16.7.10
2023-08-21T01:17:24.453522000Z Aug 20 20:17:24 lmtp(mailadmin@example.com)<56><ODnVGCS74mQ4AAAAGOvSGA>: Info: sieve: msgid=<de145b94-4be2-3f32-48e8-d1fd9ee744f4@example.com>: stored mail into mailbox 'INBOX'
2023-08-21T01:17:24.456623000Z ERROR:asyncio:Unclosed connection
2023-08-21T01:17:24.456793000Z client_connection: Connection<ConnectionKey(host='admin', port=80, is_ssl=False, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=None)>
2023-08-21T01:17:24.521764000Z Aug 20 20:17:24 lmtp(56): Info: Disconnect from 172.16.7.10: Logged out (state=READY)
2023-08-21T01:18:23.166671000Z WARNING:root:Disabling hardened-malloc on this CPU
2023-08-21T01:18:23.498318000Z Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
2023-08-21T01:18:23.498512000Z Aug 20 20:18:23 master: Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
2023-08-21T01:36:28.971116000Z WARNING:root:Disabling hardened-malloc on this CPU
2023-08-21T01:36:29.019423000Z Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
2023-08-21T01:36:29.019639000Z Aug 20 20:36:29 master: Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
2023-08-21T01:37:45.890071000Z WARNING:root:Disabling hardened-malloc on this CPU
2023-08-21T01:37:45.938297000Z Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
2023-08-21T01:37:45.938515000Z Aug 20 20:37:45 master: Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)

@geckolinux
Copy link

geckolinux commented Aug 23, 2023

Same sort of issue here, but the bug report was closed. They claimed the Dovecot error is unimportant: #2886

For the sake of completeness I'm attaching my mailu.env and docker-compose.yml :

@DrDoug88
Copy link
Author

Thanks @geckolinux. I searched through the issues, but did not run across the one you referenced. I'm not sure why it was closed. There is definitely a bug, most likely in the imap container.

@DrDoug88
Copy link
Author

I think this could possibly be the fix: #2913. I have disabled ipv6 on my server. I added the dovecot override and did a reboot without issues. But again, this one has been elusive to report since it doesn't happen every time.

@geckolinux
Copy link

geckolinux commented Aug 24, 2023

Thanks a lot @DrDoug88 for confirming that.

I'm not sure if IMAP is also part of the problem, but in my case I can confirm that POP3 was failing when my third party webapp tried to access it. Also the redirection error that prevented the web interface from working.

Thanks for finding #2913, so in that case is it a problem if IPv6 is enabled or disabled on the server? And at what level? At the container level or the Linux kernel level? In my case the server doesn't have a public IPv6 address, but I also haven't explicitly disabled the IPv6 stack.

But again, this one has been elusive to report since it doesn't happen every time.

Yes, definitely this is a challenge. The randomness makes me think it's some kind of a race condition or missing service dependency. In my case the bug doesn't usually seem to occur unless the server was up for more than about 24h before the reboot.

@DrDoug88
Copy link
Author

I disabled IPv6 on the host via net.ipv6.conf.all.disable_ipv6 = 1 in /etc/sysctl.conf. I also have "ipv6": false in /etc/docker/daemon.json. I too had troubles with the web interface as well as the other services (IMAP, POP, etc.) when this issue occurred. I'm not sure if it occurred because ipv6 was enabled or disabled on the server or not. It could be those settings that affected it, or something else entirely. My thought for now is: set the mailu-imap container to only listen on IPv4 and see if the issue occurs again... ;). I appreciate your responses though. Hopefully we can determine the root of the problem by process of elimination.

@nextgens
Copy link
Contributor

nextgens commented Aug 24, 2023

I am not sure why you think it is related to ipv6 and the other ticket...

Next time you are able to reproduce the issue please verify/confirm:

  • The content of /run/dovecot/master.pid (cat /run/dovecot/master.pid)
  • Whether there is an actual process (and which) with that PID.
  • If it is dovecot, why is the startup script run again (the container is obviously already running!?!) ?
  • What are the permissions of the file and parent folder (ls -ld /run/dovecot /run/dovecot/master.pid) ?

@DrDoug88
Copy link
Author

I am not sure why you think it is not related to ipv6. There seems to be a myriad of issues related to the imap/dovecot container. Other issues on this topic have been raised and closed without proper resolution. I will provide the information that you have requested. I know you are very actively working on this project @nextgens, I see you responding to nearly every issue. And I have nothing but gratitude for your hard work. But please read the room. We are just trying to get to the bottom of the problem man.

@nextgens
Copy link
Contributor

I am not sure why you think it is not related to ipv6.

There is nothing in the logs you have provided that suggests it is: all the IP addresses appearing in your logs are ipv4.

geckolinux keeps unhelpfully adding noise to existing tickets... that in all likelihood and by his own admission are unrelated. I suggest you do like me and ignore him until he bothers to put a description of the symptoms, the relevant config and logs on the same ticket.

@geckolinux
Copy link

@nextgens So please confirm, you would like duplicate tickets to be opened when the same symptoms are present?

@nextgens
Copy link
Contributor

@geckolinux here you are adding noise about your config while complaining about POP3 and webmail while saying "I'm not sure if IMAP is also part of the problem". How is that related to a problem with the PID file (which is the Fatal error reported here)?

On #2886 you were complaining about a log message that I told you is expected and harmless. The ticket has been closed because the reporter provided no logs nor config file.

On the face of it the symptoms exhibited are not the same.

@DrDoug88
Copy link
Author

They are entirely related. @geckolinux thankfully pointed me to that ticket, not noise, very helpful. In #2886, the OP says that the webmail is not working. The next responses seem to indicate IMAP/dovecot, and "Fatal: Dovecot is already running with PID 9". This is what I was running into. @geckolinux responded that he too was having the same issues, but only after the server was up for more than 24 hours. Then the ticket was closed without explanation.

The objective of the forum concept is for others to find related issues and engage on those topics rather than creating duplicates. We are CLEARLY experiencing the same symptoms.

On my ticket, I confirmed that it was the imap container, since on one instance when this occurred, I noticed the errors in the imap container, then I restarted only that container, and the issue was resolved, i.e. everything worked. This has happened randomly a handful of times to me over the past few months, but it only occurs on server reboot. I had originally thought it was webmail related too. I didn't have the time to investigate and report the bug properly, but now I do.

I love Mailu, it's awesome, and I want to do my part to help from the user side. I spun up a test machine solely because of this issue, to test and find it, so that I could get the right kind of information/logs together to report it properly. In fact, I will even spin up a second test machine, one with both ipv4 and ipv6, and the other with only ipv4, to see if I can get the issue to reoccur. I have yet to replicate the issue since adding the listening interface override in dovecot.conf.

@geckolinux
Copy link

geckolinux commented Aug 25, 2023

I'm afraid I honestly don't understand. Is not the error from my logs Fatal error: Dovecot is already running that I mentioned here the same error that @DrDoug88 is reporting? He is using IMAP, and I am using POP3. Is not Dovecot responsable for both IMAP and POP3 in Mailu? And is a "Fatal" error insignificant?

At any rate, I will report back with the requested information from #2917 (comment) the next time this happens.

@DrDoug88
Copy link
Author

DrDoug88 commented Aug 25, 2023

Okay, I created a fresh VM, performed the installation, did a reboot. This one is with ipv4 and ipv6, i.e. no dovecot override. Upon reboot, the issue has occurred. Here are the details that you have requested:

  1. Contents of master.pid:
c8622416fc70:/run/dovecot# cat master.pid
9
  1. Processes:
c8622416fc70:/run/dovecot# ps -A
PID   USER     TIME  COMMAND
    1 root      0:01 python3 /start.py
    9 mail      0:00 python3 /start.py
   58 root      0:00 bash
   76 root      0:00 ps -A
  1. Permissions:
c8622416fc70:/run/dovecot# ls -ld /run/dovecot /run/dovecot/master.pid
drwxr-xr-x    5 root     root          4096 Aug 25 03:14 /run/dovecot
-rw-------    1 root     root             2 Aug 25 03:11 /run/dovecot/master.pid

Here is the accompanying log:

2023-08-25T08:11:35.515076000Z WARNING:root:Disabling hardened-malloc on this CPU
2023-08-25T08:11:35.695948000Z Aug 25 03:11:35 master: Info: Dovecot v2.3.20 (80a5ac675d) starting up for imap, pop3, lmtp, sieve
2023-08-25T08:14:54.996990000Z Aug 25 03:14:54 lmtp(57): Error: SSL context initialization failed, disabling SSL: Can't load SSL certificate (ssl_cert setting): The certificate is empty
2023-08-25T08:14:55.004778000Z Aug 25 03:14:55 lmtp(57): Info: Connect from 172.16.7.9
2023-08-25T08:14:55.117758000Z Aug 25 03:14:55 lmtp(mailadmin@example.com)<57><HXFCAP9i6GQ5AAAAZXHIzA>: Info: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'
2023-08-25T08:14:55.152520000Z Aug 25 03:14:55 indexer-worker(mailadmin@example.com)<63><HXFCAP9i6GQ5AAAAZXHIzA:0GPsCP9i6GQ/AAAAZXHIzA>: Info: FTS Xapian: 'INBOX' (/mail/mailadmin@example.com/xapian-indexes/db_d8b18603ff62e864390000006571c8cc) indexes do not exist. Initializing DB
2023-08-25T08:14:55.173329000Z Aug 25 03:14:55 lmtp(57): Info: Disconnect from 172.16.7.9: Logged out (state=READY)
2023-08-25T08:14:56.644450000Z Aug 25 03:14:56 lmtp(57): Info: Connect from 172.16.7.9
2023-08-25T08:14:56.712615000Z Aug 25 03:14:56 lmtp(mailadmin@example.com)<57><MK1gJgBj6GQ5AAAAZXHIzA>: Info: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'
2023-08-25T08:14:56.735001000Z Aug 25 03:14:56 lmtp(57): Info: Disconnect from 172.16.7.9: Logged out (state=READY)

System Rebooted...

2023-08-25T08:16:37.759918000Z WARNING:root:Disabling hardened-malloc on this CPU
2023-08-25T08:16:37.966560000Z Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)
2023-08-25T08:16:37.966822000Z Aug 25 03:16:37 master: Fatal: Dovecot is already running with PID 9 (read from /run/dovecot/master.pid)

I can leave this machine in this state as long as necessary and perform any other requested actions.

@nextgens
Copy link
Contributor

@DrDoug88 thank you for proving my point.

I will leave the two of you investigate v6 related issues while I work on the PR to fix the issue you have reported.

@DrDoug88
Copy link
Author

I have no idea what you are talking about. It was me that suggested the idea of it being ipv6. I thought maybe since I had disabled it on the host side, but the container might be trying to bind to it and causing problems. I've seen networking do weird things.

@geckolinux
Copy link

@DrDoug88 Thanks very much for setting up the test instance and providing the details in #2917 (comment) . Apparently the root cause of this bug is what I theorized in #2917 (comment):

The randomness makes me think it's some kind of a race condition

I imagine that the additional complexity of the enabled IPv6 stack was changing the timing of some OS events, making it more likely for stale PID files to prevent some Mailu components from starting up, including IMAP, POP3, and possibly the webmail redirect issue described in #2886.

@nextgens nextgens added priority/p0 Critical bug without workaround / Must have type/bug Bug. Not working as intended labels Aug 28, 2023
@bors bors bot closed this as completed in d026f70 Aug 28, 2023
@DrDoug88
Copy link
Author

DrDoug88 commented Sep 4, 2023

Thank you @nextgens for fixing the bug, and more in general for all that you do. I haven't had a chance to pull the changes and test it. But from your response, it seems like you were able to specifically identify it and address it accordingly. I hope that that this entire exchange amongst the three of us can at least convey the idea that the reporting process is not entirely clean and perfect as we would all like. When it comes to something that is not easy to replicate, a little bit of back and forth discussion is not the end of the world. Anyhow, thanks again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/p0 Critical bug without workaround / Must have type/bug Bug. Not working as intended
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants