Random process crashes on the (almost) latest portable #929
Comments
Can you reproduce without your filter ? |
I don’t know how to reproduce it, it didn’t crash when I was trying to reproduce it. And I cannot let it run in production without that filter. |
In the morning it crashed with a different error message:
|
On Wed, Aug 21, 2019 at 08:43:50AM -0700, Jakub Jirutka wrote:
In the morning it crashed with a different error message:
Aug 21 08:59:32 no-reply mail.crit smtpd[2536]: smtpd: process ca socket closed
I'm without SMTP-out traffic at home; I'll check if I can set up an Alpine
on a VPS with SMTP-out somewhere and do some testing
… |
I can provide you SSH access to some VM with Alpine installed if it would help you. |
That would be helpful yes if it can send mail to the outside, i'll setup something between that vm and my machines to try reproducing. |
gilles@185.8.164.31 |
Just a note, I’ve created the VM in the playground environment, so it will be available only for 1 month. |
I’ve encountered a different crash, now from SSL library:
This version of OpenSMTPD is very unreliable for me. :( |
Yes, I haven't had time to check yet but this is high on my list |
Any change…? |
not yet but i've just entered my "opensource" week so I'm on it |
I'll address the two issues separately.
Regarding the initial crash: I've tried, but I'm unable to reproduce it on an Alpine setup similar to yours, both in terms of using OpenSSL and in terms of using a relay host. I'll keep it running for a while, taking real traffic, but not knowing whether your filter caused it is an issue for troubleshooting, as four days ago we committed a diff to fix a race condition that could crash in the filter layer. Do you still experience the crash?
Regarding the SSL library stuff: you mentioned a different crash, but your sample logs don't show a crash; what they show is a TLS negotiation error. Did OpenSMTPD crash after that negotiation? |
If the filter caused a crash, I would expect OpenSMTPD to log it (“lost processor: dynproc:00000001 exited abnormally” or “misbehaving filter”), but I don’t see any message regarding the filter.
I’ll update my instance to the latest portable and let you know if something changed.
IIRC it crashed right after this message, but I’m not 100% sure now. I’ve already fixed the problem on the client side trying to use SSLv3. |
Not necessarily, when I said "if your filter caused a crash" I didn't mean it necessarily as your filter crashing, it could also be that your filter does something which causes OpenSMTPD itself to crash. This is what happened in the race condition we fixed this week, where if a session was disconnected for some reason while a filter was working, upon response of the filter, the session would no longer exist and OpenSMTPD would fatal() assuming a corruption. This was a crash in the daemon, so not logged as a lost processor, but which would only ever happen with a filter processing a specific phase.
good, I have committed your diff to includes.h so you don't need it anymore
OK, let me know how it goes, as far as I'm concerned, I have produced multiple TLS errors ranging from no TLS on the TLS listener, random data on the TLS listener, wrong ciphers on the TLS listener, and all I get is a disconnect with no crash. |
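The race condition described in this comment (a filter response arriving for a session that has already disconnected) can be illustrated with a small model. This is only a sketch: the session table, `session_open()`, `session_close()` and `filter_response()` are hypothetical names invented for illustration and are not OpenSMTPD's real data structures or its actual fix. The idea shown is that a late response for a missing session is ignored instead of being treated as corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified session table; not OpenSMTPD's real code. */
#define MAX_SESSIONS 8

struct session {
	uint64_t id;
	int      in_use;
	int      filter_replies;	/* filter responses applied so far */
};

static struct session sessions[MAX_SESSIONS];

/* Register a session; returns 0 on success, -1 if the table is full. */
int
session_open(uint64_t id)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (!sessions[i].in_use) {
			sessions[i].id = id;
			sessions[i].in_use = 1;
			sessions[i].filter_replies = 0;
			return 0;
		}
	}
	return -1;
}

/* Drop a session, e.g. because the client disconnected. */
void
session_close(uint64_t id)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++)
		if (sessions[i].in_use && sessions[i].id == id)
			sessions[i].in_use = 0;
}

/* Look up a live session; returns NULL if it no longer exists. */
static struct session *
session_find(uint64_t id)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++)
		if (sessions[i].in_use && sessions[i].id == id)
			return &sessions[i];
	return NULL;
}

/*
 * Handle a filter response for session `id`.
 * Returns 1 if it was applied, 0 if the session is already gone
 * (a late response, which is dropped rather than treated as fatal).
 */
int
filter_response(uint64_t id)
{
	struct session *s = session_find(id);

	if (s == NULL)
		return 0;
	assert(s->in_use);
	s->filter_replies++;
	return 1;
}
```

In this model, a response that arrives after `session_close()` simply returns 0; the daemon keeps running.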
Today it crashed four times and every time with the following message (with different token and sid ofc):
I didn’t see this message before, so it’s new after the update. I don’t understand what’s wrong with this particular message. I’m not modifying it in any way, just printing what I read on stdin. Relevant lines from the filter:
"filter" == $1 {
	if (NF < 7) {
		die("invalid filter command: expected >6 fields!")
	}
	sid = $6
	token = $7
	line = substr($0, length($1$2$3$4$5$6$7) + 8)
	# continue with next rule...
}
"filter|smtp-in|data-line" == $1_$4_$5 {
	...
	print("filter-dataline", token, sid, line)
}
|
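For readers less familiar with awk, the field handling in the filter snippet above can be sketched in C. It assumes what the snippet implies: fields are separated by `|`, the first seven fields are fixed, and everything after the seventh separator is the payload, which may itself contain `|`. The function names and the `|` output separator in `format_dataline()` are assumptions made for this sketch; the awk excerpt does not show its output separator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NFIELDS 7	/* fixed fields before the payload, as in the awk snippet */

/*
 * Split `buf` in place on its first NFIELDS-1 or NFIELDS '|' separators.
 * On success, fields[0..6] point at the leading fields and *rest at the
 * untouched remainder (empty if there is no payload).
 * Returns 0 on success, -1 if there are fewer than NFIELDS fields.
 */
int
split_filter_line(char *buf, char *fields[NFIELDS], char **rest)
{
	char *p = buf;

	assert(buf != NULL);
	for (size_t i = 0; i < NFIELDS; i++) {
		char *sep = strchr(p, '|');

		fields[i] = p;
		if (sep == NULL) {
			if (i < NFIELDS - 1)
				return -1;	/* too few fields */
			*rest = p + strlen(p);	/* exactly 7 fields, no payload */
			return 0;
		}
		*sep = '\0';
		p = sep + 1;
	}
	*rest = p;
	return 0;
}

/*
 * Build the reply in the same field order as the awk print():
 * "filter-dataline", token, sid, line. The '|' separator is an assumption.
 * Returns 0 on success, -1 if `out` is too small.
 */
int
format_dataline(char *out, size_t outsz, const char *token,
    const char *sid, const char *line)
{
	int n = snprintf(out, outsz, "filter-dataline|%s|%s|%s",
	    token, sid, line);

	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Cutting only at the first seven separators corresponds to the `substr($0, length(...) + 8)` expression in the awk code, which skips seven fields plus seven separators and keeps the rest of the line verbatim.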
I think you're still running the code from 7275812 and not the latest code from the portable branch. The error message you've shown can only trigger for that line if you're running the code from 7275812 as it contains a typo which I fixed in the very next commit: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.sbin/smtpd/lka_proc.c.diff?r1=1.9&r2=1.10&f=h Can you update and confirm ? |
Ah, sorry, you’re right! I really did update OpenSMTPD, but to the package built from 7275812, not the latest one – I forgot to install the update after updating the package some time ago and didn’t notice the version number now. Anyway, now I’m really running on the latest portable. I will let you know tomorrow if it crashed. |
cool |
Hi, Any news ? |
This one came out of nowhere:
|
@poolpOrg, any news? |
nope, got context switched but this is still being investigated, whole october is on stabilization |
I have two possible candidates for this bug which would explain why I don't observe them. Investigation still in progress. |
OK, I'm out of ideas. I've set up a brand new Alpine machine, built OpenSMTPD with OpenSSL 1.1.1c, tested incoming and outgoing TLS, tested DNS-resolved MX and relay host MX over TLS, received and sent thousands of mails, and did not observe a single crash. I need to be able to replicate your setup, otherwise we're out of luck fixing this. I've seen your config, but this is not enough for me to reproduce:
|
any chance of running this with debug symbols compiled in and core dumps enabled? |
I’m using Alpine v3.10, there’s currently OpenSSL 1.1.1d. You can install the exact OpenSMTPD binary that I’m running from my packages:
cd /etc/apk/keys
wget https://raw.githubusercontent.com/jirutka/user-aports/v3.10/.keys/jakub@jirutka.cz-56d0d9fd.rsa.pub
echo "@jirutka https://repo.jirutka.cz/alpine/v3.10/user" >> /etc/apk/repositories
apk add opensmtpd@jirutka opensmtpd-dbg@jirutka
It’s built from this APKBUILD.
It works only as a relay for clients on the local network and relays all the mails via single relay server. I don’t know if it happens only with some clients, I may check it, but there are only about 2-3 active clients anyway…
I thought that core dumps are generated only when the process crashes, e.g. with a segfault, aren’t they? This doesn’t seem to be the case; according to the logs, OpenSMTPD just decides to quit… (I’ve installed debug symbols and enabled core dumps now.) |
Sorry, I was under the impression that the filter crashes and that takes OpenSMTPD with it. You're right of course. |
There's a possibility the crash was related to the fixes I applied, I'll wait for @jirutka to let us know if he still experiences a crash before rolling the new release. |
No, I’ve just upgraded and it’s still killing itself. >_<
I’m not its official maintainer, but I’m Alpine dev and I’m contributing to the opensmtpd package. |
Are you around and can you join IRC ? I have an hour to spare, if you are able to reproduce the crash and generate a core dump, then I can work from that to debug and produce a fix asap. |
Yes.
There’s no crash! That’s the whole point all the time, OpenSMTPD is not crashing, it’s quitting itself without logging the reason. |
Check dmesg. You might see segfault there
|
nah this is more subtle, there is a child process crashing, the exit 01 message is because the parent process exits with status 1 if it detects a child has disappeared. I'm on #OpenSMTPD @ freenode, hit me up and we'll figure this out fast. |
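The behaviour described in this comment (a parent that exits with status 1 when a child disappears, so the logs show an exit rather than a segfault) can be reproduced with a minimal sketch. It uses only POSIX `fork()`, `waitpid()` and `raise()`; the function names are invented for illustration and this is not OpenSMTPD's actual process-supervision code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that runs child_fn(), wait for it, and return:
 *    0  if the child exited normally with status 0,
 *    1  if the child was killed by a signal or exited non-zero,
 *   -1  if fork() or waitpid() failed.
 * If the child was killed by a signal, *sig receives the signal number,
 * otherwise 0.
 */
int
supervise_once(void (*child_fn)(void), int *sig)
{
	int status;
	pid_t pid;

	assert(sig != NULL);
	*sig = 0;
	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		child_fn();
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	if (WIFSIGNALED(status)) {
		*sig = WTERMSIG(status);
		return 1;
	}
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}

/* Stand-in for a crashing child: die from SIGSEGV like a real segfault. */
void
crashing_child(void)
{
	signal(SIGSEGV, SIG_DFL);
	raise(SIGSEGV);
}

/* Stand-in for a well-behaved child that simply returns. */
void
healthy_child(void)
{
}
```

A parent that only reports its own return value of 1 hides the interesting part, which is the signal number in `*sig`; that matches the observation that the segfaults were visible elsewhere in the system logs but not in the daemon's own output.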
Uh, you’re right! There really are segfaults. So I have somehow broken logging on this system and haven’t noticed it until now (always looked only into
|
you may have them disabled by default, can you try to ulimit -c unlimited to increase the core size ? |
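When a service manager does not pass the shell's `ulimit -c` setting on to the daemon, one alternative is for a process to raise its own core-size limit. The following is a minimal sketch using POSIX `getrlimit()`/`setrlimit()`; `raise_core_limit()` is a hypothetical helper, not something OpenSMTPD or OpenRC provides. An unprivileged process can raise its soft limit only as far as the hard limit, so the sketch does exactly that.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <sys/resource.h>

/*
 * Raise the soft core-file size limit to the current hard limit.
 * Returns 0 on success, -1 on failure.
 */
int
raise_core_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) == -1)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit itself is 0, this still succeeds but no core file will be written; the hard limit then has to be raised by whatever starts the process.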
I’ve already set this for opensmtpd. I suspect that OpenRC’s supervise-daemon maybe doesn’t propagate it correctly. :/ |
With help from @jirutka the issue seems to be properly understood: A recent change in musl's Right now, @jirutka is running with a diff to confirm the issue but given that I read the implementations for For the next release, I think the only solution for Alpine is to ship a patch in their package to NULL-protect all freeaddrinfo() calls until we sort out if this is going to stay or be reverted, or at the very least until the next major release if we decide to NULL-protect upstream. |
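"NULL-protecting" a `freeaddrinfo()` call, as proposed above for the Alpine package, amounts to a guard like the following. This is a sketch only: `freeaddrinfo_nullsafe()` and `resolve_numeric()` are hypothetical names, and the actual patch shipped by Alpine or upstream may look different.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Forward only non-NULL lists to the libc freeaddrinfo(). */
void
freeaddrinfo_nullsafe(struct addrinfo *ai)
{
	if (ai != NULL)
		freeaddrinfo(ai);
}

/*
 * Helper for trying the wrapper out: resolve a numeric address without
 * any DNS lookup. Returns NULL on failure.
 */
struct addrinfo *
resolve_numeric(const char *addr)
{
	struct addrinfo hints, *res = NULL;

	assert(addr != NULL);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_NUMERICHOST;
	if (getaddrinfo(addr, NULL, &hints, &res) != 0)
		return NULL;
	return res;
}
```

The guard only covers the NULL case. It does nothing for a non-NULL pointer that did not come from the same libc's `getaddrinfo()`.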
Awesome diagnosis, @poolpOrg! Currently the OpenSMTPD package in Alpine is unmaintained (Jonathan Curran told me that he is no longer maintaining the package). It was on my to-do list to update Alpine's package, so I can add the patch too. |
Thanks a lot to @poolpOrg for diagnosis!
Okay, I’m gonna take over maintainership. |
Awesome @jirutka, thanks a lot |
@poolpOrg, I have bad news – it crashed even with non-NULL argument to
(I still cannot get rid of OpenSMTPD built from 1c1bdb6 with this patch and libasr 1.0.3 with this patch. |
Aha! Passing
src/network/freeaddrinfo.c: ...
void freeaddrinfo(struct addrinfo *p)
{
	size_t cnt;
	for (cnt=1; p->ai_next; cnt++, p=p->ai_next);
	struct aibuf *b = (void *)((char *)p - offsetof(struct aibuf, ai));
	b -= b->slot;
	LOCK(b->lock);
	if (!(b->ref -= cnt)) free(b); // <--- HERE
	else UNLOCK(b->lock);
}
|
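The pointer arithmetic in the quoted function can be tried out on a toy model. The structures below are stand-ins invented for this sketch (they are not musl's real `struct aibuf`, and the lock is left out); the logic mirrors the quoted code: walk to the last node, step back to the enclosing element with `offsetof`, step back `slot` elements to the start of the block, then drop `cnt` references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; NOT musl's real definitions. */
struct node {
	struct node *next;
};

struct buf {
	struct node n;	/* the part the caller sees */
	int slot;	/* index of this element within the block */
	int ref;	/* live references; only meaningful in element 0 */
};

/* Allocate `count` linked nodes in ONE block. Returns NULL on failure. */
struct node *
block_alloc(size_t count)
{
	struct buf *b;

	if (count == 0 || (b = calloc(count, sizeof(*b))) == NULL)
		return NULL;
	for (size_t i = 0; i < count; i++) {
		b[i].slot = (int)i;
		b[i].n.next = (i + 1 < count) ? &b[i + 1].n : NULL;
	}
	b[0].ref = (int)count;
	return &b[0].n;
}

/* Recover the start of the block from any node inside it. */
struct buf *
block_base(struct node *p)
{
	struct buf *b;

	assert(p != NULL);
	b = (struct buf *)((char *)p - offsetof(struct buf, n));
	return b - b->slot;
}

/*
 * Release the nodes from `p` to the end of its list.
 * Returns 1 if the whole block was freed, 0 if references remain.
 */
int
block_free(struct node *p)
{
	size_t cnt;
	struct buf *b;

	for (cnt = 1; p->next; cnt++, p = p->next)
		;
	b = block_base(p);
	b->ref -= (int)cnt;
	if (b->ref == 0) {
		free(b);
		return 1;
	}
	return 0;
}
```

The model only works because every node really lives inside a block that carries `slot` and `ref`. A node allocated any other way has no such fields behind it, so `block_base()` would compute a meaningless address and `free()` would be handed an invalid pointer.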
Not such bad news: we have found the culprit function for sure; what we have not found yet are all the possible ways for it to crash :-) Your stack shows that smtp_getaddrinfo_cb receives a What I think is happening is that the
it probably ends up calling free() on an invalid pointer. I'll check the difference in |
One possible way to test would be to make sure smtp_session.c has the two following includes:
Then throw the following at the end of the file:
Then replace the crashing freeaddrinfo() call with openbsd_freeaddrinfo() and check if it works better; given that I read the code, it's quite obvious it will ;-) |
I have investigated the issue and have a full understanding of the problem. The bottom line is that musl's This means that you may ONLY call So the only viable fix is that we consider My assessment is that there are only two places where we need self-releasing, one is resolver.c and the other the smtp_session.c bit you crashed on. The portable branch has a potential fix, can you update and report how it goes ? |
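One way to read "self-releasing" is that whichever code allocates an `addrinfo` list also provides the routine that frees it, so the list never reaches a libc `freeaddrinfo()` that expects a different layout. The sketch below assumes lists whose nodes are allocated one by one with `calloc()`; that assumption, the layout and the names `own_addrinfo_new()` and `own_addrinfo_free()` are illustrative only, and this is not the fix committed to the portable branch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Allocate a single node on its own, with the socket address stored
 * directly behind the struct in the same allocation (assumed layout).
 * Returns NULL on allocation failure.
 */
struct addrinfo *
own_addrinfo_new(const struct sockaddr *sa, socklen_t salen,
    struct addrinfo *next)
{
	struct addrinfo *ai;

	assert(sa != NULL);
	ai = calloc(1, sizeof(*ai) + salen);
	if (ai == NULL)
		return NULL;
	ai->ai_addr = (struct sockaddr *)(ai + 1);
	memcpy(ai->ai_addr, sa, salen);
	ai->ai_addrlen = salen;
	ai->ai_family = sa->sa_family;
	ai->ai_next = next;
	return ai;
}

/*
 * Free a list built with own_addrinfo_new(), node by node.
 * Accepts NULL. Returns the number of nodes released.
 */
size_t
own_addrinfo_free(struct addrinfo *ai)
{
	size_t n = 0;

	while (ai != NULL) {
		struct addrinfo *next = ai->ai_next;

		free(ai->ai_canonname);	/* NULL here; free(NULL) is a no-op */
		free(ai);
		ai = next;
		n++;
	}
	return n;
}
```

Keeping the allocator and the release function side by side means a change in one libc's internal `addrinfo` bookkeeping cannot affect lists the program built itself.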
It looks very promising, ~40 relayed mails and no crash so far! |
Still stable after all these hours ? If so, I'll tag the release tonight :-) |
144 mails relayed and still stable! 👍 |
I'll let you close the issue, will tag 6.6.1p1 shortly, thanks for helping |
Thanks! |
I’m running OpenSMTPD built from 772da22 (+ this patch for musl compatibility) with OpenSSL 1.1.1c* on Alpine Linux 3.10 (musl libc) with filter opensmtpd-filter-rewrite-from. It randomly crashes during email relaying.
This is all it logs when running with -v:
smtpd.conf:
* I wanted to build it against the last stable LibreSSL, but it doesn’t build due to some missing symbols…
/cc @poolpOrg