Flush chttpd_db monitor refs on demonitor #4906

chewbranca · 2023-12-09T00:37:41Z

This ensures that the coordinator process flushes the attachment monitor refs in chttpd_db when it demonitors them. The problem is that a 'DOWN' message for a monitor ref can arrive without ever being received, so it lingers in the coordinator's message queue while the next HTTP request is handled. The pending message does not become apparent until the next fabric call, because fabric expects to have full access to all messages in the calling process. The stray message violates that expectation and causes a case clause crash in the fabric receive message callbacks.

I noticed this during eunit runs with stubbed attachment handles that generate an immediate noproc message on the monitor call. Normal operation should not produce an immediate noproc result when monitoring the attachment process. However, any failure that kills the attachment process between acquiring the refs and the demonitor calls will trigger this bug, causing the next HTTP request handled by that particular chttpd coordinator pool process to fail on whatever fabric call it makes next.
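The failure mode can be reproduced outside of CouchDB with a minimal sketch. The module and function names below are hypothetical and are not taken from chttpd_db. The spawned process merely stands in for an attachment process that dies between the monitor and demonitor calls, and the short sleep is an assumption that gives the 'DOWN' message time to reach the mailbox.

```erlang
-module(demonitor_flush_sketch).
-export([without_flush/0, with_flush/0]).

%% Hypothetical stand-in for an attachment process: it is monitored and
%% then dies before the caller gets around to demonitoring it.
monitor_then_kill() ->
    Pid = spawn(fun() -> receive stop -> ok end end),
    Ref = erlang:monitor(process, Pid),
    exit(Pid, kill),
    %% Assumption: 50 ms is enough for the 'DOWN' message to be queued.
    timer:sleep(50),
    Ref.

%% Plain demonitor: an already delivered 'DOWN' message stays queued.
without_flush() ->
    Ref = monitor_then_kill(),
    true = erlang:demonitor(Ref),
    leftover(Ref).

%% Demonitor with the flush option: the queued 'DOWN' message is removed too.
with_flush() ->
    Ref = monitor_then_kill(),
    true = erlang:demonitor(Ref, [flush]),
    leftover(Ref).

%% Report whether a 'DOWN' message for Ref is still sitting in the mailbox.
leftover(Ref) ->
    receive
        {'DOWN', Ref, process, _Pid, _Reason} -> stale_down
    after 0 ->
        clean
    end.
```

Under these assumptions, `without_flush()` should report `stale_down` and `with_flush()` should report `clean`. In a pooled coordinator process, that stale message is what the next fabric call would trip over.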

We potentially need to extract the usage delta from the incoming RPC message. Rather than pattern matching on every message format that could include a usage delta, we use rexi_util:extract_delta, which matches tuples ending in `{delta, Delta}` and separates the delta from the underlying message. The subtlety is that receiving the message in order to extract the delta changes the behavior: the code previously did a selective receive keyed on the Ref and ignored any other messages that arrived. I don't know whether the selective receive was intended, but I don't think it's appropriate to leave unexpected messages floating around, especially given things like issue #4909. Instead of using a selective receive, this switches to extracting the message and delta as we need to, and any unexpected messages it finds are logged and skipped. The selective receive was masking the lack of an unlink on the linked rexi_mon pid in fix #4906. I've also noticed some RPC responses arriving late, but I haven't tracked that down, so let's log when it does happen.
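A rough sketch of the approach described above, for illustration only. This is not the rexi_util source: the module name, the `{Ref, Reply}` message shape, the `undefined` default, and the use of `logger:warning/2` are all assumptions made to keep the example self-contained.

```erlang
-module(delta_sketch).
-export([extract_delta/1, wait_for/2]).

%% Hypothetical stand-in for rexi_util:extract_delta: split a trailing
%% {delta, Delta} element off a tuple and return {Message, Delta}.
extract_delta(Msg) when is_tuple(Msg), tuple_size(Msg) > 1 ->
    Last = tuple_size(Msg),
    case element(Last, Msg) of
        {delta, Delta} -> {erlang:delete_element(Last, Msg), Delta};
        _ -> {Msg, undefined}
    end;
extract_delta(Msg) ->
    {Msg, undefined}.

%% Non-selective receive: take whatever message is next, strip the delta,
%% and log and skip anything that is not the reply for Ref.
%% Assumption: replies look like {Ref, Reply} once the delta is removed.
%% Note that the timeout restarts after every skipped message.
wait_for(Ref, Timeout) ->
    receive
        Any ->
            case extract_delta(Any) of
                {{Ref, Reply}, Delta} ->
                    {ok, Reply, Delta};
                {Other, _Delta} ->
                    logger:warning("skipping unexpected message: ~p", [Other]),
                    wait_for(Ref, Timeout)
            end
    after Timeout ->
        timeout
    end.
```

The trade-off is the one the text describes: unexpected messages are consumed and logged rather than left in the mailbox, which surfaces problems that a selective receive would hide.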

nickva approved these changes Dec 9, 2023

rnewson approved these changes Dec 12, 2023

chewbranca force-pushed the flush-chttpd-attachment-monitors branch 3 times, most recently from 5c07273 to c7e3587 Compare December 12, 2023 20:46

chewbranca force-pushed the flush-chttpd-attachment-monitors branch from c7e3587 to c3c364c Compare December 12, 2023 21:26

chewbranca merged commit fae8761 into main Dec 12, 2023
14 checks passed

janl deleted the flush-chttpd-attachment-monitors branch December 20, 2023 18:07
