Description
I've run into a problem with running out of file descriptors. I suspect that the use of Network.AWS.Response.receiveNull results in the program not closing sockets properly.
The following snippet is a trimmed-down version of what I'm doing:
main :: IO ()
main = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    sqsSource queueUrl
      .| C.mapC snd
      .| sqsDeleteSink queueUrl
  where
    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit

sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
sqsSource queueUrl = do
  (_, msgs) <- C.lift $ recvSQS queueUrl
  C.yieldMany msgs
  sqsSource queueUrl

sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
sqsDeleteSink queueUrl = do
  C.await >>= \case
    Nothing -> pure ()
    Just receiptHandle -> do
      void $ C.lift $ delSQS queueUrl receiptHandle
      sqsDeleteSink queueUrl

recvSQS queueUrl = do
  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
  rmrs <- send rm
  let status = rmrs ^. rmrsResponseStatus
      msgs = rmrs ^. rmrsMessages & traversed %~ extract
  pure (status, catMaybes msgs)
  where
    extract msg = do
      body <- msg ^. mBody
      rh <- msg ^. mReceiptHandle
      pure (body, rh)

delSQS queueUrl receiptHandle = do
  let dm = deleteMessage queueUrl receiptHandle
  send dm
This works fine for a while, but given a queue with enough messages it eventually fails with an error like this:
TransportError (HttpExceptionRequest Request {
  host = "sqs.eu-central-1.amazonaws.com"
  port = 443
  secure = True
  requestHeaders = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
  path = "/"
  queryString = ""
  method = "POST"
  proxy = Nothing
  rawBody = False
  redirectCount = 0
  responseTimeout = ResponseTimeoutMicro 70000000
  requestVersion = HTTP/1.1
}
(ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))
After some detours I found out that it's actually not a network issue; the process runs out of file descriptors. Using lsof I can see that it doesn't seem to close /any/ sockets at all. Instead they get stuck in the CLOSE_WAIT state:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
wd-stats 88674 magnus 23u IPv4 815196 0t0 TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus 24u IPv4 811362 0t0 TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
wd-stats 88674 magnus 25u IPv4 811386 0t0 TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus 26u IPv4 813527 0t0 TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
...
However, if I don't delete from the queue, i.e. comment out the call to delSQS in sqsDeleteSink, then I don't see this behaviour -- the number of open sockets remains constant!
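To compare the two runs by number rather than by eyeballing the lsof listing, a small helper along these lines could be used. It is only a sketch: the name countCloseWait and the idea of piping lsof output through it are mine, and it simply counts lines that contain "(CLOSE_WAIT)".

```haskell
import Data.List (isInfixOf)

-- | Count the lines of lsof output that describe a socket in CLOSE_WAIT.
-- (Hypothetical helper, not part of amazonka or of the program above.)
countCloseWait :: String -> Int
countCloseWait = length . filter ("(CLOSE_WAIT)" `isInfixOf`) . lines

main :: IO ()
main = do
  -- Reads lsof output from stdin, e.g.: lsof -p <pid> | runghc CountCloseWait.hs
  out <- getContents
  putStrLn ("sockets in CLOSE_WAIT: " ++ show (countCloseWait out))
```

Run repeatedly against the same PID, the count should keep growing while delSQS is in use and stay flat when it is commented out, if the observation above holds.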
The big difference I can see between sending a ReceiveMessage and a DeleteMessage is their definition of response (part of AWSRequest). ReceiveMessage uses receiveXMLWrapper while DeleteMessage uses receiveNull.
I had a look at the implementations of both receive* functions, but quickly realised that it'll take me quite some time to make sense of them, so I thought I'd create this issue to see if someone more knowledgeable could quickly confirm or dispel my suspicion.
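One thing that may be relevant, and this is an assumption on my part rather than something I've confirmed in amazonka: the snippet wraps the whole, never-ending conduit in a single runResourceT, so anything registered with ResourceT and not released explicitly stays open until the program ends. The sketch below shows only that ResourceT behaviour, using an IORef counter as a stand-in for sockets. The function name stillOpenInScope is made up for this illustration, and it needs the resourcet package.

```haskell
import Control.Monad (forM_, when)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (allocate, release, runResourceT)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Allocate n fake resources inside one ResourceT scope and report how many
-- are still open just before the scope ends. "Opening" increments a counter
-- and the registered cleanup decrements it. (Hypothetical stand-in for sockets.)
stillOpenInScope :: Bool -> Int -> IO Int
stillOpenInScope releaseEagerly n = do
  open <- newIORef (0 :: Int)
  runResourceT $ do
    forM_ [1 .. n] $ \_ -> do
      (key, _) <- allocate (modifyIORef' open (+ 1)) (\_ -> modifyIORef' open (subtract 1))
      -- Without an explicit release, the cleanup only runs when runResourceT exits.
      when releaseEagerly (release key)
    liftIO (readIORef open)

main :: IO ()
main = do
  leaky <- stillOpenInScope False 5
  eager <- stillOpenInScope True 5
  putStrLn ("open at end of scope, never released: " ++ show leaky)
  putStrLn ("open at end of scope, released eagerly: " ++ show eager)
```

With n = 5 the first count is 5 and the second is 0. Whether the receiveNull path ends up in the "never released" case, while receiveXMLWrapper releases the connection by consuming the body, is exactly what I can't tell from reading the code.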