Sometimes send fails to close sockets #608

Closed
magthe opened this issue Nov 28, 2020 · 8 comments

Comments

@magthe

magthe commented Nov 28, 2020

I've run into a problem with running out of file descriptors. I suspect that use of Network.AWS.Response.receiveNull results in the program not closing sockets properly.

The following snippet is a trimmed down version of what I'm doing:

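{-# LANGUAGE LambdaCase #-}

-- Imports assumed for amazonka-1.6.1; they were trimmed from the original
-- snippet, as was the definition of queueUrl (a T.Text).
import Conduit ((.|))
import qualified Conduit as C
import Control.Lens (traversed, (%~), (&), (?~), (^.))
import Control.Monad (void)
import Data.Maybe (catMaybes)
import qualified Data.Text as T
import Network.AWS
import Network.AWS.SQS

main :: IO ()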
main = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    sqsSource queueUrl
      .| C.mapC snd
      .| sqsDeleteSink queueUrl
  where
    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit

sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
sqsSource queueUrl = do
  (_, msgs) <- C.lift $ recvSQS queueUrl
  C.yieldMany msgs
  sqsSource queueUrl

sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
sqsDeleteSink queueUrl = do
  C.await >>= \case
    Nothing -> pure ()
    Just receiptHandle -> do
      void $ C.lift $ delSQS queueUrl receiptHandle
      sqsDeleteSink queueUrl

recvSQS queueUrl = do
  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
  rmrs <- send rm
  let status = rmrs ^. rmrsResponseStatus
      msgs = rmrs ^. rmrsMessages & traversed %~ extract
  pure (status, catMaybes msgs)
  where
    extract msg = do
      body <- msg ^. mBody
      rh <- msg ^. mReceiptHandle
      pure (body, rh)

delSQS queueUrl receiptHandle = do
  let dm = deleteMessage queueUrl receiptHandle
  send dm

This works fine for a while, but given a queue with enough messages it will fail with something like

TransportError (HttpExceptionRequest Request {
  host                 = "sqs.eu-central-1.amazonaws.com"
  port                 = 443
  secure               = True
  requestHeaders       = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
  path                 = "/"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 0
  responseTimeout      = ResponseTimeoutMicro 70000000
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))

After some detours I found out that it's actually not a network issue, but rather that the process runs out of file descriptors. Using lsof I can see that it doesn't seem to close /any/ sockets at all, instead they get stuck in a CLOSE_WAIT state:

COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
wd-stats 88674 magnus   23u  IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus   24u  IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
wd-stats 88674 magnus   25u  IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
wd-stats 88674 magnus   26u  IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
...
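
For anyone wanting to watch for the same symptom, here is a minimal sketch of two checks: counting the process's open file descriptors, and counting CLOSE_WAIT lines in lsof output. The names countOpenFds and countCloseWait are made up for illustration, the /proc path assumes Linux, and the sample lines are shortened from the lsof output above.

```haskell
import Data.List (isInfixOf)
import System.Directory (listDirectory)

-- | Number of file descriptors currently open in this process.
-- Assumption: Linux, where /proc/self/fd has one entry per descriptor.
-- The count may include the descriptor used to read the directory itself.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- | Number of lines of lsof output that report a socket in CLOSE_WAIT.
countCloseWait :: String -> Int
countCloseWait = length . filter ("(CLOSE_WAIT)" `isInfixOf`) . lines

main :: IO ()
main = do
  fds <- countOpenFds
  putStrLn ("open file descriptors: " ++ show fds)
  -- Two lines shortened from the lsof listing in this issue.
  let sample = unlines
        [ "wd-stats 88674 magnus 23u IPv4 815196 0t0 TCP ...:60624->52.119.188.213:https (CLOSE_WAIT)"
        , "wd-stats 88674 magnus 24u IPv4 811362 0t0 TCP ...:43482->52.119.189.184:https (CLOSE_WAIT)"
        ]
  putStrLn ("sockets in CLOSE_WAIT: " ++ show (countCloseWait sample))
```

Logging countOpenFds periodically from inside the long-running consumer should show whether the number climbs with every deleted message or stays flat.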

However, if I don't delete from the queue, i.e. comment out the call to delSQS in sqsDeleteSink, then I don't see this behaviour -- the number of open sockets remains constant!

The big difference I can see between sending a ReceiveMessage and a DeleteMessage is their definition of response (part of AWSRequest). ReceiveMessage uses receiveXMLWrapper while DeleteMessage uses receiveNull.

I had a look at the implementations of both receive* functions, but quickly realised that it'll take me quite some time to make sense of them, so I thought I'd create this issue to see if someone more knowledgeable could quickly confirm or dispel my suspicion.
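
To make the suspicion concrete, here is a sketch of the general pattern using plain http-client, not amazonka's actual code. It assumes (unconfirmed here) that the problem is a response that is opened but never drained or closed. The function names leaky and careful and the example.com URL are placeholders for illustration.

```haskell
import Network.HTTP.Client
  ( Manager, Request, brConsume, defaultManagerSettings, newManager
  , parseRequest, responseBody, responseClose, responseOpen
  , responseStatus, withResponse )
import Network.HTTP.Types.Status (statusCode)

-- | Opens a response and only looks at the status. Nothing drains the body
-- or calls responseClose, so the connection is neither released back to
-- the manager nor closed promptly.
leaky :: Manager -> Request -> IO Int
leaky mgr req = do
  res <- responseOpen req mgr
  pure (statusCode (responseStatus res))

-- | Same request, but withResponse guarantees responseClose runs, and the
-- body is drained first so the connection can be reused.
careful :: Manager -> Request -> IO Int
careful mgr req =
  withResponse req mgr $ \res -> do
    _ <- brConsume (responseBody res)
    pure (statusCode (responseStatus res))

main :: IO ()
main = do
  mgr <- newManager defaultManagerSettings
  req <- parseRequest "http://example.com" -- placeholder URL
  code <- careful mgr req
  print code
```

If receiveNull behaves like leaky while receiveXMLWrapper consumes the body the way careful does, that would fit the observation that only DeleteMessage leaves sockets behind.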

@magthe
Author

magthe commented Nov 28, 2020

I should have given a bit more context. Hopefully this is enough:

/nix/store/1142wacgnqr9ia0v910r3dp9j2k8ffd4-ghc-8.8.4-with-packages/lib/ghc-8.8.4/package.conf.d
    ...
    amazonka-1.6.1
    amazonka-core-1.6.1
    amazonka-sqs-1.6.1
    ...
    base-4.13.0.0
    ...
    ghc-8.8.4
    ...
    http-client-0.6.4.1
    http-client-tls-0.3.5.3
    http-conduit-2.3.7.3
    http-types-0.12.3
    ...
    network-3.1.1.1
    ...
    xml-conduit-1.9.0.0
    ...
Linux magthecomp 5.9.10-arch1-1 #1 SMP PREEMPT Sun, 22 Nov 2020 14:16:59 +0000 x86_64 GNU/Linux

@brendanhay
Owner

OTOH, possibly related: #490. Is it possible for you to use the develop branch?

@magthe
Author

magthe commented Nov 28, 2020

Yes, #490 sounds very related, I'd even say mine might be a duplicate 😁

I'll see if I can get a develop branch built. It'll be an interesting mix of nix, Cabal, and GitHub :)

@magthe
Author

magthe commented Nov 30, 2020

It took me a little while to wrestle with nix, but now I'm convinced that this is a duplicate of #490!

Any chance of seeing a release with the fix on Hackage soon-ish?

@brendanhay
Owner

As others can attest, any promises on my end about releases will only end up in tears. Therefore, I'll mention that I am actively working weeknights on a nixified main branch and associated generator tooling. Once that's done I'll switch the default branch over and prepare a release.

@mbj

mbj commented Nov 30, 2020

@brendanhay thanks for all your efforts. I know (first hand) that thanks don't buy any beers or pay living expenses, but it's still better than nothing.

@magthe
Author

magthe commented Nov 30, 2020

There's no hurry. I have no problem using my current setup for building Amazonka from GitHub for a while. Take your time!

@magthe
Author

magthe commented Nov 30, 2020

Closing as it's a duplicate of #490.

@magthe magthe closed this as completed Nov 30, 2020