Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with Error: socket hang up #1253

Closed
Mnkras opened this issue Mar 1, 2021 · 27 comments · Fixed by #1260
Closed

Crash with Error: socket hang up #1253

Mnkras opened this issue Mar 1, 2021 · 27 comments · Fixed by #1260
Labels
bug Something isn't working community core
Milestone

Comments

@Mnkras
Copy link

Mnkras commented Mar 1, 2021

Describe the bug
One of our services has started crashing, and the last thing printed out is the following:

 Error: socket hang up
     at createHangUpError (_http_client.js:332:15)
     at Socket.socketCloseListener (_http_client.js:373:25)
     at scope.activate (/apps/ts/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19)
     at Scope._activate (/apps/ts/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14)
     at Scope.activate (/apps/ts/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
     at Socket.bound (/apps/ts/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
     at Socket.emit (events.js:203:15)
     at Socket.EventEmitter.emit (domain.js:448:20)
     at TCP._handle.close (net.js:607:12)

If I disable tracing in the init via enabled:false, then it stops crashing, we didn't see this on older versions of dd-trace.

The exit status we get is 1

Environment

  • Operation system:
    Ubuntu 20.04
  • Node version:
    v10.19.0
  • Tracer version:
    v0.30.6
  • Agent version:
    Agent (v7.24.1)
@Mnkras Mnkras added the bug Something isn't working label Mar 1, 2021
@superwesker1988
Copy link

We might have just encountered the similar issue after the recent package update. Our error stack is like this:

/srv/package/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:18
      throw e
      ^


Error: socket hang up
    at connResetException (internal/errors.js:609:14)
    at Socket.socketCloseListener (_http_client.js:401:25)
    at /srv/package/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
    at Scope._activate (/srv/package/node_modules/dd-trace/packages/dd-trace/src/scope/async_resource.js:47:14)
    at Scope.activate (/srv/package/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
    at Socket.bound (/srv/package/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
    at Socket.emit (events.js:326:22)
    at Socket.EventEmitter.emit (domain.js:483:12)
    at TCP.<anonymous> (net.js:675:12)
    at TCP.callbackTrampoline (internal/async_hooks.js:126:14)

dd-trace version: 0.31.0

@rochdev
Copy link
Member

rochdev commented Mar 3, 2021

What was the last version where the error didn't occur? This will help isolating the change that may be causing this.

@superwesker1988
Copy link

For us, our previous version is "0.26.1"

@Mnkras
Copy link
Author

Mnkras commented Mar 3, 2021

ours was "0.25.1"

@seokgyo
Copy link

seokgyo commented Mar 4, 2021

In our case, "0.30.1" works well, but "0.30.6" does not.

@rochdev
Copy link
Member

rochdev commented Mar 4, 2021

I've been unable to reproduce so far, so would like to rule out commits if possible. Would it be possible for anyone to try 0.30.2 in some test environment? There was a pretty big commit between 0.30.1 and 0.30.2 so I'd like to either rule it out or confirm that it might actually be the cause. Also, is this something that can be reproduced in a snippet that could be shared here?

@trajanoreuter
Copy link

@rochdev we are having the same problem with dd-trace, we have already downgraded to version 0.30.1 and 0.29.0.
Our problem comes from a call via axios that has suffered a timeout, so we were able to reproduce the behavior based on this code.

To do this, we are running node 12.20.1.
When we remove the dd-trace, the error stops.

const axios = require('axios')
const MockAdapter = require('axios-mock-adapter')

const mock = new MockAdapter(axios)

mock.onPost('/test').timeout()

const main = async () => {
  try {
    const { data } = await axios.post('/test')

    console.log(data)

  } catch (error) {
    console.log('error', error)
  }
}

main()

@Mnkras
Copy link
Author

Mnkras commented Mar 4, 2021

Great find, I can confirm that we are using axios as well.

@rochdev
Copy link
Member

rochdev commented Mar 4, 2021

@trajanoreuter Thanks for sharing this snippet! How do I use it to reproduce the issue? I get the same error Error: timeout of 0ms exceeded error regardless of if dd-trace is used or not.

@superwesker1988
Copy link

Just follow up with the issue, we did the dd-trace upgrade due to axios vulnerabilities, so our axios version for the whole service is 0.21.1. So maybe the newer axio is causing issue?

@rochdev
Copy link
Member

rochdev commented Mar 4, 2021

@superwesker1988 That would make sense, since the error reported in this thread doesn't look like it's coming from the tracer. The tracer is only involved because of the scope manager that would appear on any event stack trace. We weren't able to identify what change we did that could cause issues without changing any of our code, but updating Axios would definitely make sense. I'm still unable to replicate even with 0.21.1 however, but when I'm able to do it we should be able to quickly fix any incompatibility between dd-trace and that version of Axios.

@0xkalvin
Copy link

0xkalvin commented Mar 4, 2021

Had the same result using an Axios call as @trajanoreuter brought up

Agent: 6.24.1
dd-trace: 0.28.0
Axios: 0.20.0

timeout of 50ms exceeded

/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:18
throw e
^
Error: socket hang up
at connResetException (internal/errors.js:610:14)
at TLSSocket.socketCloseListener (_http_client.js:400:25)
at /server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14)
at Scope.activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at TLSSocket.bound (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
at TLSSocket.emit (events.js:327:22)
at net.js:674:12
at TCP.done (_tls_wrap.js:567:7)
at TCP.callbackTrampoline (internal/async_hooks.js:120:14)
Emitted 'error' event on Writable instance at:
at ClientRequest.eventHandlers.<computed> (/server/node_modules/follow-redirects/index.js:13:24)
at /server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14)
at Scope.activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at ClientRequest.bound (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
at ClientRequest.emit (events.js:327:22)
at ClientRequest.req.emit (/server/node_modules/dd-trace/packages/datadog-plugin-http/src/client.js:106:21)
at TLSSocket.socketCloseListener (_http_client.js:400:11)
at /server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14) {
code: 'ECONNRESET'
}
2021-03-04T16:35:58.397-03:00	npm ERR! code ELIFECYCLE
2021-03-04T16:35:58.397-03:00	npm ERR! errno 1

@superwesker1988
Copy link

Had the same result using an Axios call as @trajanoreuter brought up

Agent: 6.24.1
dd-trace: 0.28.0
Axios: 0.20.0

timeout of 50ms exceeded

/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:18
throw e
^
Error: socket hang up
at connResetException (internal/errors.js:610:14)
at TLSSocket.socketCloseListener (_http_client.js:400:25)
at /server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14)
at Scope.activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at TLSSocket.bound (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
at TLSSocket.emit (events.js:327:22)
at net.js:674:12
at TCP.done (_tls_wrap.js:567:7)
at TCP.callbackTrampoline (internal/async_hooks.js:120:14)
Emitted 'error' event on Writable instance at:
at ClientRequest.eventHandlers.<computed> (/server/node_modules/follow-redirects/index.js:13:24)
at /server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14)
at Scope.activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:12:19)
at ClientRequest.bound (/server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:53:20)
at ClientRequest.emit (events.js:327:22)
at ClientRequest.req.emit (/server/node_modules/dd-trace/packages/datadog-plugin-http/src/client.js:106:21)
at TLSSocket.socketCloseListener (_http_client.js:400:11)
at /server/node_modules/dd-trace/packages/dd-trace/src/scope/base.js:54:19
at Scope._activate (/server/node_modules/dd-trace/packages/dd-trace/src/scope/async_hooks.js:54:14) {
code: 'ECONNRESET'
}
2021-03-04T16:35:58.397-03:00	npm ERR! code ELIFECYCLE
2021-03-04T16:35:58.397-03:00	npm ERR! errno 1

This is almost identical to the issue we have encountered.

@trajanoreuter
Copy link

trajanoreuter commented Mar 4, 2021

@rochdev in our case we have the problem related to the socket hang up while our application is running and collecting metrics.
We have this configuration at the application entrypoint

require('dd-trace').init({ runtimeMetrics: true })

And we are using "axios": "0.21.1" .

We also tried with different agents, 7.25.1, 7.26.0 and 6.24.1 like @0xkalvin said.

@rochdev
Copy link
Member

rochdev commented Mar 4, 2021

Actually it looks like axios was updated in dd-trace 0.29.2, but there are several reports in this thread that a few versions after that didn't have the issue, so I'm not sure it's directly related, or at least it wouldn't be the only cause.

@trajanoreuter
Copy link

@rochdev Is there anything that can be done to avoid this problem until we don't know how to fix and what is the cause?

@rochdev
Copy link
Member

rochdev commented Mar 5, 2021

A few things potentially, none of which are ideal but could temporarily mitigate the issue:

  • As it looks like the http integration is involved, disabling the http plugin might help. You will still see server requests but not client requests. This can be done with tracer.use('http', { client: false }). If the error still occurs without the http plugin it may indicate an issue with the exporter instead.
  • If it is indeed somehow related to a new version of axios, fixing the range to an older version might work. The tracer itself doesn't use axios unless profiling is enabled, so if the issue is an incompatibility with newer versions of axios it won't trigger it when profiling is disabled (the default).

I'm still trying to replicate this, to no avail so far. If anyone has a snippet that I could run to replicate this then it should be a quick fix. Unfortunately, for this type of issue, replicating it is usually the most difficult part.

@superwesker1988
Copy link

superwesker1988 commented Mar 5, 2021

One detail in our case I would like to add was that the issue constantly happened in a dev env where our Data Dog APM was disabled (yeah, our service was mis configured in that env). Not sure if this would help.

@rochdev
Copy link
Member

rochdev commented Mar 5, 2021

@superwesker1988 Can you provide more details about how it was disabled? Was it through configuration, some condition, disabled in the agent, etc? There are reports above of the issue not happening when dd-trace is disabled so I'm trying to get a sense of what seems to trigger it.

@superwesker1988
Copy link

Hi @rochdev , I talked to our infra teams, it seems like it was done via configuration:

enable_datadog_apm: "false"
enable_datadog_traceid_loginjection: "true"
dd_trace_disabled_plugins: "fs,dns,redis"
dd_trace_samplerate: "1"

@rochdev
Copy link
Member

rochdev commented Mar 5, 2021

I was just able to reproduce the issue, looking into the root cause now.

@rochdev
Copy link
Member

rochdev commented Mar 5, 2021

Turns out the issue was a bug in the tracer that was triggered by follow-redirects/follow-redirects@ad15f9b. Working on a fix now and will release a new version as soon as possible. Thanks all for your help and patience on this issue.

@lucianopf
Copy link

First of all, thanks for all the effort put into this fix @rochdev .

Do we have a ETA for this PR to be applied?

@rochdev
Copy link
Member

rochdev commented Mar 8, 2021

@lucianopf I'm working on the release right now, so should be within the next hour.

@rochdev rochdev added this to the 0.31.1 milestone Mar 8, 2021
@rochdev
Copy link
Member

rochdev commented Mar 8, 2021

Fix released in 0.31.1

@vladimirnani
Copy link

Hey, im still getting timeout while using https://github.com/DataDog/datadog-lambda-js/tree/v4.66.0

Error: timeout of 1000ms exceeded
    at createError (/opt/nodejs/node_modules/axios/lib/core/createError.js:16:15)
    at RedirectableRequest.handleRequestTimeout (/opt/nodejs/node_modules/axios/lib/adapters/http.js:328:16)
    at RedirectableRequest.emit (events.js:400:28)
    at RedirectableRequest.emit (domain.js:475:12)
    at Timeout.<anonymous> (/opt/nodejs/node_modules/follow-redirects/index.js:164:12)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7) {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [Function: httpAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 1000,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],

@rochdev
Copy link
Member

rochdev commented Nov 30, 2021

@vladimirnani This is a different error so I would say to open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working community core
Projects
None yet
8 participants