Unhandled raw socket ECONNRESET before normal pg error handling crashes the process #3658

@arvinds

Hi, I ran into a worker crash loop using pg through pg-boss and pg.Pool on Node 20. There appears to be a window at connect time in which the raw socket can emit an error before pg's normal error handling is attached.

What I observed

  • periodic worker crashes with uncaught ECONNRESET
  • the crash signature consistently looked like a raw Socket with:
    • bytesRead: 0
    • bytesWritten: 20
    • no socket error listeners attached
  • this looked like a brand new PostgreSQL connection attempt that reset before the usual pg client error path was ready

Why I think this belongs in pg rather than pg-boss

  • the failure appears to happen during raw socket connect, before a client is fully established
  • in my app, adding listener coverage at the pg-boss or pool-client level was not sufficient
  • pg@8.11.5 lib/stream.js creates the socket with new net.Socket()
  • pg@8.11.5 lib/connection.js calls stream.connect(...) before attaching stream.on("error", reportStreamError)

What I tried

  • adding listener coverage after pool or client connect was not sufficient
  • I also tried a local pg patch in this area, but on its own it did not stop the crashes in my environment
  • the mitigation that consistently stopped the crash loop was a worker-level guard that suppresses only raw-socket ECONNRESET when listenerCount("error") === 0
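
For reference, here is a minimal sketch of what such a worker-level guard could look like. It is an illustration only, not the exact code from my app: the helper name installRawSocketResetGuard and the onSuppressed callback are hypothetical, and it assumes that wrapping net.Socket.prototype.emit is acceptable in the worker process. It swallows an "error" event only when the code is ECONNRESET and the socket has no error listeners; every other case goes through the original emit.

```javascript
const net = require("node:net");

// Hypothetical helper (not part of pg or pg-boss): wraps
// net.Socket.prototype.emit so that an ECONNRESET emitted on a socket
// with no "error" listeners is reported instead of thrown.
function installRawSocketResetGuard(onSuppressed = () => {}) {
  const originalEmit = net.Socket.prototype.emit;

  net.Socket.prototype.emit = function guardedEmit(event, ...args) {
    const err = args[0];
    const isUnhandledReset =
      event === "error" &&
      err != null &&
      err.code === "ECONNRESET" &&
      this.listenerCount("error") === 0;

    if (isUnhandledReset) {
      // Report the same counters that show up in the crash signature.
      onSuppressed(err, {
        bytesRead: this.bytesRead,
        bytesWritten: this.bytesWritten,
      });
      return false; // emit() also returns false when nothing listened
    }
    return originalEmit.call(this, event, ...args);
  };

  // Return an uninstall function so the guard can be removed again.
  return () => {
    net.Socket.prototype.emit = originalEmit;
  };
}
```

In a worker this would be installed once at startup, for example installRawSocketResetGuard((err, stats) => console.warn("suppressed raw socket reset", stats)). It is a process-wide workaround and hides the symptom rather than fixing the ordering inside pg.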

Relevant observed signature

  • raw Socket
  • ECONNRESET
  • bytesRead: 0
  • bytesWritten: 20

Environment

  • Node 20
  • pg 8.11.5
  • usage via pg-boss / pg.Pool

Possible fix
One possible direction would be to protect the raw socket before connect() can emit an error, for example by either:

  • attaching the socket error listener before calling stream.connect(...) in pg/lib/connection.js
  • or ensuring sockets created in pg/lib/stream.js have temporary early error handling until the normal pg connection error path is attached
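
To make the first option concrete, here is a rough sketch of the ordering I have in mind, written against a plain net.Socket rather than pg's actual internals. The function name connectWithEarlyErrorHandler and its options object are hypothetical and only for illustration. I have not confirmed that this ordering alone closes the window in pg, and as noted above my own local patch in this area was not sufficient.

```javascript
const net = require("node:net");

// Hypothetical helper, not pg code: attach the "error" listener first,
// then start the connection, so the socket never has zero error
// listeners while a connect attempt is in flight.
function connectWithEarlyErrorHandler({ host, port, onError, socket = new net.Socket() }) {
  socket.on("error", onError); // listener is in place before connect()
  socket.connect(port, host);  // any later error event reaches onError
  return socket;
}
```

The socket parameter exists only so a caller (or a test) can pass in its own stream, similar to how pg allows a custom stream to be supplied.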

Questions

  • is this a known connect-time race in pg?
  • is there a recommended way to ensure the raw socket is protected before connect() can emit an error?
  • would maintainers be open to a change here, or is there an existing fix or workaround I missed?

I can provide a proposed patch and more detailed logs if helpful.
