Conversation

@bliuchak
Contributor

@bliuchak bliuchak commented Oct 8, 2025

This PR adds HTTPS server support to proxy-chain.

Also fixed:

  • a data race on the error event, where the same event might be logged more than once
  • usage statistics tracking

Otherwise, my changes should be fully compatible with the HTTP server and all the handlers.

Readiness checklist:

  • Implement HTTPS server
    • Basic implementation
    • Tooling to run HTTPS locally
    • Tests
  • Analysis of the current TLS overhead
    • Investigate where TLS overhead might occur (in both the legacy and HTTPS implementations)
    • Verify that TLS overhead is tracked correctly in these cases
  • Implement TLS overhead bytes tracking

- Fix a data race in the error handler
- Add a regression test that verifies the data race fix
- Add TLS defaults for better security
@github-actions github-actions bot added t-core-services Issues with this label are in the ownership of the core services team. tested Temporary label used only programatically for some analytics. labels Oct 8, 2025
@bliuchak bliuchak added the t-unblocking Issues with this label are in the ownership of the unblocking team. label Oct 8, 2025
@jirimoravcik
Member

Also fixed:

  • a data race on the error event, where the same event might be logged more than once
  • usage statistics tracking

Could you please point me to the changes that are related to the fixes? Thanks. Also, what was wrong with the statistics?

@bliuchak
Contributor Author

bliuchak commented Oct 9, 2025

Also fixed:

  • a data race on the error event, where the same event might be logged more than once
  • usage statistics tracking

Could you please point me to the changes that are related to the fixes? Thanks. Also, what was wrong with the statistics?

  1. Datarace
    1. Fix - e6adb19#diff-8a8ae07582c9d433ec8c2e5c4310ff8901e604f4965c5b90a49117ad46c47595R335
    2. Regression tests - https://github.com/apify/proxy-chain/pull/602/files#diff-d14cbfb50ed1cad7db5f4fef6a6076961b7cc9be980a3be06a70998f0eb8ebceR1456-R1599
  2. Statistics
    1. Fix - 313f535#diff-8a8ae07582c9d433ec8c2e5c4310ff8901e604f4965c5b90a49117ad46c47595R658-R659
    2. Regression tests - https://github.com/apify/proxy-chain/pull/602/files#diff-d14cbfb50ed1cad7db5f4fef6a6076961b7cc9be980a3be06a70998f0eb8ebceR830-R871

Also, what was wrong with the statistics?

I don't remember with 100% certainty right now, but a few tests failed in the HTTPS scenarios. I believe there were some issues related to undefined values in the statistics.

Member

@jirimoravcik jirimoravcik left a comment

Looks good, I had a few comments.
In addition to that, could you please:

  1. Bump the package version
  2. Describe all the new things in README.md (which serves as the primary user-facing documentation)

Thanks

@bliuchak
Contributor Author

@jirimoravcik @lewis-wow Guys, I've added the main logic for tracking TLS overhead bytes. Please take a look 🙏

I'm going to polish the tests in the meantime and push them ASAP.


// Check once per connection for socket._parent availability.
if (this.serverType === 'https') {
const rawSocket = socket._parent;
Member

It may be worth asking in https://github.com/nodejs/node about the safety of using this private property.

Member

I guess I know the answer :) As long as unit tests cover the eventuality that this gets removed, I think we're good.

Member

It’s not just about the presence of the _parent property, but about the overall usability for stats tracking. I mean, if you consider yourself an expert on the TLS implementation in Node.js, that’s great. :)

Contributor Author

@bliuchak bliuchak Oct 23, 2025

fyi: nodejs/help#5111 🤞

Contributor

@lewis-wow lewis-wow left a comment

Nice! Have nothing to add.

Member

@jirimoravcik jirimoravcik left a comment

Found a few more points for discussion

Comment on lines +231 to +249
if (options.serverType === 'https') {
    if (!options.httpsOptions) {
        throw new Error('httpsOptions is required when serverType is "https"');
    }

    // Apply secure TLS defaults (user options can override)
    // This prevents users from accidentally configuring insecure TLS settings
    const secureDefaults: https.ServerOptions = {
        ...HTTPS_DEFAULTS,
        honorCipherOrder: true, // Server chooses cipher (prevents downgrade attacks)
        ...options.httpsOptions, // User options override defaults
    };

    this.server = https.createServer(secureDefaults);
    this.serverType = 'https';
} else {
    this.server = http.createServer();
    this.serverType = 'http';
}
Member

Maybe validate that options.serverType is one of http or https? It would make it consistent with the type. I'd also set it to http by default in the constructor parameter. That way you could just do this.serverType = options.serverType.

Contributor Author

Good point. Gonna add this.
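
A minimal sketch of the validation suggested in this thread, assuming a standalone helper. The names ServerType, ProxyServerOptions, and resolveServerType are hypothetical and do not come from the PR; only the 'http' and 'https' values and the httpsOptions error message are taken from the excerpt above.

```typescript
// Hypothetical names for illustration only; none of them are from the PR.
type ServerType = 'http' | 'https';

interface ProxyServerOptions {
    serverType?: string; // untyped callers may pass anything at runtime
    httpsOptions?: Record<string, unknown>;
}

const SERVER_TYPES: readonly string[] = ['http', 'https'];

function isServerType(value: string): value is ServerType {
    return SERVER_TYPES.includes(value);
}

// Defaults to 'http', rejects unknown values, and keeps the existing
// requirement that 'https' comes with httpsOptions.
function resolveServerType(options: ProxyServerOptions = {}): ServerType {
    const serverType = options.serverType ?? 'http';
    if (!isServerType(serverType)) {
        throw new Error(`serverType must be "http" or "https", got "${serverType}"`);
    }
    if (serverType === 'https' && !options.httpsOptions) {
        throw new Error('httpsOptions is required when serverType is "https"');
    }
    return serverType;
}
```

With a helper like this, the constructor could assign the validated value once and then branch on it to create the server.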

socket.proxyChainErrorHandled = true;

// Log errors only if there are no user-provided error handlers
if (this.listenerCount('error') === 0) {
Member

There was === 1 before, why is it === 0 now?

Contributor Author

If I understand correctly, the previous condition this.listenerCount('error') === 1 (which counts listeners on the server, not on the socket) made the app log the error only when there was exactly one server handler. Right?

If I'm correct, then in that case the error is already handled by that one server handler. For us it might be more useful to log here when there are no server handlers at all, i.e. this.listenerCount('error') === 0.

What do you think?
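
To make the two conditions concrete, here is a small sketch of how listenerCount('error') behaves on a plain Node.js EventEmitter. The helper shouldLogError is a hypothetical name used only for illustration; it is not the PR's code, and it says nothing about which listeners the real server registers internally.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical helper: log only when nobody has subscribed to 'error'.
function shouldLogError(emitter: EventEmitter): boolean {
    return emitter.listenerCount('error') === 0;
}

const server = new EventEmitter();

// No 'error' listeners are registered yet, so the count is 0.
const beforeUserHandler = shouldLogError(server);

// A user-provided handler raises the count to 1.
server.on('error', () => { /* user handles the error */ });
const afterUserHandler = shouldLogError(server);

console.log(beforeUserHandler, afterUserHandler); // prints: true false
```

Under the === 1 condition the result would be the opposite for this emitter: nothing logged before the user handler is attached, and logging once exactly one handler exists.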

Comment on lines +752 to 778
  // Socket contains application bytes only.
  let srcTxBytes = socket.bytesWritten ?? 0;
  let srcRxBytes = socket.bytesRead ?? 0;

  if (this.serverType === 'https' && socket.tlsOverheadAvailable) {
      /* eslint no-underscore-dangle: ["error", { "allow": ["_parent"] }] */
      // Access underlying raw socket to get total bytes (app + TLS overhead).
      const rawSocket = socket._parent;
      if (rawSocket && typeof rawSocket.bytesWritten === 'number' && typeof rawSocket.bytesRead === 'number') {
          if (rawSocket.bytesWritten >= socket.bytesWritten && rawSocket.bytesRead >= socket.bytesRead) {
              srcTxBytes = rawSocket.bytesWritten;
              srcRxBytes = rawSocket.bytesRead;
          } else {
              // This should never happen, log for debugging.
              this.log(connectionId, `Warning: TLS overhead count error.`);
          }
      }
  }

  const targetStats = getTargetStats(socket);

- const result = {
-     srcTxBytes: socket.bytesWritten,
-     srcRxBytes: socket.bytesRead,
-     trgTxBytes: targetStats.bytesWritten,
-     trgRxBytes: targetStats.bytesRead,
+ return {
+     srcTxBytes, // HTTP: app only, HTTPS: total (app + TLS overhead)
+     srcRxBytes, // HTTP: app only, HTTPS: total (app + TLS overhead)
+     trgTxBytes: targetStats?.bytesWritten,
+     trgRxBytes: targetStats?.bytesRead,
  };
Member

Just an idea: why don't we store the _parent socket in this.connections? Looking at the logic here in getConnectionsStats, you just fall back to the original socket for stats. That makes me question why we would ever want to use the TLS socket for connection tracking.
That gives me another idea: if you do this.server.on('connection') for HTTPS, you should be able to reach the original Socket without using _parent, right?

Contributor Author

@bliuchak bliuchak Nov 3, 2025

Just an idea: why don't we store the _parent socket in this.connections? Looking at the logic here in getConnectionsStats, you just fall back to the original socket for stats. That makes me question why we would ever want to use the TLS socket for connection tracking.

tl;dr: It's possible, but it creates more issues than it solves.

HTTPS servers create two separate JavaScript objects:

  • Raw socket (TCP layer) - created first
  • TLS socket (wraps raw) - created after handshake

On an HTTPS server, Node.js HTTP events (request, connect) always give us the TLS socket. This is the same object that handlers receive and attach metadata to (proxyChainId, Symbols).

If we stored the raw socket in connections:

  • The different object identity (raw vs. TLS socket) would make all metadata lookups fail (proxyChainId undefined, Symbol not found).
  • We would need to maintain a raw-to-TLS mapping because of the Symbol lookup in getConnectionStats() (this function would get the raw socket from this.connections, but stats are tracked on the TLS socket in handlers).
  • Event handlers would be attached to the raw socket in registerConnection() (close, timeout) and onConnection() (error), whereas for HTTPS Node.js emits these events on the TLS socket.

We store the socket that HTTP handlers actually use. Accessing _parent once for stats is simpler than complex socket bridging everywhere else.
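
The fallback discussed in this thread can be sketched as a pure function, assuming (as the reviewed code does) that the TLS socket's counters hold application bytes and the raw socket's counters hold the total. ByteCounters and pickSourceBytes are hypothetical names, and the numbers in the usage lines are made up for illustration.

```typescript
// Structural stand-in for the bytesWritten/bytesRead counters on a socket.
interface ByteCounters {
    bytesWritten?: number;
    bytesRead?: number;
}

interface SourceBytes {
    srcTxBytes: number;
    srcRxBytes: number;
    includesTlsOverhead: boolean;
}

// Hypothetical helper: prefer the raw socket's totals, but only when both
// counters are numbers and are not smaller than the application counters.
function pickSourceBytes(tlsSocket: ByteCounters, rawSocket?: ByteCounters): SourceBytes {
    const appTx = tlsSocket.bytesWritten ?? 0;
    const appRx = tlsSocket.bytesRead ?? 0;
    const rawTx = rawSocket?.bytesWritten;
    const rawRx = rawSocket?.bytesRead;

    if (typeof rawTx === 'number' && typeof rawRx === 'number' && rawTx >= appTx && rawRx >= appRx) {
        return { srcTxBytes: rawTx, srcRxBytes: rawRx, includesTlsOverhead: true };
    }
    // Raw socket missing or inconsistent: fall back to application bytes.
    return { srcTxBytes: appTx, srcRxBytes: appRx, includesTlsOverhead: false };
}

// Illustrative values only.
const withOverhead = pickSourceBytes({ bytesWritten: 100, bytesRead: 50 }, { bytesWritten: 130, bytesRead: 90 });
const withoutRaw = pickSourceBytes({ bytesWritten: 100, bytesRead: 50 });
```

Keeping the selection in one function makes the single _parent access the only place that depends on the private property.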

That gives me another idea: if you do this.server.on('connection') for HTTPS, you should be able to reach the original Socket without using _parent, right?

Yes, we can obtain the raw socket directly via the connection event, but we still need both sockets, since the TLS socket is used in handlers. In that case, instead of accessing tlsSocket._parent once for stats, we would need to create and maintain a bidirectional mapping between raw and TLS sockets.
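
For a sense of what that bidirectional mapping would involve, here is a rough sketch using two WeakMaps. SocketPairs is a hypothetical class, plain objects stand in for the sockets, and how the two sockets of one connection would actually be paired up is deliberately left out.

```typescript
// Hypothetical bookkeeping class: one WeakMap per lookup direction, so
// entries disappear once the socket objects are garbage collected.
class SocketPairs<Raw extends object, Tls extends object> {
    private readonly rawToTls = new WeakMap<Raw, Tls>();
    private readonly tlsToRaw = new WeakMap<Tls, Raw>();

    // Both directions must be updated together to stay consistent.
    link(raw: Raw, tls: Tls): void {
        this.rawToTls.set(raw, tls);
        this.tlsToRaw.set(tls, raw);
    }

    rawFor(tls: Tls): Raw | undefined {
        return this.tlsToRaw.get(tls);
    }

    tlsFor(raw: Raw): Tls | undefined {
        return this.rawToTls.get(raw);
    }
}

// Plain objects stand in for the raw and TLS sockets.
const rawSocket = { kind: 'raw' as const };
const tlsSocket = { kind: 'tls' as const };

const pairs = new SocketPairs<typeof rawSocket, typeof tlsSocket>();
pairs.link(rawSocket, tlsSocket);
```

Every code path that currently receives one socket and needs the other would have to go through lookups like these, which is the extra bridging the reply above argues against.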

4 participants