Skip to content

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jun 26, 2023

Part of the "happy eyeballs" recommendations.

Note I attempted to preserve the exception throwing behavior we had previously. i.e. if I'm the last task running and errors, I'll close the Channel{TCPSocket} with my exception which will then propagate up. We could possibly accumulate all the exceptions into a CompositeException, but meh, I'm not sure if that adds much value.

In addition to the new parallel connecting, we're also adjusting how the connect_timeout is implemented. Instead of only applying to the tcp connection, we now wrap the entire newconnection call, which means any ssl handshake timing will also count towards the timeout. We saw a case at RelationalAI where we had a reasonable connect_timeout (10s), yet then saw long requests (>50s) where all the time was reported in the connection layer. It would seem to suggest that the ssl layer is somehow getting stuck or slow, so the proposal here is that if the ssl operations also count, then we'll more aggressively cancel/restart the connection process in the case of slow ssl handshaking.

Part of the "happy eyeballs" recommendations.

Note I attempted to preserve the exception throwing behavior we had
previously. i.e. if I'm the last task running and errors, I'll close
the `Channel{TCPSocket}` with my exception which will then propagate up.
We could possibly accumulate all the exceptions into a CompositeException,
but meh, I'm not sure if that adds much value.

In addition to the new parallel connecting, we're also adjusting how
the `connect_timeout` is implemented. Instead of only applying to the
tcp connection, we now wrap the entire `newconnection` call, which means
any ssl handshake timing will also count towards the timeout. We saw a
case at RelationalAI where we had a reasonable connect_timeout (10s),
yet then saw long requests (>50s) where all the time was reported in
the connection layer. It would seem to suggest that the ssl layer is
somehow getting stuck or slow, so the proposal here is that if the ssl
operations also count, then we'll more aggressively cancel/restart
the connection process in the case of slow ssl handshaking.
@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2023

Codecov Report

Merging #1068 (9f492f4) into master (ba68375) will increase coverage by 0.13%.
The diff coverage is 100.00%.

❗ Current head 9f492f4 differs from pull request most recent head 11f41af. Consider uploading reports for the commit 11f41af to get more accurate results

@@            Coverage Diff             @@
##           master    #1068      +/-   ##
==========================================
+ Coverage   82.28%   82.42%   +0.13%     
==========================================
  Files          32       32              
  Lines        3049     3049              
==========================================
+ Hits         2509     2513       +4     
+ Misses        540      536       -4     
Impacted Files Coverage Δ
src/Connections.jl 85.55% <100.00%> (+1.21%) ⬆️
src/clientlayers/ConnectionRequest.jl 77.00% <100.00%> (+0.95%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@Drvi Drvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of notes, but I think the overall idea is great:+1:. There are also some CI failures we might need to look into.

@quinnj quinnj force-pushed the jq-refactor-connect branch from 9f492f4 to 11f41af Compare June 29, 2023 20:07
@quinnj quinnj merged commit bb12854 into master Jun 29, 2023
@quinnj quinnj deleted the jq-refactor-connect branch June 29, 2023 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants