Feature Store Runtime tcp_keep_alive not working as expected #4021

Closed
gpiotti opened this issue Feb 16, 2024 · 4 comments
Labels: bug (this issue is a confirmed bug), p2 (this is a standard priority issue), response-requested (waiting on additional information or feedback)


gpiotti commented Feb 16, 2024

Describe the bug

When calling put_record on the Feature Store runtime client, connections appear to be closed after 60 seconds of inactivity, even though the tcp_keepalive setting is enabled. As a result, any put_record call made more than 60 seconds after the previous one has to open a new connection. This leads to high latency, often exceeding 200 milliseconds.

Expected Behavior

The Feature Store runtime client should keep connections open for longer when tcp_keepalive is enabled, so that connections are not reopened frequently and latency stays low.

Current Behavior

Connections are closed after 60 seconds, so every put_record call made after a gap longer than that has to open a new connection, resulting in high latency.

Reproduction Steps

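import logging

from boto3 import Session
from botocore.config import Config

logging.basicConfig(level=logging.DEBUG)

# Enable TCP keepalive on the sockets used by this client.
config = Config(tcp_keepalive=True)

session = Session()
client = session.client("sagemaker-featurestore-runtime", config=config)

# put_arguments holds the PutRecord parameters (not shown in this report).
client.put_record(**put_arguments)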
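
To make the 60-second threshold visible, the per-call latency can be measured with different idle gaps between calls. The following is a minimal sketch, not part of the original report: `time_calls_with_gaps` is a hypothetical helper name, and it takes any zero-argument callable so that it does not depend on a particular client.

```python
import time
from typing import Callable, List, Sequence


def time_calls_with_gaps(
    call: Callable[[], object],
    gaps_s: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Invoke `call` once, then once more after each idle gap.

    Returns the latency of every invocation in milliseconds.
    """
    latencies_ms = []
    for gap in [0.0, *gaps_s]:
        if gap > 0:
            sleep(gap)  # idle period between two calls
        start = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

With the client above, it could be used as `time_calls_with_gaps(lambda: client.put_record(**put_arguments), [5, 65])`. If the behavior described in this issue holds, the call after the 65-second gap would be expected to be noticeably slower than the call after the 5-second gap.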

Possible Solution

No response

Additional Information/Context

When waiting more than 60 seconds between calls, the debug log shows:

DEBUG:urllib3.connectionpool:Resetting dropped connection: featurestore-runtime.sagemaker.us-east-1.amazonaws.com
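
Rather than scanning the full debug output by eye, the reconnects can be counted by attaching a handler to the logger that emits this message. This is a rough sketch using only the standard library; the class and function names are hypothetical, and it assumes the message keeps the "Resetting dropped connection" prefix shown above.

```python
import logging


class DroppedConnectionCounter(logging.Handler):
    """Counts 'Resetting dropped connection' records it receives."""

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        if record.getMessage().startswith("Resetting dropped connection"):
            self.count += 1


def watch_dropped_connections() -> DroppedConnectionCounter:
    """Attach a counter to the urllib3 connection pool logger."""
    counter = DroppedConnectionCounter()
    pool_logger = logging.getLogger("urllib3.connectionpool")
    pool_logger.setLevel(logging.DEBUG)  # the message is logged at DEBUG
    pool_logger.addHandler(counter)
    return counter
```

Calling `counter = watch_dropped_connections()` before the put_record calls and reading `counter.count` afterwards gives the number of times a pooled connection had to be re-established.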

SDK version used

boto3==1.33.13

Environment details (OS name and version, etc.)

macOS 13.4.1

@gpiotti added the bug and needs-triage labels Feb 16, 2024
@tim-finnigan tim-finnigan self-assigned this May 20, 2024
@tim-finnigan
Contributor

Hi @gpiotti, thanks for reaching out and for your patience here. I think the parameter you want to use here is connect_timeout, which is documented here: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html

connect_timeout (float or int) – The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds.

Can you try increasing that value when running the put_record command?

If you are still seeing the issue, please share your debug logs (with sensitive info redacted) by adding boto3.set_stream_logger('') to your script so we can investigate this further.

@tim-finnigan added the response-requested and p2 labels and removed the needs-triage label May 20, 2024
Author

gpiotti commented May 21, 2024

Hi @tim-finnigan, thanks for your suggestion. I've just tried it with no luck: the connection is still being dropped and recreated after exactly 60 seconds. Here are the redacted logs; let me know if you need any further data. These logs were captured on the put_record call made after:

  1. doing an initial put_record
  2. waiting 60 seconds

If I do not wait 60 seconds, the "Resetting dropped connection: featurestore-runtime.sagemaker.us-east-1.amazonaws.com" line doesn't appear.


2024-05-21 10:24:52.125 | hooks.py:238:_emit | Event before-parameter-build.sagemaker-featurestore-runtime.PutRecord: calling handler <function generate_idempotent_uuid at 0x158bc3740>
2024-05-21 10:24:52.126 | regions.py:498:construct_endpoint | Calling endpoint provider with parameters: {'Region': 'us-east-1', 'UseDualStack': False, 'UseFIPS': False}
2024-05-21 10:24:52.126 | regions.py:513:construct_endpoint | Endpoint provider result: https://featurestore-runtime.sagemaker.us-east-1.amazonaws.com
2024-05-21 10:24:52.130 | hooks.py:238:_emit | Event before-call.sagemaker-featurestore-runtime.PutRecord: calling handler <function add_recursion_detection_header at 0x158bc34c0>
2024-05-21 10:24:52.130 | hooks.py:238:_emit | Event before-call.sagemaker-featurestore-runtime.PutRecord: calling handler <function inject_api_version_header_if_needed at 0x158bf5260>
2024-05-21 10:24:52.131 | endpoint.py:114:make_request | Making request for OperationModel(name=PutRecord) with params: {'url_path': '/FeatureGroup/fg_name', 'query_string': {}, 'method': 'PUT', 'headers': {'Content-Type': 'application/json', 'User-Agent': '', 'body': b'', 'url': 'https://featurestore-runtime.sagemaker.us-east-1.amazonaws.com/FeatureGroup/fg_name', 'context': {'client_region': '', 'client_config': <botocore.config.Config object at 0x15e97c850>, 'has_streaming_input': False, 'auth_type': None}}
2024-05-21 10:24:52.132 | hooks.py:238:_emit | Event request-created.sagemaker-featurestore-runtime.PutRecord: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x15ee0fad0>>
2024-05-21 10:24:52.133 | hooks.py:238:_emit | Event choose-signer.sagemaker-featurestore-runtime.PutRecord: calling handler <function set_operation_specific_signer at 0x158bc3600>
2024-05-21 10:24:52.135 | auth.py:425:add_auth | Calculating signature using v4 auth.
2024-05-21 10:24:52.135 | auth.py:426:add_auth | CanonicalRequest:
PUT
/FeatureGroup/fg_name

content-type:application/json
host:featurestore-runtime.sagemaker.us-east-1.amazonaws.com

2024-05-21 10:24:52.141 | hooks.py:238:_emit | Event request-created.sagemaker-featurestore-runtime.PutRecord: calling handler <function add_retry_headers at 0x158bf59e0>
2024-05-21 10:24:52.142 | endpoint.py:265:_do_get_response | Sending http request: <AWSPreparedRequest stream_output=False, method=PUT, url=https://featurestore-runtime.sagemaker.us-east-1.amazonaws.com/FeatureGroup/fg_name, headers={'Content-Type': b'application/json', 'User-Agent': b'redacted', 'X-Amz-Date': b'20240521T132452Z',  amz-sdk-request': b'attempt=1', 'Content-Length': '599'}>
2024-05-21 10:24:52.143 | httpsession.py:97:get_cert_path | Certificate path: 'redacted'
2024-05-21 10:24:52.144 | connectionpool.py:293:_get_conn | Resetting dropped connection: featurestore-runtime.sagemaker.us-east-1.amazonaws.com
2024-05-21 10:24:53.201 | connectionpool.py:547:_make_request | https://featurestore-runtime.sagemaker.us-east-1.amazonaws.com:443 "PUT /FeatureGroup/fg_name HTTP/1.1" 200 0
2024-05-21 10:24:53.204 | parsers.py:240:parse | Response headers: {'x-amzn-RequestId': '', 'Content-Type': 'application/json', 'Content-Length': '0', 'Date': 'Tue, 21 May 2024 13:24:53 GMT'}
2024-05-21 10:24:53.205 | parsers.py:241:parse | Response body:
b''
2024-05-21 10:24:53.206 | hooks.py:238:_emit | Event needs-retry.sagemaker-featurestore-runtime.PutRecord: calling handler <botocore.retryhandler.RetryHandler object at 0x15ee1bfd0>
2024-05-21 10:24:53.208 | retryhandler.py:211:__call__ | No retry needed.

@tim-finnigan
Contributor

Thanks for following up. After searching for related issues internally, I found one where the service team mentioned that this limit is imposed on their side and that this is expected behavior. I want to reach out to the SageMaker Feature Store team regarding this issue, to see if they could increase that 60-second limit or at least document the limitation.

I created a new issue for this (aws/aws-sdk#752) in our cross-SDK repository, since APIs like this are used across SDKs. Please refer to that issue for updates going forward, and feel free to add any additional comments there.
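
This thread does not propose a workaround. One approach sometimes used when a server closes idle connections is to send a cheap request at an interval shorter than the idle limit, so that the pooled connection is never idle for the full 60 seconds. The sketch below is only an illustration under that assumption: `KeepWarm` and `ping` are hypothetical names, nothing here is confirmed by the service team, and whether a background request actually reuses the same pooled connection would need to be verified.

```python
import threading
from typing import Callable, Optional


class KeepWarm:
    """Calls `ping` every `interval_s` seconds on a background thread."""

    def __init__(self, ping: Callable[[], object], interval_s: float = 45.0):
        self._ping = ping
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.last_error: Optional[BaseException] = None

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self._ping()
            except Exception as exc:  # keep the loop alive, remember the failure
                self.last_error = exc

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Here `ping` would be whatever inexpensive request is acceptable for the workload, and `interval_s` would be chosen below the 60-second limit discussed above.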


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
