
How Can We Update S3 Client after It Was Created? (Add Region to It) #5081

Closed
shirady opened this issue Aug 13, 2023 · 9 comments
Assignees
Labels
documentation This is a problem with documentation. p2 This is a standard priority issue

Comments

@shirady

shirady commented Aug 13, 2023

Describe the issue

Hello,

1) Client Region And Update Client
From reading other issues (some examples: #1802, #1807, #1845, #3469) I know that in the AWS SDK V3, we need to pass the region of the bucket the client will work on.
I thought about wrapping the commands with a "guess" of the region, then handling the error and taking the region from it (for example, when using listObjects with the wrong region we get AuthorizationHeaderMalformed, and the error contains the region).

But I don't see that we have s3.config.update (s3 is the variable that holds the instance of S3). I know that we don't have the global update, but I'm looking to update a specific instance.
I would add that, at this point in the code, I don't have the parameters that created this s3, so I just want to update it with the right region.
I don't mind creating a new one - but I need to find a way to extract the map of params from the existing client.

What do you suggest as a solution?
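The error-driven fallback described in (1) could be sketched roughly as follows. The helper name and regex below are purely illustrative (they are not part of the SDK); the regex matches the AuthorizationHeaderMalformed message format quoted elsewhere in this thread, and in real code the extracted region would be used to construct a fresh client, e.g. new S3Client({ region }).

```javascript
// Illustrative helper (not an SDK API): extract the expected region from
// an AuthorizationHeaderMalformed error message such as:
//   "The authorization header is malformed; the region 'us-west-2' is
//    wrong; expecting 'us-east-1'"
function regionFromAuthError(err) {
    const message = (err && err.message) || '';
    const match = /expecting '([a-z0-9-]+)'/.exec(message);
    return match ? match[1] : null; // null when the message has no region hint
}
```

This is only a sketch of the "guess, fail, recover" idea; it depends on the exact wording of the service error message, which is not a stable contract.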

2) Detailed Breaking Changes
BTW, aside from the upgrading notes, is there more detailed documentation about breaking changes between the two versions (AWS SDK v2 and AWS SDK v3)? I keep finding out about changes through trial and error... for example, the notes do not mention the changes in the commands that we use to perform the actions.

Thank you!

Links

https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/s3/
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-s3/Class/S3/
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-smithy-client/Class/Client/
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/introduction/#getting-started

@shirady shirady added documentation This is a problem with documentation. needs-triage This issue or PR still needs to be triaged. labels Aug 13, 2023
@RanVaknin
Contributor

Hi @shirady ,

While S3 operates on a global namespace, meaning that bucket names are unique across AWS regardless of region, the buckets themselves are created in specific regions. When you interact with the S3 API, you're required to make a request to an S3 regional endpoint, and these endpoints are region-based.

I would add that, at this point in the code, I don't have the parameters that created this s3, so I just want to update it with the right region.

As a best practice, it's recommended to configure your client with both credentials and the region ahead of time. This ensures predictability when your application makes API calls. So for example, if you are running the SDK code in a container, the container needs to inject the correct region and credentials ahead of the client creation, same goes for an EKS pod, or an AWS Lambda.

Regarding updating the region of an existing client:

const client = new S3Client({});
client.config.region = "us-east-1";

This is how to update the region post-instantiation. However, this approach is not recommended as it can lead to potential problems, especially if the client is shared across different parts of the application.

I don't mind creating a new one - but I need to find a way to extract the map of params from the existing client.

Creating a new client is not recommended, unless your application makes requests to different regions, e.g.:
Your app gets objects from buckets in us-east-1 and also puts objects into buckets in eu-west-2 - only then would you create multiple instances of the S3 client:

const clientForUSEast1 = new S3Client({ region: "us-east-1" });
const clientForEUWest2 = new S3Client({ region: "eu-west-2" });

I want to emphasize, you should be creating your client once, in the global scope of your application, when you have the region and credentials. Avoid updating it after creation.
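When an application genuinely talks to buckets in several regions, one common pattern (a sketch, not something prescribed in this thread) is to memoize one client per region, so that each client is still created only once. The factory function is injected here so the sketch runs without the SDK; in practice it would be region => new S3Client({ region }).

```javascript
// Cache at most one client per region. createClient is injected so this
// sketch has no SDK dependency; in practice you would pass
// region => new S3Client({ region }).
function makeClientCache(createClient) {
    const clients = new Map();
    return function getClient(region) {
        if (!clients.has(region)) {
            clients.set(region, createClient(region));
        }
        return clients.get(region);
    };
}
```

This keeps the "create once, in the global scope" property per region, instead of constructing a new client per request.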

As far as breaking changes go, I personally don't think a change from s3.config.update to s3.config.region really constitutes a breaking change. The purpose of releasing a major version is to address bug fixes that would have been breaking changes in a minor version. Thus, upgrading to a new major version inherently means that there will be changes in behavior.

@trivikr created a codemod, a tool that takes v2 code and converts it to v3. Please note this tool is in the early stages of development and is not perfect.

Let me know if there's anything I can help with.
Thanks,
Ran~

@RanVaknin RanVaknin self-assigned this Aug 14, 2023
@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Aug 14, 2023
@shirady
Author

shirady commented Aug 14, 2023

Hi @RanVaknin ,
Thank you for the detailed explanations!

I would like to respond to the different subjects separately, so I numbered them (hoping that makes it easier to track).

  1. I understand that the recommended way is to get the credentials and region when I create the s3 client, but I have many clients that already share the credentials, and asking for the region just because we are upgrading the SDK version is not the best user experience.

However, this approach is not recommended as it can lead to potential problems, especially if the client is shared across different parts of the application.

I'm planning to use this s3.config.region in the case where I don't have the region (after the first operation I would have it). I would like to better understand the potential problems you mentioned - could you explain and give examples, please? (I'm mainly using the s3 service.)

  2. listBuckets (closed issue Which Region Is Needed for Listbuckets? #5082). I found out that if I use the explicit endpoint address "https://s3.amazonaws.com" together with the region, I get the error The authorization header is malformed; the region 'us-west-2' is wrong; expecting 'us-east-1'. I get the same error when using an endpoint with the region inside ("https://s3.us-west-1.amazonaws.com"), but if I remove the endpoint address from the s3 client (while still passing a region) it is fine. I don't understand it.

  3. About my question on breaking changes, I didn't mean this config specifically but the general use of actions. For example, the getObject response changed, as did the way to convert it to a string, as I learned from the issues. I just wondered if there is more detailed documentation that maybe I missed, but I guess not.

@RanVaknin
Contributor

Hi @shirady ,

  1. Since you didn't share any code, it's hard for me to advise you on how to refactor it to fit the v3 style.
    Having multiple clients that share credentials can introduce complexities. This is true in v2 as well, as each client maintains its own connections and sockets. When you create multiple clients, you can quickly run into connection timeout issues, as your Node.js application's connection pool is limited. This may not be apparent now, but as your application scales it can lead to hard-to-debug problems. I highly suggest you spend some time refactoring your code while migrating. I'd be happy to advise you if you share an actual code snippet.

  2. It's hard for me to say because I don't see your actual code. My guess is that you are specifying an endpoint like https://s3.us-west-1.amazonaws.com but the bucket you specified in your request is in a different region like us-east-1.
    This is a service-side error, not an SDK-version-specific one. The behavior in v2 would have been the same.

The power of the SDK client is that you almost never need to configure the endpoint parameter yourself. You specify the client (s3) and the region (us-east-1) and the sdk client will take care of building the correct endpoint for you.
When you specify both the endpoint and the region, the SDK will use the endpoint you provided and ignore the region. They are designed to be mutually exclusive.

Typically, there is no need to supply an endpoint parameter unless you explicitly want to route your request to a specific service regional endpoint, or when working with an S3 clone for local development like LocalStack or DigitalOcean Spaces where you'd route your request to something like a local IP address.
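For the S3-compatible case, the client parameters might look like the sketch below. The endpoint URL, region string, and credentials are all placeholder values (4566 is commonly LocalStack's default edge port, but check your own setup); forcePathStyle is typically needed because such servers often do not support virtual-hosted-style bucket addressing.

```javascript
// Sketch: parameters for an S3 client pointed at an S3-compatible server.
// All values below are placeholders for illustration.
const localstackParams = {
    endpoint: 'http://localhost:4566', // placeholder local endpoint
    region: 'us-east-1',               // still required by the SDK's signer
    forcePathStyle: true,              // bucket name in the path, not the host
    credentials: {
        accessKeyId: 'test',           // placeholder credentials
        secretAccessKey: 'test',
    },
};
// In real code: const client = new S3Client(localstackParams);
```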

  3. In v3, the getObject method was revised to return a readable stream for object bodies, aligning with modern Node.js best practices for efficient data handling. This change considerably improves memory efficiency, especially when dealing with large S3 objects, as streams allow data to be sent sequentially instead of holding large objects in memory.

Most of the current members of the JS SDK team have started working on this team after v3 launched. While flip-flopping between the two versions is easy for us, it can prove challenging for developers who worked with v2 for years and are only making the change now. We are slowly learning about the challenges of migrating as more people move to v3.

I have already added some backlog items for us to add to the migrating guide. If you have anything in mind that was missing please let me know so I can discuss it with the team.

Thanks again,
Ran~

@RanVaknin RanVaknin added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Aug 14, 2023
@shirady
Author

shirady commented Aug 15, 2023

Hi @RanVaknin ,

  1. Thank you for the information; I was not aware of this (if you can attach links with more information about it, that would be great). I have a place in the code that creates an s3 client for the same credentials, and now I understand that it is better to re-design it in the future (regardless of the migration).

  2. Attached is a script so you can easily run it on your machine. You will need to set the environment variables as described (ACK, SACK), then run it with node path-to-file.
    Run it twice:
    1 - As-is (no change): it fails with FAIL check_list_buckets got the error AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-west-2' is wrong; expecting 'us-east-1'.
    2 - With the endpoint address removed: it succeeds (you'll see all the buckets and the printed SUCCESS check_list_buckets).

'use strict';

const { S3 } = require('@aws-sdk/client-s3');
const https = require('https');
const { NodeHttpHandler } = require('@aws-sdk/node-http-handler');

async function main() {
    const s3_client_params = {
        endpoint: 'https://s3.amazonaws.com',
        region: 'us-west-2',
        credentials: {
            accessKeyId: process.env.ACK,
            secretAccessKey: process.env.SACK,
        },
        forcePathStyle: true,
        signatureVersion: 'v4',
        requestHandler: new NodeHttpHandler({
            httpsAgent: new https.Agent({
                keepAlive: true,
                rejectUnauthorized: false,
            })
        }),
    };

    try {
        const s3 = new S3(s3_client_params);
        const res = await s3.listBuckets({});
        console.log(res);
        console.log('SUCCESS check_list_buckets');
    } catch (err) {
        console.error('FAIL check_list_buckets got error', err);
    }
}

main();

I'm aware that it searches for the region in different places (here), but mentioning it explicitly in the s3 client takes precedence (in my AWS config ~/.aws/config (macOS) the region is 'us-west-1'):

If a Region is passed to a client class constructor, that Region is used.

  3. Endpoint address

The power of the SDK client is that you almost never need to configure the endpoint parameter yourself. You specify the client (s3) and the region (us-east-1) and the sdk client will take care of building the correct endpoint for you.

In the code we set the endpoint; we have cases where the endpoint is not AWS, but I think that in the AWS cases I will remove it, thank you. In cases where the endpoint is not AWS, why are we still required to add the region?

  4. Migration guide

I have already added some backlog items for us to add to the migrating guide. If you have anything in mind that was missing please let me know so I can discuss it with the team.

Where can I see it? I can have a look at it and add suggestions related to s3.

@RanVaknin
Contributor

RanVaknin commented Aug 15, 2023

Hi @shirady ,

You are seeing this error because when using the global S3 endpoint s3.amazonaws.com for ListBuckets, the signature in the authorization header should be made with the us-east-1 region, irrespective of the client's regional configuration.

While regional endpoints align with the specified region in the SDK and won't cause a mismatch, the global endpoint specifically anticipates the 'us-east-1' signature. Thus, specifying a region like us-west-2 with the global endpoint results in an AuthorizationHeaderMalformed error. The solution is to configure the client's region as 'us-east-1' when using the global endpoint, or remove the endpoint field all together and let the SDK form the regional endpoint implicitly based on the supplied region.
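The two working configurations described above could be sketched as plain parameter objects (credentials omitted; either object would be passed to new S3Client(...)):

```javascript
// Option A: keep the global endpoint, but sign for us-east-1,
// the region the global endpoint expects.
const globalEndpointParams = {
    endpoint: 'https://s3.amazonaws.com',
    region: 'us-east-1',
};

// Option B (recommended above): drop the endpoint entirely and let the
// SDK derive the regional endpoint from the region alone.
const regionalParams = {
    region: 'us-west-2',
};
```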

I have already checked the S3 docs, and this is not mentioned explicitly, but it can be inferred from the error:

FAIL check_list_buckets got error AuthorizationHeaderMalformed: The authorization header is malformed; the region 'us-west-2' is wrong; expecting 'us-east-1'

I converted this code to v2 and can see that in v2 the request fails the same way, but SDK v2 redirects after the failure. My guess is that it uses the region field returned in the response body.
This can be seen with the following:

'use strict';

const AWS = require('aws-sdk');
const https = require('https');

AWS.config.update({
    endpoint: 'https://s3.amazonaws.com',
    region: 'us-west-2',
    s3ForcePathStyle: true,
    signatureVersion: 'v4',
    httpOptions: {
        agent: new https.Agent({
            keepAlive: true,
            rejectUnauthorized: false,
        })
    }
});

const s3 = new AWS.S3();

var request = s3.listBuckets();

request.on('httpHeaders', function(statusCode, headers) {
    console.log('----- HEADERS -----');
    console.log('Status Code:', statusCode);
    for (var header in headers) {
        console.log(header, ':', headers[header]);
    }
});

request.on('httpData', function(chunk) {
    console.log('----- DATA CHUNK -----');
    console.log(chunk.toString());
});

request.send();

Will result in an initial failure (400 error) followed by a redirect to the expected region us-east-1

----- HEADERS -----
Status Code: 400
x-amz-request-id : REDACTED
x-amz-id-2 : REDACTED
content-type : application/xml
transfer-encoding : chunked
date : Tue, 15 Aug 2023 19:15:51 GMT
server : AmazonS3
connection : close
----- DATA CHUNK -----
<Error>
  <Code>AuthorizationHeaderMalformed</Code>
  <Message>The authorization header is malformed; the region 'us-west-2' is wrong; expecting 'us-east-1'</Message>
  <Region>us-east-1</Region>
  <RequestId>QEAN8Z5TPK1S7KHV</RequestId>
  <HostId>REDACTED</HostId>
</Error>
----- HEADERS -----
Status Code: 200
x-amz-id-2 : REDACTED
x-amz-request-id : REDACTED
date : Tue, 15 Aug 2023 19:15:53 GMT
content-type : application/xml
transfer-encoding : chunked
server : AmazonS3
----- DATA CHUNK -----
<?xml version="1.0" encoding="UTF-8"?>
<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">

rest of successful response.....

I don't know why the v2 SDK does this redirect implicitly, but my guess is this was a customization written after a feature request. In v3 we do not follow redirects.
The SDKs adopted a new philosophy of moving away from hand-written customizations, exactly for these reasons: they are hard to maintain and create feature-parity gaps between major versions.

So now that we know why there's a change in behavior, we can address it.
Going forward, you have two options:

  1. Remove the endpoint parameter entirely, supply your desired region, and let the SDK build the endpoint for you (recommended).
  2. When making requests to the global endpoint, use us-east-1 as the region.

Regarding refactoring your code, and documentation: it has come to our attention only recently that initializing the client in a function scope is a widespread issue, and I'm personally working on documentation to address this. So you are not wrong to assume that the current structure is correct, because there is no official documentation that says otherwise.

This is how I would re-write your code:

'use strict';

const { S3 } = require('@aws-sdk/client-s3');
const https = require('https');
const { NodeHttpHandler } = require('@aws-sdk/node-http-handler');

const s3_client_params = {
        // endpoint: 'https://s3.amazonaws.com', remove this as per suggestions from last section
        region: 'us-west-2',
        credentials: {
            accessKeyId: process.env.ACK,
            secretAccessKey: process.env.SACK,
        },
        forcePathStyle: true,
        // signatureVersion: 'v4',   you can remove this as JS SDK v3 uses sigv4 as a standard.
        requestHandler: new NodeHttpHandler({
            httpsAgent: new https.Agent({
                keepAlive: true,
                rejectUnauthorized: false,
            })
        }),
};

const s3 = new S3(s3_client_params);

async function listBucketsForAccount() {
    try {
        const res = await s3.listBuckets({});
        console.log(res);
        console.log('SUCCESS check_list_buckets');
    } catch (err) {
        console.error('FAIL check_list_buckets got error', err);
    }
}

listBucketsForAccount();

You can find more specific upgrade notes in UPGRADING.md.
If you have any specific issues you were struggling with, feel free to open a separate ticket, or even raise a PR with your suggestions for the documentation.

I hope this clarifies things.
All the best,
Ran~

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 16, 2023
@RanVaknin RanVaknin added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 16, 2023
@shirady
Author

shirady commented Aug 16, 2023

Hi Ran,
In issue #1807 I learned that in AWS SDK V2 the client is global.
Thank you for the code snippet that shows the headers and the redirect, it really helps.

Regarding the suggested options -

Remove the endpoint parameter entirely, supply your desired region, and let the SDK build the endpoint for you (recommended).
When making requests to the global endpoint, use us-east-1 as the region.

I understand the best option is supplying the region without using the endpoint, but I still want to understand the second option. As said before, we create the s3 client once for specific credentials, and we should set the region to match the bucket that we use (if we use 2 buckets in different regions, we should create 2 clients). If I use the global AWS endpoint (https://s3.amazonaws.com) and the region 'us-east-1', listBuckets would work, but the next action that involves a bucket as a required parameter would fail (for example headObject, listObjects, etc.). Hence I am not sure it is a valid option (it only solves a client that performs listBuckets).

In the code we set the endpoint; we have cases where the endpoint is not AWS, but I think that in the AWS cases I will remove it, thank you. In cases where the endpoint is not AWS, why are we still required to add the region?

I still didn't understand why we need to set the region for endpoints that are not AWS; if you can explain it, please.

Another question - I looked at the UPGRADING.md link again since you attached it, and saw that the library for requestHandler changed a couple of weeks ago (#4990).

- const { NodeHttpHandler } = require('@aws-sdk/node-http-handler');
+ const { NodeHttpHandler } = require('@smithy/node-http-handler');

How do you communicate such changes? I was not aware of it until now.

@RanVaknin
Contributor

RanVaknin commented Aug 16, 2023

Hi @shirady,

I still didn't understand why we need to set the region for endpoints that are not AWS; if you can explain it, please.

I don't see a real-world scenario where the same JS file will call a mock s3 endpoint and also the actual s3 service.
If your app needs to test something on a mock server, you should create a separate JS file, with a separate instance of the S3 client, just to interact with your mock S3 server.
With regard to handling the region with a custom endpoint, it is up to the mock server to decide how to handle it.

If I'll use the global aws endpoint (https://s3.amazonaws.com/) and the region 'us-east-1' listBuckets would work, but in the next action I'll have that involves a bucket as a required parameter it would fail (for example headObject, ListObjects, etc.). Hence I am not sure it is a valid option (solves only a client that operates listBuckets).

This is because listBuckets is designed to be a global operation - listing all the buckets in all regions (with the signature caveat that it has to be signed for us-east-1). The issue you linked about the global s3 client is aimed at solving those redirects for you and is currently being worked on.

But as I mentioned relying on the global endpoint is not recommended. The recommended solution is to let the SDK resolve the endpoint for you using the regional endpoints.

You should move to regional endpoint because:

  • Regional endpoints are faster; you are not reliant on the S3 service to direct your request to the correct partition.
  • The behavior you are relying on (redirecting implicitly to the correct region upon error) is expensive both in terms of performance and your AWS bill. Whenever you make a request that the SDK implicitly reroutes, you are effectively paying for two requests instead of one.
  • For S3 buckets in Regions launched after March 20, 2019, the DNS server doesn't route your request directly to the AWS Region where your bucket resides. It returns an HTTP 400 Bad Request error instead. For more information, see Making requests.

How do you communicate such changes? I was not aware of it until now.

When you go to upgrade your dependencies, you'll get an explicit warning telling you that you need to upgrade. For the time being this is just a warning, and we have internal re-exports making this transparent, but in the future this will be changed into an error and an explicit change will be required:

$ npm i @aws-sdk/node-http-handler
npm WARN deprecated @aws-sdk/node-http-handler@3.374.0: This package has moved to @smithy/node-http-handler

You can learn more about this change here

Thanks again,
Ran~

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Aug 17, 2023
@shirady
Author

shirady commented Aug 17, 2023

Hi @RanVaknin,

I don't see a real-world scenario where the same JS file will call a mock s3 endpoint and also the actual s3 service.
If your app needs to test something on a mock server, you should create a separate JS file, with a separate instance of the S3 client just to interact with your mock S3 server.

I have real endpoints that are s3-compatible (not testing mocks), and I use the same code to create s3 clients for them and for AWS. Anyway, if I don't send a region, I get the error Error: Region is missing, even though the endpoint is not AWS (I can easily solve it by passing some string, but I don't like it).

I have a place in the code that creates an s3 client for the same credentials, and now I understand that it is better to re-design it in the future (regardless of the migration).

Using the destroy() method on the s3 client in those cases might be a temporary solution (better than not handling it at all in the first phase).

Thank you!

@github-actions

github-actions bot commented Sep 1, 2023

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2023