
Allow support for paths other than s3:// #558

Closed
SeanBarry opened this issue Feb 14, 2021 · 7 comments
Labels: enhancement (New feature or request)

@SeanBarry

SeanBarry commented Feb 14, 2021

aws-data-wrangler version: 2.4.0 (with no modifications)

As part of my local and CI development and testing, I'm using Localstack to mock AWS S3. This lets me simulate putting, listing, and getting objects from S3, for example.

My codebase is a mix of Node.js and Python. The Node.js code that interacts with Localstack works fine, as I can specify an endpoint when I initialize the S3 client. This endpoint is an environment variable, so locally and in CI it points to Localstack, while in the prod/dev clusters it points to the real S3 endpoint.

Unfortunately, it seems there's no way to override the s3:// path requirement in AWS Data Wrangler.

For example, when I call wr.s3.read_parquet with the path pointing to my Localstack s3 bucket, I get the following error:

raise exceptions.InvalidArgumentValue(f"'{path}' is not a valid path. It MUST start with 's3://'")
awswrangler.exceptions.InvalidArgumentValue: 'http://localhost:4566/<redacted>' is not a valid path. It MUST start with 's3://'

I've had a quick look through the Data Wrangler source code to see if there's an override, but haven't found one. The utility that throws this error, parse_path(), strictly checks that the path begins with s3:// and doesn't account for any override.
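For reference, here is a minimal sketch of the kind of validation parse_path() performs. This is a simplified illustration, not the actual library source; the exception class and the (bucket, key) return shape are assumptions for the example:

```python
# Simplified sketch of awswrangler's s3:// path validation
# (illustrative only; not the real implementation).
class InvalidArgumentValue(ValueError):
    pass


def parse_path(path: str) -> tuple[str, str]:
    """Split an s3:// URI into (bucket, key), rejecting any other scheme."""
    if not path.startswith("s3://"):
        raise InvalidArgumentValue(
            f"'{path}' is not a valid path. It MUST start with 's3://'"
        )
    # Everything after "s3://" up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = path[len("s3://"):].partition("/")
    return bucket, key
```

With a check this strict, any http:// Localstack URL is rejected before the configured endpoint is even consulted, which matches the error above.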

Describe the solution you'd like
It would be incredibly useful if this check either didn't exist or if there were a way to pass an override when creating the Data Wrangler client. That way I could continue to reliably mock AWS infrastructure locally.

Reproduce

df = wr.s3.read_parquet(
    path="http://localhost:4566/my-bucket/",
    path_suffix="data.parquet",
)

> raise exceptions.InvalidArgumentValue(f"'{path}' is not a valid path. It MUST start with 's3://'")
awswrangler.exceptions.InvalidArgumentValue: 'http://localhost:4566/my-bucket/' is not a valid path. It MUST start with 's3://'
@igorborgest (Contributor)

Hi @SeanBarry, thanks for reaching out.

Have you tested our support for custom endpoints through global configuration?

Example:

wr.config.s3_endpoint_url = YOUR_ENDPOINT

Or you can define it through an environment variable:

export WR_S3_ENDPOINT_URL=YOUR_ENDPOINT

All endpoints available are:

(screenshot listing the available endpoint configuration properties)


@igorborgest igorborgest self-assigned this Feb 15, 2021
@igorborgest igorborgest added the WIP Work in progress label Feb 15, 2021
@SeanBarry (Author)

Hi Igor, thanks for the reply. I can confirm that neither of the following options works; the same parse_path utility runs either way, and it explicitly checks for s3:// in the URL:

wr.config.s3_endpoint_url = YOUR_ENDPOINT
export WR_S3_ENDPOINT_URL=YOUR_ENDPOINT

@igorborgest (Contributor)

igorborgest commented Feb 15, 2021

The idea would be to use a regular s3:// path pattern instead of http://localhost:4566/my-bucket/.

My suggestion is to configure the endpoint with your Localstack URL and then use your mocked bucket the same way as a normal bucket: s3://my-bucket/.
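A sketch of this suggested pattern, assuming Localstack on its default port 4566 and a bucket named my-bucket (both are assumptions for the example, not values from this thread):

```python
import awswrangler as wr

# Point Data Wrangler's S3 client at Localstack via the global config
# (http://localhost:4566 is the assumed Localstack URL).
wr.config.s3_endpoint_url = "http://localhost:4566"

# Then keep using plain s3:// paths; requests are sent to the configured endpoint.
df = wr.s3.read_parquet(
    path="s3://my-bucket/",  # normal S3 path, resolved against Localstack
    path_suffix="data.parquet",
)
```

The key point is that the endpoint override changes where requests go, while the path itself must still use the s3:// scheme that parse_path() expects.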

@igorborgest (Contributor)

Closing due to lack of activity.

@igorborgest igorborgest removed the WIP Work in progress label Feb 24, 2021
@Ritish-Madan

Ritish-Madan commented Jun 17, 2022

Hi @igorborgest, I am using an s3a:// path, and it still gives me the error because of the explicit check for s3://.
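One possible workaround, sketched below, is to normalize Hadoop-style s3a:// (or s3n://) URIs to the s3:// form before handing them to awswrangler. normalize_s3_uri is a hypothetical helper for illustration, not part of the library:

```python
def normalize_s3_uri(path: str) -> str:
    """Rewrite Hadoop-style s3a:// / s3n:// URIs to the s3:// form awswrangler expects."""
    for scheme in ("s3a://", "s3n://"):
        if path.startswith(scheme):
            return "s3://" + path[len(scheme):]
    return path
```

For example, normalize_s3_uri("s3a://my-bucket/data.parquet") yields "s3://my-bucket/data.parquet", while paths that already use s3:// pass through unchanged.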

@samuelefiorini

(quoting @igorborgest's reply above about wr.config.s3_endpoint_url / WR_S3_ENDPOINT_URL)

Hi @igorborgest, it looks like the Timestream endpoint is not currently supported. Any plans to add it in the near future?

Cheers


Meanwhile, a dedicated issue has been opened: #1414

4 participants