Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Apr 14, 2024

Which issue does this PR close?

closes #10072

Rationale for this change

See #10072

Basically I want to be able to do:

-- Create external table
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
OPTIONS(
    'aws.access_key_id' 'A',
    'aws.secret_access_key' 'B',
    'aws.endpoint' 'http://localhost:8080',
    'aws.allow_http' 'true',
)
LOCATION 's3://sprox/sample.parquet';

What changes are included in this PR?

Are these changes tested?

  1. Support aws.endpoint as a config option
  2. Support http connections
  3. Add a better error message

Are there any user-facing changes?

Can use http endpoints now

@alamb alamb force-pushed the alamb/local_s3_again branch from e96add1 to 733b593 Compare April 14, 2024 12:14
secret_access_key,
session_token,
region,
endpoint,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One bug was that endpoint was not read here so I changes this to destructure the object so the compiler can check we don't forget fields

}
_ => {
return internal_err!("Config value \"{}\" not found on AwsOptions", rem);
return config_err!("Config value \"{}\" not found on AwsOptions", rem);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive by cleanup -- this is not an internal error

@alamb alamb changed the title Support http s3 endpoints in datafusion-cli Support http s3 endpoints in datafusion-cli with options Apr 14, 2024
@alamb alamb reopened this Apr 15, 2024
@alamb
Copy link
Contributor Author

alamb commented Apr 15, 2024

Whoops -- closed by accident

pub region: Option<String>,
/// OSS or COS Endpoint
pub endpoint: Option<String>,
/// Allow HTTP (otherwise will always use https)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I found the reworked configuration option system that @metesynnada added in #9382 really nice to work with ❤️

@alamb alamb changed the title Support http s3 endpoints in datafusion-cli with options Support http s3 endpoints in datafusion-cli via CREATE EXTERNAL TABLE Apr 15, 2024
@alamb alamb requested a review from metesynnada April 15, 2024 11:12
Copy link
Contributor

@metesynnada metesynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @alamb

.await
.unwrap_err();

assert_eq!(err.to_string(), "Invalid or Unsupported Configuration: Invalid endpoint: http://endpoint33. HTTP is not allowed for S3 endpoints. To allow HTTP, set 'aws.allow_http' to true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also test with aws.allow_http enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I added that test below (demonstrate it doesn't error)

Comment on lines +93 to +97
return config_err!(
"Invalid endpoint: {endpoint}. \
HTTP is not allowed for S3 endpoints. \
To allow HTTP, set 'aws.allow_http' to true"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb merged commit 417b928 into apache:main Apr 16, 2024
@alamb
Copy link
Contributor Author

alamb commented Apr 16, 2024

Thanks everyone for the reviews ❤️

Omega359 pushed a commit to Omega359/arrow-datafusion that referenced this pull request Apr 16, 2024
@alamb alamb deleted the alamb/local_s3_again branch April 19, 2024 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support connecting to local s3 object stores in datafusion-cli

5 participants