New URL scheme for S3 VPCE endpoints #60021

Closed
lesandie opened this issue Feb 15, 2024 · 2 comments · Fixed by #62208
Labels: easy task, feature


@lesandie (Contributor)

Use case

Using an AWS S3 interface endpoint that runs inside a VPC (a VPC endpoint, or VPCE) is a way to access S3 buckets without going through the public internet. This is useful for both security and performance.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html

The endpoint provides a special URL scheme:
https://bucket.vpce-xxxxxxxxxxxxxxxxx-xxxxxx.s3.us-east-1.vpce.amazonaws.com. The problem is that ClickHouse fails to use the private endpoint, while it works with public ones such as https://s3.us-east-1.amazonaws.com.

Using url_scheme_mappers won't work, because the URL uses bucket.vpce-xxxxxxxxxxxxxxxxx-xxxxxx as the bucket identifier, and ClickHouse expects a bucket name without a dot:

static const RE2 virtual_hosted_style_pattern(R"((.+)\.(s3|cos|obs|oss|eos)([.\-][a-z0-9\-.:]+))");
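
To make the misparse concrete, here is a minimal standalone sketch (assuming the RE2 library is available; the endpoint ID is made up). Applied to the endpoint-specific DNS name, the pattern matches and captures the whole bucket.vpce-… prefix as the bucket name:

#include <re2/re2.h>
#include <iostream>
#include <string>

int main()
{
    /// The pattern quoted above from the ClickHouse S3 URI parser.
    static const RE2 virtual_hosted_style_pattern(R"((.+)\.(s3|cos|obs|oss|eos)([.\-][a-z0-9\-.:]+))");

    /// Endpoint-specific DNS name of an S3 interface endpoint (endpoint ID is invented).
    const std::string host = "bucket.vpce-0123456789abcdef0-abcdefg.s3.us-east-1.vpce.amazonaws.com";

    std::string bucket, service, tail;
    if (RE2::FullMatch(host, virtual_hosted_style_pattern, &bucket, &service, &tail))
        std::cout << "parsed bucket: " << bucket << '\n';
    /// Prints: parsed bucket: bucket.vpce-0123456789abcdef0-abcdefg
    /// The literal "bucket" prefix plus the endpoint ID is taken for a bucket name,
    /// so the request goes to s3.us-east-1.vpce.amazonaws.com for a bucket that does
    /// not exist -- matching the NoSuchBucket error in the stack trace below.
}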

Describe the solution you'd like

Support a new URL scheme for S3 VPCE endpoints.
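
For illustration only (a hypothetical sketch; the pattern and the path-style handling are my assumptions, not necessarily what the eventual fix in #62208 does): the URI parser could first check the authority against a PrivateLink-style pattern and, on a match, keep the whole host as the endpoint and read the bucket from the first path segment instead:

#include <re2/re2.h>
#include <string>

/// Hypothetical pattern (mine, not ClickHouse's) for the endpoint-specific
/// DNS names of S3 interface endpoints:
///   bucket.vpce-<endpoint-id>.s3.<region>.vpce.amazonaws.com[:port]
static const RE2 aws_private_link_style_pattern(
    R"(bucket\.vpce\-[a-z0-9\-]+\.s3\.[a-z0-9\-]+\.vpce\.amazonaws\.com(:\d{1,5})?)");

/// Sketch: if the authority matches, skip virtual-hosted-style parsing and treat
/// the URI as path style, i.e. endpoint = full authority, bucket = first path
/// segment, key = the rest of the path:
///   https://bucket.vpce-<id>.s3.<region>.vpce.amazonaws.com/<bucket>/<key>
bool isPrivateLinkStyle(const std::string & authority)
{
    return RE2::FullMatch(authority, aws_private_link_style_pattern);
}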

Describe alternatives you've considered

Using S3Proxy and the global proxy settings (#51749).

Additional context

Stack trace:

2024.02.14 12:26:42.169692 [ 26483 ] <Debug> executeQuery: (from 127.0.0.1:47206) SELECT name, count() AS c FROM s3('https://bucket.vpce-xxxxxxxxxxxxx-xxxxxx.s3.us-east-1.vpce.amazonaws.com/stagecopy/20231018.csv', 'CSVWithNames') GROUP BY name ORDER BY c DESC LIMIT 10 (stage: Complete)
2024.02.14 12:26:42.169832 [ 26483 ] <Trace> ContextAccess (default): Access granted: CREATE TEMPORARY TABLE, S3 ON *.*
2024.02.14 12:26:42.169889 [ 26483 ] <Warning> AwsAuthSTSAssumeRoleWebIdentityCredentialsProvider: Token file must be specified to use STS AssumeRole web identity creds provider.
2024.02.14 12:26:42.169925 [ 26483 ] <Debug> S3CredentialsProviderChain: The environment variable value AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is 
2024.02.14 12:26:42.169940 [ 26483 ] <Debug> S3CredentialsProviderChain: The environment variable value AWS_CONTAINER_CREDENTIALS_FULL_URI is 
2024.02.14 12:26:42.169949 [ 26483 ] <Debug> S3CredentialsProviderChain: The environment variable value AWS_EC2_METADATA_DISABLED is 
2024.02.14 12:26:42.169971 [ 26483 ] <Information> AWSEC2InstanceProfileConfigLoader: Using IMDS endpoint: http://xxx.xxx.xxx.xxx
2024.02.14 12:26:42.169994 [ 26483 ] <Information> AWSClient: AWSHttpResourceClient: Creating AWSHttpResourceClient with max connections 2 and scheme http
2024.02.14 12:26:42.170011 [ 26483 ] <Information> AWSInstanceProfileCredentialsProvider: Creating Instance with injected EC2MetadataClient and refresh rate.
2024.02.14 12:26:42.170028 [ 26483 ] <Information> S3CredentialsProviderChain: Added EC2 metadata service credentials provider to the provider chain.
2024.02.14 12:26:42.170042 [ 26483 ] <Information> AWSClient: Aws::Config::AWSConfigFileProfileConfigLoader: Initializing config loader against fileName /nonexistent/.aws/credentials and using profilePrefix = 0
2024.02.14 12:26:42.170061 [ 26483 ] <Information> AWSClient: ProfileConfigFileAWSCredentialsProvider: Setting provider to read credentials from /nonexistent/.aws/credentials for credentials file and /nonexistent/.aws/config for the config file , for use with profile default
2024.02.14 12:26:42.170080 [ 26483 ] <Information> AWSInstanceProfileCredentialsProvider: Credentials have expired attempting to repull from EC2 Metadata Service.
2024.02.14 12:26:42.170104 [ 26483 ] <Trace> AWSEC2InstanceProfileConfigLoader: Calling EC2MetadataService to get token.
2024.02.14 12:26:42.170387 [ 26483 ] <Trace> HTTPSessionAdapter: Created HTTP(S) session with xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80)
2024.02.14 12:26:42.170997 [ 26483 ] <Trace> HTTPSessionAdapter: Created HTTP(S) session with xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80)
2024.02.14 12:26:42.171308 [ 26483 ] <Debug> AWSEC2InstanceProfileConfigLoader: Calling EC2MetadataService resource, /latest/meta-data/iam/security-credentials with token returned profile string 605-ClickhouseClientDev.
2024.02.14 12:26:42.171340 [ 26483 ] <Debug> AWSEC2InstanceProfileConfigLoader: Calling EC2MetadataService resource http://xxx.xxx.xxx.xxx/latest/meta-data/iam/security-credentials/605-ClickhouseClientDev with token.
2024.02.14 12:26:42.171457 [ 26483 ] <Trace> HTTPSessionAdapter: Created HTTP(S) session with xxx.xxx.xxx.xxx:80 (xxx.xxx.xxx.xxx:80)
2024.02.14 12:26:42.172257 [ 26483 ] <Trace> AWSEC2InstanceProfileConfigLoader: Successfully pulled credentials from EC2MetadataService with access key.
2024.02.14 12:26:42.172285 [ 26483 ] <Information> AWSClient: Aws::Config::AWSProfileConfigLoaderBase: Successfully reloaded configuration.
2024.02.14 12:26:42.173902 [ 26483 ] <Trace> S3Client: Provider type: AWS
2024.02.14 12:26:42.173914 [ 26483 ] <Trace> S3Client: API mode of the S3 client: AWS
2024.02.14 12:26:42.189379 [ 26483 ] <Trace> HTTPSessionAdapter: Created HTTP(S) session with s3.us-east-1.vpce.amazonaws.com:443 (xxx.xxx.xxx.xxx:443)
2024.02.14 12:26:42.205636 [ 26483 ] <Information> AWSClient: Response status: 404, Not Found
2024.02.14 12:26:42.205726 [ 26483 ] <Information> AWSClient: AWSErrorMarshaller: Encountered AWSError 'NoSuchBucket': The specified bucket does not exist
2024.02.14 12:26:42.205769 [ 26483 ] <Information> AWSClient: AWSXmlClient: HTTP response code: 404
Resolved remote host IP address: s3.us-east-1.vpce.amazonaws.com:443
Request ID: XXXXXXXXXXX
Exception name: NoSuchBucket
Error message: The specified bucket does not exist
7 response headers:
connection : close
content-type : application/xml
date : Wed, 14 Feb 2024 12:26:41 GMT
server : AmazonS3
transfer-encoding : chunked
x-amz-id-2 : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
x-amz-request-id : XXXXXXXXXXX
2024.02.14 12:26:42.206066 [ 26483 ] <Information> AWSClient: If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2024.02.14 12:26:42.206173 [ 26483 ] <Debug> ReadBufferFromS3: Caught exception while reading S3 object. Bucket: bucket.vpce-xxxxxxxxxxxxx-xxxxxx, Key: stagecopy/20231018.csv, Version: Latest, Offset: 0, Attempt: 0, Message: The specified bucket does not exist
2024.02.14 12:26:42.207448 [ 26483 ] <Error> executeQuery: Code: 499. DB::Exception: The specified bucket does not exist: while reading key: stagecopy/20231018.csv, from bucket: bucket.vpce-xxxxxxxxxxxxx-xxxxxx: Cannot extract table structure from CSVWithNames format file. You can specify the structure manually: (in file/uri bucket.vpce-xxxxxxxxxxxxx-xxxxxx/stagecopy/20231018.csv). (S3_ERROR) (version 23.8.8.20) (from 127.0.0.1:47206) (in query: SELECT name, count() AS c FROM s3('https://bucket.vpce-xxxxxxxxxxxxx-xxxxxx.s3.us-east-1.vpce.amazonaws.com/stagecopy/20231018.csv', 'CSVWithNames') GROUP BY name ORDER BY c DESC LIMIT 10), Stack trace (when copying this message, always include the lines below):
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c644ed7 in /usr/bin/clickhouse
2. DB::S3Exception::S3Exception(String const&, Aws::S3::S3Errors) @ 0x0000000010d4454d in /usr/bin/clickhouse
3. DB::ReadBufferFromS3::sendRequest(unsigned long, std::optional<unsigned long>) const @ 0x0000000010d42bb8 in /usr/bin/clickhouse
4. DB::ReadBufferFromS3::nextImpl() @ 0x0000000010d3fdcf in /usr/bin/clickhouse
5. DB::readSchemaFromFormat(String const&, std::optional<DB::FormatSettings> const&, DB::IReadBufferIterator&, bool, std::shared_ptr<DB::Context const>&, std::unique_ptr<DB::ReadBuffer, std::default_delete<DB::ReadBuffer>>&) @ 0x000000001318d6d2 in /usr/bin/clickhouse
6. DB::StorageS3::getTableStructureFromDataImpl(DB::StorageS3::Configuration const&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::Context const>) @ 0x00000000128b27c0 in /usr/bin/clickhouse
7. DB::StorageS3::StorageS3(DB::StorageS3::Configuration const&, std::shared_ptr<DB::Context const>, DB::StorageID const&, DB::ColumnsDescription const&, DB::ConstraintsDescription const&, String const&, std::optional<DB::FormatSettings>, bool, std::shared_ptr<DB::IAST>) @ 0x00000000128b15ae in /usr/bin/clickhouse
8. DB::TableFunctionS3::executeImpl(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const>, String const&, DB::ColumnsDescription, bool) const @ 0x00000000109ce59d in /usr/bin/clickhouse
9. DB::ITableFunction::execute(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const>, String const&, DB::ColumnsDescription, bool, bool) const @ 0x0000000010c37403 in /usr/bin/clickhouse
10. DB::Context::executeTableFunction(std::shared_ptr<DB::IAST> const&, DB::ASTSelectQuery const*) @ 0x000000001165bb37 in /usr/bin/clickhouse
11. DB::JoinedTables::getLeftTableStorage() @ 0x0000000011fdb316 in /usr/bin/clickhouse
12. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context> const&, std::optional<DB::Pipe>, std::shared_ptr<DB::IStorage> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::PreparedSets>) @ 0x0000000011ee6183 in /usr/bin/clickhouse
13. DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context>, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&) @ 0x0000000011f991a8 in /usr/bin/clickhouse
14. DB::InterpreterFactory::get(std::shared_ptr<DB::IAST>&, std::shared_ptr<DB::Context>, DB::SelectQueryOptions const&) @ 0x0000000011ea01be in /usr/bin/clickhouse
15. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000122e248a in /usr/bin/clickhouse
16. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x00000000122de475 in /usr/bin/clickhouse
17. DB::TCPHandler::runImpl() @ 0x0000000013155799 in /usr/bin/clickhouse
18. DB::TCPHandler::run() @ 0x0000000013167b79 in /usr/bin/clickhouse
19. Poco::Net::TCPServerConnection::start() @ 0x0000000015b5e154 in /usr/bin/clickhouse
20. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b5f351 in /usr/bin/clickhouse
21. Poco::PooledThread::run() @ 0x0000000015c95b87 in /usr/bin/clickhouse
22. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c93e5c in /usr/bin/clickhouse
23. ? @ 0x00007f36216d5ac3 in ?
24. ? @ 0x00007f3621767850 in ?
Received exception from server (version 23.8.8):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: The specified bucket does not exist: while reading key: stagecopy/20231018.csv, from bucket: bucket.vpce-xxxxxxxxxxxxx-xxxxxx: Cannot extract table structure from CSVWithNames format file. You can specify the structure manually: (in file/uri bucket.vpce-xxxxxxxxxxxxx-xxxxxx/stagecopy/20231018.csv). (S3_ERROR)
(query: SELECT
    name,
    count() AS c
FROM s3('https://bucket.vpce-xxxxxxxxxxxxx-xxxxxx.s3.us-east-1.vpce.amazonaws.com/stagecopy/20231018.csv', 'CSVWithNames')
GROUP BY name
ORDER BY c DESC
LIMIT 10)
@UnamedRus (Contributor)

Related #31074

@lesandie (Contributor, Author)

AFAIK, even hardcoding the DNS name in /etc/hosts does not work, as explained in #31074.
