New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly fix GCS when HMAC is used #49390
Conversation
This is an automated comment for commit edf6580 with description of existing statuses. It's updated for the latest CI running
|
I've done a couple of quick tests with GCS (setting up an s3 disk and reading a parquet file) with this PR and didn't see any problem so far. |
bool Client::supportsMultiPartCopy() const | ||
{ | ||
return provider_type; | ||
return provider_type != ProviderType::GCS; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking: Unrelated to this PR but worth understanding some implications when using GCS. Why do we disable multipart upload in GCS? I thought this was already supported and the "only" missing big difference between GCS and S3 was the multi-delete.
This page discusses XML API multipart uploads in Cloud Storage. This upload method uploads files in parts and then assembles them into a single object using a final request. XML API multipart uploads are compatible with Amazon S3 multipart uploads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't disable multipart upload but multipart native copy.
This is not supported by GCS so we have to try to do single part native copy and if it fails fallback to upload instead of multipart native copy as we do for AWS.
You can read some things left to do in the previous PR description #48981
The only difference is that GCS support team confirmed that they don't have a limit of 5GB for a single part native copy so we will probably remove the limit if GCS is used.
Additional problem is the need to compose a file uploaded with multi part upload before it can be copied natively but it returns 503 in some cases. This was also confirmed to be a problem on their side but I prepared a PR for it and will push it once it's resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until all the improvements are made it's important to be aware that backup could be slower than AWS in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thank you @antonio2368!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antonio2368 maybe worth to auto-detect this feature, like DeleteObjects
?
static bool checkBatchRemove(S3ObjectStorage & storage, const String & key_with_trailing_slash) |
Since there can be other provides that will not support some features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that sounds like a better solution
maybe combine it and do it if the provider type is UNKNOWN because we already know best on URL for AWS and GCS
enum class ApiMode : uint8_t | ||
{ | ||
AWS, | ||
GCS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please provide some comments about the enum and its constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Correctly modify GCS requests when
HMAC
key is used.Fix #49195
cc @Algunenano @vitlibar
Documentation entry for user-facing changes