-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the new Azure SDK for C++. #3910
Conversation
I think I found the problem. For some reason the |
cba3e2c
to
27635f9
Compare
Does this mean the azure backend will now be supported on Windows? |
@jabbera, yes - @teo-tsirpanis develops primarily on Windows, so it will definitely be supported there. |
Well, on Windows CI vcpkg does not run at all. 🥲 |
CI on Linux seems to fail because the Azurite version we use is old. |
I think I know why vcpkg doesn't get imported on Windows; we are setting |
Ugh. Good catch. That was used to work around this problem. Fix: #3949 |
c271697
to
7f1e0cf
Compare
The I kept a branch with the older Windows-only way of acquiring the Azure SDK that works on my machine, and tested it from there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! 👍 I was able to run the UTs locally against Azurite and all of them passed. I ran into some problems with a debug build, and noticed the Azure tests take a long time to retry requests up to failure.
tiledb/sm/filesystem/azure.cc
Outdated
client_ = make_shared<azure::storage_lite::blob_client>( | ||
HERE(), account, thread_pool_->concurrency_level(), cert_file); | ||
::Azure::Storage::Blobs::BlobClientOptions options; | ||
options.Retry.MaxRetries = constants::azure_max_attempts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the following options here, after trying to run UTs and hanging for some time..
options.Retry.RetryDelay = std::chrono::milliseconds(100);
options.Retry.MaxRetryDelay = std::chrono::milliseconds(1000);
Azure SDK will increase the RetryDelay up to MaxRetryDelay for-each retry.. The default options seem to allow for up to 120s of hang time for a request retry. With the retry count set to 10, failing this request 10 times using these defaults could take quite a while. The C++ docs offer very little information on this 😕 so the other link here is from JS docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shaunrd0 and I decided to use the default Azure SDK retry policy (three retries starting from 800ms and growing exponentially).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these config options with defaults? We should also standardize the timing and retry count with what we have for AWS if we can, will just make it simplier to document and explain to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added config options. Standardizing across all cloud providers is hard because each SDK allows configuring the retry policies differently and with varying degrees of freedom.
9f5c961
to
9844027
Compare
@davisp do I have to copy the new Azure SDK ports to the |
@teo-tsirpanis Can you point me at the version you were trying to use? It shouldn't be much more than a copy/paste but the version resolution ends up being mostly manual because I've used super old port versions to match our existing |
This comment was marked as off-topic.
This comment was marked as off-topic.
They map to what the Azure SDK exposes and have its default values.
CI fixes in #4055 |
A follow-up to #3910, this PR fixes typos, improves a message, and infers the blob endpoint from `vfs.azure.storage_account_name` if `vfs.azure.blob_endpoint` was not specified, matching the behavior of the previous SDK. I also added `CONFIG` changelog entries that I forgot to do in #3910. --- TYPE: IMPROVEMENT DESC: Improvements to the Azure VFS. TYPE: CONFIG DESC: Add vfs.azure.max_retries, vfs.azure.retry_delay_ms and vfs.azure.max_retry_delay_ms options to control the retry policy when connecting to Azure. TYPE: CONFIG DESC: Remove the vfs.azure.use_https option; connecting to Azure with HTTP will require specifying the scheme in vfs.azure.blob_endpoint instead. === * Remove unnecessary headers. * Fix typos and improve a message. * Construct the blob endpoint from the account name if not specified. * Bring back `vfs.azure.storage_sas_token`. * Append the SAS token to the blob endpoint if specified. This keeps backwards compatibility with the previous Azure SDK. * Update documentation of config values. * Fix typo.
To communicate with Azure Storage we have been using the Azure C++ Storage Client Lite which is no longer in active development. This PR switches the Azure VFS to use the new Azure SDK for C++. The dependency to the old SDK is removed.
The migration is mostly a straightforward change from the old SDK calls to the new ones. The new SDK does not use futures, simplifying the code. As a bonus, we now pass more detailed error messages. Some stuff around configuration will now work differently, and I added relevant warnings.
Azure unit tests succeed on my machine. Let's see what CI thinks.
SC-21996
SC-25039
SC-25045
SC-25046
TYPE: IMPROVEMENT
DESC: Use the new Azure SDK for C++.