Azure multi read options #15630
{
  try {
    AzureStorage azureStorage = new AzureStorage(azureIngestClientFactory, location.getBucket());
    Pair<String, String> locationInfo = getContainerAndPathFromObjectLocation(location);
Ditto: do we need to verify this returned pair in any way?
if (currentUri.getScheme().equals(AzureStorageAccountInputSource.SCHEME)) {
  CloudObjectLocation cloudObjectLocation = new CloudObjectLocation(currentUri);
  Pair<String, String> containerInfo = AzureStorageAccountInputSource.getContainerAndPathFromObjectLocation(cloudObjectLocation);
Ditto: do we maybe need to validate the returned pair?
What do you think we should validate about it?
What if either element of the pair comes back empty or null? Would that cause an unexpected or unhandled error elsewhere?
I added a check in the function itself so it no longer throws an index error on a bad input spec. What happens now is that the task eventually fails because it cannot read the bad input file, which I think is okay (the error from Azure is pretty clear).
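For readers following the thread, the kind of check being discussed could look roughly like the sketch below. It is not the code from this PR: the class `ContainerPathSplitter`, the method `splitContainerAndPath`, and the choice to fall back to an empty blob path when no `/` is present are assumptions made for illustration only.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper for illustration; not the method changed in this PR.
class ContainerPathSplitter {
    /**
     * Splits "container/dir/file.json" into ("container", "dir/file.json").
     * Assumption: if the input has no '/', the blob path is returned as ""
     * instead of indexing past the end of the split result.
     */
    static Map.Entry<String, String> splitContainerAndPath(String path) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
        // A limit of 2 keeps any later '/' characters inside the blob path.
        String[] parts = path.split("/", 2);
        String container = parts[0];
        String blobPath = parts.length == 2 ? parts[1] : "";
        return new SimpleImmutableEntry<>(container, blobPath);
    }
}
```

With this shape, `splitContainerAndPath("container/dir/file.json")` yields the pair (`container`, `dir/file.json`), while a spec that names only `container` yields (`container`, empty string). The bad location is then rejected later, when the read is attempted, rather than failing with an array index error.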
Some small nits in the docs pages but nothing major. Thanks!
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
🚀
Description
Currently the "azure" input source schema only supports ingesting files that are stored in the Azure Storage Account specified in druid.azure.account. To support ingesting data from different storage accounts, this PR adds a new "azureStorage" input source schema for Azure with a slightly different spec.
I would have preferred to keep using the regular AzureInputSource class, but that class assumes CloudObjectLocation.bucket to be the container of the file and CloudObjectLocation.path to be the path of the file within the bucket. I couldn't think of a way to keep the behavior backwards compatible with existing ingestion specs and also support multiple storage accounts.
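The compatibility problem can be made concrete with a small sketch. It assumes a URI is split into a bucket (the host) and a path (the remainder without its leading slash), which is how the description characterizes CloudObjectLocation. The class `LocationParts` is a hypothetical stand-in, and the `azureStorage://account/container/...` layout is inferred from the review snippets, where the bucket is passed on as the storage account.

```java
import java.net.URI;

// Illustrative stand-in for a bucket/path location; not Druid's CloudObjectLocation.
class LocationParts {
    final String bucket;
    final String path;

    LocationParts(URI uri) {
        // The host part of the URI becomes the "bucket".
        this.bucket = uri.getHost();
        // Drop the leading '/' so the path is relative to the bucket.
        String rawPath = uri.getPath();
        this.path = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
    }

    public static void main(String[] args) {
        // Existing scheme: bucket is the container, path is the blob path.
        LocationParts legacy = new LocationParts(URI.create("azure://mycontainer/dir/file.json"));
        System.out.println(legacy.bucket + " | " + legacy.path);

        // New scheme (assumed layout): bucket is the storage account, and the
        // path still carries the container as its first segment.
        LocationParts multi = new LocationParts(URI.create("azureStorage://myaccount/mycontainer/dir/file.json"));
        System.out.println(multi.bucket + " | " + multi.path);
    }
}
```

Because the same two fields carry different meanings under the two schemes, reusing one class would mean reinterpreting `bucket` for existing specs, which is the backwards-compatibility concern raised above.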
One idea I had was to introduce a "storageAccount" field directly in AzureInputSource (instead of in CloudObjectLocation), then use that field when creating AzureStorage instances. I think this would work, but it would also mean that Azure ingestion behaves differently from other types of ingestion (because the storage account would not be in the ingestion URI), and this could cause problems down the line.
Since the AzureStorage instance is an abstraction over Azure Blob Storage clients, I thought it made sense to create an instance of it per Azure ingestion spec, since we do something similar with S3 (create an S3 client for each S3 ingestion spec).
It would also have been possible to pass the AzureInputSourceConfig to the relevant functions in AzureStorage and generate different clients, but I thought this would have been confusing.
I added support for key, SAS token, and app registration authentication when ingesting from storage accounts. Managed/Workload identity auth can also work: leave the properties unspecified and make sure the identity the cluster is deployed with can access the external storage account.
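One way to picture the selection between these options is the sketch below. Everything in it is hypothetical: the class `AuthSelection`, the parameter names, the requirement that all three app registration values be present, and the precedence order are assumptions for illustration. They are not taken from the PR's config or client factory classes.

```java
// Hypothetical sketch; names and precedence are assumptions, not the PR's code.
class AuthSelection {
    enum AuthMode { KEY, SAS_TOKEN, APP_REGISTRATION, DEFAULT_IDENTITY }

    /**
     * Picks an auth mode from whichever properties were supplied.
     * Falls back to the identity the cluster runs with when none are set.
     */
    static AuthMode choose(String key, String sasToken,
                           String clientId, String clientSecret, String tenantId) {
        if (key != null) {
            return AuthMode.KEY;
        }
        if (sasToken != null) {
            return AuthMode.SAS_TOKEN;
        }
        // Assumed here: app registration needs all three values to be usable.
        if (clientId != null && clientSecret != null && tenantId != null) {
            return AuthMode.APP_REGISTRATION;
        }
        return AuthMode.DEFAULT_IDENTITY;
    }
}
```

Calling `choose(null, null, null, null, null)` returns `DEFAULT_IDENTITY`, which corresponds to the managed or workload identity case where no properties are given.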
Release note
Support Azure ingestion from multiple Storage Accounts.
Key changed/added classes in this PR
AzureStorageAccountInputSource
AzureInputSourceConfig
AzureClientFactory
AzureStorage
AzureEntity
This PR has: