Azure Blob Storage Disk support in ClickHouse #29430

Closed
jkuklis opened this issue Sep 27, 2021 · 9 comments
@jkuklis
Contributor

jkuklis commented Sep 27, 2021

Hello!

We would like to propose introducing support for Azure Blob Storage Disks in ClickHouse, in a similar way it was done for AWS S3 Disks. At Contentsquare we have already started preliminary work to make sure this is feasible.

Context

We use S3 Disks on our AWS servers, for example for storing raw data or for monitoring data loss with certain metadata, for which regular disks would be too expensive. The DiskS3 approach (rather than e.g. the S3 table engine) is the best fit for us, as it can be used with MergeTree tables.
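
To make the setup concrete, here is a minimal sketch of how such a disk is wired to MergeTree tables through the storage configuration and a storage policy (the endpoint, credentials and names below are placeholders, not our actual setup):

<clickhouse>
  <storage_configuration>
    <disks>
      <s3_disk>
        <type>s3</type>
        <!-- placeholder endpoint and credentials -->
        <endpoint>https://my-bucket.s3.amazonaws.com/clickhouse/</endpoint>
        <access_key_id>...</access_key_id>
        <secret_access_key>...</secret_access_key>
      </s3_disk>
    </disks>
    <policies>
      <s3_policy>
        <volumes>
          <main>
            <disk>s3_disk</disk>
          </main>
        </volumes>
      </s3_policy>
    </policies>
  </storage_configuration>
</clickhouse>

A MergeTree table then opts in with SETTINGS storage_policy = 's3_policy'; this is exactly the kind of integration we would like to have for Blob Storage as well.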

Soon we will need a similar solution on our Azure servers. We decided internally that the best way forward for us is to develop for Blob Storage Disks the same logic that was developed for S3 Disks.

Note on alternatives: we considered using Data Lake Storage Gen2, a higher-level abstraction built on top of Blob Storage that mimics disk behavior, but it doesn't offer enough flexibility, for example the possibility to perform a move operation, which is important for ClickHouse. We also considered using a proxy server to translate commands from S3 to Blob Storage, but that would be too error-prone and inefficient.

Work plan

Below we present what we think is necessary to add the Blob Storage Disk to ClickHouse:

  • Azure SDK dependency
  • POCO HTTP wrapper for Azure
  • Azure authentication part
  • Blob Storage buffer handling
  • Blob Storage Disk
  • End to end integration tests

Azure SDK dependency

We managed to add the dependency by:

  • adding azure-sdk-for-cpp directory and azure-cmake directory with custom CMakeLists.txt to contrib
  • linking libraries and adding directories in src and contrib CMakeLists.txt files
  • adding Azure Blob Storage .cmake to cmake/find and including it in the main CMakeLists.txt file
  • adding a (deprecated) source file to the boringssl-cmake CMakeLists.txt, as one of its functions is used in the Azure SDK

We were able to manipulate Blob Storage from within ClickHouse with this configuration.

POCO HTTP wrapper for Azure

This part is used for communication over the network and for interpreting messages. It would be based on the S3 counterpart, whose files are all located in src/IO/S3. The S3 version is quite developed and robust; to start with, we could probably implement a simpler solution. We could also extract the part shared with S3 into a common parent class.

Azure authentication part

To start with, we would like to rely on role-based authentication, in which access is granted to an Azure instance as a whole (so there are no credentials or secrets). We have already conducted preliminary tests for this type of authentication; it is an open question whether we can leave it at that for now, as the S3 implementation supports more ways to authenticate. For S3, authentication is done in src/IO/S3Common.h and .cpp.

Blob Storage buffer handling

This part covers the actual read and write buffers for Blob Storage. For S3, these are implemented in src/IO, in the ReadBufferFromS3 and WriteBufferFromS3 .h and .cpp files. It is unclear whether these need to be extracted from the Disk implementation very early on.

Blob Storage Disk

A Blob Storage Disk implementation of the IDiskRemote interface, based on the equivalent src/Disks/S3 files. Regarding sharing the logic between Blob Storage and S3: on the one hand it might be hard, as the implementations are short and quite Disk-specific; on the other hand, this part seems to be updated rather frequently, so it might make sense to factor out the common logic to ensure that potential fixes and refactors are applied to both Disks.

Integration tests

We would like to create a couple of end-to-end integration tests based on Contentsquare use cases. We aim to run the full Azure pipeline for at least a couple of days to make sure the solution runs smoothly. Functional and unit tests are also under consideration.

Execution

We aim to implement this feature on our own at Contentsquare provided that we get a green light from you on the design. We have already started working on this feature and expect it to be ready in the first quarter of 2022.

Questions

  • What is your general feeling about adding Azure Disk?
  • Can the Blob Storage Disk start in some preliminary form and not be fully announced immediately, or shall it be full-fledged from the very first release? In particular:
    • Shall we extract the common part between S3 and Blob Storage in the code?
    • Do we need to provide all authentication methods?
  • What kind of tests are we expected to conduct?
  • Any suggestions on the dependency part? We are aware of e.g. https://clickhouse.com/docs/en/development/contrib/#adding-third-party-libraries.
  • Did we miss any part of the code that needs to be added?
  • What is the main reason for having a POCO client for S3 if AWS already provides HTTP clients using e.g. libcurl?



Thanks for your attention, let us know what you think!

Jakub Kuklis
Contentsquare

@UnamedRus
Contributor

Have you considered using a MinIO gateway for Azure as a proxy server?

https://docs.min.io/docs/minio-gateway-for-azure.html

@jkuklis
Contributor Author

jkuklis commented Sep 28, 2021

Hey @UnamedRus,

Yes, we have considered using a MinIO proxy, but we still decided to write the Blob Storage Disk part on our own. This approach gives more direct access to the storage and is thus more flexible, and it removes one bottleneck from the system in the form of a proxy server: all the data would have to go through this proxy, which comes with a performance and system design penalty.

@mxalis

mxalis commented Feb 1, 2022

Is there documentation for this feature? It seems it was released in 22.1, but I can't find e.g. configuration docs.

@jkuklis
Contributor Author

jkuklis commented Feb 3, 2022

Hey @mxalis,

The feature isn't documented yet, as far as I know. Below I list the available configuration parameters.
These parameters are parsed in registerDiskAzureBlobStorage.cpp, AzureBlobStorageAuth.cpp and RemoteDisksCommon.cpp.
Examples of working configurations can be found in tests/integration in test_merge_tree_azure_blob_storage/configs/config.d/storage_conf.xml and in test_azure_blob_storage_zero_copy_replication/configs/config.d/storage_conf.xml.

Connection parameters:

  • storage_account_url - Required, Azure Blob Storage account URL, like http://account.blob.core.windows.net.
  • container_name - Target container name, defaults to default-container.
  • container_already_exists - If set to false, a new container container_name is created in the storage account; if set to true, the disk connects to the container directly; if left unset, the disk connects to the account, checks whether the container container_name exists, and creates it if it doesn't exist yet.

Authentication parameters (the disk tries all available methods, as well as the Managed Identity Credential):

  • connection_string - For authentication using a connection string.
  • account_name and account_key - For authentication using Shared Key.

Limit parameters (mainly for internal usage):

  • max_single_part_upload_size - Limits the size of a single block upload to Blob Storage.
  • min_bytes_for_seek - Limits the size of a seekable region.
  • max_single_read_retries - Limits the number of attempts to read a chunk of data from Blob Storage.
  • max_single_download_retries - Limits the number of attempts to download a readable buffer from Blob Storage.
  • thread_pool_size - Limits the number of threads with which IDiskRemote is instantiated.

Other parameters:

  • metadata_path - Path on local FS to store metadata files for Blob Storage. Default value is /var/lib/clickhouse/disks/<disk_name>/.
  • cache_enabled - Allows caching mark and index files on the local FS. Defaults to true.
  • cache_path - Path on local FS where to store cached mark and index files. Default value is /var/lib/clickhouse/disks/<disk_name>/cache/.
  • skip_access_check - If true, disk access checks will not be performed on disk start-up. Default value is false.
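
Putting the parameters above together, a disk declaration could look roughly like the following sketch (the account URL, container, key and the disk/policy names are placeholders; the integration test configs mentioned above are the authoritative examples):

<clickhouse>
  <storage_configuration>
    <disks>
      <blob_storage_disk>
        <type>azure_blob_storage</type>
        <storage_account_url>http://account.blob.core.windows.net</storage_account_url>
        <container_name>test-container</container_name>
        <!-- Shared Key authentication; a connection_string could be used instead -->
        <account_name>account</account_name>
        <account_key>...</account_key>
        <metadata_path>/var/lib/clickhouse/disks/blob_storage_disk/</metadata_path>
        <cache_enabled>true</cache_enabled>
        <cache_path>/var/lib/clickhouse/disks/blob_storage_disk/cache/</cache_path>
        <skip_access_check>false</skip_access_check>
      </blob_storage_disk>
    </disks>
    <policies>
      <blob_storage_policy>
        <volumes>
          <main>
            <disk>blob_storage_disk</disk>
          </main>
        </volumes>
      </blob_storage_policy>
    </policies>
  </storage_configuration>
</clickhouse>

A MergeTree table can then use the disk via SETTINGS storage_policy = 'blob_storage_policy'.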

@kssenii, would you like us to update the documentation or shall we leave it to the ClickHouse team? If the former, shall we just create a section in https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/ like the one for S3?

@mxalis

mxalis commented Feb 3, 2022

@jkuklis thanks, this will get me started

@openxxx

openxxx commented Feb 23, 2022

@jkuklis Any suggestions on data ingestion? I tried to insert a 30-40 MB ORC/Parquet file into a MergeTree table backed by Azure Blob Storage, and it results in a timeout exception (exceeding 300 seconds).
BTW, ClickHouse seems not to support Premium page blob storage accounts.

@kssenii
Member

kssenii commented Feb 23, 2022

@kssenii, would you like us to update the documentation or shall we leave it to the ClickHouse team?

It would be great if you could do that :)

@jkuklis
Contributor Author

jkuklis commented Feb 23, 2022

Hey @openxxx,

When I was working on the implementation, I also tested it using the hits_v1 table from Yandex.Metrica (https://clickhouse.com/docs/en/getting-started/example-datasets/metrica/), inserting up to 1M rows at once from it, which corresponds to ~150 MB in the original tar file for that table. I can't remember exactly how fast it was; I guess it took some time to upload, but I don't recall any timeouts.

In steps: I created a table using the same schema as hits_v1, but with a specified storage_policy, and then ran INSERT INTO blob_table SELECT * FROM datasets.hits_v1 LIMIT 1000000. I think I chose the 1M row limit for convenience, to speed up development while still testing a large sample - I don't think it was a matter of timeouts.

Have you managed to insert and query anything with that table? Maybe your authentication doesn't work as intended?

As for Premium page blob accounts, it's likely unsupported; the implementation was tested only with a Standard account. Could you share more context on how it manifests, e.g. some configs or the error if there is one?

@cwegener

As for Premium page blob accounts, it's likely unsupported; the implementation was tested only with a Standard account. Could you share more context on how it manifests, e.g. some configs or the error if there is one?

Working config:

az storage account create \
  -n clickhouse$(openssl rand -hex 4) -g testgroup  \
  --sku Standard_LRS --public-network-access Enabled \
  --min-tls-version TLS1_2 --allow-shared-key-access false

Non-working config:

az storage account create \
  -n clickhouse$(openssl rand -hex 4) -g testgroup  \
  --sku Premium_LRS --public-network-access Enabled \
  --min-tls-version TLS1_2 --allow-shared-key-access false

disk config:

         <clickhouse>
           <storage_configuration>
             <disks>
               <azure_disk>
                 <type>azure_blob_storage</type>
                 <storage_account_url>https://<account>.blob.core.windows.net</storage_account_url>
                 <container_name>testcontainer</container_name>
                 <account_name><account></account_name>
                 <use_workload_identity>true</use_workload_identity>
                 <metadata_path>/var/lib/clickhouse/disks/azure_disk/</metadata_path>
                 <cache_path>/var/lib/clickhouse/disks/azure_disk/cache/</cache_path>
                 <skip_access_check>false</skip_access_check>
               </azure_disk>
             </disks>
           </storage_configuration>
         </clickhouse>

Error message on server start:

2025.03.23 09:38:03.527761 [ 1 ] {} <Debug> Application: Destroyed global context.
2025.03.23 09:38:03.527845 [ 1 ] {} <Information> Application: Waiting for background threads
2025.03.23 09:38:03.537151 [ 1 ] {} <Information> Application: Background threads finished in 9 ms
2025.03.23 09:38:03.537601 [ 1 ] {} <Error> Application: std::exception. Code: 1001, type: Azure::Storage::StorageException, e.what() = 400 Block blobs are not supported.
Block blobs are not supported.
RequestId:4d3a1d89-d01c-0098-07d7-9b5b1e000000
Time:2025-03-23T09:38:03.3362842Z
Request ID: 4d3a1d89-d01c-0098-07d7-9b5b1e000000, Stack trace (when copying this message, always include the lines below):

0. Azure::Storage::StorageException::CreateFromResponse(std::unique_ptr<Azure::Core::Http::RawResponse, std::default_delete<Azure::Core::Http::RawResponse>>) @ 0x0000000018138283
1. Azure::Storage::Blobs::_detail::BlockBlobClient::Upload(Azure::Core::Http::_internal::HttpPipeline&, Azure::Core::Url const&, Azure::Core::IO::BodyStream&, Azure::Storage::Blobs::_detail::BlockBlobClient::UploadBlockBlobOptions const&, Azure::Core::Context const&) @ 0x0000000018125543
2. Azure::Storage::Blobs::BlockBlobClient::Upload(Azure::Core::IO::BodyStream&, Azure::Storage::Blobs::UploadBlockBlobOptions const&, Azure::Core::Context const&) const @ 0x00000000181077c2
3. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::WriteBufferFromAzureBlobStorage::preFinalize()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x0000000011a5770a
4. DB::WriteBufferFromAzureBlobStorage::execWithRetry(std::function<void ()>, unsigned long, unsigned long) @ 0x0000000011a5306c
5. DB::WriteBufferFromAzureBlobStorage::preFinalize() @ 0x0000000011a5425f
6. DB::WriteBufferFromFileDecorator::finalizeImpl() @ 0x000000001252924c
7. DB::WriteBufferWithFinalizeCallback::finalizeImpl() @ 0x00000000125683c9
8. DB::WriteBuffer::finalize() @ 0x000000000ef58fd0
9. DB::IDisk::checkAccessImpl(String const&) @ 0x00000000124f60a5
10. DB::IDisk::startup(std::shared_ptr<DB::Context const>, bool) @ 0x00000000124f56bd
11. std::shared_ptr<DB::IDisk> std::__function::__policy_invoker<std::shared_ptr<DB::IDisk> (String const&, Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::Context const>, std::map<String, std::shared_ptr<DB::IDisk>, std::less<String>, std::allocator<std::pair<String const, std::shared_ptr<DB::IDisk>>>> const&, bool, bool)>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::registerDiskObjectStorage(DB::DiskFactory&, bool)::$_0, std::shared_ptr<DB::IDisk> (String const&, Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::Context const>, std::map<String, std::shared_ptr<DB::IDisk>, std::less<String>, std::allocator<std::pair<String const, std::shared_ptr<DB::IDisk>>>> const&, bool, bool)>>(std::__function::__policy_storage const*, String const&, Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::Context const>&&, std::map<String, std::shared_ptr<DB::IDisk>, std::less<String>, std::allocator<std::pair<String const, std::shared_ptr<DB::IDisk>>>> const&, bool, bool) @ 0x00000000125695be
12. DB::DiskFactory::create(String const&, Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::Context const>, std::map<String, std::shared_ptr<DB::IDisk>, std::less<String>, std::allocator<std::pair<String const, std::shared_ptr<DB::IDisk>>>> const&, bool, bool, std::unordered_set<String, std::hash<String>, std::equal_to<String>, std::allocator<String>> const&) const @ 0x00000000124dbef4
13. DB::DiskSelector::initialize(Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::Context const>, std::function<bool (Poco::Util::AbstractConfiguration const&, String const&, String const&)>) @ 0x0000000012504b62
14. DB::Context::getDiskSelector(std::lock_guard<std::mutex>&) const @ 0x0000000012a7477d
15. DB::Context::getDisksMap(std::lock_guard<std::mutex>&) const @ 0x0000000012a74b96
16. DB::Context::getDatabaseDisk() const @ 0x0000000012a49fc3
17. DB::DatabaseWithOwnTablesBase::DatabaseWithOwnTablesBase(String const&, String const&, std::shared_ptr<DB::Context const>) @ 0x00000000123f6fbb
18. DB::DatabaseMemory::DatabaseMemory(String const&, std::shared_ptr<DB::Context const>) @ 0x000000001237d668
19. DB::DatabaseCatalog::initializeAndLoadTemporaryDatabase() @ 0x0000000012b3458d
20. DB::Server::main(std::vector<String, std::allocator<String>> const&) @ 0x000000000f190a31
21. Poco::Util::Application::run() @ 0x0000000017d396a6
22. DB::Server::run() @ 0x000000000f17bdf0
23. mainEntryClickHouseServer(int, char**) @ 0x000000000f178f13
24. main @ 0x0000000009f01bc1
25. ? @ 0x00007fd1c2ea3d90
26. ? @ 0x00007fd1c2ea3e40
27. _start @ 0x0000000006a9202e
 (version 25.2.2.39 (official build))
2025.03.23 09:38:03.537681 [ 1 ] {} <Information> Application: shutting down
