Antalya 25.8: remote initiator improvements#1576
Antalya 25.8: remote initiator improvements#1576ianton-ru wants to merge 9 commits intoantalya-25.8from
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 387cfa13d8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/Core/SettingsChangesHistory.cpp
Outdated
| { | ||
| {"allow_local_data_lakes", false, false, "New setting."}, | ||
| {"iceberg_partition_timezone", "", "", "New setting."}, | ||
| {"s3_propagate_credentials_to_other_storages", false, false, "New setting"}, |
There was a problem hiding this comment.
Drop unknown settings from compatibility history
This entry introduces a setting name that is not declared in src/Core/Settings.cpp (same issue also appears for export_merge_tree_partition_lock_inside_the_task and export_merge_tree_partition_system_table_prefer_remote_information later in this block). When compatibility is set to a version below 25.8.16.20002, SettingsImpl::applyCompatibilitySetting() iterates these changes and attempts to read/apply them, which triggers runtime "setting not found" failures instead of applying compatibility defaults.
Useful? React with 👍 / 👎.
src/Core/SettingsChangesHistory.cpp
Outdated
| {"allow_local_data_lakes", false, false, "New setting."}, | ||
| {"iceberg_partition_timezone", "", "", "New setting."}, | ||
| {"s3_propagate_credentials_to_other_storages", false, false, "New setting"}, | ||
| {"export_merge_tree_part_filename_pattern", "", "{part_name}_{checksum}", "New setting"}, |
There was a problem hiding this comment.
Keep filename-pattern change in its original version block
This re-adds export_merge_tree_part_filename_pattern as a "new setting" in 25.8.16.20002, but it is already introduced in the 25.8.9.2000 block with the same previous/new values. Because compatibility rollback applies all versions newer than the requested one, setting compatibility to any version between those two (for example 25.8.10) will now incorrectly force this setting back to "", producing defaults that don't match that target version.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b9c0d9bba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| auto remote_initiator_cluster_name = settings[Setting::object_storage_remote_initiator_cluster].value; | ||
| if (remote_initiator_cluster_name.empty()) | ||
| remote_initiator_cluster_name = cluster_name_from_settings; | ||
| auto remote_initiator_cluster = getClusterImpl(context, remote_initiator_cluster_name); |
There was a problem hiding this comment.
Apply object_storage_max_nodes to initiator cluster selection
IStorageCluster::read now builds remote_initiator_cluster without the object_storage_max_nodes cap, so object_storage_remote_initiator=1 can pick a remote initiator from hosts that the query intentionally excluded via object_storage_max_nodes. In environments that use this cap to avoid unhealthy/unconfigured nodes, this introduces nondeterministic routing failures (auth/connectivity/cluster visibility) even though the setting is configured. The initiator cluster lookup should use the same max-host limit as the normal cluster path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
It's ok, with object_storage_max_nodes getCluster chooses this number of random cluster nodes, and for remote initiator also random node is selected, Select random node from random nodes sublist gives the same result as just select random node from full list.
|
AI audit note: This review comment was generated by AI (gpt-5.4). Audit update for PR #1576 after the latest branch update: Confirmed defectsNo confirmed defects in reviewed scope. Coverage summaryScope reviewed: current PR diff vs Categories failed: none confirmed. Categories passed: previous Assumptions/limits: static audit only; I did not run the integration suite or build the server, so this conclusion is limited to code-path reasoning over the current PR state. |
| Prewhere filter | ||
| Prewhere filter column: less(multiply(2, b), 100) | ||
| Filter column: and(indexHint(greater(plus(i, 40), 0)), equals(a, 0)) (removed) | ||
| Filter column: and(equals(a, 0), indexHint(greater(plus(i, 40), 0))) (removed) |
There was a problem hiding this comment.
Argument order depends on hash, hash was changes (see FunctionNode::updateTreeHashImpl)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Different improvement for remote initiator
Documentation entry for user-facing changes
With remote initiator feature queries like
rewrites as
'remote_host' is a random host from 'swarm' cluster
See #756
Current PR introduces the next improvements:
object_storage_remote_initiatorauth works incorrectly #1570 - uses username and password if access to cluster requires it. Throws exception if cluster uses common secret, this should be solved in future PRs.object_storage_remote_initiatorwith different cluster name #1571 - new settingobject_storage_remote_initiator_clusterallows to chooseremote_hostfrom different cluster, not only fromswarmremotequery did not work with additional setting inside function, likeremote('remote_host', iceberg(..., SETTINGS iceberg_metadata_file_path='path/to/metadata.json')). Now must work correctly.remote('remote_host', icebergCluster('remote_cluster', ...))clusterremote_clustercan be defined only onremote_hostand unknown on current initial host. Removed cluster check on early stage, this allows to execute such queries.CI/CD Options
Exclude tests:
Regression jobs to run: