Direct (nested loop) join for merge tree tables by vdimir · Pull Request #89920 · ClickHouse/ClickHouse

vdimir · 2025-11-12T12:33:01Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Support direct (nested loop) join for MergeTree tables. To use it, set join_algorithm = 'direct' with no other algorithm listed in the setting.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2025-11-12T12:33:31Z

Workflow [PR], commit [8381b14]

Summary: ❌

job_name	test_name	status	info	comment
BuzzHouse (amd_debug)		failure
	Logical error: 'Inconsistent AST formatting in SelectQuery: the query:	FAIL	cidb

rschu1ze · 2025-11-15T18:42:23Z

Did direct joins for merge tree tables not work before?

vdimir · 2025-11-17T13:39:09Z

> Did direct joins for merge tree tables not work before?

@rschu1ze

Not quite. Before this PR it worked only if you created a dictionary with LAYOUT(DIRECT()) on top of the table and joined with that dictionary.

rschu1ze · 2025-11-17T17:03:22Z

Right, I misread the PR description somehow. Thanks.

Copilot

Pull request overview

This PR introduces support for direct (nested loop) join with MergeTree tables, controlled by setting join_algorithm = 'direct'. The implementation enables efficient joins between regular tables and MergeTree tables by performing index-based lookups instead of building hash tables.
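To make the contrast with a hash join concrete, here is a minimal sketch of a nested loop join that probes a key-sorted right side with a point lookup for each left key, instead of first building a hash table from the whole right table. It is not the PR's code: `Row`, `lookup`, and `directInnerJoin` are hypothetical names, and a sorted `std::vector` searched with `std::lower_bound` stands in for the MergeTree primary-key read path.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one row of a table sorted by its primary key.
struct Row
{
    std::uint64_t key;
    std::string value;
};

// Point lookup by binary search over the sorted rows.
// Returns the value of the first row with the given key, if any.
std::optional<std::string> lookup(const std::vector<Row> & sorted_rows, std::uint64_t key)
{
    auto it = std::lower_bound(
        sorted_rows.begin(), sorted_rows.end(), key,
        [](const Row & row, std::uint64_t k) { return row.key < k; });
    if (it == sorted_rows.end() || it->key != key)
        return std::nullopt;
    return it->value;
}

// INNER join keeping at most one right row per left key:
// the outer loop walks the left keys, the inner step is an index lookup.
// No hash table over the right side is built.
std::vector<std::pair<std::uint64_t, std::string>> directInnerJoin(
    const std::vector<std::uint64_t> & left_keys, const std::vector<Row> & right_sorted)
{
    std::vector<std::pair<std::uint64_t, std::string>> result;
    for (std::uint64_t key : left_keys)
        if (auto value = lookup(right_sorted, key))
            result.emplace_back(key, *value);
    return result;
}
```

With this shape, the cost is driven by the number of left keys and the lookup cost, so it tends to pay off when the left side is small relative to the right table.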

Key Changes:

Added DirectJoinMergeTreeEntity class that implements IKeyValueEntity interface for MergeTree direct join operations
Extended IKeyValueEntity::getByKeys interface to support ALL join semantics via an out_offsets parameter
Implemented query plan cloning capabilities for ReadFromMergeTree and storage snapshots to enable parallel lookups
Added comprehensive test coverage for various join types (INNER, LEFT, SEMI LEFT, ANTI LEFT) with MergeTree tables
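The role of an offsets output for ALL join semantics can be sketched as follows. This is an illustration under assumptions, not the actual `IKeyValueEntity::getByKeys` signature: `LookupResult`, `getByKeysAll`, and `joinAll` are hypothetical names, and the offsets are assumed to be cumulative counts (entry `i` holds the number of matched rows for keys `0..i`), similar to how array offsets are usually laid out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one row of a table sorted by its key.
struct Row
{
    std::uint64_t key;
    std::string value;
};

struct LookupResult
{
    std::vector<std::string> values;   // matched right rows, grouped by requested key
    std::vector<std::size_t> offsets;  // offsets[i] = total matches for keys[0..i] (assumed layout)
};

// Looks up every requested key and returns all matching rows, not just the first.
LookupResult getByKeysAll(const std::vector<Row> & sorted_rows, const std::vector<std::uint64_t> & keys)
{
    LookupResult result;
    result.offsets.reserve(keys.size());
    for (std::uint64_t key : keys)
    {
        auto it = std::lower_bound(
            sorted_rows.begin(), sorted_rows.end(), key,
            [](const Row & row, std::uint64_t k) { return row.key < k; });
        for (; it != sorted_rows.end() && it->key == key; ++it)
            result.values.push_back(it->value);
        // A key without matches repeats the previous offset.
        result.offsets.push_back(result.values.size());
    }
    return result;
}

// Expands each left key into one output row per matched right row (INNER ALL).
std::vector<std::pair<std::uint64_t, std::string>> joinAll(
    const std::vector<std::uint64_t> & left_keys, const LookupResult & looked_up)
{
    std::vector<std::pair<std::uint64_t, std::string>> out;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < left_keys.size(); ++i)
    {
        const std::size_t end = looked_up.offsets[i];
        for (std::size_t j = begin; j < end; ++j)
            out.emplace_back(left_keys[i], looked_up.values[j]);
        begin = end;
    }
    return out;
}
```

Without such offsets, a lookup can only report whether a key was found and return one row for it, which is enough for ANY, SEMI, and ANTI joins but not for ALL, where a left row may need to be repeated once per match.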

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

File	Description
src/Interpreters/DirectJoinMergeTreeEntity.h/cpp	New entity implementing direct join logic for MergeTree tables with IN-based filtering
src/Interpreters/IKeyValueEntity.h	Extended interface with `out_offsets` parameter for ALL join semantics
src/Interpreters/DirectJoin.cpp	Updated to handle ALL join semantics with offsets from key-value entities
src/Planner/PlannerJoinTree.cpp	Added logic to detect and create DirectJoinMergeTreeEntity when conditions are met
src/Processors/QueryPlan/ReadFromMergeTree.h/cpp	Added cloning support and query condition cache controls for multiple lookups
src/Storages/StorageSnapshot.h	Added virtual `clone()` method to Data base class
src/Storages/MergeTree/MergeTreeData.h	Implemented `clone()` for MergeTree snapshot data
src/Storages/StorageMemory.h	Implemented `clone()` for Memory snapshot data
src/Storages/Storage*.h/cpp	Updated `getByKeys` signatures to include `out_offsets` parameter
tests/queries/0_stateless/03712_*.sql/reference	Functional test verifying indexed vs full scan behavior
tests/queries/0_stateless/03742_*.sql/reference	Long test verifying different join types with large datasets

tests/queries/0_stateless/03742_nested_loop_join_long.sql

tests/queries/0_stateless/03712_nested_loop_join_merge_tree.sql

src/Interpreters/DirectJoin.cpp