Skip to content

Conversation

@feniljain
Copy link
Contributor

@feniljain feniljain commented Nov 4, 2025

Which issue does this PR close?

What changes are included in this PR?

Added a distinct element calculator in core hash join loop. It also works on an assumption that indices will be returned in an increasing order, I couldn't see a place where this assumption is broken, but if that's not the case, please do help me out.

Also, I am not 100% sure my implementation for avg_fanout is correct, so do let me know if that needs changes.

Are these changes tested?

No failures in sqllogictests/tests in datafusion/core/tests/sql/, should I add a test case for this?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Nov 4, 2025
@feniljain feniljain force-pushed the feat-hash-join-metrics branch from 7ba1336 to e00d993 Compare November 4, 2025 19:14
@2010YOUY01
Copy link
Contributor

Thank you, I think the implementation is correct.

The only consideration is performance, Hash Join implementation is definitely on the performance critical path, so we have to be careful not to introduce additional overhead. This PR should be good to go if we can verify it has no influence on the performance.

In this PR, the extra overhead is for each batch, count the distinct_count for a sorted vector like [0,1,1,2,2,2...] up to batch size long, it seem shouldn't be the bottleneck. (@alamb Could you help trigger then benchmark please?)

I believe these metrics provide more insight than simply computing output_rows / input_rows for equal joins. However, if they introduce noticeable overhead, we can move them under ExplainAnalyzeLevel::Dev, and track them only when this extra-verbose level is enabled. We should also document that more detailed analyze levels may incur additional execution overhead but offer deeper insights.

@xudong963
Copy link
Member

Thank you, I think the implementation is correct.

The only consideration is performance, Hash Join implementation is definitely on the performance critical path, so we have to be careful not to introduce additional overhead. This PR should be good to go if we can verify it has no influence on the performance.

In this PR, the extra overhead is for each batch, count the distinct_count for a sorted vector like [0,1,1,2,2,2...] up to batch size long, it seem shouldn't be the bottleneck. (@alamb Could you help trigger then benchmark please?)

I believe these metrics provide more insight than simply computing output_rows / input_rows for equal joins. However, if they introduce noticeable overhead, we can move them under ExplainAnalyzeLevel::Dev, and track them only when this extra-verbose level is enabled. We should also document that more detailed analyze levels may incur additional execution overhead but offer deeper insights.

+1. Better to do some profiling for the classic join patterns to have a clear print.

@alamb
Copy link
Contributor

alamb commented Nov 5, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat-hash-join-metrics (e00d993) to c5fb605 diff using: tpch_mem
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Nov 5, 2025

In this PR, the extra overhead is for each batch, count the distinct_count for a sorted vector like [0,1,1,2,2,2...] up to batch size long, it seem shouldn't be the bottleneck. (@alamb Could you help trigger then benchmark please?)

Kicked it off

@alamb
Copy link
Contributor

alamb commented Nov 5, 2025

🤖: Benchmark completed

Details

Comparing HEAD and feat-hash-join-metrics
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ feat-hash-join-metrics ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 140.73 ms │              143.40 ms │    no change │
│ QQuery 2     │  26.64 ms │               29.79 ms │ 1.12x slower │
│ QQuery 3     │  34.25 ms │               41.59 ms │ 1.21x slower │
│ QQuery 4     │  29.03 ms │               29.87 ms │    no change │
│ QQuery 5     │  87.12 ms │               89.33 ms │    no change │
│ QQuery 6     │  19.23 ms │               19.52 ms │    no change │
│ QQuery 7     │ 220.66 ms │              232.08 ms │ 1.05x slower │
│ QQuery 8     │  32.49 ms │               34.91 ms │ 1.07x slower │
│ QQuery 9     │ 107.24 ms │              103.10 ms │    no change │
│ QQuery 10    │  62.77 ms │               67.07 ms │ 1.07x slower │
│ QQuery 11    │  17.54 ms │               18.19 ms │    no change │
│ QQuery 12    │  50.81 ms │               52.23 ms │    no change │
│ QQuery 13    │  47.93 ms │               49.43 ms │    no change │
│ QQuery 14    │  13.78 ms │               14.63 ms │ 1.06x slower │
│ QQuery 15    │  24.87 ms │               26.07 ms │    no change │
│ QQuery 16    │  25.58 ms │               25.40 ms │    no change │
│ QQuery 17    │ 149.39 ms │              156.53 ms │    no change │
│ QQuery 18    │ 276.29 ms │              291.48 ms │ 1.05x slower │
│ QQuery 19    │  36.79 ms │               38.34 ms │    no change │
│ QQuery 20    │  49.66 ms │               49.73 ms │    no change │
│ QQuery 21    │ 318.85 ms │              329.51 ms │    no change │
│ QQuery 22    │  22.17 ms │               21.37 ms │    no change │
└──────────────┴───────────┴────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                     ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 1793.83ms │
│ Total Time (feat-hash-join-metrics)   │ 1863.55ms │
│ Average Time (HEAD)                   │   81.54ms │
│ Average Time (feat-hash-join-metrics) │   84.71ms │
│ Queries Faster                        │         0 │
│ Queries Slower                        │         7 │
│ Queries with No Change                │        15 │
│ Queries with Failure                  │         0 │
└───────────────────────────────────────┴───────────┘

@alamb
Copy link
Contributor

alamb commented Nov 5, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing feat-hash-join-metrics (e00d993) to c5fb605 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Nov 5, 2025

🤖: Benchmark completed

Details

Comparing HEAD and feat-hash-join-metrics
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ feat-hash-join-metrics ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2676.22 ms │             2687.81 ms │     no change │
│ QQuery 1     │  1239.21 ms │             1201.13 ms │     no change │
│ QQuery 2     │  2297.31 ms │             2287.02 ms │     no change │
│ QQuery 3     │  1203.15 ms │             1194.59 ms │     no change │
│ QQuery 4     │  2317.59 ms │             2396.58 ms │     no change │
│ QQuery 5     │ 28535.00 ms │            28593.26 ms │     no change │
│ QQuery 6     │  4243.56 ms │             4140.09 ms │     no change │
│ QQuery 7     │  3738.66 ms │             3537.71 ms │ +1.06x faster │
└──────────────┴─────────────┴────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 46250.69ms │
│ Total Time (feat-hash-join-metrics)   │ 46038.19ms │
│ Average Time (HEAD)                   │  5781.34ms │
│ Average Time (feat-hash-join-metrics) │  5754.77ms │
│ Queries Faster                        │          1 │
│ Queries Slower                        │          0 │
│ Queries with No Change                │          7 │
│ Queries with Failure                  │          0 │
└───────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ feat-hash-join-metrics ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.56 ms │                2.30 ms │ +1.11x faster │
│ QQuery 1     │    48.43 ms │               49.43 ms │     no change │
│ QQuery 2     │   137.41 ms │              133.50 ms │     no change │
│ QQuery 3     │   164.42 ms │              163.90 ms │     no change │
│ QQuery 4     │  1212.32 ms │             1092.00 ms │ +1.11x faster │
│ QQuery 5     │  1641.93 ms │             1547.45 ms │ +1.06x faster │
│ QQuery 6     │     2.32 ms │                2.51 ms │  1.08x slower │
│ QQuery 7     │    56.78 ms │               56.49 ms │     no change │
│ QQuery 8     │  1623.39 ms │             1433.44 ms │ +1.13x faster │
│ QQuery 9     │  1985.94 ms │             1803.53 ms │ +1.10x faster │
│ QQuery 10    │   365.86 ms │              362.26 ms │     no change │
│ QQuery 11    │   423.33 ms │              426.94 ms │     no change │
│ QQuery 12    │  1560.49 ms │             1346.06 ms │ +1.16x faster │
│ QQuery 13    │  2258.56 ms │             2137.00 ms │ +1.06x faster │
│ QQuery 14    │  1364.37 ms │             1255.45 ms │ +1.09x faster │
│ QQuery 15    │  1361.68 ms │             1254.45 ms │ +1.09x faster │
│ QQuery 16    │  2744.94 ms │             2704.16 ms │     no change │
│ QQuery 17    │  2699.05 ms │             2691.15 ms │     no change │
│ QQuery 18    │  4902.42 ms │             5318.32 ms │  1.08x slower │
│ QQuery 19    │   128.58 ms │              133.94 ms │     no change │
│ QQuery 20    │  2034.58 ms │             2178.17 ms │  1.07x slower │
│ QQuery 21    │  2335.23 ms │             2493.61 ms │  1.07x slower │
│ QQuery 22    │  3979.30 ms │             4134.97 ms │     no change │
│ QQuery 23    │ 12979.52 ms │            13176.79 ms │     no change │
│ QQuery 24    │   214.84 ms │              222.73 ms │     no change │
│ QQuery 25    │   474.79 ms │              485.79 ms │     no change │
│ QQuery 26    │   218.05 ms │              222.30 ms │     no change │
│ QQuery 27    │  2864.29 ms │             2968.56 ms │     no change │
│ QQuery 28    │ 23459.77 ms │            23870.99 ms │     no change │
│ QQuery 29    │   962.14 ms │              982.48 ms │     no change │
│ QQuery 30    │  1374.68 ms │             1289.52 ms │ +1.07x faster │
│ QQuery 31    │  1408.04 ms │             1348.50 ms │     no change │
│ QQuery 32    │  4598.77 ms │             4668.63 ms │     no change │
│ QQuery 33    │  5701.39 ms │             5612.52 ms │     no change │
│ QQuery 34    │  5967.52 ms │             5664.98 ms │ +1.05x faster │
│ QQuery 35    │  2164.31 ms │             2006.57 ms │ +1.08x faster │
│ QQuery 36    │   124.84 ms │              119.75 ms │     no change │
│ QQuery 37    │    52.35 ms │               51.32 ms │     no change │
│ QQuery 38    │   127.46 ms │              118.98 ms │ +1.07x faster │
│ QQuery 39    │   200.91 ms │              198.17 ms │     no change │
│ QQuery 40    │    40.65 ms │               42.20 ms │     no change │
│ QQuery 41    │    40.50 ms │               38.77 ms │     no change │
│ QQuery 42    │    32.30 ms │               32.18 ms │     no change │
└──────────────┴─────────────┴────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                     ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 96041.03ms │
│ Total Time (feat-hash-join-metrics)   │ 95842.78ms │
│ Average Time (HEAD)                   │  2233.51ms │
│ Average Time (feat-hash-join-metrics) │  2228.90ms │
│ Queries Faster                        │         13 │
│ Queries Slower                        │          4 │
│ Queries with No Change                │         26 │
│ Queries with Failure                  │          0 │
└───────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ feat-hash-join-metrics ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 142.43 ms │              142.85 ms │     no change │
│ QQuery 2     │  27.62 ms │               28.79 ms │     no change │
│ QQuery 3     │  40.54 ms │               36.59 ms │ +1.11x faster │
│ QQuery 4     │  29.48 ms │               30.18 ms │     no change │
│ QQuery 5     │  90.55 ms │               89.00 ms │     no change │
│ QQuery 6     │  19.70 ms │               19.50 ms │     no change │
│ QQuery 7     │ 227.81 ms │              234.14 ms │     no change │
│ QQuery 8     │  33.79 ms │               33.35 ms │     no change │
│ QQuery 9     │ 108.03 ms │              105.63 ms │     no change │
│ QQuery 10    │  64.37 ms │               66.94 ms │     no change │
│ QQuery 11    │  18.40 ms │               17.97 ms │     no change │
│ QQuery 12    │  52.20 ms │               53.43 ms │     no change │
│ QQuery 13    │  47.74 ms │               52.96 ms │  1.11x slower │
│ QQuery 14    │  14.24 ms │               14.31 ms │     no change │
│ QQuery 15    │  25.47 ms │               25.93 ms │     no change │
│ QQuery 16    │  24.89 ms │               24.85 ms │     no change │
│ QQuery 17    │ 153.19 ms │              161.08 ms │  1.05x slower │
│ QQuery 18    │ 286.54 ms │              285.21 ms │     no change │
│ QQuery 19    │  37.91 ms │               38.49 ms │     no change │
│ QQuery 20    │  49.92 ms │               51.43 ms │     no change │
│ QQuery 21    │ 334.08 ms │              341.07 ms │     no change │
│ QQuery 22    │  21.08 ms │               22.20 ms │  1.05x slower │
└──────────────┴───────────┴────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                     ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                     │ 1849.99ms │
│ Total Time (feat-hash-join-metrics)   │ 1875.92ms │
│ Average Time (HEAD)                   │   84.09ms │
│ Average Time (feat-hash-join-metrics) │   85.27ms │
│ Queries Faster                        │         1 │
│ Queries Slower                        │         3 │
│ Queries with No Change                │        18 │
│ Queries with Failure                  │         0 │
└───────────────────────────────────────┴───────────┘

@feniljain
Copy link
Contributor Author

Considering the results, I think we should move it to ExplainAnalyzeLevel::Dev then?

@2010YOUY01
Copy link
Contributor

Considering the results, I think we should move it to ExplainAnalyzeLevel::Dev then?

It's running in a noisy cloud environment, and tpch_mem takes quite short time, so it might not be accurate.

I’ve verified this with tpch_mem10 locally, and it actually slows down several queries. I tried to make this count distinct indices faster (and sent a PR feniljain#2), and after that I think the result looks good:

--------------------
Benchmark tpch_mem_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ pr-18488-opt ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  224.33 ms │    225.34 ms │     no change │
│ QQuery 2     │   52.61 ms │     53.36 ms │     no change │
│ QQuery 3     │  116.20 ms │    115.79 ms │     no change │
│ QQuery 4     │   70.10 ms │     69.32 ms │     no change │
│ QQuery 5     │  199.10 ms │    198.30 ms │     no change │
│ QQuery 6     │   50.64 ms │     46.92 ms │ +1.08x faster │
│ QQuery 7     │  474.45 ms │    479.40 ms │     no change │
│ QQuery 8     │  158.07 ms │    157.65 ms │     no change │
│ QQuery 9     │  341.40 ms │    336.64 ms │     no change │
│ QQuery 10    │  143.37 ms │    150.52 ms │     no change │
│ QQuery 11    │   41.77 ms │     42.82 ms │     no change │
│ QQuery 12    │  146.51 ms │    147.67 ms │     no change │
│ QQuery 13    │  118.36 ms │    120.90 ms │     no change │
│ QQuery 14    │   25.81 ms │     25.68 ms │     no change │
│ QQuery 15    │   54.43 ms │     55.78 ms │     no change │
│ QQuery 16    │   47.39 ms │     47.76 ms │     no change │
│ QQuery 17    │  414.56 ms │    417.56 ms │     no change │
│ QQuery 18    │ 1042.19 ms │   1074.00 ms │     no change │
│ QQuery 19    │   77.35 ms │     80.07 ms │     no change │
│ QQuery 20    │   97.83 ms │     98.10 ms │     no change │
│ QQuery 21    │ 4416.18 ms │   4322.24 ms │     no change │
│ QQuery 22    │   35.76 ms │     35.70 ms │     no change │
└──────────────┴────────────┴──────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary           ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)           │ 8348.40ms │
│ Total Time (pr-18488-opt)   │ 8301.53ms │
│ Average Time (main)         │  379.47ms │
│ Average Time (pr-18488-opt) │  377.34ms │
│ Queries Faster              │         1 │
│ Queries Slower              │         0 │
│ Queries with No Change      │        21 │
│ Queries with Failure        │         0 │
└─────────────────────────────┴───────────┘

@feniljain
Copy link
Contributor Author

feniljain commented Nov 6, 2025

It's running in a noisy cloud environment, and tpch_mem takes quite short time, so it might not be accurate.

Very interesting

I’ve verified this with tpch_mem10 locally, and it actually slows down several queries.

Thanks for confirming!

I tried to make this count distinct indices faster (and sent a PR feniljain#2),

Very curious about this PR, cause it seems you have written the same logic as mine, but using loops instead, am I missing some detail?

Is it that the None check is causing all this overhead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add selectivity metrics (for Explain Analyze) in Hash Join

4 participants