-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lambda performance revamp #9395
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR makes me very happy :)
Not only is it more efficient, but it also is way more readable!
I just have one small formatting suggestion: could the functors be moved to their respective files so that all the filter, transform, etc., functions are together? Since we are planning on adding more lambda functions (eg. reduce), it will keep lamda_functions.cpp
more concise.
@Mytherin, all green :) @maiadegraaf and I discussed continuing the modularization and moving the functors in a different PR, as she'll be working on |
Thanks! |
Merge pull request duckdb/duckdb#9164 from Mause/feature/jdbc-uuid-param Merge pull request duckdb/duckdb#9185 from pdet/adbc_07 Merge pull request duckdb/duckdb#9126 from Maxxen/parquet-kv-metadata Merge pull request duckdb/duckdb#9123 from lnkuiper/parquet_schema Merge pull request duckdb/duckdb#9086 from lnkuiper/json_inconsistent_structure Merge pull request duckdb/duckdb#8977 from Tishj/python_readcsv_multi_v2 Merge pull request duckdb/duckdb#9279 from hawkfish/nsdate-cast Merge pull request duckdb/duckdb#8851 from taniabogatsch/binary_lambdas Merge pull request duckdb/duckdb#8983 from Maxxen/types/fixedsizelist Merge pull request duckdb/duckdb#9318 from Maxxen/fix-unused Merge pull request duckdb/duckdb#9220 from hawkfish/exclude Merge pull request duckdb/duckdb#9230 from Maxxen/json-plan-serialization Merge pull request duckdb/duckdb#9011 from Tmonster/add_create_statement_support_to_fuzzer Merge pull request duckdb/duckdb#9400 from Maxxen/array-fixes Merge pull request duckdb/duckdb#8741 from Tishj/python_import_cache_upgrade Merge fixes Merge pull request duckdb/duckdb#9395 from taniabogatsch/lambda-performance Merge pull request duckdb/duckdb#9427 from Tishj/python_table_support_replacement_scan Merge pull request duckdb/duckdb#9516 from carlopi/fixformat Merge pull request duckdb/duckdb#9485 from Maxxen/fix-parquet-serialization Merge pull request duckdb/duckdb#9388 from chrisiou/issue217 Merge pull request duckdb/duckdb#9565 from Maxxen/fix-array-vector-sizes Merge pull request duckdb/duckdb#9583 from carlopi/feature Merge pull request duckdb/duckdb#8907 from cryoEncryp/new-list-functions Merge pull request duckdb/duckdb#8642 from Virgiel/capi-streaming-arrow Merge pull request duckdb/duckdb#8658 from Tishj/pytype_optional Merge pull request duckdb/duckdb#9040 from Light-City/feature/set_mg
This PR refactors the lambda function execution code. Most notably, it removes all calls to
Flatten
and other 'relics' from the initial implementation. The changes significantly increase performance and make the code more structured and readable. This should make the lambdas efficient enough to be used directly in cases where unnesting is the current alternative. They can now scale to lists with a combined (over the whole table) element count of over 10M.This also paves the way for an efficient future
list_reduce
implementation (@maiadegraaf, @Maxxen).Benchmark numbers
Here are the benchmark numbers for 10K values. The benchmarks included in this PR run with 100K values. I took the benchmarks from those that @cryoEncryp used in his PR.
This PR
Feature branch
SQL