Introduction
The goal of this ticket is to provide a weekly summary of interesting things happening in DataFusion over the last week. Note this is not a complete list. Please feel free to comment on this ticket about things that I may have missed or that you think should get wider attention from the community.
Loosely inspired by https://this-week-in-rust.org/
Andrew's TLDR:
We are preparing for the 43.0.0 release and I am personally pretty excited about:
- [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench #12821
- [Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s) #8709
- [EPIC] Automatically generate all function documentation from code #12740
Upcoming Releases
- Release DataFusion 42.1.0 #12813 (thanks @Xuanwo and @matthewmturner)
- Release DataFusion 43.0.0 #12470 (thanks @andygrove)
Project Happenings
- Integrate sqlparser into DataFusion governance: [DISCUSSION]: move sqlparser to Apache (DataFusion) governance datafusion-sqlparser-rs#1294 (comment)
Highlights from last week(s):
(I am sorry if I missed you -- please add a note to this ticket with anything you would like to add)
- @dmitrybugakov started managing extension functions in https://github.com/datafusion-contrib/datafusion-functions-extra
- @eejbyfeldt is doing some great work on grouping sets such as feat: Implement grouping function using grouping id #12704
- @tokoko, @Blizzara @vbarua and @westonpace continue to mature the substrait support such as fix(substrait): remove optimize calls from substrait consumer #12800
- Along with @devanbenz, @Rachelint and @jayzhan211, I worked on Implement special min/max accumulator for Strings and Binary (10% faster for Clickbench Q28) #12792 to help ClickBench queries
- @timsaucer made a beautiful macro: Macro for creating record batch from literal slice #12846
- @Rachelint made a beautiful aggregation fuzzing project
- @jonahgao continues to make our SQL handling more beautiful and correct (Remove unused dependencies and features #12808, Fix clippy error on wasmtest #12844, etc)
Performance
- [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench #12821 (thanks to the epic work of @Rachelint, @goldmedal, @jayzhan211, @Dandandan @XiangpengHao and others, we are quite close)
- [EPIC] Improvements to GroupColumn multi-column aggregation performance #12680 (kudos to @jayzhan211 and @Rachelint)
- @simonvandel and @tlm365: Optimize `signum` function (3-25x faster) #12890
Quality
- Aggregation fuzz testing #12114 (already found several bugs -- thanks @Rachelint)
Extensibility
- Very close to finishing [Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s) #8709 (thanks @jcsherin @jatin510 @hailelagi)
- @Omega359 started [EPIC] Automatically generate all function documentation from code #12740 and we are making great progress thanks to @jonathanc-n @juroberttyb and others
- @notfilippo and @findepi are working to better separate logical and physical types: [EPIC] Decouple logical from physical types #12622
Features
Interesting discussions underway:
- 2024 Q3-Q4 Roadmap? #11442
- [DISCUSS] Document criteria for adding new features / what belongs in core DataFusion (e.g. sql syntax, functions, etc) #12357
Community
- Weekly Call
- Slack/Discord: info links
Upcoming meetups:
- Oct 14 Seattle: https://lu.ma/tnwl866b @phillipleblanc @lukekim
- Dec 18 Chicago: https://lu.ma/eq5myc5i @adriangb @timsaucer
Background:
I got some great feedback from @timsaucer, @findepi and @andygrove on the DataFusion weekly call that having a weekly summary like #12494 was helpful. I will therefore try to write one up each week.