Insights: apache/datafusion
Overview
30 Pull requests merged by 21 people
-
Automatically split large single RecordBatches in `MemorySource` into smaller batches
#16734 merged
Jul 17, 2025 -
Restore custom SchemaAdapter functionality for Parquet
#16791 merged
Jul 17, 2025 -
fix tests in page_pruning when filter pushdown is enabled by default
#16794 merged
Jul 16, 2025 -
fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display
#16726 merged
Jul 16, 2025 -
Support min/max aggregates for FixedSizeBinary type
#16765 merged
Jul 15, 2025 -
fix: return NULL if any of the param to make_date is NULL
#16759 merged
Jul 15, 2025 -
add filter to handle backtrace
#16752 merged
Jul 15, 2025 -
Add `clickbench_pushdown` benchmark
#16731 merged
Jul 15, 2025 -
Per file filter evaluation
#15057 merged
Jul 15, 2025 -
Remove fixed version from MSRV check
#16786 merged
Jul 15, 2025 -
Refactor BinaryTypeCoercer to Handle Null Coercion Early and Avoid Redundant Checks
#16768 merged
Jul 15, 2025 -
Auto start testcontainers for `datafusion-cli`
#16644 merged
Jul 14, 2025 -
Add serialization/deserialization and round-trip tests for all tpc-h queries
#16742 merged
Jul 14, 2025 -
limit intermediate batch size in nested_loop_join
#16443 merged
Jul 14, 2025 -
feat: Add a configuration to make parquet encryption optional
#16649 merged
Jul 14, 2025 -
perf: Optimize hash joins with an empty build side
#16716 merged
Jul 14, 2025 -
chore(deps): bump chrono-tz from 0.10.3 to 0.10.4
#16769 merged
Jul 14, 2025 -
Support Type Coercion for NULL in Binary Arithmetic Expressions
#16761 merged
Jul 14, 2025 -
feat: expose intersect distinct/except distinct in dataframe api
#16578 merged
Jul 13, 2025 -
chore: Make `GroupValues` and APIs on `PhysicalGroupBy` aggregation APIs public
#16733 merged
Jul 13, 2025 -
ensure MemTable has at least one partition
#16754 merged
Jul 12, 2025 -
Fix in list round trip in df proto
#16744 merged
Jul 12, 2025 -
Improve Ci cache
#16709 merged
Jul 12, 2025 -
Improve dictionary null handling in hashing and expand aggregate test coverage for nulls
#16466 merged
Jul 12, 2025 -
Perform type coercion for corr aggregate function
#15776 merged
Jul 11, 2025 -
Refactor filter pushdown APIs to enable joins to pass through filters
#16732 merged
Jul 11, 2025 -
Remove parquet_filter and parquet `sort` benchmarks
#16730 merged
Jul 11, 2025 -
chore: make more clarity for internal errors
#16741 merged
Jul 11, 2025 -
chore(deps): bump clap from 4.5.40 to 4.5.41
#16735 merged
Jul 10, 2025
26 Pull requests opened by 23 people
-
feat: support multi-threaded writing of Parquet files with modular encryption
#16738 opened
Jul 10, 2025 -
Draft: Test fast gc for sort string view
#16739 opened
Jul 10, 2025 -
Benchmark for char expression
#16743 opened
Jul 10, 2025 -
Fix `next_up` and `next_down` behavior for zero float values
#16745 opened
Jul 11, 2025 -
chore(deps): bump sysinfo from 0.35.2 to 0.36.0
#16747 opened
Jul 11, 2025 -
Use tokio::task::coop::poll_proceed by default in CooperativeStream
#16748 opened
Jul 11, 2025 -
feat(datafusion-proto): allow TableSource to be serialized
#16750 opened
Jul 11, 2025 -
Perf: Optimize performance of ByteViewGroupValueBuilder on batches with inlined views
#16751 opened
Jul 11, 2025 -
WIP: Update `object_store` 0.12.3
#16753 opened
Jul 11, 2025 -
48.0.1
#16755 opened
Jul 12, 2025 -
feat: improve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)`
#16762 opened
Jul 13, 2025 -
feat: change Expr OuterReferenceColumn to Box type for reducing expr struct size
#16771 opened
Jul 14, 2025 -
feat: Dynamic Parquet encryption and decryption properties
#16779 opened
Jul 15, 2025 -
feat: [datafusion-spark] Implement `next_day` function
#16780 opened
Jul 15, 2025 -
Implement equals for stateful functions
#16781 opened
Jul 15, 2025 -
Refactor binary.rs tests into modular submodules under `binary/tests`
#16782 opened
Jul 15, 2025 -
fix: support nullable columns in pre-sorted data sources
#16783 opened
Jul 15, 2025 -
Fix: Preserve sorting for the COPY TO plan
#16785 opened
Jul 15, 2025 -
Add reproducing test cases for stackoverflows
#16787 opened
Jul 15, 2025 -
cache generation of dictionary keys and null arrays for ScalarValue
#16789 opened
Jul 15, 2025 -
fix: skip predicates on struct unnest in PushDownFilter
#16790 opened
Jul 15, 2025 -
Add support for Float16 type in substrait
#16793 opened
Jul 15, 2025 -
CI: Fix slow join test
#16796 opened
Jul 16, 2025 -
Allow comparison between boolean and int values
#16798 opened
Jul 16, 2025 -
chore: add tests for out of bounds for NullArray
#16802 opened
Jul 16, 2025 -
Add example of custom file schema casting rules
#16803 opened
Jul 17, 2025
19 Issues closed by 5 people
-
Better parallelize large input batches (speed up dataframe access)
#16717 closed
Jul 17, 2025 -
Blog post for the DataFusion 48 release
#16757 closed
Jul 16, 2025 -
[Discussion]: show more info for `OutputRequirementExec` display
#16725 closed
Jul 16, 2025 -
FixedSizeBinary support in min/max accumulators
#16513 closed
Jul 15, 2025 -
Bug: `make_date(year, month, day)` reports error if one of the fields is NULL
#16746 closed
Jul 15, 2025 -
Handle panic stacktrace in `datafusion-cli` tests
#16146 closed
Jul 15, 2025 -
Add a datafusion benchmark for `filter_pushdown`
#16729 closed
Jul 15, 2025 -
Simplify `signature()` Null Handling by Addressing at Function Entry
#16766 closed
Jul 15, 2025 -
Auto run docker containers needed for tests
#15092 closed
Jul 14, 2025 -
Blog Post for Accelerating Query Processing with Specialized Indexes
#16372 closed
Jul 14, 2025 -
Arithmetic expression on `Date` type with `Null` returns planning error (SQLancer)
#16760 closed
Jul 14, 2025 -
Decimal & UInt Binary operation giving wrong output
#16667 closed
Jul 13, 2025 -
Discussion: public some aggregate related function and struct
#16724 closed
Jul 13, 2025 -
Blog post for the DataFusion 47, 48, and 49 releases
#16347 closed
Jul 12, 2025 -
TPC-H Q16 fails during deserialization
#16665 closed
Jul 12, 2025 -
Update Fuzz tests to include Dict with null values
#16266 closed
Jul 12, 2025 -
Avoid explicit cast during execution in `corr` aggregate function
#13721 closed
Jul 11, 2025 -
Optimized spill file format
#14078 closed
Jul 11, 2025 -
Filter multiple columns from TopK using Lexicographical ordering
#15698 closed
Jul 10, 2025
22 Issues opened by 15 people
-
Integration tests are not being run
#16801 opened
Jul 16, 2025 -
Plan to replace `SchemaAdapter` with `PhysicalExprAdapter`
#16800 opened
Jul 16, 2025 -
Release DataFusion `50.0.0` (Aug/Sep 2025)
#16799 opened
Jul 16, 2025 -
Allow comparison between booleans and integers
#16797 opened
Jul 16, 2025 -
count_all() aggregations cannot be aliased
#16795 opened
Jul 15, 2025 -
joins::nested_loop_join::tests::join_maintains_right_order tests take over 60 seconds
#16792 opened
Jul 15, 2025 -
Restructure core codepaths to prevent stack overflows
#16788 opened
Jul 15, 2025 -
CopyTo plan loses ordering requirement during physical plan optimization
#16784 opened
Jul 15, 2025 -
More flexible Parquet encryption configuration
#16778 opened
Jul 15, 2025 -
Make parquet_encryption a non-default feature
#16777 opened
Jul 15, 2025 -
[datafusion-spark] Implement Spark `date` function `next_day`
#16775 opened
Jul 14, 2025 -
[datafusion-spark] Implement Spark `datetime` function `last_day`
#16774 opened
Jul 14, 2025 -
Apply filters to `RecordBatch` instead of indices in nested loop join
#16773 opened
Jul 14, 2025 -
Only 4 tpc-h queries have matching physical plans before serialization and after deserialization
#16772 opened
Jul 14, 2025 -
Continue to reduce Expr struct size
#16770 opened
Jul 14, 2025 -
Restructure `binary.rs` Tests into Dedicated Modules
#16767 opened
Jul 14, 2025 -
Blog post for the DataFusion 49 release
#16758 opened
Jul 12, 2025 -
[BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion
#16756 opened
Jul 12, 2025 -
Serializing custom `TableSource` implementations fails
#16749 opened
Jul 11, 2025 -
Support uneven partition inputs HashJoinExec in Partitioned mode
#16740 opened
Jul 10, 2025 -
Support multi-threaded writing of encrypted Parquet files
#16737 opened
Jul 10, 2025 -
Filtering and counting afterwards causes overflow panic in interval arithmetics
#16736 opened
Jul 10, 2025
43 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic
#16681 commented on
Jul 16, 2025 • 3 new comments -
SQL logic tests for Run-End Encoded (REE)
#16715 commented on
Jul 14, 2025 • 2 new comments -
`DataSourceExec` is projecting/reading unused columns from Parquet files for recursive queries
#16684 commented on
Jul 10, 2025 • 0 new comments -
Deprecate `ExprSchema` functions
#15847 commented on
Jul 16, 2025 • 0 new comments -
fix: Allow ORDER BY aggregates not present in SELECT list
#15876 commented on
Jul 16, 2025 • 0 new comments -
PERF : modify SMJ shuffle file reader to skip validation
#15948 commented on
Jul 15, 2025 • 0 new comments -
Fix `datafusion-cli` memory leak by using `snmalloc`
#15963 commented on
Jul 15, 2025 • 0 new comments -
Demonstrate wrong statistics reported from parquet
#15977 commented on
Jul 15, 2025 • 0 new comments -
Optimize hash partitioning for cache friendliness
#15981 commented on
Jul 15, 2025 • 0 new comments -
Optimize char expression
#16076 commented on
Jul 12, 2025 • 0 new comments -
Optimize Hex Function
#16077 commented on
Jul 13, 2025 • 0 new comments -
feat: optimize and unparse grouping
#16161 commented on
Jul 15, 2025 • 0 new comments -
fix: Fix `EquivalenceClass` calculation for Union queries
#16185 commented on
Jul 15, 2025 • 0 new comments -
feature: sort by/cluster by/distribute by
#16310 commented on
Jul 15, 2025 • 0 new comments -
fix: The inconsistency between scalar and array on the cast decimal to timestamp
#16539 commented on
Jul 14, 2025 • 0 new comments -
[datafusion-spark] Implement Spark `luhn_check` function
#16580 commented on
Jul 15, 2025 • 0 new comments -
Support multiple ordered array_agg aggregations
#16625 commented on
Jul 16, 2025 • 0 new comments -
Partially implement MATCH_RECOGNIZE for Advanced Pattern Matching
#16685 commented on
Jul 16, 2025 • 0 new comments -
DRAFT: Update arrow/parquet to 56.0.0
#16690 commented on
Jul 15, 2025 • 0 new comments -
Enable Projection Pushdown Optimization for Recursive CTEs
#16696 commented on
Jul 16, 2025 • 0 new comments -
POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4)
#16711 commented on
Jul 16, 2025 • 0 new comments -
feat: Optimize `collect_left_input` processing
#16727 commented on
Jul 14, 2025 • 0 new comments -
Optimize performance of `ByteViewGroupValueBuilder` on batches with inlined views
#16330 commented on
Jul 11, 2025 • 0 new comments -
[EPIC] Improved Externalized / Spilling / Large than Memory Hash Aggregation
#13123 commented on
Jul 11, 2025 • 0 new comments -
[EPIC] A collection of tickets for improving sorting larger than memory datasets / spilling sorts
#15271 commented on
Jul 11, 2025 • 0 new comments -
[EPIC] A collection of items to improve developer / CI speed
#13813 commented on
Jul 11, 2025 • 0 new comments -
Bloom filters are unused for certain where clause patterns (improve LiteralGuarantee)
#16697 commented on
Jul 13, 2025 • 0 new comments -
Blog post about parquet vs custom file formats
#16149 commented on
Jul 13, 2025 • 0 new comments -
[DISCUSSION] DataFusion Road Map: Q3-Q4 2025
#15878 commented on
Jul 14, 2025 • 0 new comments -
[Epic] DataFusion Blogs
#14836 commented on
Jul 14, 2025 • 0 new comments -
Feature is not implemeneted: Unsupported cast with list of structs
#15338 commented on
Jul 14, 2025 • 0 new comments -
Unnested fields are not filterable when using subqueries.
#16695 commented on
Jul 14, 2025 • 0 new comments -
[EPIC] Complete `datafusion-spark` Spark Compatible Functions
#15914 commented on
Jul 14, 2025 • 0 new comments -
[datafusion-spark] Implement Spark `string` function `luhn_check`
#16612 commented on
Jul 15, 2025 • 0 new comments -
Move code in `user_defined_plan.rs` to the `extending-operators` doc
#15774 commented on
Jul 15, 2025 • 0 new comments -
Support `VARIANT` type for unstructured data
#16116 commented on
Jul 15, 2025 • 0 new comments -
[substrait] [sqllogictest] Unsupported cast type: FixedSizeList
#16278 commented on
Jul 16, 2025 • 0 new comments -
[substrait] [sqllogictest] Cannot convert <subquery> to Substrait
#16281 commented on
Jul 16, 2025 • 0 new comments -
Unnest struct expression can't be aliased
#12794 commented on
Jul 16, 2025 • 0 new comments -
Release DataFusion `49.0.0` (July 2025)
#16235 commented on
Jul 16, 2025 • 0 new comments -
Optimize the join operators
#16710 commented on
Jul 17, 2025 • 0 new comments -
Perf: Optimize in memory sort
#15380 commented on
Jul 16, 2025 • 0 new comments -
feat: Emit warning with Diagnostic when doing = Null
#15696 commented on
Jul 12, 2025 • 0 new comments