Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change input for to_timestamp function to be seconds rather than nanoseconds, add to_timestamp_nanos #7844

Merged
merged 4 commits into from
Oct 26, 2023

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Oct 17, 2023

Which issue does this PR close?

Closes #7802.
Closes #2979

Rationale for this change

to_timestamp() builtin function had an input in nanoseconds whereas other SQL engines has input in seconds and as the result the function output gives unexpected result.

What changes are included in this PR?

Changed signature to to_timestamp function from nanoseconds to seconds, also introduced to_timestamp_nano to create timestamp value from nanoseconds

Are these changes tested?

Yes

Are there any user-facing changes?

The user might expect that to_timestamp function will return other values if the input still in nanoseconds

@github-actions github-actions bot added sql logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core datafusion crate sqllogictest labels Oct 17, 2023
Copy link
Contributor

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @comphead

@alamb alamb added the api change Changes the API exposed to users of the crate label Oct 17, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense and aligns DataFusion with postgres. Thank you @comphead and @Weijun-H

https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TABLE

@@ -1534,10 +1537,10 @@ mod tests {
let expr = cast_to_int64_expr(now_expr()) + lit(100_i64);
test_evaluate_with_start_time(expr, lit(ts_nanos + 100), &time);

// CAST(now() as int64) < cast(to_timestamp(...) as int64) + 50000_i64 ---> true
// CAST(now() as int64) < cast(to_timestamp(...) as int64) + 50000_i64 ---> false
let expr = cast_to_int64_expr(now_expr())
.lt(cast_to_int64_expr(to_timestamp_expr(ts_string)) + lit(50000i64));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can adjust the literal so it still evaluates to true so that we don't lose coverage 🤔

@alamb
Copy link
Contributor

alamb commented Oct 18, 2023

cc @gruuya @Dandandan and @waitingkuo and @waynexia as other people I know who might be using the timestamp functionality of DataFusion and be affected by this change

@@ -1390,10 +1391,10 @@ extract(field FROM source)

### `to_timestamp`

Converts a value to RFC3339 nanosecond timestamp format (`YYYY-MM-DDT00:00:00.000000000Z`).
Converts a value to RFC3339 second timestamp format (`YYYY-MM-DDT00:00:00Z`).
Supports timestamp, integer, and unsigned integer types as input.
Integers and unsigned integers are parsed as Unix nanosecond timestamps and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documentation still accurate? Integers are now parsed as "unix second timestamps"?. Unless they aren't in which case a little bit more expanation of the behaviour for integers would be helpful.

@waitingkuo
Copy link
Contributor

Thank you @comphead @alamb

This looks great, this closes #2979 as well.

To align with Postgrseql, perhaps we could fire another ticket for supporting more inputs like floating number & string.
i.e. in postgresql

willy=# select to_timestamp('1.1');
       to_timestamp       
--------------------------
 1970-01-01 08:00:01.1+08
(1 row)

willy=# select to_timestamp(1.1);
       to_timestamp       
--------------------------
 1970-01-01 08:00:01.1+08

@alamb alamb changed the title Change input for to_timestamp function Change input for to_timestamp function to be seconds rather than nanoseconds, add to_timestamp_nano Oct 19, 2023
@alamb alamb changed the title Change input for to_timestamp function to be seconds rather than nanoseconds, add to_timestamp_nano Change input for to_timestamp function to be seconds rather than nanoseconds, add to_timestamp_nanos Oct 19, 2023
@alamb
Copy link
Contributor

alamb commented Oct 19, 2023

To align with Postgrseql, perhaps we could fire another ticket for supporting more inputs like floating number & string.

Great idea @waitingkuo. This is now tracked by two tickets:

@@ -745,9 +748,10 @@ impl BuiltinScalarFunction {
return plan_err!("The to_hex function can only accept integers.");
}
}),
BuiltinScalarFunction::ToTimestamp => Ok(Timestamp(Nanosecond, None)),
BuiltinScalarFunction::ToTimestamp => Ok(Timestamp(Second, None)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while writing up #7868 it occured to me this signature prevents implementing something like to_timestamp(1.234) reasonably, as it can't represent sub-second precison

What would you think about leaving the return type the same (Timestamp(Nanosecond, None)) so that we can add support for floating point argments as a follow on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return Timestamp(Nanosecond, None) seems reasonable to me since most of the time related functions in datafusion return nanosecond based timestamp i.e. timestamp '2000-01-01T00:00:00', now(), ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In PG

select pg_typeof(to_timestamp(1.234))

the return type is timestamp with time zone which has minimum resolution as a microsecond

https://www.postgresql.org/docs/current/datatype-datetime.html

In light of this, having the return type as nanos or micros makes more sense for me.
What do you think is better? microseconds like in PG or nanos like it was in DF before the PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any preference now that we have coercion rules that can automatically coerce between timestamp resolution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, I'll do micros as a return type as it is in PG, will update the PR soon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set return type to nanos as it was before.

Comment on lines +77 to +83
Ok(DataType::Int64) => |col_values: &[ColumnarValue]| {
cast_column(
&col_values[0],
&DataType::Timestamp(TimeUnit::Second, None),
None,
)
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it still returns Timestamp(second, None) while the input is Int64

select arrow_typeof(to_timestamp(0));
+--------------------------------------+
| arrow_typeof(to_timestamp(Int64(0))) |
+--------------------------------------+
| Timestamp(Second, None)              |
+--------------------------------------+
1 row in set. Query took 0.003 seconds.

@@ -748,6 +751,7 @@ impl BuiltinScalarFunction {
BuiltinScalarFunction::ToTimestamp => Ok(Timestamp(Nanosecond, None)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb @comphead
any idea why the function could return Timestamp(Second, None) even though we set the return type here as Timestamp(Nanosecond, None)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @waitingkuo for raising it.
I suppose for bunch of functions the return type gets changed in create_physical_expr because functions.rs is on physical expr layer and rewrites the return type defined on logical layer.

For ints I would think the second is probably better return type
If you try to present 0004-01-01 year in nanoseconds it will require higher value than i64::MAX/MIN
https://doc.rust-lang.org/src/core/num/mod.rs.html#366

make_type!(
    TimestampNanosecondType,
    i64,
    DataType::Timestamp(TimeUnit::Nanosecond, None),
    "A timestamp nanosecond type with an optional timezone."
);

But you are right we need to clean up builtin functions configuration in multiple places.

@@ -748,6 +751,7 @@ impl BuiltinScalarFunction {
BuiltinScalarFunction::ToTimestamp => Ok(Timestamp(Nanosecond, None)),
BuiltinScalarFunction::ToTimestampMillis => Ok(Timestamp(Millisecond, None)),
BuiltinScalarFunction::ToTimestampMicros => Ok(Timestamp(Microsecond, None)),
BuiltinScalarFunction::ToTimestampNanos => Ok(Timestamp(Nanosecond, None)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could map to_timestamp to ToTimestampSeconds as well durig planing.

@comphead
Copy link
Contributor Author

@alamb should we merge the PR?

@alamb
Copy link
Contributor

alamb commented Oct 25, 2023

I think so @comphead -- it has had plenty of review by now and no objections.

@andygrove andygrove merged commit 12b473b into apache:main Oct 26, 2023
23 checks passed
Dandandan added a commit to coralogix/arrow-datafusion that referenced this pull request Nov 9, 2023
* Cleanup logical optimizer rules.  (apache#7919)

* Initial commit

* Address todos

* Update comments

* Simplifications

* Minor simplifications

* Address reviews

* Add TableScan constructor

* Minor changes

* make try_new_with_schema method of Aggregate private

* Use projection try_new instead of try_new_schema

* Simplifications, add comment

* Review changes

* Improve comments

* Move get_wider_type to type_coercion module

* Clean up type coercion file

---------

Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Parallelize Serialization of Columns within Parquet RowGroups (apache#7655)

* merge main

* fixes and cmt

* review comments, tuning parameters, updating docs

* cargo fmt

* reduce default buffer size to 2 and update docs

* feat: Use bloom filter when reading parquet to skip row groups  (apache#7821)

* feat: implement read bloom filter support

* test: add unit test for read bloom filter

* Simplify bloom filter application

* test: add unit test for bloom filter with sql `in`

* fix: imrpove bloom filter match express

* fix: add more test for bloom filter

* ci: rollback dependences

* ci: merge main branch

* fix: unit tests for bloom filter

* ci: cargo clippy

* ci: cargo clippy

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix: don't push down volatile predicates in projection (apache#7909)

* fix: don't push down volatile predicates in projection

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* add suggestions

* fix

* fix doc

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Jonah Gao <jonahgaox@gmail.com>

* Add `parquet` feature flag, enabled by default, and make parquet conditional  (apache#7745)

* Make parquet an option by adding multiple cfg attributes without significant code changes.

* Extract parquet logic into submodule from execution::context

* Extract parquet logic into submodule from datafusion_core::dataframe

* Extract more logic into submodule from execution::context

* Move tests from execution::context

* Rename submodules

* [MINOR]: Simplify enforce_distribution, minor changes (apache#7924)

* Initial commit

* Simplifications

* Cleanup imports

* Review

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Add simple window query to sqllogictest (apache#7928)

* ci: upgrade node to version 20 (apache#7918)

* Change input for `to_timestamp` function to be seconds rather than nanoseconds, add `to_timestamp_nanos` (apache#7844)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Minor: Document `parquet` crate feature (apache#7927)

* Minor: reduce some #cfg(feature = "parquet") (apache#7929)

* Minor: reduce use of cfg(parquet) in tests (apache#7930)

* Fix CI failures on `to_timestamp()` calls (apache#7941)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Fix CI `to_timestamp()` failed

* Update datafusion/expr/src/built_in_function.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* fix typo

* fix

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* minor: add a datatype casting for the updated value (apache#7922)

* minor: cast the updated value to the data type of target column

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* fix tests

---------

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* fix (apache#7946)

* Add simple exclude all columns test to sqllogictest (apache#7945)

* Add simple exclude all columns test to sqllogictest

* Add more exclude test cases

* Support Partitioning Data by Dictionary Encoded String Array Types (apache#7896)

* support dictionary encoded string columns for partition cols

* remove debug prints

* cargo fmt

* generic dictionary cast and dict encoded test

* updates from review

* force retry checks

* try checks again

* Minor: Remove array() in array_expression (apache#7961)

* remove array

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup others

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* clippy

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup cast

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fmt

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup cast

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: simplify update code (apache#7943)

* Add some initial content about creating logical plans (apache#7952)

* Minor: Change from `&mut SessionContext` to `&SessionContext` in substrait (apache#7965)

* Lower &mut SessionContext in substrait

* rm mut ctx in tests

* Fix crate READMEs (apache#7964)

* Minor: Improve `HashJoinExec` documentation (apache#7953)

* Minor: Improve `HashJoinExec` documentation

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* chore: clean useless clone baesd on clippy (apache#7973)

* Add README.md to `core`, `execution` and `physical-plan` crates (apache#7970)

* Add README.md to `core`, `execution` and `physical-plan` crates

* prettier

* Update datafusion/physical-plan/README.md

* Update datafusion/wasmtest/README.md

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>

* Move source repartitioning into `ExecutionPlan::repartition` (apache#7936)

* Move source repartitioning into ExecutionPlan::repartition

* cleanup

* update test

* update test

* refine docs

* fix merge

* minor: fix broken links in README.md (apache#7986)

* minor: fix broken links in README.md

* fix proto link

* Minor: Upate the `sqllogictest` crate README (apache#7971)

* Minor: Upate the sqllogictest crate README

* prettier

* Apply suggestions from code review

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>

---------

Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>

* Improve MemoryCatalogProvider default impl block placement (apache#7975)

* Fix `ScalarValue` handling of NULL values for ListArray (apache#7969)

* Fix try_from_array data type for NULL value in ListArray

* Fix

* Explicitly assert the datatype

* For review

* Refactor of Ordering and Prunability Traversals and States (apache#7985)

* simplify ExprOrdering

* Comment improvements

* Move map/transform comment up

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Keep output as scalar for scalar function if all inputs are scalar (apache#7967)

* Keep output as scalar for scalar function if all inputs are scalar

* Add end-to-end tests

* Fix crate READMEs for core, execution, physical-plan (apache#7990)

* Update sqlparser requirement from 0.38.0 to 0.39.0 (apache#7983)

* chore: Update sqlparser requirement from 0.38.0 to 0.39.0

* support FILTER Aggregates

* Fix panic in multiple distinct aggregates by fixing `ScalarValue::new_list` (apache#7989)

* Fix panic in multiple distinct aggregates by fixing ScalarValue::new_list

* Update datafusion/common/src/scalar.rs

Co-authored-by: Daniël Heres <danielheres@gmail.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>

* MemoryReservation exposes MemoryConsumer (apache#8000)

... as a getter method.

* fix: generate logical plan for `UPDATE SET FROM` statement (apache#7984)

* Create temporary files for reading or writing (apache#8005)

* Create temporary files for reading or writing

* nit

* addr comment

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>

* doc: minor fix to SortExec::with_fetch comment (apache#8011)

* Fix: dataframe_subquery example Optimizer rule `common_sub_expression_eliminate` failed (apache#8016)

* Fix: Optimizer rule 'common_sub_expression_eliminate' failed

* nit

* nit

* nit

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>

* Percent Decode URL Paths (apache#8009) (apache#8012)

* Treat ListingTableUrl as URL-encoded (apache#8009)

* Update lockfile

* Review feedback

* Minor: Extract common deps into workspace (apache#7982)

* Improve datafusion-*

* More common crates

* Extract async-trait

* Extract more

* Fix cli

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* minor: change some plan_err to exec_err (apache#7996)

* minor: change some plan_err to exec_err

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change unreachable code to internal error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Minor: error on unsupported RESPECT NULLs syntax (apache#7998)

* Minor: error on unsupported RESPECT NULLs syntax

* fix clippy

* Update datafusion/sql/tests/sql_integration.rs

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* GroupedHashAggregateStream breaks spill batch (apache#8004)

... into smaller chunks to decrease memory required for merging.

* Minor: Add implementation examples to ExecutionPlan::execute (apache#8013)

* Add implementation examples to ExecutionPlan::execute

* Review feedback

* address comment (apache#7993)

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* GroupedHashAggregateStream should register spillable consumer (apache#8002)

* fix: single_distinct_aggretation_to_group_by fail (apache#7997)

* fix: single_distinct_aggretation_to_group_by faile

* fix

* move test to groupby.slt

* Read only enough bytes to infer Arrow IPC file schema via stream (apache#7962)

* Read only enough bytes to infer Arrow IPC file schema via stream

* Error checking for collect bytes func

* Update datafusion/core/src/datasource/file_format/arrow.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* Minor: remove a strange char (apache#8030)

* Minor: Improve documentation for Filter Pushdown (apache#8023)

* Minor: Improve documentation for Fulter Pushdown

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: jakevin <jakevingoo@gmail.com>

* Apply suggestions from code review

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

---------

Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>

* Minor: Improve `ExecutionPlan` documentation (apache#8019)

* Minor: Improve `ExecutionPlan` documentation

* Add link to Partitioning

* fix: clippy warnings from nightly rust 1.75 (apache#8025)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Minor: Avoid recomputing compute_array_ndims in align_array_dimensions (apache#7963)

* Refactor align_array_dimensions

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* address comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* remove unwrap

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* address comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fix rebase

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: fix doc check (apache#8037)

* Minor: remove uncessary #cfg test (apache#8036)

* Minor: remove uncessary #cfg test

* fmt

* Update datafusion/core/src/datasource/file_format/arrow.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* Minor: Improve documentation for  `PartitionStream` and `StreamingTableExec` (apache#8035)

* Minor: Improve documentation for  `PartitionStream` and `StreamingTableExec`

* fmt

* fmt

* Combine Equivalence and Ordering equivalence to simplify state (apache#8006)

* combine equivalence and ordering equivalence

* Remove EquivalenceProperties struct

* Minor changes

* all tests pass

* Refactor oeq

* Simplifications

* Resolve linter errors

* Minor changes

* Minor changes

* Add new tests

* Simplifications window mode selection

* Simplifications

* Use set_satisfy api

* Use utils for aggregate

* Minor changes

* Minor changes

* Minor changes

* All tests pass

* Simplifications

* Simplifications

* Minor changes

* Simplifications

* All tests pass, fix bug

* Remove unnecessary code

* Simplifications

* Minor changes

* Simplifications

* Move oeq join to methods

* Simplifications

* Remove redundant code

* Minor changes

* Minor changes

* Simplifications

* Simplifications

* Simplifications

* Move window to util from method, simplifications

* Simplifications

* Propagate meet in the union

* Simplifications

* Minor changes, rename

* Address berkay reviews

* Simplifications

* Add new buggy test

* Add data test for sort requirement

* Add experimental check

* Add random test

* Minor changes

* Random test gives error

* Fix missing test case

* Minor changes

* Minor changes

* Simplifications

* Minor changes

* Add new test case

* Minor changes

* Address reviews

* Minor changes

* Increase coverage of random tests

* Remove redundant code

* Simplifications

* Simplifications

* Refactor on tests

* Solving clippy errors

* prune_lex improvements

* Fix failing tests

* Update get_finer and get_meet

* Fix window lex ordering implementation

* Buggy state

* Do not use output ordering in the aggregate

* Add union test

* Update comment

* Fix bug, when batch_size is small

* Review Part 1

* Review Part 2

* Change union meet implementation

* Update comments

* Remove redundant check

* Simplify project out_expr function

* Remove Option<Vec<_>> API.

* Do not use project_out_expr

* Simplifications

* Review Part 3

* Review Part 4

* Review Part 5

* Review Part 6

* Review Part 7

* Review Part 8

* Update comments

* Add new unit tests, simplifications

* Resolve linter errors

* Simplify test codes

* Review Part 9

* Add unit tests for remove_redundant entries

* Simplifications

* Review Part 10

* Fix test

* Add new test case, fix implementation

* Review Part 11

* Review Part 12

* Update comments

* Review Part 13

* Review Part 14

* Review Part 15

* Review Part 16

* Review Part 17

* Review Part 18

* Review Part 19

* Review Part 20

* Review Part 21

* Review Part 22

* Review Part 23

* Review Part 24

* Do not construct idx and sort_expr unnecessarily, Update comments, Union meet single entry

* Review Part 25

* Review Part 26

* Name Changes, comment updates

* Review Part 27

* Add issue links

* Address reviews

* Fix failing test

* Update comments

* SortPreservingMerge, SortPreservingRepartition only preserves given expression ordering among input ordering equivalences

---------

Co-authored-by: metesynnada <100111937+metesynnada@users.noreply.github.com>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>

* Encapsulate `ProjectionMapping` as a struct (apache#8033)

* Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, ... (apache#8040)

* Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, etc

* prettier

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: comphead <comphead@users.noreply.github.com>

---------

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Improve comments for `PartitionSearchMode` struct (apache#8047)

* Improve comments

* Make comments partition/group agnostic

* General approach for Array replace (apache#8050)

* checkpoint

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* optimize non-list

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* replace list ver

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* rename

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: Remove the irrelevant note from the Expression API doc (apache#8053)

* Minor: Add more documentation about Partitioning (apache#8022)

* Minor: Add more documentation about Partitioning

* fix typo

* Apply suggestions from code review

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Add more diagrams, improve text

* undo unintended changes

* undo unintended changes

* fix links

* Try and clarify

---------

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Minor: improve documentation for IsNotNull, DISTINCT, etc (apache#8052)

* Minor: improve documentation for IsNotNull, DISTINCT, etc

* fix

* Prepare 33.0.0 Release (apache#8057)

* changelog

* update version

* update changelog

* Minor: improve error message by adding types to message (apache#8065)

* Minor: improve error message

* add test

* Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument() (apache#8059)

* deprecate BuiltinScalarFunction::supports_zero_argument()

* unify old supports_zero_argument() impl

* Add example to ci (apache#8060)

* feat: add example to ci

* nit

* addr comments

---------

Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>

* Update substrait requirement from 0.18.0 to 0.19.0 (apache#8076)

Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: substrait
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix incorrect results in COUNT(*) queries with LIMIT (apache#8049)

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>

* feat: Support determining extensions from names like `foo.parquet.snappy` as well as `foo.parquet` (apache#7972)

* feat: read files based on the file extention

* fix: some the file extension might be started with . and some not

* fix: rename extention to extension

* chore: use exec_err

* chore: rename extention to extension

* chore: rename extention to extension

* chore: simplify the code

* fix: check table is empty

* ci: fix test

* fix: add err info

* refactor: extract the logic to infer_types

* fix: add tests for different extensions

* fix: ci clippy

* fix: add more tests

* fix: simplify the logic

* fix: ci

* Use FairSpillPool for TaskContext with spillable config (apache#8072)

* Minor: Improve HashJoinStream docstrings (apache#8070)

* Minor: Improve HashJoinStream docstrings

* fix comments

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <comphead@users.noreply.github.com>

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <comphead@users.noreply.github.com>

---------

Co-authored-by: Daniël Heres <danielheres@gmail.com>
Co-authored-by: comphead <comphead@users.noreply.github.com>

* Fixing broken link (apache#8085)

* Fixing broken link

* Update docs/source/contributor-guide/index.md

Thanks for spotting this as well

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

---------

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>

* fix: DataFusion suggests invalid functions (apache#8083)

* fix: DataFusion suggests invalid functions

* update test

* Add test for BuiltInWindowFunction

* Replace macro with function for  `array_repeat` (apache#8071)

* General array repeat

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* cleanup

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* done

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* remove test

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* add comment

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* fm

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

* Minor: remove unnecessary projection in `single_distinct_to_group_by` rule (apache#8061)

* Minor: remove unnecessary projection

* fix ci

* minor: Remove duplicate version numbers for arrow, object_store, and parquet dependencies (apache#8095)

* remove duplicate version numbers for arrow, object_store, and parquet dependencies

* cargo update

* use default features in parquet crate

* disable default parquet features in wasmtest

* fix: add match encode/decode  scalar function type (apache#8089)

* feat: Protobuf serde for Json file sink (apache#8062)

* Protobuf serde for Json file sink

* Fix tests

* Fix test

* Minor: use `Expr::alias` in a few places to make the code more concise (apache#8097)

* Minor: Cleanup BuiltinScalarFunction::return_type() (apache#8088)

* Expose metrics from FileSinkExec impl of ExecutionPlan

---------

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>
Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai>
Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Co-authored-by: Devin D'Angelo <devinjdangelo@gmail.com>
Co-authored-by: Hengfei Yang <hengfei.yang@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Huaijin <haohuaijin@gmail.com>
Co-authored-by: Jonah Gao <jonahgaox@gmail.com>
Co-authored-by: Chih Wang <ongchi@users.noreply.github.com>
Co-authored-by: Jeffrey <22608443+Jefffrey@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: comphead <comphead@users.noreply.github.com>
Co-authored-by: Alex Huang <huangweijun1001@gmail.com>
Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: yi wang <48236141+my-vegetable-has-exploded@users.noreply.github.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: 张林伟 <lewiszlw520@gmail.com>
Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Co-authored-by: Marko Milenković <milenkovicm@users.noreply.github.com>
Co-authored-by: jokercurry <982458633@qq.com>
Co-authored-by: zhongjingxiong <zhongjingxiong@bytedance.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: metesynnada <100111937+metesynnada@users.noreply.github.com>
Co-authored-by: Yongting You <2010youy01@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mark Sirek <mwsirek@gmail.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Edmondo Porcu <edmondo.porcu@gmail.com>
Co-authored-by: Syleechan <38198463+Syleechan@users.noreply.github.com>
Co-authored-by: Dan Harris <dan@coralogix.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core datafusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sqllogictest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_timestamp for negative values gives wrong result to_timestamp timeunit to be consistent with postgresql's
7 participants