Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust Interval definition incorrect #5654

Closed
Tracked by #5688
david542542 opened this issue Apr 16, 2024 · 15 comments · Fixed by #5769
Closed
Tracked by #5688

Rust Interval definition incorrect #5654

david542542 opened this issue Apr 16, 2024 · 15 comments · Fixed by #5769
Assignees
Labels
arrow Changes to the arrow crate bug parquet Changes to the parquet crate

Comments

@david542542
Copy link

david542542 commented Apr 16, 2024

It seems the Rust INTERVAL definition is incorrect. Please see duckdb/duckdb-wasm#1696 (comment) for more details on the bug and where it occurs in the code.

Particularly this definition:

```text
┌──────────────────────────────┬─────────────┬──────────────┐
│ Nanos │ Days │ Months │
│ (64 bits) │ (32 bits) │ (32 bits) │
└──────────────────────────────┴─────────────┴──────────────┘

@tustvold
Copy link
Contributor

tustvold commented Apr 16, 2024

@pitrou do you know if the integration tests cover the interval types?

@avantgardnerio as the author of #2235 which appears to have switched these round, perhaps you could weigh in here

tustvold added a commit to tustvold/arrow-rs that referenced this issue Apr 16, 2024
@avantgardnerio
Copy link
Contributor

@avantgardnerio as the author of #2235 which appears to have switched these round, perhaps you could weigh in here

Honestly, I don't recall the reasoning. I know at the time I was working with JDBC interop, so I assume it was to make that work.

I agree we should do whatever arrow-cpp does, as that seems to be the reference implementation, to the extent that there is one.

@pitrou
Copy link
Member

pitrou commented Apr 17, 2024

@pitrou do you know if the integration tests cover the interval types?

Yes, they do.
https://github.com/apache/arrow/blob/7003e90c113cec58a620deddf56af71eb305af2a/dev/archery/archery/integration/datagen.py#L1653-L1669

@tustvold
Copy link
Contributor

One way to avoid this leading to subtle downstream breakage might be to do something along the lines of #3125

@alamb
Copy link
Contributor

alamb commented Apr 24, 2024

Typing down some things that @tustvold told me today:

  1. we think the tests ensure the data round trips but does not actually ensure both ends interpret them in the same way (aka they don't cover this particular bug)
  2. He plans to work on this ticket, likely next week(ish)
  3. In order to avoid subtle breaking changes (due to the same type i128 being interpreted differently by different versions) he plans to do something like Structured Interval Native Type #3125

@alamb alamb assigned alamb and tustvold and unassigned alamb Apr 24, 2024
@alamb
Copy link
Contributor

alamb commented Apr 29, 2024

@tustvold reports is that it is unlikely that this will be done this week due to other higher priorities

@tustvold
Copy link
Contributor

It turns out the reason we didn't detect this with our integration tests, is that they were actually using different, correct, logic for constructing the expected values from the JSON data - #5765

@tustvold
Copy link
Contributor

@alamb and I had a good discussion about the pros/cons of switching to a structured representation (#3125), vs simply flipping the order (#5656). The broad points were:

As such the decision was that a custom struct should be used where operations, such as arithmetic, on the integral representation have no semantic meaning. Specifically this means types such as timestamps, decimals, and even IntervalYearMonth should remain using integral types as their native representation, as operations on the integral representation are well-defined. However, types like IntervalMonthDayNano or ByteView (#5736) should switch to a non-integral representation.

@david542542
Copy link
Author

That's great to know. Any idea on the ETA on this fix?

@alamb
Copy link
Contributor

alamb commented May 13, 2024

We are holding the 52.0.0 release for this. I expect it will be available in about 2 weeks but no promises, etc

@alamb
Copy link
Contributor

alamb commented May 17, 2024

Proposed PR: #5769

tustvold added a commit that referenced this issue May 20, 2024
…ime` (#3125) (#5654) (#5769)

* Structured interval type (#3125) (#5654)

* Update integration-test

* Fix 32-bit build

* Review feedback
@david542542
Copy link
Author

david542542 commented May 30, 2024

@alamb any idea on when 52.0.0 will be deployed?

@alamb
Copy link
Contributor

alamb commented May 31, 2024

@alamb any idea on when 52.0.0 will be deployed?

The plan as I understand it is next week

@tustvold tustvold added the parquet Changes to the parquet crate label Jun 3, 2024
@tustvold
Copy link
Contributor

tustvold commented Jun 3, 2024

label_issue.py automatically added labels {'parquet'} from #5769

@tustvold tustvold added the arrow Changes to the arrow crate label Jun 3, 2024
@tustvold
Copy link
Contributor

tustvold commented Jun 3, 2024

label_issue.py automatically added labels {'arrow'} from #5769

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug parquet Changes to the parquet crate
Projects
None yet
5 participants