[C++][Parquet] Implement Float16 logical type #36036

Closed
benibus opened this issue Jun 12, 2023 · 0 comments · Fixed by #36073

benibus commented Jun 12, 2023

Describe the enhancement requested

There's currently an active proposal to add a half-float logical type to the Parquet spec here: apache/parquet-format#184. Following the discussion in the PR/issue, the general consensus was that we should go ahead and implement support for the type before moving forward with a vote.

To summarize, this entails adding a float16 logical type (based on a 2-byte fixed-size binary physical type) and implementing read/write support. We also want to ensure that its ordering is consistent with the native floating-point types, that min/max values are handled properly in Statistics, etc.
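
To illustrate why the ordering needs special handling, here is a minimal standalone sketch, not the Arrow/Parquet implementation. It assumes the two bytes hold an IEEE 754 binary16 value in little-endian order; `LoadHalfBits`, `HalfBitsToFloat`, and `HalfLess` are hypothetical helper names. Comparing the raw bytes as unsigned data would sort every negative value after every positive one, so the sketch compares the decoded values instead.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: assemble binary16 bits from 2 bytes.
// Assumes little-endian storage (an assumption of this sketch).
inline uint16_t LoadHalfBits(const uint8_t* bytes) {
  return static_cast<uint16_t>(bytes[0] | (bytes[1] << 8));
}

// Hypothetical helper: widen IEEE 754 binary16 bits to a 32-bit float.
inline float HalfBitsToFloat(uint16_t h) {
  const bool negative = (h & 0x8000u) != 0;
  const int exponent = (h >> 10) & 0x1F;
  const int mantissa = h & 0x3FF;
  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // Infinity (mantissa == 0) or NaN (mantissa != 0)
    magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 25)
    magnitude = std::ldexp(static_cast<float>(mantissa | 0x400), exponent - 25);
  }
  return negative ? -magnitude : magnitude;
}

// Order two 2-byte values by their numeric value, as the native
// floating-point types would, rather than by their raw bytes.
inline bool HalfLess(const uint8_t* a, const uint8_t* b) {
  return HalfBitsToFloat(LoadHalfBits(a)) < HalfBitsToFloat(LoadHalfBits(b));
}
```

With this comparison, the bytes `{0x00, 0xC0}` (the value -2.0) order before `{0x00, 0x3C}` (the value 1.0), even though an unsigned byte-wise comparison would say the opposite.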

Component(s)

C++, Parquet

@benibus benibus self-assigned this Jun 12, 2023
anjakefala added a commit to benibus/arrow that referenced this issue Aug 2, 2023
pitrou added a commit that referenced this issue Nov 15, 2023
### Rationale for this change

There is currently an active proposal to support half-float types in Parquet. For more details/discussion, see the links in this PR's accompanying issue.

### What changes are included in this PR?

This PR implements basic support for a `Float16LogicalType` in accordance with the proposed spec. More specifically, this includes:

- Changes to `parquet.thrift` and regenerated `parquet_types` files
- Basic `LogicalType` class definition, method implementations, and enums
- Support for specialized comparisons and column statistics
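
As a rough illustration of the statistics item above, the following standalone sketch computes min/max over raw binary16 bit patterns. It is not the code in this PR. It assumes the conventions the Parquet format describes for floating-point statistics carry over to half-floats: NaNs are excluded from min/max, a zero minimum is written as -0, and a zero maximum is written as +0. All names (`OrderKey`, `ComputeMinMax`, `MinMax`) are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint16_t kPosZero = 0x0000;
constexpr uint16_t kNegZero = 0x8000;

// NaN: exponent bits all ones and a non-zero mantissa.
inline bool IsNan(uint16_t h) { return (h & 0x7FFFu) > 0x7C00u; }
inline bool IsZero(uint16_t h) { return (h & 0x7FFFu) == 0; }

// Map non-NaN binary16 bits to an integer whose ordering matches the
// numeric ordering of the values; -0 and +0 both map to 0.
inline int32_t OrderKey(uint16_t h) {
  const int32_t magnitude = h & 0x7FFF;
  return (h & 0x8000u) ? -magnitude : magnitude;
}

struct MinMax {
  uint16_t min;
  uint16_t max;
};

// Returns no value when the input has no non-NaN entries
// (an assumption of this sketch: no min/max is recorded in that case).
inline std::optional<MinMax> ComputeMinMax(const std::vector<uint16_t>& values) {
  std::optional<MinMax> out;
  for (uint16_t v : values) {
    if (IsNan(v)) continue;  // NaNs never participate in min/max
    if (!out) {
      out = MinMax{v, v};
      continue;
    }
    if (OrderKey(v) < OrderKey(out->min)) out->min = v;
    if (OrderKey(v) > OrderKey(out->max)) out->max = v;
  }
  if (out) {
    // Normalize signed zeros so readers can treat the bounds conservatively.
    if (IsZero(out->min)) out->min = kNegZero;
    if (IsZero(out->max)) out->max = kPosZero;
  }
  return out;
}
```

Working on the bit patterns avoids a conversion to `float` for every value; the sign-magnitude layout of binary16 makes the integer key sufficient for ordering once NaNs are filtered out.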

In the interest of scope, this PR does not currently address Arrow integration or byte stream split encoding, although we will want both of these features resolved before the proposal is approved.

### Are these changes tested?

Yes (tests are included)

### Are there any user-facing changes?

Yes

* Closes: #36036

Lead-authored-by: benibus <bpharks@gmx.com>
Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 15.0.0 milestone Nov 15, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…apache#36073)
