[Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings #42016

Closed
jorisvandenbossche opened this issue Jun 6, 2024 · 2 comments · Fixed by #42103

Comments

@jorisvandenbossche
Member

Reading and writing data with a float16 field works just fine (because it's implemented on the C++ side):

>>> import numpy as np
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> table = pa.table({"a": np.array([0.1, 0.2], "float16"), "b": np.array([1, 2], "int8")})
>>> pq.write_table(table, "/tmp/test_float16.parquet")
>>> meta = pq.read_metadata("/tmp/test_float16.parquet")
>>> meta.schema
<pyarrow._parquet.ParquetSchema object at 0x7f488ec55000>
required group field_id=-1 schema {
  optional fixed_len_byte_array(2) field_id=-1 a (Float16);
  optional int32 field_id=-1 b (Int(bitWidth=8, isSigned=true));
}

But in a few parts of the API you can see that we didn't add it to the Python bindings:

>>> meta.schema.column(0).logical_type
<pyarrow._parquet.ParquetLogicalType object at 0x7f488ef60210>
  Float16
>>> meta.schema.column(1).logical_type
<pyarrow._parquet.ParquetLogicalType object at 0x7f4894c9fcf0>
  Int(bitWidth=8, isSigned=true)

>>> meta.schema.column(0).logical_type.type
'UNKNOWN'                                          # <--- UNKNOWN instead of FLOAT16 here
>>> meta.schema.column(1).logical_type.type
'INT'

That comes from the following mapping, where any type ID missing from the dict falls back to 'UNKNOWN':

cdef logical_type_name_from_enum(ParquetLogicalTypeId type_):
    return {
        ParquetLogicalType_UNDEFINED: 'UNDEFINED',
        ParquetLogicalType_STRING: 'STRING',
        ParquetLogicalType_MAP: 'MAP',
        ParquetLogicalType_LIST: 'LIST',
        ParquetLogicalType_ENUM: 'ENUM',
        ParquetLogicalType_DECIMAL: 'DECIMAL',
        ParquetLogicalType_DATE: 'DATE',
        ParquetLogicalType_TIME: 'TIME',
        ParquetLogicalType_TIMESTAMP: 'TIMESTAMP',
        ParquetLogicalType_INT: 'INT',
        ParquetLogicalType_JSON: 'JSON',
        ParquetLogicalType_BSON: 'BSON',
        ParquetLogicalType_UUID: 'UUID',
        ParquetLogicalType_NONE: 'NONE',
    }.get(type_, 'UNKNOWN')

(it might actually be the only place to add it)
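
To illustrate why the missing entry surfaces as 'UNKNOWN', here is a minimal pure-Python sketch of the same dict-with-fallback pattern. The `LogicalTypeId` enum and the `NAMES_BEFORE` / `NAMES_AFTER` dicts are hypothetical stand-ins for the Cython-level `ParquetLogicalTypeId` values and are not part of pyarrow; the sketch only shows the lookup behaviour, not the actual patch.

```python
from enum import Enum, auto


class LogicalTypeId(Enum):
    """Hypothetical stand-in for the Cython-level ParquetLogicalTypeId."""
    STRING = auto()
    INT = auto()
    FLOAT16 = auto()


# Mapping without a FLOAT16 entry, mirroring the situation described above.
NAMES_BEFORE = {
    LogicalTypeId.STRING: 'STRING',
    LogicalTypeId.INT: 'INT',
}

# Same mapping with the missing entry added (assumed shape of the fix).
NAMES_AFTER = {**NAMES_BEFORE, LogicalTypeId.FLOAT16: 'FLOAT16'}


def logical_type_name_from_enum(type_, names):
    # dict.get returns the fallback for any key that is not in the mapping.
    return names.get(type_, 'UNKNOWN')


print(logical_type_name_from_enum(LogicalTypeId.FLOAT16, NAMES_BEFORE))
print(logical_type_name_from_enum(LogicalTypeId.FLOAT16, NAMES_AFTER))
```

With the first mapping the lookup falls through to the default; with the second it returns the name, which is why adding a single entry to the dict may be enough.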

@tlm365
Contributor

tlm365 commented Jun 11, 2024

take

jorisvandenbossche pushed a commit that referenced this issue Jun 20, 2024
…quet bindings (#42103)

### Rationale for this change
Resolves #42016.

### What changes are included in this PR?
Expose new `FLOAT16` logical type in the `pyarrow.parquet` bindings

### Are these changes tested?
Unit test added.

### Are there any user-facing changes?
No.

* GitHub Issue: #42016

Authored-by: Tai Le Manh <manhtai.lmt@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 17.0.0 milestone Jun 20, 2024
@jorisvandenbossche
Member Author

Issue resolved by pull request 42103
#42103
