[C++][Parquet][Tools] Print FLBA type width when printing column types #40133

Closed
pitrou opened this issue Feb 19, 2024 · 0 comments · Fixed by #40132

pitrou commented Feb 19, 2024

Describe the enhancement requested

Currently, when printing a column's type, the output is not very informative for FIXED_LEN_BYTE_ARRAY columns because the byte width is not displayed. Example:

Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY)
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY)
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))

Component(s)

C++, Parquet

pitrou added a commit that referenced this issue Feb 19, 2024
…0132)

In `ParquetFilePrinter`, when printing the type of the column, also print its byte width if the type is FIXED_LEN_BYTE_ARRAY.

Before:
```
Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY)
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY)
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
```

After:
```
Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY(2) / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY(2) / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY(5))
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY(5))
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
```
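
The formatting rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual `ParquetFilePrinter` code: the `PhysicalType` enum and the `PhysicalTypeName` and `FormatPhysicalType` helpers are hypothetical stand-ins, and the byte width is assumed to be passed in as a plain integer.

```cpp
#include <string>

// Hypothetical stand-in for Parquet's physical types; this is NOT the
// real parquet::Type definition, only enough of it for illustration.
enum class PhysicalType {
  BOOLEAN,
  INT32,
  INT64,
  FLOAT,
  DOUBLE,
  BYTE_ARRAY,
  FIXED_LEN_BYTE_ARRAY
};

// Returns the upper-case display name of a physical type.
std::string PhysicalTypeName(PhysicalType type) {
  switch (type) {
    case PhysicalType::BOOLEAN: return "BOOLEAN";
    case PhysicalType::INT32: return "INT32";
    case PhysicalType::INT64: return "INT64";
    case PhysicalType::FLOAT: return "FLOAT";
    case PhysicalType::DOUBLE: return "DOUBLE";
    case PhysicalType::BYTE_ARRAY: return "BYTE_ARRAY";
    case PhysicalType::FIXED_LEN_BYTE_ARRAY: return "FIXED_LEN_BYTE_ARRAY";
  }
  return "UNKNOWN";
}

// Formats a physical type for display. The byte width (assumed to come
// from the column's metadata) is appended in parentheses only for
// FIXED_LEN_BYTE_ARRAY; every other type is printed unchanged.
std::string FormatPhysicalType(PhysicalType type, int type_length) {
  std::string out = PhysicalTypeName(type);
  if (type == PhysicalType::FIXED_LEN_BYTE_ARRAY) {
    out += "(" + std::to_string(type_length) + ")";
  }
  return out;
}
```

Under these assumptions, `FormatPhysicalType(PhysicalType::FIXED_LEN_BYTE_ARRAY, 5)` yields `FIXED_LEN_BYTE_ARRAY(5)`, while `FormatPhysicalType(PhysicalType::FLOAT, 0)` yields `FLOAT`, which matches the shape of the "After" output for the physical-type part of each line.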

* Closes: #40133

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
pitrou added this to the 16.0.0 milestone Feb 19, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
…th (apache#40132)

thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
…th (apache#40132)
