Parquet read : Invalid decimal encoding in Parquet file #12621

Jbcot · 2024-06-20T12:12:58Z

What happens?

I get the following error when reading a Parquet file that contains a DECIMAL(17, 4) column from a Python script running on Linux.
The same error occurs on Windows when using DBeaver to read the Parquet file through a DuckDB connection.
The file was extracted from a DB2/AS400 system with the Talend ETL tool.
I can open the file without issue in another Parquet viewer.

duckdb.duckdb.InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
Error: Invalid Input Error: Invalid decimal encoding in Parquet file

The file to test is attached to the issue.
sample.zip

To Reproduce

FROM 'sample.parquet';

OS:

Debian GNU/Linux 11

DuckDB Version:

1.0.0

DuckDB Client:

Python

Full Name:

Jean-Blaise Cottenceau

Affiliation:

Manitou Group

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

Yes, I have

szarnyasg · 2024-06-21T13:19:46Z

Hi @Jbcot, thanks! I could reproduce the issue with plain SQL.

fedefrancescon · 2024-06-23T11:01:28Z

Hi, I've tried fixing this in #12655.
I hope that looks good. It would be nice to add a smaller file to the tests, since just a few rows should be sufficient.
Unfortunately, I'm not sure how to reduce the file size without importing and re-exporting the data.

As I'm pretty new here, any suggestions and improvements would be much appreciated.

fix(parquet): two-complement zeroes check on FIXED_BYTE_ARRAY encoded DECIMAL (#12621)
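
For readers unfamiliar with the encoding named in the PR title: the Parquet format stores a DECIMAL backed by a fixed-length byte array as the unscaled integer in big-endian two's complement. The sketch below is not DuckDB's code and does not claim to reproduce the actual bug or fix. It only illustrates, in Python, how such bytes decode and why a check on the leading bytes has to accept 0xFF padding for negative values as well as 0x00 padding for non-negative ones. The function names `decode_fixed_decimal` and `fits_in_width` are hypothetical.

```python
from decimal import Decimal


def decode_fixed_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode big-endian two's complement bytes into a Decimal with `scale`."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a (sign, digits, exponent) tuple is exact and does not
    # depend on the current decimal context precision.
    return Decimal((sign, digits, -scale))


def fits_in_width(raw: bytes, width: int) -> bool:
    """Hypothetical check: can `raw` be narrowed to its low `width` bytes?

    Assumes width >= 1. The extra leading bytes must be pure sign extension:
    all 0x00 when the narrowed value is non-negative, all 0xFF when negative.
    """
    if len(raw) <= width:
        return True
    extra, low = raw[:-width], raw[-width:]
    fill = 0xFF if low[0] & 0x80 else 0x00
    return all(b == fill for b in extra)


# A negative value stored in 16 bytes, as a wide fixed-length array might hold it.
raw = (-12345).to_bytes(16, byteorder="big", signed=True)
value = decode_fixed_decimal(raw, scale=4)
narrowable = fits_in_width(raw, width=8)
```

A check that only accepted zero bytes in the leading positions would reject the negative value above even though it narrows to 8 bytes without loss.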

Jbcot added the needs triage label Jun 20, 2024

szarnyasg added the reproduced label Jun 21, 2024

duckdblabs-bot removed the needs triage label Jun 21, 2024

fedefrancescon mentioned this issue Jun 23, 2024

fix(parquet): two-complement zeroes check on FIXED_BYTE_ARRAY encoded DECIMAL (#12621) #12655

Merged

Mytherin linked a pull request Jun 24, 2024 that will close this issue

fix(parquet): two-complement zeroes check on FIXED_BYTE_ARRAY encoded DECIMAL (#12621) #12655

Merged

Mytherin closed this as completed in #12655 Jun 24, 2024

Mytherin added a commit that referenced this issue Jun 24, 2024

Merge pull request #12655 from fedefrancescon/patch-issue-12621

1385b3b

fix(parquet): two-complement zeroes check on FIXED_BYTE_ARRAY encoded DECIMAL (#12621)
