[Python/C++] Conversion of dict encoded null column fails in parquet writing when using RowGroups #21573

asfimport · 2019-04-01T15:54:35Z

Writing a dictionary-encoded column that contains only nulls to Parquet fails when the table is split into multiple row groups:

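import pyarrow.parquet as pq
import pandas as pd
import pyarrow as pa
# "col" holds only nulls; "int" is an ordinary float column
df = pd.DataFrame({"col": [None] * 100, "int": [1.0] * 100})
# Casting to category makes "col" dictionary-encoded in Arrow
df = df.astype({"col": "category"})
table = pa.Table.from_pandas(df)
buf = pa.BufferOutputStream()
# chunk_size=10 splits the 100 rows into multiple row groups
pq.write_table(
    table,
    buf,
    version="2.0",
    chunk_size=10,
)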

fails with

pyarrow.lib.ArrowIOError: Column 2 had 100 while previous column had 10

Reporter: Florian Jetter / @fjetter
Assignee: Wes McKinney / @wesm

PRs and other links:

GitHub Pull Request #5107

Note: This issue was originally created as ARROW-5085. Please see the migration documentation for further details.


asfimport · 2019-08-16T16:58:50Z

Wes McKinney / @wesm:
This is now fixed in master, presumably by ARROW-3246. I'll add a unit test. Note that files written with version="2.0" should not be used; see PARQUET-458.

asfimport · 2019-08-16T17:03:04Z

Wes McKinney / @wesm:
Hm, reading the file back causes a segfault. Taking a look.

asfimport · 2019-08-17T20:25:06Z

Wes McKinney / @wesm:
Issue resolved by pull request 5107
#5107

asfimport closed this as completed Aug 17, 2019

asfimport assigned wesm Jan 10, 2023

asfimport added this to the 0.15.0 milestone Jan 11, 2023
