[Python/C++] Conversion of dict encoded null column fails in parquet writing when using RowGroups #21573

Closed
asfimport opened this issue Apr 1, 2019 · 3 comments

Comments

@asfimport

Converting a dictionary-encoded, all-null column fails when writing Parquet with multiple row groups. To reproduce:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# "col" is entirely null; casting it to category makes Arrow dictionary-encode it.
df = pd.DataFrame({"col": [None] * 100, "int": [1.0] * 100})
df = df.astype({"col": "category"})
table = pa.Table.from_pandas(df)
buf = pa.BufferOutputStream()
# chunk_size=10 splits the 100 rows into several row groups.
pq.write_table(
    table,
    buf,
    version="2.0",
    chunk_size=10,
)

fails with

pyarrow.lib.ArrowIOError: Column 2 had 100 while previous column had 10
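
The error message suggests that the writer expects every column in a row group to contribute the same number of rows, and that one column wrote all 100 values where 10 were expected. The following is a minimal pure-Python sketch of that invariant, not Arrow's actual implementation: `split_into_row_groups` and `check_row_group` are hypothetical helpers, and the 1-based column numbering in the message is an assumption made only to mirror the text above.

```python
def split_into_row_groups(num_rows, chunk_size):
    """Return (start, stop) row ranges, one per row group."""
    return [
        (start, min(start + chunk_size, num_rows))
        for start in range(0, num_rows, chunk_size)
    ]


def check_row_group(column_lengths):
    """Raise if the columns of one row group disagree on their row count.

    Hypothetical helper: the message only imitates the reported error.
    """
    for i in range(1, len(column_lengths)):
        if column_lengths[i] != column_lengths[i - 1]:
            raise ValueError(
                f"Column {i + 1} had {column_lengths[i]} "
                f"while previous column had {column_lengths[i - 1]}"
            )


# 100 rows with chunk_size=10 give ten row groups of ten rows each.
row_groups = split_into_row_groups(num_rows=100, chunk_size=10)

# When both columns are sliced to the row group, the check passes.
for start, stop in row_groups:
    check_row_group([stop - start, stop - start])

# A column that ignores the slice and emits all 100 values fails it.
try:
    check_row_group([10, 100])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Under these assumptions, the failure corresponds to one column not being sliced to the current row group before it is written.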

Reporter: Florian Jetter / @fjetter
Assignee: Wes McKinney / @wesm

PRs and other links:

Note: This issue was originally created as ARROW-5085. Please see the migration documentation for further details.

@asfimport

Wes McKinney / @wesm:
This is now fixed in master, presumably by ARROW-3246. I'll add a unit test. Note that version="2.0" files should not be used; see PARQUET-458.

@asfimport

Wes McKinney / @wesm:
Hm, reading the file back causes a segfault. Taking a look

@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request #5107.

@asfimport asfimport added this to the 0.15.0 milestone Jan 11, 2023