
fastparquet cannot read a categorical column that contains NaNs only #887

Closed
apamplifi opened this issue Oct 10, 2023 · 2 comments · Fixed by #888
Comments

@apamplifi

Describe the issue:
When a pandas DataFrame contains a categorical column that holds only NaNs, fastparquet can write that DataFrame to a .parq file, but it cannot read it back. I am unsure whether the written file itself is broken, or whether the reader code cannot handle the resulting data format.

Minimal Complete Verifiable Example:

# bug report for fastparquet: a categorical column with zero categories (all values NaN)
import pandas as pd

bug = pd.DataFrame({
    "good_col": range(5),
    "bad_col": pd.Series([pd.NA] * 5).astype("category"),  # no categories, every code is -1
})
bug.to_parquet("broken.parq", engine="fastparquet")  # writing succeeds
test = pd.read_parquet("broken.parq", engine="fastparquet")
# ValueError: codes need to be between -1 and len(categories)-1

Environment:

  • fastparquet version: 2023.8.0
  • numpy version: 1.25.2
  • pandas version: 2.1.0
  • Python version: 3.11.3
  • Operating System: Windows 11
  • Install method (conda, pip, source): Poetry
@martindurant
Member

@apamplifi , thank you for the concise and exact description of the problem. As a workaround before the next release, you can always set the categories on your categorical columns explicitly - as long as there is at least one category, you won't get the error.
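
A minimal sketch of that workaround, assuming fastparquet as the parquet engine; the "placeholder" category name is arbitrary and chosen only for illustration, and the column's values stay entirely NaN:

# Give the all-NaN categorical column at least one category before writing,
# so the reader sees len(categories) > 0 and accepts the -1 codes.
import pandas as pd

df = pd.DataFrame({
    "good_col": range(5),
    "bad_col": pd.Series([pd.NA] * 5, dtype=pd.CategoricalDtype(["placeholder"])),
})
df.to_parquet("workaround.parq", engine="fastparquet")
roundtrip = pd.read_parquet("workaround.parq", engine="fastparquet")
assert roundtrip["bad_col"].isna().all()  # the values themselves remain missing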

@apamplifi
Author

@martindurant, many thanks for such a quick response. I solved my problem by removing the empty columns from the data (which was possible in this particular case), but your fix is much appreciated, since dropping the columns won't always be an option.
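
For completeness, a sketch of that alternative, assuming the all-NaN columns really can be discarded (the file name is illustrative):

# Drop every column that is entirely NA before writing, then write as usual.
import pandas as pd

df = pd.DataFrame({
    "good_col": range(5),
    "bad_col": pd.Series([pd.NA] * 5).astype("category"),
})
df.dropna(axis=1, how="all").to_parquet("no_empty_cols.parq", engine="fastparquet")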
