Describe the issue:
When a pandas dataframe contains a categorical column that only contains NaNs, fastparquet can save such a dataframe to a .parq file, but it cannot read it back. I am unsure whether this is because the written file is malformed, or because the reader code cannot handle the resulting data format.
Minimal Complete Verifiable Example:
# bug report for fastparquet
import numpy as np
import pandas as pd

bug = pd.DataFrame({
    "good_col": range(5),
    "bad_col": pd.Series([pd.NA] * 5).astype("category")
})
bug.to_parquet("broken.parq")
test = pd.read_parquet("broken.parq")  # ValueError: codes need to be between -1 and len(categories)-1
Environment:
fastparquet version: 2023.8.0
numpy version: 1.25.2
pandas version: 2.1.0
Python version: 3.11.3
Operating System: Windows 11
Install method (conda, pip, source): installed using Poetry
@apamplifi , thank you for the concise and exact description of the problem. As a workaround before the next release, you can always set the categories on your categorical columns - as long as there are more than zero categories, you won't get the error.
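The workaround can be sketched as follows. This is a minimal illustration using only pandas: the idea is to give the all-NaN column a non-empty category list before writing, so the column's dictionary is not empty. The category name `"_dummy"` is an arbitrary placeholder, not anything prescribed by fastparquet.

```python
import pandas as pd

# An all-NaN categorical column with zero categories (the problematic case).
bad_col = pd.Series([pd.NA] * 5).astype("category")
assert len(bad_col.cat.categories) == 0

# Workaround sketch: register at least one explicit category.
# The values remain NaN (codes stay -1), but len(categories) > 0,
# so the round trip through fastparquet no longer hits the error.
fixed_col = bad_col.cat.set_categories(["_dummy"])
```

After this, `fixed_col` can replace the original column in the dataframe before calling `to_parquet`.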
@martindurant, many thanks for such a quick response. I solved my problem by removing the empty columns from the data (because in this particular case I could), but your fix is much appreciated as it won't always be possible.