Closed
Description
I want to filter a dataset on a (pandas) categorical field, but Python crashes. Sample code follows:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
# B is category
df = pd.DataFrame({'A': range(4), 'B': list('bccd')})
df['B'] = df['B'].astype('category')
# save to parquet file
table = pa.Table.from_pandas(df)
pq.write_table(table, 'test.parquet')
# read with dataset
dataset = ds.dataset('test.parquet')
# it's ok
dataset.to_table().to_pandas()
# it's ok
dataset.to_table(filter=ds.field('A') > 2).to_pandas()
# it crashed
dataset.to_table(filter=ds.field('B') == 'b').to_pandas()

The crash message:

ValueOrDie called on an error: Type error: Cannot compare scalars of differing type: dictionary<values=string, indices=int32, ordered=0> vs string

I know that in C++ an arrow::DictionaryArray of course cannot be compared with a string. But is there any way to filter on a categorical field in Python?
Thanks for your attention, and thanks for this brilliant lib.