Closed
Labels: bug (Something isn't working)
Description
Bug description
I am writing a parquet file from pandas (using gzip compression) and want to query it with datafusion. The same data exported as .csv works fine with this library, but the parquet version returns nothing.
Steps to reproduce
import datafusion
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2, 4, 5, 6], 'col2': [3, 4, 3, 5, 2], 'col3': [3, 4, 1, 2, 3], 'col4': [3, 4, 4, 5, 6]})
df.to_csv('df.csv', compression=None)
df.to_parquet('df.pq', compression=None)
ctx = datafusion.SessionContext()
ctx.register_csv(name="example_csv", path="df.csv")
ctx.register_parquet(name="example_pq", path="df.pq")
# test csv (separate variable name so the pandas DataFrame `df` is not overwritten)
csv_df = ctx.sql("SELECT * FROM example_csv")
result = csv_df.collect()
res = result[0]
# test parquet
pq_df = ctx.sql("SELECT * FROM example_pq")
result = pq_df.collect()
res = result[0]
Expected behavior
Both approaches should return the same result.
Additional context
Gzip-compressed files also do not seem to work, and I am not sure why. Consider the following example:
df.to_csv('df.csv.gz')
ctx.register_csv(name="example_csv_gz", path="df.csv.gz")
# test csv
df = ctx.sql("SELECT * FROM example_csv_gz")
result = df.collect()
res = result[0]
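To narrow down the gzip case, it may help to confirm that the compressed file itself is valid before handing it to datafusion (pandas infers gzip compression from the `.gz` extension by default). The following is a minimal sketch using only the Python standard library, not datafusion or pandas. The file name `df_check.csv.gz` and the helpers `write_gzip_csv`, `is_gzip`, and `read_gzip_csv` are hypothetical names introduced for illustration, and the data is a stand-in for the DataFrame above without the pandas index column.

```python
import csv
import gzip

# Same values as the DataFrame in the report (the pandas index column is omitted).
COLUMNS = {
    "col1": [1, 2, 4, 5, 6],
    "col2": [3, 4, 3, 5, 2],
    "col3": [3, 4, 1, 2, 3],
    "col4": [3, 4, 4, 5, 6],
}


def write_gzip_csv(path, columns):
    """Write column-oriented data to a gzip-compressed CSV file."""
    names = list(columns)
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(names)
        # Transpose the columns into rows.
        writer.writerows(zip(*(columns[n] for n in names)))


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


def read_gzip_csv(path):
    """Read a gzip-compressed CSV back into a list of dicts with int values."""
    with gzip.open(path, "rt", newline="") as fh:
        return [{k: int(v) for k, v in row.items()} for row in csv.DictReader(fh)]


if __name__ == "__main__":
    path = "df_check.csv.gz"  # hypothetical file name
    write_gzip_csv(path, COLUMNS)
    print(is_gzip(path), read_gzip_csv(path)[0])
```

If the file passes this check but the datafusion query still fails, the problem is more likely in how the reader is told about the compression than in the file itself.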