to_parquet when column with only null values #8797

@odovad

I created a file named 'system_data.txt' with the following content:

SYSTEM_DATE,col1 
Test1,Test1
Test2,Test2
,Test3

This code produces an error:

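import dask.dataframe as dd

# Read both columns as strings; the last row has an empty SYSTEM_DATE value.
df = dd.read_csv('system_data.txt', dtype={'SYSTEM_DATE': 'str', 'col1': 'str'})
df = df.repartition(npartitions=3)
# Fails with the pyarrow engine (see the error below).
df.to_parquet('test')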

What happened:
RuntimeError: AppendRowGroups requires equal schemas.
I also tried the fastparquet engine, and it writes the file without errors.

What you expected to happen:
The parquet file should be created without errors.

Environment:

  • Dask version: 2022.2.1
  • Python version: 3.8
  • PyArrow version: 7.0.0
  • Operating System: Linux
  • Install method: pip
