add support for reading csv with variable number of columns #891

@djouallah

I am running an ETL benchmark that reads CSV files with a variable number of columns, applies some transformations, and writes the result back as Delta. I tested it with 7 Python engines; unfortunately, DataFusion only supports CSV files with a fixed schema.

FWIW, the notebook with a reproducible data source is here: https://github.com/djouallah/Fabric_Notebooks_Demo/blob/main/ETL/Light_ETL_Python_Notebook.ipynb
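
To illustrate what "variable number of columns" means here, the following is a minimal sketch that uses only the Python standard library. It is not taken from the notebook and is not a DataFusion API. The function name `read_ragged_csv` and the positional column names `column_1`, `column_2`, ... are hypothetical, chosen only for this example. It pads short rows with `None` so that every row has the same width, which is roughly the behavior a reader that tolerates ragged rows would need to provide.

```python
import csv
import io


def read_ragged_csv(text, delimiter=","):
    """Parse CSV text whose rows differ in length.

    Hypothetical helper for illustration: short rows are padded with None
    so that every row ends up as wide as the widest row in the input.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Width of the widest row; 0 when the input is empty.
    width = max((len(row) for row in rows), default=0)
    # Assumed positional names, since no single header row covers every column.
    names = [f"column_{i + 1}" for i in range(width)]
    padded = [row + [None] * (width - len(row)) for row in rows]
    return names, padded


# Three rows with two, three, and one field respectively.
sample = "a,b\n1,2,3\n4\n"
names, rows = read_ragged_csv(sample)
```

After padding, the rows have a fixed width and could be handed to an engine that requires a fixed schema. This sketch loads the whole input into memory, so it is only meant to show the shape of the problem, not a scalable solution.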

Labels: enhancement (New feature or request)
