When reading a csv or json file, columns get sorted in the dataframe, how NOT to? #18770

@keltia

Describe the bug

Given a JSON (or CSV) file read by datafusion-cli with

create external table t17 stored as json location '20251117.json';

Whatever the column order in the JSON (or CSV) file, all columns end up sorted by name in the dataframe, which means that when writing to another format the column order differs from the source. I cannot find an option in either the API or datafusion-cli to prevent this. I do not understand why this happens at all; at the very least there should be an option to turn it off.

To Reproduce

head -1 20251117.json
{"uti":1763334319,"dat":"2025-11-16 23:05:19.186018808","tim":"23:05:19.186018808","hex":"4d00c3","fli":"","lat":48.20722961425781,"lon":2.240788386418269,"gda":"A","src":"A","alt":38000,"altg":37900,"hgt":-100,"spd":385,"cat":"A0","squ":"2256","vrt":0,"trk":56.84539,"mop":2,"lla":2,"tru":29,"dbm":-90,"nucp":8,"nic":8,"pic":11}

datafusion-cli

> create external table t17 stored as json location '20251117.json';
0 row(s) fetched.
Elapsed 0.024 seconds.

> describe t17;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| alt         | Int64     | YES         |
| altg        | Int64     | YES         |
| cat         | Utf8      | YES         |
| dat         | Utf8      | YES         |
| dbm         | Int64     | YES         |
| fli         | Utf8      | YES         |
| gda         | Utf8      | YES         |
| hex         | Utf8      | YES         |
| hgt         | Int64     | YES         |
| lat         | Float64   | YES         |
| lla         | Int64     | YES         |
| lon         | Float64   | YES         |
| mop         | Int64     | YES         |
| nic         | Int64     | YES         |
| nucp        | Int64     | YES         |
| pic         | Int64     | YES         |
| spd         | Int64     | YES         |
| squ         | Utf8      | YES         |
| src         | Utf8      | YES         |
| tim         | Utf8      | YES         |
| trk         | Float64   | YES         |
| tru         | Int64     | YES         |
| uti         | Int64     | YES         |
| vrt         | Int64     | YES         |
+-------------+-----------+-------------+
24 row(s) fetched.

There should be NO SORTING by default.

Expected behavior

Columns are not sorted.

Additional context

I have seen the file_sort_order option, but I do not believe it is related to this. I cannot find any option to disable the behaviour.
I use the bdt tool every day to convert from JSONL to CSV, and that is where I found the problem. bdt uses datafusion for all its operations.
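
A possible stopgap, sketched here in Python and not a DataFusion option: recover the key order from the first record of the file and turn it into an explicit SELECT list, since a SQL projection returns columns in the order they are listed. The helper name `ordered_select` is hypothetical, and the sketch assumes every record has the same keys as the first line; the sample line is a shortened version of the record shown above.

```python
import json


def ordered_select(first_line: str, table: str) -> str:
    """Build a SELECT listing columns in the order they appear in one JSON record.

    Hypothetical helper for illustration; assumes all records share the keys
    of the first line.
    """
    # json.loads returns a dict, which preserves the key order of the document.
    keys = list(json.loads(first_line).keys())
    # Double-quote each identifier (escaping embedded quotes) so the SQL parser
    # takes the names literally.
    cols = ", ".join('"' + k.replace('"', '""') + '"' for k in keys)
    return f"SELECT {cols} FROM {table}"


# Shortened sample of the first record from the report.
line = '{"uti":1763334319,"dat":"2025-11-16 23:05:19.186018808","hex":"4d00c3","alt":38000}'
print(ordered_select(line, "t17"))
# prints: SELECT "uti", "dat", "hex", "alt" FROM t17
```

The generated statement can be pasted into datafusion-cli against the table created above, so the result keeps the file's column order. It only works around the symptom; it does not explain or change how the schema is inferred.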

Labels: question