Open
Labels: question (Further information is requested)
Description
Describe the bug
Given a JSON (or CSV) file read by datafusion-cli with
create external table t17 stored as json location '20251117.json';
Whatever the column order in the JSON (or CSV) file, all columns end up sorted alphabetically in the resulting table, so when the data is written to another format the column order differs from the source! I cannot find an option, either in the API or in datafusion-cli, to prevent this, and I do not understand why it happens at all. At the very least there should be an option.
To Reproduce
head -1 20251117.json
{"uti":1763334319,"dat":"2025-11-16 23:05:19.186018808","tim":"23:05:19.186018808","hex":"4d00c3","fli":"","lat":48.20722961425781,"lon":2.240788386418269,"gda":"A","src":"A","alt":38000,"altg":37900,"hgt":-100,"spd":385,"cat":"A0","squ":"2256","vrt":0,"trk":56.84539,"mop":2,"lla":2,"tru":29,"dbm":-90,"nucp":8,"nic":8,"pic":11}
datafusion-cli
> create external table t17 stored as json location '20251117.json';
0 row(s) fetched.
Elapsed 0.024 seconds.
> describe t17;
+-------------+-----------+-------------+
| column_name | data_type | is_nullable |
+-------------+-----------+-------------+
| alt | Int64 | YES |
| altg | Int64 | YES |
| cat | Utf8 | YES |
| dat | Utf8 | YES |
| dbm | Int64 | YES |
| fli | Utf8 | YES |
| gda | Utf8 | YES |
| hex | Utf8 | YES |
| hgt | Int64 | YES |
| lat | Float64 | YES |
| lla | Int64 | YES |
| lon | Float64 | YES |
| mop | Int64 | YES |
| nic | Int64 | YES |
| nucp | Int64 | YES |
| pic | Int64 | YES |
| spd | Int64 | YES |
| squ | Utf8 | YES |
| src | Utf8 | YES |
| tim | Utf8 | YES |
| trk | Float64 | YES |
| tru | Int64 | YES |
| uti | Int64 | YES |
| vrt | Int64 | YES |
+-------------+-----------+-------------+
24 row(s) fetched.
There should be NO SORTING by default.
Expected behavior
Columns are not sorted: they keep the order in which they appear in the source file.
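To make the expected behaviour concrete, here is a minimal Python sketch (Python is an illustrative choice; nothing below is DataFusion or bdt API). It parses the sample record from "To Reproduce", lists its keys in file order, and shows that simply sorting those keys reproduces the DESCRIBE t17 listing exactly. The helper ordered_select is hypothetical: it builds an explicit column list, and since a SQL SELECT returns columns in the order they are projected, running such a query instead of SELECT * might serve as a workaround when writing to another format. It assumes flat records and that the first record contains every column.

```python
import json

# The first record of 20251117.json, copied from the report above.
SAMPLE_RECORD = (
    '{"uti":1763334319,"dat":"2025-11-16 23:05:19.186018808",'
    '"tim":"23:05:19.186018808","hex":"4d00c3","fli":"","lat":48.20722961425781,'
    '"lon":2.240788386418269,"gda":"A","src":"A","alt":38000,"altg":37900,'
    '"hgt":-100,"spd":385,"cat":"A0","squ":"2256","vrt":0,"trk":56.84539,'
    '"mop":2,"lla":2,"tru":29,"dbm":-90,"nucp":8,"nic":8,"pic":11}'
)


def file_column_order(json_line: str) -> list[str]:
    """Return the top-level keys of one JSON object in the order they appear.

    json.loads returns a dict, and Python dicts keep insertion order,
    so the keys come back in file order (assumes a flat object).
    """
    return list(json.loads(json_line).keys())


def ordered_select(table: str, columns: list[str]) -> str:
    """Hypothetical helper: build a SELECT that projects columns in the given order."""
    # Double quotes keep each identifier verbatim (no case folding).
    column_list = ", ".join(f'"{c}"' for c in columns)
    return f"SELECT {column_list} FROM {table}"


if __name__ == "__main__":
    cols = file_column_order(SAMPLE_RECORD)
    print("file order:  ", cols)
    print("sorted order:", sorted(cols))  # same order as DESCRIBE t17 above
    print(ordered_select("t17", cols))
```

This only works around the symptom on the query side; it does not change how the schema is inferred when the external table is created.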
Additional context
I have seen the file_sort_order option, but I do not believe it is what I need. I cannot find any option to turn this behaviour off.
I use the bdt tool every day to convert JSONL to CSV, which is where I found the problem; bdt uses DataFusion for all its operations.