Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Storing sorted data in parquet is often a key performance technique, as it "clusters" data in ways that can make predicate evaluation and other query techniques faster.
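To make the clustering point concrete, here is a minimal sketch in plain Rust (standard library only, not the `parquet` crate API). It assumes a reader that keeps a min/max pair per row group and skips any group whose range cannot contain the predicate value. `RowGroupStats`, `row_group_stats`, and `groups_to_scan` are hypothetical names used only for illustration.

```rust
/// Hypothetical per-row-group statistics: min and max of a single column.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Split `values` into row groups of `rows_per_group` rows and record the
/// min/max of each group. Assumes `rows_per_group > 0`.
fn row_group_stats(values: &[i64], rows_per_group: usize) -> Vec<RowGroupStats> {
    values
        .chunks(rows_per_group)
        .map(|chunk| RowGroupStats {
            // `chunks` never yields an empty slice, so min/max always exist.
            min: *chunk.iter().min().unwrap(),
            max: *chunk.iter().max().unwrap(),
        })
        .collect()
}

/// Count the row groups that cannot be skipped for the predicate
/// `column = target`, i.e. those with `min <= target <= max`.
fn groups_to_scan(stats: &[RowGroupStats], target: i64) -> usize {
    stats
        .iter()
        .filter(|s| s.min <= target && target <= s.max)
        .count()
}
```

With the values 0 through 11 in sorted order and four rows per group, a lookup for `5` touches one group out of three. With the same values shuffled so that every group spans most of the range, the same lookup may have to scan all three.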
Describe alternatives you've considered
It might be worth considering having the parquet writer determine automatically whether the data is sorted, rather than relying on the caller to verify it. However, verifying sortedness in the writer would likely be a significant performance hit, since every value would have to be compared as it is written.
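As a rough illustration of what writer-side verification would involve, the sketch below tracks sortedness across batches in plain Rust. It is not part of the `parquet` crate. `SortednessTracker` is a hypothetical type, and it assumes a single `i64` column in ascending order with no nulls. A real writer would also need to handle multiple sort columns, descending order, and null placement.

```rust
/// Checks that a single slice is in non-decreasing order.
fn is_sorted_ascending<T: PartialOrd>(values: &[T]) -> bool {
    values.windows(2).all(|pair| pair[0] <= pair[1])
}

/// Hypothetical helper that a writer could feed each batch into.
/// It remembers the last value seen so ordering is also checked
/// across batch boundaries.
struct SortednessTracker {
    last: Option<i64>,
    sorted: bool,
}

impl SortednessTracker {
    fn new() -> Self {
        Self { last: None, sorted: true }
    }

    /// Observe one batch of values. Costs one comparison per value until
    /// the first out-of-order value is found, after which it is a no-op.
    fn observe(&mut self, batch: &[i64]) {
        if !self.sorted {
            return;
        }
        for &value in batch {
            if let Some(prev) = self.last {
                if value < prev {
                    self.sorted = false;
                    return;
                }
            }
            self.last = Some(value);
        }
    }

    fn is_sorted(&self) -> bool {
        self.sorted
    }
}
```

The check is a single linear pass, but it adds a comparison for every value written, which is the cost this alternative would impose on all callers, including those that already know their data is sorted.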
DataFusion is getting more sophisticated in its ability to track and use sortedness information (e.g. apache/arrow-datafusion#4122). If this metadata were included in the parquet file, DataFusion might be able to take more advantage of it: apache/arrow-datafusion#4177.