Sub-partitioning of Parquet file for ADAM #1003
The Spark SQL programming guide describes an optimization of Parquet usage (partition discovery) that involves splitting a Parquet file into directories, one per value of a chosen column.
This issue is meant as a place to discuss this topic and to determine whether we should prototype such a Parquet directory layout, for example by dividing the Parquet file into individual files per chromosome.
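To make the proposed layout concrete, here is a minimal sketch in plain Python (no Spark involved) of the `column=value` directory naming that Spark SQL's partition discovery expects, and of how a per-chromosome query could skip unrelated directories. The column name `contigName`, the root path `reads.adam`, and the helper functions are assumptions for illustration only, not ADAM's actual schema or API. In Spark itself this layout is what `DataFrameWriter.partitionBy` produces when writing Parquet.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_dir(root, column, value):
    """Build the Hive-style directory for one partition value, e.g. root/contigName=chr1."""
    return PurePosixPath(root) / f"{column}={value}"


def group_by_partition(records, column):
    """Group records by the value of the partitioning column."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[column]].append(rec)
    return dict(groups)


def prune(dirs, column, wanted):
    """Keep only the directories whose partition value matches the query."""
    return [d for d in dirs if d.name == f"{column}={wanted}"]


# Hypothetical reads; "contigName" is an assumed column name.
reads = [
    {"contigName": "chr1", "start": 100},
    {"contigName": "chr2", "start": 250},
    {"contigName": "chr1", "start": 900},
]

groups = group_by_partition(reads, "contigName")
dirs = sorted(partition_dir("reads.adam", "contigName", c) for c in groups)
selected = prune(dirs, "contigName", "chr1")

print([str(d) for d in dirs])      # all partition directories
print([str(d) for d in selected])  # only the directory a chr1 query needs
```

The point of the layout is the last step: a query restricted to one chromosome only has to open the files under the matching directory, rather than scanning the whole dataset.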
I look forward to any comments and links to earlier discussions of this topic.