Parquet Tools

Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.

Currently these tools are available for UN*X systems.

Build

If you want to use parquet-tools in local mode, build with the local profile so that the Hadoop client dependency is included:

cd parquet-tools && mvn clean package -Plocal 

To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:

cd parquet-tools && mvn clean package 

The resulting jar is target/parquet-tools-<VERSION>.jar. You can copy it to wherever you want to use it.
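Because the jar name embeds the version, a script that copies it can locate it with a glob instead of hard-coding the version. The following is a minimal sketch, not part of parquet-tools; `find_tools_jar` is a hypothetical helper name, and it assumes the build output directory is `target` unless another directory is passed.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first parquet-tools jar
# found in a directory (default: target), or fail if none exists.
find_tools_jar() {
    dir=${1:-target}
    for jar in "$dir"/parquet-tools-*.jar; do
        # An unmatched glob stays literal, so confirm the file exists.
        [ -f "$jar" ] && { printf '%s\n' "$jar"; return 0; }
    done
    return 1
}
```

For example, `cp "$(find_tools_jar)" /some/destination/` copies the jar after a build. If the build directory holds more than one matching jar, only the first match is returned.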

Run from Hadoop

See Commands Usage for the commands you can run.

hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Run locally

See Commands Usage for the commands you can run.

java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Commands Usage

To see usage instructions for all commands:

java -jar ./parquet-tools-<VERSION>.jar --help

Note: To run it on Hadoop, use hadoop jar instead of java -jar.
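The only difference between the two modes is the launcher prefix, so a small wrapper can select it. This is an illustrative sketch: `parquet_tools_cmd` and its `local`/`hadoop` mode names are assumptions made here, not something parquet-tools provides. It prints the command line rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the command line for a given mode.
# Usage: parquet_tools_cmd <local|hadoop> <jar> [args...]
parquet_tools_cmd() {
    mode=$1
    jar=$2
    shift 2
    case "$mode" in
        hadoop) echo "hadoop jar $jar $*" ;;   # run through Hadoop
        local)  echo "java -jar $jar $*" ;;    # run with a plain JVM
        *)      echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

Calling `parquet_tools_cmd local ./parquet-tools-<VERSION>.jar --help` (with the placeholder quoted or replaced by a real version) prints the same `java -jar` line shown above.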

Meta Legend

Row Group Totals

| Acronym | Definition      |
|---------|-----------------|
| RC      | Row Count       |
| TS      | Total Byte Size |

Row Group Column Details

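| Acronym        | Definition                                                                        |
|----------------|-----------------------------------------------------------------------------------|
| DO             | Dictionary Page Offset                                                            |
| FPO            | First Data Page Offset                                                            |
| SZ:{x}/{y}/{z} | Size in bytes. x = Compressed total, y = uncompressed total, z = y:x ratio        |
| VC             | Value Count                                                                       |
| RLE            | Run-Length Encoding                                                               |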
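To make the SZ field concrete, the sketch below splits a value of the form `SZ:{x}/{y}/{z}` and recomputes the ratio as y divided by x. The helper name `sz_ratio` and the sample value are made up for illustration, and the two-decimal rounding is a choice made here, not necessarily how the tool formats z.

```shell
#!/bin/sh
# Hypothetical helper: recompute z = y / x from an "SZ:x/y/z" field.
sz_ratio() {
    # Strip the "SZ:" prefix, split on "/", and divide the
    # uncompressed total (field 2) by the compressed total (field 1).
    printf '%s\n' "${1#SZ:}" | awk -F/ '{ printf "%.2f\n", $2 / $1 }'
}
```

With the made-up value `SZ:100/250/2.50`, that is 100 compressed bytes and 250 uncompressed bytes, `sz_ratio` prints `2.50`.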