TPC-DS tools for Apache Impala

The official and latest TPC-DS tools and specification can be found at tpc.org

The query templates and sample queries provided in this repo are compliant with the standards set out by the TPC-DS benchmark specification and include only minor query modifications (MQMs) as set out by section 4.2.3 of the specification. The modification list can be found in query-templates/README.md.

If you use this repo for any results publication, please see Fair Use of TPC Benchmarks.

Step 0: Environment Setup

Install Java JDK and Maven if need be:

sudo yum -y install java-1.8.0-openjdk-devel maven

Install the necessary development tools:

sudo yum -y install git gcc make flex bison byacc curl unzip patch

Step 1: Generate Data

Data generation is done via a MapReduce wrapper around TPC-DS dsdgen. See tpcds-gen/README.md for more details on the commands to generate the flat files.

Step 2: Load Data

Adjust the source/text and target/Parquet schema names and flat file paths in the sql files found in the scripts/ directory. See the comments at the top of each.

Create external text file tables:

impala-shell -f impala-external.sql

Create Parquet tables:

impala-shell -f impala-parquet.sql

Load Parquet tables and compute stats:

impala-shell -f impala-insert.sql

Step 3: Run Queries

Sample queries from the 10TB scale factor can be found in the queries/ directory. The query-templates/ directory contains the Apache Impala TPC-DS query templates which can be used with dsqgen (found in the official TPC-DS tools) to generate queries for other scale factors or to generate more queries with different substitution variables.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

queries

queries

query-templates

query-templates

scripts

scripts

tpcds-gen

tpcds-gen

LICENSE

LICENSE

README.md

README.md

Repository files navigation

TPC-DS tools for Apache Impala

Step 0: Environment Setup

Step 1: Generate Data

Step 2: Load Data

Step 3: Run Queries

About

Releases

Packages

Contributors 9

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
queries		queries
query-templates		query-templates
scripts		scripts
tpcds-gen		tpcds-gen
LICENSE		LICENSE
README.md		README.md

License

cloudera/impala-tpcds-kit

Folders and files

Latest commit

History

Repository files navigation

TPC-DS tools for Apache Impala

Step 0: Environment Setup

Step 1: Generate Data

Step 2: Load Data

Step 3: Run Queries

About

Resources

License

Stars

Watchers

Forks

Languages