## TPC-DS Query Generation  

TPC-DS generates queries from template SQL files.  The `dsqgen` program compiles these templates into executable SQL for the system under test (SUT) using the files in `/query_templates/`: specifically, `sqlserver.tpl` for the SQL dialect configuration and `queryX.tpl`, where X is 1-99, for the format of each query.  The SQL output is ANSI-compliant SQL.  The manual steps are as follows:  
1. NB DS 01 Setup also copies all template files into `/tpl`, where three copies are made: `/tpl/ansi_gen/`, `/tpl/bq_gen/` and `/tpl/sf_gen/`
1. Using the default SQL templates, `ansi_gen`, queries are compiled into `/q/ansi_ds`
1. The default `/tpl/bq_gen/sqlserver.tpl` is rewritten to `/tpl/bq_gen/sqlserver_bq.tpl`
1. The templates in `/tpl/bq_gen/` are modified with regex to convert them to BigQuery syntax
1. MANUAL: The directory `/tpl/bq_gen` is copied and renamed to `/tpl/bq_ds`
1. MANUAL: The contents of `query1.tpl` through `query99.tpl` are edited by hand, compiled with `dsqgen` (see NB DS XX BQ Query Validation) and verified to run on BigQuery.
1. `dsqgen` is run on the entire set of queries 1-99 to create the benchmark streams (i.e. streams 1-10; see spec Appendix D for the query order of each stream).
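
The compile steps above all come down to one `dsqgen` invocation per output directory. As a minimal sketch, the helper below assembles such a command line. The function name `dsqgen_argv` and the `./dsqgen` binary path are hypothetical, and this is not necessarily how `ds_setup.dsqgen` is implemented. The option names follow the TPC-DS toolkit's `dsqgen` options as I understand them, so check them against `dsqgen -help` for your kit version.

```python
def dsqgen_argv(directory, input_list, output_dir, dialect, scale,
                streams=None, qualify=False, binary="./dsqgen"):
    """Build the argument list for one dsqgen run (hypothetical helper)."""
    argv = [
        binary,
        "-DIRECTORY", directory,    # folder holding the .tpl files
        "-INPUT", input_list,       # file listing the templates to compile
        "-OUTPUT_DIR", output_dir,  # where the generated SQL is written
        "-DIALECT", dialect,        # name of the dialect .tpl, without extension
        "-SCALE", str(scale),       # scale factor in GB
    ]
    if streams is not None:
        argv += ["-STREAMS", str(streams)]  # number of query streams
    if qualify:
        argv += ["-QUALIFY", "Y"]           # use qualification substitutions
    return argv


# Example paths mirror the directory names used in this notebook.
example = dsqgen_argv("/tpl/bq_ds", "/tpl/bq_ds/templates.lst",
                      "/q/bq_ds_1GB_00_qual", "sqlserver_bq",
                      scale=1, streams=10, qualify=True)
```

The resulting list can be passed to `subprocess.run(example, capture_output=True, text=True)` to capture standard output and standard error, much as the cells below print `std_out` and `err_out`.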

In [1]:
import ds_setup, config, tools

### 00. ANSI Default Queries  
This is the default output; it will not run correctly on BigQuery or Snowflake.  We generate it as a reference for comparing later SQL edits and performance.
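
One way to use this reference output is to diff a default query against its hand-edited counterpart. The sketch below uses the standard library's `difflib`. The helper name `diff_sql` and the two SQL strings are illustrative assumptions, not taken from the generated files.

```python
import difflib


def diff_sql(default_sql, edited_sql, name="query"):
    """Return a unified diff between two SQL strings (hypothetical helper)."""
    return "".join(difflib.unified_diff(
        default_sql.splitlines(keepends=True),
        edited_sql.splitlines(keepends=True),
        fromfile="ansi/" + name,
        tofile="bq/" + name,
    ))


# Illustrative strings only: a TOP clause moved to a trailing LIMIT.
default_sql = "select top 100 c_customer_id\nfrom customer\norder by c_customer_id;\n"
edited_sql = "select c_customer_id\nfrom customer\norder by c_customer_id\nlimit 100;\n"

diff_text = diff_sql(default_sql, edited_sql, name="query_example.sql")
print(diff_text)
```

In practice the two strings would be read from matching files in the default and BigQuery output directories.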

scale factor: 1  
dialect: ANSI SQL  
templates: /tpl/ansi_ds_gen  
output: /q/ansi_ds

In [2]:
test_name = "ansi_ds"
input_templates = config.fp_ds_ansi_gen_template_dir
input_templates

'/home/colin/code/bq_snowflake_benchmark/tpl/ansi_ds_gen'

In [3]:
output_queries = config.fp_query + config.sep + test_name
tools.mkdir_safe(output_queries)
output_queries

'/home/colin/code/bq_snowflake_benchmark/q/ansi_ds'

In [4]:
std_out, err_out = ds_setup.dsqgen(directory=input_templates,
                                   output_dir=output_queries,
                                   input=input_templates + config.sep + "templates.lst",
                                   dialect="sqlserver",
                                   scale=1,
                                   streams=10,
                                   qualify='Y')
print(std_out)
print(err_out)


qgen2 Query Generator (Version 2.11.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2019



### 01. BigQuery Qualification Queries  
Output a copy of the queries that should run on BigQuery.  These queries come from the template files in `/tpl/bq_ds`, which have been manually edited to compile to BigQuery SQL.  No optimization is done; the intention is to convert the source TPC-DS queries naively and as faithfully as possible.

Because the `qualify` flag is set, these queries are populated with the substitution values that should produce known output on the 1GB scale factor dataset.  The answers are in `/answer_sets/` and are used to check that the database has been loaded correctly.
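
Checking a qualification run means comparing the rows a query returns with the expected answer rows. The sketch below shows one possible comparison. It assumes both sides have already been parsed into lists of tuples, and the names `normalize` and `rows_match` are hypothetical. Parsing the files in `/answer_sets/` is not shown because their layout is not described here.

```python
from decimal import Decimal


def normalize(value, places=2):
    """Render one cell as a comparable string (hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, (float, Decimal)):
        # Round decimals so that 1.5 and Decimal("1.50") compare equal.
        return f"{float(value):.{places}f}"
    # Integers and strings are compared by their stripped text form,
    # so an int 1 does not equal a float 1.0 under this scheme.
    return str(value).strip()


def rows_match(expected, actual, places=2):
    """Compare two row sets, ignoring row order and trailing padding."""
    def canon(rows):
        return sorted(tuple(normalize(v, places) for v in row) for row in rows)
    return canon(expected) == canon(actual)


expected_rows = [("AAAA", 1.50), ("BBBB", None)]
actual_rows = [("BBBB", None), ("AAAA ", Decimal("1.5"))]
same = rows_match(expected_rows, actual_rows)
```

Ignoring row order is a deliberate simplification. For queries whose `ORDER BY` fully determines the output, a stricter check would compare the rows in sequence.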

scale factor: 1  
dialect: BigQuery SQL  
templates: /tpl/bq_ds  
output: /q/bq_ds_1GB_00_qual  

In [5]:
test_name = "bq_ds_1GB_00_qual"

In [6]:
dialect = "sqlserver_bq"

In [7]:
input_templates_bq = config.fp_ds_bq_template_dir
input_templates_bq

'/home/colin/code/bq_snowflake_benchmark/tpl/bq_ds'

In [8]:
output_queries = config.fp_query + config.sep + test_name
tools.mkdir_safe(output_queries)
output_queries

'/home/colin/code/bq_snowflake_benchmark/q/bq_ds_1GB_00_qual'

In [9]:
std_out, err_out = ds_setup.dsqgen(directory=input_templates_bq,
                                   output_dir=output_queries,
                                   input=input_templates + config.sep + "templates.lst",
                                   dialect=dialect,
                                   scale=1,
                                   streams=10,
                                   qualify="Y"  # qualify only for 1GB
                                  )
print(std_out)
print(err_out)


qgen2 Query Generator (Version 2.11.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2019



### 02. BigQuery 100GB Default Queries 
Output a copy of the queries that should run on BigQuery.  These queries come from the template files in `/tpl/bq_ds`, which have been manually edited to compile to BigQuery SQL.  No optimization is done; the intention is to convert the source TPC-DS queries naively so that they run on BigQuery as faithfully as possible.

scale factor: 100  
dialect: BigQuery SQL  
templates: /tpl/bq_ds  
output: /q/bq_ds_100GB_01_default

In [10]:
test_name = "bq_ds_100GB_01_default"

In [11]:
dialect = "sqlserver_bq"

In [12]:
input_templates_bq = config.fp_ds_bq_template_dir
input_templates_bq

'/home/colin/code/bq_snowflake_benchmark/tpl/bq_ds'

In [13]:
output_queries = config.fp_query + config.sep + test_name
tools.mkdir_safe(output_queries)
output_queries

'/home/colin/code/bq_snowflake_benchmark/q/bq_ds_100GB_01_default'

In [14]:
std_out, err_out = ds_setup.dsqgen(directory=input_templates_bq,
                                   output_dir=output_queries,
                                   input=input_templates + config.sep + "templates.lst",
                                   dialect=dialect,
                                   scale=100,
                                   streams=10)
print(std_out)
print(err_out)


qgen2 Query Generator (Version 2.11.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2019



### 03. BigQuery 1000GB/1TB Default Queries 
Output a copy of the queries that should run on BigQuery.  These queries come from the template files in `/tpl/bq_ds`, which have been manually edited to compile to BigQuery SQL.  No optimization is done; the intention is to convert the source TPC-DS queries naively so that they run on BigQuery as faithfully as possible.

scale factor: 1000  
dialect: BigQuery SQL  
templates: /tpl/bq_ds  
output: /q/bq_ds_1000GB_01_default

In [15]:
test_name = "bq_ds_1000GB_01_default"

In [16]:
dialect = "sqlserver_bq"

In [17]:
input_templates_bq = config.fp_ds_bq_template_dir
input_templates_bq

'/home/colin/code/bq_snowflake_benchmark/tpl/bq_ds'

In [18]:
output_queries = config.fp_query + config.sep + test_name
tools.mkdir_safe(output_queries)
output_queries

'/home/colin/code/bq_snowflake_benchmark/q/bq_ds_1000GB_01_default'

In [19]:
std_out, err_out = ds_setup.dsqgen(directory=input_templates_bq,
                                   output_dir=output_queries,
                                   input=input_templates + config.sep + "templates.lst",
                                   dialect=dialect,
                                   scale=1000,  # match the 1000GB scale factor
                                   streams=10)
print(std_out)
print(err_out)


qgen2 Query Generator (Version 2.11.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2019



### 04. BigQuery 10000GB/10TB Default Queries 
Output a copy of the queries that should run on BigQuery.  These queries come from the template files in `/tpl/bq_ds`, which have been manually edited to compile to BigQuery SQL.  No optimization is done; the intention is to convert the source TPC-DS queries naively so that they run on BigQuery as faithfully as possible.

scale factor: 10000  
dialect: BigQuery SQL  
templates: /tpl/bq_ds  
output: /q/bq_ds_10000GB_01_default

In [20]:
test_name = "bq_ds_10000GB_01_default"

In [21]:
dialect = "sqlserver_bq"

In [22]:
input_templates_bq = config.fp_ds_bq_template_dir
input_templates_bq

'/home/colin/code/bq_snowflake_benchmark/tpl/bq_ds'

In [23]:
output_queries = config.fp_query + config.sep + test_name
tools.mkdir_safe(output_queries)
output_queries

'/home/colin/code/bq_snowflake_benchmark/q/bq_ds_10000GB_01_default'

In [24]:
std_out, err_out = ds_setup.dsqgen(directory=input_templates_bq,
                                   output_dir=output_queries,
                                   input=input_templates + config.sep + "templates.lst",
                                   dialect=dialect,
                                   scale=10000,  # match the 10000GB scale factor
                                   streams=10)
print(std_out)
print(err_out)


qgen2 Query Generator (Version 2.11.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2019
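
The four BigQuery sections above differ only in scale factor, output name, and whether `qualify` is set. As a minimal sketch, the settings could be derived from a single list of scale factors, which avoids copy-and-paste mismatches between the section heading and the `scale` argument. The helper `build_runs` is hypothetical, and the names it produces follow the `test_name` values used in this notebook.

```python
def build_runs(scales_gb=(1, 100, 1000, 10000),
               dialect="sqlserver_bq", streams=10):
    """Return one settings dict per BigQuery run (hypothetical helper)."""
    runs = []
    for scale in scales_gb:
        qualify = (scale == 1)  # qualification values only apply at 1GB
        suffix = "00_qual" if qualify else "01_default"
        runs.append({
            "test_name": f"bq_ds_{scale}GB_{suffix}",
            "dialect": dialect,
            "scale": scale,
            "streams": streams,
            "qualify": qualify,
        })
    return runs


runs = build_runs()
for run in runs:
    print(run["test_name"], run["scale"], run["qualify"])
```

Each dict could then be unpacked into a call like the `ds_setup.dsqgen(...)` cells above, with the output directory built from `test_name`.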

