How do I actually change the scale factor #28

v-olmedo · 2018-05-17T22:25:44Z

I do not see any way to do that.

dilipbiswal · 2018-06-24T07:49:57Z

Hello @v-olmedo,
Thanks for trying out the code pattern. This pattern was initially targeted at developers, with a laptop as the target platform. My concern was that data generated at a larger scale factor might be too large for a laptop running Spark. That's why I didn't expose the scale factor. Here is the line in the code that currently hard-codes it to 1G:

 "2")  gen_data $TPCDS_ROOT_DIR '1G' ;;

You can change it to increase the scale factor. If you want parallelism in processing, make sure to move the data to HDFS. You may also want to partition the data. I have briefly touched on this in the doc.
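If you would rather not edit the script every time you try a different size, one option is to read the scale factor from an environment variable and fall back to the current `1G` default. Below is a minimal sketch under stated assumptions: `TPCDS_SCALE_FACTOR` and `resolve_scale_factor` are hypothetical names that are not part of the original script, and `gen_data` is only a stub standing in for the script's real function (its actual behaviour is not shown in this thread). It also assumes `gen_data` accepts the scale factor in the same `<N>G` form as the hard-coded `'1G'`.

```shell
#!/usr/bin/env bash
# Sketch only: take the TPC-DS scale factor from an environment variable
# instead of hard-coding '1G'. TPCDS_SCALE_FACTOR and resolve_scale_factor
# are hypothetical names, not part of the original script.

# Stand-in for the script's real gen_data so this sketch runs on its own;
# it only reports what it would have generated.
gen_data() {
  local root_dir="$1" scale="$2"
  echo "would generate ${scale} of TPC-DS data under ${root_dir}"
}

# Use TPCDS_SCALE_FACTOR if it is set and non-empty, else keep the 1G default.
resolve_scale_factor() {
  echo "${TPCDS_SCALE_FACTOR:-1G}"
}

TPCDS_ROOT_DIR="${TPCDS_ROOT_DIR:-/tmp/tpcds}"  # hypothetical default location
choice="${1:-2}"                                 # "2" is the branch that calls gen_data

case "$choice" in
  "2")  gen_data "$TPCDS_ROOT_DIR" "$(resolve_scale_factor)" ;;
esac
```

Saved as, say, `gen_sketch.sh`, running `TPCDS_SCALE_FACTOR=10G bash gen_sketch.sh` prints `would generate 10G of TPC-DS data under /tmp/tpcds`, while running it without the variable keeps the existing 1G behaviour. Keeping 1G as the default preserves the laptop-friendly setting described above.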

HichamISIMA · 2019-06-17T13:07:07Z

Hello @dilipbiswal,
You stated: "Please make sure to move the data to HDFS". Does that mean dsdgen can't generate the tables in a parallel, distributed manner across a cluster that isn't HDFS? Also, for query execution with dsqgen, I don't seem to get any distributed processing!
