Hello @v-olmedo,
Thanks for trying out the code pattern. This pattern was initially targeted at developers, and the target platform was a laptop. My thinking was that data generated with a larger scale factor might be too large for a laptop running Spark, which is why I didn't expose the scale factor. Here is the line in the code that currently hard-codes it to 1G:
"2") gen_data $TPCDS_ROOT_DIR '1G' ;;
You can change it to increase the scale factor. Please make sure to move the data to HDFS if you want parallelism in processing. You may also want to partition the data. I have touched upon this very briefly in the doc.
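As a rough sketch of that change, the menu branch could read the scale factor from an environment variable rather than the literal '1G'. The `SCALE_FACTOR` variable, the `/tmp/tpcds` fallback directory, and the stub `gen_data` function below are assumptions for illustration only; the real `gen_data` lives in the pattern's script and is not reproduced here.

```shell
#!/bin/sh
# Hypothetical: SCALE_FACTOR is an assumed variable name, not part of the
# original script. It falls back to the current hard-coded value of 1G.
SCALE_FACTOR="${SCALE_FACTOR:-1G}"

# Stand-in for the script's real gen_data function so this sketch runs
# on its own; it only reports what it was asked to generate.
gen_data() {
  echo "generating data under $1 at scale $2"
}

# Hypothetical menu choice; in the real script this comes from user input.
choice="2"
case "$choice" in
  "2") gen_data "${TPCDS_ROOT_DIR:-/tmp/tpcds}" "$SCALE_FACTOR" ;;
esac
```

With something like this in place, running the script with `SCALE_FACTOR=10G` set in the environment would pass '10G' through to `gen_data` instead of '1G'.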
Hello @dilipbiswal,
You stated: "Please make sure to move the data to HDFS". Does that mean dsdgen can't generate the tables in a parallel, distributed manner across a cluster that isn't using HDFS? Also, for query execution with dsqgen, I don't seem to get any distributed processing.
I do not see any way to do that.