dataGen error #126
Hi @hahasdnu1029
Hi @juliuszsompolski
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161):

```scala
scala> import sqlContext.implicits._
scala> import com.databricks.spark.sql.perf.tpcds.TPCDSTables
scala> val tables = new TPCDSTables(sqlContext, "/home/sparktest/tpcds-kit/tools", "1", false, false)
scala> tables.genData("hdfs://hw080:9000/tpctest", "parquet", true, true, true, false, "", 100)
DISTRIBUTE BY
18/02/24 10:50:49 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
Driver stacktrace:
Driver stacktrace:
scala>
```

This is all the information!
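For reference, the same calls can be written with the parameters named, which makes the positional booleans easier to audit. This is a sketch only: the parameter names below are taken from the spark-sql-perf master-branch signatures of `TPCDSTables` and `genData` and may differ in older versions; `sqlContext` is assumed to be the REPL's `SQLContext`.

```scala
import com.databricks.spark.sql.perf.tpcds.TPCDSTables

val tables = new TPCDSTables(
  sqlContext,
  dsdgenDir = "/home/sparktest/tpcds-kit/tools", // must contain dsdgen on EVERY worker, same path
  scaleFactor = "1",
  useDoubleForDecimal = false,
  useStringForDate = false)

tables.genData(
  location = "hdfs://hw080:9000/tpctest",
  format = "parquet",
  overwrite = true,
  partitionTables = true,
  clusterByPartitionColumns = true,
  filterOutNullPartitionValues = false,
  tableFilter = "",
  numPartitions = 100)
```

Note that `dsdgenDir` is resolved on the executors, not the driver, so the tpcds-kit binaries must be built and installed at that exact path on every node.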
Hi @hahasdnu1029, For
Hi @juliuszsompolski
Hi @hahasdnu1029
I put tpcds-kit on all cluster machines, not just the driver, and in the same directory on each.
Hi @hahasdnu1029
I hit the same issue. I was following the commands in the README.
@hahasdnu1029
Hi @gengliangwang @juliuszsompolski, thanks, I'll try these approaches.
@Koprvhdix
@juliuszsompolski I see, I cloned the wrong tpcds-kit, sorry.
I am facing the same problem as above. I used the correct git repo (https://github.com/databricks/tpcds-kit/) but still hit the issue; I am running in a Windows environment.
@manikantabandaru
(I don't guarantee it compiles; see https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/tpcds/TPCDSTables.scala#L27 to troubleshoot.) That RDD[String] should contain the generated TPC-DS data as PSV (pipe-separated values). If there's an error instead, it may hint at what's happening.
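The suggestion above can be sketched as follows. This is a hypothetical snippet, not guaranteed to compile against every version: the `Dsdgen` class name and its `generate` signature are assumptions based on TPCDSTables.scala#L27 linked above, and `spark` is assumed to be a live `SparkSession`. The idea is to look at the raw dsdgen output before the encoder tries to parse it.

```scala
import com.databricks.spark.sql.perf.tpcds.Dsdgen

// Drive dsdgen directly and inspect its raw output for one table.
val gen = new Dsdgen("/home/sparktest/tpcds-kit/tools")
val raw = gen.generate(spark.sparkContext, "catalog_sales",
  partitions = 1, scaleFactor = "1")

// Healthy output is pipe-separated rows, e.g. "2450815|28648|...".
// An exception or empty/garbage lines here points at dsdgen itself
// (wrong tpcds-kit fork, binary missing on a worker, wrong platform).
raw.take(5).foreach(println)
```

If the lines printed here are not pipe-separated rows, the encoder failure below is a symptom, not the cause.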
Spark 2.2.0
```
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.ArrayIndexOutOfBoundsException: 0
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, cs_sold_date_sk), StringType), true) AS cs_sold_date_sk#939
[the same expression repeats for catalog_sales fields 1-33: cs_sold_time_sk#940, cs_ship_date_sk#941, cs_bill_customer_sk#942, cs_bill_cdemo_sk#943, cs_bill_hdemo_sk#944, cs_bill_addr_sk#945, cs_ship_customer_sk#946, cs_ship_cdemo_sk#947, cs_ship_hdemo_sk#948, cs_ship_addr_sk#949, cs_call_center_sk#950, cs_catalog_page_sk#951, cs_ship_mode_sk#952, cs_warehouse_sk#953, cs_item_sk#954, cs_promo_sk#955, cs_order_number#956, cs_quantity#957, cs_wholesale_cost#958, cs_list_price#959, cs_sales_price#960, cs_ext_discount_amt#961, cs_ext_sales_price#962, cs_ext_wholesale_cost#963, cs_ext_list_price#964, cs_ext_tax#965, cs_coupon_amt#966, cs_ext_ship_cost#967, cs_net_paid#968, cs_net_paid_inc_tax#969, cs_net_paid_inc_ship#970, cs_net_paid_inc_ship_tax#971, cs_net_profit#972]
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
	at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:573)
	at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:573)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
	at org.apache.spark.sql.catalyst.expressions.GenericRow.get(rows.scala:173)
	at org.apache.spark.sql.Row$class.isNullAt(Row.scala:191)
	at org.apache.spark.sql.catalyst.expressions.GenericRow.isNullAt(rows.scala:165)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalIfCondExpr$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
	... 16 more
```
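One way to read `ArrayIndexOutOfBoundsException: 0` in that trace: the encoder calls `isNullAt(0)` on a row that has zero fields, i.e. a generated line that split into no columns at all. A small, purely illustrative sketch (plain Scala, no cluster; the sample lines are made up, not real dsdgen output):

```scala
// catalog_sales has 34 columns; a good dsdgen line splits into 34 fields.
val goodLine = "2450815|28648|2450906|29187|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30"
// A broken generator (wrong tpcds-kit fork, crashed binary) can emit lines
// that split into zero usable fields.
val badLine  = ""

// "-1" keeps trailing empty strings, matching typical PSV handling.
val goodFields = goodLine.split("\\|", -1)   // 34 elements
val badFields  = Array.empty[Any]            // what an empty row amounts to

// Reading field 0 of the empty row is exactly where the encoder blows up
// with ArrayIndexOutOfBoundsException: 0 in the trace above.
// badFields(0)   // would throw java.lang.ArrayIndexOutOfBoundsException: 0
```

So the encoder error is downstream of bad generator output, which is consistent with the fixes in this thread (using the databricks/tpcds-kit fork and installing it on every node).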