
QA - SPARK CONNECTOR - parallel unloader: hangs doing UNION ALL statement #38

Closed
Pyrobal opened this issue May 20, 2016 · 1 comment
Pyrobal commented May 20, 2016

Tested on 4.2.3H with the latest patch and the daily Spark 0.2 jar.

This unload query hangs:

select distinct col_int, col_char20 from vwload_reg02_unload_tbl UNION ALL select col_int, col_char20 from vwload_reg02_unload_tbl2\g

ENV:

export SPARK_MASTER=yarn
export HDFS_TMP=$(iigetenv II_HDFSDATA)
export SPARK_LOADER_JAR=/home/actian/kelch01/Spark/unloader/spark_vector_loader-assembly-0.2.jar
export SEPPARAMDB=vwtestdb0

export TMP_II_HDFSDATA=$(ingprenv II_HDFSDATA)
export TMP_II_INSTALLATION=$(ingprenv II_INSTALLATION)
export TMP_HOSTNAME=$HOSTNAME

export ING_TST=/home/actian/kelch01/svn_main_checkout/

export II_USER=actian
export II_PASSWORD=actian

LOAD:

create table vwload_reg02_unload_tbl (col_int int, col_float4 float4, col_money money,col_decimal382 decimal(38,2),col_decimal102 decimal(10,2), col_char20 char(20), col_varchar20 varchar(20), col_nchar20 nchar(20),col_nvarchar nvarchar(20), col_ansidate ansidate, col_timestamp timestamp);\g

Use the attached data file:
vwload --nullvalue NULL --fdelim \t --table vwload_reg02_unload_tbl $SEPPARAMDB vwload_reg02_data.txt
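
The data file itself is attached to the issue and not reproduced here; for orientation, a row would be tab-delimited with NULL as the null marker, along the lines of this purely hypothetical example matching the 11 columns:

1	1.25	100.00	123456.78	99.99	charval	varcharval	ncharval	nvarcharval	2016-05-20	2016-05-20 12:00:00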

create table vwload_reg02_unload_tbl2 as select * from vwload_reg02_unload_tbl\g

UNLOAD:

hadoop fs -rm -r -skipTrash $HDFS_TMP/*.csv
spark-shell --master $SPARK_MASTER --driver-java-options '-Duser.timezone=GMT' --conf "spark.executor.memory=10G" --conf "spark.driver.memory=10G" --jars $SPARK_LOADER_JAR

val savepath = sys.env("HDFS_TMP")
val installation_ID = sys.env("TMP_II_INSTALLATION")
val hostname = sys.env("TMP_HOSTNAME")
val databasename = sys.env("SEPPARAMDB")
val user = sys.env.get("II_USER")
val password = sys.env.get("II_PASSWORD")

val usernameAndPasswordStatement = (user, password) match {
  case (Some(u), Some(p)) => s""", user "$u", password "$p""""
  case _ => ""
}

sqlContext.sql(s"""CREATE TEMPORARY TABLE vwload_reg02_unload_tbl
USING com.actian.spark_vector.sql.DefaultSource
OPTIONS (
host "$hostname",
instance "$installation_ID",
database "$databasename",
table "vwload_reg02_unload_tbl"
$usernameAndPasswordStatement)""")

sqlContext.sql(s"""CREATE TEMPORARY TABLE vwload_reg02_unload_tbl2
USING com.actian.spark_vector.sql.DefaultSource
OPTIONS (
host "$hostname",
instance "$installation_ID",
database "$databasename",
table "vwload_reg02_unload_tbl2"
$usernameAndPasswordStatement)""")

sqlContext.sql("select distinct col_int, col_char20 from vwload_reg02_unload_tbl UNION ALL select col_int, col_char20 from vwload_reg02_unload_tbl2").write.format("com.databricks.spark.csv").save(s"$savepath/vwload_reg02_unload_tbl_12.csv")

The statement hangs at this point.
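
One hypothetical way to narrow this down (untested against this build) would be to avoid the single UNION ALL statement and union the two reads on the Spark side instead; the column names and save path below just mirror the repro above:

// Hypothetical workaround sketch (untested here): read each side into its
// own DataFrame and union them in Spark rather than in one SQL statement
// handed to the connector. unionAll keeps duplicates, matching UNION ALL.
val lhs = sqlContext.sql("select distinct col_int, col_char20 from vwload_reg02_unload_tbl")
val rhs = sqlContext.sql("select col_int, col_char20 from vwload_reg02_unload_tbl2")
lhs.unionAll(rhs)
  .write.format("com.databricks.spark.csv")
  .save(s"$savepath/vwload_reg02_unload_tbl_12_b.csv")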

cbarca (Contributor) commented May 22, 2016

After fixing issue #39, the hang turns into the following executor exception:

16/05/22 22:50:41 INFO Executor: Running task 206.0 in stage 1.0 (TID 226)
16/05/22 22:50:41 INFO Executor: Finished task 184.0 in stage 1.0 (TID 204). 1424 bytes result sent to driver
16/05/22 22:50:41 INFO DataStreamConnector: PartitionId => 206
16/05/22 22:50:41 INFO CoarseGrainedExecutorBackend: Got assigned task 227
16/05/22 22:50:41 INFO Executor: Running task 207.0 in stage 1.0 (TID 227)
16/05/22 22:50:41 ERROR Executor: Exception in task 206.0 in stage 1.0 (TID 226)
java.lang.IndexOutOfBoundsException: 206
at scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:137)
at scala.collection.immutable.Vector.apply(Vector.scala:127)
at com.actian.spark_vector.datastream.DataStreamConnector.openSocketChannel(DataStreamConnector.scala:91)
at com.actian.spark_vector.datastream.DataStreamConnector.newConnection(DataStreamConnector.scala:109)
at com.actian.spark_vector.datastream.reader.DataStreamReader.read(DataStreamReader.scala:47)
at com.actian.spark_vector.vector.Vector$$anonfun$unloadVector$1$$anonfun$7.apply(Vector.scala:124)
at com.actian.spark_vector.vector.Vector$$anonfun$unloadVector$1$$anonfun$7.apply(Vector.scala:124)
at com.actian.spark_vector.datastream.reader.ScanRDD.compute(ScanRDD.scala:44)
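
The exception comes from indexing a per-stream connection Vector with an id larger than the number of datastreams. A minimal standalone illustration of that failure mode (not the connector's actual code; names are invented):

// Minimal sketch: indexing a Vector of connection slots with an
// out-of-range id raises the same IndexOutOfBoundsException as the trace.
val streams = Vector("node1:9001", "node2:9002") // suppose only 2 datastreams
val badIdx = 206                                 // id taken from the task, not the split
streams(badIdx)                                  // java.lang.IndexOutOfBoundsException: 206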

@cbarca cbarca added the bug label May 22, 2016
@cbarca cbarca self-assigned this May 22, 2016
cbarca added a commit that referenced this issue May 24, 2016
Fixing the partition id which is passed further to DataStreamReader; the partitionId from taskContext was bogus, the Partition.index should've been used instead. Fixes #38
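
In RDD terms, the fix described in the commit message amounts to using the split's own index inside compute rather than the task-level partition id. A simplified sketch of that pattern, with the method body invented for illustration:

// Simplified sketch of the fix pattern (hypothetical names, not the
// connector's exact source): Partition.index is stable per RDD split,
// whereas the TaskContext partition id proved bogus here.
import org.apache.spark.{Partition, TaskContext}

def compute(split: Partition, context: TaskContext): Int = {
  // val streamIdx = context.partitionId() // buggy: task-level id
  val streamIdx = split.index              // fixed: this split's own index
  // ... open the datastream connection for streamIdx ...
  streamIdx
}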