
QA - SPARK CONNECTOR - parallel unloader: hangs doing UNION ALL statement #38

Closed
Pyrobal opened this issue May 20, 2016 · 1 comment
Pyrobal commented May 20, 2016

Tested on 4.2.3H with the latest patch and the daily Spark 0.2 jar.

This unload query hangs:

select distinct col_int, col_char20 from vwload_reg02_unload_tbl UNION ALL select col_int, col_char20 from vwload_reg02_unload_tbl2\g

ENV:

export SPARK_MASTER=yarn
export HDFS_TMP=$(iigetenv II_HDFSDATA)
export SPARK_LOADER_JAR=/home/actian/kelch01/Spark/unloader/spark_vector_loader-assembly-0.2.jar
export SEPPARAMDB=vwtestdb0

export TMP_II_HDFSDATA=$(ingprenv II_HDFSDATA)
export TMP_II_INSTALLATION=$(ingprenv II_INSTALLATION)
export TMP_HOSTNAME=$HOSTNAME

export ING_TST=/home/actian/kelch01/svn_main_checkout/

export II_USER=actian
export II_PASSWORD=actian

LOAD:

create table vwload_reg02_unload_tbl (col_int int, col_float4 float4, col_money money,col_decimal382 decimal(38,2),col_decimal102 decimal(10,2), col_char20 char(20), col_varchar20 varchar(20), col_nchar20 nchar(20),col_nvarchar nvarchar(20), col_ansidate ansidate, col_timestamp timestamp);\g

Use the attached data file:
vwload --nullvalue NULL --fdelim \t --table vwload_reg02_unload_tbl $SEPPARAMDB vwload_reg02_data.txt
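
The data file itself is attached to the issue and not reproduced here; for orientation, a row would be tab-delimited with NULL as the null marker, along the lines of this purely hypothetical example matching the 11 columns:

1	1.25	100.00	123456.78	99.99	charval	varcharval	ncharval	nvarcharval	2016-05-20	2016-05-20 12:00:00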

create table vwload_reg02_unload_tbl2 as select * from vwload_reg02_unload_tbl\g

UNLOAD:

hadoop fs -rm -r -skipTrash $HDFS_TMP/*.csv
spark-shell --master $SPARK_MASTER --driver-java-options '-Duser.timezone=GMT' --conf "spark.executor.memory=10G" --conf "spark.driver.memory=10G" --jars $SPARK_LOADER_JAR

val savepath = sys.env("HDFS_TMP")
val installation_ID = sys.env("TMP_II_INSTALLATION")
val hostname = sys.env("TMP_HOSTNAME")
val databasename = sys.env("SEPPARAMDB")
val user = sys.env.get("II_USER")
val password = sys.env.get("II_PASSWORD")

val usernameAndPasswordStatement = (user, password) match {
  case (Some(u), Some(p)) => s""", user "$u", password "$p""""
  case _ => ""
}

sqlContext.sql(s"""CREATE TEMPORARY TABLE vwload_reg02_unload_tbl
USING com.actian.spark_vector.sql.DefaultSource
OPTIONS (
host "$hostname",
instance "$installation_ID",
database "$databasename",
table "vwload_reg02_unload_tbl"
$usernameAndPasswordStatement)""")

sqlContext.sql(s"""CREATE TEMPORARY TABLE vwload_reg02_unload_tbl2
USING com.actian.spark_vector.sql.DefaultSource
OPTIONS (
host "$hostname",
instance "$installation_ID",
database "$databasename",
table "vwload_reg02_unload_tbl2"
$usernameAndPasswordStatement)""")

sqlContext.sql("select distinct col_int, col_char20 from vwload_reg02_unload_tbl UNION ALL select col_int, col_char20 from vwload_reg02_unload_tbl2").write.format("com.databricks.spark.csv").save(s"$savepath/vwload_reg02_unload_tbl_12.csv")

The statement hangs at this point.
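
One hypothetical way to narrow this down (untested against this build) would be to avoid the single UNION ALL statement and union the two reads on the Spark side instead; the column names and save path below just mirror the repro above:

// Hypothetical workaround sketch (untested here): read each side into its
// own DataFrame and union them in Spark rather than in one SQL statement
// handed to the connector. unionAll keeps duplicates, matching UNION ALL.
val lhs = sqlContext.sql("select distinct col_int, col_char20 from vwload_reg02_unload_tbl")
val rhs = sqlContext.sql("select col_int, col_char20 from vwload_reg02_unload_tbl2")
lhs.unionAll(rhs)
  .write.format("com.databricks.spark.csv")
  .save(s"$savepath/vwload_reg02_unload_tbl_12_b.csv")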

cbarca (Contributor) commented May 22, 2016

After fixing issue #39, the hang turns into the following executor exception:

16/05/22 22:50:41 INFO Executor: Running task 206.0 in stage 1.0 (TID 226)
16/05/22 22:50:41 INFO Executor: Finished task 184.0 in stage 1.0 (TID 204). 1424 bytes result sent to driver
16/05/22 22:50:41 INFO DataStreamConnector: PartitionId => 206
16/05/22 22:50:41 INFO CoarseGrainedExecutorBackend: Got assigned task 227
16/05/22 22:50:41 INFO Executor: Running task 207.0 in stage 1.0 (TID 227)
16/05/22 22:50:41 ERROR Executor: Exception in task 206.0 in stage 1.0 (TID 226)
java.lang.IndexOutOfBoundsException: 206
at scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:137)
at scala.collection.immutable.Vector.apply(Vector.scala:127)
at com.actian.spark_vector.datastream.DataStreamConnector.openSocketChannel(DataStreamConnector.scala:91)
at com.actian.spark_vector.datastream.DataStreamConnector.newConnection(DataStreamConnector.scala:109)
at com.actian.spark_vector.datastream.reader.DataStreamReader.read(DataStreamReader.scala:47)
at com.actian.spark_vector.vector.Vector$$anonfun$unloadVector$1$$anonfun$7.apply(Vector.scala:124)
at com.actian.spark_vector.vector.Vector$$anonfun$unloadVector$1$$anonfun$7.apply(Vector.scala:124)
at com.actian.spark_vector.datastream.reader.ScanRDD.compute(ScanRDD.scala:44)
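
The exception comes from indexing a per-stream connection Vector with an id larger than the number of datastreams. A minimal standalone illustration of that failure mode (not the connector's actual code; names are invented):

// Minimal sketch: indexing a Vector of connection slots with an
// out-of-range id raises the same IndexOutOfBoundsException as the trace.
val streams = Vector("node1:9001", "node2:9002") // suppose only 2 datastreams
val badIdx = 206                                 // id taken from the task, not the split
streams(badIdx)                                  // java.lang.IndexOutOfBoundsException: 206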

@cbarca cbarca added the bug label May 22, 2016
@cbarca cbarca self-assigned this May 22, 2016
cbarca added a commit that referenced this issue May 24, 2016
Fixing the partition id which is passed further to DataStreamReader; the partitionId from taskContext was bogus, the Partition.index should've been used instead. Fixes #38
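
In RDD terms, the fix described in the commit message amounts to using the split's own index inside compute rather than the task-level partition id. A simplified sketch of that pattern, with the method body invented for illustration:

// Simplified sketch of the fix pattern (hypothetical names, not the
// connector's exact source): Partition.index is stable per RDD split,
// whereas the TaskContext partition id proved bogus here.
import org.apache.spark.{Partition, TaskContext}

def compute(split: Partition, context: TaskContext): Int = {
  // val streamIdx = context.partitionId() // buggy: task-level id
  val streamIdx = split.index              // fixed: this split's own index
  // ... open the datastream connection for streamIdx ...
  streamIdx
}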