Spark SQL can't INSERT into an ES table #330

Closed
experiences opened this issue Nov 24, 2014 · 3 comments

@experiences

spark version: 1.1.0
hive version: 0.13
elasticsearch-hadoop version: 2.0.2

Hive works fine, but in Spark SQL:

> show create table src_es;
+----------------------------------------------------------------+
|                             result                             |
+----------------------------------------------------------------+
| CREATE EXTERNAL TABLE src_es(                                  |
|   key bigint COMMENT 'from deserializer',                      |
|   value string COMMENT 'from deserializer')                    |
| ROW FORMAT SERDE                                               |
|   'org.elasticsearch.hadoop.hive.EsSerDe'                      |
| STORED BY                                                      |
|   'org.elasticsearch.hadoop.hive.EsStorageHandler'             |
| WITH SERDEPROPERTIES (                                         |
|   'serialization.format'='1')                                  |
| LOCATION                                                       |
|   'hdfs://spark01:8020/user/hive/warehouse/src_es'            |
| TBLPROPERTIES (                                                |
|   'numFiles'='0',                                              |
|   'transient_lastDdlTime'='1414665191',                        |
|   'COLUMN_STATS_ACCURATE'='false',                             |
|   'totalSize'='0',                                             |
|   'numRows'='-1',                                              |
|   'rawDataSize'='-1',                                          |
|   'es.mapping.names'='key:key , value:value',                 |
|   'es.resource'='test/src')                                    |
+----------------------------------------------------------------+

> INSERT OVERWRITE TABLE src_es SELECT * FROM src;


=============================================================
log:

14/11/24 17:43:36 INFO ParseDriver: Parsing command: INSERT OVERWRITE TABLE src_es SELECT * FROM src
14/11/24 17:43:36 INFO ParseDriver: Parse Completed
14/11/24 17:43:36 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/11/24 17:43:36 INFO MemoryStore: ensureFreeSpace(442462) called with curMem=5584, maxMem=278302556
14/11/24 17:43:36 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 432.1 KB, free 265.0 MB)
14/11/24 17:43:36 INFO MemoryStore: ensureFreeSpace(34035) called with curMem=448046, maxMem=278302556
14/11/24 17:43:36 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 33.2 KB, free 265.0 MB)
14/11/24 17:43:36 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on spark01:58534 (size: 33.2 KB, free: 265.4 MB)
14/11/24 17:43:36 INFO BlockManagerMaster: Updated info of block broadcast_5_piece0
14/11/24 17:43:36 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/24 17:43:36 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/24 17:43:36 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/24 17:43:36 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/24 17:43:36 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/24 17:43:36 INFO FileInputFormat: Total input paths to process : 3
14/11/24 17:43:37 INFO BlockManager: Removing broadcast 4
14/11/24 17:43:37 INFO BlockManager: Removing block broadcast_4
14/11/24 17:43:37 INFO MemoryStore: Block broadcast_4 of size 1680 dropped from memory (free 277822155)
14/11/24 17:43:37 INFO BlockManager: Removing block broadcast_4_piece0
14/11/24 17:43:37 INFO MemoryStore: Block broadcast_4_piece0 of size 1111 dropped from memory (free 277823266)
14/11/24 17:43:37 INFO BlockManagerInfo: Removed broadcast_4_piece0 on spark01:58534 in memory (size: 1111.0 B, free: 265.4 MB)
14/11/24 17:43:37 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
14/11/24 17:43:37 INFO BlockManagerInfo: Removed broadcast_4_piece0 on spark01:43875 in memory (size: 1111.0 B, free: 530.3 MB)
14/11/24 17:43:37 INFO ContextCleaner: Cleaned broadcast 4
14/11/24 17:43:37 INFO SparkContext: Starting job: runJob at InsertIntoHiveTable.scala:158
14/11/24 17:43:37 INFO BlockManager: Removing broadcast 3
14/11/24 17:43:37 INFO BlockManager: Removing block broadcast_3_piece0
14/11/24 17:43:37 INFO MemoryStore: Block broadcast_3_piece0 of size 1113 dropped from memory (free 277824379)
14/11/24 17:43:37 INFO DAGScheduler: Got job 5 (runJob at InsertIntoHiveTable.scala:158) with 3 output partitions (allowLocal=false)
14/11/24 17:43:37 INFO DAGScheduler: Final stage: Stage 5(runJob at InsertIntoHiveTable.scala:158)
14/11/24 17:43:37 INFO DAGScheduler: Parents of final stage: List()
14/11/24 17:43:37 INFO BlockManagerInfo: Removed broadcast_3_piece0 on spark01:58534 in memory (size: 1113.0 B, free: 265.4 MB)
14/11/24 17:43:37 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
14/11/24 17:43:37 INFO BlockManager: Removing block broadcast_3
14/11/24 17:43:37 INFO MemoryStore: Block broadcast_3 of size 1680 dropped from memory (free 277826059)
14/11/24 17:43:37 INFO DAGScheduler: Missing parents: List()
14/11/24 17:43:37 INFO BlockManagerInfo: Removed broadcast_3_piece0 on spark01:52672 in memory (size: 1113.0 B, free: 530.3 MB)
14/11/24 17:43:37 INFO DAGScheduler: Submitting Stage 5 (MapPartitionsRDD[19] at mapPartitions at InsertIntoHiveTable.scala:181), which has no missing parents
14/11/24 17:43:37 INFO ContextCleaner: Cleaned broadcast 3
14/11/24 17:43:37 INFO MemoryStore: ensureFreeSpace(122344) called with curMem=476497, maxMem=278302556
14/11/24 17:43:37 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 119.5 KB, free 264.8 MB)
14/11/24 17:43:37 INFO MemoryStore: ensureFreeSpace(41121) called with curMem=598841, maxMem=278302556
14/11/24 17:43:37 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 40.2 KB, free 264.8 MB)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on spark01:58534 (size: 40.2 KB, free: 265.3 MB)
14/11/24 17:43:37 INFO BlockManagerMaster: Updated info of block broadcast_6_piece0
14/11/24 17:43:37 INFO DAGScheduler: Submitting 3 missing tasks from Stage 5 (MapPartitionsRDD[19] at mapPartitions at InsertIntoHiveTable.scala:181)
14/11/24 17:43:37 INFO YarnClientClusterScheduler: Adding task set 5.0 with 3 tasks
14/11/24 17:43:37 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 5, spark01, NODE_LOCAL, 1218 bytes)
14/11/24 17:43:37 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 6, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:37 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID 7, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on spark01:43875 (size: 40.2 KB, free: 530.2 MB)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on spark01:52672 (size: 40.2 KB, free: 530.2 MB)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on spark01:48794 (size: 40.2 KB, free: 530.2 MB)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on spark01:43875 (size: 33.2 KB, free: 530.2 MB)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on spark01:52672 (size: 33.2 KB, free: 530.2 MB)
14/11/24 17:43:37 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on spark01:48794 (size: 33.2 KB, free: 530.2 MB)
14/11/24 17:43:38 WARN TaskSetManager: Lost task 2.0 in stage 5.0 (TID 7, spark01): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: a valid progressable is required to report status to Hadoop
        org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
        org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:44)
        org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:177)
        org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
        org.apache.spark.sql.hive.SparkHiveHadoopWriter.write(SparkHadoopWriter.scala:98)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:151)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
14/11/24 17:43:38 INFO TaskSetManager: Starting task 2.1 in stage 5.0 (TID 8, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 1.0 in stage 5.0 (TID 6) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 1]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 1.1 in stage 5.0 (TID 9, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 2.1 in stage 5.0 (TID 8) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 2]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 2.2 in stage 5.0 (TID 10, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 0.0 in stage 5.0 (TID 5) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 3]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 0.1 in stage 5.0 (TID 11, spark01, NODE_LOCAL, 1218 bytes)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 1.1 in stage 5.0 (TID 9) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 4]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 1.2 in stage 5.0 (TID 12, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:38 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on spark01:41092 (size: 40.2 KB, free: 530.2 MB)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 0.1 in stage 5.0 (TID 11) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 5]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 0.2 in stage 5.0 (TID 13, spark01, NODE_LOCAL, 1218 bytes)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 1.2 in stage 5.0 (TID 12) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 6]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 1.3 in stage 5.0 (TID 14, spark01, NODE_LOCAL, 1225 bytes)
14/11/24 17:43:38 INFO ConnectionManager: Accepted connection from [spark01/192.168.1.221:59747]
14/11/24 17:43:38 INFO SendingConnection: Initiating connection to [spark01/192.168.1.221:41092]
14/11/24 17:43:38 INFO SendingConnection: Connected to [spark01/192.168.1.221:41092], 1 messages pending
14/11/24 17:43:38 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on spark01:41092 (size: 33.2 KB, free: 530.2 MB)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 0.2 in stage 5.0 (TID 13) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 7]
14/11/24 17:43:38 INFO TaskSetManager: Starting task 0.3 in stage 5.0 (TID 15, spark01, NODE_LOCAL, 1218 bytes)
14/11/24 17:43:38 INFO TaskSetManager: Lost task 1.3 in stage 5.0 (TID 14) on executor spark01: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException (a valid progressable is required to report status to Hadoop) [duplicate 8]
14/11/24 17:43:38 ERROR TaskSetManager: Task 1 in stage 5.0 failed 4 times; aborting job
14/11/24 17:43:38 INFO YarnClientClusterScheduler: Cancelling stage 5
14/11/24 17:43:38 INFO YarnClientClusterScheduler: Stage 5 was cancelled
14/11/24 17:43:38 INFO StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@2c6450b4
14/11/24 17:43:38 INFO DAGScheduler: Failed to run runJob at InsertIntoHiveTable.scala:158
14/11/24 17:43:38 ERROR SparkSQLOperationManager: Error executing query:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 14, spark01): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: a valid progressable is required to report status to Hadoop
        org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
        org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:44)
        org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:177)
        org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
        org.apache.spark.sql.hive.SparkHiveHadoopWriter.write(SparkHadoopWriter.scala:98)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:151)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/11/24 17:43:38 INFO StatsReportListener: task runtime:(count: 9, mean: 583.222222, stdev: 722.121993, max: 1668.000000, min: 68.000000)
14/11/24 17:43:38 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
14/11/24 17:43:38 INFO StatsReportListener:     68.0 ms 68.0 ms 68.0 ms 73.0 ms 75.0 ms 1.5 s   1.7 s   1.7 s   1.7 s
14/11/24 17:43:38 INFO StatsReportListener: task result size:(count: 9, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
14/11/24 17:43:38 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
14/11/24 17:43:38 INFO StatsReportListener:     0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B
14/11/24 17:43:38 INFO StatsReportListener: executor (non-fetch) time pct: (count: 9, mean: 91.071601, stdev: 4.871944, max: 98.261391, min: 86.486486)
14/11/24 17:43:38 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
14/11/24 17:43:38 INFO StatsReportListener:     86 %    86 %    86 %    88 %    88 %    98 %    98 %    98 %    98 %
14/11/24 17:43:38 WARN ThriftCLIService: Error fetching results: 
org.apache.hive.service.cli.HiveSQLException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 14, spark01): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: a valid progressable is required to report status to Hadoop
        org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:40)
        org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:44)
        org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:177)
        org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:58)
        org.apache.spark.sql.hive.SparkHiveHadoopWriter.write(SparkHadoopWriter.scala:98)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:151)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
        org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$1.apply(InsertIntoHiveTable.scala:158)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:201)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)
    at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
    at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/11/24 17:43:38 INFO StatsReportListener: other time pct: (count: 9, mean: 8.928399, stdev: 4.871944, max: 13.513514, min: 1.738609)
14/11/24 17:43:38 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
14/11/24 17:43:38 INFO StatsReportListener:      2 %     2 %     2 %     2 %    12 %    12 %    14 %    14 %    14 %
14/11/24 17:43:38 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on spark01:41857 (size: 40.2 KB, free: 530.2 MB)
14/11/24 17:43:38 WARN TaskSetManager: Lost task 0.3 in stage 5.0 (TID 15, spark01): org.apache.spark.TaskKilledException: 
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:168)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
14/11/24 17:43:40 WARN TaskSetManager: Lost task 2.2 in stage 5.0 (TID 10, spark01): TaskKilled (killed intentionally)
14/11/24 17:43:40 INFO YarnClientClusterScheduler: Removed TaskSet 5.0, whose tasks have all completed, from pool 

costin commented Nov 27, 2014

@experiences I've added some minor formatting to your comment to make the stack trace readable.
Unfortunately, the bug you are encountering is caused by Spark's incomplete support for the MapReduce layer - in this case the Progressable is not initialized. I take it you are using Spark SQL on top of Hive.
Can you please try 2.1.0.Beta3, which officially supports Spark SQL natively (as well as Spark through the Java/Scala DSLs)? The Input/OutputFormat is still available, but you can also talk to Spark directly, which gives you a lot more richness and better performance.

I'll look into making the M/R formats depend even less on the MapReduce infrastructure, to improve compatibility with non-M/R frameworks.
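
Not from the thread: a minimal sketch of the native Spark SQL write path suggested above, assuming elasticsearch-hadoop 2.1.0.Beta3 on the classpath (which is expected to provide the `saveToEs` implicit via `org.elasticsearch.spark.sql`) and Spark 1.1. The `test/src` resource mirrors the `es.resource` setting from the table above; the app name and `es.nodes` value are placeholders.

```scala
// Sketch only, assuming elasticsearch-hadoop 2.1.0.Beta3 + Spark 1.1.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.elasticsearch.spark.sql._   // assumed to add saveToEs(...) to SchemaRDD

object EsInsertExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-insert")
      .set("es.nodes", "localhost:9200")   // placeholder Elasticsearch node

    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Read the Hive source table and write straight to Elasticsearch,
    // bypassing the Hive/MapReduce output path that requires a Progressable.
    val src = hiveContext.sql("SELECT key, value FROM src")
    src.saveToEs("test/src")

    sc.stop()
  }
}
```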

costin added a commit that referenced this issue Dec 10, 2014
In Hadoop-like envs, Progressable might be invalid causing exceptions
when going through the MR layer.

Fix #330
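
For illustration only (not the actual patch): a sketch of the kind of defensive fallback the commit message describes. When a non-MapReduce caller such as Spark SQL's Hive writer does not supply a Progressable, a no-op one could be substituted instead of failing the assertion seen in the stack trace above. The object and method names here are hypothetical.

```scala
// Hypothetical sketch of a Progressable fallback; not the elasticsearch-hadoop patch itself.
import org.apache.hadoop.util.Progressable

object ProgressableFallback {
  private val NoOp: Progressable = new Progressable {
    override def progress(): Unit = ()   // nothing to report outside MapReduce
  }

  /** Return the given Progressable, or a harmless no-op if the caller supplied none. */
  def orNoOp(progressable: Progressable): Progressable =
    if (progressable != null) progressable else NoOp
}
```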
costin closed this as completed in 694a19d Dec 10, 2014

costin commented Dec 10, 2014

@experiences Hey, I've pushed a fix for this on 2.0.x and master, including a snapshot build for 2.0.x. Can you please try it out (see the docs for more info)? Thanks!

@experiences
Author

I haven't tried it yet, but the Spark 1.2 release notes list this as a known issue:
java.io.FileNotFound exceptions when creating EXTERNAL Hive tables. Workaround: set hive.stats.autogather = false. SPARK-4892
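
A hedged sketch of applying that workaround from a HiveContext before creating or writing EXTERNAL tables; only the property name and value come from the release note quoted above, the rest of the session setup is assumed.

```scala
// Sketch of the SPARK-4892 workaround; session setup is assumed.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object AutogatherWorkaround {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("workaround"))
    val hiveContext = new HiveContext(sc)

    // Disable automatic stats gathering before touching EXTERNAL tables,
    // per the workaround in the Spark 1.2 release notes (SPARK-4892).
    hiveContext.sql("SET hive.stats.autogather=false")

    // ...then issue the CREATE EXTERNAL TABLE / INSERT statements as usual.

    sc.stop()
  }
}
```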
