elasticsearch-hive0.14 issue #333

Closed
MaheshSankaran opened this issue Nov 27, 2014 · 6 comments

@MaheshSankaran

Hi
I am currently working on a Hive 0.14 to Elasticsearch connection. I use "curl -XPUT" to send data to Elasticsearch. I created an Elasticsearch-backed Hive table pointing to the "es.resource", and I can read the data through it, but I can't insert into the Elasticsearch storage table. The same setup works well in Hive 0.13.1, where the inserted data shows up.
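For context, a table of the kind described above is normally declared with es-hadoop's Hive storage handler. The reporter's actual DDL was not posted, so this is only a sketch: it assumes the "test/list" resource that appears in the es-hadoop log later in this thread, and the column names "name" and "cnt" are hypothetical, chosen to match the string/bigint schema Hive reports for "select * from list_main".

```sql
-- Hypothetical DDL: column names are illustrative only; the types follow
-- the string/bigint schema Hive reports for "select * from list_main".
CREATE EXTERNAL TABLE list (
  name STRING,
  cnt  BIGINT
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'test/list');

-- The statement that fails on Hive 0.14 with es-hadoop 2.0.2.
INSERT OVERWRITE TABLE list SELECT * FROM list_main;
```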

hive> insert overwrite table list select * from list_main;
Query ID = hadoop2_20141127124040_dece5520-15a8-4d6d-ab4b-5a95f274f51c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1417061780604_0011, Tracking URL = http://nn01:8088/proxy/application_1417061780604_0011/
Kill Command = /opt/hadoop-2.4.1/bin/hadoop job -kill job_1417061780604_0011
Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
2014-11-27 12:40:49,470 Stage-0 map = 0%, reduce = 0%
Ended Job = job_1417061780604_0011 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

@costin
Member

costin commented Nov 27, 2014

@MaheshSankaran I haven't tested Hive 0.14 yet; I've been waiting for Hive 0.14.1 to come out (since the 0.14 POM contains SNAPSHOT dependencies).
I'll try it out and report back. Can you please enable logging [1] and post a gist with the output? Based on your log above, it looks like there's an error when inserting data, but the actual cause is hidden.

Cheers,
[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/2.1.Beta/troubleshooting.html

@MaheshSankaran
Author

Hi costin,
Here is the output of Hive 0.14 with logging enabled:

hive> insert overwrite table list select * from list_main;
14/11/28 10:14:46 [main]: INFO log.PerfLogger:
14/11/28 10:14:46 [main]: INFO log.PerfLogger:
14/11/28 10:14:46 [main]: INFO log.PerfLogger:
14/11/28 10:14:46 [main]: INFO log.PerfLogger:
14/11/28 10:14:46 [main]: INFO parse.ParseDriver: Parsing command: insert overwrite table list select * from list_main
14/11/28 10:14:46 [main]: INFO parse.ParseDriver: Parse Completed
14/11/28 10:14:46 [main]: INFO log.PerfLogger: </PERFLOG method=parse start=1417149886819 end=1417149886837 duration=18 from=org.apache.hadoop.hive.ql.Driver>
14/11/28 10:14:46 [main]: INFO log.PerfLogger:
14/11/28 10:14:46 [main]: INFO parse.SemanticAnalyzer: Starting Semantic Analysis
14/11/28 10:14:46 [main]: INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
14/11/28 10:14:46 [main]: INFO parse.SemanticAnalyzer: Get metadata for source tables
14/11/28 10:14:47 [main]: INFO parse.SemanticAnalyzer: Get metadata for subqueries
14/11/28 10:14:47 [main]: INFO parse.SemanticAnalyzer: Get metadata for destination tables
14/11/28 10:14:47 [main]: INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
14/11/28 10:14:47 [main]: INFO parse.SemanticAnalyzer: Set stats collection dir : hdfs://hacluster/tmp/hive/hadoop2/5b528199-5fc3-452b-9791-ee3dca864564/hive_2014-11-28_10-14-46_819_9102972342057114632-1/-ext-10000
14/11/28 10:14:47 [main]: INFO ppd.OpProcFactory: Processing for FS(3)
14/11/28 10:14:47 [main]: INFO ppd.OpProcFactory: Processing for SEL(2)
14/11/28 10:14:47 [main]: INFO ppd.OpProcFactory: Processing for SEL(1)
14/11/28 10:14:47 [main]: INFO ppd.OpProcFactory: Processing for TS(0)
14/11/28 10:14:47 [main]: INFO log.PerfLogger:
14/11/28 10:14:47 [main]: INFO log.PerfLogger: </PERFLOG method=partition-retrieving start=1417149887884 end=1417149887888 duration=4 from=org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner>
14/11/28 10:14:47 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
14/11/28 10:14:47 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
14/11/28 10:14:47 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
14/11/28 10:14:47 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
14/11/28 10:14:47 [main]: INFO physical.NullScanTaskDispatcher: Looking for table scans where optimization is applicable
14/11/28 10:14:47 [main]: INFO physical.NullScanTaskDispatcher: Found 0 null table scans
14/11/28 10:14:47 [main]: INFO parse.SemanticAnalyzer: Completed plan generation
14/11/28 10:14:47 [main]: INFO ql.Driver: Semantic Analysis Completed
14/11/28 10:14:47 [main]: INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1417149886839 end=1417149887928 duration=1089 from=org.apache.hadoop.hive.ql.Driver>
14/11/28 10:14:47 [main]: INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:string, comment:null), FieldSchema(name:_col1, type:bigint, comment:null)], properties:null)
14/11/28 10:14:47 [main]: INFO log.PerfLogger: </PERFLOG method=compile start=1417149886818 end=1417149887929 duration=1111 from=org.apache.hadoop.hive.ql.Driver>
14/11/28 10:14:47 [main]: INFO log.PerfLogger:
14/11/28 10:14:48 [main]: INFO log.PerfLogger: </PERFLOG method=acquireReadWriteLocks start=1417149887929 end=1417149888843 duration=914 from=org.apache.hadoop.hive.ql.Driver>
14/11/28 10:14:48 [main]: INFO log.PerfLogger:
14/11/28 10:14:48 [main]: INFO ql.Driver: Starting command: insert overwrite table list select * from list_main
Query ID = hadoop2_20141128101414_35ea86f6-923d-41e8-a59a-0a6ce804c2be
14/11/28 10:14:48 [main]: INFO ql.Driver: Query ID = hadoop2_20141128101414_35ea86f6-923d-41e8-a59a-0a6ce804c2be
Total jobs = 1
14/11/28 10:14:48 [main]: INFO ql.Driver: Total jobs = 1
14/11/28 10:14:48 [main]: INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1417149886818 end=1417149888844 duration=2026 from=org.apache.hadoop.hive.ql.Driver>
14/11/28 10:14:48 [main]: INFO log.PerfLogger:
14/11/28 10:14:48 [main]: INFO log.PerfLogger:
Launching Job 1 out of 1
14/11/28 10:14:48 [main]: INFO ql.Driver: Launching Job 1 out of 1
14/11/28 10:14:48 [main]: INFO ql.Driver: Starting task [Stage-0:MAPRED] in parallel
14/11/28 10:14:48 [Thread-15]: INFO hive.metastore: Trying to connect to metastore with URI thrift://nn01:7099
14/11/28 10:14:48 [Thread-15]: INFO hive.metastore: Connected to metastore.
14/11/28 10:14:48 [Thread-15]: INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
Number of reduce tasks is set to 0 since there's no reduce operator
14/11/28 10:14:48 [Thread-15]: INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
14/11/28 10:14:49 [Thread-15]: INFO ql.Context: New scratch dir is hdfs://hacluster/tmp/hive/hadoop2/5b528199-5fc3-452b-9791-ee3dca864564/hive_2014-11-28_10-14-46_819_9102972342057114632-2
14/11/28 10:14:49 [Thread-15]: INFO mr.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
14/11/28 10:14:49 [Thread-15]: INFO exec.Utilities: Processing alias list_main
14/11/28 10:14:49 [Thread-15]: INFO exec.Utilities: Adding input file hdfs://hacluster/user/hive/warehouse/elasticsearch.db/list_main
14/11/28 10:14:49 [Thread-15]: INFO exec.Utilities: Content Summary not cached for hdfs://hacluster/user/hive/warehouse/elasticsearch.db/list_main
14/11/28 10:14:49 [Thread-15]: INFO ql.Context: New scratch dir is hdfs://hacluster/tmp/hive/hadoop2/5b528199-5fc3-452b-9791-ee3dca864564/hive_2014-11-28_10-14-46_819_9102972342057114632-2
14/11/28 10:14:49 [Thread-15]: INFO log.PerfLogger:
14/11/28 10:14:49 [Thread-15]: INFO exec.Utilities: Serializing MapWork via kryo
14/11/28 10:14:50 [Thread-15]: INFO log.PerfLogger: </PERFLOG method=serializePlan start=1417149889332 end=1417149890275 duration=943 from=org.apache.hadoop.hive.ql.exec.Utilities>
14/11/28 10:14:50 [Thread-15]: INFO Configuration.deprecation: mapred.submit.replication is deprecated. Instead, use mapreduce.client.submit.file.replication
14/11/28 10:14:50 [Thread-15]: INFO client.RMProxy: Connecting to ResourceManager at nn01/10.10.10.25:8032
14/11/28 10:14:50 [Thread-15]: INFO client.RMProxy: Connecting to ResourceManager at nn01/10.10.10.25:8032
14/11/28 10:14:50 [Thread-15]: INFO exec.Utilities: No plan file found: hdfs://hacluster/tmp/hive/hadoop2/5b528199-5fc3-452b-9791-ee3dca864564/hive_2014-11-28_10-14-46_819_9102972342057114632-2/-mr-10002/8d483f90-3b1d-4f20-9c65-8b88ec7d757b/reduce.xml
14/11/28 10:14:50 [Thread-15]: WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption
14/11/28 10:14:50 [Thread-15]: INFO util.Version: Elasticsearch Hadoop v2.0.2 [ca81ff6]
14/11/28 10:14:50 [Thread-15]: INFO mr.EsOutputFormat: Writing to [test/list]
14/11/28 10:14:51 [Thread-15]: WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/11/28 10:14:53 [Thread-15]: INFO log.PerfLogger:
14/11/28 10:14:53 [Thread-15]: INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://hacluster/user/hive/warehouse/elasticsearch.db/list_main; using filter path hdfs://hacluster/user/hive/warehouse/elasticsearch.db/list_main
14/11/28 10:14:53 [Thread-15]: INFO input.FileInputFormat: Total input paths to process : 2
14/11/28 10:14:53 [Thread-15]: INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 3, size left: 0
14/11/28 10:14:53 [Thread-15]: INFO io.CombineHiveInputFormat: number of splits 1
14/11/28 10:14:53 [Thread-15]: INFO log.PerfLogger: </PERFLOG method=getSplits start=1417149893292 end=1417149893468 duration=176 from=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat>
14/11/28 10:14:53 [Thread-15]: INFO io.CombineHiveInputFormat: Number of all splits 1
14/11/28 10:14:53 [Thread-15]: INFO mapreduce.JobSubmitter: number of splits:1
14/11/28 10:14:54 [Thread-15]: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1417149067638_0001
14/11/28 10:14:54 [Thread-15]: INFO impl.YarnClientImpl: Submitted application application_1417149067638_0001
14/11/28 10:14:54 [Thread-15]: INFO mapreduce.Job: The url to track the job: http://nn01:8088/proxy/application_1417149067638_0001/
Starting Job = job_1417149067638_0001, Tracking URL = http://nn01:8088/proxy/application_1417149067638_0001/
14/11/28 10:14:54 [Thread-15]: INFO exec.Task: Starting Job = job_1417149067638_0001, Tracking URL = http://nn01:8088/proxy/application_1417149067638_0001/
Kill Command = /opt/hadoop-2.4.1/bin/hadoop job -kill job_1417149067638_0001
14/11/28 10:14:54 [Thread-15]: INFO exec.Task: Kill Command = /opt/hadoop-2.4.1/bin/hadoop job -kill job_1417149067638_0001
Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
14/11/28 10:15:04 [Thread-15]: INFO exec.Task: Hadoop job information for Stage-0: number of mappers: 0; number of reducers: 0
14/11/28 10:15:04 [Thread-15]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2014-11-28 10:15:04,953 Stage-0 map = 0%, reduce = 0%
14/11/28 10:15:04 [Thread-15]: INFO exec.Task: 2014-11-28 10:15:04,953 Stage-0 map = 0%, reduce = 0%
14/11/28 10:15:04 [Thread-15]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
Ended Job = job_1417149067638_0001 with errors
14/11/28 10:15:05 [Thread-15]: ERROR exec.Task: Ended Job = job_1417149067638_0001 with errors
Error during job, obtaining debugging information...
14/11/28 10:15:05 [Thread-29]: ERROR exec.Task: Error during job, obtaining debugging information...
14/11/28 10:15:05 [Thread-29]: INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
14/11/28 10:15:05 [Thread-15]: INFO impl.YarnClientImpl: Killed application application_1417149067638_0001
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
14/11/28 10:15:06 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
14/11/28 10:15:06 [main]: INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1417149888843 end=1417149906894 duration=18051 from=org.apache.hadoop.hive.ql.Driver>
MapReduce Jobs Launched:
14/11/28 10:15:06 [main]: INFO ql.Driver: MapReduce Jobs Launched:
14/11/28 10:15:06 [main]: WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
Stage-Stage-0: HDFS Read: 0 HDFS Write: 0 FAIL
14/11/28 10:15:06 [main]: INFO ql.Driver: Stage-Stage-0: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
14/11/28 10:15:06 [main]: INFO ql.Driver: Total MapReduce CPU Time Spent: 0 msec
14/11/28 10:15:06 [main]: INFO log.PerfLogger:
14/11/28 10:15:07 [main]: INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1417149906902 end=1417149907486 duration=584 from=org.apache.hadoop.hive.ql.Driver>
14/11/28 10:15:07 [main]: INFO log.PerfLogger:
14/11/28 10:15:07 [main]: INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1417149907554 end=1417149907554 duration=0 from=org.apache.hadoop.hive.ql.Driver>
hive>

Thanks
Mahesh.S
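The log above also contains an es-hadoop warning: "Speculative execution enabled for reducer - consider disabling it to prevent data corruption". It does not appear to be the cause of this failure (the job reports 0 mappers and never gets to write anything), but if you want to follow that advice, a minimal sketch of disabling speculative execution for the current Hive session, assuming Hadoop 2 (YARN) property names, is:

```sql
-- Disable speculative execution for this session only (Hadoop 2 property names).
-- This is a map-only job, so the map-side setting is the one that matters here.
SET mapreduce.map.speculative=false;
SET mapreduce.reduce.speculative=false;

INSERT OVERWRITE TABLE list SELECT * FROM list_main;
```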

@costin
Member

costin commented Dec 4, 2014

@MaheshSankaran I recommend using the just-pushed 2.1.0.BUILD-SNAPSHOT nightly build (see the docs). I just upgraded es-hadoop to Hive 0.14, which internally had some breaking changes and, again, broke backwards compatibility; I suspect these were the cause of your issue. Master addresses them, so please try the latest snapshot and report back.

Thanks,
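From the Hive CLI, switching to the snapshot amounts to registering the new jar before rerunning the insert. A sketch only: the jar path below is a placeholder, and this assumes the old 2.0.2 jar is no longer on Hive's classpath (for example via hive.aux.jars.path or an earlier ADD JAR in the same session).

```sql
-- Placeholder path: point this at the downloaded 2.1.0.BUILD-SNAPSHOT jar.
ADD JAR /path/to/elasticsearch-hadoop-2.1.0.BUILD-SNAPSHOT.jar;

-- Rerun the statement that failed with es-hadoop 2.0.2.
INSERT OVERWRITE TABLE list SELECT * FROM list_main;
```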

@costin costin added the feature label Dec 4, 2014
@costin
Member

costin commented Dec 11, 2014

@MaheshSankaran, any update on this front? I'm afraid I cannot reproduce the issue on my side.

@justinjoseph89

@costin I was facing the same issue, and once I switched the es-hadoop jar to 2.1.0.BUILD-SNAPSHOT it worked perfectly.
Thanks,
Justin Joseph

@costin
Member

costin commented Jan 15, 2015

@justinjoseph89 Thanks for the info! Closing the issue.

@costin costin closed this as completed Jan 15, 2015