MapJoin failed, Configuration and input path are inconsistent #169

shenguoquan · 2014-03-19T09:54:12Z

Recently I came across a strange problem. I want to use elasticsearch-1.0.0 as backend storage for Hive, and I use elasticsearch-hadoop-1.3.0.M2 to create Hive tables on Elasticsearch. The Hive SQL statements are as follows:

create external table supplier_es (S_SUPPKEY BIGINT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY BIGINT, S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING) stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='q9/supplier','es.index.auto.create'='true','es.nodes' = 'localhost:9200');

create external table nation_es (N_NATIONKEY BIGINT, N_NAME STRING, N_REGIONKEY BIGINT, N_COMMENT STRING) stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='q9/nation','es.index.auto.create'='true','es.nodes' = 'localhost:9200');

The table join operation is as follows:

select s_suppkey, n_name from supplier_es s join nation_es n on n.n_nationkey = s.s_nationkey;

The error messages (taken from the log file):
2014-03-19 15:16:39,447 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2014-03-19 15:16:39,448 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: fpath:hdfs://server-220:8020/user/hive/warehouse/nation_es
2014-03-19 15:16:39,462 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: getPathToAliases
2014-03-19 15:16:39,463 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias s to work list for file hdfs://server-220:8020/user/hive/warehouse/supplier_es
2014-03-19 15:16:39,465 ERROR [main] org.apache.hadoop.hive.ql.exec.MapOperator: Configuration does not have any alias for path: hdfs://server-220:8020/user/hive/warehouse/nation_es
2014-03-19 15:16:39,480 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper_aroundBody2(MapTask.java:434)
at org.apache.hadoop.mapred.MapTask$AjcClosure3.run(MapTask.java:1)
at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 19 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 24 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 27 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:142)
... 32 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:419)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:110)
... 32 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:413)
... 33 more
I have tried to figure out the problem, but I can't find the cause. Any help would be appreciated. Thanks very much.


costin · 2014-03-19T12:50:47Z

Adding my reply from the mailing list
Hi, The issue might be caused by the fact that M2 doesn't support different input and output indices for the same job; that is to use ES both as input and output within the same job (which is essentially what you are doing with the select). This has been fixed in master - can you try the latest nightly build or potentially build master yourself?

shenguoquan · 2014-03-19T13:40:09Z

Hi Costin,
Thank you very much for your reply. I will try the version of elasticsearch-hadoop you mentioned. By the way, you said my problem is using Elasticsearch as both input and output, but I only use it as input storage, not output. Please correct me if I am wrong.

costin · 2014-03-19T13:55:03Z

You are not using it directly, but to perform the join Hive might create some jobs that look at both tables. I'm not certain that's the case; however, using the latest master should give the answer.
By the way, what distro and version of Hadoop and Hive are you using? I'm assuming the Intel one, but I'm interested in the versions of all the aforementioned components.

Thanks,

shenguoquan · 2014-03-20T02:45:03Z

I use Hadoop version 2.2.0 and Hive version 0.12.0. By the way, I tried the latest master version, and the problem happened again.

costin · 2014-03-21T14:34:34Z

Hey,

I've wasted several hours today trying to reproduce this but the VMs I had gave me just grief. I'll try it again over the weekend.

Lazy initialize settings in a Hive environment. Separate table properties per input/output to prevent clashing. Save input properties (as Hive doesn't pass them in). Relates to #169

costin · 2014-03-21T21:20:49Z

@guoquans I think I've fixed the issue in master - can you please give the master a try and let me know whether it works for you?
Thanks!

Rather than saving the table properties into our own properties, use the job properties which seem to be per table. The old logic is still in place just in case. Relates to #169

shenguoquan · 2014-03-23T05:21:50Z

@costin, I have fixed the issue. Can I contribute my code to you?

costin · 2014-03-23T08:14:53Z

@guoquans Sure - see the contributing link - most important piece is signing the CLA.

Does that mean the fix I pushed in master yesterday did not fix your issue?

shenguoquan · 2014-03-24T06:29:37Z

@costin I'm sorry for the late response to your commit. Yes, I think we both know why the MapJoin failed. The original code fails when joining two tables that are both stored in Elasticsearch. I debugged the code and found that the problem is mixed-up configuration. Take the es.resource.read setting as an example: the EsStorageHandler for one table uses the job configuration to set es.resource.read='xxx', and the EsStorageHandler for the other table uses the same job configuration to set es.resource.read with its own value. Because the job configuration is global, the later setting overwrites the earlier one. @costin, I think you fixed the problem. But I think if you add the HiveValueWriter,HiveBytesConverter and HiveValueReader setting is much better. I fixed the issue with my idea and test through the complicate case. I'm so appreciate if you can look at my code and test case. Thank you very much.
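
The overwrite described above can be modeled with plain Java maps. This is only an illustrative sketch, not the elasticsearch-hadoop code: the class name SharedConfClash, its methods, and the use of HashMap as a stand-in for the Hadoop job configuration and the per-table job properties are all assumptions. Only the es.resource.read key and the q9/supplier and q9/nation resources come from this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedConfClash {

    // Hypothetical stand-in for the single job-wide configuration
    // that every table in the query writes into.
    static Map<String, String> configureViaSharedConf() {
        Map<String, String> jobConf = new HashMap<>();
        // Each table's storage handler stores its resource under the same key.
        jobConf.put("es.resource.read", "q9/supplier");
        jobConf.put("es.resource.read", "q9/nation"); // overwrites the supplier entry
        return jobConf;
    }

    // Hypothetical stand-in for per-table job properties:
    // one separate property map per table alias.
    static Map<String, Map<String, String>> configureViaJobProperties() {
        Map<String, Map<String, String>> perTable = new HashMap<>();
        perTable.put("s", tableProps("q9/supplier"));
        perTable.put("n", tableProps("q9/nation"));
        return perTable;
    }

    // Builds the property map for a single table.
    static Map<String, String> tableProps(String resource) {
        Map<String, String> props = new HashMap<>();
        props.put("es.resource.read", resource);
        return props;
    }

    public static void main(String[] args) {
        // Shared configuration: only the last writer survives.
        System.out.println(configureViaSharedConf().get("es.resource.read"));

        // Per-table properties: each alias keeps its own resource.
        Map<String, Map<String, String>> perTable = configureViaJobProperties();
        System.out.println(perTable.get("s").get("es.resource.read"));
        System.out.println(perTable.get("n").get("es.resource.read"));
    }
}
```

With the shared map, the supplier table would end up reading the nation resource, which is the kind of mix-up the comment describes; keeping one property map per table avoids the clash.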

costin · 2014-03-24T09:54:20Z

@guoquans I'm not sure I understand what you are saying:

But I think if you add the HiveValueWriter,HiveBytesConverter and HiveValueReader setting is much better.

Not sure what you mean by this? The settings are currently added - is that a problem or not?

I fixed the issue with my idea and test through the complicate case. I'm so appreciate if you can look at my code and test case.

For various reasons, the best way to move forward is to look at the [contributing](https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/CONTRIBUTING.md) guide, which in short means that after you sign the CLA, you can post the code either as a gist or a pull request. I can look at it and we can see where to take it from there: potentially change my fix, integrate some of yours, etc.

That being said, have you tried the fix in master? Can you confirm whether it works or not?

Thanks!

shenguoquan · 2014-03-24T10:38:43Z

Hi @costin. I'm sorry that my English is poor, so sometimes I can't express precisely what I think.
--> But I think if you add the HiveValueWriter,HiveBytesConverter and HiveValueReader setting is much better.
By this I meant that I guessed you had forgotten to set the valueReader and valueWriter settings for EsStorageHandler, because I couldn't see that code in the master branch.

costin · 2014-03-24T10:56:03Z

No need to be sorry :)
The Writer/Converter/Reader are now set in a lazy manner - to postpone clashes - inside the Serialized - see the full commit.

Hence me asking whether you managed to try the master or not against your example; that's the ultimate test that everything runs as expected.

costin · 2014-03-24T11:50:52Z

@guoquans Hi - just noticed your pull request ( #173 ). For some reason Github didn't notify me and I didn't check for it until some minutes ago - sorry for the confusion.
It looks like you went for a similar approach to mine - that is, using jobProperties to propagate the settings as opposed to using the configuration directly.

Cheers,

shenguoquan · 2014-03-24T11:56:42Z

@costin Hi, that's right. Using the job properties instead of the job configuration fixes the issue. I have already tested it.

shenguoquan · 2014-03-24T11:57:30Z

@costin Thank you very much for your patient answers.

costin · 2014-03-24T12:04:10Z

thank you for reporting and testing out the fix! I'll close the issue since it seems to be resolved.

Cheers!


costin added bug labels Mar 19, 2014

costin closed this as completed Mar 24, 2014

costin added a commit that referenced this issue Apr 8, 2014

Always copy table properties to job properties

d6221a0

Rather than saving the table properties into our own properties, use the job properties which seem to be per table. The old logic is still in place just in case. Relates to #169
