Unable to run pig job #68

Closed

mohitanchlia opened this issue Aug 14, 2013 · 13 comments

@mohitanchlia

I downloaded the latest build from the repo and ran a simple job like this:

grunt> A = LOAD 'twitter/tweet/_search?q=kim' USING org.elasticsearch.hadoop.pig.ESStorage();
grunt> DUMP A;

And got this error:

Pig Stack Trace

ERROR 2117: Unexpected error when launching map reduce job.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
at org.apache.pig.PigServer.openIterator(PigServer.java:838)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias A
at org.apache.pig.PigServer.storeEx(PigServer.java:937)
at org.apache.pig.PigServer.store(PigServer.java:900)
at org.apache.pig.PigServer.openIterator(PigServer.java:813)
... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:352)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
at org.apache.pig.PigServer.storeEx(PigServer.java:933)
... 14 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESInputFormat.getSplits(ESInputFormat.java:335)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:319)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:239)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:270)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:160)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:676)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)

Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESInputFormat.getSplits(ESInputFormat.java:335)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:319)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:239)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:270)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:160)
at java.lang.Thread.run(Thread.java:662)

at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)

@costin
Member

costin commented Aug 14, 2013

Looks like you're running YARN/Hadoop 2.0. We don't provide any official artifacts for it yet, but you can download the sources, compile them against the YARN jars, and you should be good to go.

@mohitanchlia
Author

Thanks! Is it just a matter of changing the pom files, or will I have to change the interfaces as well?

@costin
Member

costin commented Aug 14, 2013

Just changing the artifact is enough for compiling (note there's no pom since we're using Gradle, but it should be straightforward to make the change - we have a cdh branch for this as well).
There's no need to change any interfaces - the breaking change is in Hadoop, not in the clients (JobContext became an interface in Hadoop 2, hence the IncompatibleClassChangeError), and a simple recompilation fixes everything.

@mohitanchlia
Author

I just started going through the Gradle manual, but it would be helpful if you could tell me which file I need to change to add the dependency. Is it build.gradle?

Thanks

@costin
Member

costin commented Aug 15, 2013

Try using the yarn profile from the command line:

gradlew -P distro=hadoopYarn

costin added a commit that referenced this issue Aug 15, 2013
@costin
Member

costin commented Aug 15, 2013

I've changed the build system so a yarn-based binary is published as well - add the yarn classifier and you should be good to go.
Note that the pom still refers to the 1.x Hadoop artifacts (that's because there's only one pom shared across the entire project).

I've already pushed a build so try it out and let me know how it goes.

Thanks,

@mohitanchlia
Author

Thanks a lot for the quick response. I'll get the most recent snapshot build and try it out.

@mohitanchlia
Author

I took the jar elasticsearch-hadoop-1.3.0.BUILD-20130815.110726-110.jar, registered it with Pig, and tried running the job. I still get:

Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESInputFormat.getSplits(ESInputFormat.java:335)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:319)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:239)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:270)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:160)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:676)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)

Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.elasticsearch.hadoop.mr.ESInputFormat.getSplits(ESInputFormat.java:335)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
"pig_1376586430066.log" 71L, 5234C

@costin
Member

costin commented Aug 15, 2013

That's because you are not using the 'yarn' jar but the normal one. You need this jar: https://oss.sonatype.org/content/repositories/snapshots/org/elasticsearch/elasticsearch-hadoop/1.3.0.BUILD-SNAPSHOT/elasticsearch-hadoop-1.3.0.BUILD-20130815.110726-110-yarn.jar - notice the -yarn suffix.

See the readme page on how to do that with Maven.
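For reference, pulling in the YARN variant via Maven usually comes down to adding the yarn classifier to the dependency - a minimal sketch based on the snapshot coordinates above (the readme is the authoritative reference, so the exact snippet there may differ):

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-hadoop</artifactId>
        <version>1.3.0.BUILD-SNAPSHOT</version>
        <classifier>yarn</classifier>
    </dependency>

You would also need the Sonatype snapshots repository (oss.sonatype.org) configured, since this is a snapshot build and is only published there.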

@mohitanchlia
Author

Thanks, my mistake. I tried the new jar and it worked. There was an ES error, but I think that is ES related. I have some questions on how reads are done in parallel, but I'll take a look at the split code. Thanks.

@costin
Member

costin commented Aug 15, 2013

It's great to hear that it's working.
As for the reads - you can check the docs or the mailing list. In short, we do the splitting based on shards - the upcoming reference docs go into detail about that.
You can read them under the doc/ folder until we publish them on the website.
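As a rough illustration (hypothetical index, not from this thread): the number of input splits for a read follows the number of primary shards of the target index, so an index created with five shards would typically be read by five parallel map tasks:

    curl -XPUT 'http://localhost:9200/twitter/' -d '{
      "settings" : { "index" : { "number_of_shards" : 5 } }
    }'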

@costin costin added the bug label Mar 6, 2014
@Foolius

Foolius commented Apr 3, 2014

I think I have a similar problem, although I'm not getting such a big stack trace.
I only get this error message: 2014-04-03 14:36:01,762 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve EsStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]

with both the yarn and non-yarn versions.

@costin
Member

costin commented Apr 3, 2014

@Foolius please open a new issue - this one is 8 months old and the code base has changed significantly.
Also, please include info about your environment and, most importantly, your script; otherwise it's just a guessing game.
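For reference, ERROR 1070 usually means Pig cannot see the storage class on its classpath (or the class name is not fully qualified). The usual wiring looks roughly like the sketch below - the jar path is just a placeholder, and newer builds use EsStorage rather than the older ESStorage:

    REGISTER /path/to/elasticsearch-hadoop-<version>.jar;       -- placeholder path, point at your actual jar
    DEFINE EsStorage org.elasticsearch.hadoop.pig.EsStorage();  -- fully qualified class name
    A = LOAD 'twitter/tweet/_search?q=kim' USING EsStorage();
    DUMP A;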
