pydoop mapreduce job hangs because pydoop is not supported by Cloudera Hadoop 2.6 CDH5 #216
Hi! --gianluigi
Hi Gianluigi. The logs I included above are the ones from "<<<host_address>>>:19888/jobhistory/". That's all there is, nothing else. Thank you for your help. The job basically gets stuck on: I tried the examples from the following page as well:
It works well. For some reason only the MapReduce jobs hang. Am I maybe missing something in my setup, or in my runner?
I also just tried to run this simple example as a script:
The command to start the script was: The same problem. There is something wrong with the MapReduce API in Pydoop: the HDFS API works fine for me, but the MapReduce API does not. Or perhaps there are settings required for Pydoop's MapReduce API to work in a Hadoop 2.6 CDH5 environment. If anybody knows what they are, please let me, and everyone else who might face this issue, know. Thank you for your help! On the following page:
then restart the Hadoop daemons. Thank you.
Hi @elzaggo. Here is an additional fresh set of logs from "....:19888/jobhistory/".
Hi. Do you see anything in the stderr or stdout logs for a task? Often this sort of failure coincides with the Python interpreter spitting out a message on stderr that can give us a clue as to exactly what's going on. Try running an example from the Pydoop installation. For instance, from the root Pydoop directory,
If it runs, then we'll know that Pydoop is properly installed on your cluster. If it doesn't, then we can try to figure out what's wrong with the installation. Luca
Hello @ilveroluca. I did just what you said. The stderr logs are below; the stdout log is empty, not a single line there. There is also an example with Pydoop's new API. I wanted to add that I installed Pydoop with pip install; maybe I should reinstall it from the source code in the repository? Maybe there is something wrong with the build installed by pip? Is there any way to get an older build of Pydoop, to try and see whether that would work? Thank you.
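One quick sanity check for the pip-versus-source question is whether the interpreter the cluster tasks actually use can see the package at all. A stdlib-only sketch (the module name string is the only assumption):

```python
# Check whether a package is importable by this interpreter, without importing it.
import importlib.util

def is_installed(module_name):
    # find_spec returns None when the module cannot be found on sys.path
    return importlib.util.find_spec(module_name) is not None

pydoop_present = is_installed("pydoop")
```

Running this with the exact interpreter the tasks are configured to use (not just the login shell's `python`) tells you whether pip installed Pydoop where Hadoop will look for it.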
Thank you so much for your help, guys!
Hello @ilveroluca. After setting paths for:
I found one more online thread where someone else faced a similar issue and ran through the commands there. The environment looks correctly set, but I still get the above exception:
Now I got it to pick up the host address and port, but it still throws the following error. I masked the host address as "HOST_ADDRESS" in the error below; the correct one appears when the exception is thrown.
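For context, the namenode host and port that HDFS clients resolve normally come from the `fs.defaultFS` property in `core-site.xml`. A hedged, stdlib-only sketch of reading that property (the XML below is a made-up example, not the actual cluster config):

```python
# Extract fs.defaultFS from Hadoop's core-site.xml format (stdlib only).
import xml.etree.ElementTree as ET

CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://HOST_ADDRESS:8020</value>
  </property>
</configuration>"""

def default_fs(xml_text):
    # walk every <property> element and return the value paired with fs.defaultFS
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None
```

Checking what this returns for the `core-site.xml` under your `HADOOP_CONF_DIR` shows which address a client in that environment would try to contact.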
OK, moving on. It looks like I got the last issue resolved, I hope :). Now I get the next exception, and I am stuck on this one. It looks like it is searching for "mapred" in the wrong path, even though the following returns the correct path. We run the Cloudera Distribution of Hadoop, if that helps.
Exception:
And when I checked again, mapred is at the correct path, so why do I get the above exception?
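One way to see which `mapred` a launcher would pick up, independently of what an interactive shell resolves, is to ask `PATH` directly. A minimal stdlib sketch:

```python
# Resolve an executable the way a launcher scanning PATH would (stdlib only).
import shutil

def resolve(executable):
    # returns the full path of the first match found on PATH, or None
    return shutil.which(executable)

mapred_path = resolve("mapred")
```

If this returns the expected path in your shell while the job still looks elsewhere, the discrepancy points at how the job's environment (PATH, HADOOP_HOME) differs from your shell's.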
So. I contacted Cloudera and this is what they told me:
So as of June 30th, 2016, Cloudera's Distribution of Hadoop version CDH5 does not support Python's Hadoop library Pydoop, though it was supported on CDH4. I definitely wish CDH5 supported it; maybe support will be added later. If Pydoop support on Cloudera ever gets added and someone notices it before me, please send me a note on my LinkedIn. Thank you.
Hi @wwwmaster2k . Sorry about the delay; I lost track of your issue. I'm afraid we never had "official" support from Cloudera, and they have a tendency to do things their own way -- different from the standard Hadoop distribution. We do our best to cope with their changes, but it costs us a lot of time and effort, and since we're not Cloudera users, issues can still slip by. Nevertheless, though they don't support our project, our tests on Travis run on various CDH5 releases. Have a look here:
That's the automated test result for the latest official Pydoop release (1.2.0). In theory your setup should work. We'll take a closer look at your logs; hopefully we'll notice something. Oh, I saw how you're setting your environment variables. Actually, now that I think of it, it might help if you have a look at
Cheers, Luca
How did you install CDH? From packages, Cloudera Agent, or a tarball?
Hello @ilveroluca. Thank you for your help. |
Hi, we tested Pydoop on CDH 5 installed on Ubuntu from deb packages. Maybe something differs in other OS/installation methods that produces your bug.
OK @wwwmaster2k . You'll have to tell us where
Reading the error you got, what's happening is that the
Thank you so much for your help @ilveroluca and @mdrio. I ran the find command and found:
And what happens if you run the
[hdfs]$ sh mapred
I fixed the path to mapred-config.sh in the mapred executable, and now I get the following:
As of 2.0.0, Pydoop no longer tries to explicitly support the various layouts of customized Hadoop releases such as CDH or HDP. However, if the
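In that spirit, pointing the library at a non-standard layout generally comes down to the standard Hadoop environment variables. A hypothetical sketch for a CDH parcel install (every path here is an assumption; verify them against your own cluster before using):

```shell
# Hypothetical CDH parcel layout -- confirm these paths on your own machine.
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export PATH="$HADOOP_HOME/bin:$PATH"
```

These need to be set in the environment the jobs actually run under, not only in the interactive shell you submit from.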
Hello guys. I have already tried every example possible. I tried to have the job pick up files from the local name node and from HDFS, with the same problem either way. I cannot find any online reference to the same behavior. What am I doing wrong? I am trying to run a simple word count example and it won't run: it hangs as soon as the mapper apparently tries to read from the file. When I specify a wrong path, the job does fail, so I think the problem is in Pydoop reading from the file. I tried a similar MapReduce job with mrjob and it worked fine, but I want to use Pydoop. What could I be missing? I am on Hadoop 2.6 CDH5. I tried all the examples I could find online, with the same issue in all of them. Thank you.
Code of the Runner:
Code of the mapreduce example:
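For readers following along, the logic a word-count mapper and reducer implement is simple enough to sketch outside Hadoop entirely. A plain-Python stand-in (no pydoop; the function names are illustrative, not Pydoop's API):

```python
# Word count expressed as map / shuffle / reduce steps, without any Hadoop dependency.
from collections import defaultdict

def mapper(line):
    # emit a (word, 1) pair for every whitespace-separated token
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    # sum all the counts emitted for one word
    return word, sum(counts)

def word_count(lines):
    shuffled = defaultdict(list)   # the "shuffle" phase: group values by key
    for line in lines:
        for word, one in mapper(line):
            shuffled[word].append(one)
    return dict(reducer(w, c) for w, c in shuffled.items())
```

If logic like this runs fine locally while the same job hangs on the cluster, the problem is in the pipes plumbing or the task environment, not in the map/reduce code itself.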
These are logs from console (more syslog logs are at the very bottom):
stderr logs:
syslog logs: