use $HADOOP_PREFIX if available #844

Closed
coyotemarin opened this issue Dec 15, 2013 · 12 comments

@coyotemarin
Collaborator

$HADOOP_HOME is somewhat deprecated; newer versions of Hadoop use $HADOOP_PREFIX instead.

If neither the hadoop_home option nor $HADOOP_HOME is set, mrjob should use $HADOOP_PREFIX.
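A minimal sketch of that lookup order, using a hypothetical `find_hadoop_home()` helper (not mrjob's actual code). `environ` is injectable so the fallback is easy to test:

```python
import os


def find_hadoop_home(hadoop_home_opt=None, environ=None):
    """Hypothetical helper: return the Hadoop install dir, or None.

    Checks the hadoop_home option first, then $HADOOP_HOME, then
    $HADOOP_PREFIX (the newer name).
    """
    if environ is None:
        environ = os.environ
    if hadoop_home_opt:
        return hadoop_home_opt
    for var in ('HADOOP_HOME', 'HADOOP_PREFIX'):
        value = environ.get(var)
        if value:  # treat an empty string the same as "unset"
            return value
    return None


# Usage: with only $HADOOP_PREFIX set, it is used as the fallback.
print(find_hadoop_home(environ={'HADOOP_PREFIX': '/home/hadoop'}))
```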

@coyotemarin
Collaborator Author

As always, there's a wrinkle. When using $HADOOP_INSTALL, logs are stored by default in $HADOOP_INSTALL/logs, not $HADOOP_INSTALL/hadoop/logs, which means we probably need a hadoop_log_dir option.

@coyotemarin
Collaborator Author

Good grief, a lot of installs seem to keep the hadoop binary in $HADOOP_INSTALL/bin/hadoop, not $HADOOP_INSTALL/hadoop/bin/hadoop.

@coyotemarin
Collaborator Author

Fortunately, the only ways we use these options are:

  • finding the hadoop binary
  • finding the Hadoop streaming jar
  • finding Hadoop's log directory

We could easily look for the hadoop binary in both <hadoop_home>/bin and <hadoop_home>/hadoop/bin directories. We already recurse through subdirectories to find the streaming jar. And the log directory is just <hadoop_home>/logs if $HADOOP_LOG_DIR isn't set.
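Roughly, the three lookups could look like this. The function names are hypothetical, and matching the jar by a `hadoop-streaming*.jar` filename is an assumption, not necessarily what mrjob does:

```python
import os
import tempfile


def find_hadoop_bin(hadoop_home):
    """Try <hadoop_home>/bin/hadoop, then <hadoop_home>/hadoop/bin/hadoop."""
    for subdir in ('bin', os.path.join('hadoop', 'bin')):
        path = os.path.join(hadoop_home, subdir, 'hadoop')
        if os.path.isfile(path):  # existence check only; exec bit not checked
            return path
    return None


def find_streaming_jar(hadoop_home):
    """Recurse through hadoop_home; return the first hadoop-streaming*.jar."""
    for dirpath, dirnames, filenames in os.walk(hadoop_home):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if name.startswith('hadoop-streaming') and name.endswith('.jar'):
                return os.path.join(dirpath, name)
    return None


def find_log_dir(hadoop_home, environ=None):
    """Use $HADOOP_LOG_DIR if set, else fall back to <hadoop_home>/logs."""
    if environ is None:
        environ = os.environ
    return environ.get('HADOOP_LOG_DIR') or os.path.join(hadoop_home, 'logs')


# Usage: fake an install that keeps the binary in <home>/bin (no hadoop/ subdir).
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, 'bin'))
    open(os.path.join(home, 'bin', 'hadoop'), 'w').close()
    os.makedirs(os.path.join(home, 'contrib', 'streaming'))
    open(os.path.join(home, 'contrib', 'streaming', 'hadoop-streaming.jar'), 'w').close()
    print(find_hadoop_bin(home))
    print(find_streaming_jar(home))
    print(find_log_dir(home, environ={}))
```

Checking `<hadoop_home>/bin` before `<hadoop_home>/hadoop/bin` is an arbitrary choice here; the order only matters if both exist.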

Also, EMR now supports Hadoop 2.2.0, so I can poke around on there.

@coyotemarin
Collaborator Author

Well, EMR's Hadoop 2.2.0 uses HADOOP_HOME and HADOOP_HOME_WARN_SUPPRESS.

Actually, it looks like HADOOP_INSTALL isn't read by Hadoop at all; it's more of a convention. HADOOP_HOME was replaced by HADOOP_PREFIX, which apparently works the same way.

@coyotemarin
Collaborator Author

Yeah, looks like we should support HADOOP_PREFIX and ignore HADOOP_INSTALL for now.

@coyotemarin
Collaborator Author

Here's a similar ticket from another library that uses Hadoop.

@coyotemarin
Collaborator Author

If we want to use the new $HADOOP_*_HOME environment variables, the hadoop binary is found at $HADOOP_COMMON_HOME/bin/hadoop, and the streaming jar is somewhere inside $HADOOP_MAPRED_HOME.
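For the split variables, a sketch might look like this. `find_from_component_homes()` is a hypothetical name, and the `hadoop-streaming*.jar` filename pattern is an assumption; the directory layout in the usage demo is made up:

```python
import glob
import os
import tempfile


def find_from_component_homes(environ=None):
    """Hypothetical helper: locate the hadoop binary and streaming jar from
    $HADOOP_COMMON_HOME and $HADOOP_MAPRED_HOME.

    Returns (hadoop_bin, streaming_jar); either may be None.
    """
    if environ is None:
        environ = os.environ

    hadoop_bin = None
    common_home = environ.get('HADOOP_COMMON_HOME')
    if common_home:
        candidate = os.path.join(common_home, 'bin', 'hadoop')
        if os.path.isfile(candidate):
            hadoop_bin = candidate

    streaming_jar = None
    mapred_home = environ.get('HADOOP_MAPRED_HOME')
    if mapred_home:
        # The jar is "somewhere inside" $HADOOP_MAPRED_HOME, so search
        # recursively ("**" also matches zero directories).
        pattern = os.path.join(glob.escape(mapred_home), '**', 'hadoop-streaming*.jar')
        matches = sorted(glob.glob(pattern, recursive=True))
        if matches:
            streaming_jar = matches[0]

    return hadoop_bin, streaming_jar


# Usage: build a throwaway layout and point the variables at it.
with tempfile.TemporaryDirectory() as root:
    common = os.path.join(root, 'common')
    mapred = os.path.join(root, 'mapred')
    os.makedirs(os.path.join(common, 'bin'))
    open(os.path.join(common, 'bin', 'hadoop'), 'w').close()
    os.makedirs(os.path.join(mapred, 'share', 'tools'))
    open(os.path.join(mapred, 'share', 'tools', 'hadoop-streaming-2.2.0.jar'), 'w').close()
    print(find_from_component_homes({'HADOOP_COMMON_HOME': common,
                                     'HADOOP_MAPRED_HOME': mapred}))
```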

@coyotemarin
Collaborator Author

And the log dir can be pretty much anywhere, including /var/log, depending on the installation/configs.

@anusha-r anusha-r modified the milestones: on the radar, v0.4.3 Jan 14, 2015
@anusha-r
Contributor

Moving this issue out of the 0.4.3 release since it seems a bit complicated. Will re-evaluate it once the current release is done.

@coyotemarin coyotemarin modified the milestones: v0.4.4, on the radar Mar 9, 2015
@coyotemarin coyotemarin modified the milestones: v0.4.4, on the radar Apr 7, 2015
@coyotemarin
Collaborator Author

The 3.6.0 AMI uses $HADOOP_PREFIX, set to /home/hadoop. The hadoop binary is $HADOOP_PREFIX/bin/hadoop and the streaming jar is $HADOOP_PREFIX/contrib/streaming/hadoop-streaming.jar.

The log directory is /mnt/var/log/hadoop/. There is no $HADOOP_LOG_DIR set.

@coyotemarin coyotemarin modified the milestones: v0.4.4, v0.4.5 Apr 14, 2015
@coyotemarin coyotemarin modified the milestones: v0.4.5, v0.4.6 May 5, 2015
@coyotemarin coyotemarin modified the milestones: v0.4.6, v0.5.0 May 18, 2015
@coyotemarin
Collaborator Author

Note that we don't currently have a test that the --hadoop-home switch works (and in fact it didn't until recently; see #1037). We should test both the --hadoop-home and --hadoop-prefix switches when we make this change.

@coyotemarin
Collaborator Author

Replaced by #1160.

@coyotemarin coyotemarin removed their assignment Apr 23, 2016