Installing and Configuring a Pseudo-Distributed Hadoop Environment on Mac OS X El Capitan #10

Open
joyking7 opened this Issue Apr 4, 2016 · 9 comments

joyking7 commented Apr 4, 2016

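Installing and Configuring a Pseudo-Distributed Hadoop Environment on Mac OS X El Capitan

Some Rambling

For the past few weeks I had been reading theory, so I used the short holiday to set up a Hadoop environment. Tutorials are everywhere, but the ones covering pseudo-distributed setup on a Mac mostly don't work, and the blogs just repost one another. After working through part of the official documentation and combining it with a few blogs that were actually useful, I finally got the environment running. As the saying goes, if you can't even set up the environment, how can you talk about development? I also wrote this down so that I won't forget how to install Hadoop if I ever break it. Anyone interested is welcome to give it a try. There are plenty of tutorials for Linux; if I have time, I'll write one for that too. Read on.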

Overall Environment

Mac OS X El Capitan 10.11.4

java version "1.8.0_77"

Hadoop 2.7.2

Xcode 7.3

Homebrew 0.9.5

I. Prerequisites

1. Homebrew

  • Open a Terminal window and paste the following script

    /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    

2. Java

  • Download the JDK 8 installer for Mac OS X from the Oracle website: Java SE Downloads

  • Open the downloaded dmg file and double-click the pkg file inside it to install

  • Open Terminal and enter

    java -version
    
  • The output is

    java version "1.8.0_77"
    Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
    Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
    
  • The JDK directory is

    /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home
    

3. Xcode

  • Download it from the App Store

  • PS: the download may be slow, but the official source is safe

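II. Configuring SSH

To secure remote login to and management of Hadoop, as well as sharing among Hadoop node users, Hadoop must be configured to use the SSH protocol.

  • Open System Preferences > Sharing > Remote Login > Allow access for: All users

  • Open Terminal and enter the following, one line at a time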

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys
    
  • Once that is done, enter

    ssh localhost

  • It shows

    Last login: Mon Apr  4 15:30:53 2016

  • or a similar timestamp, which means SSH is configured
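
If ssh localhost still asks for a password, one thing to try is an RSA key instead of DSA: a commenter further down reports that this fixed it, and OpenSSH 7.0 and later disable DSA (ssh-dss) keys by default. The following is a minimal sketch, not part of the original steps; the function name setup_rsa_key and its optional directory argument are made up for illustration.

```shell
# Sketch: create an RSA key pair and authorize it for logins to this machine.
# The optional argument is the key directory (defaults to ~/.ssh).
setup_rsa_key() {
  key_dir="${1:-$HOME/.ssh}"
  mkdir -p "$key_dir" && chmod 700 "$key_dir"
  # Generate the key only if it does not exist yet, with an empty passphrase
  [ -f "$key_dir/id_rsa" ] || ssh-keygen -q -t rsa -P '' -f "$key_dir/id_rsa"
  # Authorize the public key, as the DSA steps above do
  cat "$key_dir/id_rsa.pub" >> "$key_dir/authorized_keys"
  # With its default strict checks, sshd may ignore a key file that others can write to
  chmod 600 "$key_dir/authorized_keys"
}
```

Calling setup_rsa_key with no argument works on ~/.ssh. Afterwards, ssh localhost should log in without a password prompt, assuming Remote Login is enabled as described above.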

III. Installing and Configuring Hadoop

1. Installing Hadoop

  • In Terminal, enter

    brew install hadoop

  • Output like the following means the installation succeeded

    ==> Downloading https://www.apache.org/dyn/closer.cgi?path=hadoop/common/hadoop-
    ==> Best Mirror http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-
    ######################################################################## 100.0%
    ==> Caveats
    In Hadoop's config file:
    /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh,
    /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-env.sh and
    /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
    $JAVA_HOME has been set to be the output of:
    /usr/libexec/java_home
    ==> Summary
    🍺  /usr/local/Cellar/hadoop/2.7.2: 6,304 files, 309.8M, built in 2 minutes 43 seconds
    

2. Configuring Pseudo-Distributed Hadoop

(1) Configure hadoop-env.sh
  • In Terminal, enter

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh

  • Find the line

    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

  • Change it to

    export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    
(2) Configure yarn-env.sh
  • In Terminal, enter

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh

  • Add

    YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
    
(3) Configure core-site.xml
  • In Terminal, enter

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml

  • Edit

    <property>  
        <name>fs.defaultFS</name>             
        <value>hdfs://localhost:9000</value>          
    </property>
    
(4) Configure hdfs-site.xml
  • In Terminal, enter

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml

  • Edit

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    
(5) Configure mapred-site.xml
  • In Terminal, enter the following in order

    cp /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml.template /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml
    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/mapred-site.xml

  • Edit

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    
(6) Configure yarn-site.xml
  • In Terminal, enter

    open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml

  • Edit

     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
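
Each of the four <property> snippets above has to sit inside the file's <configuration> element. As a rough sketch of the resulting files, the shell functions below write all four with the values used in this guide. The function names and the optional directory argument are hypothetical. The script overwrites any existing files, so try it on a scratch directory first.

```shell
# Write a single <property> wrapped in <configuration> to a Hadoop site file.
# Arguments: file path, property name, property value.
write_site_file() {
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Write the four site files of this guide into a config directory.
# The optional argument is the directory; the default is the Homebrew path used above.
write_pseudo_distributed_conf() {
  conf_dir="${1:-/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop}"
  write_site_file "$conf_dir/core-site.xml"   fs.defaultFS hdfs://localhost:9000
  write_site_file "$conf_dir/hdfs-site.xml"   dfs.replication 1
  write_site_file "$conf_dir/mapred-site.xml" mapreduce.framework.name yarn
  write_site_file "$conf_dir/yarn-site.xml"   yarn.nodemanager.aux-services mapreduce_shuffle
}
```

For example, write_pseudo_distributed_conf /tmp/hadoop-conf-test writes the files into an existing scratch directory so they can be inspected before anything under the Homebrew path is touched.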
    

3. Formatting HDFS

  • In Terminal, enter

    rm -rf /tmp/hadoop-tanjiti # remove old data from a previous install; replace tanjiti with your own user name
    hadoop namenode -format
    

4. Starting Hadoop

  • Go to the sbin directory

    cd /usr/local/Cellar/hadoop/2.7.2/sbin

(1) Start HDFS

    ./start-dfs.sh

(2) Start MapReduce (YARN)

    ./start-yarn.sh

(3) Check what is running

    jps

  • Result

    6467 Jps
    5991 DataNode
    6343 NodeManager
    6106 SecondaryNameNode
    6251 ResourceManager
    5901 NameNode
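
To spot a daemon that failed to start without reading the list by eye, the jps output can be checked against the five daemon names shown above. This is a small sketch; missing_daemons is a made-up helper name, not a Hadoop command.

```shell
# Report which of the expected Hadoop daemons are missing from `jps` output.
# Reads the jps listing on standard input and prints one missing name per line.
missing_daemons() {
  # Keep only the second column (the process name) of each line
  names="$(awk '{print $2}')"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode"
    printf '%s\n' "$names" | grep -qx "$daemon" || echo "$daemon"
  done
}
```

Used as jps | missing_daemons, it prints nothing when all five daemons are up.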
    

5. Running a Built-in MapReduce Example

  • An example that estimates the value of pi

    hadoop jar /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 5

  • Result

Number of Maps  = 2
Samples per Map = 5
16/04/04 16:34:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
16/04/04 16:34:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/04 16:34:53 INFO input.FileInputFormat: Total input paths to process : 2
16/04/04 16:34:53 INFO mapreduce.JobSubmitter: number of splits:2
16/04/04 16:34:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459758345965_0002
16/04/04 16:34:53 INFO impl.YarnClientImpl: Submitted application application_1459758345965_0002
16/04/04 16:34:53 INFO mapreduce.Job: The url to track the job: http://mbp.local:8088/proxy/application_1459758345965_0002/
16/04/04 16:34:53 INFO mapreduce.Job: Running job: job_1459758345965_0002
16/04/04 16:34:59 INFO mapreduce.Job: Job job_1459758345965_0002 running in uber mode : false
16/04/04 16:34:59 INFO mapreduce.Job:  map 0% reduce 0%
16/04/04 16:35:06 INFO mapreduce.Job:  map 100% reduce 0%
16/04/04 16:35:12 INFO mapreduce.Job:  map 100% reduce 100%
16/04/04 16:35:12 INFO mapreduce.Job: Job job_1459758345965_0002 completed successfully
16/04/04 16:35:12 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=353319
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=526
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=7821
        Total time spent by all reduces in occupied slots (ms)=2600
        Total time spent by all map tasks (ms)=7821
        Total time spent by all reduce tasks (ms)=2600
        Total vcore-milliseconds taken by all map tasks=7821
        Total vcore-milliseconds taken by all reduce tasks=2600
        Total megabyte-milliseconds taken by all map tasks=8008704
        Total megabyte-milliseconds taken by all reduce tasks=2662400
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=290
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=196
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=547356672
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 20.021 seconds
Estimated value of Pi is 3.60000000000000000000

6. Viewing the Web Interfaces
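
With the default ports of Hadoop 2.7.x, the NameNode web UI is served on port 50070 and the ResourceManager UI on port 8088 (the latter matches the job tracking URL in the log above). Here is a small sketch that prints both addresses; hadoop_ui_urls is a hypothetical helper, and the ports are assumptions that hold only if the defaults were not changed.

```shell
# Print the default web UI addresses of a Hadoop 2.7.x pseudo-distributed node.
# The optional argument is the host name (defaults to localhost).
hadoop_ui_urls() {
  host="${1:-localhost}"
  echo "http://$host:50070"  # NameNode (HDFS) overview
  echo "http://$host:8088"   # ResourceManager (YARN) applications
}
```

Each printed address can be pasted into a browser, or passed to the open command on a Mac.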

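IV. Summary

If you follow the steps above, the configuration actually goes quickly. When I was figuring it out on my own, though, there were plenty of pitfalls: network speed, paths, and so on, and things broke every now and then.

With the environment ready, I'll go back to the theory. After talking with some friends who work on this too, I still need to brush up on statistics. If anyone in the department is interested, give it a try.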

@Zhangjd Zhangjd added the 数据挖掘 (data mining) label Apr 4, 2016

@CaesarPan CaesarPan changed the title from "Mac OS X EI Captian" to "Mac OS X EI Capitan" (the rest of the title, "Installing and Configuring a Pseudo-Distributed Hadoop Environment", unchanged) Aug 24, 2016

csyjgu commented Sep 4, 2016

Very detailed, great work!
I followed the author's steps, but ./start-dfs.sh gives me a small problem and shows this:
“2016-09-04 15:24:50.474 java[15275:613152] Unable to load realm info from SCDynamicStore”
I found on Stack Overflow that changing the line the author modified in hadoop-env.sh to the following fixes it:

    HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
    HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf=/dev/null"

That one isn't really a problem.
The problem I'm stuck on now: after running start-dfs and start-yarn, jps shows the following

    14572 ResourceManager
    14242 NameNode
    15585 Jps
    4432
    14450 SecondaryNameNode
    14670 NodeManager

So DataNode isn't listed. I suppose 4432 is actually the DataNode, but I don't know what is going on.
The final example won't run either, and shows this error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/yjgu/QuasiMonteCarlo_1472973018410_433909577/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Does the author know what causes this and how to fix it?
Thanks.

yorkchu1995 commented Nov 30, 2016

A question: why does ssh localhost keep asking for a password?

wjxiz1992 commented Nov 30, 2016

yorkchu1995:
You may have already configured ssh before.

poiu72 commented Mar 27, 2017

Can Hadoop be installed and run on a Mac from the regular Linux tar.gz package? Has the author tried it?

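poiu72 commented Mar 27, 2017

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys

Running the above did not give me passwordless login; I still had to enter a password. Switching dsa to rsa finally made it work.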

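l1905 commented Jun 13, 2017

Thanks for the tutorial. I just hit another pitfall: the namenode wouldn't start, and it turned out that the port the namenode uses was already taken.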

heidsoft commented Jun 28, 2017

Has anyone seen a job stay in the Running state forever?

17/06/28 23:16:11 INFO mapreduce.JobSubmitter: number of splits:1
17/06/28 23:16:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498662635832_0002
17/06/28 23:16:13 INFO impl.YarnClientImpl: Submitted application application_1498662635832_0002
17/06/28 23:16:13 INFO mapreduce.Job: The url to track the job: http://MacBook-Air-2.local:8088/proxy/application_1498662635832_0002/
17/06/28 23:16:13 INFO mapreduce.Job: Running job: job_1498662635832_0002

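Eustiar commented Aug 1, 2017

In the first reply, the datanode probably did not start, which is why the example cannot run either. The namenode and datanode IDs may not match. Did you format the namenode more than once?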
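
One way to check the ID mismatch suggested here is to compare the clusterID lines in the NameNode and DataNode VERSION files. This is a minimal sketch, assuming hadoop.tmp.dir was left at its default of /tmp/hadoop-<user name>; cluster_id and check_cluster_ids are made-up helper names.

```shell
# Print the clusterID recorded in a Hadoop storage VERSION file.
cluster_id() {
  grep '^clusterID=' "$1" | cut -d= -f2
}

# Compare the NameNode and DataNode cluster IDs under a Hadoop temp directory.
# The optional argument is that directory; the default assumes an unchanged hadoop.tmp.dir.
check_cluster_ids() {
  base="${1:-/tmp/hadoop-$(whoami)}"
  nn="$(cluster_id "$base/dfs/name/current/VERSION")"
  dn="$(cluster_id "$base/dfs/data/current/VERSION")"
  if [ "$nn" = "$dn" ]; then
    echo "match: $nn"
  else
    echo "mismatch: namenode=$nn datanode=$dn"
    return 1
  fi
}
```

If the IDs differ on a throwaway test setup, a common remedy is to stop the daemons, delete the data directory, and format again, which erases everything stored in HDFS.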

wzshpdjq123 commented Sep 16, 2018

I keep getting this message: 18/09/16 18:45:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
