# About: Notebook for Hadoop Setup

----

This Notebook prepares a Hadoop environment. It installs the following software.

- HDFS
- YARN
- HBase
- Hive
- Spark

## *Operation Note*

*This is a cell for your own recording. Describe the background and history of the operation here.*

# Installation target settings

Specify the name of the group to install to (`hadoop_all_<cluster name>`).

In [1]:
target_group = 'hadoop_all_testcluster'

Each machine is assumed to have already completed the following prerequisite items.

 - disable ipv6
 - ensure /hadoop/dataX directory for mount point is present
 - deploy /etc/hosts
 - deploy /etc/resolv.conf
 - setup ntpserver

Prepare the Playbooks used for the operations.

In [2]:
import os
import tempfile

work_dir = tempfile.mkdtemp()
work_dir

'/tmp/tmpaDuGp8'

The Playbooks published on GitHub are used. Clone them into a temporary directory.

In [3]:
!rm -fr {work_dir}/hadoop
!git clone https://github.com/NII-cloud-operation/Literate-computing-Hadoop.git {work_dir}/hadoop
!tree {work_dir}/hadoop

Cloning into '/tmp/tmpaDuGp8/hadoop'...
remote: Counting objects: 849, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 849 (delta 0), reused 0 (delta 0), pack-reused 841
Receiving objects: 100% (849/849), 172.26 KiB | 0 bytes/s, done.
Resolving deltas: 100% (267/267), done.
Checking connectivity... done.
/tmp/tmpaDuGp8/hadoop
└── playbooks
    ├── conf_base.retry
    ├── conf_base.yml
    ├── conf_hdfs_base.yml
    ├── conf_hdfs_spark.yml
    ├── conf_hdfs_tez.yml
    ├── conf_hdfs_yarn.yml
    ├── conf_namenode_bootstrapstandby.yml
    ├── conf_tez.yml
    ├── enter_hdfs_safemode.yml
    ├── format_namenode.yml
    ├── group_vars
    │   └── all
    │       ├── base
    │       ├── cgroups
    │       ├── collect
    │       ├── f500.dumpall
    │       ├── hbase_master
    │       ├── hbase_regionserver
    │       ├── hcatalog
    │       ├── hdfs_base
    │       ├── hdfs_spark
    │       ├── hdfs_tez
    │       ├── hdfs_yarn
    │       ├── hive
    │

In [4]:
playbook_dir = os.path.join(work_dir, 'hadoop/playbooks')
!ls -la {playbook_dir} | head

total 244
drwxr-xr-x  4 root root 4096 Aug 26 11:44 .
drwxr-xr-x  4 root root 4096 Aug 26 11:44 ..
-rw-r--r--  1 root root   13 Aug 26 11:44 conf_base.retry
-rw-r--r--  1 root root   39 Aug 26 11:44 conf_base.yml
-rw-r--r--  1 root root  136 Aug 26 11:44 conf_hdfs_base.yml
-rw-r--r--  1 root root  137 Aug 26 11:44 conf_hdfs_spark.yml
-rw-r--r--  1 root root  135 Aug 26 11:44 conf_hdfs_tez.yml
-rw-r--r--  1 root root  136 Aug 26 11:44 conf_hdfs_yarn.yml
-rw-r--r--  1 root root  188 Aug 26 11:44 conf_namenode_bootstrapstandby.yml


The Playbooks are now ready.

## Defining variables for the Notebook

Define the variables needed when running scripts in the cells of this Notebook.

In [5]:
namenode_stdout = !ansible hadoop_namenode_primary -m ping -l {target_group}
# Take the first host that answered the ping. A list comprehension is used
# because the result of map() cannot be indexed on Python 3.
active_namenode_host = [l.split()[0] for l in namenode_stdout if 'SUCCESS' in l][0]
namenode_stdout = !ansible hadoop_namenode_backup -m ping -l {target_group}
standby_namenode_host = [l.split()[0] for l in namenode_stdout if 'SUCCESS' in l][0]
print("active_namenode_host = '%s'\nstandby_namenode_host = '%s'\n" % (active_namenode_host, standby_namenode_host))

active_namenode_host = 'XXX.XXX.XXX.70'
standby_namenode_host = 'XXX.XXX.XXX.71'
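
The host lookup above can also be wrapped in a small helper. The following is a minimal sketch, assuming the ad-hoc `ansible -m ping` output has the `host | SUCCESS => {` form shown in this Notebook. `successful_hosts` is a hypothetical name and is not part of the Playbooks.

```python
import re

# Matches ANSI color sequences, in case the output was captured with colors enabled.
ANSI_RE = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')


def successful_hosts(lines):
    """Return the hosts whose ping line reports SUCCESS, in output order."""
    hosts = []
    for line in lines:
        clean = ANSI_RE.sub('', line).strip()
        if '| SUCCESS' in clean:
            # The host name is everything before the first '|'.
            hosts.append(clean.split('|')[0].strip())
    return hosts


# Hand-written sample in the same shape as the ping output in this Notebook.
sample = [
    'XXX.XXX.XXX.70 | SUCCESS => {',
    '    "changed": false,',
    '    "ping": "pong"',
    '}',
]
print(successful_hosts(sample))
```

With such a helper, an empty result can be detected explicitly before indexing with `[0]`, instead of failing with an `IndexError`.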



## Checking the target machines

Check connectivity.

In [6]:
!ansible hadoop_all -m ping -l {target_group}

XXX.XXX.XXX.71 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
XXX.XXX.XXX.70 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
XXX.XXX.XXX.72 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
XXX.XXX.XXX.73 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
XXX.XXX.XXX.112 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
XXX.XXX.XXX.113 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
XXX.XXX.XXX.114 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}


# Installing and starting the software

The software is installed and started in the following order.

- OS settings/cgroups: install and start
- ZooKeeper: install and start
- HDFS: install and start
  - JournalNode: install and start
  - NameNode: install, format, and start
  - DataNode: install and start
- YARN: install and start
  - ResourceManager: install and start
  - NodeManager: install and start
  - MapReduceHistoryServer: install and start
  - TimelineService: install and start
- Tez: install
- HBase: install and start
- Hive: install
- Spark: install and start
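
The ordering above can be written down as data, so that the commands are generated rather than typed by hand. This is only a sketch: the playbook file names are the ones invoked later in this Notebook, the list covers just the first few steps, and `STEPS` and `build_commands` are hypothetical names.

```python
# Ordered (label, playbook) pairs; only the first steps are listed as an illustration.
STEPS = [
    ('ZooKeeper install', 'install_zookeeper.yml'),
    ('ZooKeeper start', 'start_zookeeper-server.yml'),
    ('JournalNode install', 'install_journalnode.yml'),
    ('JournalNode start', 'start_journalnode.yml'),
    ('NameNode install', 'install_namenode.yml'),
    ('NameNode format', 'format_namenode.yml'),
]


def build_commands(playbook_dir, target_group, steps=STEPS):
    """Return (label, command) pairs in execution order."""
    return [
        (label, 'ansible-playbook -l %s %s/%s' % (target_group, playbook_dir, name))
        for label, name in steps
    ]


# '/path/to/playbooks' is a placeholder for the playbook_dir defined earlier.
for label, command in build_commands('/path/to/playbooks', 'hadoop_all_testcluster'):
    print('%-20s %s' % (label, command))
```

Keeping the order in one list makes it harder to run a start step before the corresponding install step.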

## Installation (OS/cgroups)

In [7]:
!ansible-playbook -l {target_group} {playbook_dir}/install-base.yml


PLAY [hadoop_all] **************************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.72]
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.114]

TASK [os : include] ************************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/os/tasks/limits.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71, XXX.XXX.XXX.72, XXX.XXX.XXX.73, XXX.XXX.XXX.112, XXX.XXX.XXX.113, XXX.XXX.XXX.114

TASK [os : set_nofile_soft_limit] **********************************************
changed: [XXX.XXX.XXX.70]
changed: [XXX.XXX.XXX.71]
changed: [XXX.XXX.XXX.72]
changed: [XXX.XXX.XXX.112]
changed: [XXX.XXX.XXX.73]
changed: [XXX.XXX.XXX.113]
changed: [XXX.XXX

Just in case, check connectivity from each node to the master.

In [8]:
!ansible hadoop_all -a 'ping -c 4 {active_namenode_host}' -l {target_group}

XXX.XXX.XXX.72 | SUCCESS | rc=0 >>
PING XXX.XXX.XXX.70 (XXX.XXX.XXX.70) 56(84) bytes of data.
64 bytes from XXX.XXX.XXX.70: icmp_seq=1 ttl=64 time=1.02 ms
64 bytes from XXX.XXX.XXX.70: icmp_seq=2 ttl=64 time=0.363 ms
64 bytes from XXX.XXX.XXX.70: icmp_seq=3 ttl=64 time=0.153 ms
64 bytes from XXX.XXX.XXX.70: icmp_seq=4 ttl=64 time=0.271 ms

--- XXX.XXX.XXX.70 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.153/0.452/1.023/0.338 ms

XXX.XXX.XXX.70 | SUCCESS | rc=0 >>
PING XXX.XXX.XXX.70 (XXX.XXX.XXX.70) 56(84) bytes of data.
64 bytes from XXX.XXX.XXX.70: icmp_seq=1 ttl=64 time=0.031 ms
64 bytes from XXX.XXX.XXX.70: icmp_seq=2 ttl=64 time=0.048 ms
64 bytes from XXX.XXX.XXX.70: icmp_seq=3 ttl=64 time=0.012 ms
64 bytes from XXX.XXX.XXX.70: icmp_seq=4 ttl=64 time=0.031 ms

--- XXX.XXX.XXX.70 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.012/0.030/0.048/0

## Building ZooKeeper

### Installing ZooKeeper

In [9]:
!ansible-playbook {playbook_dir}/install_zookeeper.yml -l {target_group}


PLAY [hadoop_zookeeperserver] **************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.72]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71, XXX.XXX.XXX.72

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.72]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
changed: [XXX.XXX.XXX.72]
changed: [XXX.XXX.XXX.70]
changed: [XXX.XXX.XXX.71]

TASK [java7 : md5sum_rpm] ******************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

### Starting ZooKeeper

In [10]:
!ansible-playbook {playbook_dir}/start_zookeeper-server.yml -l {target_group}


PLAY [hadoop_zookeeperserver] **************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.72]

TASK [start_zookeeper-server] **************************************************
changed: [XXX.XXX.XXX.70]
changed: [XXX.XXX.XXX.72]
changed: [XXX.XXX.XXX.71]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.70               : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.71               : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.72               : ok=2    changed=1    unreachable=0    fai
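
Because the playbook output is long, failures are easy to overlook. The `PLAY RECAP` block can be checked mechanically. The following is a minimal sketch that assumes the recap lines have the `host : ok=N changed=N unreachable=N failed=N` form shown above. `parse_recap` and `hosts_with_problems` are hypothetical helpers, not part of Ansible.

```python
import re

ANSI_RE = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')   # color sequences, if any
RECAP_RE = re.compile(r'^(\S+)\s*:\s*(.*)$')      # "host : counters"


def parse_recap(lines):
    """Map each host in a PLAY RECAP block to its counters, e.g. {'ok': 2, ...}."""
    result = {}
    in_recap = False
    for line in lines:
        clean = ANSI_RE.sub('', line).strip()
        if clean.startswith('PLAY RECAP'):
            in_recap = True
            continue
        if not in_recap or not clean:
            continue
        match = RECAP_RE.match(clean)
        if match:
            counters = dict(
                (key, int(value))
                for key, value in re.findall(r'(\w+)=(\d+)', match.group(2))
            )
            result[match.group(1)] = counters
    return result


def hosts_with_problems(recap):
    """Return hosts that report any failed or unreachable task."""
    return sorted(
        host for host, c in recap.items()
        if c.get('failed', 0) > 0 or c.get('unreachable', 0) > 0
    )


# Hand-written sample; hostB is made to fail for illustration.
sample = [
    'PLAY RECAP ****',
    'hostA : ok=2    changed=1    unreachable=0    failed=0',
    'hostB : ok=1    changed=0    unreachable=0    failed=1',
]
print(hosts_with_problems(parse_recap(sample)))
```

In a cell, the output could be captured with `out = !ansible-playbook ...` and passed to `parse_recap(out)`.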

### Verifying ZooKeeper

If running *ls /* on ZooKeeper returns *zookeeper*, it is working. (If services are already running on ZooKeeper, other znodes may be listed as well.)

In [11]:
zknode_stdout = !ansible hadoop_zookeeperserver -m ping -l {target_group}
# Pick the first ZooKeeper host that answered the ping (works on Python 2 and 3)
zknodes = [l.split()[0] for l in zknode_stdout if 'SUCCESS' in l][0]
zknodes

'XXX.XXX.XXX.71'

Verify the operation through the `zk-shell` command.

In [12]:
!zk-shell { zknodes } --run-once "ls /"

zookeeper
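
As a complement to `zk-shell`, a liveness check can be sketched with ZooKeeper's `ruok` four-letter command, to which a healthy server replies `imok`. This assumes the client port is 2181 and that four-letter commands are enabled on the server. Newer ZooKeeper releases restrict them through the `4lw.commands.whitelist` setting. `zk_is_ok` is a hypothetical helper name.

```python
import socket


def zk_is_ok(host, port=2181, timeout=5.0):
    """Send the 'ruok' four-letter command and report whether 'imok' came back."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Connection refused, timed out, or host not resolvable.
        return False
    try:
        sock.sendall(b'ruok')
        reply = sock.recv(16)
    except socket.error:
        return False
    finally:
        sock.close()
    return reply == b'imok'


# Example (not executed here): zk_is_ok(zknodes)
```

Unlike `ls /`, this only confirms that the server process answers. It does not confirm that the ensemble has a quorum.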


## Building and starting HDFS

### JournalNode

#### Installing JournalNode

In [13]:
!ansible-playbook -l {target_group} {playbook_dir}/install_journalnode.yml


PLAY [hadoop_journalnode] ******************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.72]
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71, XXX.XXX.XXX.72

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.72]
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71, XXX.XXX.XXX.72

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.

#### Starting JournalNode

In [14]:
!ansible-playbook -l {target_group} {playbook_dir}/start_journalnode.yml


PLAY [hadoop_journalnode] ******************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.72]
ok: [XXX.XXX.XXX.71]

TASK [start_hadoop-hdfs-journalnode] *******************************************
changed: [XXX.XXX.XXX.71]
changed: [XXX.XXX.XXX.72]
changed: [XXX.XXX.XXX.70]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.70               : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.71               : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.72               : ok=2    changed=1    unreachable=0    fai

### NameNode

#### Installing NameNode

In [15]:
!ansible-playbook -l {target_group} {playbook_dir}/install_namenode.yml


PLAY [hadoop_namenode] *********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [base : copy_conf_files] **************************************************

#### Formatting the primary NameNode

In [16]:
!ansible-playbook -l {target_group} {playbook_dir}/format_namenode.yml


PLAY [hadoop_namenode_primary] *************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.70

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.70]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.70

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.70]

TASK [base : copy_conf_files] **************************************************
ok: [XXX.XXX.XXX.70] => (item=core-site.xml)
ok: [XXX.XXX.XXX.70] => (item=hdfs-site.xml)
ok: [XXX.XXX.XX

#### Starting the active NameNode

In [17]:
!ansible-playbook -l {target_group} {playbook_dir}/start_namenode.yml -l { active_namenode_host }


PLAY [hadoop_namenode] *********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]

TASK [start_hadoop-hdfs-zkfc] **************************************************
changed: [XXX.XXX.XXX.70]

TASK [start_hadoop-hdfs-namenode] **********************************************
changed: [XXX.XXX.XXX.70]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.70               : ok=3    changed=2    unreachable=0    failed=0



#### Synchronizing the backup NameNode with the primary

In [18]:
!ansible-playbook -l {target_group} {playbook_dir}/conf_namenode_bootstrapstandby.yml -l { standby_namenode_host }


PLAY [hadoop_namenode] *********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.71

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.71

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.71]

TASK [base : copy_conf_files] **************************************************
ok: [XXX.XXX.XXX.71] => (item=core-site.xml)
ok: [XXX.XXX.XXX.71] => (item=hdfs-site.xml)
ok: [XXX.XXX.XX

#### Starting the standby NameNode

In [19]:
!ansible-playbook -l {target_group} {playbook_dir}/start_namenode.yml -l { standby_namenode_host }


PLAY [hadoop_namenode] *********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.71]

TASK [start_hadoop-hdfs-zkfc] **************************************************
changed: [XXX.XXX.XXX.71]

TASK [start_hadoop-hdfs-namenode] **********************************************
changed: [XXX.XXX.XXX.71]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.71               : ok=3    changed=2    unreachable=0    failed=0



### DataNode(SlaveNode)

DataNode and NodeManager are both installed when the slave nodes are built.

#### Installing DataNode/NodeManager

*2016/08/06*

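When the cluster is built with this Notebook alone, the permissions on /hadoop/ are too strict (only the hdfs user can access it), so the YARN NodeManager cannot access /hadoop/tmp. The cause is that the disk-mount prerequisite was not applied because the machines were VMs.

As a stopgap, run the file module.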

In [20]:
!ansible -b -m file -a 'path=/hadoop state=directory owner=root group=root mode=0755' hadoop_slavenode -l {target_group}

XXX.XXX.XXX.113 | SUCCESS => {
    "changed": true, 
    "gid": 0, 
    "group": "root", 
    "mode": "0755", 
    "owner": "root", 
    "path": "/hadoop", 
    "size": 4096, 
    "state": "directory", 
    "uid": 0
}
XXX.XXX.XXX.73 | SUCCESS => {
    "changed": true, 
    "gid": 0, 
    "group": "root", 
    "mode": "0755", 
    "owner": "root", 
    "path": "/hadoop", 
    "size": 4096, 
    "state": "directory", 
    "uid": 0
}
XXX.XXX.XXX.112 | SUCCESS => {
    "changed": true, 
    "gid": 0, 
    "group": "root", 
    "mode": "0755", 
    "owner": "root", 
    "path": "/hadoop", 
    "size": 4096, 
    "state": "directory", 
    "uid": 0
}
XXX.XXX.XXX.114 | SUCCESS => {
    "changed": true, 
    "gid": 0, 
    "group": "root", 
    "mode": "0755", 
    "owner": "root", 
    "path": "/hadoop", 
    "size": 4096, 
    "state": "directory", 
    "uid": 0
}
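
The reason mode `0755` resolves the problem can be illustrated with the permission bits themselves. The sketch below assumes the NodeManager's user is neither the owner nor in the group of `/hadoop`, so only the "other" bits apply to it. The `0700` value is an example of an overly strict mode, not a value taken from the actual machines. `others_can_enter` is a hypothetical helper.

```python
import stat


def others_can_enter(mode):
    """True if users outside the owner and group can list and traverse a directory."""
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)


# Compare a strict mode with the mode set by the file module above.
for mode in (0o700, 0o755):
    print('%o %s' % (mode, others_can_enter(mode)))
```

On a real host, the mode to test could be obtained with `stat.S_IMODE(os.stat('/hadoop').st_mode)`.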


In [21]:
!ansible-playbook -l {target_group} {playbook_dir}/install_slavenode.yml


PLAY [hadoop_slavenode] ********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.114]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.73, XXX.XXX.XXX.112, XXX.XXX.XXX.113, XXX.XXX.XXX.114

TASK [base : install_hdp_repo] *************************************************
changed: [XXX.XXX.XXX.113]
changed: [XXX.XXX.XXX.73]
changed: [XXX.XXX.XXX.112]
changed: [XXX.XXX.XXX.114]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.73, XXX.XXX.XXX.112, XXX.XXX.XXX.113, XXX.XXX.XXX.114

TASK [bas

#### Starting DataNode

In [22]:
!ansible-playbook -l {target_group} {playbook_dir}/start_datanode.yml


PLAY [hadoop_slavenode] ********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.114]

TASK [start_hadoop-hdfs-datanode] **********************************************
changed: [XXX.XXX.XXX.113]
changed: [XXX.XXX.XXX.73]
changed: [XXX.XXX.XXX.112]
changed: [XXX.XXX.XXX.114]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.112              : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.113              : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.114              : ok=

### Initial HDFS configuration

Create the required directories on HDFS.

In [23]:
!ansible-playbook -l {target_group} {playbook_dir}/conf_hdfs_base.yml


PLAY [hadoop_namenode_primary] *************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.70

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : md5sum_rpm] ******************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : check_md5sum] ****************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : install_oraclejdk] ***********************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : includ

The HDFS environment is now ready. The NameNode status can be checked at the following URL.

In [24]:
print("http://%s:50070" % active_namenode_host)

http://XXX.XXX.XXX.70:50070
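
Besides the web UI, the HA state can be read programmatically from the NameNode's `/jmx` endpoint. The sketch below assumes the `NameNodeStatus` bean exposes a `State` field, as in Hadoop 2.x. The sample JSON is hand-written for illustration and is not output from this cluster. `namenode_state_url` and `parse_namenode_state` are hypothetical helpers.

```python
import json


def namenode_state_url(host, port=50070):
    """Build the JMX query URL for the NameNodeStatus bean."""
    return 'http://%s:%d/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus' % (host, port)


def parse_namenode_state(jmx_text):
    """Extract the HA state ('active' or 'standby') from a JMX JSON response."""
    beans = json.loads(jmx_text).get('beans', [])
    return beans[0].get('State') if beans else None


# Hand-written sample response (assumed layout).
sample = '{"beans": [{"name": "Hadoop:service=NameNode,name=NameNodeStatus", "State": "active"}]}'
print(namenode_state_url('XXX.XXX.XXX.70'))
print(parse_namenode_state(sample))
```

Querying both `active_namenode_host` and `standby_namenode_host` this way would confirm that exactly one NameNode reports `active`.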


## Installing and starting YARN

### Installing ResourceManager

In [25]:
!ansible-playbook -l {target_group} {playbook_dir}/install_resourcemanager.yml


PLAY [hadoop_resourcemanager] **************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.70]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [base : copy_conf_files] **************************************************

### Installing NodeManager

*Not required: NodeManager was already installed in the slave node installation step of the HDFS section.*

### Preparing HDFS

Create the directories on HDFS that YARN needs to run.

In [26]:
!ansible-playbook -l {target_group} {playbook_dir}/conf_hdfs_yarn.yml


PLAY [hadoop_namenode_primary] *************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.70

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : md5sum_rpm] ******************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : check_md5sum] ****************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : install_oraclejdk] ***********************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : includ

### Installing MapReduceHistoryServer

In [27]:
!ansible-playbook -l {target_group} {playbook_dir}/install_mapreduce_history.yml


PLAY [hadoop_mapreduce_historyserver] ******************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.72

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.72]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.72

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.72]

TASK [base : copy_conf_files] **************************************************
ok: [XXX.XXX.XXX.72] => (item=core-site.xml)
ok: [XXX.XXX.XXX.72] => (item=hdfs-site.xml)
ok: [XXX.XXX.XX

### Installing TimelineService

In [28]:
!ansible-playbook -l {target_group} {playbook_dir}/install_timelineservice.yml


PLAY [hadoop_timelineservice] **************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.72

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.72]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.72

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.72]

TASK [base : copy_conf_files] **************************************************
ok: [XXX.XXX.XXX.72] => (item=core-site.xml)
ok: [XXX.XXX.XXX.72] => (item=hdfs-site.xml)
ok: [XXX.XXX.XX

### Starting ResourceManager

In [29]:
!ansible-playbook -l {target_group} {playbook_dir}/start_resourcemanager.yml


PLAY [hadoop_resourcemanager] **************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [start_hadoop-yarn-resourcemanager] ***************************************
changed: [XXX.XXX.XXX.70]
changed: [XXX.XXX.XXX.71]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.70               : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.71               : ok=2    changed=1    unreachable=0    failed=0



### Starting NodeManager

In [30]:
!ansible-playbook -l {target_group} {playbook_dir}/start_nodemanager.yml


PLAY [hadoop_slavenode] ********************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.114]

TASK [start_hadoop-yarn-nodemanager] *******************************************
changed: [XXX.XXX.XXX.113]
changed: [XXX.XXX.XXX.112]
changed: [XXX.XXX.XXX.73]
changed: [XXX.XXX.XXX.114]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.112              : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.113              : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.114              : ok=

### Starting MapReduceHistoryServer

In [31]:
!ansible-playbook -l {target_group} {playbook_dir}/start_mapreduce_historyserver.yml


PLAY [hadoop_mapreduce_historyserver] ******************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [start_mapreduce-historyserver] *******************************************
changed: [XXX.XXX.XXX.72]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.72               : ok=2    changed=1    unreachable=0    failed=0



### Starting TimelineService

In [32]:
!ansible-playbook -l {target_group} {playbook_dir}/start_timelineservice.yml


PLAY [hadoop_timelineservice] **************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [start_hadoop-yarn-timelineserver] ****************************************
changed: [XXX.XXX.XXX.72]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.72               : ok=2    changed=1    unreachable=0    failed=0



YARN installation is now complete. The running state can be checked at the following URL.

In [33]:
rmadmin_stdout = !ansible hadoop_resourcemanager -s -U yarn -m shell -a 'timeout 15 yarn rmadmin -getServiceState $(hostname)' -l {target_group}
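# First token of every non-empty line: a host name from each header line, followed by its state.
# A list is needed here because .index() is not available on the result of map() in Python 3.
rmadmin_result = [l.split()[0] for l in rmadmin_stdout if len(l) > 0]
# The token just before "active" is the host that reported it
active_resourcemanager_host = rmadmin_result[rmadmin_result.index("active") - 1]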
print("http://%s:8088/cluster/apps" % active_resourcemanager_host)

http://XXX.XXX.XXX.70:8088/cluster/apps
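
The lookup above relies on the state appearing immediately after its host in a flat token list. A slightly more explicit variant is sketched below. It assumes the ad-hoc output has header lines of the form `host | SUCCESS | rc=0 >>` followed by a line containing only `active` or `standby`. `resourcemanager_states` is a hypothetical helper and the sample lines are hand-written.

```python
def resourcemanager_states(lines):
    """Map each host to the state printed by `yarn rmadmin -getServiceState`."""
    states = {}
    current_host = None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if '|' in text and text.endswith('>>'):
            # Header line such as "host | SUCCESS | rc=0 >>"
            current_host = text.split('|')[0].strip()
        elif current_host is not None and text in ('active', 'standby'):
            states[current_host] = text
            current_host = None
    return states


# Hand-written sample in the assumed output format.
sample = [
    'XXX.XXX.XXX.70 | SUCCESS | rc=0 >>',
    'active',
    '',
    'XXX.XXX.XXX.71 | SUCCESS | rc=0 >>',
    'standby',
]
states = resourcemanager_states(sample)
active = [host for host, state in states.items() if state == 'active']
print(active)
```

Returning a mapping also makes it possible to detect the case where no ResourceManager reports `active`, which the index-based lookup would turn into a `ValueError`.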


## Tez

In [34]:
!ansible-playbook -l {target_group} {playbook_dir}/conf_hdfs_tez.yml


PLAY [hadoop_namenode_primary] *************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.70

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : md5sum_rpm] ******************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : check_md5sum] ****************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : install_oraclejdk] ***********************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : includ

In [35]:
!ansible-playbook -l {target_group} {playbook_dir}/conf_tez.yml


PLAY [hadoop_tez] **************************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [tez : include] ***********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/tez/tasks/install.yml for XXX.XXX.XXX.72

TASK [tez : install_tez_packages] **********************************************
changed: [XXX.XXX.XXX.72] => (item=[u'tez'])

TASK [tez : check_tez_packages_on_HDFS] ****************************************
...ignoring

TASK [tez : copy_tez_packages_to_HDFS] *****************************************
changed: [XXX.XXX.XXX.72]

TASK [tez : change_owner_of_tez_packages] **************************************
changed: [XXX.XXX.XXX.72]

TASK [tez : change_mode_of_tez_packages] ***************************************
changed: [XXX.XXX.XXX.72]

TASK [tez : include] 

## HBase

### Installation

Install the Master.

In [36]:
!ansible-playbook -l {target_group} {playbook_dir}/install_hbase_master.yml


PLAY [hadoop_hbase_master] *****************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.70]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.70]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.70, XXX.XXX.XXX.71

TASK [base : create_hadoop_conf_dir] *******************************************
ok: [XXX.XXX.XXX.70]
ok: [XXX.XXX.XXX.71]

TASK [base : copy_conf_files] **************************************************

Install the RegionServers.

In [37]:
!ansible-playbook -l {target_group} {playbook_dir}/install_hbase_regionserver.yml


PLAY [hadoop_hbase_regionserver] ***********************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.114]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/repo.yml for XXX.XXX.XXX.73, XXX.XXX.XXX.112, XXX.XXX.XXX.113, XXX.XXX.XXX.114

TASK [base : install_hdp_repo] *************************************************
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.114]

TASK [base : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/base/tasks/conf.yml for XXX.XXX.XXX.73, XXX.XXX.XXX.112, XXX.XXX.XXX.113, XXX.XXX.XXX.114

TASK [base : create_hadoop_co

### Startup

Start the HBase Master.

In [38]:
!ansible-playbook -l {target_group} {playbook_dir}/start_hbase_master.yml


PLAY [hadoop_hbase_master] *****************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.71]
ok: [XXX.XXX.XXX.70]

TASK [start_hbase-master] ******************************************************
changed: [XXX.XXX.XXX.71]
changed: [XXX.XXX.XXX.70]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.70               : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.71               : ok=2    changed=1    unreachable=0    failed=0



Start the RegionServers.

In [39]:
!ansible-playbook -l {target_group} {playbook_dir}/start_hbase_regionserver.yml


PLAY [hadoop_hbase_regionserver] ***********************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.112]
ok: [XXX.XXX.XXX.73]
ok: [XXX.XXX.XXX.113]
ok: [XXX.XXX.XXX.114]

TASK [start_hbase-regionserver] ************************************************
changed: [XXX.XXX.XXX.112]
changed: [XXX.XXX.XXX.73]
changed: [XXX.XXX.XXX.113]
changed: [XXX.XXX.XXX.114]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.112              : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.113              : ok=2    changed=1    unreachable=0    failed=0
XXX.XXX.XXX.114              : ok=

Installation is now complete.
The cluster status can be checked in the HBase Master Web UI at the URL printed below.

In [40]:
hostname_stdout = !ansible -a hostname hadoop_hbase_master -l {target_group}
# Each '<address> | SUCCESS | rc=0 >>' header line is followed by that host's hostname.
hbase_masters = [(name, header.split()[0])
                 for header, name in zip(hostname_stdout, hostname_stdout[1:])
                 if 'SUCCESS' in header]
hbase_masters

[('testvm001', 'XXX.XXX.XXX.70'), ('testvm002', 'XXX.XXX.XXX.71')]
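The cell above pairs every `SUCCESS` header line with the line that follows it. The same pairing can be sketched as a small reusable function. `parse_hostnames` and the sample lines are hypothetical, and the sketch assumes that each successful host prints exactly one line of output and that the captured text carries no color codes.

```python
def parse_hostnames(lines):
    """Pair each ad hoc 'SUCCESS' header line with the output line after it.

    Returns a list of (hostname, address) tuples.
    """
    pairs = []
    for header, output in zip(lines, lines[1:]):
        if 'SUCCESS' in header:
            address = header.split()[0]
            pairs.append((output.strip(), address))
    return pairs


# Hypothetical sample in the shape of `ansible -a hostname` output.
sample = [
    '192.0.2.70 | SUCCESS | rc=0 >>',
    'testvm001',
    '',
    '192.0.2.71 | SUCCESS | rc=0 >>',
    'testvm002',
    '',
]
print(parse_hostnames(sample))
```

Keeping the parsing in a function makes it easy to try against saved output before pointing it at a live cluster.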

In [41]:
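from kazoo.client import KazooClient
# zknodes is expected to have been defined in an earlier cell of this notebook.
zk = KazooClient(hosts='%s:2181' % zknodes, read_only=True)
zk.start()
try:
    # The /hbase/master znode data contains the hostname of the active HBase Master.
    (master_result, v) = zk.get("/hbase/master")
finally:
    zk.stop()
# Compare as bytes so that the lookup works on both Python 2 and Python 3.
hbase_master_host = [m for m in hbase_masters if m[0].encode('utf-8') in master_result][0][1]
print("http://%s:60010" % hbase_master_host)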

http://XXX.XXX.XXX.71:60010
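The lookup above finds the active Master by searching the raw znode data for each known hostname. A minimal sketch of that matching step on its own, without a ZooKeeper connection: `pick_active_master` and the payload are hypothetical, and the real znode content is a binary record whose exact layout is not assumed here. Note that a plain substring match would misfire if one hostname were a prefix of another.

```python
def pick_active_master(znode_data, masters):
    """Return the address of the first master whose hostname occurs in znode_data.

    znode_data: raw bytes read from the /hbase/master znode.
    masters: list of (hostname, address) tuples.
    """
    for hostname, address in masters:
        if hostname.encode('utf-8') in znode_data:
            return address
    return None


# Hypothetical payload standing in for the binary znode content.
payload = b'\x00\x00testvm002\x00'
masters = [('testvm001', '192.0.2.70'), ('testvm002', '192.0.2.71')]
print('http://%s:60010' % pick_active_master(payload, masters))
```

Returning `None` when nothing matches lets the caller decide how to report a cluster with no active Master.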


## Hive

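Install Hive.

*2016/08/06*

When only this Notebook is run, the permissions on /hadoop/ are too strict (only the hdfs user can access it), so the metastore cannot be created. The cause is that the disk-mount prerequisite was not applied, because these machines are VMs.

As a stopgap, run the following tasks.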

In [42]:
!ansible -b -m file -a 'path=/hadoop state=directory owner=root group=root mode=0755' hadoop_hive -l {target_group}
!ansible -b -m file -a 'path=/hadoop/data state=directory owner=root group=root mode=0755' hadoop_hive -l {target_group}

XXX.XXX.XXX.72 | SUCCESS => {
    "changed": true,
    "gid": 0,
    "group": "root",
    "mode": "0755",
    "owner": "root",
    "path": "/hadoop",
    "size": 4096,
    "state": "directory",
    "uid": 0
}
XXX.XXX.XXX.72 | SUCCESS => {
    "changed": true,
    "gid": 0,
    "group": "root",
    "mode": "0755",
    "owner": "root",
    "path": "/hadoop/data",
    "size": 4096,
    "state": "directory",
    "uid": 0
}
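To see why mode `0755` resolves the access problem, the permission bits can be rendered the way `ls -l` shows them. This is only an illustrative sketch: `describe_mode` and `others_can_traverse` are hypothetical helpers, and `0o700` is an assumed example of an owner-only directory, since the original mode of /hadoop is not recorded in this notebook.

```python
import stat


def describe_mode(mode_bits):
    """Render the permission bits of a directory in `ls -l` style."""
    return stat.filemode(stat.S_IFDIR | mode_bits)


def others_can_traverse(mode_bits):
    """True if users other than the owner and group may enter the directory."""
    return bool(mode_bits & stat.S_IXOTH)


print(describe_mode(0o755), others_can_traverse(0o755))  # mode applied by the tasks above
print(describe_mode(0o700), others_can_traverse(0o700))  # assumed owner-only example
```

With `0755`, every user can traverse the directory, which is what a service running under another account needs in order to create its own subdirectory below it.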


In [43]:
!ansible-playbook -l {target_group} {playbook_dir}/install_hive.yml


PLAY [hadoop_hive] *************************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [hive : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/hive/tasks/install.yml for XXX.XXX.XXX.72

TASK [hive : install_hive_packages] ********************************************
changed: [XXX.XXX.XXX.72] => (item=[u'hive'])

TASK [hive : include] **********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/hive/tasks/config.yml for XXX.XXX.XXX.72

TASK [hive : create_hive_metastore_dir] ****************************************
changed: [XXX.XXX.XXX.72]

TASK [hive : create_hive_conf_dir] *********************************************
ok: [XXX.XXX.XXX.72]

TASK [hive : include] ********************************************

OK!
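Judging a run as OK by eye means scanning the `PLAY RECAP` lines for `unreachable=0` and `failed=0`. A rough sketch of doing that check in code, assuming the playbook output was captured into a list of lines and that recap lines have the `host : ok=N changed=N unreachable=N failed=N` shape seen elsewhere in this notebook. `failed_hosts` and the sample data are hypothetical.

```python
import re

# Matches one PLAY RECAP line after color codes have been removed.
RECAP_RE = re.compile(
    r'^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)'
    r'\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)'
)
# Matches ANSI color sequences such as ESC[0;32m.
ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')


def failed_hosts(lines):
    """Return hosts whose recap line reports unreachable or failed tasks."""
    bad = []
    for line in lines:
        match = RECAP_RE.match(ANSI_RE.sub('', line).strip())
        if match and (int(match.group('unreachable')) or int(match.group('failed'))):
            bad.append(match.group('host'))
    return bad


# Hypothetical recap: the second host has one failed task.
sample = [
    'PLAY RECAP ****',
    '192.0.2.70   : ok=2    changed=1    unreachable=0    failed=0   ',
    '192.0.2.71   : ok=1    changed=0    unreachable=0    failed=1   ',
]
print(failed_hosts(sample))
```

An empty list corresponds to the "OK" judgement; anything else names the hosts to look at.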

## Installing and starting Spark

Install Spark.

### Preparing HDFS

In [44]:
!ansible-playbook -l {target_group} {playbook_dir}/conf_hdfs_spark.yml


PLAY [hadoop_namenode_primary] *************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.70

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.70]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : md5sum_rpm] ******************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : check_md5sum] ****************************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : install_oraclejdk] ***********************************************
skipping: [XXX.XXX.XXX.70]

TASK [java7 : includ

### Installing Spark

In [45]:
!ansible-playbook -l {target_group} {playbook_dir}/install_spark.yml


PLAY [hadoop_spark] ************************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.72

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.72]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : md5sum_rpm] ******************************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : check_md5sum] ****************************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : install_oraclejdk] ***********************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : includ

### Installing the Spark HistoryServer

In [46]:
!ansible-playbook -l {target_group} {playbook_dir}/install_spark_historyserver.yml


PLAY [hadoop_spark_history] ****************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [java7 : include] *********************************************************
included: /tmp/tmpaDuGp8/hadoop/playbooks/roles/java7/tasks/install.yml for XXX.XXX.XXX.72

TASK [java7 : check_jdk7_installed] ********************************************
ok: [XXX.XXX.XXX.72]

TASK [java7 : download_oraclejdk7_by_wget] *************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : md5sum_rpm] ******************************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : check_md5sum] ****************************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : install_oraclejdk] ***********************************************
skipping: [XXX.XXX.XXX.72]

TASK [java7 : includ

### Starting the Spark HistoryServer

In [47]:
!ansible-playbook -l {target_group} {playbook_dir}/start_spark_historyserver.yml


PLAY [hadoop_spark_history] ****************************************************

TASK [setup] *******************************************************************
ok: [XXX.XXX.XXX.72]

TASK [check_status_spark_history_server] ***************************************
ok: [XXX.XXX.XXX.72]

TASK [start_spark_history_server] **********************************************
changed: [XXX.XXX.XXX.72]

PLAY RECAP *********************************************************************
XXX.XXX.XXX.72               : ok=3    changed=1    unreachable=0    failed=0



Spark is now ready. The Spark execution history can be viewed at the following URL.

In [48]:
sparkhistory_stdout = !ansible hadoop_spark_history -m ping -l {target_group}
# The first token of each 'SUCCESS' line is the address of a responding node.
sparkhistory_nodes = [l.split()[0] for l in sparkhistory_stdout if 'SUCCESS' in l]
print('http://%s:18080/' % sparkhistory_nodes[0])

http://XXX.XXX.XXX.72:18080/


# Post-installation check

How much disk space is in use?

In [49]:
!ansible -a "df -H" {target_group}

XXX.XXX.XXX.72 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       106G  2.8G   98G   3% /
tmpfs           5.2G     0  5.2G   0% /dev/shm

XXX.XXX.XXX.71 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       106G  2.4G   98G   3% /
tmpfs           5.2G     0  5.2G   0% /dev/shm

XXX.XXX.XXX.70 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       106G  2.4G   98G   3% /
tmpfs           5.2G     0  5.2G   0% /dev/shm

XXX.XXX.XXX.113 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       106G  2.4G   98G   3% /
tmpfs           5.2G     0  5.2G   0% /dev/shm

XXX.XXX.XXX.112 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       106G  2.5G   98G   3% /
tmpfs           5.2G     0  5.2G   0% /dev/shm

XXX.XXX.XXX.73 | SUCCESS | rc=0 >>
Filesystem      Size  Used Avail Use% Mou
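On a larger cluster, reading `df` output host by host gets tedious. The following is a minimal sketch of summarizing it, assuming the output was captured into a list of uncolored lines in the `host | SUCCESS | rc=0 >>` shape. `parse_df`, `over_threshold`, and the threshold value are hypothetical.

```python
def parse_df(lines):
    """Group `df` rows by host from ad hoc command output.

    Returns {host: [(mount_point, use_percent), ...]}.
    """
    usage = {}
    host = None
    for line in lines:
        fields = line.split()
        if '| SUCCESS |' in line:
            host = fields[0]
            usage[host] = []
        elif host and len(fields) == 6 and fields[4].endswith('%'):
            # Data rows have six columns; the header row splits into seven.
            usage[host].append((fields[5], int(fields[4].rstrip('%'))))
    return usage


def over_threshold(usage, limit=80):
    """List (host, mount_point, use_percent) entries above the limit."""
    return [(host, mount, pct)
            for host, rows in sorted(usage.items())
            for mount, pct in rows
            if pct > limit]


sample = [
    '192.0.2.72 | SUCCESS | rc=0 >>',
    'Filesystem      Size  Used Avail Use% Mounted on',
    '/dev/vda1       106G  2.8G   98G   3% /',
    'tmpfs           5.2G     0  5.2G   0% /dev/shm',
    '',
]
usage = parse_df(sample)
print(usage)
print(over_threshold(usage, limit=2))
```

The sketch assumes mount points without spaces, which holds for the filesystems listed here.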

# Verification

## Verifying HDFS

On the virtual machine designated as `hadoop_client`, run a command such as `hdfs dfs -ls /`.

(The commands below are run through Ansible, but running them over SSH works as well.)

In [50]:
!ansible -a "hdfs dfs -ls /" -l {target_group} hadoop_client

XXX.XXX.XXX.72 | SUCCESS | rc=0 >>
Found 6 items
drwxr-xr-x   - hdfs   supergroup          0 2016-08-26 12:16 /apps
drwxr-xr-x   - hbase  supergroup          0 2016-08-26 12:20 /hbase
drwxr-xr-x   - mapred hadoop              0 2016-08-26 12:10 /mapred
drwxrwxrwt   - hdfs   hadoop              0 2016-08-26 12:10 /tmp
drwxrwxrwt   - hdfs   hadoop              0 2016-08-26 12:05 /user
drwxrwxrwt   - hdfs   hadoop              0 2016-08-26 12:09 /var


The same listing can also be viewed under *Browsing HDFS* in the NameNode Web UI.

In [51]:
haadmin_stdout = !ansible hadoop_namenode -l {target_group} -s -U hdfs -m shell -a 'timeout 15 hdfs haadmin -getServiceState $(hostname)'
# Keep the first token of each non-empty line: a host address, followed by its state.
haadmin_result = [line.split()[0] for line in haadmin_stdout if len(line) > 0]
# The address of the active NameNode sits immediately before the "active" token.
print("http://%s:50070/explorer.html" % haadmin_result[haadmin_result.index("active") - 1])

http://XXX.XXX.XXX.70:50070/explorer.html
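The lookup above relies on the address appearing immediately before the word `active` in a flattened list of tokens. A sketch that tracks the current host explicitly instead, which may be easier to reason about when a host prints extra lines: `find_active_host` and the sample lines are hypothetical, and the sketch assumes uncolored output in the `host | SUCCESS | rc=0 >>` shape.

```python
def find_active_host(lines):
    """Return the address of the host whose output line is exactly 'active'."""
    host = None
    for line in lines:
        text = line.strip()
        if '| SUCCESS |' in text:
            host = text.split()[0]
        elif text == 'active' and host is not None:
            return host
    return None


# Hypothetical sample in the shape of the -getServiceState output.
sample = [
    '192.0.2.70 | SUCCESS | rc=0 >>',
    'active',
    '',
    '192.0.2.71 | SUCCESS | rc=0 >>',
    'standby',
    '',
]
print('http://%s:50070/explorer.html' % find_active_host(sample))
```

Because the function returns `None` when no host reports `active`, the caller can distinguish "no active node" from a parsing error.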


## Verifying MapReduce

Run a sample job.
On `hadoop_client`, run a command such as `yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 10 1000`. Note that, because of how the YARN permissions are defined, the job fails unless it is run as the yarn user via sudo.

> For details, see [T12b_Hadoop - Simple YARN job for Test](T12b_Hadoop - Simple YARN job for Test.ipynb).

In [52]:
!ansible -a "yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 10 1000" --sudo --sudo-user yarn -l {target_group} hadoop_client

XXX.XXX.XXX.72 | SUCCESS | rc=0 >>
Number of Maps  = 10
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
Job Finished in 38.543 seconds
Estimated value of Pi is 3.14080000000000000000
16/08/26 12:27:14 INFO impl.TimelineClientImpl: Timeline service address: http://testvm003:8188/ws/v1/timeline/
16/08/26 12:27:15 INFO input.FileInputFormat: Total input paths to process : 10
16/08/26 12:27:15 INFO mapreduce.JobSubmitter: number of splits:10
16/08/26 12:27:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1472181274763_0001
16/08/26 12:27:16 INFO client.YARNRunner: Number of stages: 2
16/08/26 12:27:17 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=XXX.XXX.XXX.2.4.2.0-258, revision=fa554fdce4e3495e09a310e0a32bb34
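When stdout and stderr are captured together, a log timestamp can end up glued to the end of the result line. A small sketch of pulling the estimate out of such text, assuming the value is printed with exactly 20 decimal places. `extract_pi` and the sample string are hypothetical.

```python
import re

# Assumption: the estimate is printed with exactly 20 decimal places,
# so any digits after that belong to whatever text follows it.
PI_RE = re.compile(r'Estimated value of Pi is (\d+\.\d{20})')


def extract_pi(text):
    """Return the Pi estimate as a float, or None if it is absent."""
    match = PI_RE.search(text)
    return float(match.group(1)) if match else None


# Hypothetical captured line with a log timestamp glued to the result.
fused = 'Estimated value of Pi is 3.1408' + '0' * 16 + '16/08/26 12:27:14 INFO ...'
print(extract_pi(fused))
```

Fixing the number of decimals in the pattern is what keeps the leading digits of the timestamp out of the parsed value.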

Information about MapReduce job execution is available in the ResourceManager Web UI.

In [53]:
rmadmin_stdout = !ansible hadoop_resourcemanager -l {target_group} -s -U yarn -m shell -a 'timeout 15 yarn rmadmin -getServiceState $(hostname)'
# Keep the first token of each non-empty line: a host address, followed by its state.
rmadmin_result = [line.split()[0] for line in rmadmin_stdout if len(line) > 0]
active_resourcemanager_host = rmadmin_result[rmadmin_result.index("active") - 1]
print("http://%s:8088/cluster/apps" % active_resourcemanager_host)

http://XXX.XXX.XXX.70:8088/cluster/apps


## Verifying HBase

The HBase Shell command is installed on the HBase Master. Here we only list the tables.

> For details, see [T12c_Hadoop - Simple HBase query for Test](T12c_Hadoop - Simple HBase query for Test.ipynb).

In [59]:
!ansible -m shell -a "echo 'list;' | hbase shell" -l {target_group} {hbase_master_host}

XXX.XXX.XXX.71 | SUCCESS | rc=0 >>
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version XXX.XXX.XXX.2.4.2.0-258, rUnknown, Mon Apr 25 06:36:21 UTC 2016

list;
TABLE
0 row(s) in 0.2920 seconds

[]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/XXX.XXX.XXX.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/XXX.XXX.XXX.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]


## Verifying Hive

Hive is installed on hadoop_client (see the output for its IP address).
Here we only display the help text.

> For details, see [T13b_Hadoop - Simple Hivemall query for Test](T13b_Hadoop - Simple Hivemall query for Test.ipynb).

In [57]:
!ansible -a "hive --help" -l {target_group} --sudo --sudo-user yarn hadoop_client

XXX.XXX.XXX.72 | SUCCESS | rc=0 >>
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cleardanglingscratchdir cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
  --auxpath : Auxillary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help:  ./hive --debug --help


## Verifying Spark

The Spark client is installed on hadoop_client (see the output for its IP address).
Here we only display the help text.

> For details, see [T12d_Hadoop - Simple Spark script for Test](T12d_Hadoop - Simple Spark script for Test.ipynb).

In [60]:
!ansible -a 'spark-submit --help' hadoop_client -l {target_group}

XXX.XXX.XXX.72 | SUCCESS | rc=0 >>
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the 

# Cleanup

Delete the temporary directory.

In [61]:
!rm -fr {work_dir}
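The same cleanup can be done from Python rather than through the shell, which avoids interpolating a path into an `rm` command. A minimal sketch, using a freshly created temporary directory as a stand-in for the notebook's `work_dir`; the file name is hypothetical.

```python
import os
import shutil
import tempfile

# Stand-in for the notebook's work_dir, created the same way with mkdtemp().
work_dir = tempfile.mkdtemp()
open(os.path.join(work_dir, 'example.txt'), 'w').close()

# Remove the directory tree; ignore_errors mirrors the forgiving behaviour of `rm -f`.
shutil.rmtree(work_dir, ignore_errors=True)
print(os.path.exists(work_dir))
```

Because `ignore_errors=True` suppresses failures, checking `os.path.exists` afterwards is a cheap way to confirm that the directory is really gone.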