# Hadoop & Yarn

JuliaCon 2015 Workshop

- _This notebook: https://github.com/tanmaykm/juliacon_
- _Code: https://gist.github.com/tanmaykm/ec0f34cd74813dd2547a_

## Installing HDFS / Yarn
Set up a toy cluster with Docker containers:
- Install Docker (https://www.docker.com/)
- Get the JuliaDockerImages repository (https://github.com/tanmaykm/JuliaDockerImages/tree/master/pkgdists/hadoop)
    - `git clone https://github.com/tanmaykm/JuliaDockerImages.git`

- Download the prebuilt image
    - `docker pull julialang/hadoop:v0.4.0_build5`
    - `docker tag julialang/hadoop:v0.4.0_build5 julialang/hadoop:latest`
- or build the images locally
    - `docker build -t julialang/julia:v0.4.0 JuliaDockerImages/base/v0.4`
    - `docker build -t julialang/hadoop:v0.4.0_build5 JuliaDockerImages/pkgdists/hadoop`
    - `docker tag julialang/hadoop:v0.4.0_build5 julialang/hadoop:latest`

- Start the cluster
    - `cd JuliaDockerImages/pkgdists/hadoop`
    - `./cluster.sh start 5`
    - this creates the file `id_rsa` (the SSH key file for connecting to the cluster)
- Remember to stop the cluster after the workshop!
    - `./cluster.sh stop 5`

In [None]:
;docker ps -a

## TCP Ports Exposed
- SSH: 22
- Yarn:
    - Resource Manager: 8032
    - Scheduler: 8030
- HDFS:
    - HDFS Client: 9000
    - DFS Browser: 50070

## Figure out the connections
- SSH into the master node (using the `id_rsa` key file) and note its IP address
- Browse the HDFS datastore:
    - Open http://[master]:50070/
    - Replace [master] with the IP of the master node
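
As an alternative to logging in over SSH, the master's IP address can often be read from the docker host itself. The sketch below is only illustrative: the container name `hadoop_master` and the helper names `inspect_ip_cmd` and `dfs_browser_url` are hypothetical (check `docker ps -a` for the real container name), and it assumes the container sits on docker's default bridge network, where `docker inspect` reports the address under `.NetworkSettings.IPAddress`.

```julia
# Hypothetical container name: replace with the master's name from `docker ps -a`.
const MASTER_CONTAINER = "hadoop_master"

# Build the `docker inspect` command that prints a container's IP address.
# The Go-template string is interpolated so it reaches docker as one argument.
function inspect_ip_cmd(container)
    fmt = "{{ .NetworkSettings.IPAddress }}"
    return `docker inspect -f $fmt $container`
end

# URL of the DFS browser on the master node (port 50070, as listed above).
dfs_browser_url(ip, port=50070) = "http://$(ip):$(port)/"

# On the docker host, with the cluster running:
# master_ip = readchomp(inspect_ip_cmd(MASTER_CONTAINER))
# println(dfs_browser_url(master_ip))
```

The two commented lines at the end are the part that actually talks to docker; they are left commented out so that the cell can be evaluated without a running cluster.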

In [None]:
using Elly

# IP address of the master node (replace with the address of your own cluster)
MASTER_IP = "172.17.0.58";

# HDFS Client: 9000
HDFS_PORT = 9000;
# Yarn Resource Manager: 8032
YARNRM_PORT = 8032;
# Yarn Scheduler: 8030
YARNSCHED_PORT = 8030;

In [None]:
h = HDFSClient(MASTER_IP, HDFS_PORT)

In [None]:
fs_status = hdfs_status(h)

In [None]:
# create a folder, write and read a file
mkdir(h, "test")
cd(h, "test")

# write a file
hfile = HDFSFile(h, "testfile.txt")
open(hfile, "w") do fhandle
    println(fhandle, "hello world")
end

# read the file back and print its contents
open(hfile, "r") do fhandle
    bytes = Array(UInt8, filesize(fhandle))
    read!(fhandle, bytes)
    println(bytestring(bytes))
end

In [None]:
# return to the root, then delete the folder and the file we created
cd(h, "/")
rm(h, "test", true)