## Launching a Hadoop cluster

We look at how to launch a full Hadoop cluster on CloudLab.

In this lab, we are going to modify the profile of Lab 1 to contain:
- 4 VM nodes by default (one NameNode plus three DataNodes, set by the workerCount parameter), named *node0* through *node3*
- Each node has a unique internal IP address, ranging from 192.168.1.1 to 192.168.1.4
- Each node will download the hadoop-3.0.0.tar.gz file, untar it into /opt (creating /opt/hadoop-3.0.0), copy the master, slaves (installed as workers), and core-site.xml files from the repository into /opt/hadoop-3.0.0/etc/hadoop, and install the default JDK
- *node0* has a public IP address and launches the namenode daemon for the Hadoop cluster
- *node1* through *node3* will launch the datanode daemon 
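The layout above follows a simple numbering rule: node *i* gets the address 192.168.1.(*i*+1), and only *node0* takes the NameNode role. As a quick sanity check of that rule, the sketch below reproduces the plan in plain Python without geni-lib. `plan_cluster` and `NodePlan` are hypothetical helpers for illustration only, not part of the profile.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodePlan:
    name: str  # short hostname, e.g. "node0"
    ip: str    # address on the experiment LAN
    role: str  # "namenode" or "datanode"


def plan_cluster(worker_count: int = 3, subnet: str = "192.168.1") -> List[NodePlan]:
    """Node 0 is the NameNode, the rest are DataNodes; node i gets <subnet>.(i + 1)."""
    plan = []
    for i in range(worker_count + 1):
        role = "namenode" if i == 0 else "datanode"
        plan.append(NodePlan(name=f"node{i}", ip=f"{subnet}.{i + 1}", role=role))
    return plan


for node in plan_cluster():
    print(node.name, node.ip, node.role)
```

With the default of three workers this lists node0 (192.168.1.1) through node3 (192.168.1.4), matching the range given in the list.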

In [None]:
%%writefile examples/profile.py
import geni.portal as portal
import geni.rspec.pg as rspec
import geni.rspec.igext as IG

pc = portal.Context()
request = rspec.Request()

pc.defineParameter("workerCount",
                   "Number of Hadoop DataNodes",
                   portal.ParameterType.INTEGER, 3)

pc.defineParameter("controllerHost", "Name of NameNode",
                   portal.ParameterType.STRING, "node0", advanced=True,
                   longDescription="The short name of the Hadoop NameNode.  You should leave \
                   this alone unless you really want the hostname to change.")

params = pc.bindParameters()

tourDescription = "This profile provides a configurable Hadoop testbed with one NameNode \
and a customizable number of DataNodes."

tourInstructions = \
  """
### Basic Instructions
Once your experiment nodes have booted, and this profile's configuration scripts \
have finished deploying Hadoop inside your experiment, you'll be able to visit 
[the HDFS Web UI](http://{host-%s}:9870) (approx. 5-15 minutes).  
""" % (params.controllerHost)

#
# Setup the Tour info with the above description and instructions.
#  
tour = IG.Tour()
tour.Description(IG.Tour.TEXT, tourDescription)
tour.Instructions(IG.Tour.MARKDOWN, tourInstructions)
request.addTour(tour)

# Create a link with type LAN
link = request.LAN("lan")

# Generate the nodes
for i in range(params.workerCount + 1):
    node = request.XenVM("node" + str(i))
    node.disk_image = "urn:publicid:IDN+emulab.net+image+emulab-ops:UBUNTU16-64-STD"
    iface = node.addInterface("if" + str(i))
    iface.component_id = "eth1"
    iface.addAddress(rspec.IPv4Address("192.168.1." + str(i + 1), "255.255.255.0"))
    link.addInterface(iface)
    
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo wget http://apache.cs.utah.edu/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz"))
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo tar xzf hadoop-3.0.0.tar.gz -C /opt/"))
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo cp /local/repository/master /opt/hadoop-3.0.0/etc/hadoop/"))
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo cp /local/repository/slaves /opt/hadoop-3.0.0/etc/hadoop/workers"))
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo cp /local/repository/core-site.xml /opt/hadoop-3.0.0/etc/hadoop/core-site.xml"))
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo apt-get update -y"))
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="sudo apt-get install -y default-jdk"))
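    # Each Execute service runs in its own shell, so a bare "export" would not persist;
    # record JAVA_HOME in hadoop-env.sh, which the Hadoop launch scripts read on every start.
    node.addService(rspec.Execute(shell="/bin/sh",
                                  command="echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/' | sudo tee -a /opt/hadoop-3.0.0/etc/hadoop/hadoop-env.sh"))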
    if i != 0:
        node.addService(rspec.Execute(shell="/bin/sh",
                                      command="sudo sleep 30"))
        node.addService(rspec.Execute(shell="/bin/sh",
                                      command="sudo /opt/hadoop-3.0.0/bin/hdfs --daemon start datanode"))
    else:
        node.routable_control_ip = True
        node.addService(rspec.Execute(shell="/bin/sh",
                                      command="sudo /opt/hadoop-3.0.0/bin/hdfs namenode -format PEARC18"))
        node.addService(rspec.Execute(shell="/bin/sh",
                                      command="sudo /opt/hadoop-3.0.0/bin/hdfs --daemon start namenode"))

# Print the RSpec to the enclosing page.
portal.context.printRequestRSpec(request)
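
Each `rspec.Execute` service above is a separate command run under `/bin/sh` (the `shell` argument), so shell state such as an exported variable does not carry over from one service to the next. The snippet below demonstrates this shell behaviour locally with Python's `subprocess` module, assuming a POSIX `/bin/sh` is available. It illustrates the shell semantics only, not how CloudLab itself schedules the services, and `DEMO_HADOOP_VAR` is a made-up variable name.

```python
import subprocess


def run_sh(command: str) -> str:
    """Run one command in a fresh /bin/sh process and return its stdout."""
    result = subprocess.run(["/bin/sh", "-c", command],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Two separate shells: the export made by the first is gone in the second.
run_sh("export DEMO_HADOOP_VAR=/usr/lib/jvm/java-8-openjdk-amd64/")
print(repr(run_sh('echo "$DEMO_HADOOP_VAR"')))  # prints '\n'

# One shell: the export is visible to the command that follows it.
print(repr(run_sh('export DEMO_HADOOP_VAR=/opt; echo "$DEMO_HADOOP_VAR"')))  # prints '/opt\n'
```

This is why a setting such as JAVA_HOME has to live somewhere the Hadoop scripts read on every start (for example etc/hadoop/hadoop-env.sh), or be set in the same command that launches the daemon.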

Validate the profile by running it locally; it should print the generated request RSpec.

In [None]:
!python examples/profile.py
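Reading the printed XML by eye gets tedious as the worker count grows. A minimal sketch for summarizing it, assuming the output is GENI RSpec v3 (namespace `http://www.geni.net/resources/rspec/3`) in which each `<node>` carries a `client_id` attribute and each address appears as an `<ip address=...>` element inside an `<interface>`. `summarize_rspec` is a hypothetical helper, and `SAMPLE` is a hand-written, trimmed document in that assumed shape, not captured profile output.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed RSpec v3 namespace for request RSpecs.
NS = {"r": "http://www.geni.net/resources/rspec/3"}


def summarize_rspec(xml_text: str) -> Dict[str, List[str]]:
    """Map each node's client_id to the IPv4 addresses on its interfaces."""
    root = ET.fromstring(xml_text)
    summary = {}
    for node in root.findall("r:node", NS):
        addrs = [ip.get("address") for ip in node.findall("r:interface/r:ip", NS)]
        summary[node.get("client_id")] = addrs
    return summary


# Hand-written sample in the assumed shape (not real profile output).
SAMPLE = """<rspec xmlns="http://www.geni.net/resources/rspec/3" type="request">
  <node client_id="node0">
    <interface client_id="if0">
      <ip address="192.168.1.1" netmask="255.255.255.0" type="ipv4"/>
    </interface>
  </node>
  <node client_id="node1">
    <interface client_id="if1">
      <ip address="192.168.1.2" netmask="255.255.255.0" type="ipv4"/>
    </interface>
  </node>
</rspec>"""

print(summarize_rspec(SAMPLE))
```

The same function could be applied to the captured stdout of the validation cell; with the default parameters you would expect four entries, node0 through node3.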

Once an experiment has been instantiated from this profile and Hadoop has finished deploying, the profile instructions will include a link to the HDFS web UI, served by *node0* on port 9870.

<img src="figures/lab2/hadoop-profile.png"/>

<img src="figures/lab2/hadoop-webui.png"/>
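Besides the web page, the NameNode exposes its metrics as JSON through the `/jmx` servlet on the same port (9870 in Hadoop 3), which is handy for checking from a script that all DataNodes have registered. A minimal sketch, assuming the `Hadoop:service=NameNode,name=FSNamesystemState` bean reports a `NumLiveDataNodes` field in your Hadoop version. `fetch_jmx` and `live_datanodes` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Bean assumed to carry the DataNode counts (verify against your Hadoop version).
BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def fetch_jmx(host: str, port: int = 9870, timeout: float = 10.0) -> str:
    """Fetch the FSNamesystemState bean from the NameNode's /jmx servlet as JSON text."""
    url = f"http://{host}:{port}/jmx?qry={urllib.parse.quote(BEAN)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def live_datanodes(jmx_json: str) -> int:
    """Extract NumLiveDataNodes from a /jmx response body."""
    for bean in json.loads(jmx_json)["beans"]:
        if bean.get("name") == BEAN:
            return int(bean["NumLiveDataNodes"])
    raise KeyError(f"{BEAN} not found in JMX response")
```

Usage would be `live_datanodes(fetch_jmx("<public hostname of node0>"))`; with the default profile, a result of 3 would suggest every DataNode has joined.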

**Challenge**

Update the profile so that YARN is also deployed, with the *ResourceManager* running on *node0* and a *NodeManager* running on each of the remaining nodes.
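One possible starting point, as a sketch rather than a full solution: in Hadoop 3 the `yarn` launcher accepts the same `--daemon start` form the profile already uses for `hdfs`, so the extra per-node commands could be generated as below. The helper `yarn_commands` is hypothetical, and the `/local/repository/yarn-site.xml` file is an assumption: you would have to add it yourself, with at least `yarn.resourcemanager.hostname` pointing at *node0*.

```python
from typing import List

HADOOP = "/opt/hadoop-3.0.0"  # install prefix used by the profile above


def yarn_commands(i: int) -> List[str]:
    """Shell commands that would start YARN on node i (node0 = ResourceManager)."""
    cmds = [
        # Hypothetical repository file; it must name node0 as the ResourceManager host.
        f"sudo cp /local/repository/yarn-site.xml {HADOOP}/etc/hadoop/yarn-site.xml",
    ]
    if i == 0:
        cmds.append(f"sudo {HADOOP}/bin/yarn --daemon start resourcemanager")
    else:
        # Give the ResourceManager time to come up, mirroring the DataNode delay.
        cmds.append("sleep 30")
        cmds.append(f"sudo {HADOOP}/bin/yarn --daemon start nodemanager")
    return cmds


for i in range(4):
    print(f"node{i}:", yarn_commands(i))
```

Inside the profile's loop, each string would become `node.addService(rspec.Execute(shell="/bin/sh", command=cmd))`, appended after the HDFS services.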
