
mnmami/Training


Big Data Hackathon: Deploy a Spark Standalone cluster on one machine using Vagrant

STEP 1: Prepare the base setup

STEP 2: Get and configure the environment

  • Open a terminal, create a folder for your Vagrant project, then navigate to it:
mkdir myvagrant
cd myvagrant
  • Create a file called Vagrantfile with the following content:
Vagrant.configure("2") do |config|
  config.vm.provision "shell", inline: "echo Hello there"
  # config.ssh.insert_key = false

  # Spark master: bridged onto your LAN with a fixed IP; forwards the
  # application UI (4040) and the master web UI (8080) to the host.
  config.vm.define "master" do |master|
    master.vm.box = "ubuntu/xenial64"
    master.vm.network "public_network", ip: "192.168.0.10"
    master.vm.network "forwarded_port", guest: 4040, host: 4040
    master.vm.network "forwarded_port", guest: 8080, host: 8080
    master.vm.hostname = "ubuntu1"
  end

  # Spark slave (worker): bridged with its own fixed IP; forwards the
  # worker web UI (8081) to the host.
  config.vm.define "slave" do |slave|
    slave.vm.box = "ubuntu/xenial64"
    slave.vm.network "public_network", ip: "192.168.0.11"
    slave.vm.network "forwarded_port", guest: 8081, host: 8081
    slave.vm.hostname = "ubuntu2"
  end
end
  • Windows users:
    • Uncomment the third line, # config.ssh.insert_key = false
    • Do not use sudo in any of the commands in this step
  • Then run sudo vagrant up and wait a few minutes while both boxes are created.
    • If you are asked which network interface to bridge to, select the one you are connected through. For example, on a wired Ethernet connection, common names are eth0 or em0. To be sure, run ifconfig and pick the interface that shows your current IP address.
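Before moving on, you can check that both boxes actually came up. The sketch below is a hypothetical helper (all_running is not part of Vagrant); it assumes the CSV layout of vagrant status --machine-readable, where each machine gets a row of the form timestamp,name,state,value, and it succeeds only if every listed machine reports running.

```shell
# Hypothetical helper: reads `vagrant status --machine-readable` output on stdin
# and exits 0 only if at least one machine is listed and every one is "running".
# Assumed row format: timestamp,machine-name,state,value
all_running() {
  awk -F, '$3 == "state" { n++; if ($4 != "running") bad++ }
           END { exit !(n > 0 && bad == 0) }'
}

# Usage on the host, from the myvagrant folder:
#   sudo vagrant status --machine-readable | all_running && echo "master and slave are up"
```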

STEP 3: Connect (SSH) to the Master and Slave boxes

Once STEP 2 has completed successfully, we have two Ubuntu 16.04 boxes (guest virtual machines) connected to each other over a (public) network. One will serve as the Apache Spark master, the other as the slave. We also forwarded ports 4040, 8080 and 8081 to the host machine (the one running Vagrant), so the Spark web interfaces can be opened from the host: 8080 for the master, 8081 for the slave, and 4040 for a running application such as the Spark shell.
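Once Spark is running (STEPS 6 and 7), you can check the forwarded ports from the host. The helper below is a hypothetical sketch that assumes curl is installed on the host; it only reports whether something answers over HTTP on each port, not whether it is really a Spark UI. Port 4040 only answers while a Spark shell is open.

```shell
# Hypothetical helper: for each port forwarded in the Vagrantfile, report
# whether anything answers over HTTP on the given host (default: localhost).
# Assumes curl is installed; a missing curl is reported as "down" too.
check_spark_uis() {
  host=${1:-localhost}
  for port in 8080 8081 4040; do
    if curl -s -o /dev/null --max-time 3 "http://$host:$port/"; then
      echo "$port up"
    else
      echo "$port down"
    fi
  done
}

# Usage on the host:
#   check_spark_uis
# then open http://localhost:8080 in a browser to see the master UI.
```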

  • Now, SSH to the master with sudo vagrant ssh master, then open another terminal and SSH to the slave with sudo vagrant ssh slave. From here on, you are working inside the Ubuntu boxes.
  • In both boxes, refresh the package index: sudo apt-get update
  • If the command hangs with the message [Connecting to archive.ubuntu.com (2001:67c:1360:8c01::1a)], the box is trying to reach the mirror over IPv6; disable IPv6 by following the steps here: https://askubuntu.com/questions/440649/how-to-disable-ipv6-in-ubuntu-14-04
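One common way to apply that fix is to disable IPv6 through sysctl settings. The helper below is a hypothetical sketch that only prints the three settings (all interfaces, the default for interfaces created later, and loopback); appending them to /etc/sysctl.conf and reloading needs root, as shown in the usage comment.

```shell
# Hypothetical helper: print sysctl settings that disable IPv6 on all
# interfaces, on interfaces created later (default) and on loopback (lo).
ipv6_off_settings() {
  for scope in all default lo; do
    echo "net.ipv6.conf.$scope.disable_ipv6 = 1"
  done
}

# Usage inside each box (needs root), then retry sudo apt-get update:
#   ipv6_off_settings | sudo tee -a /etc/sysctl.conf > /dev/null
#   sudo sysctl -p
```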

STEP 4: Install Java (in both boxes)

  • Run the following two commands (refresh the package index first, then install the Java 8 runtime):
sudo apt-get update
sudo apt-get install openjdk-8-jre
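To confirm that Java 8 is the runtime on the PATH, you can check the version reported by java -version (which prints to stderr). The helper below is a hypothetical sketch: it extracts the quoted version string and handles both the old 1.8.0_xxx numbering and the newer 11.0.x style.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the
# major version number.
java_major() {
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;  # old scheme: 1.8.0_131 -> 8
    *)   echo "$ver" | cut -d. -f1 ;;  # new scheme: 11.0.2 -> 11
  esac
}

# Usage inside each box (should print 8 after this step):
#   java -version 2>&1 | java_major
```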

STEP 5: Download and configure Spark (in both boxes)

  • We will install Spark 2.1.0 (prebuilt for Hadoop 2.7), so run:
sudo wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
sudo tar -xzvf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7

  • Navigate to the conf folder and create the Spark environment configuration file from its template:
cd conf
sudo cp spark-env.sh.template spark-env.sh
  • Open spark-env.sh for editing (with sudo, since the file was created as root) and add the following line, which sets the master's IP address:
export SPARK_MASTER_HOST=192.168.0.10
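If you repeat this setup, editing spark-env.sh by hand can leave duplicate lines. The helper below is a hypothetical, re-runnable alternative to the manual edit: it appends the line only when it is not already present. It writes through sudo tee by default because spark-env.sh was created with sudo; setting SUDO to an empty value (an assumption of this sketch) skips sudo.

```shell
# Hypothetical helper: append a line to a file unless the exact line is
# already there, so running it twice does not duplicate the setting.
# SUDO defaults to "sudo"; set SUDO= (empty) to write without it.
ensure_line() {
  file=$1
  line=$2
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" | ${SUDO-sudo} tee -a "$file" > /dev/null
  fi
}

# Usage from the conf folder, in both boxes:
#   ensure_line spark-env.sh 'export SPARK_MASTER_HOST=192.168.0.10'
```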

STEP 6: Start Spark

  • In the master box, navigate to the sbin folder and run the start-master.sh script:
cd ../sbin
sudo ./start-master.sh
  • The script prints the path of its log file; open that file to find the master URL, which should be spark://192.168.0.10:7077.
  • In the slave box, also navigate to the sbin folder and run the start-slave.sh script, passing the master URL as an argument:
cd ../sbin
sudo ./start-slave.sh spark://192.168.0.10:7077
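Instead of reading the logs by eye, you can pull the master URL out of the master log and check the slave's log for a successful registration. These are hypothetical helpers; they assume the master log contains a line such as "Starting Spark master at spark://192.168.0.10:7077" and the worker log a line containing "Successfully registered with master". The log paths are the ones printed by start-master.sh and start-slave.sh.

```shell
# Hypothetical helper: print the last spark:// URL found in a master log file.
master_url() {
  grep -o 'spark://[^ ]*' "$1" | tail -n 1
}

# Hypothetical helper: succeed if a worker log file records a successful
# registration with a master.
worker_registered() {
  grep -q 'Successfully registered with master' "$1"
}

# Usage from the sbin folder (assuming one matching log file per box):
#   master box: master_url ../logs/*Master*.out
#   slave box:  worker_registered ../logs/*Worker*.out && echo "slave registered"
```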

STEP 7: Open Spark Shell

  • Navigate to the bin folder and run the spark-shell script, passing the master URL as an argument:
cd ../bin
sudo ./spark-shell --master spark://192.168.0.10:7077
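A quick way to confirm that jobs really run on the cluster is to feed the Spark shell one Scala statement on stdin and look for its result. The helper below is a hypothetical sketch: the SPARK_SHELL variable and the SUM= marker are assumptions of this example, and the command defaults to ./spark-shell in the bin folder. Summing the numbers 1 to 1000 gives 500500, so the helper should print SUM=500500.0.

```shell
# Hypothetical helper: pipe one Scala statement into the Spark shell and print
# the marked result. SPARK_SHELL (default: ./spark-shell) may include a prefix
# such as "sudo"; it is left unquoted on purpose so it splits into words.
spark_smoke_test() {
  master=${1:-spark://192.168.0.10:7077}
  echo 'println("SUM=" + sc.parallelize(1 to 1000).sum())' |
    ${SPARK_SHELL:-./spark-shell} --master "$master" 2>/dev/null |
    grep -o 'SUM=[0-9.]*'
}

# Usage from the bin folder (the shell exits when its stdin ends):
#   SPARK_SHELL="sudo ./spark-shell" spark_smoke_test
```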

Helpful tips


Have a question about the above? No panic, shoot me an email at: mami@cs.uni-bonn.de

About

Deploy a Spark Standalone cluster on one machine using Vagrant. A set of exercises prepared for the GraDana project Hackathon.
