The goal of this post is to set up a single-node Spark cluster in Vagrant. Running everything inside a Vagrant machine keeps the installation isolated, so the base OS does not get corrupted.
Vagrant is an open-source software product for building and maintaining portable virtual software development environments, e.g. for VirtualBox, Hyper-V, Docker, VMware, and AWS.
To install it, follow the Vagrant getting started guide (https://www.vagrantup.com/intro/getting-started/index.html), which also has more information.
Below are the steps:
- Install Vagrant: download the package from https://www.vagrantup.com/downloads.html
- Project Setup
$ mkdir vagrant_getting_started
$ cd vagrant_getting_started
$ vagrant init
- Installing Vagrant Box
$ vagrant box add hashicorp/precise64
Edit the Vagrantfile so that it uses this box:
Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/precise64"
end
- Starting the machine
$ vagrant up
$ vagrant ssh
Once we have logged into the Vagrant machine, we can install the other components. First, set up passwordless SSH to localhost, which the Hadoop start scripts rely on.
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
To download the JDK we need to accept the license terms. The wget command below does this by setting a cookie in the request header.
$ wget --no-check-certificate -c --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz
$ tar -zxvf jdk-8u161-linux-x64.tar.gz
Add the following lines to /home/vagrant/.bashrc
export JAVA_HOME=/home/vagrant/jdk1.8.0_161
export PATH=$PATH:/home/vagrant/jdk1.8.0_161/bin
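This post asks for several lines to be added to .bashrc by hand, here and in later steps. The same edit can be made from the shell. The block below is a minimal sketch, not part of the original setup: append_once is a hypothetical helper name, and it appends a line only when the file does not already contain it, so re-running the setup does not duplicate entries.

```shell
# append_once LINE FILE
# Append LINE to FILE only if FILE does not already contain that exact line.
# (append_once is a hypothetical helper name, not part of any tool used in this post.)
append_once() {
    line=$1
    file=$2
    # Make sure the file exists so grep does not fail on a missing file.
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
    # "--" stops option parsing in case LINE begins with a dash.
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (single quotes keep $PATH literal until .bashrc is sourced):
# append_once 'export JAVA_HOME=/home/vagrant/jdk1.8.0_161' /home/vagrant/.bashrc
# append_once 'export PATH=$PATH:/home/vagrant/jdk1.8.0_161/bin' /home/vagrant/.bashrc
```

After editing .bashrc either way, run source ~/.bashrc or open a new shell so the changes take effect.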
- Install sbt
$ wget https://github.com/sbt/sbt/releases/download/v1.1.0/sbt-1.1.0.tgz
$ tar -zxvf sbt-1.1.0.tgz
- Install Scala binaries
$ wget https://downloads.lightbend.com/scala/2.12.4/scala-2.12.4.tgz
$ tar -zxvf scala-2.12.4.tgz
$ ln -s scala-2.12.4/ scala
- Add Scala and sbt to the path: add the line below to /home/vagrant/.bashrc
export PATH=$PATH:/home/vagrant/sbt/bin:/home/vagrant/scala/bin
- Download and extract Hadoop
$ wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
$ tar -zxvf hadoop-3.0.0.tar.gz
$ ln -s hadoop-3.0.0 hadoop
- Set the JAVA_HOME value in hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/vagrant/jdk1.8.0_161
- Add the Hadoop binaries to the path: add the line below to /home/vagrant/.bashrc
export PATH=$PATH:/home/vagrant/hadoop/sbin:/home/vagrant/hadoop/bin
- Download and extract Spark
$ wget http://www-eu.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
$ tar -zxvf spark-2.2.1-bin-hadoop2.7.tgz
$ ln -s spark-2.2.1-bin-hadoop2.7 spark
- Download and extract Apache Livy
$ sudo apt-get install unzip
$ wget http://www-eu.apache.org/dist/incubator/livy/0.4.0-incubating/livy-0.4.0-incubating-bin.zip
$ unzip livy-0.4.0-incubating-bin.zip
$ ln -s livy-0.4.0-incubating-bin/ livy
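The steps above extract each tool into a versioned directory and then point a short symlink (scala, hadoop, spark, livy) at it, so the paths in .bashrc do not need to change when a version does. Plain ln -s fails if the link already exists. The block below is a small sketch of a re-runnable variant: relink is a hypothetical helper name, and the -n flag is assumed to be available, as it is in GNU coreutils ln.

```shell
# relink TARGET LINK
# Point the symlink LINK at TARGET, replacing an existing symlink if there is one.
# (relink is a hypothetical helper name used only for this illustration.)
relink() {
    target=$1
    link=$2
    # -s: create a symbolic link
    # -f: replace LINK if it already exists
    # -n: if LINK is a symlink to a directory, replace the symlink itself
    #     instead of creating the new link inside that directory
    ln -sfn "$target" "$link"
}

# Example usage, run from /home/vagrant:
# relink spark-2.2.1-bin-hadoop2.7 spark
# relink livy-0.4.0-incubating-bin livy
```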
- Export SPARK_HOME and HADOOP_CONF_DIR: add the lines below to /home/vagrant/.bashrc
export SPARK_HOME=/home/vagrant/spark
export HADOOP_CONF_DIR=/home/vagrant/hadoop/etc/hadoop/
- Create log directory
$ mkdir /home/vagrant/livy-0.4.0-incubating-bin/logs
- Install Python Requests package
$ sudo apt-get install python-pip
$ sudo pip install -U requests -i https://pypi.python.org/simple
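With the installs done, it can help to confirm that the PATH entries added to .bashrc actually resolve. The block below is a rough sketch of such a check, not something the original setup includes: check_cmds is a hypothetical helper name, and it reports each command as found or missing.

```shell
# check_cmds NAME...
# Report which of the given commands can be found on PATH and return
# non-zero if any of them is missing.
# (check_cmds is a hypothetical helper name used only for this illustration.)
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok:      $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (run `source ~/.bashrc` first so the new PATH entries are active):
# check_cmds java sbt scala hadoop
```

A "missing" line usually means the matching export line is absent from .bashrc or the file has not been sourced in the current shell.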
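Add the following line to the Vagrantfile, inside the Vagrant.configure block, to forward guest port 8088 to port 8488 on the host. Run vagrant reload on the host afterwards to apply the change.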
config.vm.network "forwarded_port", guest: 8088, host: 8488