Alluxio

Austin Ouyang edited this page Aug 30, 2016 · 1 revision

Introduction

Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, thereby avoiding going to disk to load datasets that are frequently read. This enables different jobs/queries and frameworks to access cached files at memory speed.

Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without any code change. The project is open source (Apache License 2.0) and is deployed at multiple companies. It has more than 80 contributors from over 30 institutions, including Yahoo, Intel, Red Hat, and Tachyon Nexus. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the Fedora distribution.

Set Up Tachyon

Tachyon will be installed on the master and all the workers. For this simple installation, the configuration is identical on the master and the workers, so you can either use broadcast input in iTerm or simply rsync/copy the configuration files from the master to the other worker nodes.
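If you go the rsync route, the copies can be scripted. The sketch below is hypothetical: it only prints the command it would run for each worker (remove the leading `echo` to execute), and the hostnames, key path, and worker-list file are example values to substitute with your own.

```shell
# Example worker list -- replace with your actual worker hostnames.
cat > /tmp/tachyon-workers.txt <<'EOF'
ip-172-31-240
ip-172-31-241
ip-172-31-242
EOF

# Print the copy command for each worker; delete 'echo' to actually run it.
while read -r worker; do
  echo rsync -az -e "ssh -i ~/.ssh/personal_aws.pem" \
    /usr/local/tachyon/conf/ "ubuntu@${worker}:/usr/local/tachyon/conf/"
done < /tmp/tachyon-workers.txt
```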

Run the following on the master and all workers by SSH-ing into each node:

Install java-development-kit
master-worker-node$ sudo apt-get update
master-worker-node$ sudo apt-get install openjdk-7-jdk

Install Tachyon
master-worker-node$ wget https://github.com/amplab/tachyon/releases/download/v0.7.1/tachyon-0.7.1-bin.tar.gz -P ~/Downloads
master-worker-node$ sudo tar zxvf ~/Downloads/tachyon-* -C /usr/local
master-worker-node$ sudo mv /usr/local/tachyon-* /usr/local/tachyon

Change ownership of the Tachyon directory
master-worker-node$ sudo chown -R ubuntu /usr/local/tachyon

Set the TACHYON_HOME environment variable and add to PATH in .profile
master-worker-node$ sudo nano ~/.profile
Add the following to ~/.profile and source it

export TACHYON_HOME=/usr/local/tachyon
export PATH=$PATH:$TACHYON_HOME/bin

master-worker-node$ . ~/.profile

Set the TACHYON_MASTER_ADDRESS in tachyon-env
master-worker-node$ cp $TACHYON_HOME/conf/tachyon-env.sh.template $TACHYON_HOME/conf/tachyon-env.sh

master-worker-node$ nano $TACHYON_HOME/conf/tachyon-env.sh
Locate the following lines and change TACHYON_MASTER_ADDRESS to the master node's hostname, e.g. ip-172-31-239:
...
export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=<master-hostname>
export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underFSStorage
...
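As a non-interactive alternative to editing in nano, the master address can be rewritten with sed. This sketch operates on a throwaway copy in /tmp and uses an example hostname; on a real node, point it at $TACHYON_HOME/conf/tachyon-env.sh and substitute your master's hostname.

```shell
# Throwaway stand-in for $TACHYON_HOME/conf/tachyon-env.sh.
CONF=/tmp/tachyon-env.sh
cat > "$CONF" <<'EOF'
export TACHYON_MASTER_ADDRESS=localhost
EOF

# Replace whatever value is present with the master's hostname (example value).
MASTER=ip-172-31-239
sed -i "s|^export TACHYON_MASTER_ADDRESS=.*|export TACHYON_MASTER_ADDRESS=${MASTER}|" "$CONF"
grep TACHYON_MASTER_ADDRESS "$CONF"
```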

Place worker hostnames into the workers file under $TACHYON_HOME/conf
master-worker-node$ nano $TACHYON_HOME/conf/workers
By default, localhost is the only entry in the file. Remove it before adding the worker hostnames.

e.g. with 3 workers
ip-172-31-240
ip-172-31-241
ip-172-31-242
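If you prefer not to edit the file by hand, the workers file can also be written in one shot. The hostnames below are the example values from above; this sketch writes to /tmp/workers for illustration, whereas on a real node the target is $TACHYON_HOME/conf/workers.

```shell
# Write one hostname per line (example hostnames; substitute your own).
printf '%s\n' ip-172-31-240 ip-172-31-241 ip-172-31-242 > /tmp/workers
cat /tmp/workers
```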

Format and Start Tachyon from the Master Node

SSH into the master node and run the following:

localhost$ ssh -i ~/.ssh/personal_aws.pem ubuntu@master-public-dns

Format Tachyon
master-node$ $TACHYON_HOME/bin/tachyon format

Start Tachyon on master
master-node$ $TACHYON_HOME/bin/tachyon-start.sh all SudoMount

You can check whether your standalone cluster is up and running by visiting the WebUI at master-public-dns:19999. Be sure the number of available workers matches what you expect; for example, a cluster of 4 nodes with 3 acting as workers should report 3 workers.

You can check how much memory each worker has allocated under the Workers tab.
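The amount of memory each worker offers is controlled in tachyon-env.sh. The fragment below is a hedged example: TACHYON_WORKER_MEMORY_SIZE appears in the 0.7.x template, but the 1GB value shown is only illustrative; size it to your instances, leaving headroom for the OS and other services.

```shell
# In $TACHYON_HOME/conf/tachyon-env.sh (example value -- tune per instance):
export TACHYON_WORKER_MEMORY_SIZE=1GB
```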