oscon-bigtop

Resources for the demo from the Apache Bigtop presentation at OSCON 2013

Introduction

This repository contains data and code for a beginner's introduction to Apache Bigtop. All instructions are adapted from the instructions on the Bigtop wiki page for 0.6.0.

File list and description

  • README.md: This file, with instructions for your use
  • demo-setup.sh: A script I used to provision a single node for use in my demo. This script installs some essential tools and software, grabs a well-known, publicly available dataset, and inserts it into a relational DB. Note that this script was run on a 64-bit Ubuntu Lucid machine. It should work on other Ubuntu/Debian variants as well, and could easily be ported for use on RPM-based systems. I haven't gotten a chance to do so, but if you decide to, please send me a pull request, thanks! Note that Ubuntu doesn't come with JDK6 by default, so this script also installs Oracle JDK6. By the time this script is done, your machine is ready for installing Bigtop as per the instructions below.
  • median_income_by_zipcode_census_2000.zip: A household income dataset from the 2000 United States Census for use in the demo.

Demo VM setup

Let's prepare a VM for your setup. You can use any OS for your VM, but the setup script demo-setup.sh was run and has only been tested on Lucid. Also, I use Vagrant for all my VM needs. Vagrant builds on top of VirtualBox, but you are welcome to use any VM hypervisor software of your choice. All you need is a vanilla Linux install at the end of the day :-) (if you want a Vagrant starting point, see the sketch after these commands). Log in to the VM and run the following commands:


wget https://raw.github.com/markgrover/oscon-bigtop/master/demo-setup.sh
chmod 755 ./demo-setup.sh
./demo-setup.sh

This setup may take a while; please be patient!
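For reference, here is one way to bring up such a VM with Vagrant. This is just a sketch: the box name and download URL are assumptions from the Vagrant docs of that era, and any 64-bit Lucid base box will do.

vagrant box add lucid64 http://files.vagrantup.com/lucid64.box
vagrant init lucid64
vagrant up
vagrant ssh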

Instructions

Adapted from the Bigtop wiki page

  • Add the Bigtop key so you can use the Bigtop artifacts with apt-get

wget -O- http://archive.apache.org/dist/bigtop/bigtop-0.6.0/repos/GPG-KEY-bigtop | sudo apt-key add -
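To confirm the key was imported, you can list the keys apt trusts. The exact name on the Bigtop key is an assumption here, so adjust the grep pattern as needed.

apt-key list | grep -i bigtop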

  • Add Bigtop list, so apt-get knows where to find the Bigtop artifacts

sudo wget -O /etc/apt/sources.list.d/bigtop-0.6.0.list http://archive.apache.org/dist/bigtop/bigtop-0.6.0/repos/`lsb_release --codename --short`/bigtop.list
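The backticks substitute your release codename into the URL (on Lucid, it prints lucid). To double-check what was fetched, print the codename and inspect the downloaded list file:

lsb_release --codename --short
cat /etc/apt/sources.list.d/bigtop-0.6.0.list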

  • Update apt-get so it sees our newly added Bigtop repository

sudo apt-get update
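As a quick sanity check, the Bigtop packages should now be visible to apt; for example:

apt-cache search hadoop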

  • Install our pseudo-distributed hadoop package from Apache Bigtop

sudo apt-get install hadoop-conf-pseudo
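This package pulls in the Hadoop daemons along with a pseudo-distributed configuration. To see what was actually installed:

dpkg -l | grep hadoop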

  • Initialize the namenode (this needs to be done only once). Don't run it again; it will format (i.e. wipe out) the data in your cluster (i.e. data on HDFS).

sudo service hadoop-hdfs-namenode init

  • Start the namenode and datanode

sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode start
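To verify both daemons came up, list the running JVMs (jps ships with the JDK installed earlier); you should see NameNode and DataNode entries.

sudo jps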

  • Initialize HDFS. This creates a bunch of directories on HDFS that are required for YARN to run

sudo /usr/lib/hadoop/libexec/init-hdfs.sh
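You can inspect the directories the script created (the exact listing depends on the Bigtop version):

sudo -u hdfs hadoop fs -ls /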
  • Restart the YARN daemons. YARN needs the directories created in the previous step to work properly, so now that they exist, let's restart the daemons.

sudo service hadoop-yarn-resourcemanager restart
sudo service hadoop-yarn-nodemanager restart
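Once the ResourceManager is back up, its web UI should respond on the default port 8088; a quick check from inside the VM:

curl -s http://localhost:8088/ | head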
  • Time to run our first MapReduce job. It runs on MapReduce v2, on top of YARN.

sudo -u root hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples*.jar pi 10 1000
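pi is just one of the bundled example programs; running the jar with no arguments prints the full list of valid program names, so you can try others as well.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples*.jar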

  • If you want to install more artifacts, all you do is run a simple apt-get command

sudo apt-get install hive

  • There is a bug in Bigtop that requires us to correct the permissions on one of the HDFS directories before we run Hive. Let's do that first

sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
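Afterwards, /tmp should show up with wide-open permissions and the sticky bit set (drwxrwxrwt):

sudo -u hdfs hadoop fs -ls / | grep tmp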

  • Let's create a table in Hive. On bash, type:

hive -e "CREATE TABLE zipcode_incomes(id STRING, zip STRING, description1 STRING, description2 STRING, income INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"

  • Load the data in Hive. On bash, type:

cd ~
hive -e "LOAD DATA LOCAL INPATH 'workspace/dataset/DEC_00_SF3_P077_with_ann.csv' OVERWRITE INTO TABLE zipcode_incomes"

  • Run Hive queries!
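For example, using the columns defined above (the income threshold here is arbitrary):

hive -e "SELECT zip, income FROM zipcode_incomes WHERE income > 100000 LIMIT 10"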

To clean up (only if necessary)


sudo rm -rf /var/lib/hadoop-hdfs/cache/*
sudo service hadoop-hdfs-datanode stop
sudo service hadoop-hdfs-namenode stop
sudo service hadoop-hdfs-namenode init
