datafibers/lab_env

Build Status

Overview

This is a very lightweight Vagrant image for a Hadoop big data lab. It needs only 4 GB of memory in total (about 450 MB remains free after all services are started). Download and setup take around 30 minutes, depending on your download speed. The daily build status above indicates whether the software download URLs are live or broken.

Software Installed

This distribution is compatible with HDP 2.6.4, except that Hive and Hadoop are upgraded to stable versions.

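| Hadooper       | Stream          | Visualization  | Utility  |
|----------------|-----------------|----------------|----------|
| hadoop-2.7.7   | flink-1.5.0     | grafana-5.1.3  | git      |
| hive-1.2.2     | spark-2.3.3     | zeppelin-0.8.1 | mysql    |
| hive-2.3.6     | confluent-4.1.1 |                | maven    |
| hbase-1.3.6    |                 |                | dos2unix |
| phoenix-4.13.2 |                 |                | aria2    |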

Quick Setup

  1. Install Oracle VirtualBox on the host operating system.
  2. Install Vagrant on the host operating system.
  3. Go to a suitable folder and clone this repository: git clone https://github.com/datafibers/lab_env.git
  4. To install a specific configuration from a branch, run git checkout <branch_name>
  5. To customize the install configuration, modify conf/install_config.sh or install_version.sh or this section.
  6. To install, run cd lab_env && vagrant up
  7. After installation, run ops format once to format Hadoop. This is needed only the first time.
  8. To update the default settings, run cd lab_env && git pull && vagrant provision
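
The host-side steps above can be sketched as a small script. This is only an illustration, not part of the repository: the function names `missing_tools` and `setup_lab` are hypothetical, and it assumes that VirtualBox puts its `VBoxManage` CLI on the PATH.

```shell
#!/usr/bin/env bash
# Sketch of Quick Setup steps 1-3 and 6, run on the host.
# Assumption: VirtualBox is detectable through the VBoxManage command.

# Print each required tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check prerequisites, then clone the repository and bring the VM up.
setup_lab() {
  local missing
  missing="$(missing_tools git vagrant VBoxManage)"
  if [ -n "$missing" ]; then
    echo "Install these first: $missing" >&2
    return 1
  fi
  git clone https://github.com/datafibers/lab_env.git &&
    cd lab_env &&
    vagrant up
}
```

Calling `setup_lab` from the folder where the lab should live covers the install. Step 7 (ops format) still has to be run once inside the VM afterwards.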

Operation Command Reference (Run inside VM)

  • Enter ops to get full command help
  • Enter ops start all to start all services
  • Enter ops status to check the status of each service, for example:
vagrant@vagrant:~$ ops status
****************Starting Operations****************
[INFO]   [ZooKeeper]          is running at [3232]
[INFO]   [Kafka]              is running at [3302]
[INFO]   [Kafka_Connect]      is running at [3464]
[INFO]   [Schema_Registry]    is running at [3387]
[INFO]   [Flink_JobMgr]       is running at [4298]
[INFO]   [Flink_TaskMgr]      is running at [4644]
[INFO]   [Spark_Master]       is running at [4702]
[INFO]   [Spark_Worker]       is running at [4926]
[INFO]   [Zeppelin_Server]    is running at [5060]
[INFO]   [HBase_Master]       is running at [3658]
[INFO]   [HBase_Region]       is running at [3777]
[INFO]   [Hadoop_NameNode]    is running at [2131]
[INFO]   [Hadoop_DataNode]    is running at [2251]
[INFO]   [Yarn_ResourceMgr]   is running at [2615]
[INFO]   [Yarn_NodeMgr]       is running at [2737]
[INFO]   [HiveServer2]        is running at [2953]
[INFO]   [HiveMetaStore]      is running at [2952]
[INFO]   [Hive2Server2]       is running at [2954]
[INFO]   [Hive2MetaStore]     is running at [2955]
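
For scripting, the status output above can be filtered with standard tools. The sketch below assumes every running service is printed exactly in the "[INFO] [Name] is running at [PID]" form shown above. How ops reports a stopped service is not shown here, so anything that does not match is simply treated as not running. The function names are hypothetical.

```shell
# Print "<service> <pid>" for every line that reports a running service.
# Assumption: lines look like "[INFO]   [Name]   is running at [PID]".
running_services() {
  awk '/is running at/ {
    gsub(/\[|\]/, "", $2)    # strip brackets from the service name
    gsub(/\[|\]/, "", $NF)   # strip brackets from the PID
    print $2, $NF
  }'
}

# Succeed only if every named service is reported as running
# in the status text read from stdin.
all_running() {
  local status name
  status="$(running_services)"
  for name in "$@"; do
    printf '%s\n' "$status" | grep -q "^$name " || {
      echo "not running: $name" >&2
      return 1
    }
  done
}
```

Inside the VM, something like `ops status | all_running ZooKeeper Kafka` could then gate a job that depends on those two services.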

Tool Command Reference (Run inside VM)

  • Enter mongo to connect to mongodb
  • Enter mysql -u root --password="mypassword" to connect to mysql
  • Enter beeline -u jdbc:hive2://localhost:10000/ to connect to hive1
  • Enter beeline -u jdbc:hive2://localhost:10500/ to connect to hive2
  • Enter spark-sql to use spark sql shell
  • Enter spark-shell to use spark scala shell
  • Enter pyspark to use spark python shell
  • Enter hbase shell to use hbase shell
  • Enter sqlline.py localhost to use phoenix shell
  • Browse http://localhost:8080 to use zeppelin
  • Browse http://localhost:8001 to use flink web console
  • Browse http://localhost:3000 to use grafana
  • Browse http://localhost:8088 to check Yarn Application Master
  • Browse http://localhost:8042 to check Yarn Node Manager
  • Browse http://localhost:16010 to check HBase Master
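
As a quick check that the web consoles listed above are reachable, a loop over the URLs may help. This is a sketch that assumes curl is installed where it runs. The short labels and the function names `ui_urls` and `check_uis` are made up for illustration.

```shell
# Web UIs from the list above, as "<label> <url>" pairs.
ui_urls() {
  cat <<'EOF'
zeppelin http://localhost:8080
flink http://localhost:8001
grafana http://localhost:3000
yarn-apps http://localhost:8088
yarn-nodemanager http://localhost:8042
hbase-master http://localhost:16010
EOF
}

# Report which UIs answer an HTTP request within 3 seconds.
check_uis() {
  local label url
  ui_urls | while read -r label url; do
    if curl -s -o /dev/null --max-time 3 "$url"; then
      echo "UP   $label $url"
    else
      echo "DOWN $label $url"
    fi
  done
}
```

Running `check_uis` after ops start all prints one UP or DOWN line per console.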

Vagrant Command Reference (Run outside VM)

| Purpose                    | Command                      |
|----------------------------|------------------------------|
| Start the vm/image install | vagrant up                   |
| Stop the vm                | vagrant halt                 |
| Update the vm              | git pull && vagrant provision |
| Suspend the vm/hibernate   | vagrant suspend              |
| Wake up the vm             | vagrant resume               |
| Restart the vm             | vagrant reload               |

Customization

  • To customize VM memory, either modify this line before installing or adjust the memory setting in VirtualBox once the install is done.
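
For the second option, the memory can also be changed from the host command line instead of the VirtualBox GUI. The sketch below wraps `VBoxManage modifyvm ... --memory`, which expects the VM to be powered off (for example after vagrant halt). The function name `set_vm_memory` and the `DRY_RUN` switch are hypothetical, and the VM name has to be looked up with `VBoxManage list vms` because it is not stated here.

```shell
# Change the memory of a powered-off VirtualBox VM.
# Usage: set_vm_memory <vm-name> <megabytes>
# Set DRY_RUN=1 to print the command instead of running it.
set_vm_memory() {
  local vm="$1" mb="$2"
  case "$mb" in
    ''|*[!0-9]*)
      echo "memory must be a whole number of MB" >&2
      return 2
      ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "VBoxManage modifyvm $vm --memory $mb"
  else
    VBoxManage modifyvm "$vm" --memory "$mb"
  fi
}
```

A dry run such as `DRY_RUN=1 set_vm_memory <vm-name> 4096` shows the command before anything is changed.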

Known Issues

  • If startup prompts for a password, set up passwordless SSH as follows.
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 755 ~/.ssh/authorized_keys
  • To re-associate Vagrant and VirtualBox, see here
  • When vagrant provision has SSH authentication issues, add the following to the Vagrantfile.
config.ssh.username = "vagrant"
config.ssh.password = "vagrant"
config.ssh.insert_key = false