
akirakw edited this page Jun 27, 2011 · 7 revisions

#Deployment Guide (en)


#1. About This Document

This document explains the procedure for deploying batch applications built with Asakusa Framework onto Hadoop clusters.

#2. Preparing the DB server and Hadoop clusters

Deploying Asakusa Framework itself and its batch applications requires a Hadoop cluster. Additionally, a DB server is required for Asakusa ThunderGate (hereafter "ThunderGate") to read and write data to the Hadoop cluster. In this document, that DB server is called the "database node."

Install the "Extractor" and "Collector" components of ThunderGate, which handle the input/output process with HDFS, on the master node (any other server in the Hadoop cluster is also acceptable). In this document, the server hosting the "Extractor" and "Collector" components is called the "Hadoop client node."

Before deploying Asakusa Framework, check the configuration of the Hadoop cluster and the database node as described below. For the supported versions of each software component, please refer to Target Platform en.

##2.1. Configure a Hadoop cluster environment

Configure the environment of each Hadoop cluster machine according to the policy below, then run it and confirm that the configuration is effective.

  • Create a new user account (ASAKUSA_USER) on the operating system for the Asakusa administrator.
  • Install Hadoop itself, then configure it in fully-distributed mode.
  • Execute the Hadoop sample jobs in fully-distributed mode from the ASAKUSA_USER account to confirm that the deployment works correctly.

##2.2. Database node

Configure the environment of the database node according to the policy below, then run it and confirm that the configuration is effective.

  • Create a new user account (ASAKUSA_USER) on the operating system for the Asakusa administrator. ASAKUSA_USER must have the same name on both the Hadoop cluster and the database node.
  • Install MySQL server and confirm it works.
  • Configure ssh so that ASAKUSA_USER on the database node can log in to ASAKUSA_USER on the Hadoop client node without a pass phrase.
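As a sketch of the pass-phrase-less ssh setup (the host name "hadoop-client" and the user name "asakusa" are placeholders; replace them with your own values):

```shell
# On the database node, as ASAKUSA_USER:
# generate a key pair without a pass phrase if none exists yet
mkdir -p "$HOME/.ssh"
test -f "$HOME/.ssh/id_rsa" || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

# register the public key on the Hadoop client node
# ("hadoop-client" is a placeholder host name)
ssh-copy-id asakusa@hadoop-client

# confirm that login succeeds without a pass phrase prompt
ssh asakusa@hadoop-client hostname
```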

#3. Preparing the files for deployment

Prepare the following files to deploy Asakusa Framework and the batch applications.

##3.1. Required files to deploy Asakusa Framework

Download the source code from GitHub and build the files used to deploy Asakusa Framework.

The following steps require the same version of Maven that the Asakusa Framework development environment uses, so it is recommended to work in the same development environment.

###3.1.1. Obtain the source archive of Asakusa Framework

Download the source archive from the GitHub repository of Asakusa Framework (https://github.com/asakusafw/asakusafw).

The following is an example of getting Asakusa Framework 0.1.0 with the wget command.

$ wget --no-check-certificate https://github.com/asakusafw/asakusafw/zipball/0.1.0

###3.1.2. Building Asakusa Framework

Extract the archive and execute the install phase against the pom.xml of the "asakusa-aggregator" project to build all of the Asakusa Framework modules.

Build as in the following example. You will see the message "BUILD SUCCESS" on success.

$ unzip asakusafw-asakusafw-*.zip
$ cd asakusafw-asakusafw-*/asakusa-aggregator
$ mvn clean install -Dmaven.test.skip=true
...
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

###3.1.3. Building an archive of Asakusa Framework for deployment

Execute the assembly:single goal against the pom.xml of the "asakusa-distribution" project. This builds the archive files for deployment to the Hadoop cluster.

Execute the build as in the following example. You will see the message "BUILD SUCCESS" on success.

$ cd ../asakusa-distribution
$ mvn clean assembly:single
...
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

###3.1.4. Confirm the archives for deployment

Confirm that the files listed below exist in the target directory under asakusa-distribution.

  1. asakusa-distribution-${version}-prod-hc.tar.gz
    • The archive file to extract on the Hadoop client node.
  2. asakusa-distribution-${version}-prod-db.tar.gz
    • The archive file to extract on the database node.

In addition, the following files are created in the target directory, but they are not used in this procedure.

  3. asakusa-distribution-${version}-prod-cleaner.tar.gz
    • The archive file used to deploy the cleaning tool of Asakusa Framework.
  4. asakusa-distribution-${version}-dev.tar.gz
    • The archive file used to install the development environment of Asakusa Framework.

##3.2. Required files to deploy batch applications

Application developers create or prepare the following files in the development environment.

  1. Batch application jar file generated by the batch compiler (${artifactid}-batchapps-${version}.jar)
    • Created under the target directory of the workspace after compiling batches in the development environment.
    • For details, see "7.4. Batch Build" in the Application Development Guide (ja).
  2. Application common libraries
    • Shared libraries used by the batch applications (non-Hadoop libraries such as Apache Commons Lang), only if your application needs them.
  3. Table DDLs for the application
    • DDL for application initialization, applied to the DBMS on the database node.

#4. Asakusa Framework Deployment

Deploy Asakusa Framework to the Hadoop cluster and the database node.

** In the remaining steps of this document, execute commands as ASAKUSA_USER unless otherwise specified. **

##4.1. Deploy Asakusa Framework to the Hadoop client node

  1. Add the environment variables HADOOP_HOME and ASAKUSA_HOME to ASAKUSA_USER's ~/.bash_profile
export ASAKUSA_HOME="$HOME/asakusa"
export HADOOP_HOME=/usr/lib/hadoop
* We assume you use "$HOME/asakusa" for ASAKUSA_HOME in the following steps.

Load the environment variables into the current shell.

$ source ~/.bash_profile
  2. Create the directory ASAKUSA_HOME, then extract the archive file for the Hadoop client node, asakusa-distribution-${version}-prod-hc.tar.gz, into ASAKUSA_HOME. After extracting, add execute permission to the *.sh files under ASAKUSA_HOME.
$ mkdir $ASAKUSA_HOME
$ mv asakusa-distribution-*-prod-hc.tar.gz $ASAKUSA_HOME
$ cd $ASAKUSA_HOME
$ tar -xzf asakusa-distribution-*-prod-hc.tar.gz
$ find $ASAKUSA_HOME -name "*.sh" | xargs chmod u+x
  3. Move $ASAKUSA_HOME/bulkloader/bin/.bulkloader_hc_profile to $HOME.
$ mv $ASAKUSA_HOME/bulkloader/bin/.bulkloader_hc_profile $HOME
  4. Edit $HOME/.bulkloader_hc_profile to set the following variables according to your environment.
export ASAKUSA_HOME=$HOME/asakusa
export JAVA_HOME=/usr/java/default
export HADOOP_HOME=/usr/lib/hadoop
  5. Edit $ASAKUSA_HOME/bulkloader/conf/bulkloader-conf-hc.properties according to your environment.
hdfs-protocol-host=hdfs://(MASTERNODE_HOSTNAME):8020
* For hdfs-protocol-host, use the same value as fs.default.name in $HADOOP_HOME/conf/core-site.xml.
  6. Edit the log configuration file for the Hadoop-side bulk loader. Edit $ASAKUSA_HOME/bulkloader/conf/log4j.xml to specify a log directory of your choice.

    • Do not change the log file name "${logfile.basename}.log".
  7. Create the log directory specified in step 6. The log directory must be writable by ASAKUSA_USER.

$ mkdir $ASAKUSA_HOME/log

##4.2. Deploy Asakusa Framework to the database node

  1. Add the environment variable ASAKUSA_HOME to ~/.bash_profile of ASAKUSA_USER
export ASAKUSA_HOME=$HOME/asakusa

Load the environment variables into the current shell.

$ source ~/.bash_profile
  2. Create the directory ASAKUSA_HOME, then extract the archive file for the database node, asakusa-distribution-${version}-prod-db.tar.gz, into ASAKUSA_HOME. After extracting, add execute permission to the *.sh files under ASAKUSA_HOME.
$ mkdir $ASAKUSA_HOME
$ mv asakusa-distribution-*-prod-db.tar.gz $ASAKUSA_HOME
$ cd $ASAKUSA_HOME
$ tar -xzf asakusa-distribution-*-prod-db.tar.gz
$ find $ASAKUSA_HOME -name "*.sh" | xargs chmod u+x
  3. Move $ASAKUSA_HOME/bulkloader/bin/.bulkloader_db_profile to $HOME.
$ mv $ASAKUSA_HOME/bulkloader/bin/.bulkloader_db_profile $HOME
  4. Edit $HOME/.bulkloader_db_profile to set the following variables according to your environment.
export ASAKUSA_HOME=$HOME/asakusa
export JAVA_HOME=/usr/java/default
  5. Edit $ASAKUSA_HOME/bulkloader/conf/bulkloader-conf-db.properties to set the following properties according to your environment.
hadoop-cluster.host=(HADOOP_MASTER_NODE_HOSTNAME)
hadoop-cluster.user=(ASAKUSA_USER)

import.tsv-create-dir=/var/tmp/asakusa/importer
import.extractor-shell-name=asakusa/bulkloader/bin/extractor.sh

export.tsv-create-dir=/var/tmp/asakusa/exporter
export.collector-shell-name=asakusa/bulkloader/bin/collector.sh
* Specify the Hadoop client node host name for "hadoop-cluster.host"
* Specify the ASAKUSA_USER name for "hadoop-cluster.user"
* Specify directory paths for "import.tsv-create-dir" and "export.tsv-create-dir", keeping step 8 in mind (the directories are created there)
* If you changed the $ASAKUSA_HOME value from "$HOME/asakusa", you must adjust "import.extractor-shell-name" and "export.collector-shell-name"
    * Specify the paths of extractor.sh and collector.sh as absolute paths, or as paths relative to $HOME
  6. Edit the log configuration file for the database-side bulk loader. Edit $ASAKUSA_HOME/bulkloader/conf/log4j.xml to specify a log directory of your choice.

  7. Create the log directory specified in step 6. The directory must be writable by ASAKUSA_USER.

$ mkdir $ASAKUSA_HOME/log
  8. Create the directories specified for "import.tsv-create-dir" and "export.tsv-create-dir" in step 5. These directories must be readable and writable by both ASAKUSA_USER and the MySQL user. This work must be done as root (or using sudo).
$ mkdir -p -m 777 /var/tmp/asakusa/importer
$ mkdir -p -m 777 /var/tmp/asakusa/exporter
$ chown -R mysql:mysql /var/tmp/asakusa

#5. Deploy and test sample applications

Execute the sample applications with the experimental.sh command. The sample applications are included in the project generated from the archetype that Asakusa Framework provides for application development. Confirm that the MapReduce applications generated by Asakusa, and ThunderGate, work.

You can skip this chapter, but we recommend performing it to verify that Asakusa Framework was deployed correctly.
In addition, when you follow these steps, we recommend creating a new project from the archetype containing only the sample applications, separate from the production project, and deploying a batch compiled from the sample applications only.

##5.1. Deploy sample applications to Hadoop client node

  1. Move the sample application file to the $ASAKUSA_HOME/batchapps directory. The following is an example of deploying a jar file compiled from the sample project "batchapp" and placed in $HOME/work.
$ cp batchapp-batchapps-*.jar $ASAKUSA_HOME/batchapps
$ cd $ASAKUSA_HOME/batchapps
$ jar -xf batchapp-batchapps-*.jar
$ find . -name "*.sh" | xargs chmod u+x
$ rm -f batchapp-batchapps-*.jar
$ rm -fr META-INF

NOTE: Do not deploy the wrong jar files. The deployment targets are named "${artifactId}-batchapps-${version}.jar"; they are the jar files that have "batchapps" after the artifact ID in their names.

For example, if you compile the sample project "batchapp", you will have three files under the target directory.

  • batchapp-{version}-sources.jar: Not for deployment.
  • batchapp-{version}.jar: Not for deployment.
  • batchapp-batchapps-{version}.jar: Deploy this file.
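As a quick sanity check, a glob that requires "-batchapps-" in the file name selects only the deployment jar. The file names below follow the example above and are illustrative only:

```shell
# Create dummy build outputs to illustrate the naming rule
mkdir -p target
touch target/batchapp-0.1.0-sources.jar \
      target/batchapp-0.1.0.jar \
      target/batchapp-batchapps-0.1.0.jar

# Only the jar with "batchapps" after the artifact ID is the deployment target
ls target/*-batchapps-*.jar
```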

##5.2. Deploy a sample application to database node

  1. Move the sample application file to the $ASAKUSA_HOME/batchapps directory. Perform the same steps as for the Hadoop client node, with the same files.
$ cp batchapp-batchapps-*.jar $ASAKUSA_HOME/batchapps
$ cd $ASAKUSA_HOME/batchapps
$ jar -xf batchapp-batchapps-*.jar
$ find . -name "*.sh" | xargs chmod u+x
$ rm -f batchapp-batchapps-*.jar
$ rm -fr META-INF
  2. Copy $ASAKUSA_HOME/bulkloader/conf/[targetname]-jdbc.properties to asakusa-jdbc.properties in the same directory.
$ cp $ASAKUSA_HOME/bulkloader/conf/[targetname]-jdbc.properties \
  $ASAKUSA_HOME/bulkloader/conf/asakusa-jdbc.properties 
  3. Create a database for the sample application. Execute the following SQL against MySQL.
DROP DATABASE IF EXISTS asakusa;
CREATE DATABASE asakusa DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON *.* TO 'asakusa'@'localhost'
  IDENTIFIED BY 'asakusa' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'asakusa'@'%'
  IDENTIFIED BY 'asakusa' WITH GRANT OPTION;

DROP TABLE IF EXISTS asakusa.EX1;
CREATE TABLE asakusa.EX1 (
  SID BIGINT AUTO_INCREMENT,
  VALUE INT NULL,
  STRING VARCHAR(255) NULL,
  VERSION_NO BIGINT NULL,
  RGST_DATETIME DATETIME NULL,
  UPDT_DATETIME DATETIME NULL,
  PRIMARY KEY (SID)) ENGINE=InnoDB;
DROP TABLE IF EXISTS asakusa.EX1_RL;
CREATE TABLE asakusa.EX1_RL (
  SID BIGINT PRIMARY KEY,
  JOBFLOW_SID BIGINT NULL
) ENGINE=InnoDB;
DROP TABLE IF EXISTS asakusa.EX1_RC;
CREATE TABLE asakusa.EX1_RC (
  SID BIGINT PRIMARY KEY,
  CACHE_FILE_SID VARCHAR(45) NULL,
  CREATE_DATE DATETIME NULL
) ENGINE=InnoDB;

TRUNCATE TABLE asakusa.EX1;
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (1,111, 'hoge1', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (2,222, 'fuga2', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (3,333, 'bar3', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (4,111, 'hoge4', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (5,222, 'fuga5', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (6,333, 'bar6', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (7,111, 'hoge7', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (8,222, 'fuga8', null, null, null);
INSERT INTO asakusa.EX1 (SID, VALUE, STRING, VERSION_NO, RGST_DATETIME, UPDT_DATETIME)
  VALUES (9,444, 'bar9', null, null, null);
-- END;
  4. Execute the scripts to create the management tables for ThunderGate.
$ cd $ASAKUSA_HOME/bulkloader/sql
$ mysql -u asakusa -pasakusa -D asakusa < create_table.sql 
$ mysql -u asakusa -pasakusa -D asakusa < insert_import_table_lock.sql

** If you use the same host for the database node and the Hadoop client node, skip the following steps (steps 5 through 8). **

  5. Copy the ssh bridge script for experimental.sh and hadoop_job_run ($ASAKUSA_HOME/experimental/bin/hadoop_job_run_ssh_bridge.sh) to hadoop_job_run.sh.
$ cp $ASAKUSA_HOME/experimental/bin/hadoop_job_run_ssh_bridge.sh \
  $ASAKUSA_HOME/experimental/bin/hadoop_job_run.sh
  6. Edit the hadoop_job_run.sh copied in step 5 and set the following variables.
REMOTE_HADOOP_JOB_RUN_SH=$ASAKUSA_HOME/experimental/bin/hadoop_job_run.sh
SSHPATH=/usr/bin/ssh
HCHOST=(HADOOP_CLIENT_NODE_HOSTNAME) # Specify the Hadoop client node host name.
HCUSER=(ASAKUSA_USER)
  7. Create the ssh bridge script for experimental.sh and clean_hadoop_work from the hadoop_job_run.sh edited in step 6.
$ cp $ASAKUSA_HOME/experimental/bin/hadoop_job_run.sh \
  $ASAKUSA_HOME/experimental/bin/clean_hadoop_work.sh
  8. Edit the clean_hadoop_work.sh copied in step 7 and set the following variable.
REMOTE_HADOOP_JOB_RUN_SH=$ASAKUSA_HOME/experimental/bin/clean_hadoop_work.sh

##5.3. Execute the sample application

Execute the sample application you deployed and check that it works.

  1. Execute experimental.sh of the sample application.
$ $ASAKUSA_HOME/batchapps/ex/bin/experimental.sh
  2. If experimental.sh finishes successfully and the values of the "VALUE" and "UPDT_DATETIME" columns of the records in the MySQL table "asakusa.EX1" have been updated, the test has succeeded.
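One way to check this is to query the table directly. A query such as the following (the column names follow the sample DDL above) shows the updated records:

```sql
SELECT SID, VALUE, STRING, UPDT_DATETIME
  FROM asakusa.EX1
 ORDER BY SID;
```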
#6. Deploy and test batch applications built in the development environment

Deploy the batch applications built in the development environment and test them. Basically, follow the same steps as in Chapter 5, "Deploy and test sample applications", but some applications have their own additional steps; refer to the following descriptions as you deploy.

##6.1. Deploy batch applications to the Hadoop client node

  1. Move the batch application files to $ASAKUSA_HOME/batchapps. The following is an example of deploying jar files built from the batch application project "abcapp" and placed in $HOME/work.
$ cp abcapp-batchapps-*.jar $ASAKUSA_HOME/batchapps
$ cd $ASAKUSA_HOME/batchapps
$ jar -xf abcapp-batchapps-*.jar
$ find . -name "*.sh" | xargs chmod u+x
$ rm -f abcapp-batchapps-*.jar
$ rm -fr META-INF
  2. Copy the application common libraries. If your batch applications use common libraries that are not provided by Apache Hadoop (e.g. Apache Commons Lang), copy the jar files into the $ASAKUSA_HOME/ext/lib directory. Here is an example of copying Apache Commons Lang.
$ cp commons-lang-2.6.jar $ASAKUSA_HOME/ext/lib

Note: In step 1, do not deploy the wrong jar files. The files for deployment are named ${artifactId}-batchapps-${version}.jar, with "batchapps" after the artifact ID in their names.

For example, if you compile the project "abcapp", three files are created in the target directory.

  • abcapp-{version}-sources.jar: Not for deployment.
  • abcapp-{version}.jar: Not for deployment.
  • abcapp-batchapps-{version}.jar: Deploy this file.

##6.2. Deploy batch applications to database node

  1. Move the batch application files to $ASAKUSA_HOME/batchapps. Deploy the same files as you deployed to the Hadoop client node, using the same steps.
$ cp abcapp-batchapps-*.jar $ASAKUSA_HOME/batchapps
$ cd $ASAKUSA_HOME/batchapps
$ jar -xf abcapp-batchapps-*.jar
$ find . -name "*.sh" | xargs chmod u+x
$ rm -f abcapp-batchapps-*.jar
$ rm -fr META-INF
  2. Copy $ASAKUSA_HOME/bulkloader/conf/[targetname]-jdbc.properties to create a data source configuration file named after the data source (target). The following is an example of creating the configuration file for the target "appdb".
$ cp $ASAKUSA_HOME/bulkloader/conf/[targetname]-jdbc.properties \
  $ASAKUSA_HOME/bulkloader/conf/appdb-jdbc.properties 
  3. Edit the data source configuration file created in step 2 and configure the database connection according to your environment.
# JDBC driver's name (required)
jdbc.driver = com.mysql.jdbc.Driver
# URL of the database to connect to (required)
jdbc.url = jdbc:mysql://dbserver/appdb
# User of the database to connect to (required)
jdbc.user = appuser
# Password of the database to connect to (required)
jdbc.password = appuser
* The remaining items do not need to be changed.
  4. Create a database for the application. Execute the DDL provided by the application.

  5. Create the system information tables for ThunderGate.

    • For each target table that ThunderGate imports/exports, you need the system tables 'tablename_RL' and 'tablename_RC' associated with the import/export process.
    • The DDL to create these tables is generated when you execute the model generator in the development environment, at the path specified by the "asakusa.bulkloader.genddl" key in "build.properties" (by default, "target/sql/bulkloader_generated_table.sql" in the application project).
    • However, this DDL also includes statements that create tables for storing intermediate data, so we recommend maintaining a separate DDL containing only the tables the application needs, and using that instead.
  6. Edit $ASAKUSA_HOME/bulkloader/sql/insert_import_table_lock.sql. Change the database name in the WHERE clause from "asakusa" to your own. The following example assumes the database is named "appdb".

DELETE FROM IMPORT_TABLE_LOCK;
INSERT INTO IMPORT_TABLE_LOCK (TABLE_NAME) 
  SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA="appdb";
  7. Execute the scripts to create the ThunderGate management tables. The SQL executed here includes a process that inserts records for all table names in the database, so even if you already did this in Chapter 5, you must execute this step again.
$ cd $ASAKUSA_HOME/bulkloader/sql
$ mysql -u asakusa -pasakusa -D asakusa < create_table.sql
$ mysql -u appuser -pappuser -D appdb < insert_import_table_lock.sql
  8. * Perform steps 5 to 8 of Chapter 5, "Deploy and test sample applications", to prepare the ssh bridge scripts for experimental.sh, but only if you did not perform Chapter 5 and the database node and the Hadoop client node are not the same machine.

##6.3. Execute the batch application

Execute the batch application you deployed and check that it works.

  1. Import the application's input data into MySQL.

  2. Execute the batch application's experimental.sh.

$ $ASAKUSA_HOME/batchapps/(batch ID)/bin/experimental.sh
  3. Check the output tables in MySQL.