From c9ba4b3bf12a590f3a6326de9e93586f1d83a756 Mon Sep 17 00:00:00 2001
From: ns1123 <45953825+ns1123@users.noreply.github.com>
Date: Mon, 15 Jul 2019 16:08:56 -0700
Subject: [PATCH] Fix AWS EMR bootstrap script and doc

Includes Bin's fixes to the shellscript
Remove curl from Master Status check logic (AWS updated curl version which caused an error)
Remove monitor to prevent ssh attempt

pr-link: Alluxio/alluxio#9467
change-id: cid-7f7aef2f0ccf8cf43be2e07c991ad5b7640490bf
---
 docs/en/compute/AWS-EMR.md     | 57 ++++++++++++++++++++++++----------
 integration/emr/alluxio-emr.sh | 38 +++++++++++------------
 2 files changed, 59 insertions(+), 36 deletions(-)

diff --git a/docs/en/compute/AWS-EMR.md b/docs/en/compute/AWS-EMR.md
index 8e6d7d3c6da6..0f5d58e97a76 100644
--- a/docs/en/compute/AWS-EMR.md
+++ b/docs/en/compute/AWS-EMR.md
@@ -25,7 +25,7 @@ different cloud provider's storage i.e. GCS, Azure Blob Store.
 * IAM Account with the default EMR Roles
 * Key Pair for EC2
 * An S3 Bucket
-* AWS CLI
+* AWS CLI: Make sure that the AWS CLI is also set up and ready with the required AWS Access/Secret key

 The majority of the pre-requisites can be found by going through the
 [AWS EMR Getting Started](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html) guide. An S3 bucket
@@ -34,32 +34,43 @@ the root UFS can be reconfigured to be HDFS.

 ## Basic Setup

-To begin with, download an Alluxio release and unzip it. In the `integration/emr/` directory, copy the `alluxio-emr.sh`
-and `alluxio-emr.json` files to the current directory. These files will serve as the main mechanisms to change the
-Alluxio configuration in the future. Make sure that the AWS CLI is also set up and ready
-with the required AWS Access/Secret key.
+To begin with, [download an Alluxio release](https://www.alluxio.io/download) and unzip it.

-1. Run `aws emr create-default-roles`. This will set up the required IAM roles for the account to be able to use the EMR
-service.
-2. 
Make sure that the `alluxio-emr.sh` script is uploaded to a location in S3 and `alluxio-presto.json` is saved somewhere on your local filesystem.
+1. Set up the required IAM roles for the account to be able to use the EMR service.
+```bash
+aws emr create-default-roles
+```
+2. Upload `integration/emr/alluxio-emr.sh` script to your S3 bucket.
 3. The input arguments for the bootstrap script are in the order below:
-    - The URI from where to download the Alluxio Release .tar.gz file. This can be an `s3://` URI or an `http://` URI.
-      This is a mandatory property.
+    - The URI from where to download the Alluxio Release `.tar.gz` file. This can be an `http://`
+      URI or an `s3://` URI.
+      This is a mandatory property.
     - The root-ufs-uri. This should be an `s3://` or `hdfs://` URI designating the root mount of the Alluxio file system.
       This is a mandatory property.
     - Extra alluxio options. These are specified as a comma-separated list of key-values in the format `=`.
       For example, `alluxio.user.file.writetype.default=CACHE_THROUGH`
-
-```bash
-aws emr create-cluster --release-label emr-5.23.0 --instance-count --instance-type --applications Name=Presto Name=Hive Name=Spark --name '' --bootstrap-actions Path=s3://bucket/path/to/alluxio-emr.sh,Args=[,,] --configurations file:///path/to/file/alluxio-emr.json --ec2-attributes KeyName=
 ```
-
+aws emr create-cluster \
+--release-label emr-5.23.0 \
+--instance-count \
+--instance-type \
+--applications Name=Presto Name=Hive Name=Spark \
+--name '' \
+--bootstrap-actions \
+Path=s3://bucket/path/to/alluxio-emr.sh,\
+Args=[,,] \
+--configurations file://${ALLUXIO_HOME}/integration/emr/alluxio-emr.json \
+--ec2-attributes KeyName=
+```
 4. On the [EMR Console](https://console.aws.amazon.com/elasticmapreduce/home), you should be able to
 see the cluster going through the different stages of setup. Once the cluster is in the 'Waiting' stage, click on the
 cluster details to get the 'Master public DNS'. 
SSH into this instance using the keypair provided in the previous command.
 If a security group isn't specified via CLI, the default EMR security group will not allow inbound SSH. To SSH into the
 machine, a new rule will need to be added.
-5. Test that Alluxio is running as expected by running `sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio runTests"`
+5. Test that Alluxio is running as expected
+```
+sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio runTests"
+```

 Alluxio is installed in `/opt/alluxio/` by default. Hive and Presto are already configured to connect to Alluxio. The
 cluster also uses AWS Glue as the default metastore for both Presto and Hive. This will allow you to maintain table
@@ -67,8 +78,20 @@ definitions between multiple runs of the Alluxio cluster.

 See the below sample command for reference.

-```bash
-aws emr create-cluster --release-label emr-5.23.0 --instance-count 3 --instance-type m4.xlarge --applications Name=Presto Name=Hive --name 'Test cluster' --bootstrap-actions Path=s3://alluxio-test/emr/bootstrap-actions/alluxio-emr.sh,Args=[http://downloads.alluxio.io/downloads/files/2.0.0-preview/alluxio-2.0.0-preview-bin.tar.gz,s3a://alluxio-test/emr/mount/,alluxio.underfs.s3.owner.id.to.username.mapping=f1234123412341234123412341234123412341234123412341234123412341234=hadoop] --configurations file:///Users/foo/emr/alluxio/alluxio-emr.json --ec2-attributes KeyName=admin-key
+```
+aws emr create-cluster \
+--release-label emr-5.23.0 \
+--instance-count 3 \
+--instance-type m4.xlarge \
+--applications Name=Presto Name=Hive \
+--name 'Test cluster' \
+--bootstrap-actions \
+Path=s3://alluxio-test/emr/bootstrap-actions/alluxio-emr.sh,\
+Args=[http://downloads.alluxio.io/downloads/files/{{site.ALLUXIO_RELEASED_VERSION}}/alluxio-{{site.ALLUXIO_RELEASED_VERSION}}-bin.tar.gz,\
+s3://alluxio-test/emr/mount/,\
+alluxio.underfs.s3.owner.id.to.username.mapping=f1234123412341234123412341234123412341234123412341234123412341234=hadoop] \
+--configurations file://${ALLUXIO_HOME}/integration/emr/alluxio-emr.json \
+--ec2-attributes KeyName=admin-key
 ```

 Notes: The default Alluxio Worker memory is set to 20GB. If the instance type has less than 20GB of memory, change
diff --git a/integration/emr/alluxio-emr.sh b/integration/emr/alluxio-emr.sh
index 2bc18a24921f..36fd090e7c25 100644
--- a/integration/emr/alluxio-emr.sh
+++ b/integration/emr/alluxio-emr.sh
@@ -11,8 +11,8 @@
 #

 # This script is meant for bootstrapping the Alluxio service to an EMR cluster. Arguments for the script are listed below.
-# Arg 1. Download URI (ex. http://downloads.alluxio.io/downloads/files/2.0.0-preview/alluxio-2.0.0-preview-bin.tar.gz)
-# Arg 2. Root UFS URI (ex. s3a://my-bucket/alluxio-emr/mount)
+# Arg 1. Download URI (ex. http://downloads.alluxio.io/downloads/files/2.0.0/alluxio-2.0.0-bin.tar.gz)
+# Arg 2. Root UFS URI (ex. s3://my-bucket/alluxio-emr/mount)
 # Arg 3. Extra Alluxio Options. These will be appended to alluxio-site.properties. Multiple options can be specified using ';' as a delimiter
 # (ex. alluxio.user.file.writetype.default=CACHE_THROUGH;alluxio.user.file.readtype.default=CACHE)

@@ -22,7 +22,7 @@ sudo useradd alluxio -u 600 -g 600

 #Download the release
 #TODO Add metadata header tag to the wget for filtering out in download metrics.
-if [ -z $1]
+if [[ -z $1 ]]
 then
   echo "No Download URL Provided. Please go to http://downloads.alluxio.io to see available release downloads."
 else
@@ -44,7 +44,7 @@ sudo tar -xvf /opt/$RELEASE -C /opt/
 sudo rm -R /opt/$RELEASE
 sudo mv /opt/$RELEASE_UNZIP /opt/alluxio
 sudo chown -R alluxio:alluxio /opt/alluxio
-rm $RELEASE
+rm ${RELEASE}
 sudo runuser -l alluxio -c "cp /opt/alluxio/conf/alluxio-site.properties.template /opt/alluxio/conf/alluxio-site.properties"

 #Get hostnames and load into masters/workers file
@@ -52,10 +52,10 @@ EMR_CLUSTER=`jq '.jobFlowId' /mnt/var/lib/info/job-flow.json | sed -e 's/^"//' -
 HOSTLIST=`aws emr list-instances --cluster-id $EMR_CLUSTER --region us-east-1 | jq '.Instances[].PrivateDnsName' | sed -e 's/^"//' -e 's/"$//'`
 MASTER=`jq '.masterHost' /mnt/var/lib/info/extraInstanceData.json | sed -e 's/^"//' -e 's/"$//' | nslookup | awk -v ip="$ip" '/name/{print substr($NF,1,length($NF)-1),ip}'`

-if [ -z "$MASTER"]
+if [[ -z "$MASTER" ]]
 then
   MASTER=`hostname`
-  MASTER=$MASTER".ec2.internal"
+  MASTER=${MASTER}".ec2.internal"
 fi

 WORKERS=`printf '%s\n' "${HOSTLIST//$MASTER/}"`
@@ -68,14 +68,15 @@ IS_MASTER=`jq '.isMaster' /mnt/var/lib/info/instance.json`

 #Set up alluxio-site.properties
 sudo runuser -l alluxio -c "echo 'alluxio.master.hostname=$MASTER' > /opt/alluxio/conf/alluxio-site.properties"
+sudo runuser -l alluxio -c "echo 'alluxio.master.journal.type=UFS' >> /opt/alluxio/conf/alluxio-site.properties"
 sudo runuser -l alluxio -c "echo 'alluxio.master.mount.table.root.ufs=$2' >> /opt/alluxio/conf/alluxio-site.properties"
+sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.hive.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
+sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.presto.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
+sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.yarn.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
 sudo runuser -l alluxio -c "echo 'alluxio.worker.memory.size=20GB' >> /opt/alluxio/conf/alluxio-site.properties"
-sudo runuser -l alluxio -c "echo 
'alluxio.worker.tieredstore.levels=1' >> /opt/alluxio/conf/alluxio-site.properties"
 sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.level0.alias=MEM' >> /opt/alluxio/conf/alluxio-site.properties"
 sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk' >> /opt/alluxio/conf/alluxio-site.properties"
-sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.hive.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
-sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.yarn.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
-sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.presto.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
+sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.levels=1' >> /opt/alluxio/conf/alluxio-site.properties"

 #Inject user defined properties (semicolon separated)
 IFS=';'
@@ -83,21 +84,20 @@ conf=($3)
 printf "%s\n" "${conf[@]}" | sudo tee -a /opt/alluxio/conf/alluxio-site.properties

 #No ssh
-if [ $IS_MASTER = "true" ]
+if [[ ${IS_MASTER} = "true" ]]
 then
-  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh master"
-  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh job_master"
-  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh proxy"
+  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a master"
+  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a job_master"
+  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a proxy"
 else
   /opt/alluxio/bin/alluxio-mount.sh SudoMount local
-  while [ $MASTER_STATUS -ne "200" ]
+  until /opt/alluxio/bin/alluxio fsadmin report
   do
-    MASTER_STATUS=`curl -s -o /dev/null -w "%{http_code}" $MASTER:19999`
     sleep 5
   done
-  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh worker"
-  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh job_worker"
-  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh 
proxy"
+  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a worker"
+  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a job_worker"
+  sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a proxy"
 fi

 #Compute configs