Fix AWS EMR bootstrap script and doc
Includes Bin's fixes to the shellscript
Remove curl from Master Status check logic (AWS updated curl version
which caused an error)
Remove monitor to prevent ssh attempt

pr-link: #9467
change-id: cid-7f7aef2f0ccf8cf43be2e07c991ad5b7640490bf
ns1123 authored and alluxio-bot committed Jul 15, 2019
1 parent c3b1c1f commit c9ba4b3
Showing 2 changed files with 59 additions and 36 deletions.
57 changes: 40 additions & 17 deletions docs/en/compute/AWS-EMR.md
@@ -25,7 +25,7 @@ different cloud provider's storage i.e. GCS, Azure Blob Store.
* IAM Account with the default EMR Roles
* Key Pair for EC2
* An S3 Bucket
* AWS CLI
* AWS CLI: Make sure that the AWS CLI is also set up and ready with the required AWS Access/Secret key
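
One way to confirm the AWS CLI prerequisite before going further is sketched below. It assumes the standard `aws sts get-caller-identity` call, which succeeds only when valid credentials are configured. The function name `aws_ready` is made up for illustration and is not part of Alluxio or the AWS CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 only if the AWS CLI is installed and can authenticate.
aws_ready() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found on PATH" >&2
    return 1
  fi
  # Asks AWS which identity the configured credentials belong to.
  aws sts get-caller-identity >/dev/null
}

# Usage (left commented out because it needs network access and credentials):
# aws_ready && echo "AWS CLI is configured"
```

If the check fails, running `aws configure` to set the access key, secret key, and region is the usual next step.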

The majority of the pre-requisites can be found by going through the
[AWS EMR Getting Started](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html) guide. An S3 bucket
@@ -34,41 +34,64 @@ the root UFS can be reconfigured to be HDFS.

## Basic Setup

To begin with, download an Alluxio release and unzip it. In the `integration/emr/` directory, copy the `alluxio-emr.sh`
and `alluxio-emr.json` files to the current directory. These files will serve as the main mechanisms to change the
Alluxio configuration in the future. Make sure that the AWS CLI is also set up and ready
with the required AWS Access/Secret key.
To begin with, [download an Alluxio release](https://www.alluxio.io/download) and unzip it.
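
For a scripted download, a minimal sketch follows. It assumes the tarball URL pattern shown in the example comment of `alluxio-emr.sh`, which may not hold for every release, and the helper names `alluxio_tarball_url` and `fetch_alluxio` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the URL pattern is copied from the example in alluxio-emr.sh
# and is an assumption for other versions.

# Builds the tarball URL for a given Alluxio version (hypothetical helper).
alluxio_tarball_url() {
  local version=$1
  echo "http://downloads.alluxio.io/downloads/files/${version}/alluxio-${version}-bin.tar.gz"
}

# Downloads and unpacks a release into the current directory.
fetch_alluxio() {
  local url
  url=$(alluxio_tarball_url "$1")
  wget "$url"
  tar -xzf "$(basename "$url")"
}

# Example: print the URL for version 2.0.0 without downloading anything.
alluxio_tarball_url 2.0.0
```

Calling `fetch_alluxio 2.0.0` would then download and extract that release, provided the URL exists.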

1. Run `aws emr create-default-roles`. This will set up the required IAM roles for the account to be able to use the EMR
service.
2. Make sure that the `alluxio-emr.sh` script is uploaded to a location in S3 and `alluxio-presto.json` is saved somewhere on your local filesystem.
1. Set up the required IAM roles for the account to be able to use the EMR service.
```bash
aws emr create-default-roles
```
2. Upload `integration/emr/alluxio-emr.sh` script to your S3 bucket.
3. The input arguments for the bootstrap script are in the order below:
- The URI from where to download the Alluxio Release .tar.gz file. This can be an `s3://` URI or an `http://` URI.
This is a mandatory property.
- The URI from which to download the Alluxio release `.tar.gz` file. This can be an `s3://` URI or an
`http://` URI.
This is a mandatory property.
- The root-ufs-uri. This should be an `s3://` or `hdfs://` URI designating the root mount of the Alluxio file system.
This is a mandatory property.
- Extra Alluxio options. These are specified as a semicolon-separated list of key-values in the format `<key>=<value>`.
For example, `alluxio.user.file.writetype.default=CACHE_THROUGH`
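
As a rough illustration of these argument conventions, the sketch below checks a URI's scheme and splits an extra-options string. The bootstrap script in this commit sets `IFS=';'` before reading its third argument, so multiple key-values are separated by semicolons. The helper names `has_scheme` and `split_options` are hypothetical and do not appear in `alluxio-emr.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of alluxio-emr.sh.

# Succeeds when the URI ($1) starts with one of the given schemes followed by "://".
has_scheme() {
  local uri=$1 scheme
  shift
  for scheme in "$@"; do
    [[ $uri == "$scheme"://* ]] && return 0
  done
  return 1
}

# Prints each <key>=<value> pair of a ';'-separated option string on its own line.
split_options() {
  local IFS=';'
  local -a conf
  read -ra conf <<< "$1"
  printf '%s\n' "${conf[@]}"
}

has_scheme "s3://my-bucket/alluxio-emr/mount" s3 hdfs && echo "root UFS scheme looks valid"
split_options "alluxio.user.file.writetype.default=CACHE_THROUGH;alluxio.user.file.readtype.default=CACHE"
```

Semicolons are a natural delimiter here because commas already separate the entries of `Args=[...]` in the `--bootstrap-actions` option.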

```bash
aws emr create-cluster --release-label emr-5.23.0 --instance-count <num-instances> --instance-type <instance-type> --applications Name=Presto Name=Hive Name=Spark --name '<cluster-name>' --bootstrap-actions Path=s3://bucket/path/to/alluxio-emr.sh,Args=[<download-url>,<root-ufs-uri>,<additional-properties>] --configurations file:///path/to/file/alluxio-emr.json --ec2-attributes KeyName=<ec2-keypair-name>
```

aws emr create-cluster \
--release-label emr-5.23.0 \
--instance-count <num-instances> \
--instance-type <instance-type> \
--applications Name=Presto Name=Hive Name=Spark \
--name '<cluster-name>' \
--bootstrap-actions \
Path=s3://bucket/path/to/alluxio-emr.sh,\
Args=[<download-url>,<root-ufs-uri>,<additional-properties>] \
--configurations file://${ALLUXIO_HOME}/integration/emr/alluxio-emr.json \
--ec2-attributes KeyName=<ec2-keypair-name>
```
4. On the [EMR Console](https://console.aws.amazon.com/elasticmapreduce/home), you should be able to see the cluster
going through the different stages of setup. Once the cluster is in the 'Waiting' stage, click on the cluster details
to get the 'Master public DNS'. SSH into this instance using the keypair provided in the previous command. If a
security group isn't specified via CLI, the default EMR security group will not allow inbound SSH. To SSH into the
machine, a new rule will need to be added.
5. Test that Alluxio is running as expected by running `sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio runTests"`
5. Test that Alluxio is running as expected
```
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio runTests"
```

Alluxio is installed in `/opt/alluxio/` by default. Hive and Presto are already configured to connect to Alluxio. The
cluster also uses AWS Glue as the default metastore for both Presto and Hive. This will allow you to maintain table
definitions between multiple runs of the Alluxio cluster.

See the below sample command for reference.

```bash
aws emr create-cluster --release-label emr-5.23.0 --instance-count 3 --instance-type m4.xlarge --applications Name=Presto Name=Hive --name 'Test cluster' --bootstrap-actions Path=s3://alluxio-test/emr/bootstrap-actions/alluxio-emr.sh,Args=[http://downloads.alluxio.io/downloads/files/2.0.0-preview/alluxio-2.0.0-preview-bin.tar.gz,s3a://alluxio-test/emr/mount/,alluxio.underfs.s3.owner.id.to.username.mapping=f1234123412341234123412341234123412341234123412341234123412341234=hadoop] --configurations file:///Users/foo/emr/alluxio/alluxio-emr.json --ec2-attributes KeyName=admin-key
```
aws emr create-cluster \
--release-label emr-5.23.0 \
--instance-count 3 \
--instance-type m4.xlarge \
--applications Name=Presto Name=Hive \
--name 'Test cluster' \
--bootstrap-actions \
Path=s3://alluxio-test/emr/bootstrap-actions/alluxio-emr.sh,\
Args=[http://downloads.alluxio.io/downloads/files/{{site.ALLUXIO_RELEASED_VERSION}}/alluxio-{{site.ALLUXIO_RELEASED_VERSION}}-bin.tar.gz,\
s3://alluxio-test/emr/mount/,\
alluxio.underfs.s3.owner.id.to.username.mapping=f1234123412341234123412341234123412341234123412341234123412341234=hadoop] \
--configurations file://${ALLUXIO_HOME}/integration/emr/alluxio-emr.json \
--ec2-attributes KeyName=admin-key
```

Notes: The default Alluxio Worker memory is set to 20GB. If the instance type has less than 20GB of memory, change
38 changes: 19 additions & 19 deletions integration/emr/alluxio-emr.sh
@@ -11,8 +11,8 @@
#

# This script is meant for bootstrapping the Alluxio service to an EMR cluster. Arguments for the script are listed below.
# Arg 1. Download URI (ex. http://downloads.alluxio.io/downloads/files/2.0.0-preview/alluxio-2.0.0-preview-bin.tar.gz)
# Arg 2. Root UFS URI (ex. s3a://my-bucket/alluxio-emr/mount)
# Arg 1. Download URI (ex. http://downloads.alluxio.io/downloads/files/2.0.0/alluxio-2.0.0-bin.tar.gz)
# Arg 2. Root UFS URI (ex. s3://my-bucket/alluxio-emr/mount)
# Arg 3. Extra Alluxio Options. These will be appended to alluxio-site.properties. Multiple options can be specified using ';' as a delimiter
# (ex. alluxio.user.file.writetype.default=CACHE_THROUGH;alluxio.user.file.readtype.default=CACHE)

@@ -22,7 +22,7 @@ sudo useradd alluxio -u 600 -g 600

#Download the release
#TODO Add metadata header tag to the wget for filtering out in download metrics.
if [ -z $1]
if [[ -z $1 ]]
then
echo "No Download URL Provided. Please go to http://downloads.alluxio.io to see available release downloads."
else
@@ -44,18 +44,18 @@ sudo tar -xvf /opt/$RELEASE -C /opt/
sudo rm -R /opt/$RELEASE
sudo mv /opt/$RELEASE_UNZIP /opt/alluxio
sudo chown -R alluxio:alluxio /opt/alluxio
rm $RELEASE
rm ${RELEASE}
sudo runuser -l alluxio -c "cp /opt/alluxio/conf/alluxio-site.properties.template /opt/alluxio/conf/alluxio-site.properties"

#Get hostnames and load into masters/workers file
EMR_CLUSTER=`jq '.jobFlowId' /mnt/var/lib/info/job-flow.json | sed -e 's/^"//' -e 's/"$//'`
HOSTLIST=`aws emr list-instances --cluster-id $EMR_CLUSTER --region us-east-1 | jq '.Instances[].PrivateDnsName' | sed -e 's/^"//' -e 's/"$//'`
MASTER=`jq '.masterHost' /mnt/var/lib/info/extraInstanceData.json | sed -e 's/^"//' -e 's/"$//' | nslookup | awk -v ip="$ip" '/name/{print substr($NF,1,length($NF)-1),ip}'`

if [ -z "$MASTER"]
if [[ -z "$MASTER" ]]
then
MASTER=`hostname`
MASTER=$MASTER".ec2.internal"
MASTER=${MASTER}".ec2.internal"
fi

WORKERS=`printf '%s\n' "${HOSTLIST//$MASTER/}"`
@@ -68,36 +68,36 @@ IS_MASTER=`jq '.isMaster' /mnt/var/lib/info/instance.json`

#Set up alluxio-site.properties
sudo runuser -l alluxio -c "echo 'alluxio.master.hostname=$MASTER' > /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.journal.type=UFS' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.mount.table.root.ufs=$2' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.hive.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.presto.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.yarn.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.worker.memory.size=20GB' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.levels=1' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.level0.alias=MEM' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.hive.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.yarn.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.master.security.impersonation.presto.users=*' >> /opt/alluxio/conf/alluxio-site.properties"
sudo runuser -l alluxio -c "echo 'alluxio.worker.tieredstore.levels=1' >> /opt/alluxio/conf/alluxio-site.properties"

#Inject user defined properties (semicolon separated)
IFS=';'
conf=($3)
printf "%s\n" "${conf[@]}" | sudo tee -a /opt/alluxio/conf/alluxio-site.properties

#No ssh
if [ $IS_MASTER = "true" ]
if [[ ${IS_MASTER} = "true" ]]
then
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh master"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh job_master"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh proxy"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a master"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a job_master"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a proxy"
else
/opt/alluxio/bin/alluxio-mount.sh SudoMount local
while [ $MASTER_STATUS -ne "200" ]
until /opt/alluxio/bin/alluxio fsadmin report
do
MASTER_STATUS=`curl -s -o /dev/null -w "%{http_code}" $MASTER:19999`
sleep 5
done
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh worker"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh job_worker"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh proxy"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a worker"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a job_worker"
sudo runuser -l alluxio -c "/opt/alluxio/bin/alluxio-start.sh -a proxy"
fi

#Compute configs