
Added CORS Support and EC2 Support

This commit adds two main features. First, it enables CORS support so that
requests can be made across domains. Second, it adds a script for deploying
the job-server to EC2, along with an example application to run on the EC2
cluster. It also improves the bin/ scripts to handle paths containing spaces,
which previously broke bin/server_deploy.sh.
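The quoting fix can be illustrated with a minimal sketch (the path below is hypothetical, not one the scripts use):

```shell
#!/bin/bash
# Unquoted variable expansions split on whitespace, so a path with spaces
# becomes several arguments; double quotes keep it as one argument.
dir="/tmp/job-server demo dir"   # hypothetical path containing spaces
mkdir -p "$dir"                  # quoted: creates exactly one directory
printf 'hello\n' > "$dir/out.txt"
cat "$dir/out.txt"               # quoted again: reads the right file
```

Unquoted, `mkdir -p $dir` would instead create three directories (`/tmp/job-server`, `demo`, and `dir`), which is the class of bug the quoting changes avoid.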
David-Durst committed Sep 10, 2015
1 parent 4951a45 commit dd25ea1bedd25ba9c0041abfe2f322ea2c0c2678
@@ -13,4 +13,14 @@ config/*.conf
config/*.sh
job-server/config/*.conf
job-server/config/*.sh
metastore_db/
#ignore generated config
bin/ec2_example.sh
# ignore spark binaries
spark-1.5.0-bin-hadoop2.6.tgz

@velvia

velvia Oct 19, 2015

This seems local to your installation and not necessary...

@David-Durst

David-Durst Oct 20, 2015

Owner

It is necessary. I download the file as part of my script to launch on ec2. I have made it more generic though to match any version.

spark-1.5.0-bin-hadoop2.6/
# don't ignore the ec2 config and sh files
!job-server/config/ec2.sh
@@ -62,6 +62,10 @@ For release notes, look in the `notes/` directory. They should also be up on [l
The easiest way to get started is to try the [Docker container](doc/docker.md) which prepackages a Spark distribution with the job server and lets you start and deploy it.
## EC2 Start
Follow the instructions in [EC2](doc/EC2.md) to spin up a Spark cluster with job server and an example application.
## Development mode
The example walk-through below shows you how to use the job server with an included example job, by running the job server in local development mode in SBT. This is not an example of usage in production.
@@ -0,0 +1,38 @@
#!/bin/bash
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
. "$bin"/../config/user-ec2-settings.sh
#get spark binaries if they haven't been downloaded and extracted yet
if [ ! -d "$bin"/../spark-1.5.0-bin-hadoop2.6 ]; then
wget -P "$bin"/.. http://apache.arvixe.com/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz

@velvia

velvia Oct 19, 2015

Might be good to make the exact version (such as hadoop2.6) an env var so users can override them
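A sketch of that suggestion, using hypothetical SPARK_VERSION and HADOOP_VERSION variables (not names the script currently defines) with the hard-coded values as defaults:

```shell
#!/bin/bash
# Let users override the versions from the environment; fall back to the
# values the script currently hard-codes.
SPARK_VERSION="${SPARK_VERSION:-1.5.0}"
HADOOP_VERSION="${HADOOP_VERSION:-2.6}"
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
echo "$SPARK_DIST"   # spark-1.5.0-bin-hadoop2.6 unless overridden
```

The same `$SPARK_DIST` string could then feed the wget URL, the tar extraction, and the spark-ec2 path, so one override updates all three.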

tar -xvzf "$bin"/../spark-1.5.0-bin-hadoop2.6.tgz -C "$bin"/..
fi
#run spark-ec2 to start ec2 cluster
EC2DEPLOY="$bin"/../spark-1.5.0-bin-hadoop2.6/ec2/spark-ec2
"$EC2DEPLOY" --copy-aws-credentials --key-pair=$KEY_PAIR --hadoop-major-version=yarn --identity-file=$SSH_KEY --region=us-east-1 --zone=us-east-1a --instance-type=$INSTANCE_TYPE --slaves $NUM_SLAVES launch $CLUSTER_NAME
#There is only 1 deploy host. However, the variable is plural as that is how Spark Job Server named it.
#To minimize changes, I left the variable name alone.
export DEPLOY_HOSTS=$("$EC2DEPLOY" get-master $CLUSTER_NAME | tail -n1)
#This line is a hack to edit the ec2.conf file so that the master option is correct. Since we are allowing Amazon to
#dynamically allocate a url for the master node, we must update the configuration file in between cluster startup
#and Job Server deployment
cp "$bin"/../config/ec2.conf.template "$bin"/../config/ec2.conf
sed -i -E "s/master = .*/master = \"spark:\/\/$DEPLOY_HOSTS:7077\"/g" "$bin"/../config/ec2.conf
#also get ec2_example.sh right
cp "$bin"/ec2_example.sh.template "$bin"/ec2_example.sh
sed -i -E "s/DEPLOY_HOSTS=.*/DEPLOY_HOSTS=\"$DEPLOY_HOSTS:8090\"/g" "$bin"/ec2_example.sh
#open all ports on the master so that Spark Job Server works and you can see the results of your jobs
aws ec2 authorize-security-group-ingress --group-name $CLUSTER_NAME-master --protocol tcp --port 0-65535 --cidr 0.0.0.0/0
cd "$bin"/..
bin/server_deploy.sh ec2
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" root@$DEPLOY_HOSTS "(echo 'export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID' >> spark/conf/spark-env.sh)"
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" root@$DEPLOY_HOSTS "(echo 'export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY' >> spark/conf/spark-env.sh)"
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" root@$DEPLOY_HOSTS "(cd job-server; nohup ./server_start.sh < /dev/null &> /dev/null &)"
echo "The Job Server is listening at $DEPLOY_HOSTS:8090"
@@ -0,0 +1,7 @@
#!/bin/bash
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
. "$bin"/../config/user-ec2-settings.sh
"$bin"/../spark-1.5.0-bin-hadoop2.6/ec2/spark-ec2 destroy $CLUSTER_NAME
@@ -0,0 +1,15 @@
DEPLOY_HOSTS=ENTER_DEPLOY_HOST_HERE
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
. "$bin"/../config/ec2.sh
ssh_key_to_use=""
if [ -n "$SSH_KEY" ] ; then
ssh_key_to_use="-i $SSH_KEY"
fi
wget -O- --post-file "$bin"/../job-server-extras/target/scala-2.10/job-server-extras_2.10-0.5.3-SNAPSHOT.jar "$DEPLOY_HOSTS/jars/km"

@velvia

velvia Oct 19, 2015

Should get the version string from the root version.sbt file, not hard coded
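A sketch of that suggestion, assuming version.sbt keeps the usual single-line `version in ThisBuild := "x.y.z"` format (the stand-in file below is illustrative):

```shell
#!/bin/bash
# Stand-in for the repo-root version.sbt.
vfile=$(mktemp)
echo 'version in ThisBuild := "0.5.3-SNAPSHOT"' > "$vfile"
# Extract whatever sits between the double quotes.
VERSION=$(sed -nE 's/.*"([^"]+)".*/\1/p' "$vfile")
JAR="job-server-extras_2.10-${VERSION}.jar"
echo "$JAR"   # job-server-extras_2.10-0.5.3-SNAPSHOT.jar
```

With this, the wget --post-file path could interpolate `$VERSION` instead of hard-coding 0.5.3-SNAPSHOT, so version bumps would not silently break the example script.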

scp -rp -o StrictHostKeyChecking=no $ssh_key_to_use "$bin"/../job-server-extras/src/main/KMeansExample/* ${APP_USER}@"${DEPLOY_HOSTS%:*}:/var/www/html/"
echo "The example is running at ${DEPLOY_HOSTS%:*}:5080"
@@ -18,7 +18,7 @@ if [ ! -f "$configFile" ]; then
echo "Could not find $configFile"
exit 1
fi
. $configFile
. "$configFile"
majorRegex='([0-9]+\.[0-9]+)\.[0-9]+'
if [[ $SCALA_VERSION =~ $majorRegex ]]
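For context, the `=~` comparison above fills bash's BASH_REMATCH array; a standalone sketch (the SCALA_VERSION value is illustrative):

```shell
#!/bin/bash
# [[ string =~ regex ]] stores capture groups in BASH_REMATCH;
# group 1 here is the major.minor prefix.
SCALA_VERSION="2.10.4"
majorRegex='([0-9]+\.[0-9]+)\.[0-9]+'
if [[ $SCALA_VERSION =~ $majorRegex ]]; then
  majorVersion="${BASH_REMATCH[1]}"
fi
echo "$majorVersion"   # prints 2.10
```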
@@ -42,7 +42,6 @@ FILES="job-server-extras/target/scala-$majorVersion/spark-job-server.jar
bin/server_start.sh
bin/server_stop.sh
bin/kill-process-tree.sh
$CONFIG_DIR/$ENV.conf
config/shiro.ini
config/log4j-server.properties"
@@ -53,7 +52,9 @@ fi
for host in $DEPLOY_HOSTS; do
# We assume that the deploy user is APP_USER and has permissions
ssh $ssh_key_to_use ${APP_USER}@$host mkdir -p $INSTALL_DIR
scp $ssh_key_to_use $FILES ${APP_USER}@$host:$INSTALL_DIR/
scp $ssh_key_to_use $configFile ${APP_USER}@$host:$INSTALL_DIR/settings.sh
ssh -o StrictHostKeyChecking=no $ssh_key_to_use ${APP_USER}@$host mkdir -p $INSTALL_DIR
scp -o StrictHostKeyChecking=no $ssh_key_to_use $FILES ${APP_USER}@$host:$INSTALL_DIR/
scp -o StrictHostKeyChecking=no $ssh_key_to_use "$CONFIG_DIR/$ENV.conf" ${APP_USER}@$host:$INSTALL_DIR/
scp -o StrictHostKeyChecking=no $ssh_key_to_use "$configFile" ${APP_USER}@$host:$INSTALL_DIR/settings.sh
done
@@ -0,0 +1,20 @@
## Setting Up The EC2 Cluster
1. Sign up for an Amazon AWS account.
2. Assign your access key ID and secret access key to the bash variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
* I recommend doing this by placing the following export statements in your .bashrc file.
* export AWS_ACCESS_KEY_ID=accesskeyId
* export AWS_SECRET_ACCESS_KEY=secretAccessKey
3. Copy job-server/config/user-ec2-settings.sh.template to job-server/config/user-ec2-settings.sh and configure it. In particular, set KEY_PAIR to the name of your EC2 key pair and SSH_KEY to the location of the pair's private key.
* I recommend using an SSH key that does not require entering a password on every use. Otherwise, you will need to enter the password many times.
4. Run bin/ec2_deploy.sh to start the EC2 cluster. Go to the URL printed at the end of the script to view the Spark Job Server frontend. Change the port from 8090 to 8080 to view the Spark Standalone Cluster frontend.
5. Run bin/ec2_example.sh to set up the example. Go to the URL printed at the end of the script to view the example.
6. Run bin/ec2_destroy.sh to shut down the EC2 cluster.
## Using The Example
1. Start a Spark Context by pressing the "Start Context" button.
2. Load data by pressing the "Resample" button. The matrix of scatterplots and category selection dropdown will only appear after loading data from the server.
* It will take approximately 30-35 minutes the first time you press resample after starting a new context. The cluster spends 20 minutes pulling data from an S3 bucket. It spends the rest of the time running the k-means clustering algorithm.
* Subsequent presses will refresh the data in the scatterplots. These presses will take about 10 seconds as the data is reloaded from memory using a NamedRDD.
3. After performing the data analysis, shutdown the context by pressing the "Stop Context" button.

@@ -0,0 +1,58 @@
svg {
font: 10px sans-serif;
padding: 10px;
}
.axis,
.frame {
shape-rendering: crispEdges;
}
.axis line {
stroke: #ddd;
}
.axis path {
display: none;
}
.frame {
fill: none;
stroke: #aaa;
}
circle {
fill-opacity: .4;
}
circle.hidden {
fill: #ccc !important;
fill-opacity: .2;
}
.extent {
fill: #000;
fill-opacity: .125;
stroke: #fff;
}
.palette {
/* cursor: pointer; */
display: inline-block;
vertical-align: top;
margin: 200px 0 4px 6px;
padding: 4px;
background: #fff;
/* border: solid 1px #aaa; */
}
.swatch {
cursor: pointer;
display: block;
vertical-align: middle;
width: 40px;
color: white;
text-align: center;
padding-top: 8px;
padding-bottom: 8px;
}
@@ -0,0 +1,54 @@
<!DOCTYPE html>
<meta charset="utf-8">
<body>
<div>
<div>
<input name="startButton"
type="button"
value="Start Context"
onclick="startContext()"
class = "btn btn-default enableWhileStopped"
disabled />
<input name="updateButton"
type="button"
value="Resample"
onclick="runSampling()"
class = "btn btn-default enableWhileRunning"
disabled />
<input name="stopButton"
type="button"
value="Stop Context"
onclick="stopContext()"
class = "btn btn-default enableWhileRunning"
disabled />
<input name="filterButton"
type="button"
value="Filter Categories"
onclick="drawData()"
class = "btn btn-default" />
<select multiple="multiple" id="multiSelect">
</select>
</div>
<div id="state">
Syncing with server.
</div>
<div id="filter_options">
</div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.6/d3.min.js"></script>
<script src="js/colorbrewer.v1.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/URI.js/1.16.0/URI.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.0.0-alpha1/jquery.min.js"></script>
<link rel="stylesheet" property='stylesheet' href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/css/bootstrap.min.css" type="text/css"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script type="text/javascript" src="js/bootstrap-multiselect.js"></script>
<link rel="stylesheet" property='stylesheet' href="css/bootstrap-multiselect.css" type="text/css"/>
<link rel="stylesheet" property='stylesheet' href="css/scatterplot.css" type="text/css"/>
<script type="text/javascript" src="js/graphics.js"></script>
<script type="text/javascript" src="js/jobserver.js"></script>
</body>
