Elastic spark cluster #39
base: branch-1.6
Conversation
…e to setup a new slave.
…ch remaining instance when slaves are removed.
…d to keep at least one slave in the cluster.
This is great. Thanks @Tirami-su -- I'll take a look at this soon. @nchammas Would be great if you also took a look.
Thanks @shivaram !
@@ -787,9 +916,47 @@ def get_instances(group_names):
    return (master_instances, slave_instances)

def transfert_SSH_key(opts, master, slave_nodes):
transfert (with a t at the end)?
Oops... Sorry, a French reflex...
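For context, spark_ec2.py already has ssh_read and ssh_write helpers that the cluster-setup path uses to copy the master's ~/.ssh directory to the slaves. A minimal sketch of a key-transfer helper built on those helpers (illustrative only, not the PR's actual diff; the spelling is corrected here):

def transfer_ssh_key(opts, master, slave_nodes):
    # Pack the master's ~/.ssh directory so the new slaves will accept
    # passwordless logins from the master.
    dot_ssh_tar = ssh_read(master, opts, ['tar', 'c', '.ssh'])
    for slave in slave_nodes:
        # Unpack it into each new slave's home directory.
        ssh_write(slave.public_dns_name, opts, ['tar', 'x'], dot_ssh_tar)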
I won't be able to review this PR in detail since I'm struggling to find time to complete even my own implementation of this feature for Flintrock. However, some high-level questions:
How does add-slaves know what version of Spark to install and other setup to do on the new slaves? Can the user inadvertently create a broken cluster with a bad call to add-slaves?

There are three types of deploy folders. deploy.generic, which contains the ec2-variables (SPARK_VERSION, HADOOP_MAJOR_VERSION, MODULES, ...), is deployed only when the cluster is created. There is a problem if you use different instance types: for example, if you start a cluster with some big instances and after that you add new small instances, the settings for memory and CPU are evaluated only on the master and the first slave in …, so the added slaves do not get settings sized for their own hardware. Another "problem" is that if you modify files present in templates, like …

Can you explain the high-level purpose of entities.generic and how it fits in with the current code base?

The purpose of …

Thank you for this review and for the questions. If the answers aren't clear, feel free to ask me for details, and the same if you have other questions :) !
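For reference, the ec2-variables mentioned above live in deploy.generic/root/spark-ec2/ec2-variables.sh and are filled in from the template placeholders at launch time. An abridged sketch of the kind of values that file carries (exact contents vary by branch):

# deploy.generic/root/spark-ec2/ec2-variables.sh (abridged)
export MASTERS="{{master_list}}"
export SLAVES="{{slave_list}}"
export SPARK_VERSION="{{spark_version}}"
export HADOOP_MAJOR_VERSION="{{hadoop_major_version}}"
export MODULES="{{modules}}"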
Elastic Spark Cluster
Prerequisite
A running Spark cluster.
add-slaves
./spark-ec2 -k key-name -i file.pem add-slaves cluster_name
This command adds a new slave to the cluster. To add more than one slave, use the -s or --slaves option. You can ask for the slaves to be launched as spot instances with --spot-price, and set the instance type with --instance-type or -t, as in the example below.
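For instance, a combined call might look like this (the cluster name, slave count, instance type, and price are placeholders):

./spark-ec2 -k key-name -i file.pem -s 2 -t m3.large --spot-price=0.05 add-slaves cluster_name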
remove-slaves
./spark-ec2 -k key-name -i file.pem remove-slaves cluster_name
This command removes a slave, as long as it is not the last slave in the cluster. To remove more than one slave, give the number with the -s or --slaves option. If the requested number is greater than the number of slaves in the cluster, or equal to -1, every slave but one is removed: the cluster always keeps at least one slave. The sketch below spells out this clamping rule.
Testing command shortcuts
Note: the shortcuts below assume shell variables such as $KEY_NAME, $ID_FILE, and $CLUSTER_NAME have been exported first.
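For example (all values are placeholders; point the repo and branch variables at the fork and branch under test):

export KEY_NAME=my-key
export ID_FILE=~/.ssh/my-key.pem
export CLUSTER_NAME=test-cluster
export SPARK_EC2_GIT_REPO=https://github.com/example/spark-ec2
export SPARK_EC2_GIT_BRANCH=elastic-spark-cluster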
Creating a cluster
./spark-ec2 -k $KEY_NAME -i $ID_FILE --spark-ec2-git-repo=$SPARK_EC2_GIT_REPO --spark-ec2-git-branch=$SPARK_EC2_GIT_BRANCH launch $CLUSTER_NAME
Trying to remove a slave
./spark-ec2 -k $KEY_NAME -i $ID_FILE remove-slaves $CLUSTER_NAME
Adding a slave
./spark-ec2 -k $KEY_NAME -i $ID_FILE add-slaves $CLUSTER_NAME
Adding three more slaves as spot instances
./spark-ec2 -s 3 -k $KEY_NAME -i $ID_FILE --spot-price=0.05 add-slaves $CLUSTER_NAME
Removing two slaves
./spark-ec2 -s 2 -k $KEY_NAME -i $ID_FILE remove-slaves $CLUSTER_NAME
Removing all slaves (except one)
./spark-ec2 -s -1 -k $KEY_NAME -i $ID_FILE remove-slaves $CLUSTER_NAME
Destroying the cluster
./spark-ec2 -k $KEY_NAME -i $ID_FILE destroy $CLUSTER_NAME