Elastic spark cluster #39
base: branch-1.6
Conversation
…e to setup a new slave.
…ch remaining instance when slaves are removed.
…d to keep at least one slave in the cluster.
This is great. Thanks @Tirami-su -- I'll take a look at this soon. @nchammas Would be great if you also took a look.
Thanks @shivaram !
@@ -787,9 +916,47 @@ def get_instances(group_names):
    return (master_instances, slave_instances)

def transfert_SSH_key(opts, master, slave_nodes):
transfert (with a t at the end)?
Oops... Sorry, a French reflex...
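For context, spark_ec2.py already has ssh_read and ssh_write helpers that the cluster-setup path uses to copy the master's ~/.ssh directory to the slaves. A minimal sketch of a key-transfer helper built on those helpers (illustrative only, not the PR's actual diff; the spelling is corrected here):

def transfer_ssh_key(opts, master, slave_nodes):
    # Pack the master's ~/.ssh directory so the new slaves will accept
    # passwordless logins from the master.
    dot_ssh_tar = ssh_read(master, opts, ['tar', 'c', '.ssh'])
    for slave in slave_nodes:
        # Unpack it into each new slave's home directory.
        ssh_write(slave.public_dns_name, opts, ['tar', 'x'], dot_ssh_tar)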
I won't be able to review this PR in detail since I'm struggling to find time to complete even my own implementation of this feature for Flintrock. However, some high-level questions:
How does add-slaves know what version of Spark to install and other setup to do on the new slaves? Can the user inadvertently create a broken cluster with a bad call to add-slaves?

There are three types of deploy folders. deploy.generic, which contains the ec2-variables (SPARK_VERSION, HADOOP_MAJOR_VERSION, MODULES, ...), is deployed only when the cluster is created. There is a problem if you use different instance types: for example, if you start a cluster with some big instances and after that you add new small instances, the settings for memory and CPU are evaluated only on the master and the first slave in …, so the added slaves do not get settings sized for their own hardware. Another "problem" is that if you modify files present in templates, like …

Can you explain the high-level purpose of entities.generic and how it fits in with the current code base?

The purpose of …

Thank you for this review and for the questions. If the answers aren't clear, feel free to ask me for details, and the same if you have other questions :) !
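For reference, the ec2-variables mentioned above live in deploy.generic/root/spark-ec2/ec2-variables.sh and are filled in from the template placeholders at launch time. An abridged sketch of the kind of values that file carries (exact contents vary by branch):

# deploy.generic/root/spark-ec2/ec2-variables.sh (abridged)
export MASTERS="{{master_list}}"
export SLAVES="{{slave_list}}"
export SPARK_VERSION="{{spark_version}}"
export HADOOP_MAJOR_VERSION="{{hadoop_major_version}}"
export MODULES="{{modules}}"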
Elastic Spark Cluster
Prerequisite
A running Spark cluster.
add-slaves
./spark-ec2 -k key-name -i file.pem add-slaves cluster_name
This command adds a new slave to the cluster. To add more than one slave, use the -s or --slaves option. You can ask for the slaves to be launched as spot instances with --spot-price, and set the instance type with --instance-type or -t, as in the example below.
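For instance, a combined call might look like this (the cluster name, slave count, instance type, and price are placeholders):

./spark-ec2 -k key-name -i file.pem -s 2 -t m3.large --spot-price=0.05 add-slaves cluster_name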
remove-slaves
./spark-ec2 -k key-name -i file.pem remove-slaves cluster_name
This command removes a slave, as long as it is not the last slave in the cluster. To remove more than one slave, give the number with the -s or --slaves option. If the requested number is greater than the number of slaves in the cluster, or equal to -1, every slave but one is removed: the cluster always keeps at least one slave. The sketch below spells out this clamping rule.
Testing command shortcuts
Note: the shortcuts below assume shell variables such as $KEY_NAME, $ID_FILE, and $CLUSTER_NAME have been exported first.
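For example (all values are placeholders; point the repo and branch variables at the fork and branch under test):

export KEY_NAME=my-key
export ID_FILE=~/.ssh/my-key.pem
export CLUSTER_NAME=test-cluster
export SPARK_EC2_GIT_REPO=https://github.com/example/spark-ec2
export SPARK_EC2_GIT_BRANCH=elastic-spark-cluster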
Creating a cluster
./spark-ec2 -k $KEY_NAME -i $ID_FILE --spark-ec2-git-repo=$SPARK_EC2_GIT_REPO --spark-ec2-git-branch=$SPARK_EC2_GIT_BRANCH launch $CLUSTER_NAME
Trying to remove a slave
./spark-ec2 -k $KEY_NAME -i $ID_FILE remove-slaves $CLUSTER_NAME
Adding a slave
./spark-ec2 -k $KEY_NAME -i $ID_FILE add-slaves $CLUSTER_NAME
Adding three more slaves as spot instances
./spark-ec2 -s 3 -k $KEY_NAME -i $ID_FILE --spot-price=0.05 add-slaves $CLUSTER_NAME
Removing two slaves
./spark-ec2 -s 2 -k $KEY_NAME -i $ID_FILE remove-slaves $CLUSTER_NAME
Removing all slaves (except one)
./spark-ec2 -s -1 -k $KEY_NAME -i $ID_FILE remove-slaves $CLUSTER_NAME
Destroying the cluster
./spark-ec2 -k $KEY_NAME -i $ID_FILE destroy $CLUSTER_NAME