
Conversation

Contributor @sean-smith commented Jul 20, 2018:

New script, scaling.py, scales up based on the number of nodes requested in the scheduler's queue. This gets rid of slow scale-up times.

I merged in the boto3 branch; the only files that need review are scaling/*.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -0,0 +1,5 @@
[scaling]
Contributor:

Is it a sample file? Could you add a comment at the top?

Contributor Author:

comment added. This file is used for local testing.

cfn_region=us-east-1
cfn_sqs_queue=https://sqs.us-east-1.amazonaws.com/822857487308/AS-SQS
cfn_instance_slots=3
instance_id=i-04fcafee570cd7073
Contributor:

What is this instance_id? Is this file a sample?

Contributor Author:

Removed this file. It was just used for testing.

args = shlex.split(command)
pending = 0
try:
_output = subprocess.check_output(args)
Contributor @mauri-melato, Jul 23, 2018:

Alas, check_output was introduced in Python 2.7, so this is not going to work on CentOS 6 with Python 2.6.6.
We could make a helper module to cope with this. See also PR #32.
(The same comment applies to all the subprocess.check_output calls below.)

Contributor Author:

ah yes, I can change this function to use subprocess.Popen
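
For reference, a minimal sketch of a Popen-based replacement that keeps check_output-like behaviour on Python 2.6 (the helper name and error handling are illustrative, not the final implementation):

import shlex
import subprocess

def run_command(command):
    # Python 2.6-compatible substitute for subprocess.check_output
    args = shlex.split(command)
    process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = process.communicate()[0]
    if process.returncode != 0:
        raise subprocess.CalledProcessError(process.returncode, args)
    return output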

slots = run_command(command)
return nodes(slots, instance_type)

# get nodes requested from pending jobs
Contributor:

The comment needs to be changed (the same for the running() functions of the other modules).

Contributor Author:

changed

Contributor @mauri-melato left a comment:

As a general comment on the current implementation, I'd suggest better decoupling the scheduler plugin modules from other implementation details and choices.
In particular, I wouldn't pass the instance_type down to the sge/slurm/torque modules, but rather a dict of properties (today just a 'cpus' or 'processors' property).
The scaling module should then handle the details of retrieving the AWS instance details (today, the JSON file put somewhere it knows, etc.) and pass the properties downstream.
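
A rough sketch of the plugin side of that decoupling, with sge/slurm/torque receiving only a generic properties dict (the function names and the 'slots' key are illustrative assumptions, not the final API):

def pending(instance_properties, slots_requested):
    # scheduler plugin: sees only generic properties, no AWS instance type
    slots_per_node = instance_properties["slots"]
    return -(-slots_requested // slots_per_node)  # ceiling division

# hypothetical usage: the scaling module builds the dict and passes it down
print(pending({"slots": 4}, 10))  # -> 3 nodes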

line_arr = line.split()
print(line_arr)
if len(line_arr) >= 10 and line_arr[9] in status:
pending += int(line_arr[5])
Contributor:

the variable should be renamed to something like 'njobs' since they are not necessarily pending jobs

Contributor Author:

changed.

for line in output:
line_arr = line.split()
if len(line_arr) >= 8:
pending += int(line_arr[7])
Contributor:

Rename to 'njobs' or similar; see the similar comment below.

Contributor Author:

changed.

for line in output:
line_arr = line.split()
if len(line_arr) == 2 and line_arr[0] == status:
pending += int(line_arr[1])
Contributor:

Please rename it, see comment for 'torque'.

Contributor Author:

changed.

Contributor @fnubalaj left a comment:

Are we no longer supporting openlava.py? I don't see the changes to that file.

On reviewing the cfncluster.cfn.json, I think maybe we should delete openlava.py from the plugins in a separate task?

@sean-smith sean-smith changed the base branch from develop to scale July 25, 2018 17:38
try:
vcpus = int(instances[instance_type]["vcpus"])
log.info("Instance %s has %s slots." % (instance_type, vcpus))
return {'compute_instance': vcpus}
Contributor:

This should be the name of the property of the instance. Tomorrow we may need to pass memory or gpus, etc. Instead of calling it vcpu, we could call it processors, slots...?

Contributor Author:

changed to slots.
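
For illustration, the scaling-module side of that lookup could then look roughly like this (the instances.json path is a hypothetical example; the PR reads the instance details from wherever the node stores them):

import json
import logging

log = logging.getLogger(__name__)

def get_instance_properties(instance_type, instances_json="/opt/cfncluster/instances.json"):
    # instances_json is a hypothetical path to the instance-details file
    with open(instances_json) as f:
        instances = json.load(f)
    vcpus = int(instances[instance_type]["vcpus"])
    log.info("Instance %s has %s slots." % (instance_type, vcpus))
    return {"slots": vcpus}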

instance_properties = get_instance_properties(instance_type)

# Get number of nodes requested
pending = s.pending(instance_properties)
Contributor:

I'd name the methods for what they actually return, get_required_nodes() and get_busy_nodes() or something similar.

Contributor Author:

done

# check to make sure it's in limits
desired = running + pending
if desired > max:
log.info("%d requested nodes is greater than max %d. Requesting max %d." % (desired, max, max))
Contributor:

This is worth being a warning-level message, since we are hitting the max limit and some of the pending jobs are not going to be served in the next scale-up round.

Contributor Author:

This is normal behaviour. I wouldn't expect it to be a warning.
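
For context, a sketch of the clamping step under discussion (the update_asg helper, parameter names, and the boto3 wiring are illustrative; the actual code keeps the log at info level, as the author prefers):

import logging

import boto3

log = logging.getLogger(__name__)

def update_asg(asg_name, region, running, pending, max_size):
    # hypothetical helper: clamp the new desired capacity to the ASG max before setting it
    desired = running + pending
    if desired > max_size:
        log.info("%d requested nodes is greater than max %d. Requesting max %d."
                 % (desired, max_size, max_size))
        desired = max_size
    client = boto3.client("autoscaling", region_name=region)
    client.set_desired_capacity(AutoScalingGroupName=asg_name, DesiredCapacity=desired)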

log = logging.getLogger(__name__)

# get nodes requested from pending jobs
def pending(instance_properties):
Contributor @mauri-melato, Aug 13, 2018:

Should we consider "raising an exception" (logging an error, having some sort of user notification...) when a pending job requires a single node with more processors than the instance type provides?
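
For illustration, such a guard could look roughly like this in the plugin's node-counting function (a sketch only; whether to raise, log an error, or notify the user is exactly the open question, and the function and parameter names below are assumptions):

import logging

log = logging.getLogger(__name__)

def get_required_nodes(instance_properties, slots_per_job):
    # slots_per_job: hypothetical list of per-job slot requests parsed from the queue
    slots_per_node = instance_properties["slots"]
    total = 0
    for slots in slots_per_job:
        if slots > slots_per_node:
            # a single-node job that can never fit on this instance type
            log.error("Job requests %d slots but the instance type only provides %d."
                      % (slots, slots_per_node))
            continue
        total += slots
    return -(-total // slots_per_node)  # ceiling division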

return -(-slots // vcpus)

# get nodes reserved by running jobs
def running(instance_properties):
Contributor:

Here we're returning the theoretical number of busy hosts computed from the busy slots.
Consider the case where we have 2 instances of 2 slots each and a 1-slot job running on each of them. Current ASG desired = 2.
Then assume a 2-slot job is submitted:
#busy_hosts = 2/2 = 1
#requested_hosts = 2/2 = 1
==> #new_desired = 1 + 1 = 2 === ASG_current_desired
==> no scaling

We should discuss the possible scenarios and the best-effort approach we want to follow.

Contributor Author:

Changed. New behaviour counts a whole node as busy if any of the slots are used/reserved.

For example, in the following output there are 8 slots filled, which previously would report 2 busy nodes. With the change it'll report 3 nodes as busy.

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@ip-172-31-16-37.ec2.inte BIP   0/2/4          0.00     lx-amd64
---------------------------------------------------------------------------------
all.q@ip-172-31-19-96.ec2.inte BIP   0/4/4          0.00     lx-amd64
---------------------------------------------------------------------------------
all.q@ip-172-31-20-40.ec2.inte BIP   0/0/4          0.27     lx-amd64
---------------------------------------------------------------------------------
all.q@ip-172-31-92-215.ec2.int BIP   0/2/4          0.00     lx-amd64
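
A sketch of how the new busy-node counting could parse the resv/used/tot. column of qstat -f (the column layout is inferred from the output above; this is illustrative, not the exact PR code):

def count_busy_nodes(qstat_f_output):
    busy = 0
    for line in qstat_f_output.splitlines():
        fields = line.split()
        # queue lines look like: all.q@ip-... BIP 0/2/4 0.00 lx-amd64
        if len(fields) >= 3 and fields[2].count("/") == 2:
            try:
                reserved, used, _total = [int(x) for x in fields[2].split("/")]
            except ValueError:
                continue  # skip the header line ("resv/used/tot.")
            if reserved > 0 or used > 0:
                busy += 1  # any used or reserved slot marks the whole node as busy
    return busy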

_command = shlex.split(command)
try:
DEV_NULL = open(os.devnull, "rb")
process = sub.Popen(_command, env=dict(os.environ), stdout=sub.PIPE, stderr=sub.STDOUT, stdin=DEV_NULL)
Contributor:

Should you add /usr/bin and the like to this environment? I see that qstat is not being used with the full path; how will qstat be resolved?

Contributor Author:

fixed
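
For illustration, one way to make the resolution explicit is to extend PATH in the environment handed to Popen (the directories below are assumptions; the actual fix may instead use full paths to the scheduler binaries):

import os
import shlex
import subprocess as sub

def run_command(command):
    env = dict(os.environ)
    # ensure scheduler binaries such as qstat resolve even from a minimal cron environment
    env["PATH"] = "/bin:/usr/bin:/usr/local/bin:" + env.get("PATH", "")  # example locations only
    devnull = open(os.devnull, "rb")
    try:
        process = sub.Popen(shlex.split(command), env=env,
                            stdout=sub.PIPE, stderr=sub.STDOUT, stdin=devnull)
        return process.communicate()[0]
    finally:
        devnull.close()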

process = sub.Popen(_command, env=dict(os.environ), stdout=sub.PIPE, stderr=sub.STDOUT, stdin=DEV_NULL)
_output = process.communicate()[0]
exitcode = process.poll()
if exitcode != 0:
Contributor:

This could be an issue if there is nothing written to stdout. I think in such cases exitcode is almost always 1. Could you please verify that?

Contributor Author:

that's just for grep. For qstat, it returns a 0 exit code even with no output.

fnubalaj previously approved these changes Aug 15, 2018
Contributor @fnubalaj left a comment:

Please take a look at the comments.

Sean Smith and others added 2 commits August 15, 2018 17:41
Changes the scaling logic from a cloudwatch metric produced
every two minutes to a cronjob on the node that directly sets the
asg values.

Signed-off-by: Sean Smith <seaam@amazon.com>