Infrastructure cloud auto-scaling for Torque clusters using Phantom.

Phorque

Phorque monitors a Torque cluster, executes a policy to determine how many instances to launch or terminate, and then provisions instances on infrastructure clouds via Phantom (the "Ph" in Phorque), an open source auto-scaling service that uses Amazon's Auto Scaling API. You can find out more about Phantom here:

http://www.nimbusproject.org/phantom

And, please, fork Phorque.

Running Phorque

Install it:

python setup.py build
python setup.py install

For a list of options:

phorque.py -h

Start it up:

phorque.py

Usually, though, I run it like this (-d enables debug mode):

phorque.py -d 2>&1 | tee phorque.log

Configuring Phorque

Phorque's configuration file contains three kinds of sections: [Phorque], [Policy], and [Cloud-Name].
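The section layout can be illustrated with a short sketch. It assumes the file is standard INI syntax that Python's configparser can read and that each cloud section is named "Cloud-" followed by a cloud name; "Cloud-Hotel" and the load_config helper are hypothetical, not Phorque's own parsing code.

```python
import configparser

# Sample configuration; "Cloud-Hotel" is a hypothetical cloud section name.
SAMPLE = """
[Phorque]
loop_sleep_secs = 120
cluster_directory = /opt/torque-3.0.6/
queue_name = default

[Policy]
name = OnDemandPlusPlus
price_per_hour = 5
multiplier = 1

[Cloud-Hotel]
cloud_uri = svc.uc.futuregrid.org
instance_cores = 2
max_instances = 1024
"""


def load_config(text):
    """Split an INI-style config into the Phorque, Policy and per-cloud sections."""
    # interpolation=None keeps values such as $ACCESS_ID literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    phorque = dict(parser["Phorque"])
    policy = dict(parser["Policy"])
    # Assumption: every cloud section is named "Cloud-<Name>".
    clouds = {
        section[len("Cloud-"):]: dict(parser[section])
        for section in parser.sections()
        if section.startswith("Cloud-")
    }
    return phorque, policy, clouds


if __name__ == "__main__":
    phorque, policy, clouds = load_config(SAMPLE)
    print(phorque["queue_name"], policy["name"], sorted(clouds))
```

Note that configparser returns every value as a string, so numeric options such as loop_sleep_secs would still need converting (e.g. with int()).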

[Phorque] has the following options:

loop_sleep_secs = 120
cluster_directory = /opt/torque-3.0.6/
queue_name = default

loop_sleep_secs is the number of seconds Phorque sleeps between iterations of its main loop; in each iteration it queries the cluster queue and the cloud for updates.

cluster_directory is the installation directory of the cluster software (Torque).

queue_name is the name of the queue to query.
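The role of loop_sleep_secs can be pictured as a simple monitor, decide, provision loop. This is a sketch only: query_queue, run_policy and provision are hypothetical stand-ins for Phorque's cluster and cloud code, and the signed-count convention for the policy result is an assumption.

```python
import time


def main_loop(loop_sleep_secs, query_queue, run_policy, provision, max_iterations=None):
    """Hypothetical monitor/decide/provision loop paced by loop_sleep_secs.

    query_queue, run_policy and provision are stand-ins for Phorque's real
    cluster and cloud code. run_policy is assumed to return a signed count:
    positive to launch, negative to terminate, zero for no change.
    """
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        jobs = query_queue()       # e.g. list the jobs waiting in queue_name
        delta = run_policy(jobs)   # decide how many instances to add or remove
        if delta:
            provision(delta)       # e.g. adjust the auto-scaling group
        iteration += 1
        # Sleep between iterations, but not after the final one.
        if max_iterations is None or iteration < max_iterations:
            time.sleep(loop_sleep_secs)
    return iteration


if __name__ == "__main__":
    requests = []
    # Two queued jobs, a policy that wants one instance per job, three iterations.
    main_loop(0, lambda: ["job1", "job2"], len, requests.append, max_iterations=3)
    print(requests)  # [2, 2, 2]
```

The max_iterations parameter exists only so the loop can be exercised in a test; a long-running service would leave it as None.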

[Policy] has the following options:

name = OnDemandPlusPlus
price_per_hour = 5
multiplier = 1

name is the name of the policy to use. It must match the name of a class in policy/policies.py.
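Mapping a policy name to a class is typically a getattr lookup on the module. The sketch below uses a stand-in module built with types.ModuleType; in practice the module might be loaded with importlib.import_module("policy.policies"), assuming the policy directory is an importable package. load_policy_class and the placeholder class body are hypothetical.

```python
import types


def load_policy_class(module, name):
    """Return the class called `name` from `module`, or raise a clear error."""
    cls = getattr(module, name, None)
    if not isinstance(cls, type):
        raise ValueError(f"no policy class named {name!r} in {module.__name__}")
    return cls


# Stand-in for policy/policies.py; the real module's contents are not shown here.
policies = types.ModuleType("policies")


class OnDemandPlusPlus:
    """Placeholder policy class (hypothetical; not Phorque's implementation)."""


policies.OnDemandPlusPlus = OnDemandPlusPlus

if __name__ == "__main__":
    print(load_policy_class(policies, "OnDemandPlusPlus").__name__)
```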

price_per_hour is the maximum amount of money the policy is allowed to spend per hour (if applicable).

multiplier is a value that is multiplied by the number of instances the policy attempts to launch. For example, if the policy determines it should launch 2 instances and multiplier is set to 8, then 16 instances are launched.
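The multiplier arithmetic fits in a one-line function. The optional max_new cap is a hypothetical addition to show where a limit such as a cloud's max_instances could apply; this README does not say how Phorque combines the two.

```python
def scale_launch_count(policy_count, multiplier, max_new=None):
    """Multiply the policy's launch count by the [Policy] multiplier.

    max_new is a hypothetical upper bound (for example, room left under a
    cloud's max_instances); this README does not say how Phorque combines them.
    """
    count = policy_count * multiplier
    if max_new is not None:
        count = min(count, max_new)
    return count


if __name__ == "__main__":
    print(scale_launch_count(2, 8))              # 16
    print(scale_launch_count(2, 8, max_new=10))  # 10
```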

[Cloud-Name] can be specified any number of times, once per cloud (replace Name with a unique name for each cloud), and has the following options:

cloud_uri = svc.uc.futuregrid.org
cloud_port = 8444
autoscale_uri = svc.uc.futuregrid.org
autoscale_port = 8445
image_id = debian-6.0.5.gz
price = 0
access_id = $ACCESS_ID
secret_key = $SECRET_KEY
launch_config_name = hotellc@hotel
autoscale_group_name = hotelasg
cloud_type = nimbus
availability_zone = us-east-1
instance_type = m1.large
instance_cores = 2
max_instances = 1024
charge_time_secs = 3600
user_data_file = /etc/phorque/user-data

cloud_uri is the URI for the cloud.

cloud_port is the port for the cloud.

autoscale_uri is the URI for the auto-scaling service (Phantom).

autoscale_port is the port for the auto-scale service.

image_id is the name of the image to launch.

price is the price of the image that will be launched.

access_id is the access key ID for the cloud.

secret_key is the secret key for the cloud.

launch_config_name is the name of the launch configuration to create.

autoscale_group_name is the name of the auto-scale group to create.

cloud_type is the type of cloud (e.g., nimbus).

availability_zone is the cloud availability zone to use.

instance_type is the size of the instance to launch.

instance_cores is the number of cores provided by each instance of instance_type.

max_instances is the maximum number of instances Phorque can launch.

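charge_time_secs is the billing period, in seconds, by which the cloud provider charges for instances (if applicable); for example, 3600 means instances are billed by the hour.

user_data_file is the path to a file whose contents are passed to each launched instance as user data.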
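A few of these options combine into simple arithmetic. The sketch below assumes that a policy works in cores, that every started charge period is billed in full, and that price is the cost of one charge period; all three are assumptions for illustration, and the helper names are hypothetical.

```python
import math


def instances_for_cores(cores_needed, instance_cores, max_instances):
    """Instances of instance_type needed to supply cores_needed cores, capped at max_instances."""
    wanted = math.ceil(cores_needed / instance_cores)
    return min(wanted, max_instances)


def estimated_cost(runtime_secs, charge_time_secs, price):
    """Estimated charge for one instance that ran runtime_secs seconds.

    Assumes every started charge period is billed in full and that price is
    the cost of one charge period; both are assumptions, not Phorque's rules.
    """
    periods = math.ceil(runtime_secs / charge_time_secs)
    return periods * price


if __name__ == "__main__":
    print(instances_for_cores(5, 2, 1024))     # 3
    print(instances_for_cores(5000, 2, 1024))  # 1024
    print(estimated_cost(3601, 3600, 0.5))     # 1.0
```

With the sample values above (price = 0), the estimated cost is always zero, which matches a cloud that does not charge for instances.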

Assumptions

Because Phorque dynamically launches and terminates instances on infrastructure clouds, you need a mechanism to ensure that all nodes in the Torque cluster know about and trust each other. Typically this is done by exchanging IP addresses, hostnames, and SSH public keys. Unfortunately, Phorque does not currently provide this capability, so you must use your own solution (e.g., burn keys into an image on a trusted cloud, or develop a set of scripts to exchange this information at boot).
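As one possible starting point for such boot scripts, the sketch below renders /etc/hosts-style lines and an authorized_keys file from a list of node records. How the records are collected and distributed is left open; the Node class, the helper names, and the placeholder keys are hypothetical and not part of Phorque.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical record a boot script might collect for each cluster node."""
    ip: str
    hostname: str
    ssh_public_key: str


def hosts_entries(nodes):
    """Render /etc/hosts-style lines so every node can resolve every other node."""
    return "".join(f"{n.ip}\t{n.hostname}\n" for n in nodes)


def authorized_keys(nodes):
    """Concatenate public keys so every node trusts every other node over SSH."""
    return "".join(f"{n.ssh_public_key}\n" for n in nodes)


if __name__ == "__main__":
    # Placeholder addresses and keys for illustration only.
    cluster = [
        Node("10.0.0.2", "node1", "ssh-ed25519 KEY1 root@node1"),
        Node("10.0.0.3", "node2", "ssh-ed25519 KEY2 root@node2"),
    ]
    print(hosts_entries(cluster), end="")
    print(authorized_keys(cluster), end="")
```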