- Usage
- Methods
- General
- Packaging
- Command execution
- File operations
- Services
- Spinner
- Tasks
- ZooKeeper
- Marathon
- Masters
- Agents
- Network
from shakedown import *
Authenticate against an EE DC/OS cluster using a username and password.
parameter | description | type | default |
---|---|---|---|
username | the username used for DC/OS authentication | str | |
password | the password used for DC/OS authentication | str | |
# Authenticate against DC/OS, receive an ACS token
token = authenticate('root', 's3cret')
The URL to the DC/OS cluster under test.
None.
# Print the DC/OS dashboard URL.
url = dcos_url()
print("Dashboard located at: " + url)
The URL of a named service.
parameter | description | type | default |
---|---|---|---|
service | the name of the service | str | |
# Print the location of the Jenkins service's dashboard
jenkins_url = dcos_service_url('jenkins')
print("Jenkins dashboard located at: " + jenkins_url)
A JSON-encoded string containing DC/OS state information.
None.
# Print state information of DC/OS agents (Mesos lists them under 'slaves').
import json
state_json = json.loads(dcos_json_state())
print(state_json['slaves'])
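Because dcos_json_state() returns JSON text, it has to be parsed before you can index into it. The sketch below runs without a cluster: it parses a small hand-written sample payload (hypothetical, and far smaller than real Mesos state) and collects each agent's hostname. The helper agent_hostnames() and the `id`/`hostname` keys in the sample are assumptions for illustration.

```python
import json

# Hypothetical, heavily trimmed sample of the JSON text dcos_json_state()
# might return; real Mesos state contains many more fields.
SAMPLE_STATE = """
{
  "slaves": [
    {"id": "agent-1", "hostname": "10.0.1.10"},
    {"id": "agent-2", "hostname": "10.0.1.11"}
  ]
}
"""


def agent_hostnames(state_text):
    """Parse the state JSON string and return the hostname of every agent."""
    state = json.loads(state_text)
    # Mesos reports agents under the legacy 'slaves' key.
    return [agent["hostname"] for agent in state.get("slaves", [])]


print(agent_hostnames(SAMPLE_STATE))
```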
The DC/OS version number.
None.
# Print the DC/OS version.
version = dcos_version()
print("Cluster is running DC/OS version " + version)
The DC/OS ACS token (if authenticated).
None.
# Print the DC/OS ACS token.
token = dcos_acs_token()
print("Using token " + token)
The current Mesos master's IP address.
None.
# What's our Mesos master's IP?
ip = master_ip()
print("Current Mesos master: " + ip)
Install a package.
parameter | description | type | default |
---|---|---|---|
package_name | the name of the package to install | str | |
package_version | the version of the package to install | str | latest |
service_name | custom service name | str | None |
options_file | a file containing options in JSON format | str | None |
options_json | a dict of package options | dict | None |
wait_for_completion | wait for service to become healthy before completing? | bool | False |
timeout_sec | how long in seconds to wait before timing out | int | 600 |
# Install the 'jenkins' package; don't wait for the service to register
install_package('jenkins')
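install_package() accepts options either as a dict (options_json) or as the path to a JSON file (options_file). This standard-library sketch shows that the two forms carry the same data: it writes a dict to a temporary JSON file and reads it back. The option keys are hypothetical; each package defines its own options schema.

```python
import json
import os
import tempfile

# Hypothetical options; the valid keys depend on the package being installed.
options = {"service": {"name": "jenkins-test"}}


def write_options_file(opts):
    """Write an options dict to a temporary JSON file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(opts, handle)
    return path


path = write_options_file(options)
with open(path) as handle:
    loaded = json.load(handle)
os.remove(path)  # clean up the temporary file

# The file form round-trips to the same dict that options_json would take.
print(loaded == options)
```

Against a real cluster you would pass either options_json=options or options_file=path (before deleting the file).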
Install a package, and wait for the service to register.
This method uses the same parameters as install_package()
Uninstall a package.
parameter | description | type | default |
---|---|---|---|
package_name | the name of the package to uninstall | str | |
service_name | custom service name | str | None |
all_instances | uninstall all instances? | bool | False |
wait_for_completion | wait for the service to unregister before completing? | bool | False |
timeout_sec | how long in seconds to wait before timing out | int | 600 |
# Uninstall the 'jenkins' package; don't wait for the service to unregister
uninstall_package('jenkins')
Uninstall a package, and wait for the service to unregister.
This method uses the same parameters as uninstall_package()
Check whether a specified package is currently installed.
parameter | description | type | default |
---|---|---|---|
package_name | the name of the package to check | str | |
service_name | custom service name | str | None |
# Is the 'jenkins' package installed?
if package_installed('jenkins'):
    print('Jenkins is installed!')
Add a repository to the list of package sources.
parameter | description | type | default |
---|---|---|---|
repo_name | the name of the repository | str | |
repo_url | the location of the repository | str | |
index | the repository index order | int | -1 |
# Search the Multiverse before any other repositories
add_package_repo('Multiverse', 'https://github.com/mesosphere/multiverse/archive/version-2.x.zip', 0)
Remove a repository from the list of package sources.
parameter | description | type | default |
---|---|---|---|
repo_name | the name of the repository | str | |
# No longer search the Multiverse
remove_package_repo('Multiverse')
Retrieve a dictionary describing the configured package source repositories.
None
# Which repository am I searching through first?
repos = get_package_repos()
print("First searching " + repos['repositories'][0]['name'])
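The 'repositories' list returned by get_package_repos() is in search order, which is what the example above relies on. The helper below is a hypothetical sketch (not part of shakedown) that finds where a named repository sits in that order. The sample dict, including its 'uri' key and example.com URLs, is an assumption modelled on the example's use of 'name'.

```python
# Hypothetical sample shaped like the dict indexed in the example above.
SAMPLE_REPOS = {
    "repositories": [
        {"name": "Multiverse", "uri": "https://example.com/multiverse.zip"},
        {"name": "Universe", "uri": "https://example.com/universe.json"},
    ]
}


def repo_position(repos, repo_name):
    """Return the 0-based search position of repo_name, or None if absent."""
    for position, repo in enumerate(repos.get("repositories", [])):
        if repo.get("name") == repo_name:
            return position
    return None


print(repo_position(SAMPLE_REPOS, "Multiverse"))
```

After add_package_repo('Multiverse', ..., 0), a check such as repo_position(get_package_repos(), 'Multiverse') == 0 would confirm it is searched first.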
Run a command on a remote host via SSH.
parameter | description | type | default |
---|---|---|---|
host | the hostname or IP to run the command on | str | |
command | the command to run | str | |
username | the username used for SSH authentication | str | core |
key_path | the path to the SSH keyfile used for authentication | str | None |
# I wonder what /etc/motd contains on the Mesos master?
exit_status, output = run_command(master_ip(), 'cat /etc/motd')
Run a command on the Mesos master via SSH.
parameter | description | type | default |
---|---|---|---|
command | the command to run | str | |
username | the username used for SSH authentication | str | core |
key_path | the path to the SSH keyfile used for authentication | str | None |
# What kernel is our Mesos master running?
exit_status, output = run_command_on_master('uname -a')
Run a command on a Mesos agent via SSH, proxied via the Mesos master.
This method uses the same parameters as run_command()
Run a command using the dcos CLI.
parameter | description | type | default |
---|---|---|---|
command | the command to run | str | |
# What's the current version of the Jenkins package?
import json
stdout, stderr, return_code = run_dcos_command('package search jenkins --json')
result_json = json.loads(stdout)
print(result_json['packages'][0]['currentVersion'])
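run_dcos_command() returns stdout, stderr and the CLI's return code separately, so a test should check the return code before parsing stdout. The wrapper below is a hypothetical sketch of that pattern. It takes the triple as arguments so it can be exercised with made-up values; the sample output is not real CLI output.

```python
import json


def parse_dcos_json(stdout, stderr, return_code):
    """Raise if the dcos CLI failed; otherwise parse its JSON stdout."""
    if return_code != 0:
        raise RuntimeError(
            "dcos command failed ({}): {}".format(return_code, stderr.strip()))
    return json.loads(stdout)


# Made-up output shaped like the fields the example above reads.
fake_stdout = '{"packages": [{"name": "jenkins", "currentVersion": "1.0.0"}]}'
result = parse_dcos_json(fake_stdout, "", 0)
print(result["packages"][0]["currentVersion"])
```

With a cluster available, it could be called as parse_dcos_json(*run_dcos_command('package search jenkins --json')), since the command returns the values in that order.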
Copy a file via SCP.
parameter | description | type | default |
---|---|---|---|
host | the hostname or IP to copy the file to/from | str | |
file_path | the local path to the file to be copied | str | |
remote_path | the remote path to copy the file to | str | . |
username | the username used for SSH authentication | str | core |
key_path | the path to the SSH keyfile used for authentication | str | None |
action | 'put' (default) or 'get' | str | put |
# Copy a datafile onto the Mesos master
copy_file(master_ip(), '/var/data/datafile.txt')
Copy a file to the Mesos master.
parameter | description | type | default |
---|---|---|---|
file_path | the local path to the file to be copied | str | |
remote_path | the remote path to copy the file to | str | . |
username | the username used for SSH authentication | str | core |
key_path | the path to the SSH keyfile used for authentication | str | None |
# Copy a datafile onto the Mesos master
copy_file_to_master('/var/data/datafile.txt')
Copy a file to a Mesos agent, proxied through the Mesos master.
This method uses the same parameters as copy_file()
Copy a file from the Mesos master.
parameter | description | type | default |
---|---|---|---|
remote_path | the remote path of the file to copy | str | |
file_path | the local path to copy the file to | str | . |
username | the username used for SSH authentication | str | core |
key_path | the path to the SSH keyfile used for authentication | str | None |
# Copy a datafile from the Mesos master
copy_file_from_master('/var/data/datafile.txt')
Copy a file from a Mesos agent, proxied through the Mesos master.
parameter | description | type | default |
---|---|---|---|
host | the hostname or IP to copy the file from | str | |
remote_path | the remote path of the file to copy | str | |
file_path | the local path to copy the file to | str | . |
username | the username used for SSH authentication | str | core |
key_path | the path to the SSH keyfile used for authentication | str | None |
# Copy a datafile from an agent running Jenkins
service_ips = get_service_ips('marathon', 'jenkins')
for host in service_ips:
    assert copy_file_from_agent(host, '/home/jenkins/datafile.txt')
Retrieve a dictionary describing a named service.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# Tell me about the 'jenkins' service
jenkins = get_service('jenkins')
Get the framework ID of a named service.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# What is the framework ID for the 'jenkins' service?
jenkins_framework_id = get_framework_id('jenkins')
Get a dictionary describing a named service task.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
task_name | the name of the task | str | |
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# Tell me about marathon's 'jenkins' task
jenkins_task = get_service_task('marathon', 'jenkins')
Get a list of task IDs associated with a named service.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# What's marathon doing right now?
service_tasks = get_service_tasks('marathon')
Get a dictionary describing a named Marathon task.
parameter | description | type | default |
---|---|---|---|
task_name | the name of the task | str | |
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# Tell me about marathon's 'jenkins' task
jenkins_task = get_marathon_task('jenkins')
Get a list of Marathon tasks.
parameter | description | type | default |
---|---|---|---|
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# What's marathon doing right now?
service_tasks = get_marathon_tasks()
Get a set of the IPs associated with a service.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
task_name | the name of the task to limit results to | str | None |
inactive | include inactive services? | bool | False |
completed | include completed services? | bool | False |
# Get all IPs associated with the 'chronos' task running in the 'marathon' service
service_ips = get_service_ips('marathon', 'chronos')
print('service_ips: ' + str(service_ips))
Check whether a specified service is currently healthy.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
# Is the 'jenkins' service healthy?
if service_healthy('jenkins'):
    print('Jenkins is healthy!')
Checks whether the service URL returns HTTP 200 within a timeout. Returns True if the endpoint becomes available, or False if the timeout expires.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
timeout_sec | how long in seconds to wait before timing out | int | 120 |
# Wait up to 120 seconds (the default) for the marathon-user endpoint to return HTTP 200
wait_for_service_endpoint("marathon-user")
Checks whether the service URL returns HTTP 500 within a timeout, which indicates that the endpoint has been removed, and returns the time it took for the removal.
parameter | description | type | default |
---|---|---|---|
service_name | the name of the service | str | |
timeout_sec | how long in seconds to wait before timing out | int | 120 |
# Wait for the marathon-user endpoint to be removed
wait_for_service_endpoint_removal("marathon-user")
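Both endpoint waits poll a service URL until it returns an expected HTTP status (200 for availability, 500 for removal) or a timeout expires. The sketch below shows that polling under an assumption: the caller supplies a hypothetical fetch_status() function, so it runs without a cluster or network access. A real check would issue an HTTP request to the service URL instead.

```python
import time


def poll_for_status(fetch_status, expected_status, timeout_sec=120, interval_sec=1.0):
    """Poll fetch_status() until it returns expected_status.

    Returns True if the status is seen before the timeout, False otherwise.
    Exceptions from fetch_status() (e.g. connection errors) count as "not yet".
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            if fetch_status() == expected_status:
                return True
        except Exception:
            pass  # treat a failed request as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_sec)


# Simulated endpoint that returns 503 twice, then 200.
responses = iter([503, 503, 200])
became_available = poll_for_status(lambda: next(responses), 200,
                                   timeout_sec=5, interval_sec=0.01)
print(became_available)
```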
Waits for a function to return true or times out.
parameter | description | type | default |
---|---|---|---|
predicate | the predicate function | fn | |
timeout_seconds | how long in seconds to wait before timing out | int | 120 |
sleep_seconds | time to sleep between multiple calls to predicate | int | 1 |
ignore_exceptions | ignore exceptions thrown by predicate | bool | True |
inverse_predicate | if True look for False from predicate | bool | False |
# simple predicate
def deployment_predicate(client=None):
    ...
wait_for(deployment_predicate, timeout_seconds=120)
# predicate with a parameter
def service_available_predicate(service_name):
    ...
wait_for(lambda: service_available_predicate(service_name), timeout_seconds=timeout_sec)
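The parameters of wait_for() suggest a polling loop like the one sketched below. This is an illustrative reimplementation under assumptions, not shakedown's code: raising TimeoutError on expiry and returning the predicate's last result are choices made for this sketch.

```python
import time


def wait_for_sketch(predicate, timeout_seconds=120, sleep_seconds=1,
                    ignore_exceptions=True, inverse_predicate=False):
    """Call predicate() until its result is truthy (falsy if inverse_predicate).

    Returns the predicate's last result. Raises TimeoutError if the
    condition is not met within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            result = predicate()
            met = (not result) if inverse_predicate else bool(result)
        except Exception:
            if not ignore_exceptions:
                raise
            met = False  # an ignored exception never satisfies the wait
        if met:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "condition not met within {} seconds".format(timeout_seconds))
        time.sleep(sleep_seconds)


# Usage: a hypothetical predicate that becomes true on its third call.
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_for_sketch(ready_on_third_call, timeout_seconds=5, sleep_seconds=0.01))
```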
Waits for a function to return true, or times out. Returns the elapsed wait time.
parameter | description | type | default |
---|---|---|---|
predicate | the predicate function | fn | |
timeout_seconds | how long in seconds to wait before timing out | int | 120 |
sleep_seconds | time to sleep between multiple calls to predicate | int | 1 |
ignore_exceptions | ignore exceptions thrown by predicate | bool | True |
inverse_predicate | if True look for False from predicate | bool | False |
# simple predicate
def deployment_predicate(client=None):
    ...
elapsed = time_wait(deployment_predicate, timeout_seconds=120)
# predicate with a parameter
def service_available_predicate(service_name):
    ...
elapsed = time_wait(lambda: service_available_predicate(service_name), timeout_seconds=timeout_sec)
Returns the time difference between start and end, rounded to a given precision.
parameter | description | type | default |
---|---|---|---|
start | the start time | time | |
end | the end time; if not provided, the current time is used | time | None |
precision | the number of decimal places to keep | int | 3 |
import time
start = time.time()
wait_for_service_endpoint("marathon-user")
print(elapse_time(start))  # seconds elapsed, rounded to 3 decimal places
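The calculation behind elapse_time() is simple arithmetic: subtract start from end (defaulting to the current time) and round. A minimal stand-alone sketch, assuming times are floats in seconds as returned by time.time(); elapse_time_sketch is a hypothetical name.

```python
import time


def elapse_time_sketch(start, end=None, precision=3):
    """Return end - start in seconds, rounded to `precision` decimal places."""
    if end is None:
        end = time.time()
    return round(end - start, precision)


print(elapse_time_sketch(10.0, 12.5))      # 2.5
print(elapse_time_sketch(0.0, 1.23456))    # 1.235
```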
Get information about a task.
This method uses the same parameters as get_tasks()
Get a list of tasks, optionally filtered by task ID.
parameter | description | type | default |
---|---|---|---|
task_id | task ID | str | |
completed | include completed tasks? | bool | True |
# What tasks have been run?
tasks = get_tasks()
for task in tasks:
    print("{} has state {}".format(task['id'], task['state']))
Get a list of active tasks, optionally filtered by task ID.
parameter | description | type | default |
---|---|---|---|
task_id | task ID | str | |
completed | include completed tasks? | bool | False |
# What tasks are running?
tasks = get_active_tasks()
for task in tasks:
    print("{} has state {}".format(task['id'], task['state']))
Check whether a task has completed.
parameter | description | type | default |
---|---|---|---|
task_id | task ID | str | |
# Wait for task 'driver-20160517222552-0072' to complete
import time
while not task_completed('driver-20160517222552-0072'):
    print('Task not complete; sleeping...')
    time.sleep(5)
Wait for a task to be reported running by Mesos. Returns the elapsed wait time.
parameter | description | type | default |
---|---|---|---|
service | framework service name | str | |
task | task name | str | |
timeout_sec | how long in seconds to wait before timing out | int | 120 |
wait_for_task('marathon', 'marathon-user')
Wait for a task to be reported as having a specific property. Returns the elapsed wait time.
parameter | description | type | default |
---|---|---|---|
service | framework service name | str | |
task | task name | str | |
prop | property name | str | |
timeout_sec | how long in seconds to wait before timing out | int | 120 |
wait_for_task_property('marathon', 'chronos', 'resources')
Wait for a task to be reported as having a property with a specific value. Returns the elapsed wait time.
parameter | description | type | default |
---|---|---|---|
service | framework service name | str | |
task | task name | str | |
prop | property name | str | |
value | value of property | str | |
timeout_sec | how long in seconds to wait before timing out | int | 120 |
wait_for_task_property_value('marathon', 'marathon-user', 'state', 'TASK_RUNNING')
Wait for a task's DNS name to become resolvable. Returns the elapsed wait time.
parameter | description | type | default |
---|---|---|---|
name | the DNS name to wait for | str | |
timeout_sec | how long in seconds to wait before timing out | int | 120 |
wait_for_dns('marathon-user.marathon.mesos')
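wait_for_dns() waits until a name such as marathon-user.marathon.mesos resolves and reports how long that took. A minimal sketch, assuming resolution via socket.gethostbyname (the library may resolve names differently) and accepting an injectable resolver so it can be tried off-cluster. wait_for_dns_sketch and the fake resolver are hypothetical.

```python
import socket
import time


def wait_for_dns_sketch(name, timeout_sec=120, interval_sec=1.0,
                        resolve=socket.gethostbyname):
    """Retry resolve(name) until it succeeds; return elapsed seconds.

    Raises TimeoutError if the name does not resolve within timeout_sec.
    """
    start = time.monotonic()
    while True:
        try:
            resolve(name)
            return round(time.monotonic() - start, 3)
        except OSError:  # socket.gaierror is a subclass of OSError
            if time.monotonic() - start >= timeout_sec:
                raise TimeoutError(
                    "{} did not resolve within {} s".format(name, timeout_sec))
            time.sleep(interval_sec)


# Fake resolver: fails twice, then returns an address.
attempts = {"n": 0}


def flaky_resolver(name):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise socket.gaierror("not yet")
    return "10.0.0.5"


elapsed = wait_for_dns_sketch("marathon-user.marathon.mesos", timeout_sec=5,
                              interval_sec=0.01, resolve=flaky_resolver)
print(attempts["n"], elapsed >= 0)
```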
Delete a named ZooKeeper node.
parameter | description | type | default |
---|---|---|---|
node_name | the name of the node | str | |
# Delete a 'universe/marathon-user' ZooKeeper node
delete_zk_node('universe/marathon-user')
Get data for a ZooKeeper node.
parameter | description | type | default |
---|---|---|---|
node_name | the name of the node | str | |
# Get data for a 'universe/marathon-user' ZooKeeper node
get_zk_node_data('universe/marathon-user')
Waits for Marathon deployments to complete, or times out.
parameter | description | type | default |
---|---|---|---|
timeout | the maximum time in seconds to wait for the deployment | int | 120 |
# assuming a client.add_app() or similar
deployment_wait()
Deletes all apps running on Marathon.
None.
delete_all_apps()
Deletes all apps running on Marathon and waits for deployment to finish.
None.
delete_all_apps_wait()
Separates the master from the cluster by disabling inbound and/or outbound traffic.
parameter | description | type | default |
---|---|---|---|
incoming | disable incoming traffic? | bool | True |
outgoing | disable outgoing traffic? | bool | True |
# Disable incoming traffic ONLY to the DC/OS master.
partition_master(True, False)
Reconnect a previously partitioned master to the network.
None.
# Reconnect the master.
reconnect_master()
Retrieve a list of all agent node IP addresses.
None
# What do I look like in IP space?
nodes = get_agents()
print("Node IP addresses: " + ", ".join(nodes))
Retrieve a list of all private agent node IP addresses.
None
# What do I look like in IP space?
private_nodes = get_private_agents()
print("Private IP addresses: " + ", ".join(private_nodes))
Retrieve a list of all public agent node IP addresses.
None
# What do I look like in IP space?
public_nodes = get_public_agents()
print("Public IP addresses: " + ", ".join(public_nodes))
Separates the agent from the cluster by adjusting iptables with the following:
sudo iptables -F INPUT
sudo iptables -I INPUT -p tcp --dport 22 -j ACCEPT
sudo iptables -I INPUT -p icmp -j ACCEPT
sudo iptables -I OUTPUT -p tcp --sport 5051 -j REJECT
sudo iptables -A INPUT -j REJECT
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Partition all the public nodes
public_nodes = get_public_agents()
for public_node in public_nodes:
    partition_agent(public_node)
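The iptables rules listed for partition_agent() can be combined into a single shell command. The sketch below only builds that command string from the documented rules; PARTITION_RULES and partition_command() are hypothetical names, and this is not how shakedown necessarily applies the rules internally.

```python
# Rules documented for partition_agent(), in the order they are applied.
PARTITION_RULES = [
    "sudo iptables -F INPUT",                                 # flush existing INPUT rules
    "sudo iptables -I INPUT -p tcp --dport 22 -j ACCEPT",     # keep SSH reachable
    "sudo iptables -I INPUT -p icmp -j ACCEPT",               # keep ICMP (ping) working
    "sudo iptables -I OUTPUT -p tcp --sport 5051 -j REJECT",  # reject packets sent from agent port 5051
    "sudo iptables -A INPUT -j REJECT",                       # reject all other inbound traffic
]


def partition_command(rules=PARTITION_RULES):
    """Join the rules into one shell command that stops at the first failure."""
    return " && ".join(rules)


print(partition_command())
```

Because -I inserts at the head of a chain and -A appends to its tail, the SSH and ICMP ACCEPT rules are evaluated before the final catch-all REJECT, so the node stays reachable over SSH for a later reconnect. Port 5051 is the Mesos agent's default port.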
Reconnects a previously partitioned agent by reversing the iptables changes.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Reconnect the public agents
for public_node in public_nodes:
    reconnect_agent(public_node)
Restarts the agent process on the host.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Restart the agent process on the public agents
for public_node in public_nodes:
    restart_agent(public_node)
Stops the agent process on the host.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Stop the agent process on the public agents
for public_node in public_nodes:
    stop_agent(public_node)
Starts the agent process on the host.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Start the agent process on the public agents
for public_node in public_nodes:
    start_agent(public_node)
Deletes the agent log on the host.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Delete agent logs on the public agents
for public_node in public_nodes:
    delete_agent_log(public_node)
Kills the process(es) matching a pattern on the host. Use with care: this can also kill infrastructure processes.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
pattern | a regular expression matching the name of the process(es) to kill | str | |
# kill java on the public agents
for public_node in public_nodes:
    kill_process_on_host(public_node, "java")
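Because pattern is a regular expression matched against process names, a short pattern can match more than intended, which is one way this method can take down infrastructure processes. This sketch uses Python's re.search over a hypothetical list of process names to preview what a pattern would match. Unanchored matching is an assumption here; the documentation does not say how the pattern is applied on the host.

```python
import re

# Hypothetical process names on an agent.
PROCESSES = ["java", "javaagent-helper", "mesos-agent", "dcos-metrics", "python3"]


def preview_matches(pattern, names=PROCESSES):
    """Return the names an unanchored regex search for `pattern` would hit."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


print(preview_matches("java"))    # also hits "javaagent-helper"
print(preview_matches("^java$"))  # anchored: only the exact name
```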
Context manager that disconnects an agent for the duration of the context, then restores it.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# disconnects agent
with disconnected_agent(host):
    service_delay()
# agent is reconnected
wait_for_service_url(PACKAGE_APP_ID)
Context manager that disconnects the master for the duration of the context, then restores it.
None
# disconnects the master
with disconnected_master():
    service_delay()
# master is reconnected
wait_for_service_url(PACKAGE_APP_ID)
Context manager that saves the firewall rules on the host, then restores them at the end of the context. It calls save_iptables before the context and restore_iptables at the end of the context.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# temporarily block a port on the master
import time
with iptable_rules(master_ip()):
    block_port(host, port)
    time.sleep(7)
# firewalls restored
wait_for_service_url(PACKAGE_APP_ID)
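iptable_rules() follows the common save/restore context-manager pattern. The sketch below shows that pattern with contextlib.contextmanager. The save and restore callables are hypothetical stand-ins for save_iptables and restore_iptables, passed in so the sketch runs without a cluster; saved_rules is likewise a hypothetical name.

```python
from contextlib import contextmanager


@contextmanager
def saved_rules(host, save, restore):
    """Save firewall rules on host, run the body, then always restore them."""
    save(host)
    try:
        yield
    finally:
        restore(host)  # runs even if the body raises


calls = []
try:
    with saved_rules("10.0.0.1",
                     lambda h: calls.append(("save", h)),
                     lambda h: calls.append(("restore", h))):
        calls.append(("body", "10.0.0.1"))
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

print(calls)
```

The try/finally is a design choice of this sketch, so a failing test body cannot leave a host firewalled; the documentation does not say whether iptable_rules() restores rules when the body raises.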
Restores iptables rules previously saved with save_iptables.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Restore the saved firewall rules on the host
restore_iptables(host)
Saves the current iptables rules to a file on the host.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Save the current firewall rules on the host
save_iptables(host)
Flushes the iptables INPUT rules on the host by running sudo iptables -F INPUT. Consider calling save_iptables beforehand.
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Flush the INPUT chain on the host
flush_all_rules(host)
Sets the default iptables policies for the INPUT, OUTPUT and FORWARD chains to ACCEPT, to allow full access. Consider calling save_iptables beforehand. It runs:
sudo iptables --policy INPUT ACCEPT && sudo iptables --policy OUTPUT ACCEPT && sudo iptables --policy FORWARD ACCEPT
parameter | description | type | default |
---|---|---|---|
hostname | the hostname or IP of the node | str | |
# Allow all traffic on the host
allow_all_traffic(host)