Ansible Roles: Mesosphere DC/OS
A set of Ansible Roles that manage a DC/OS cluster lifecycle on RedHat/CentOS Linux.
To make best use of these roles, your nodes should follow the infrastructure layout recommended by Mesosphere. Depending on your setup, the roles expect to deploy to:
- One or more master nodes ('masters')
- One bootstrap node ('bootstraps')
- Zero or more agent nodes, used for public facing services ('agents_public')
- One or more agent nodes, not used for public facing services ('agents_private')
An example inventory file is provided as shown here:
    [bootstraps]
    bootstrap1-dcos112s.example.com

    [masters]
    master1-dcos112s.example.com
    master2-dcos112s.example.com
    master3-dcos112s.example.com

    [agents_private]
    agent1-dcos112s.example.com
    remoteagent1-dcos112s.example.com

    [agents_public]
    publicagent1-dcos112s.example.com

    [agents:children]
    agents_private
    agents_public

    [common:children]
    bootstraps
    masters
    agents
    agents_public
The Mesosphere DC/OS Ansible roles make use of two sets of variables:
- A set of per-node-type group variables
- A multi-level dictionary called dcos, which should be available to all nodes
Per group vars
    [bootstraps:vars]
    node_type=bootstrap

    [masters:vars]
    node_type=master
    dcos_legacy_node_type_name=master

    [agents_private:vars]
    node_type=agent
    dcos_legacy_node_type_name=slave

    [agents_public:vars]
    node_type=agent_public
    dcos_legacy_node_type_name=slave_public
    dcos:
      download: "https://downloads.dcos.io/dcos/stable/1.13.4/dcos_generate_config.sh"
      download_checksum: "sha256:a3d295de33ad55b10f5dc66c9594d9175a40f5aaec7734d664493968a9f751fd"
      version: "1.13.4"
      enterprise_dcos: false
      selinux_mode: enforcing
      config:
        cluster_name: "examplecluster"
        security: strict
        bootstrap_url: http://int-bootstrap1-examplecluster.example.com:8080
        exhibitor_storage_backend: static
        master_discovery: static
        master_list:
          - 172.31.42.1
Cluster wide variables
| Name | Required? | Description |
|---|---|---|
| download | REQUIRED | (https) URL to download the Mesosphere DC/OS installer from |
| download_checksum | no | Checksum to check the download against. It should start with the method being used, e.g. "sha256:" |
| version | REQUIRED | Version string that reflects the version that the installer (given by |
| version_to_upgrade_from | for upgrades | Version string of Mesosphere DC/OS that the upgrade procedure expects to upgrade FROM. A per-version upgrade script is generated on the bootstrap machine, and each cluster node downloads the proper upgrade script for its currently running DC/OS version. |
| image_commit | no | Can be used to force same-version / same-config upgrades. Mostly useful for deploying or upgrading non-released versions, e.g. |
| enterprise_dcos | REQUIRED | Specifies if the installer (given by |
| selinux_mode | REQUIRED | Indicates the SELinux mode of the cluster nodes' operating system. Mesosphere DC/OS supports running in |
| config | yes | YAML structure that represents a valid Mesosphere DC/OS config.yml, see below. |
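As an illustration of how the upgrade-related variables fit together, the following sketch shows what the dcos dictionary might look like for an upgrade. The value of version_to_upgrade_from is a hypothetical example, and the download URL is copied from the sample configuration in this document; substitute the values that match your installer and your running cluster.

```yaml
# Sketch only, not a tested configuration.
dcos:
  download: "https://downloads.dcos.io/dcos/stable/1.13.4/dcos_generate_config.sh"
  version: "1.13.4"                  # version installed by the download above
  version_to_upgrade_from: "1.12.4"  # hypothetical: version currently running
  enterprise_dcos: false
  selinux_mode: enforcing
  config:
    cluster_name: "examplecluster"
    # remaining config.yml parameters stay as in a regular install
```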
DC/OS config.yml parameters
Please see the official Mesosphere DC/OS configuration reference for a full list of possible parameters. There are a few parameters that are used by these roles outside the DC/OS config.yml, specifically:
bootstrap_url: Should point to http://<your bootstrap node>:8080. It is used internally and is conveniently overwritten for the installer/upgrader so that it points to a version-specific subdirectory.
ip_detect_contents: Supplies a user-provided IP detection script. Overrides the built-in environment detection, which would otherwise select a generic AWS and/or on-premise script. See the official Mesosphere DC/OS ip-detect reference.
ip_detect_public_contents: Supplies a user-provided public IP detection script. Overrides the built-in environment detection, which would otherwise select a generic AWS and/or on-premise script. See the official Mesosphere DC/OS ip-detect reference.
fault_domain_detect_contents: Supplies a user-provided fault domain detection script. Overrides the built-in environment detection, which would otherwise select a generic AWS and/or on-premise script.
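A user-supplied detection script can be embedded as a YAML block scalar. The sketch below assumes that ip_detect_contents sits under dcos.config next to bootstrap_url, as the sample configuration in this document suggests, and that the node's cluster-facing interface is named eth0. Both are assumptions to adapt to your environment.

```yaml
# Sketch only: the interface name eth0 is an assumption.
dcos:
  config:
    bootstrap_url: http://int-bootstrap1-examplecluster.example.com:8080
    ip_detect_contents: |
      #!/bin/sh
      # Print the first IPv4 address configured on eth0, without the prefix length.
      set -eu
      ip -4 -o addr show dev eth0 | awk '{print $4}' | cut -d/ -f1 | head -n1
```

The script should print exactly one IP address on standard output, which is why the pipeline ends with head -n1.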
Ansible dictionary merge behavior caveat
Due to the nested structure of the dcos configuration, it might be required to set Ansible to 'merge' instead of 'replace' when combining config from multiple places.
    # ansible.cfg
    [defaults]
    hash_behaviour = merge
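To see why the merge setting matters, consider a sketch in which the dcos dictionary is defined in two places. The file names and the overridden key are hypothetical and serve only to illustrate the behavior.

```yaml
# group_vars/all/dcos.yml (hypothetical file): cluster-wide defaults
dcos:
  version: "1.13.4"
  enterprise_dcos: false
  selinux_mode: enforcing
  config:
    cluster_name: "examplecluster"
    master_discovery: static
---
# group_vars/agents_public/dcos.yml (hypothetical file): a single override
dcos:
  selinux_mode: permissive
```

With Ansible's default 'replace' behavior, hosts in agents_public would see a dcos dictionary that contains only selinux_mode, because the more specific definition replaces the whole dictionary. With 'merge', the two definitions are combined recursively, so those hosts keep version, enterprise_dcos, and config while picking up the overridden selinux_mode.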
Safeguard during interactive use:
When invoking these roles interactively (for example from the operator's machine), the DCOS.bootstrap role requires a manual confirmation of the cluster to run against. This is a safeguard against unintentional upgrades or configuration changes. In non-interactive plays, a variable can be set to skip this step, e.g.:
ansible-playbook -e 'dcos_cluster_name_confirmed=True' dcos.yml
Mesosphere DC/OS is a complex system, spanning multiple nodes to form a full multi-node cluster. There are some constraints in making a playbook use the provided roles:
- Order of groups to run their respective roles on (e.g. bootstrap node first, then masters, then agents)
- Concurrency for upgrades (e.g. serial: 1 for master nodes)
The dcos.yml playbook can be used as-is for installing and upgrading Mesosphere DC/OS.
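For readers writing their own playbook, the ordering and concurrency constraints described above could be expressed roughly as follows. This is a minimal sketch, not the contents of dcos.yml: only the DCOS.bootstrap role is named in this document, so DCOS.master and DCOS.agent are hypothetical role names, and the group names are taken from the example inventory.

```yaml
# Sketch only: DCOS.master and DCOS.agent are hypothetical role names.
- name: Prepare the bootstrap node first
  hosts: bootstraps
  roles:
    - DCOS.bootstrap

- name: Install or upgrade masters, one node at a time
  hosts: masters
  serial: 1          # limits the play to one master per batch
  roles:
    - DCOS.master

- name: Install or upgrade private and public agents
  hosts: agents
  roles:
    - DCOS.agent
```

Plays run in the order they are listed, so placing the bootstrap play first and the agents play last gives the required group ordering, while serial: 1 keeps the remaining masters available while one is being upgraded.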
Tested OS and Mesosphere DC/OS versions
- CentOS 7, RHEL 7
- DC/OS 1.12, both the open source and the Enterprise versions
This role was created by team SRE @ Mesosphere and others in 2018, based on multiple internal tools and non-public Ansible roles developed over the years.