Supported configuration parameters

The playbook in the provision-nginx.yml file in this repository pulls in a set of default values for many of the configuration parameters that are needed to deploy NGINX from the vars/nginx.yml file and the default configuration file (the config.yml file). The parameters defined in these files define a reasonable set of defaults for a fairly generic NGINX deployment, either to a single node or an ensemble, including defaults for the ports that the NGINX instances should listen on, whether or not the instances should support HTTP connections, and the packages that must be installed on the node before the nginx service can be started.

While you may never need to change most of these values from their defaults, there are a fairly large number of these parameters, so a brief summary of what each is and how it is used could be helpful. In this section, we summarize all of these options, breaking them out into:

  • parameters used to control the Ansible playbook run
  • parameters used to configure new nodes that are created in a cloud (AWS or OpenStack) environment
  • parameters used during the deployment process itself, and
  • parameters used to configure our NGINX nodes once NGINX has been installed locally.

Each of these sets of parameters is described in its own section, below.

Parameters used to control the playbook run

The following parameters can be used to control the ansible-playbook run itself, defining things like how Ansible should connect to the nodes involved in the playbook run, which nodes should be targeted, where the NGINX distribution should be downloaded from, which packages must be installed during the deployment process, and where those packages should be obtained from:

  • cloud: this parameter is used to indicate the target cloud for the deployment (either aws or osp); this controls both the role that is used to create new nodes (when a matching set of nodes does not exist in the target environment) and how the build-app-host-groups role retrieves the list of target nodes for the deployment; if unspecified this parameter defaults to the aws value specified in the default configuration file
  • region: this parameter is used to indicate the region that should be searched for matching nodes (and, if no matching nodes are found, the region in which a set of nodes should be created for use as an NGINX ensemble); if unspecified the default value of us-west-2 specified in the config.yml file is used
  • zone: this parameter is used to indicate the availability zone that should be used when creating new nodes in an OpenStack environment; since this parameter is not needed for AWS deployments, there is no default value for this parameter (and any value provided during an AWS deployment will be silently ignored)
  • tenant: this parameter is used to indicate the tenant name to use, either when creating new nodes (when a matching set of nodes does not exist in the target environment) or when searching for a matching set of nodes in the build-app-host-groups role; if unspecified this parameter defaults to the datanexus value specified in the default configuration file
  • project: this parameter is used to indicate the project name to use, either when creating new nodes (when a matching set of nodes does not exist in the target environment) or when searching for a matching set of nodes in the build-app-host-groups role; if unspecified this parameter defaults to the demo value specified in the default configuration file
  • dataflow: this parameter is used to indicate the dataflow name to use, either when creating new nodes (when a matching set of nodes does not exist in the target environment) or when searching for a matching set of nodes in the build-app-host-groups role; the dataflow tag is used to link together the clusters/ensembles (Cassandra, nginx, Kafka, Solr, etc.) that are involved in a given dataflow; if this value is not specified, it defaults to a value of none during the playbook run
  • domain: this parameter is used to indicate the domain name to use (eg. test, production, preprod), either when creating new nodes (when a matching set of nodes does not exist in the target environment) or when searching for a matching set of nodes in the build-app-host-groups role; if unspecified this parameter defaults to the production value specified in the default configuration file
  • cluster: this parameter is used to indicate the cluster name to use, either when creating new nodes (when a matching set of nodes does not exist in the target environment) or when searching for a matching set of nodes in the build-app-host-groups role; this value is used to differentiate clusters of the same type from each other when multiple clusters are deployed for a given application for the same tenant, project, dataflow, and domain; if this value is not specified it defaults to a value of a during the playbook run
  • user: the username that should be used when connecting to the target nodes via SSH; the value for this parameter will likely change from one target environment to the next; if unspecified a value of centos will be used
  • config_file: used to define the location of a configuration file (see the discussion of this topic, below); this file is a YAML file containing definitions for any of the configuration parameters that are described in this section and is more than likely a file that will be created to manage the process of creating a specific ensemble. Storing the settings for a given ensemble in such a file makes it easy to guarantee that all of the nodes in that ensemble are configured consistently. If a value is not specified for this parameter then the default configuration file (the config.yml file) will be used; to override this behavior (and not load a configuration file of any kind), one can simply set the value of this parameter to /dev/null and specify all of the other, non-default parameters that are needed as extra variables during the playbook run
  • private_key_path: used to define the directory where the private keys are maintained when the inventory for the playbook run is being managed dynamically; in these cases, the scripts used to retrieve the dynamic inventory information will return the names of the keys that should be used to access each node, and the playbook will search the directory specified by this parameter to find the corresponding key files. If this value is not specified then the current working directory will be searched for those keys by default
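As a rough illustration, a per-ensemble configuration file passed in through the config_file parameter might look something like the sketch below. The values shown for cloud, region, tenant, project, domain, cluster, and user are the defaults described above; the dataflow name and the private_key_path directory are hypothetical placeholders, not values taken from this repository.

```yaml
# Hypothetical configuration file for a single NGINX ensemble
cloud: aws                        # documented default; use osp for OpenStack
region: us-west-2                 # documented default
tenant: datanexus                 # documented default
project: demo                     # documented default
domain: production                # documented default
dataflow: example-dataflow        # hypothetical; defaults to none if omitted
cluster: a                        # documented default
user: centos                      # documented default SSH username
private_key_path: /path/to/keys   # hypothetical directory holding the private keys
```

Keeping these values together in one file per ensemble is what makes it easy to tag and configure every node in that ensemble consistently.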

Parameters used to configure nodes created in a cloud environment

When the inventory for the playbook run is being controlled dynamically (i.e. when the deployment is targeting nodes in an AWS or OpenStack environment) and no matching nodes are found, the playbook will create a new set of nodes (using the tags that were passed into the playbook run) and configure those nodes as an NGINX ensemble. In that case, there are a number of parameters that must be provided to control the process of node creation:

  • type: the type of node that should be created; if this value is unspecified then a default value of t2.small (suitable for use in the default, AWS deployment) specified in the config.yml file is used
  • image: the image (AMI ID in the case of an AWS deployment or image UUID in the case of an OpenStack deployment) that should be used when creating new nodes; if this parameter is unspecified in an AWS deployment, then the playbook will search for a suitable image to use for the deployment; this parameter must be specified for an OpenStack deployment (and its value must be the UUID of a pre-existing image that is suitable for use in the playbook run)
  • cidr_block: the CIDR block of the VPC where the nodes should be created in an AWS deployment (or the equivalent in an OpenStack deployment); it is assumed that this VPC (or OpenStack equivalent) already exists; if it is not specified, then the default value from the config.yml file is used
  • node_map: a list of dictionary entries where each entry specifies the number of nodes to create (the count) for that application (or for each role in a given application deployment if deployment of the cluster involves the deployment of nodes with different roles, like the seed and non-seed nodes in a Cassandra cluster); for the playbook in this repository the default value for this parameter (which appears in the vars/nginx.yml file) will result in the creation of a three-node NGINX ensemble if no matching nodes were found based on the tags that were passed into the playbook run
  • root_volume: the size (in GB) of the root volume that should be created when building new nodes in an AWS or OpenStack environment; this parameter has a default value that depends on whether or not there is a corresponding definition for the data_volume parameter (see below):
    • if there is no defined value for the data_volume parameter, then a root volume that is 40GB in size will be created if this parameter is not defined
    • if there is a defined value for the data_volume parameter, then a root volume that is 11GB in size will be created if this parameter is not defined
  • data_volume: the size (in GB) of the data volume that will be created when building new nodes in an AWS or OpenStack environment; if a value is defined for this parameter, a data volume with the corresponding size will be created for each of the instances that are created by the playbook run and those data volumes will then be mounted under the /data directory for each of those instances; if a value is not defined for this parameter then no corresponding data volume will be created (and the nodes that are created by the playbook run will only have a single, root volume).
  • application_sg_rules: a list of rules used to configure the firewall associated with the internal and external subnets; for the playbook in this repository the default rules (which should not need to be changed) will result in three ports being open on the internal subnet to support internode communication, election of a new leader, and client connections for any other services that need to use this NGINX ensemble (eg. Kafka clusters, Solr/Fusion clusters, and multi-master Spark deployments all require an external Zookeeper ensemble to manage their state).
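To make the interaction between these parameters a little more concrete, here is a minimal sketch of the node-creation settings for an AWS deployment. The t2.small type and the three-node count come from the defaults described above; the application key in the node_map entry and the data volume size are assumptions made for illustration, so check the vars/nginx.yml file for the key names that are actually used.

```yaml
# Hypothetical node-creation settings for an AWS deployment
type: t2.small          # documented default instance type
node_map:
  # one entry per application (or per role); only the count field is
  # described above, the application key name is an assumption
  - { application: nginx, count: 3 }
data_volume: 40         # size in GB, illustrative; mounted under /data
# root_volume is left undefined here, so per the rules above an 11GB
# root volume would be created because data_volume is defined
```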

Parameters used during the deployment process

These parameters are used to control the deployment process itself, defining things like which packages to install.

  • nginx_package_list: the list of packages that should be installed on the NGINX nodes; typically this parameter is left unchanged from the default (which installs the epel-release and httpd-tools packages), but if it is modified from the default, these two packages must be included as part of the new package list or an error will result when attempting to install and configure the nginx package (the first is used to configure the nodes so that the nginx package is available; the second is used to set up privately signed certificates that are needed to support HTTPS requests)

Parameters used to configure the NGINX nodes

These parameters are used to configure the NGINX nodes themselves during a playbook run, defining things like the interfaces that NGINX should be listening on for requests and the directory where NGINX should store its data.

  • internal_subnet: the CIDR block describing the subnet that any nodes being created by the playbook run should attach as a private network (eth0); this network is used for internode communications between the nodes of the clusters/ensembles that make up the dataflow being deployed; if it is not specified, then the default value from the config.yml file is used; if the deployment is an OpenStack deployment then a value for the associated internal_uuid parameter must also be provided, and that value must be the UUID for an existing internal network in the targeted OpenStack environment
  • external_subnet: the CIDR block describing the subnet that any nodes being created by the playbook run should attach as a "public" network (eth1); this network is used to support client connections to the various services that make up the dataflow being deployed; if it is not specified, then the default value from the config.yml file is used; if the deployment is an OpenStack deployment then a value for the associated external_uuid parameter must also be provided, and that value must be the UUID for an existing external network in the targeted OpenStack environment
  • nginx_virtual_ip: the virtual IP address that the active NGINX instance will be configured to listen on in an active-passive NGINX cluster; this parameter must be specified for multi-node NGINX deployments
  • nginx_admin_user: the name to use for the administrative user account that is created during provisioning when basic authentication is enabled; defaults to admin
  • nginx_power_user: the name to use for the power user account that is created during provisioning when basic authentication is enabled; defaults to power_user
  • nginx_basic_auth: if this flag is set to true, then basic authentication will be enabled on all of the NGINX instances targeted by the deployment for access control; defaults to false
    • in this scenario, a password will be generated for each of the nginx_admin_user and nginx_power_user accounts (see above)
    • those auto-generated passwords will be saved in the credentials/{{user}}/password.txt file under the directory where the ansible-playbook command is run (note that in this example, the string {{user}} represents one of the usernames being set up, eg. admin or power_user)
  • nginx_https_only: if this flag is set to true, then any HTTP requests received will be redirected as HTTPS requests; defaults to false
  • nginx_country: the country used when constructing the self-signed certificates needed to support HTTPS requests; defaults to US if not specified
  • nginx_state: the state used when constructing the self-signed certificates needed to support HTTPS requests; defaults to CO if not specified
  • nginx_location: the location used when constructing the self-signed certificates needed to support HTTPS requests; defaults to Denver if not specified
  • nginx_org: the organization used when constructing the self-signed certificates needed to support HTTPS requests; defaults to IT if not specified
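As an illustration of how these settings fit together, the following sketch enables basic authentication and HTTPS-only access for a multi-node deployment. The user names and certificate fields are the documented defaults, written out explicitly; the virtual IP address is a placeholder taken from a range reserved for documentation and would need to be replaced with an address that is valid on your own network.

```yaml
# Hypothetical NGINX settings for an active-passive, HTTPS-only deployment
nginx_basic_auth: true          # default is false
nginx_admin_user: admin         # documented default
nginx_power_user: power_user    # documented default
nginx_https_only: true          # default is false; redirects HTTP to HTTPS
nginx_country: US               # documented default
nginx_state: CO                 # documented default
nginx_location: Denver          # documented default
nginx_org: IT                   # documented default
nginx_virtual_ip: 192.0.2.10    # placeholder address, not a real default
```

With these values, and based on the description of nginx_basic_auth above, the generated passwords would be expected in the credentials/admin/password.txt and credentials/power_user/password.txt files under the directory where the ansible-playbook command is run.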

Determining interface names

The playbook in this repository will dynamically determine the names of the interfaces that correspond to the defined internal_subnet and external_subnet CIDR block values and configure the members of the ensemble being deployed to listen on those interfaces, either for communication between the nodes that make up the ensemble or for client requests. This is accomplished by dynamically constructing an iface_description_array parameter within the playbook, then using that parameter to determine the names of the corresponding interfaces and their IP addresses.

Put quite simply, the iface_description_array lets you specify a description for each of the networks that you are interested in, then retrieve the names of those networks on each machine in a variable that can be used elsewhere in the playbook. To accomplish this, the iface_description_array is defined as an array of hashes (one per interface), each of which includes the following fields:

  • type: the type of description being provided, currently only the cidr type is supported
  • val: a value describing the network in question; since only cidr descriptions are currently supported, a CIDR block value should be used for this field
  • as_var: the name of the variable that you would like the interface name returned as

With these values in hand, the playbook will search the available networks on each machine and return a list of the interface names for each network that was described in the iface_description_array as the value of the fact named in the as_var field for that network's entry. For example, given this description:

    iface_description_array: [
        { as_var: 'data_iface', type: 'cidr', val: '' },
        { as_var: 'api_iface', type: 'cidr', val: '' }
    ]

In this example, the playbook will determine the names of the interfaces that match the two CIDR blocks given in the val fields, returning those interface names as the values of the data_iface and api_iface facts, respectively (eg. eth0 and eth1). These two facts are then used later in the playbook to correctly configure the nodes to talk to each other (over the data_iface network) and listen on the proper interfaces for user requests (on the api_iface network).
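Once those facts have been set, they can be referenced like any other Ansible variable. The task below is only a sketch of that idea, not a task taken from this repository; it assumes that the data_iface and api_iface facts have already been set earlier in the same play and simply reports which interface was matched for each network.

```yaml
# Hypothetical task; assumes the data_iface and api_iface facts were set
# earlier in the play from the iface_description_array entries
- name: Report the interfaces matched on this node
  debug:
    msg: "internode traffic uses {{ data_iface }}; client requests use {{ api_iface }}"
```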