evangoer edited this page Oct 14, 2010 · 1 revision

Concepts

Pogo is an agent-based system for running interruptive commands safely on thousands of machines in parallel. It is designed to maximize parallelism and minimize service interruptions during deployments, with a minimum of babysitting.

You invoke remote commands using the pogo command line client. Remote commands operate:

  • on multiple groups of remote hosts. The most basic way to configure your hosts is to explicitly list them in a configuration file. You can also fetch host names from remote systems using a Pogo plugin.
  • according to constraints guaranteed by the Pogo server, which is responsible for dispatching and managing Pogo commands. For example, you can declare that for a certain type of host, only 50% of the machines in a cluster can be down at once.
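
As a rough illustration of the percentage constraint mentioned above, the arithmetic might look like the following. The function name `max_hosts_down` and the choice to round down are assumptions of this sketch, not something the Pogo documentation specifies.

```python
def max_hosts_down(cluster_size, percent_down):
    """Translate a percentage constraint into a whole number of hosts.

    Rounding down is an assumption of this sketch, chosen so that the
    configured percentage is never exceeded.
    """
    return cluster_size * percent_down // 100


# With a 50% constraint, an 8-host cluster may have 4 hosts down at once,
# and a 7-host cluster may have 3.
even_cluster = max_hosts_down(cluster_size=8, percent_down=50)
odd_cluster = max_hosts_down(cluster_size=7, percent_down=50)
```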

Dispatching Commands

In Pogo, the most fundamental way to group your hosts is by application and environment. An application represents the basic software functionality that the host provides, such as "front end webserver" or "database server" or "load balancer". An environment can represent the host's location ("the 'foo' cluster"), or perhaps its mode of operation ("staging", "production").

The Pogo server uses this information to populate a matrix of applications and environments. Intersections of application and environment are called buckets. The server then applies limits to each bucket based on the constraints you provided. If the Pogo server has room in all applicable buckets, it allows the Pogo client to invoke its remote command. If no room is available, the server instructs the client to wait and try again later. After the command exits, the client propagates the exit status to the server, which (upon success) removes the host from all applicable buckets, making room for another host to proceed.
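
The admission flow described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `BucketMatrix`, its methods, and the host and bucket names are hypothetical and are not Pogo's actual implementation or API.

```python
from collections import defaultdict


class BucketMatrix:
    """Toy model of per-bucket limits (hypothetical, not Pogo's real code)."""

    def __init__(self, limits):
        # limits maps (application, environment) -> max hosts holding a slot
        self.limits = dict(limits)
        self.held = defaultdict(set)  # bucket -> hosts currently holding a slot

    def try_acquire(self, host, buckets):
        """Grant the host a slot in every applicable bucket, or in none."""
        if any(len(self.held[b]) >= self.limits[b] for b in buckets):
            return False  # no room: the caller should wait and retry later
        for b in buckets:
            self.held[b].add(host)
        return True

    def report_exit(self, host, buckets, exit_status):
        """Release the host's slots only when the command succeeded."""
        if exit_status == 0:
            for b in buckets:
                self.held[b].discard(host)


matrix = BucketMatrix({("frontend", "production"): 1})
bucket = [("frontend", "production")]
first = matrix.try_acquire("web1", bucket)   # the slot is free
second = matrix.try_acquire("web2", bucket)  # the bucket is full, must wait
matrix.report_exit("web1", bucket, 0)        # success frees the slot
third = matrix.try_acquire("web2", bucket)   # now there is room
```

Acquiring all buckets or none keeps a host from occupying a slot in one bucket while it is still waiting on another.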

If the command fails, the host continues to hold its place in each applicable bucket. This prevents a failed installation from reducing service availability beyond the constraints configured in the Pogo server. If enough hosts fail that they fill all the available buckets, no further hosts can proceed and the job becomes deadlocked. This keeps an entire cluster of machines from going down: the Pogo job fails early without running the bad command everywhere.
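
To make the fail-early behavior concrete, here is a minimal simulation of a single bucket. The function `run_job`, its parameters, and the host names are hypothetical, and hosts are processed one at a time for simplicity, whereas Pogo runs them in parallel.

```python
def run_job(hosts, max_down, command_ok):
    """Simulate one job against a single bucket with max_down slots.

    command_ok(host) -> bool stands in for the remote command's exit status.
    Returns (succeeded, failed, never_run).
    """
    holding = set()  # hosts currently occupying a slot
    succeeded, failed = [], []
    pending = list(hosts)
    while pending:
        if len(holding) >= max_down:
            break  # every slot is held by a failed host: the job is deadlocked
        host = pending.pop(0)
        holding.add(host)
        if command_ok(host):
            holding.discard(host)  # success releases the slot
            succeeded.append(host)
        else:
            failed.append(host)    # failure keeps holding the slot
    return succeeded, failed, pending


hosts = [f"web{i}" for i in range(1, 11)]
# A bad command that fails everywhere, with at most 2 hosts allowed down.
ok, bad, skipped = run_job(hosts, max_down=2, command_ok=lambda h: False)
```

In this run only `web1` and `web2` ever execute the bad command; the remaining eight hosts are never touched.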

Global locking

Pogo maintains lock slots globally. This means that if two Pogo jobs run on the same set of hosts at the same time, they run in parallel, and the constraints are maintained not only within each job but also between them. For example, if a farm is configured for only three hosts down at once, then the two jobs "race" for the three available slots; only three hosts will be down, no matter how many jobs are vying to run commands on them. The order in which hosts from the competing jobs execute is thus arbitrary.
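
The race for shared slots can be imitated locally with threads and a semaphore. This is a sketch of the idea only: the names below are hypothetical, the semaphore stands in for the Pogo server's global slot accounting, and it counts slots rather than distinct hosts.

```python
import threading
import time

MAX_DOWN = 3
slots = threading.BoundedSemaphore(MAX_DOWN)  # one pool shared by every job
state_lock = threading.Lock()
state = {"down": 0, "peak": 0}


def run_on_host(host):
    with slots:  # blocks until one of the global slots is free
        with state_lock:
            state["down"] += 1
            state["peak"] = max(state["peak"], state["down"])
        time.sleep(0.01)  # stand-in for the remote command
        with state_lock:
            state["down"] -= 1


def run_job(hosts):
    workers = [threading.Thread(target=run_on_host, args=(h,)) for h in hosts]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


farm = [f"host{i}" for i in range(10)]
jobs = [threading.Thread(target=run_job, args=(farm,)) for _ in range(2)]
for j in jobs:
    j.start()
for j in jobs:
    j.join()
```

Because both jobs draw from the same semaphore, `state["peak"]` never exceeds `MAX_DOWN`, while the interleaving of the two jobs differs from run to run.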

Hooks

If run with the --hooks option, Pogo can run pre- and post-command hooks on a remote host. Before Pogo runs your command, it opens [TODO fix path]/etc/pogo/pre.d/ and executes any scripts that exist there in order, similar to rc.d. If any pre-command script exits non-zero, the host is marked as failed, and Pogo does not invoke its main command.

Similarly, after a command runs, Pogo opens [TODO fix path]/etc/pogo/post.d/ and executes any scripts that exist there in order. If any post-command script exits non-zero, the host is marked as failed, even if the main Pogo command ran successfully. If the main Pogo command itself exits non-zero, Pogo does not run post-command scripts.
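
The ordering rules for hooks can be summarized in a short sketch. The functions `list_hooks` and `run_with_hooks`, the injectable `run` parameter, and the script names are hypothetical; sorting by file name is an assumption based on the rc.d comparison, and this is not how Pogo itself is implemented.

```python
import os
import subprocess


def list_hooks(directory):
    """Return the regular files in directory, sorted by name (rc.d style)."""
    if not os.path.isdir(directory):
        return []
    paths = (os.path.join(directory, n) for n in sorted(os.listdir(directory)))
    return [p for p in paths if os.path.isfile(p)]


def run_with_hooks(command, pre_hooks, post_hooks, run=subprocess.call):
    """Return "ok" or "failed" for one host.

    run(argv) must return an exit status; it defaults to subprocess.call.
    """
    for script in pre_hooks:
        if run([script]) != 0:
            return "failed"  # a pre-hook failed: the main command never runs
    if run(command) != 0:
        return "failed"      # main command failed: post-hooks are skipped
    for script in post_hooks:
        if run([script]) != 0:
            return "failed"  # failed even though the main command succeeded
    return "ok"


# Dry run with a fake runner in which the second pre-hook fails.
calls = []


def fake_run(argv):
    calls.append(argv[0])
    return 1 if argv[0] == "pre/20-drain" else 0


status = run_with_hooks(
    ["deploy"], ["pre/10-check", "pre/20-drain"], ["post/10-health"], run=fake_run
)
```

Here `status` is `"failed"` and `calls` contains only the two pre-hooks, so neither the main command nor the post-hook was invoked.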

A common use case for a pre-hook is taking a host out of rotation, which should be done with no user impact; a post-hook script should then return the host to rotation. Another common use case for a post-hook is checking service health, ensuring that the host is up and serving traffic. Ideally you would verify service health before returning the host to rotation.

Since Pogo runs as the user that issued the command, you must use sudo if you need elevated privileges to run your hook scripts. One trick to remember is that you can use sudo in a hash-bang line:

    #!/usr/local/bin/sudo /usr/local/bin/perl