Updating readme
Digant C Kasundra committed May 20, 2015
1 parent 588c781 commit ded1b79
Showing 2 changed files with 96 additions and 5 deletions.
89 changes: 87 additions & 2 deletions README.md
@@ -1,2 +1,87 @@
# hermes
An event and autotasking system for SRE.
= Introduction =

Hermes logs events, generates tasks, and tracks tasks in logical groups.

= Terminology =

Rather than mimic the overloaded and overused terminology typically used, and in keeping with the Dropbox principle of "cupcake," Hermes adopts a more interesting language.

== Events and Event Types ==

Events serve two purposes: as journal entries, they log system activities such as server restarts; as requests for action, they record a need such as restarting or turning off a server.

As journal entries, events provide an audit trail and can be used to track a range of activities. As requests, events can create labors, and subsequent events can close those labors.

Each event must be of a predefined event type. An event type consists of a category and state, the combination of which provides meaningful grouping and definition:

```
ID CATEGORY STATE
[1] system-reboot required
[2] system-reboot completed
[3] system-maintenance required
[4] system-maintenance ready
[5] system-maintenance completed
```

Event types are often written simply as `category-state`, such as `system-reboot-required`.

An individual event entry consists of the event type, the host, and the time of occurrence.
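
As a rough illustration of the structure described above, here is a minimal Python sketch. The class names `EventType` and `Event`, their fields, and the host name are assumptions made for this example; they are not taken from the Hermes models.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventType:
    """A predefined (category, state) pair. Hypothetical, for illustration."""
    category: str
    state: str

    @property
    def name(self) -> str:
        # The written form used in this README: category-state.
        return f"{self.category}-{self.state}"


@dataclass(frozen=True)
class Event:
    """One entry: the event type, the host, and the time of occurrence."""
    event_type: EventType
    host: str
    timestamp: datetime


reboot_required = EventType("system-reboot", "required")
event = Event(reboot_required, "web-01.example.com", datetime.now(timezone.utc))
print(event.event_type.name)  # system-reboot-required
```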

== Labors ==

Labors represent tasks that need to be performed or outstanding issues that need to be addressed for a host. All labors are created and closed as the result of events.

A labor is usually referred to by the event that triggered its creation, so a `system-reboot-required` event creates a `system-reboot-required` labor.

== Fates ==
=== Basics ===
The fates define how labors are created and completed. A typical fate will specify which event type will result in the creation of a labor for the host, and which event type will close labors for a host.

```
[1] system-reboot-required => system-reboot-completed
```
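
The following sketch shows one way a single fate could drive labor creation and completion. It assumes event types are plain strings and that open labors are kept in an in-memory dictionary keyed by host; `Fate`, `FATES`, and `apply_event` are hypothetical names for this example and not the Hermes API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fate:
    """Hypothetical stand-in: which event type opens a labor, which closes it."""
    creation_event_type: str
    completion_event_type: str


FATES = [Fate("system-reboot-required", "system-reboot-completed")]


def apply_event(open_labors: Dict[str, List[str]], host: str, event_type: str) -> None:
    """Update the open labors of one host, in place, for one incoming event."""
    labors = open_labors.setdefault(host, [])
    for fate in FATES:
        if event_type == fate.completion_event_type:
            # Close every open labor that this fate's creation event opened.
            labors[:] = [name for name in labors if name != fate.creation_event_type]
        if event_type == fate.creation_event_type:
            # Labors are named after the event that created them.
            labors.append(event_type)


open_labors: Dict[str, List[str]] = {}
apply_event(open_labors, "db-01", "system-reboot-required")
print(open_labors)  # {'db-01': ['system-reboot-required']}
apply_event(open_labors, "db-01", "system-reboot-completed")
print(open_labors)  # {'db-01': []}
```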

=== Chained Fates ===
An `intermediate` flag in the definition of a fate indicates whether the fate applies only to existing labors. This allows fates to be chained together to essentially create a workflow engine.

For example:
```
[1] system-maintenance-required => system-maintenance-ready
[2] system-maintenance-ready => system-maintenance-completed
```

(with the second fate being flagged as an intermediate) would essentially mean:

```
system-maintenance-required => system-maintenance-ready => system-maintenance-completed
```

In this example, an event of type `system-maintenance-ready` creates a labor only if the host already has a labor that was created by an event of type `system-maintenance-required`.
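
The chaining rule can be sketched in a few lines of Python. This is a simplified, in-memory approximation that borrows the flag names `should_create` and `should_create_if_intermediate` from `question_the_fates` in `hermes/models.py`; the `Fate` dataclass, the `FATES` list, and the `process_event` function are assumptions for illustration and do not reproduce the real database-backed implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fate:
    """Hypothetical stand-in for a fate definition."""
    creation_event_type: str
    completion_event_type: str
    intermediate: bool = False


FATES = [
    Fate("system-maintenance-required", "system-maintenance-ready"),
    Fate("system-maintenance-ready", "system-maintenance-completed", intermediate=True),
]


def process_event(open_labors: List[str], event_type: str) -> List[str]:
    """Return the open labors of one host after a single event is processed."""
    should_create = False
    should_create_if_intermediate = False
    labors_to_close: List[str] = []

    for fate in FATES:
        if event_type == fate.creation_event_type:
            if fate.intermediate:
                should_create_if_intermediate = True
            else:
                should_create = True
        if event_type == fate.completion_event_type:
            labors_to_close += [
                name for name in open_labors if name == fate.creation_event_type
            ]

    remaining = list(open_labors)
    if should_create:
        # Non-intermediate fate: the event always opens a labor.
        remaining.append(event_type)
    for labor in labors_to_close:
        remaining.remove(labor)
        if should_create_if_intermediate:
            # Intermediate fate: open the next labor only when one is closed.
            remaining.append(event_type)
    return remaining


labors = process_event([], "system-maintenance-ready")
print(labors)  # [] because no earlier labor exists to chain from
labors = process_event([], "system-maintenance-required")
labors = process_event(labors, "system-maintenance-ready")
print(labors)  # ['system-maintenance-ready']
```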

=== Choose Your Own Adventure ===

Fates can allow multiple ways to resolve a labor.

```
[1] puppet-restart-required => puppet-restart-completed
[2] puppet-restart-required => system-restart-completed
```

In this example, a labor created by the event `puppet-restart-required` can be completed by either a `puppet-restart-completed` event, or a `system-restart-completed` event.
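
To make the lookup concrete, the sketch below treats each fate as a (creation, completion) pair of event type names and asks which events can close a given labor. The `FATES` list and the helper `completion_events_for` are hypothetical and exist only for this example.

```python
from typing import List, Tuple

# Each fate as (creation event type, completion event type); illustrative only.
FATES: List[Tuple[str, str]] = [
    ("puppet-restart-required", "puppet-restart-completed"),
    ("puppet-restart-required", "system-restart-completed"),
]


def completion_events_for(labor: str) -> List[str]:
    """List every event type that can close a labor with the given name."""
    return [completion for creation, completion in FATES if creation == labor]


print(completion_events_for("puppet-restart-required"))
# ['puppet-restart-completed', 'system-restart-completed']
```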

== Quests ==

Quests are collections of labors, which make progress much easier to track and report.

For example, when a security fix is released that requires all web servers to be restarted, a quest can be created with a `system-restart-required` labor for each of the hosts.
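
A quest of this kind could be modeled as follows. This is only a sketch: the `Quest` class, its fields, and the `progress` calculation are assumptions for illustration, and the host names are made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Quest:
    """Hypothetical quest: one labor of the same type per host."""
    description: str
    labor_type: str
    # Maps host name to True once that host's labor has been closed.
    labors: Dict[str, bool] = field(default_factory=dict)

    def add_host(self, host: str) -> None:
        self.labors[host] = False

    def complete(self, host: str) -> None:
        self.labors[host] = True

    def progress(self) -> float:
        """Fraction of labors closed; 0.0 for a quest with no labors."""
        if not self.labors:
            return 0.0
        return sum(self.labors.values()) / len(self.labors)


quest = Quest("Restart web servers for security fix", "system-restart-required")
for host in ("web-01", "web-02", "web-03", "web-04"):
    quest.add_host(host)
quest.complete("web-01")
print(quest.progress())  # 0.25
```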

Quests will eventually contain references to outside resources, such as Jira tickets.

= Status =

Development is in the early phases. The first production roll-out of Hermes will offer:

* **Hermes server:** a central server, run by SysEng, with a REST API
* **Hermes CLI:** a command line interface to the Hermes server available on any and all necessary servers

Development can be tracked at [[ https://github.com/dropbox/hermes | GitHub ]] and [[ https://travis-ci.org/dropbox/hermes | Travis CI ]].
12 changes: 9 additions & 3 deletions hermes/models.py
@@ -462,7 +462,8 @@ def question_the_fates(cls, session, event):
         # Examine all the Fates.
         for fate in fates:
             # If this type of Event is a creation type for a Fate,
-            # flag that we need to create labors
+            # flag that we need to create labors. We also need to track if
+            # this is a creation type for an intermediate fate or not.
             if event_type == fate.creation_event_type:
                 if fate.intermediate:
                     should_create_if_intermediate = True
@@ -479,12 +480,17 @@ def question_the_fates(cls, session, event):
labors_to_close.append(open_labor)
should_close = True

+        # If we need to create a labor because of a non-intermediate fate,
+        # create that now
         if should_create:
-            print "**** CREATING RAW LABOR"
             new_labor = Labor.create(session, host, event)

+        # If we need to close some labors, lets do that now
         if should_close:
-            print "*** SHOULD CLOSE"
+            # We will examine each labor that needs to get closed. If we are
+            # also supposed to create a labor because of an intermediate fate,
+            # we will do that now and tie the new labor to the quest of the
+            # labor we are closing, if it exists
             for labor in labors_to_close:
                 if should_create_if_intermediate:
                     new_labor = Labor.create(session, host, event)