Alarm Subsystem

Ben Langfeld edited this page Apr 11, 2012 · 7 revisions

DEPRECATION NOTICE: This is old documentation relevant to Adhearsion 1.x and will soon be removed. See the main documentation for up-to-date info.

This page details a draft feature under development which cannot be found in any released version of Adhearsion. We find documenting sophisticated features before their implementation helps the design process of those features. Material here can and will change.

Telephony applications are often mission-critical. Within Adhearsion, some failures can only be recovered from with logic that is specific to the application.

The alarm subsystem will allow this recovery logic to be planned and prepared on a per-application basis. The application developer will be able to dispatch alarm notifications as XMPP messages, emails, etc. For some applications, an alarm may go out to an operations center, where a human uses a web interface to choose the recovery mode from a dropdown menu.
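Since this feature is not implemented, the following is only a rough sketch of how notification dispatch might look. `AlarmDispatcher`, `subscribe` and `dispatch` are hypothetical names and are not part of any Adhearsion release; the notifiers here just append to an array where a real application would send an XMPP message or an email.

```ruby
# Hypothetical sketch: none of these names exist in Adhearsion.
class AlarmDispatcher
  def initialize
    @notifiers = []
  end

  # Register a block that receives each alarm (e.g. an XMPP or email sender).
  def subscribe(&notifier)
    @notifiers << notifier
    self
  end

  # Deliver the alarm to every notifier. A failing channel is logged to
  # stderr and must not prevent delivery on the remaining channels.
  def dispatch(name, details = {})
    alarm = { :name => name, :details => details, :raised_at => Time.now }
    @notifiers.each do |notifier|
      begin
        notifier.call(alarm)
      rescue StandardError => e
        warn "alarm notifier failed: #{e.message}"
      end
    end
    alarm
  end
end

dispatcher = AlarmDispatcher.new
outbox = []
dispatcher.subscribe { |alarm| outbox << "email: #{alarm[:name]}" }
dispatcher.subscribe { |alarm| raise "XMPP server unreachable" }
dispatcher.subscribe { |alarm| outbox << "web: #{alarm[:name]}" }
raised = dispatcher.dispatch(:ami_connection_lost, :host => "pbx.example.com")
```

Isolating each notifier with its own `rescue` is a deliberate choice in this sketch: an alarm is most likely to fire when parts of the system are already unhealthy, so one broken channel should not silence the others.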

##Types of Alarms##

###Non-blocking alarms###

Non-blocking alarms are mostly available today through the events.rb file. With events.rb, you can register callbacks to handle certain named events that happen within your application. A non-blocking alarm is a kind of event that should be made available to a human if need be.

A normal blocking alarm could, by policy, become a non-blocking alarm if a default recovery mode is specified.
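The policy described above could be modelled roughly as follows. This is a sketch under assumptions: the `Alarm` struct, its fields, and the recovery mode names are invented for illustration and do not correspond to any existing Adhearsion API.

```ruby
# Hypothetical sketch: Alarm and its fields are illustrative only.
Alarm = Struct.new(:name, :recovery_modes, :default_recovery) do
  # Policy from the text: once a default recovery mode is specified,
  # the alarm no longer has to wait for a human.
  def blocking?
    default_recovery.nil?
  end

  # Returns the recovery mode to apply: the operator's choice if given,
  # otherwise the default. Returns nil while still waiting on a human.
  def resolve(operator_choice = nil)
    choice = operator_choice || default_recovery
    return nil if choice.nil?
    unless recovery_modes.include?(choice)
      raise ArgumentError, "unknown recovery mode: #{choice.inspect}"
    end
    choice
  end
end

modes = [:reconnect, :shutdown]
needs_human  = Alarm.new(:ami_connection_lost, modes, nil)
self_healing = Alarm.new(:ami_connection_lost, modes, :reconnect)
```

In this sketch `needs_human.resolve` stays `nil` until an operator passes a choice such as `:shutdown`, while `self_healing.resolve` falls back to `:reconnect` immediately.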

###Blocking alarms###

A blocking alarm is one which needs the intervention of a human and usually indicates a tricky failure.

####Examples:####

  • What if the Asterisk Manager Interface connection drops and Adhearsion cannot reconnect?
    • Should Adhearsion itself stop gracefully (so it can be restarted) or should it attempt a reconnect?
    • How frequently should it reconnect?
    • At what point does it give up trying to reconnect?
    • What if a call comes in during the time it's reconnecting?
  • What if the events.rb subsystem's average event processing rate is too slow (possibly because of I/O latency) and events become heavily backlogged?
  • What if one of your components has a serious problem when it's initialized?

These are tricky questions that can only be answered on a per-application basis.
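To make the reconnection questions concrete, here is one way an application might write down its answers. Everything in this sketch is an assumption: `ReconnectPolicy`, its options, and the exponential backoff are illustrative choices, not a design the framework has committed to. The block passed to `run` stands in for the actual reconnect attempt.

```ruby
# Hypothetical sketch: ReconnectPolicy is not an Adhearsion class.
class ReconnectPolicy
  attr_reader :max_attempts, :base_delay, :max_delay, :calls_while_down

  def initialize(options = {})
    @max_attempts     = options.fetch(:max_attempts, 5)   # when to give up
    @base_delay       = options.fetch(:base_delay, 1)     # seconds
    @max_delay        = options.fetch(:max_delay, 30)     # seconds
    # What to do with calls that arrive mid-reconnect (stored, not acted on here).
    @calls_while_down = options.fetch(:calls_while_down, :reject)
  end

  # Delay before retrying after the given attempt (1-based),
  # doubling each time and capped at max_delay.
  def delay_for(attempt)
    [base_delay * (2**(attempt - 1)), max_delay].min
  end

  # Yields the attempt number until the block returns truthy or the
  # attempts are exhausted. Returns :reconnected or :give_up.
  def run(sleeper = Kernel.method(:sleep))
    1.upto(max_attempts) do |attempt|
      return :reconnected if yield(attempt)
      sleeper.call(delay_for(attempt)) if attempt < max_attempts
    end
    :give_up
  end
end

policy = ReconnectPolicy.new(:max_attempts => 4, :base_delay => 1, :max_delay => 5)
waits = []
fake_sleep = lambda { |seconds| waits << seconds }  # record delays instead of sleeping
outcome = policy.run(fake_sleep) { |attempt| attempt == 3 }
```

A `:give_up` result is the point at which a blocking alarm would be raised, leaving the choice between a graceful stop and further retries to a human.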

##Framework-level implications##

  • It would be nice to aggregate system state in one big dump so an ops center can handle things with more transparency.
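One possible shape for such a dump is sketched below. `StateSnapshot`, `probe` and `dump` are hypothetical names, and the probes shown return made-up values; a real application would register blocks that inspect its own connections and queues.

```ruby
require 'json'
require 'time'

# Hypothetical sketch: StateSnapshot and the probe names are illustrative only.
class StateSnapshot
  def initialize
    @probes = {}
  end

  # Register a named block that reports one piece of system state.
  def probe(name, &block)
    @probes[name] = block
    self
  end

  # Run every probe and collect the results into one hash.
  # A failing probe is recorded as an error entry instead of raising.
  def collect
    @probes.each_with_object({}) do |(name, block), state|
      state[name] = begin
        block.call
      rescue StandardError => e
        { :error => e.message }
      end
    end
  end

  # Serialize the collected state with a UTC timestamp for the ops center.
  def dump
    JSON.generate(:taken_at => Time.now.utc.iso8601, :state => collect)
  end
end

snapshot = StateSnapshot.new
snapshot.probe(:active_calls) { 12 }
snapshot.probe(:ami) { raise "connection refused" }
report = snapshot.dump
```

Recording probe failures inside the dump, rather than letting them abort it, keeps the snapshot useful in exactly the situations where parts of the system are broken.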