CheckDeploymentWorkflow
An unsuspecting user uses somaadm checks create to send a
check_configuration to SOMA. This check_configuration is the entity
that users can interact with; all other derived objects are managed
internally.
The user request is saved as a pending job in the database and
acknowledged to the user with 202/Accepted. It is then put into the
appropriate job queue and processed asynchronously.
Based on the specification in the check_configuration, a check is
created on the selected object of the tree. If inheritance was true
and the object has children, the check creation request is passed down
and every child object creates a check as well.
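The pass-down of a check creation request can be pictured with a minimal sketch. The TreeObject class and the create_check function are hypothetical names used only for illustration; they are not taken from the SOMA code base.

```python
from dataclasses import dataclass, field


@dataclass
class TreeObject:
    """Hypothetical stand-in for an object in the SOMA tree."""
    name: str
    children: list = field(default_factory=list)
    checks: list = field(default_factory=list)


def create_check(obj, config_id, inheritance):
    """Create a check on obj and, if inheritance is set, on every descendant."""
    obj.checks.append(config_id)
    if inheritance:
        for child in obj.children:
            # The request is passed down unchanged, so grandchildren
            # receive the check as well.
            create_check(child, config_id, inheritance)
```

With inheritance disabled, only the selected object receives the check.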
Every tree object evaluates all its checks and their constraints and
creates the appropriate number of check_instances, which can be
arbitrarily many. Together with the check_instance, a
check_instance_configuration is also created in state
awaiting_computation. Check instance configurations are versioned and
a check instance can have multiple configurations.
For every check_instance_configuration in state awaiting_computation
the deployment details are assembled. The state transitions into
computed.
If a check_instance_configuration is the first configuration for the
instance, it transitions into state awaiting_rollout and the check
instance is updated with the id of its current configuration.
The update_available flag is set.
If a previous configuration was found, that configuration is loaded and the deployment details of both are compared. If the new version is the same as the current one, the new configuration is discarded. This deep compare ignores:
- values that must be different, i.e. the check instance configuration id and the version number
- array element order, i.e. [a, b] == [b, a] is true
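A deep compare with these two exemptions could look roughly like the following sketch. The field names in IGNORED_KEYS and the function names are assumptions made for illustration; the actual deployment detail format is not shown here.

```python
import json

# Hypothetical field names for the values that always differ between versions.
IGNORED_KEYS = {"check_instance_config_id", "version"}


def canonical(value):
    """Return a copy of value in which every list is sorted into a stable order."""
    if isinstance(value, dict):
        return {k: canonical(v) for k, v in value.items()}
    if isinstance(value, list):
        items = [canonical(v) for v in value]
        # Sort by the JSON text so that mixed and nested element types work.
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    return value


def same_deployment(old, new):
    """Compare two deployment detail dicts, ignoring ids/versions and list order."""
    def strip(details):
        return {k: v for k, v in details.items() if k not in IGNORED_KEYS}
    return canonical(strip(old)) == canonical(strip(new))
```

Sorting on a canonical JSON rendering is just one way to make list comparison order independent; it treats lists as multisets, so duplicate elements still count.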
If a difference between the two versions was found, the new
configuration is moved into state blocked. The registered unblocking
condition is the old configuration in state deprovisioned.
The old configuration is transitioned into state awaiting_deprovision.
The update_available flag is set.
This ordering step means that SOMA never sends out an update deployment. If there is a change, the destination monitoring system first receives an undeployment of the exact same deployment details used for the deployment. Due to this, the deployment/undeployment on the client side can be completely stateless. It should also be order independent and idempotent.
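The three possible outcomes for a freshly computed configuration (first version, unchanged, changed) can be summarised in a small sketch. Config, place_computed and the shape of unblock_conditions are hypothetical; plain equality stands in for the deep compare.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical check instance configuration."""
    id: str
    state: str
    details: dict


def place_computed(new, current, unblock_conditions, same=lambda a, b: a == b):
    """Place a computed configuration.

    Returns (outcome, update_available). unblock_conditions maps the id of a
    blocked configuration to (blocking id, state that lifts the block).
    """
    if current is None:
        # First configuration for the instance.
        new.state = "awaiting_rollout"
        return "first", True
    if same(new.details, current.details):
        # Nothing changed; the new version is dropped.
        return "discarded", False
    # Changed: the old version must be deprovisioned before the new one rolls out.
    new.state = "blocked"
    unblock_conditions[new.id] = (current.id, "deprovisioned")
    current.state = "awaiting_deprovision"
    return "blocked", True
```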
This concludes the part of the workflow that is run as part of the
add_check_to_${foo} user requested job.
This is the first step by the internal life cycle component that activates every 20 seconds.
It performs three tasks:
- configurations in state awaiting_rollout flagged as deleted with active update_available flag are transitioned directly to awaiting_deletion since the destination monitoring system has not yet picked them up
- configurations flagged as deleted in state rollout_failed are transitioned directly to awaiting_deletion since there is nothing to deprovision
- configurations flagged as deleted in state deprovisioned are transitioned to awaiting_deletion
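The three rules above boil down to one predicate. This is a sketch only; the function name and its parameters are hypothetical.

```python
def ready_for_deletion(state, deleted, update_available):
    """Return True if a configuration may move straight to awaiting_deletion."""
    if not deleted:
        return False
    if state == "awaiting_rollout":
        # Only safe while the flag is still set, i.e. the monitoring
        # system has not yet picked the deployment up.
        return update_available
    # Nothing to deprovision, or deprovisioning has already happened.
    return state in ("rollout_failed", "deprovisioned")
```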
During this next step, all configurations in state blocked that belong
to a deleted check instance are transitioned directly to
awaiting_deletion and their registered unblocking condition is deleted.
This step is only executed if there was no error during the previous
remove blocked deleted step.
Every registered unblock condition is evaluated. If the condition is
true, the condition is deleted and the configuration transitioned to
either awaiting_rollout or awaiting_deprovision.
The update_available flag for the check instance is set.
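A rough sketch of this evaluation follows. The text does not say how the target state is chosen; the sketch simply assumes it is stored together with the condition. All names and data shapes are hypothetical.

```python
def evaluate_unblocks(conditions, states):
    """Evaluate all unblock conditions.

    conditions: blocked config id -> (blocking config id, required state,
                target state); the stored target state is an assumption.
    states:     config id -> current state.
    Returns the ids of the configurations that were unblocked; the caller
    would set update_available on the owning check instances.
    """
    unblocked = []
    # Iterate over a copy because satisfied conditions are deleted.
    for blocked_id, (blocking_id, required, target) in list(conditions.items()):
        if states.get(blocking_id) == required:
            del conditions[blocked_id]
            states[blocked_id] = target
            unblocked.append(blocked_id)
    return unblocked
```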
This step transitions all configurations flagged as deleted in state
active to state awaiting_deprovision and sets the update_available
flag on the instance.
This step takes all check instances with the update_available flag
set that are provisioned on a monitoring system with a notification
callback registered. For every available check instance, the monitoring
system receives a poke on its callback.
The update_available flag is cleared if the poke was successful.
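The poke step can be sketched as below. The instance dictionaries and the injected poke callable are assumptions for illustration; how SOMA actually contacts the callback is not described here.

```python
def notify(instances, poke):
    """Poke the callback of every flagged instance.

    instances: list of dicts with the hypothetical keys "id",
               "update_available" and "callback" (None if not registered).
    poke:      callable(callback, instance_id) -> bool, True on success.
    Returns the ids whose flag was cleared.
    """
    cleared = []
    for inst in instances:
        if not inst["update_available"] or not inst.get("callback"):
            continue
        if poke(inst["callback"], inst["id"]):
            # Only a successful poke clears the flag; a failed one is
            # retried on the next run because the flag stays set.
            inst["update_available"] = False
            cleared.append(inst["id"])
    return cleared
```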
This step is the transition point where a check instance deployment leaves the SOMA application server.
Using the id received with the poke, the destination monitoring system fetches the deployment information from SOMA. This GET request has a side effect and transitions the workflow!
The following transitions can be triggered by request:
awaiting_rollout -> rollout_in_progress
rollout_in_progress -> rollout_in_progress
active -> active
rollout_failed -> rollout_in_progress
awaiting_deprovision -> deprovision_in_progress
deprovision_in_progress -> deprovision_in_progress
deprovision_failed -> deprovision_in_progress
The destination monitoring system must, after processing the deployment request, send feedback about the deployment result. This transitions the check instances as follows:
Feedback: success
rollout_in_progress -> active
deprovision_in_progress -> deprovisioned
Feedback: failed
rollout_in_progress -> rollout_failed
deprovision_in_progress -> deprovision_failed
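Both transition tables can be written down as plain lookup tables. The states and transitions are copied from the lists above; the function names and the choice to raise ValueError for states that are not listed are assumptions of this sketch.

```python
# Transitions triggered when the monitoring system fetches a deployment.
FETCH_TRANSITIONS = {
    "awaiting_rollout": "rollout_in_progress",
    "rollout_in_progress": "rollout_in_progress",
    "active": "active",
    "rollout_failed": "rollout_in_progress",
    "awaiting_deprovision": "deprovision_in_progress",
    "deprovision_in_progress": "deprovision_in_progress",
    "deprovision_failed": "deprovision_in_progress",
}

# Transitions triggered by deployment feedback: (state, success) -> new state.
FEEDBACK_TRANSITIONS = {
    ("rollout_in_progress", True): "active",
    ("deprovision_in_progress", True): "deprovisioned",
    ("rollout_in_progress", False): "rollout_failed",
    ("deprovision_in_progress", False): "deprovision_failed",
}


def on_fetch(state):
    """Return the state after a fetch request."""
    if state not in FETCH_TRANSITIONS:
        raise ValueError(f"no fetch transition from state {state!r}")
    return FETCH_TRANSITIONS[state]


def on_feedback(state, success):
    """Return the state after success or failure feedback."""
    if (state, success) not in FEEDBACK_TRANSITIONS:
        raise ValueError(f"no feedback transition from state {state!r}")
    return FEEDBACK_TRANSITIONS[(state, success)]
```

Fetching is idempotent for the in-progress states, so a monitoring system can repeat the request without moving the workflow further.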
Monitoring systems that do not have a registered callback address can poll SOMA for updates. Registering a callback requires implementing a REST'ish service that SOMA can contact.
This request returns all instance ids that have the update_available
flag set and clears it. This means every deployment is part of this
list exactly once.
With this list of IDs, the destination monitoring system can fetch the deployments the same way as if it had received pokes for them.
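The read-and-clear behaviour of this polling request is sketched below with a plain dictionary standing in for the database; poll_updates is a hypothetical name.

```python
def poll_updates(flags):
    """Return all instance ids with the flag set and clear those flags.

    flags: instance id -> update_available (bool), modified in place.
    """
    ids = [instance_id for instance_id, flagged in flags.items() if flagged]
    for instance_id in ids:
        flags[instance_id] = False
    return ids
```

A second poll without intervening changes returns an empty list, which is why a client has to fetch every id it was handed.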
This request returns all instance ids with configurations in one of the
following states, regardless of the update_available flag. If the flag
is active, it is cleared.
awaiting_rollout
rollout_in_progress
awaiting_deprovision
deprovision_in_progress
This request can be used to resynchronize pending requests.
A REST'ish configuration service can use it on startup, whether clean or after a crash, to fetch all pending deployments again. This allows these services to be fully stateless with regard to which deployments they have already fetched.
Sometimes users wish to delete a check configuration via somaadm checks delete.
The check deletion job deletes the following objects from the in-memory tree:
- all checks for the configuration
- all check instances spawned by those checks
This results in the following objects being flagged as deleted in the database:
- the check configuration
- all checks derived from the check configuration
- all check instances derived from the checks
At this point the lifecycle component will pick this up and deprovision
all currently active configurations, ultimately moving them into state
awaiting_deletion.
At some point we may have to clean up the database of all the things
either in state awaiting_deletion or simply flagged as deleted. At
that point, we also need to decide how much deleted history to keep
around and whether to simply delete or archive these old records.
That point has not yet come.