down the rabbit hole.
Sumit Jamgade committed Mar 30, 2017
1 parent 5d362d3 commit 7f5ba7d
Showing 1 changed file with 107 additions and 52 deletions.
159 changes: 107 additions & 52 deletions doc/barclamp.md
@@ -1357,91 +1357,146 @@ and/or
[qa_crowbarsetup.sh](https://github.com/SUSE-Cloud/automation/blob/master/scripts/qa_crobarsetup.sh)
to take care of your barclamp's needs.

## High Availability (HA) Integration

Using the [barbican barclamp](https://github.com/crowbar/crowbar-openstack/tree/9805c7ddb81ffbf6f2b79061c6ceb332659cd614/chef/cookbooks/barbican)
as an example, we will investigate what changes are required to make a barclamp
HA compatible.

[This commit](https://github.com/crowbar/crowbar-openstack/pull/512/files) is
the first place we start looking. It was not entirely correct and needed
[more commits](https://github.com/crowbar/crowbar-openstack/commits/master/chef/cookbooks/barbican)
to correct.

- [default.rb](https://github.com/crowbar/crowbar-openstack/commits/master/chef/cookbooks/barbican/attributes/default.rb)

This file defines the defaults for all the attributes required by the barclamp.
It allows you to set defaults (or overrides) for any attribute in the node object.
In this case we use that capability to set a bunch of HA-related attributes that
are rarely changed and hence need not be put in the data bag.

###############IGNORE######################

```ruby
# All keys below live under default[:barbican][:ha] in the attributes file.

# HA attributes
default[:barbican][:ha][:enabled] = false

# Ports to bind to when haproxy is used for the real ports
default[:barbican][:ha][:ports][:api] = 5621

# pacemaker definitions: monitor interval for each service
default[:barbican][:ha][:api][:op][:monitor][:interval] = "10s"
default[:barbican][:ha][:worker][:op][:monitor][:interval] = "10s"
default[:barbican][:ha][:keystone_listener][:op][:monitor][:interval] = "10s"

# pacemaker resource agents
default[:barbican][:ha][:worker][:agent] = "systemd:openstack-barbican-worker"
default[:barbican][:ha][:keystone_listener][:agent] = "systemd:openstack-barbican-keystone-listener"
```
> - Resource Agent: In layman's terms, a resource agent is a script that the
cluster manager uses to manage the resource.
> - LSB: Linux Standard Base, a specification that init scripts used as resource agents follow.
> - OCF: Open Cluster Framework, an extension of the LSB conventions that Pacemaker supports.
> - systemd: Use systemd unit files for managing the resource.
# TODO:
Define
- HA
- Pacemaker
- LSB
- [barbican_service.rb(cookbook)](https://github.com/crowbar/crowbar-openstack/commits/master/chef/cookbooks/barbican/definitions)

Makes the installation of the service HA aware by using [```provider Chef::Provider::CrowbarPacemakerService```](https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb)
(beware: the linked file contains internal documentation). By adding that line we tell Chef to
use the **service resource** from that provider and not the default one. The file is heavily documented
and is a good read on some of the internal interactions between Chef and Pacemaker. Chef tries to manage
resources locally, while Pacemaker operations are cluster-wide, so we need special ways
to interact with Pacemaker about the status of a clone/group/service and its configuration.


[This commit](https://github.com/crowbar/crowbar-openstack/pull/512/files) has more or less everything that makes a barclamp HA compatible. So we can just base our understanding around that commit.
- [helpers.rb](https://github.com/crowbar/crowbar-openstack/commits/master/chef/cookbooks/barbican/libraries/)

- [default.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-f9a1e9d161032b6c0015336ab6812c14) This file defines the default for all the values of the vars required by barclamp.
Here we define a helper class to initialize the variables/ports correctly ```if ha_enabled```

```ruby
default[:barbican][:ha]
# port for ha listener.
[:ports][:api]
# check interval for services
[:api][:op][:monitor][:interval] = "10s"
[:worker][:op][:monitor][:interval] = "10s"
[:keystone_listener][:op][:monitor][:interval] = "10s"

# pacemaker resource agent. (SearchEngine:Pacemaker lsb)
[:api][:agent] = "lsb:openstack-barbican-api"
[:worker][:agent] = "lsb:openstack-barbican-worker"
[:keystone_listener][:agent] = "lsb:openstack-barbican-keystone-listener"
- [api.rb](https://github.com/crowbar/crowbar-openstack/commits/master/chef/cookbooks/barbican/recipes/api.rb)

```
First make sure the service and user are correctly registered with keystone; use keystone_setting_helper to get the keystone endpoint details. Then register the endpoint with keystone, using the appropriate port.
```
If !ha => use the standard port
else => use port where ha is listening
...
bind api on correct port accordingly
```
If using HA, the host address will be taken from the admin network, because HAProxy
listens on the virtual IP address from the admin network and the node is part of the cluster.
Otherwise just bind to '*'.
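
The port and host selection described above can be sketched in plain Ruby. This is only an illustration, not the code from api.rb: the helper name `api_bind_settings`, its keyword arguments and the standard port value `9311` are assumptions made for the example, while `5621` is the HA port from the attributes shown earlier.

```ruby
# Hypothetical helper, not part of the barbican cookbook.
# Assumption: 9311 stands in for the "standard" API port.
STANDARD_API_PORT = 9311

# Returns the host and port the API service itself should bind to.
def api_bind_settings(ha_enabled:, ha_port:, admin_ip:)
  if ha_enabled
    # HAProxy owns the public side on the virtual IP, so the real
    # service binds to the node's admin address on the HA port.
    { host: admin_ip, port: ha_port }
  else
    # Without HA the service listens everywhere on the standard port.
    { host: "*", port: STANDARD_API_PORT }
  end
end

puts api_bind_settings(ha_enabled: true, ha_port: 5621, admin_ip: "192.168.124.81").inspect
puts api_bind_settings(ha_enabled: false, ha_port: 5621, admin_ip: "192.168.124.81").inspect
```

Keeping the decision in one small helper means every recipe that needs the bind address gets the same answer for a given HA setting.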

- [common.rb](https://github.com/crowbar/crowbar-openstack/commits/fb4cd02a09ba35182e401cac48858e2e5d1d5508/chef/cookbooks/barbican/recipes/common.rb)

The common recipe is executed for all services and on all nodes of a cluster in
parallel. In the HA case the db/user should only be created once, so make
sure only the ```cluster_founder``` is allowed to do these tasks, and create synchronization
markers around them. Also be sure to listen for requests on the VIP, as requests will be
sent to the VIP in case of HA (it is actually the cluster address where HAProxy is
listening).




- [barbican_service.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-3f95710598349c27e4d1e79ba4cd7826)
Makes the installation of the service HA aware, by using [```provider Chef::Provider::CrowbarPacemakerService```](https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb) (Beware internal documentation on link) By adding that line we tell chef to use **service resource** from that provider and not the default one.
> To make sure the other nodes do not start accessing the database before the
founder has created it, **sync_marks** are created. They are part of
the **crowbar_pacemaker**
([provider](https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/sync_mark.rb),
[resource](https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/resources/sync_mark.rb))
cookbook in the [crowbar-ha](https://github.com/crowbar/crowbar-ha) repository.
> These marks help provide synchronization between the nodes while chef is
applying the states on the nodes. This synchronization can be of two types.
> - Wait and watch (**wait**-_ActionToBeDone_ - **create**-_ActionToBeDone_) ([example](https://github.com/crowbar/crowbar-openstack/blob/1718ebdcd5071ceac796744a33faaaddbdf71d73/chef/cookbooks/barbican/recipes/common.rb#L44-L101))
To be more precise, these markers make sure that a certain section of a cookbook
is **entered and finished by the founder node** while the other nodes
wait until then.
All non-founder nodes wait until they see _ActionToBeDone_ while the founder node
keeps going. It is up to the founder to do **create-_ActionToBeDone_**, so
wait and create always go in **pairs**. If the founder does not do create-_ActionToBeDone_,
the non-founders will time out.
> - Wait and start together ([example](https://github.com/crowbar/crowbar-openstack/blob/1718ebdcd5071ceac796744a33faaaddbdf71d73/chef/cookbooks/glance/recipes/ha.rb#L46))
As the name says, let everyone reach here. No node can proceed unless all the
other nodes have reached here.
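
The wait/create pairing can be illustrated with a small in-process simulation. It does not use the real `crowbar_pacemaker_sync_mark` resource: the `SyncMarks` class and the `guarded_section` helper are hypothetical, and threads stand in for cluster nodes. It only shows why a missing create makes the waiters time out.

```ruby
# Hypothetical in-memory stand-in for cluster-wide sync marks.
class SyncMarks
  def initialize
    @marks = {}
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  # Founder side: publish the mark and wake up every waiter.
  def create(mark)
    @mutex.synchronize do
      @marks[mark] = true
      @cond.broadcast
    end
  end

  # Non-founder side: block until the mark exists or the timeout expires.
  # Returns true if the mark was seen, false on timeout.
  def wait(mark, timeout:)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @marks[mark]
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return false if remaining <= 0
        @cond.wait(@mutex, remaining)
      end
      true
    end
  end
end

# The founder runs the block and then creates the mark; everyone else
# skips the block and waits for the mark.
def guarded_section(marks, mark, founder:, timeout: 1)
  if founder
    yield
    marks.create(mark)
    :created
  elsif marks.wait(mark, timeout: timeout)
    :proceeded
  else
    :timed_out
  end
end

marks = SyncMarks.new
follower = Thread.new { guarded_section(marks, "database", founder: false, timeout: 2) { } }
puts guarded_section(marks, "database", founder: true) { puts "founder creates db" }
puts follower.value
```

In this sketch the non-founder never runs the guarded block, which mirrors the rule that only the founder creates the database and user.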

- [helpers.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-744d98e238ae12f454d5f6e0781624d5)
Here we define a helper class to initialize the variables/ports correctly ```if ha_enabled```

- [api.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-f9e37f7c8ba0f34831a43d2f89d87722), here we register the endpoint with keystone, using appropriate port.
```bash
If : not using ha, use the standard port
else: use port where ha is listening
...
bind api on correct port accordingly
```
- [ha.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-0605880109e91323b0a511b5506433ee)

This is a complicated piece because Chef changes the state of an object
only if it needs to change. So in case of an upgrade we need to figure out
what has changed and ask Chef to apply those changes. In most cases we are
using the built-in resources, so Chef is able to figure out most of the changes
and apply them. In the case of Pacemaker we need to figure out which primitives, groups,
and clones have been created/changed. For this, custom resources and providers
were created with Pacemaker as the cluster manager.
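
The "figure out what has changed" step can be sketched as a comparison between the current and the desired definitions. This is a plain-Ruby illustration under stated assumptions: `plan_changes` and the hash layout are hypothetical and do not reflect how the crowbar-ha providers are actually implemented.

```ruby
# Hypothetical sketch: compare the primitives known to the cluster with
# the ones the recipe wants, and report what would have to be done.
def plan_changes(current, desired)
  {
    # wanted but not yet known to the cluster
    create: desired.keys - current.keys,
    # known, but defined differently from what is wanted
    update: desired.keys.select { |name| current.key?(name) && current[name] != desired[name] },
    # already in the wanted state, nothing to do
    unchanged: desired.keys.select { |name| current[name] == desired[name] }
  }
end

current = {
  "barbican-worker" => { agent: "lsb:openstack-barbican-worker", interval: "10s" }
}
desired = {
  "barbican-worker" => { agent: "systemd:openstack-barbican-worker", interval: "10s" },
  "barbican-keystone-listener" => { agent: "systemd:openstack-barbican-keystone-listener", interval: "10s" }
}

puts plan_changes(current, desired).inspect
```

Only the entries under `create` and `update` would need any action, which is the same idempotent behaviour Chef provides for its built-in resources.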

[common.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-fb3b92d9c323babdb9c28ba1a019c1de), common recipe is executed for all services and on all nodes of a cluster in parallel. So in case of ha, the db/user should only be created once, so make sure only ```cluster_founder```

To make sure the other nodes do not start accessing the database before the founder has created the database. **sync_marks** are created. This is part of the **crowbar_pacemaker** ([provider](https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/sync_mark.rb), [resource](https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/resources/sync_mark.rb)) cookbook on the [crowbar-ha](https://github.com/crowbar/crowbar-ha)
- [role_barbican_controller.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-62c9844abda53dc328542cea19d7a1ef)

These marks help provide synchronization between the nodes while chef is applying the states on the nodes. This synchronization can be of two types.
Include this recipe by default while applying the cookbook. The recipe (ha.rb) has a
check for ```ha_enabled``` at the top. If HA is not enabled, it just logs a message and exits early.

- Wait and watch (**wait**-_ActionToBeDone_ - **create**-_ActionToBeDone_) ([example](https://github.com/crowbar/crowbar-openstack/blob/1718ebdcd5071ceac796744a33faaaddbdf71d73/chef/cookbooks/barbican/recipes/common.rb#L44-L101))
To be more precise these markers make sure that certain section of a cookbook which should be **entered and finished by founder node** and other nodes should wait until then.
All non-founder nodes wait until they see _ActionToBeDone_ while the founder node keeps going. It is up to the founder to do **create-_ActionToBeDone_**, so wait and create always go in **pairs**. If the founder does not do create-_ActionToBeDone_, the non-founders will time out.
- [barbican_service.rb(webapp)](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-5da3f802a2cc89670c00fd0f53ff10b2)

- Wait and start together ([example](https://github.com/crowbar/crowbar-openstack/blob/1718ebdcd5071ceac796744a33faaaddbdf71d73/chef/cookbooks/glance/recipes/ha.rb#L46))
As the name says, let everyone reach here. No node can proceed unless all the other nodes have reached here.
Almost all of it is bookkeeping, and all of it is documented [here in
pacemaker_service_object](https://github.com/crowbar/crowbar-ha/blob/2adc8529c215c32772a09fcb8e9b84fb01b1a1dc/crowbar_framework/app/models/pacemaker_service_object.rb).

- [role_expand_elements](https://github.com/crowbar/crowbar-ha/blob/2adc8529c215c32772a09fcb8e9b84fb01b1a1dc/crowbar_framework/app/models/pacemaker_service_object.rb#L187)
Gets the elements (list of nodes, including clusters) for this role.

[ha.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-0605880109e91323b0a511b5506433ee)
- [Openstack::HA.set_controller_role](https://github.com/sjamgade/crowbar-openstack/blob/87947ecb4cdb56a6ecf8cfc3fec1bde7c77762fb/chef/cookbooks/crowbar-openstack/libraries/ha_helpers.rb)
Sets the server_nodes as controllers so that, when any of the nodes becomes
cluster founder, it is capable of running
[apache2](https://github.com/crowbar/crowbar-ha/blob/2adc8529c215c32772a09fcb8e9b84fb01b1a1dc/chef/cookbooks/crowbar-pacemaker/recipes/apache.rb#L85)/[haproxy](https://github.com/crowbar/crowbar-ha/blob/2adc8529c215c32772a09fcb8e9b84fb01b1a1dc/chef/cookbooks/crowbar-pacemaker/recipes/haproxy.rb#L87).
The links point to the places in the recipes where the pacemaker
[location](https://github.com/sjamgade/crowbar-openstack/blob/87947ecb4cdb56a6ecf8cfc3fec1bde7c77762fb/chef/cookbooks/crowbar-openstack/definitions/openstack_pacemaker_controller_only_location_for.rb) [constraint](https://github.com/sjamgade/crowbar-openstack/blob/87947ecb4cdb56a6ecf8cfc3fec1bde7c77762fb/chef/cookbooks/crowbar-openstack/libraries/ha_helpers.rb)
has been set.

- [prepare_role_for_ha_with_proxy](https://github.com/crowbar/crowbar-ha/blob/2adc8529c215c32772a09fcb8e9b84fb01b1a1dc/crowbar_framework/app/models/pacemaker_service_object.rb#L187)

[role_barbican_controller.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-62c9844abda53dc328542cea19d7a1ef)

[barbican_service.rb](https://github.com/crowbar/crowbar-openstack/pull/512/files#diff-5da3f802a2cc89670c00fd0f53ff10b2)


## Help patch the existing barclamp
## Help patch the existing barclamp (???)
Things you need:
- An already tested barclamp waiting to become HA compliant.
- A set of nodes ready to become a cluster.
