HALoadBalancer

Status

This page documents the load balancer for the HA replication cluster, which is currently under development in the RDR branch. We anticipate GA for this feature in bigdata 1.3.1.

See the HAJournalServer for how to configure and deploy an HA replication cluster.

Background

The HA replication cluster uses a quorum model. Once a quorum (bare majority or better) has been established, one of the services is elected as the leader and the others become followers. All update requests must be directed to the quorum leader. Any service that is joined with the met quorum can service reads.

The HALoadBalancerServlet provides a transparent proxy. Clients can use any LBS end point on the cluster. Update requests are automatically proxied to the leader, and queries are load balanced across the joined services (leader + followers). This greatly simplifies deployments and reduces the coupling between the client and the HA architecture.

If the quorum is NOT met, then any request made against the load balancer will fail.
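The routing rule described above can be sketched as follows. This is a minimal illustration, not the servlet's actual code: the class name QuorumRouting, its method names, and the convention that the leader is the first entry in the list are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the routing rule; not part of bigdata. */
final class QuorumRouting {

    /** A quorum is met when a bare majority (or better) of the k services has joined. */
    static boolean isQuorumMet(int joinedServices, int replicationFactor) {
        return joinedServices >= replicationFactor / 2 + 1;
    }

    /**
     * Returns the services that may handle a request.
     * Assumption of this sketch: the leader is the first entry in `joined`.
     */
    static List<String> candidates(List<String> joined, int replicationFactor, boolean isUpdate) {
        if (!isQuorumMet(joined.size(), replicationFactor)) {
            // Mirrors the documented behavior: requests fail when the quorum is not met.
            throw new IllegalStateException("Quorum is not met");
        }
        // Updates go only to the leader; reads may go to any joined service.
        return isUpdate ? joined.subList(0, 1) : joined;
    }

    public static void main(String[] args) {
        List<String> joined = Arrays.asList("A", "B"); // "A" is the leader in this sketch
        System.out.println(candidates(joined, 3, true));
        System.out.println(candidates(joined, 3, false));
    }
}
```

With a replication factor of 3, two joined services are enough to meet the quorum, so both calls in main succeed. With a single joined service, either call would throw.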

Configuration

The HALoadBalancerServlet is declared in web.xml. By default, it responds at

/bigdata/LBS/*

and rewrites any request to

/bigdata

The behavior is configured in web.xml.

This sets the external URL at which the load balancer will accept requests. Any request directed to the pattern in this servlet mapping will be automatically proxied to an appropriate service.

<servlet-mapping>
  <servlet-name>Load Balancer</servlet-name>
  <url-pattern>/LBS/*</url-pattern>
</servlet-mapping>

The following init-param must be consistent with the servlet mapping. It specifies the prefix that will be stripped from the requestURL when the request is rewritten.

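<init-param>
  <param-name>prefix</param-name>
  <param-value>/bigdata/LBS</param-value>
  <description>The prefix that will be stripped from the request. This must correspond to the URL-pattern, without the trailing wildcard. The standard value is "/bigdata/LBS".</description>
</init-param>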
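To illustrate how the prefix might be used, the sketch below strips it from an incoming request URI and re-roots the remainder under the base URL of a target service. It is an assumption-laden example rather than the servlet's implementation: the class PrefixRewrite, the host name host2, the port 8080, and the /sparql path are hypothetical and do not come from this page.

```java
/** Hypothetical sketch of prefix stripping; not part of bigdata. */
final class PrefixRewrite {

    /**
     * Strips `prefix` from `requestUri` and appends the remainder to `targetBase`.
     * `targetBase` stands for the target service URL including its /bigdata path.
     */
    static String rewrite(String requestUri, String prefix, String targetBase) {
        if (!requestUri.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a load balancer request: " + requestUri);
        }
        String remainder = requestUri.substring(prefix.length());
        // Reject look-alike paths such as /bigdata/LBSfoo.
        if (!remainder.isEmpty() && !remainder.startsWith("/")) {
            throw new IllegalArgumentException("Not a load balancer request: " + requestUri);
        }
        return targetBase + remainder;
    }

    public static void main(String[] args) {
        // Hypothetical target service and request path.
        System.out.println(rewrite("/bigdata/LBS/sparql", "/bigdata/LBS", "http://host2:8080/bigdata"));
    }
}
```

The example also shows why the prefix has to agree with the servlet mapping: if the two differ, the startsWith check fails for every request that reaches the servlet.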

Service Discovery

The HALoadBalancerServlet automatically identifies the HAJournalServers in the HA replication cluster, along with their hostnames and configured HTTP ports. All such services must have the same ContextPath, which is typically /bigdata. No configuration is required for the load balancer to correctly form requests to reach the other services in the replication cluster.

Host Metrics

The HALoadBalancerServlet relies on the ganglia peer-to-peer metric reporting system to collect information about the actual load on the hosts in the cluster. It maintains a list of hosts and their associated load, and directs read requests to hosts based on that model. Custom load balancer policies may be declared using the policy init-param for the HALoadBalancerServlet.
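As a rough illustration of a load-based decision, the sketch below picks the host with the lowest reported load from a map of host metrics. It is not the policy that ships with the servlet, and the class LeastLoadedPolicy, the method chooseHost, and the single numeric load value per host are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical least-loaded selection; not one of bigdata's policies. */
final class LeastLoadedPolicy {

    /** Returns the host with the lowest reported load, or null if no hosts are known. */
    static String chooseHost(Map<String, Double> loadByHost) {
        String best = null;
        double bestLoad = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : loadByHost.entrySet()) {
            if (e.getValue() < bestLoad) {
                best = e.getKey();
                bestLoad = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical host names and load values.
        Map<String, Double> load = new LinkedHashMap<>();
        load.put("host1", 0.9);
        load.put("host2", 0.2);
        load.put("host3", 0.5);
        System.out.println(chooseHost(load));
    }
}
```

Always choosing the single least-loaded host can send a burst of requests to one machine before its metrics are refreshed, so a practical policy would probably spread reads across hosts in proportion to their spare capacity.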

In order to collect host metrics, both the PlatformStatsPlugIn and the GangliaPlugIn must be enabled. If you are running the ganglia gmond process on your hosts, then you SHOULD run the GangliaPlugIn in its listen-only mode.

com.bigdata.journal.jini.ha.HAJournal {

properties = new NV[] {

// ...

new NV(com.bigdata.journal.PlatformStatsPlugIn.Options.COLLECT_PLATFORM_STATISTICS,"true"),
new NV(com.bigdata.journal.GangliaPlugIn.Options.GANGLIA_LISTEN,"true"),
new NV(com.bigdata.journal.GangliaPlugIn.Options.GANGLIA_REPORT,"true"), // iff gmond is not running.
new NV(com.bigdata.journal.GangliaPlugIn.Options.REPORT_DELAY,"10000"),

}