This adds a new type of namenode: the observer. An observer is like a standby
NN (in fact they share most of the code), except that it participates in
neither NN failover (i.e., it is not part of the HA pair) nor checkpointing.
An observer is specified through configuration. First, it needs to be
added to the config dfs.ha.namenodes, just like a normal namenode,
together with other configs such as dfs.namenode.rpc-address,
dfs.namenode.http-address, etc. Second, it needs to be listed in a new
config: dfs.ha.observer.namenodes. This differentiates it from the ordinary
namenodes.
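As a rough illustration (the nameservice name mycluster and the namenode IDs
nn1, nn2, nn3 are made up for this example, and it is assumed here that
dfs.ha.observer.namenodes follows the same per-nameservice suffix convention
as dfs.ha.namenodes), the server-side configuration might look like:

```xml
<!-- hdfs-site.xml sketch; names are illustrative, not from the patch -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>
<property>
  <name>dfs.ha.observer.namenodes.mycluster</name>
  <value>nn3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>observer-host:8020</value>
</property>
```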
An observer can be used to serve read-only requests from HDFS clients
when the following two conditions are satisfied:
1. the config dfs.client.failover.proxy.provider.<nameservice> is
set to org.apache.hadoop.hdfs.server.namenode.ha.StaleReadProxyProvider;
2. the config dfs.client.enable.stale-read is set to true.
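Concretely, for the two conditions above, a client-side configuration might
contain (the nameservice name mycluster is illustrative):

```xml
<!-- client hdfs-site.xml sketch -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.StaleReadProxyProvider</value>
</property>
<property>
  <name>dfs.client.enable.stale-read</name>
  <value>true</value>
</property>
```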
This patch also changes how edit logs are loaded on the standby/observer NNs.
Instead of loading them all at once, the new implementation loads them
one batch at a time (default batch size is 10K edits) over multiple
iterations, waiting for a short period (default 100ms) between iterations.
This ensures the global lock is not held for too long while loading edits;
otherwise, RPC processing time would suffer.
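The batching scheme can be sketched as follows. This is a simplified,
hypothetical illustration, not the actual Hadoop classes: the class and
method names are invented, and a plain ReentrantReadWriteLock stands in for
the namespace lock.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch of batched edit loading: hold the global lock only for one
// batch at a time, sleeping between batches so queued RPCs can make progress.
public class BatchedEditLoader {
  static final int BATCH_SIZE = 10_000; // default batch size from the description
  static final long WAIT_MS = 100;      // default pause between batches

  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

  /** Applies edits in batches; returns the number of edits applied. */
  public int loadEdits(List<Long> edits) throws InterruptedException {
    int applied = 0;
    for (int start = 0; start < edits.size(); start += BATCH_SIZE) {
      int end = Math.min(start + BATCH_SIZE, edits.size());
      fsLock.writeLock().lock(); // global lock held per batch, not per load
      try {
        for (long txid : edits.subList(start, end)) {
          applyEdit(txid);
          applied++;
        }
      } finally {
        fsLock.writeLock().unlock();
      }
      if (end < edits.size()) {
        Thread.sleep(WAIT_MS); // let queued RPC handlers acquire the lock
      }
    }
    return applied;
  }

  private void applyEdit(long txid) {
    // placeholder for applying a single edit to the namespace
  }
}
```

The point of the design is that the pause bounds lock-hold time per batch
rather than per tailing cycle, trading slightly slower edit catch-up for
steadier RPC latency.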
This patch does not include a mechanism for clients to specify a bound on
staleness using the journal transaction ID: excluding this allows the
observer to be deployed more easily. More specifically, the deployment involves:
1. restarting all datanodes with the updated configs. No binary change on
datanodes is required.
2. bootstrapping and starting the observer namenode with the updated
configs. Existing namenodes do not need to change.
Future work (not included in this patch):
1. allow clients to set a bound on observer staleness in terms of time
(e.g., 2min). If for some reason the lag in edit tailing exceeds
the bound, the client-side proxy provider will fail over all the
RPCs to the active namenode.
2. use journal transaction ID to ensure bound on staleness. This can be
embedded in the RPC header.
3. allow new standby/observer to be deployed without datanode restart.