layout | title | nickname | group | priority |
---|---|---|---|---|
global | Tiered Locality | Tiered Locality | Features | 1 |
- Table of Contents {:toc}
In Alluxio, each entity (masters, workers, clients) has a Tiered Identity. A tiered
identity is an address tuple such as `(node=..., rack=...)`. Each pair in the tuple is called
a locality tier. The locality tiers are ordered from most specific to least specific, so if
two entities have the same node name, they are assumed to also have the same rack name. Alluxio
uses tiered identities to optimize for locality. For example, when the client wants to read a
file from the UFS, it will prefer to read through an Alluxio worker on the same node.
If there is no local worker in the first tier (node), the next tier (rack) will be checked
to prefer rack-local data transfer. If no workers share a node or rack with the client, an
arbitrary worker will be chosen.
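
The tier-by-tier preference described above can be sketched in a few lines of shell. This is only an illustration of the matching order, not Alluxio's implementation. `tier_value` and `pick_worker` are hypothetical helpers, identities are written as comma-separated `tierName=value` pairs, and the fallback simply takes the first worker listed where Alluxio would choose an arbitrary one.

```shell
# Print the value of tier $2 in identity string $1 (e.g. "node=a,rack=r1").
tier_value() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p" | head -n 1
}

# pick_worker CLIENT_IDENTITY WORKER_IDENTITY...
# Prints the first worker that matches the client at the most specific tier.
pick_worker() {
  client=$1
  shift
  for tier in node rack; do            # most specific tier first
    want=$(tier_value "$client" "$tier")
    [ -n "$want" ] || continue         # unset tiers are not used for matching
    for worker in "$@"; do
      if [ "$(tier_value "$worker" "$tier")" = "$want" ]; then
        printf '%s\n' "$worker"
        return 0
      fi
    done
  done
  # No tier matched: any worker is acceptable; this sketch takes the first one.
  [ "$#" -gt 0 ] && printf '%s\n' "$1"
}
```

For a client at `node=a,rack=r1`, a worker on node `a` wins over one that only shares rack `r1`, and a rack-local worker wins over one that shares neither tier.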
If no tiered identity information is provided, each entity performs a localhost lookup to set its node-level identity. Locality tiers that are left unset are not used to inform locality decisions. To set the value for a locality tier, set the configuration property
alluxio.locality.[tiername]=...
See the Configuration-Settings page for details on how to set configuration properties.
It is also possible to configure tiered identity info via a script. By default, Alluxio looks
for a script at `${ALLUXIO_HOME}/conf/tiered_identity.sh`. You can override this location
by setting
alluxio.locality.script=/path/to/script
The script should be executable and output a comma-separated list of `tierName=...`
pairs. Here is an example script for reference:
#!/bin/bash
# Print this entity's tiered identity as comma-separated tierName=value pairs.
echo "node=$(hostname),rack=/rack1"
If no script exists at the path given by `alluxio.locality.script`, the property will be
ignored. If the script returns a nonzero exit code or produces malformed output, Alluxio
will raise an error.
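
Because malformed output is an error, it can help to check a script's output before deploying it. The following is a minimal sketch of such a check. `check_identity_output` is a hypothetical helper, and its idea of "well-formed" (every comma-separated item has a non-empty name, an `=`, and a non-empty value) is an assumption, not Alluxio's actual parsing rule.

```shell
# check_identity_output OUTPUT
# Returns 0 if OUTPUT looks like "tier1=value1,tier2=value2", 1 otherwise.
check_identity_output() {
  [ -n "$1" ] || return 1
  old_ifs=$IFS
  IFS=,
  set -f                      # disable globbing while splitting on commas
  for pair in $1; do
    case $pair in
      ?*=?*) ;;               # non-empty name, "=", non-empty value
      *) IFS=$old_ifs; set +f; return 1 ;;
    esac
  done
  IFS=$old_ifs
  set +f
  return 0
}
```

One way to use it is to capture the output of the locality script in a variable and pass that variable to `check_identity_output`, treating a nonzero return as a signal to fix the script before starting Alluxio.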
There are multiple ways to configure node locality. This is the order of precedence, from highest priority to lowest priority:
- Set `alluxio.locality.node`.
- Set `node=...` in the output of the script configured by `alluxio.locality.script`.
- Set `alluxio.worker.hostname` on a worker, `alluxio.master.hostname` on a master, or `alluxio.user.hostname` on a client.
- If none of the above are configured, node locality is determined by localhost lookup.
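
The order of precedence above amounts to "first configured source wins". As a rough sketch only, with a hypothetical `resolve_node` helper whose four arguments stand in for the four sources (an empty argument means "not configured"):

```shell
# resolve_node LOCALITY_NODE SCRIPT_OUTPUT HOSTNAME_PROPERTY LOCALHOST_NAME
# Prints the node name from the highest-priority source that is set.
resolve_node() {
  # 1. alluxio.locality.node
  if [ -n "$1" ]; then printf '%s\n' "$1"; return 0; fi
  # 2. node=... in the locality script output
  from_script=$(printf '%s\n' "$2" | tr ',' '\n' | sed -n 's/^node=//p' | head -n 1)
  if [ -n "$from_script" ]; then printf '%s\n' "$from_script"; return 0; fi
  # 3. alluxio.worker.hostname, alluxio.master.hostname, or alluxio.user.hostname
  if [ -n "$3" ]; then printf '%s\n' "$3"; return 0; fi
  # 4. localhost lookup
  printf '%s\n' "$4"
}
```

For example, `resolve_node '' 'rack=/rack1' 'worker-7' 'localhost'` falls through the first two sources, because the script output sets no `node` tier, and prints `worker-7`.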
Tiered locality is used in the following cases:
- When choosing a worker to read through during UFS reads.
- When choosing a worker to read from when multiple Alluxio workers hold a block.
- When choosing a worker to write to when writing data to Alluxio, if using the `LocalFirstPolicy` or `LocalFirstAvoidEvictionPolicy`.
By default, Alluxio has two locality tiers: `node` and `rack`. Users may customize the
set of locality tiers to take advantage of more advanced topologies. To change the set
of tiers available, set `alluxio.locality.order`. The order should go from most specific
to least specific. For example, to add availability zone locality to a cluster, set
alluxio.locality.order=node,rack,availability_zone
Note that this configuration must be set for all entities, including Alluxio clients.
To set the availability zone for each entity, either set the
`alluxio.locality.availability_zone` property key, or use a locality script and include
`availability_zone=...` in the output.
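
A locality script for the three-tier order might look like the sketch below. The `identity` function, the `RACK` and `AVAILABILITY_ZONE` environment variables, and the default values `/rack1` and `zone-a` are placeholders invented for illustration. A real deployment would obtain the rack and zone from whatever source describes its topology.

```shell
#!/bin/bash
# Hypothetical tiered_identity.sh for
# alluxio.locality.order=node,rack,availability_zone

# identity NODE RACK ZONE
# Prints the three tiers as comma-separated tierName=value pairs,
# ordered from most specific to least specific.
identity() {
  printf 'node=%s,rack=%s,availability_zone=%s\n' "$1" "$2" "$3"
}

# RACK and AVAILABILITY_ZONE are assumed to be set by the deployment;
# the defaults here are placeholders.
identity "$(hostname)" "${RACK:-/rack1}" "${AVAILABILITY_ZONE:-zone-a}"
```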