Commit
relates #321
Showing 17 changed files with 458 additions and 44 deletions.
File renamed without changes.
File renamed without changes.
include::features.adoc[]

include::requirements.adoc[]

include::download.adoc[]
File renamed without changes.
This file was deleted.
[[ey-install]]
== Installation

{ey} binaries can be obtained either by downloading them from the http://elasticsearch.org[elasticsearch.org] site as a ZIP (containing the project jars, sources and documentation) or by using any http://maven.apache.org/[Maven]-compatible tool with the following dependency:

[source,xml]
----
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-yarn</artifactId>
  <version>2.1.0.Beta3</version>
</dependency>
----

The jar above contains {ey} and does not require any other dependencies at runtime; in other words it can be used as is.

[[ey-download-dev]]
=== Development Builds

Development (also known as nightly or snapshot) builds are published daily to the 'sonatype-oss' repository (see below). Make sure to use the snapshot versioning:

[source,xml]
----
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-yarn</artifactId>
  <version>2.1.0.BUILD-SNAPSHOT</version> <1>
</dependency>
----

<1> Notice the 'BUILD-SNAPSHOT' suffix indicating a development build

but also enable the dedicated snapshots repository:

[source,xml]
----
<repositories>
  <repository>
    <id>sonatype-oss</id>
    <url>http://oss.sonatype.org/content/repositories/snapshots</url> <1>
    <snapshots><enabled>true</enabled></snapshots> <2>
  </repository>
</repositories>
----

<1> add the snapshot repository
<2> enable the 'snapshots' capability on the repository, otherwise the snapshot builds will not be found by Maven
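
For reference, the two fragments above can be combined into a minimal, self-contained +pom.xml+ for consuming a snapshot build. The sketch below is illustrative only - the +com.example:es-yarn-demo+ project coordinates are placeholders, not part of {ey}:

[source,xml]
----
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- placeholder coordinates for the consuming project -->
  <groupId>com.example</groupId>
  <artifactId>es-yarn-demo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-yarn</artifactId>
      <version>2.1.0.BUILD-SNAPSHOT</version>
    </dependency>
  </dependencies>

  <repositories>
    <repository>
      <id>sonatype-oss</id>
      <url>http://oss.sonatype.org/content/repositories/snapshots</url>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>
</project>
----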
[[es-yarn]]
= Elasticsearch on YARN

[[es-yarn-intro]]
[partintro]
--
Distributed as part of {ehtm}, {ey} is a separate, stand-alone, self-contained CLI (command-line interface) that allows {es} to run (and thus be managed) within a http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html[YARN] environment.

In other words, {es} can use YARN to allocate its resources and be started and stopped, on said resources, through the YARN infrastructure.
--

include::requirements.adoc[]

include::download.adoc[]

include::setup.adoc[]

include::usage.adoc[]
[[yarn-requirements]]
== Requirements

Before using {ey}, please pay attention to the requirements below - ignoring them can lead to abnormal behavior, errors and, ultimately, a poor experience and data loss.

NOTE: Make sure to verify *all* nodes in a cluster when checking the version of a certain artifact.

[[ey-requirements-yarn]]
=== YARN

A YARN environment running on Hadoop 2.4 (or higher) is recommended. This can be easily checked by verifying the Hadoop version installed on the target nodes:

[source,bash]
----
$ hadoop version
Hadoop 2.4.1
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1604318
Compiled by jenkins on 2014-06-21T05:43Z
Compiled with protoc 2.5.0
From source with checksum bb7ac0a3c73dc131f4844b873c74b630
This command was run using /opt/share/hadoop/common/hadoop-common-2.4.1.jar
----
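
When checking many nodes, this verification can be scripted. The helper below is an illustrative sketch (it is not part of {ey}) that compares a Hadoop version string against the recommended 2.4 baseline:

```shell
# Illustrative helper (not part of {ey}): verify a Hadoop version is 2.4+.
check_hadoop_version() {
  local version="$1"            # e.g. "2.4.1"
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"
  local minor="${rest%%.*}"     # text between the first and second dots
  if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 4 ]; }; then
    echo "Hadoop $version is compatible"
  else
    echo "Hadoop $version is too old; 2.4 or higher is recommended" >&2
    return 1
  fi
}

# The version itself can be taken from the first line of `hadoop version`:
# version=$(hadoop version | awk 'NR==1 {print $2}')
check_hadoop_version "2.4.1"
```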

For Hadoop distros, check the base core YARN/Hadoop version and make sure it is 2.4 compatible.

As a guide, the table below lists the Hadoop-based distributions that include YARN, against which this version has been tested at various points in time:

|===
| Distribution | Release

| Apache Hadoop | 2.5.x
| Apache Hadoop | 2.4.x

| Amazon EMR | 3.3.x
| Amazon EMR | 3.2.x
| Amazon EMR | 3.1.x

| Cloudera CDH | 5.2.x
| Cloudera CDH | 5.1.x
| Cloudera CDH | 5.0.x

| Hortonworks HDP | 2.2.x
| Hortonworks HDP | 2.1.x

| MapR | 4.0.x
|===

[[ey-requirements-es]]
=== {es}

{ey} has the same requirements on {es} as {eh} - in other words, using the latest stable {es} is highly recommended for both stability and performance reasons.
[[ey-setup]]
== Understanding the YARN environment

[quote, Wikipedia]
____
http://hadoop.apache.org/[YARN] stands for "Yet Another Resource Negotiator" and was added later as part of Hadoop 2.0. YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines. This also streamlines MapReduce to do what it does best, process data. With YARN, you can now run multiple applications in Hadoop, all sharing a common resource management. As of September, 2014, YARN manages only CPU (number of cores) and memory [..]
____

In its current incarnation, {ey} interacts with the YARN APIs in order to start and stop Elasticsearch nodes on YARN infrastructure. In YARN terminology, {ey} has several components:

Client Application::
The entity that bootstraps the entire process and controls the life-cycle of the cluster based on user feedback. This is the CLI (Command-Line Interface) that the user interacts with.

Application Manager::
Based on the user configuration, the dedicated +ApplicationManager+ negotiates with YARN the number of {es} nodes to be created as YARN containers and their capabilities (memory and CPU). It oversees the cluster life-cycle and handles the container allocation.

Node/Container setup::
Handles the actual {es} invocation for each allocated +Container+.

As YARN is all about cluster resource management, it is important to configure YARN and {es} properly, since over- or under-allocating resources can lead to undesirable consequences. There are plenty of resources available on how to configure and plan your YARN cluster; the sections below touch on the core components and their impact on {es}.

=== CPU

As of Hadoop 2.4, YARN can restrict the amount of CPU allocated to a container: each has a number of so-called +vcores+ (virtual cores), with a minimum of 1. What this translates to in practice depends highly on your underlying hardware and cluster configuration; a good approximation is to map each +vcore+ to an actual core on the CPU. Just as with native hardware, expect the cores to be shared with the rest of the applications, so depending on system load the amount of actual CPU available can be considerably lower. It is thus recommended to allocate multiple +vcores+ to {es} - a good starting number is the number of actual cores your CPU supports.
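
For reference, the +vcore+ budget is controlled through YARN's standard scheduler and node-manager properties in +yarn-site.xml+; the sketch below uses illustrative values for an 8-core node:

[source,xml]
----
<!-- yarn-site.xml - illustrative values for an 8-core node -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value> <!-- vcores this node offers to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>8</value> <!-- cap for a single container request -->
</property>
----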

=== Memory

Simplifying things a lot, YARN requires containers to specify the amount of memory they need within a certain band - specifying more or less memory results in the container allocation request being denied. By default, YARN enforces a minimum limit of 1 GB (1024 MB) and a maximum of 8 GB (8192 MB). While {es} can work with this amount of RAM, you typically want to increase it for performance reasons. Out of the box, {ey} requests _only_ 2 GB of memory per container so that users can easily try it out even within a testing YARN environment (such as pseudo-distributed setups or VMs); significantly increase this amount once you get beyond the YARN `Hello World' stage.
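
The band mentioned above is configured through YARN's scheduler properties in +yarn-site.xml+; the values below are illustrative (here raising the default 8 GB ceiling to 16 GB):

[source,xml]
----
<!-- yarn-site.xml - illustrative memory band for container requests -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container YARN will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>16384</value> <!-- largest container YARN will grant -->
</property>
----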

=== Storage

Each container inside YARN is responsible for saving its state and storage between container restarts. In general, there are two main strategies one can take:

Use a globally accessible file-system (like HDFS):: With storage accessible by all Hadoop nodes, each container can use it as its backing store. For example, one can use HDFS to save the data in one container and read it from another. With {es}, one can mount HDFS as an https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html[NFS gateway] and simply point each {es} node to a folder on it. Note however that performance is going to suffer significantly - HDFS is designed as big, network-based storage, thus each call is likely to have a significant delay (due to network latency).

Use the container node's local storage:: Each container can currently access its local storage - with proper configuration this can be kept outside the disposable container folder, thus allowing the data to _live_ between restarts. This is the recommended approach as it offers the best performance and, through {es} itself, redundancy as well (through replicas).
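
For the local-storage strategy, one way to keep the data outside the disposable container folder is to point {es} at a node-local directory through its standard +path.data+ setting; the directory below is purely illustrative:

[source,yaml]
----
# elasticsearch.yml - keep indices on a node-local path that
# survives container disposal (the directory is illustrative)
path.data: /opt/es/data
----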

Note that the approaches above require additional configuration of either {es} or your YARN cluster. There are plans to simplify this procedure in the future.

IMPORTANT: If no storage is configured, out of the box {es} will use its container storage, which means that when the container is disposed, so is its data. In other words, between restarts any existing data is _destroyed_.

=== Node affinity

Currently, {ey} does not provide any option for tying {es} nodes to specific YARN nodes; however, this will be addressed in the future. In practice this means that, unless YARN is specifically configured, there are no guarantees on the cluster topology between restarts, that is, on what machines the {es} nodes will run each time.