Gezim Sejdiu edited this page Jan 21, 2017 · 11 revisions
Website: https://www.elastic.co/
Supported versions: 2.3
Current responsible(s): Mohamed Nadjib Mami @ Uni.Bonn -- mami@cs.uni-bonn.de
Docker image(s): bde2020/elasticsearch:latest
More info: https://www.elastic.co/products/elasticsearch

Short description

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. Features include:

  • Distributed and Highly Available Search Engine.
    • Each index is fully sharded with a configurable number of shards.
    • Each shard can have one or more replicas.
    • Read / Search operations performed on any of the replica shards.
  • Multi Tenant with Multi Types.
    • Support for more than one index.
    • Support for more than one type per index.
    • Index level configuration (number of shards, index storage, ...).
  • Various set of APIs
    • HTTP RESTful API
    • Native Java API.
    • All APIs perform automatic node operation rerouting.
  • Document oriented
    • No need for upfront schema definition.
    • Schema can be defined per type for customization of the indexing process.
  • Reliable, Asynchronous Write Behind for long term persistency.
  • (Near) Real Time Search.
  • Built on top of Lucene
    • Each shard is a fully functional Lucene index
    • All the power of Lucene easily exposed through simple configuration / plugins.
  • Per operation consistency
    • Single document level operations are atomic, consistent, isolated and durable.
  • Open Source under the Apache License, version 2 ("ALv2")

(From: https://github.com/elastic/elasticsearch)

Example usage

Using Docker Compose

Add the following services to your docker-compose.yml to integrate an Elasticsearch instance in your BDE pipeline:

elasticsearch:
  image: elasticsearch:2.3
  command: elasticsearch -Des.network.host=0.0.0.0
  ports:
    - "9200:9200"
    - "9300:9300"
elasticsearch-mapping-init:
  environment:
    - file_url=https://raw.githubusercontent.com/big-data-europe/pilot-sc4-flink-kafka-consumer/master/elasticsearch_fcd_mapping.json
    - index_name=thessaloniki
    - mappings_name=floating-cars
  build:
    context: .
  links:
    - elasticsearch

In addition to the Elasticsearch version (2.3 in the example above), set the values of the following variables (see environment above):

  • file_url: the link to the JSON file containing the mappings definition. The file must currently be available online, so the value should look like http(s)://example.com/path/to/file.json.
  • index_name: the name of the index to create
  • mappings_name: the name under which the mappings are registered in that index
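
The three variables above can be sanity-checked before starting the containers. The following is a minimal sketch, not part of the BDE image: the function name check_init_env is hypothetical, and the lowercase check reflects the Elasticsearch requirement that index names be lowercase.

```python
from urllib.parse import urlparse


def check_init_env(env):
    """Return a list of problems found in the mapping-init environment values.

    `env` is a plain dict holding file_url, index_name and mappings_name.
    This helper is a hypothetical illustration, not part of the image.
    """
    problems = []

    # file_url has to point at an online file, so require http(s) and a host.
    parsed = urlparse(env.get("file_url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("file_url must be an http(s) URL")

    # Elasticsearch rejects index names that contain uppercase letters.
    index_name = env.get("index_name", "")
    if not index_name or index_name != index_name.lower():
        problems.append("index_name must be non-empty and lowercase")

    if not env.get("mappings_name"):
        problems.append("mappings_name must be non-empty")

    return problems
```

An empty list means the values look plausible; it does not guarantee that the file exists or that its mappings are valid.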

Running the image

Run the following command (Docker and Docker Compose must already be installed):

sudo docker-compose up -d

To verify that your installation is working, send an HTTP GET request to:

http://localhost:9200

If it returns a JSON object that starts with something like:

  {  
    "name" : "Allison Blaire",
    ...

... then your Elasticsearch instance is up and running.
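
The same check can be scripted. The sketch below uses only the Python standard library and assumes the instance is reachable at http://localhost:9200 as in the compose file above; the function names are hypothetical.

```python
import json
import urllib.request


def parse_root_response(body):
    """Return the node name from the root endpoint's JSON body, or None."""
    try:
        doc = json.loads(body)
    except ValueError:
        return None
    if not isinstance(doc, dict):
        return None
    return doc.get("name")


def elasticsearch_is_up(base_url="http://localhost:9200", timeout=5):
    """Fetch the root endpoint and report whether a named node answered."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
    return parse_root_response(body) is not None


# Example (requires a running instance):
#   elasticsearch_is_up()
```

Right after `docker-compose up -d` the node may still be starting, so a caller may want to retry for a few seconds before treating a False result as a failure.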

Next, to check if your mappings have been successfully received and validated (syntactically), submit the following HTTP request:

http://localhost:9200/{index_name}/_mapping/{mappings_name}

... replacing {index_name} and {mappings_name} with the values set previously in the docker-compose.yml file.

If it returns a JSON object that starts with something like:

{"{index_name}":{"mappings":{"{mappings_name}":{"...

... then you are all set.
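
The mapping check above can also be sketched in Python. The helper names are hypothetical, and the expected response shape is taken from the example output above; the body would be fetched from the URL that mapping_url builds.

```python
import json
from urllib.parse import quote


def mapping_url(index_name, mappings_name, base_url="http://localhost:9200"):
    """Build the _mapping URL for the given index and mappings name."""
    return "{}/{}/_mapping/{}".format(
        base_url.rstrip("/"),
        quote(index_name, safe=""),
        quote(mappings_name, safe=""),
    )


def mapping_is_registered(body, index_name, mappings_name):
    """Check that the response body lists mappings_name under index_name."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    index_entry = doc.get(index_name)
    if not isinstance(index_entry, dict):
        return False
    mappings = index_entry.get("mappings")
    return isinstance(mappings, dict) and mappings_name in mappings
```

With the values from the compose file, mapping_url("thessaloniki", "floating-cars") gives the URL to request.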

Scaling

Elasticsearch is built to scale. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 5 shards and 1 replica per shard (5/1). Many topologies are possible, including 1/10 (improves search performance) or 20/1 (improves indexing performance, with searches executed in a map-reduce fashion across shards).
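
To make the shards/replicas notation concrete, the sketch below builds the settings body that could be sent when creating an index (number_of_shards and number_of_replicas are the standard index settings) and counts the shard copies a topology produces. The function names are hypothetical.

```python
def index_settings(shards, replicas):
    """Build a create-index request body for a shards/replicas topology."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the number of replicas cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(shards, replicas):
    """Count primaries plus replicas: each primary has `replicas` copies."""
    return shards * (1 + replicas)


# The default 5/1 topology yields 10 shard copies across the cluster.
default_body = index_settings(5, 1)
default_copies = total_shard_copies(5, 1)
```

The body would be sent with a PUT request to http://localhost:9200/{index_name} when the index is created. The number of primary shards is fixed at creation time, while the number of replicas can be changed later.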
