Es-http-client is a Java client library for interacting with Elasticsearch.

This project is dead. I never followed through on finishing the work for this because other priorities interfered. By now, Elasticsearch has its own HTTP client. You may be interested in another project I'm working on that adds some Kotlin syntactic sugar around that client: es-kotlin-wrapper-client.

Maven

You can add the dependency by copying this bit into your pom file, or the equivalent into your Gradle build file.

<dependency>
  <groupId>io.inbot</groupId>
  <artifactId>inbot-es-http-client</artifactId>
  <version>0.13</version>
</dependency>

I've marked several of the dependencies as optional to avoid locking you into specific versions as much as possible. In general, go with the versions we use, or anything newer and compatible. If something doesn't work with a newer version than we use, that is a bug; please report it.

Introduction

Presentation on inbot-es-http-client at the Search Berlin meetup on January 26th 2016.

This project represents around three years of work I did using Elasticsearch via its HTTP API. The problem with both the Elasticsearch HTTP and Java APIs is that they require quite a bit of boilerplate to implement common things such as performing CRUD operations on documents, iterating over documents in Elasticsearch, performing updates with consistency checks, and bulk indexing or updating documents. Elasticsearch gives you all the tools you need to perform these tasks, but using them typically involves making HTTP requests, dealing with the various status codes you get back, and picking apart JSON responses to figure out what actually happened and what needs to happen next.

Three years ago, when I started using Elasticsearch, there were a handful of client libraries for platforms such as PHP or Ruby, with varying levels of maturity. The more successful ones tended to closely follow the internal Java APIs that Elasticsearch exposes via HTTP. Most Java projects instead simply embedded an Elasticsearch node in their applications. The problem with all of these approaches is that they do not significantly reduce the amount of boilerplate you need to write relative to just using the HTTP API.

So I started adapting code I already had for another storage engine to talk to Elasticsearch instead. In my mind the problem was mostly similar: I wanted to do JSON document CRUD. I already had a library for manipulating JSON documents, jsonj, and writing a bit of code around Apache HttpClient to talk to Elasticsearch was easy.

I soon ended up with a library that was easy to use and did the job. That was three years ago. In the years since, I've continuously extended the codebase with new features as I needed them. Recently I decided to finally do the work to open source this code base and document how it works so others can benefit as well.

Alternatives

When I started out using Elasticsearch there weren't that many HTTP clients for it, but these days there are a few nice alternatives to this client.

  • Jest. This appears to be the most popular.
  • Elasticsearch Helper. Jörg is a long-time Elasticsearch user and contributor to the community and recently launched his own client.
  • Elasticsearch 5.0 will apparently come with a minimalistic HTTP client as well.

The DRY principle by example

The guiding principle for this project is the DRY principle (Don't Repeat Yourself). I've been using this library a lot in the past three years, and every time I felt I was repeating myself, I went in and fixed whatever it was that was causing me to copy-paste bits and pieces of boilerplate code. In my mind, most interactions with Elasticsearch should be one-liners, not complicated bits of if-else logic surrounded by try-catch statements. Whenever that is not the case, I fix it.

Java has a well-deserved but unnecessary reputation for boilerplate. It's unnecessary because you can actually do quite a lot with Java these days. While the language is somewhat tedious in its verbosity around e.g. typing or exception handling, you can do a lot to take that pain away, and recent additions to the language such as lambdas, varargs, interfaces with default methods, try-with-resources, and generics are severely underused in a lot of common projects.
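
As a quick illustration (plain Java 8, not part of this library), a lambda and the Stream API replace the loop, if statement, and counter variable that older Java would have needed for a simple filter-and-count:

import java.util.Arrays;
import java.util.List;

List<String> messages = Arrays.asList("hello world", "goodbye", "hello es");
// one expression instead of a for loop, an if statement, and a counter variable
long hellos = messages.stream().filter(m -> m.startsWith("hello")).count();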

Here's an example of how you can do basic CRUD on an index with this library.

// typically you'd inject this stuff via a constructor
HttpEsAPIClient httpEsAPIClient = ...;
CrudOperationsFactory crudOperationsFactory = ...;

ElasticSearchIndex index = ElasticSearchIndex.create("myindex", 1, "mapping-v1.json");
ElasticSearchType type = ElasticSearchType.create(index, "test");

// creates an index and sets up an alias
httpEsAPIClient.migrateIndex(index);

CrudOperations myDao = crudOperationsFactory.builder(type)
    .retryUpdates(2)
    .enableRedisCache(false, 10, "test")
    .enableInMemoryCache(10000, 10)
    .dao();

// now let's do some CRUD
JsonObject object = object(field("message", "hello world"));

JsonObject created = myDao.create(object);
String id = created.getString("id");
JsonObject updated = myDao.update(id, o -> {
    o.put("message", "Hello World!");
    return o;
});
JsonObject retrieved = myDao.get(id);
myDao.delete(id);

Consider this bit of code, which accomplishes the complex task of updating a potentially huge number of documents in an existing index, where some of the documents may be modified by others while you are updating them. It's a common use case and a good example of where Elasticsearch provides APIs that require a lot of boilerplate. If you do this wrong, you lose data.

// this is how you quickly update everything in your index without overwriting anything important
try (BulkIndexingOperations bulkIndexer = dao.bulkIndexer()) {
    for (JsonObject hit : dao.iterableSearch(query(matchAll()), 100, 10, true)) {
        // bulk update that falls back to a retrying update via the status handler
        bulkIndexer.update(hit.getString("_id"), hit.getString("_version"), null,
            hit.getObject("_source"), o -> {
                o.put("modified", true);
                return o;
            });
    }
}

The library takes care of a lot of things for you here. First, it encapsulates the interaction with documents in your index using a DAO. The DAO provides a lot of functionality related to document CRUD, searching, bulk indexing, etc. So we start by asking the DAO for our index for a bulk indexer. Bulk indexing is the best way to manipulate large amounts of documents in Elasticsearch: instead of manipulating documents one at a time, you provide a list of bulk operations which then get processed by Elasticsearch. The API for this is complex, and the BulkIndexer takes that pain away. The BulkIndexer class can be used with Java's try-with-resources construct, which ensures any remaining bulk actions are flushed to Elasticsearch when we are done. So the first one-liner sets all of that up for you, and after that you just call index(..), update(..), and delete(..).
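
As a minimal sketch of what those calls look like: only update(..) appears verbatim in the example above, so the exact index(..) and delete(..) signatures shown here are assumptions, not verbatim API.

try (BulkIndexingOperations bulkIndexer = dao.bulkIndexer()) {
    bulkIndexer.index(object(field("message", "hello bulk"))); // assumed signature
    bulkIndexer.delete("some-document-id");                    // assumed signature
} // closing the try block flushes any remaining bulk actions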

The next line iterates over all the documents in the index. We do that by simply looping over an iterableSearch with a query that matches every document in the index. This returns an Iterable<JsonObject> wrapped up as a SearchResponse that we can iterate over with a for loop. This abstracts away the tedious job of setting up a scrolling search with Elasticsearch and then requesting page after page of results using a scroll id.

Finally, for each search hit, we take the source document and apply a lambda function to update it using the bulk indexer. We use a lambda function here because the update may fail due to concurrent writes to our document, in which case Elasticsearch will tell the bulk indexer that our version of the document is obsolete. The bulk indexer then automatically fetches the latest version of the document and re-applies the lambda function. This is done via a default callback, pre-configured on the bulk indexer, that also takes care of logging and keeping track of successful operations.

Minus the comments, that's three lines of code. It doesn't get much simpler than that in Java. At Inbot, most data migrations are similarly 'complex': fetch stuff and modify stuff. There is not, and should not be, more to it. I've left out the initialization of the DAO since you typically only do that once per application. See this slide on how to set that up.

As outlined in the slides, we also have a concurrent bulk indexer that can use multiple threads. Likewise, the APIs are compatible with the Stream API facilities in Java 8, so you can do parallel processing of search results as well as use e.g. map, reduce, and filter lambda functions.
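
For example, here's a minimal sketch of feeding search results into a parallel stream. It assumes iterableSearch returns an Iterable<JsonObject> as described above; the "modified" field is just an illustrative example.

import java.util.stream.StreamSupport;

// turn the Iterable<JsonObject> into a parallel stream (the second argument
// requests parallelism) and count the hits that carry a "modified" field
long modifiedCount = StreamSupport
    .stream(dao.iterableSearch(query(matchAll()), 100, 10, true).spliterator(), true)
    .filter(hit -> hit.getObject("_source").containsKey("modified"))
    .count();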

Features

  • Modern Java API that makes the most of new Java language features to reduce the amount of boilerplate
  • JsonJRestClient HTTP REST client (with get/put/post/delete support). Takes and returns jsonj's JsonObject instances. This client is mainly used as our 'lowest' level of abstraction.
  • EndpointProvider provides a configurable lookup mechanism to control which of your cluster's nodes gets used from your client. A simple round-robin implementation with tunable endpoint verification is provided, but you could easily plug in e.g. Amazon EC2 based lookups, zookeeper lookups, or whatever service discovery mechanism you use.
  • Rich EsApiClient that uses the JsonJRestClient. Loads of convenient methods to deal with document CRUD, searches, etc.
  • CrudOperations and ParentChildCrudOperations provide convenient DAO abstractions for index and type combinations with or without parent-child relations. The API for this makes document CRUD a breeze and facilitates searches and bulk indexing as well.
  • Redis and Guava caching DAO wrappers that you can configure in the CrudOperationsBuilder.
  • Easy document updates with automatic version checking and conflict handling that uses lambda update functions. Works for bulk updates as well.
  • BulkIndexer with support for inserts/deletes/updates, parallelism, error handling, a call back API, and version conflict handling.
  • Support for paging and scrolling searches that plugs right into Java's new Streaming APIs.
  • Simple migration helper that you can use to manage schema changes. Note: this currently should not be considered safe in any environment where documents get updated during the migration.
  • JsonJ based QueryBuilder that provides a minimal DSL for constructing queries. While not complete in terms of API coverage, it is trivially extended to support whatever complex JSON structures you need; see the sketch after this list.
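
For instance, since queries are just JsonObject instances, a clause the DSL doesn't cover yet can be built directly with jsonj's object/field builders shown earlier in this README. The match query below is an illustrative example, not a method of the QueryBuilder DSL:

// a hand-rolled match query using the same jsonj object/field builders as above
JsonObject matchQuery = object(field("query",
    object(field("match",
        object(field("message", "hello world"))))));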

For more details, look at our tests and code for an idea of how to use it.

State of development

  • We have been using this at Inbot for several years; it serves our needs and performs & scales well with indices that have millions of documents. It's stable and does what I need it to do in a way that allows me to focus on business logic.
  • We know it works because our internal integration tests use this a lot, but test coverage inside this recently open-sourced project is obviously less than ideal. This needs to be remedied; we welcome pull requests for this.
  • It does not currently support the full ES API, but by design this is easily addressed, typically with only a few lines of code. I tend to add features on a need-to-have basis. We welcome pull requests for this.
  • This is a pre-1.0 API; we reserve the right to change it.

License

See LICENSE.

The license is the MIT license, a.k.a. the expat license. The rationale for choosing this license is that I want to maximize your freedom to do whatever you want with the code while limiting my liability for the damage this code could potentially do in your hands. I do appreciate attribution but not enough to require it in the license (beyond the obligatory copyright notice).

Contributing

We welcome pull requests but encourage people to reach out before making major changes. In general, stick to the code conventions (i.e. the Maven formatter plugin should not create diffs), don't break compatibility (we use this stuff in production), add tests for new stuff, etc.

Changes

  • 0.13 API cleanup, fix bulk update to work reliably and add more tests for this.
  • 0.12 Fixed several API and internal issues with BulkIndexer.
  • 0.11 Refactoring and add Agg and AggBucket classes to support picking apart aggregation results
  • 0.10 cleanup, new functionality, refactoring, tests added.
  • 0.9 initial import of classes from inbot code base