Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support bulk operations in the native client #24

Closed
michaelklishin opened this issue Mar 29, 2013 · 19 comments
Closed

Support bulk operations in the native client #24

michaelklishin opened this issue Mar 29, 2013 · 19 comments

Comments

@michaelklishin
Copy link
Member

We need to support bulk operations in the native client with the same Clojure API we
have in the REST one.

@michaelklishin
Copy link
Member Author

Desired API

We should try hard to support the exact API our REST client has. It has two key benefits:

  • Switching between two clients will be transparent
  • The Bulk API is notoriously hard to get right in languages such as Clojure, that use core data structures (collections, maps) instead of designated classes.

The REST API already has a decent way of representing bulk requests in JSON, which maps ideally to Clojure
data structures. Lets just use that.

Implementation Details

This is for @dsabanin, who's expressed interest in implementing this feature.

Every operation in the native client has a few functions beyond what is exposed in the public API. Namely,
clojurewerkz.elastisch.native includes functions that mirror org.elasticsearch.client.Client

Specifically for this feature, we need a function that will delegate to Client#bulk.

Then there are conversion functions to bulk request and from bulk response in clojurewerkz.elastisch.native.conversion. Finally, there is a function (or several) that will be our public API for bulk requests. It glues everything together.

Since BulkRequest in the ElasticSearch Java client is actually a collection, we'll have to implement conversion
from Bulk API maps to specific request classes. Largely this is what the functions in clojurewerkz.elastisch.native.conversion already do but they don't take just a single map as inputs. One
good way around this is to add a polymorphic function that takes a map and builds the corresponding request
action from it, using the rest of the conversion functions as needed. Sounds like a job for a multimethod.

The public API should be in clojurewerkz.elastisch.native.bulk, mirroring clojurewerkz.elastisch.rest.bulk.

Because conversion functions are annoying to develop, I highly recommend doing it all in the REPL first, then
adding a few tests for the public API.

ElasticSearch Version

Elastisch 1.1's native client targets ElasticSearch 0.90.0-beta1. Because binary format used by the client and
ES nodes changes over time, using mismatching client/server versions will result in obscure I/O exceptions
(typically saying that a request couldn't be read fully).

@dsabanin
Copy link

Thanks for a really comprehensive description of the task. I'm going to give it a try this weekend.

@dsabanin
Copy link

dsabanin commented May 5, 2013

It looks like I won't get a chance to work on it. Our initial requirements changed and we moved in another direction, ended up not using native bulk API. Sorry about wasting your time on the instruction, although I'm sure some future implementor will find them very useful.

@michaelklishin
Copy link
Member Author

That's ok. I'd be happy to hear what you ended up doing as an alternative (if it involves Elastisch at all).

@dsabanin
Copy link

dsabanin commented May 6, 2013

The problem I was trying to solve is initial indexing of a big amount of data from the DB. This was a one time job so I decided to go with a C program fetching data from DB and generating a limited subset of JSON and then importing it to ES with curl through bulk API.

We are still using Elastisch for real-time indexing -- it's a great piece of software. Too bad I couldn't contribute back.

@mitchelkuijpers
Copy link
Contributor

Just wanted to let you know I started working on this, and I am making some good progress :) Will try to have something ready by the end of this weekend.

@michaelklishin
Copy link
Member Author

@mitchelkuijpers sounds good, thank you.

@mitchelkuijpers
Copy link
Contributor

I have a first version working i currently give this structure back after a bulk operation:

{:took 8,
 :has-failures? false,
 :items
   [{:index "people",
     :type "person",
     :id "AUuxP_sQUECUznwbZ-36",
     :version 1,
     :op-type "create",
     :failed? false}
    {:index "people",
     :type "person",
     :id "AUuxP_sQUECUznwbZ-37",
     :version 1,
     :op-type "create",
     :failed? false}]}

I case of failure the failed? property will be true and a failure-message property will be added. I notice the rest api gives properties back like :_index and :_type. Do you think I should add those @michaelklishin ?

@michaelklishin
Copy link
Member Author

@mitchelkuijpers yes, as aliases for :index and:type`, of course. We do this in other places. Better be compatible than sorry :)

@mitchelkuijpers
Copy link
Contributor

@michaelklishin Added, and also :_index and :_version.

The only thing that I think we should not implement is: bulk-with-index and bulk-with-index-and-type. This is kind of hairy to implement and this would mean I cannot reuse the ->delete-request and the index->request functions from conversion.clj. What are your thoughts on this?

Another idea could be to reuse the bulk-index and the bulk-delete functions from rest/bulk and just parse that structure when users call bulk or the other two functions.

@mitchelkuijpers
Copy link
Contributor

After thinking about it some more I am leaning towards the option of reusing the bulk-index and bulk-delete functions from rest/bulk this way you can just switch the native and rest bulk, bulk-with-index and bulk-with-index-and-type calls without changing anything :)

@michaelklishin
Copy link
Member Author

That's fine. I find those functions a bit odd in the REST API. 100% compatibility is not the goal. If we can keep it at 95%, that's already huge for our users.

@michaelklishin
Copy link
Member Author

@mitchelkuijpers so, native would delegate to REST? This should be made very clear in the docs because the two use different ports and some installations may be locked down to only one protocol at the firewall.

@mitchelkuijpers
Copy link
Contributor

@michaelklishin No Native would only re-use these functions from rest:

(def ^:private special-operation-keys
  [:_index :_type :_id :_retry_on_conflict :_routing :_percolate :_parent :_timestamp :_ttl])

(defn index-operation
  [doc]
  {"index" (select-keys doc special-operation-keys)})

(defn delete-operation
  [doc]
  {"delete" (select-keys doc special-operation-keys)})

(defn bulk-index
  "generates the content for a bulk insert operation"
  ([documents]
     (let [operations (map index-operation documents)
           documents  (map #(apply dissoc % special-operation-keys) documents)]
       (interleave operations documents))))

(defn bulk-delete
  "generates the content for a bulk delete operation"
  ([documents]
     (let [operations (map delete-operation documents)]
       operations)))

And then if you call bulk or the other two bulk functions (those are easy to implement this way) this will be transformed to either and IndexRequest or an DeleteRequest and we can simply call the native api :)

@michaelklishin
Copy link
Member Author

@mitchelkuijpers ah, perfect. Feel free to extract them into a separate internal namespace, e.g. elastisch.common.bulk.

@mitchelkuijpers
Copy link
Contributor

@michaelklishin 👍

@michaelklishin
Copy link
Member Author

@mitchelkuijpers any other way I can help you? I'd be very excited to see this feature added :)

@mitchelkuijpers
Copy link
Contributor

I'll try to have a pull request ready tomorrow, had a busy week ^^

michaelklishin added a commit that referenced this issue Mar 2, 2015
A first stab at Support bulk operations in the native client #24
@michaelklishin
Copy link
Member Author

Contributed by @mitchelkuijpers in #144.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants