Update API: Detect noop updates when using doc #6862

nik9000 · 2014-07-14T19:03:49Z

This allows you to request that Elasticsearch figure out if an update using a document is a noop and then skip it. This is useful for clients where it is difficult to figure out if the request is really a noop.

Also adds two tolerance measures that can be used to deem a request a noop even if it isn't quite. The idea being that it isn't that important to know that a page has 100 vs 101 links - so you may as well wait until the field is out of date by more then, say, 10%.

Close #6822

This should help prevent spurious updates that just cause extra writing and cache invalidation for no real reason. Close elastic#6822

Just numeric tolerances - percentages and absolute numbers. Other things are certainly possible from here.

nik9000 · 2014-07-14T19:04:43Z

I need to add docs and hit it over REST a few more times to make sure it works, but so far so good.

nik9000 · 2014-07-14T19:05:47Z

Might also be good to see what the performance cost is but I don't imagine its huge. Next to 0 if you disable it.

nik9000 · 2014-07-14T20:17:43Z

Tried a quick and dirty performance test

echo '{
    "doc": {"foo": 2},
    "doc_as_upsert": true,
    "detect_noop": {
        "foo": "10%"
    }
}' > /tmp/test2
ab -n 1000 -c 20 -p /tmp/test2 http://localhost:9200/test/test/1/_update

Request dropped from 11ms to 8ms. Cut but not a big deal - I'm more interested in creating less garbage that has to be merged out later.

jpountz · 2014-07-15T13:02:36Z

I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. Eg if you have one update that increments a counter by 10 it could be rejected while two updates that would increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with from client side.

nik9000 · 2014-07-15T13:09:02Z

I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. Eg if you have one update that increments a counter by 10 it could be rejected while two updates that would increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with from client side.

I could certainly document that adding tolerances to counters is unlikely to be a good idea. In my use case I don't have counters but instead recount on the fly. In that case the tolerances make this more interesting by allowing me to trade off a tiny bit of accuracy for performance.

jpountz · 2014-07-16T09:28:03Z

I'd still rather have it implemented on client side, where you have all the flexibility to define what you want to use as a distance between documents to know whether an update is worth it. I'm worried about making the API complicated for something that doesn't bring much value.

nik9000 · 2014-07-16T21:22:29Z

Is it still worth having the noop detection without the tolerances? I'm happy to roll them back. I can't use the noop detection without the tolerances but if its worth having its worth having.

I imagine I'll give scripted updates another shot in 1.3 when I can use groovy. They'll probably be much less brittle then MVEL was.

jpountz · 2014-07-16T21:56:27Z

Detecting no-ops without tolerances sounds ok to me, but I'm wondering how common it is to submit a no-op update.

I was thinking about making it an automatic optimization rather than an option, but it would sometimes not work as expected. For example, it is allowed to update mappings to add a new multi field. So even an update that would re-submit the same document could result in a different inverted index. So if we decide to add such detection, I think we would need to make it an opt-in to avoid surprises.

My current feeling is that it has a high cost (in terms of development/maintenance) compared to the potential speedup but if no-op updates prove to be common I could change my opinion. @clintongormley What do you think?

clintongormley · 2014-07-17T09:21:23Z

@jpountz i agree with you about not liking the tolerances, although what would need to be done to make this work in the application is to retrieve the document, check the tolerances, and then decide whether to reindex or not.

I do hear of people wanting noop updates - I don't think it is a corner case, so I'd be inclined to accept the PR without tolerances, but keeping noop as an option for the reasons you cite. Another reason to make it opt-in: the user updates a synonyms file.

This reverts commit eab5966.

nik9000 · 2014-07-17T13:41:29Z

Removed tolerances. I believe one side effect of the way the PR is implemented now is that updates that are literally empty objects ({}) are considered noops even with detection off. I imagine I can fix that pretty easily if its a problem.

jpountz · 2014-07-17T22:20:19Z

src/main/java/org/elasticsearch/common/xcontent/XContentHelper.java

+            Object old = source.get(changesEntry.getKey());
+            if (old instanceof Map && changesEntry.getValue() instanceof Map) {
+                // recursive merge maps
+                modified = update((Map<String, Object>) source.get(changesEntry.getKey()),


Should it be |= instead of =? Otherwise the last one would always win?

Yup - I'll fix it.

jpountz · 2014-07-17T22:39:24Z

Indeed I think that the behaviour when detect_noop is false should be the same as today, so to always perform an update and increase the version?

nik9000 · 2014-07-17T22:50:53Z

Indeed I think that the behaviour when detect_noop is false should be the same as today, so to always perform an update and increase the version?

Can do.

1. |= for recursive merge 2. Do not consider empty updates noop without detect_noops

nik9000 · 2014-07-18T15:59:50Z

Done with updates from this round of comments. Thanks for reviewing!

jpountz · 2014-07-22T07:19:38Z

src/main/java/org/elasticsearch/index/indexing/ShardIndexingService.java

@@ -218,6 +218,11 @@ public void postDeleteByQuery(Engine.DeleteByQuery deleteByQuery) {
        }
    }

+    public void noopUpdate(String type) {
+        totalStats.noopUpdates.inc();
+        typeStats(type).noopUpdates.dec();


I think you mean .inc ?

jpountz · 2014-07-22T12:57:32Z

Thanks Nik, I just merged this change. FYI I did a minor change while merging to replace dec with inc on the type stats in ShardIndexingService.

nik9000 · 2014-07-22T13:09:32Z

Thanks! I saw the comment and, yeah, it agree it was supposed to be inc.

Thanks again!

Nik

On Tue, Jul 22, 2014 at 8:57 AM, Adrien Grand notifications@github.com
wrote:

Thanks Nik, I just merged this change. FYI I did a minor change while
merging to replace dec with inc on the type stats in ShardIndexingService.

—
Reply to this email directly or view it on GitHub
#6862 (comment)
.

nik9000 added 3 commits July 14, 2014 12:43

Detect noop updates sent with doc_as_upsert

d74d0b8

This should help prevent spurious updates that just cause extra writing and cache invalidation for no real reason. Close elastic#6822

Add noop update count to the stats

b8d87d2

Add tolerance to noop detection

eab5966

Just numeric tolerances - percentages and absolute numbers. Other things are certainly possible from here.

nik9000 changed the title ~~Detect noop updates from doc_as_upsert~~ Detect noop updates when using doc in update API Jul 14, 2014

Docs

0ea5072

clintongormley added review labels Jul 15, 2014

nik9000 added 2 commits July 17, 2014 09:37

Revert "Add tolerance to noop detection"

c845eaa

This reverts commit eab5966.

Remove tolerance from noop detection docs

f3022ac

jpountz reviewed Jul 17, 2014
View reviewed changes

Updates from code review

16ee1d1

1. |= for recursive merge 2. Do not consider empty updates noop without detect_noops

jpountz reviewed Jul 22, 2014
View reviewed changes

jpountz closed this Jul 22, 2014

jpountz removed the enhancement label Jul 22, 2014

jpountz removed the review label Jul 22, 2014

clintongormley changed the title ~~Detect noop updates when using doc in update API~~ Update API: Detect noop updates when using doc Sep 10, 2014

clintongormley added v1.4.0.Beta1 v2.0.0-beta1 >feature release highlight labels Sep 10, 2014

clintongormley added the :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label Jun 6, 2015

Update API: Detect noop updates when using doc #6862

Update API: Detect noop updates when using doc #6862

Uh oh!

Conversation

nik9000 commented Jul 14, 2014

Uh oh!

nik9000 commented Jul 14, 2014

Uh oh!

nik9000 commented Jul 14, 2014

Uh oh!

nik9000 commented Jul 14, 2014

Uh oh!

jpountz commented Jul 15, 2014

Uh oh!

nik9000 commented Jul 15, 2014

Uh oh!

jpountz commented Jul 16, 2014

Uh oh!

nik9000 commented Jul 16, 2014

Uh oh!

jpountz commented Jul 16, 2014

Uh oh!

clintongormley commented Jul 17, 2014

Uh oh!

nik9000 commented Jul 17, 2014

Uh oh!

jpountz Jul 17, 2014

Choose a reason for hiding this comment

Uh oh!

nik9000 Jul 17, 2014

Choose a reason for hiding this comment

Uh oh!

jpountz commented Jul 17, 2014

Uh oh!

nik9000 commented Jul 17, 2014

Uh oh!

nik9000 commented Jul 18, 2014

Uh oh!

jpountz Jul 22, 2014

Choose a reason for hiding this comment

Uh oh!

jpountz commented Jul 22, 2014

Uh oh!

nik9000 commented Jul 22, 2014

Uh oh!

Uh oh!