Update API: Detect noop updates when using doc #6862
This should help prevent spurious updates that just cause extra writing and cache invalidation for no real reason. Close elastic#6822
Just numeric tolerances - percentages and absolute numbers. Other things are certainly possible from here.
I need to add docs and hit it over REST a few more times to make sure it works, but so far so good.
Might also be good to see what the performance cost is, but I don't imagine it's huge. Next to 0 if you disable it.
Tried a quick and dirty performance test:
echo '{
  "doc": {"foo": 2},
  "doc_as_upsert": true,
  "detect_noop": {
    "foo": "10%"
  }
}' > /tmp/test2
ab -n 1000 -c 20 -p /tmp/test2 http://localhost:9200/test/test/1/_update
Request dropped from 11ms to 8ms. Cut but not a big deal - I'm more interested in creating less garbage that has to be merged out later.
I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. E.g., if you have one update that increments a counter by 10 it could be rejected, while two updates that would each increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with on the client side.
I could certainly document that adding tolerances to counters is unlikely to be a good idea. In my use case I don't have counters but instead recount on the fly. In that case the tolerances make this more interesting by allowing me to trade off a tiny bit of accuracy for performance.
I'd still rather have it implemented on client side, where you have all the flexibility to define what you want to use as a distance between documents to know whether an update is worth it. I'm worried about making the API complicated for something that doesn't bring much value.
Is it still worth having the noop detection without the tolerances? I'm happy to roll them back. I can't use the noop detection without the tolerances, but if it's worth having, it's worth having. I imagine I'll give scripted updates another shot in 1.3 when I can use groovy. They'll probably be much less brittle than MVEL was.
Detecting no-ops without tolerances sounds ok to me, but I'm wondering how common it is to submit a no-op update. I was thinking about making it an automatic optimization rather than an option, but it would sometimes not work as expected. For example, it is allowed to update mappings to add a new multi field. So even an update that would re-submit the same document could result in a different inverted index. So if we decide to add such detection, I think we would need to make it an opt-in to avoid surprises. My current feeling is that it has a high cost (in terms of development/maintenance) compared to the potential speedup but if no-op updates prove to be common I could change my opinion. @clintongormley What do you think?
@jpountz I agree with you about not liking the tolerances, although what would need to be done to make this work in the application is to retrieve the document, check the tolerances, and then decide whether to reindex or not. I do hear of people wanting noop updates - I don't think it is a corner case, so I'd be inclined to accept the PR without tolerances, but keeping noop as an option for the reasons you cite. Another reason to make it opt-in: the user updates a synonyms file.
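For illustration only, the client-side alternative described above (retrieve the document, check the tolerance, then decide whether to reindex) could look roughly like the sketch below. The class `ToleranceCheck`, the method `worthUpdating`, and the field name `links` are hypothetical and not part of Elasticsearch or this PR; fetching the stored document and sending the update are left out, and the stored source is assumed to be available as a plain map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side helper; not part of Elasticsearch or this PR. */
final class ToleranceCheck {

    /**
     * Returns true when newValue differs from the stored value of the field
     * by more than the given fraction of the stored value (0.10 means 10%).
     * A missing or non-numeric stored value always counts as worth updating.
     */
    static boolean worthUpdating(Map<String, Object> stored, String field,
                                 double newValue, double tolerance) {
        Object old = stored.get(field);
        if (!(old instanceof Number)) {
            return true; // nothing comparable is stored yet, so write
        }
        double oldValue = ((Number) old).doubleValue();
        return Math.abs(newValue - oldValue) > Math.abs(oldValue) * tolerance;
    }

    public static void main(String[] args) {
        // Assumed shape of a document source already fetched by the client.
        Map<String, Object> stored = new HashMap<>();
        stored.put("links", 100);

        // 100 -> 101 is within 10%, so the client would skip the update.
        System.out.println(worthUpdating(stored, "links", 101, 0.10));
        // 100 -> 120 is outside 10%, so the client would send the update.
        System.out.println(worthUpdating(stored, "links", 120, 0.10));
    }
}
```

Keeping this check in the application leaves the choice of distance measure to the client, which is the flexibility argued for earlier in the thread, at the cost of an extra read before each write.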
Removed tolerances. I believe one side effect of the way the PR is implemented now is that updates that are literally empty objects ( |
Object old = source.get(changesEntry.getKey());
if (old instanceof Map && changesEntry.getValue() instanceof Map) {
    // recursive merge maps
    modified = update((Map<String, Object>) source.get(changesEntry.getKey()),
Should it be |= instead of =? Otherwise the last one would always win?
Yup - I'll fix it.
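To make the review point concrete, here is a minimal sketch of a recursive map merge that reports whether anything changed. It is not the code from this PR: the class name `NoopMergeSketch` and the use of `Objects.equals` for leaf comparison are assumptions, and keys explicitly mapped to null are not treated specially. The relevant line is the `|=`, which keeps an earlier `true` when a later nested map turns out to be unchanged.

```java
import java.util.Map;
import java.util.Objects;

/** Illustrative sketch only; not the implementation from this PR. */
final class NoopMergeSketch {

    /**
     * Merges changes into source in place and returns true if any value
     * in source actually changed.
     */
    @SuppressWarnings("unchecked")
    static boolean update(Map<String, Object> source, Map<String, Object> changes) {
        boolean modified = false;
        for (Map.Entry<String, Object> entry : changes.entrySet()) {
            Object old = source.get(entry.getKey());
            Object value = entry.getValue();
            if (old instanceof Map && value instanceof Map) {
                // |= accumulates the result; with plain = an unchanged nested
                // map visited later would reset modified to false.
                modified |= update((Map<String, Object>) old, (Map<String, Object>) value);
                continue;
            }
            source.put(entry.getKey(), value);
            // Simplification: a key that is absent and a key mapped to null
            // are not distinguished here.
            modified |= !Objects.equals(old, value);
        }
        return modified;
    }
}
```

With plain assignment, an update that changes one nested object and then re-submits an identical second nested object would be reported as a noop, and the change to the first object would be silently dropped.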
Indeed I think that the behaviour when |
Can do.
1. |= for recursive merge
2. Do not consider empty updates noop without detect_noops
Done with updates from this round of comments. Thanks for reviewing!
@@ -218,6 +218,11 @@ public void postDeleteByQuery(Engine.DeleteByQuery deleteByQuery) {
        }
    }

    public void noopUpdate(String type) {
        totalStats.noopUpdates.inc();
        typeStats(type).noopUpdates.dec();
I think you mean .inc ?
Thanks Nik, I just merged this change. FYI I did a minor change while merging to replace |
Thanks! I saw the comment and, yeah, I agree it was supposed to be inc. Thanks again! Nik On Tue, Jul 22, 2014 at 8:57 AM, Adrien Grand notifications@github.com
This allows you to request that Elasticsearch figure out if an update using a document is a noop and then skip it. This is useful for clients where it is difficult to figure out if the request is really a noop.
Also adds two tolerance measures that can be used to deem a request a noop even if it isn't quite. The idea being that it isn't that important to know that a page has 100 vs 101 links - so you may as well wait until the field is out of date by more than, say, 10%.
Close #6822