Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update API: Detect noop updates when using doc #6862

Closed
wants to merge 7 commits into from

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jul 14, 2014

This allows you to request that Elasticsearch figure out if an update using a document is a noop and then skip it. This is useful for clients where it is difficult to figure out if the request is really a noop.

Also adds two tolerance measures that can be used to deem a request a noop even if it isn't quite. The idea being that it isn't that important to know that a page has 100 vs 101 links - so you may as well wait until the field is out of date by more then, say, 10%.

Close #6822

This should help prevent spurious updates that just cause extra writing
and cache invalidation for no real reason.

Close elastic#6822
Just numeric tolerances - percentages and absolute numbers.  Other things
are certainly possible from here.
@nik9000
Copy link
Member Author

nik9000 commented Jul 14, 2014

I need to add docs and hit it over REST a few more times to make sure it works, but so far so good.

@nik9000
Copy link
Member Author

nik9000 commented Jul 14, 2014

Might also be good to see what the performance cost is but I don't imagine its huge. Next to 0 if you disable it.

@nik9000 nik9000 changed the title Detect noop updates from doc_as_upsert Detect noop updates when using doc in update API Jul 14, 2014
@nik9000
Copy link
Member Author

nik9000 commented Jul 14, 2014

Tried a quick and dirty performance test

echo '{
    "doc": {"foo": 2},
    "doc_as_upsert": true,
    "detect_noop": {
        "foo": "10%"
    }
}' > /tmp/test2
ab -n 1000 -c 20 -p /tmp/test2 http://localhost:9200/test/test/1/_update

Request dropped from 11ms to 8ms. Cut but not a big deal - I'm more interested in creating less garbage that has to be merged out later.

@jpountz
Copy link
Contributor

jpountz commented Jul 15, 2014

I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. Eg if you have one update that increments a counter by 10 it could be rejected while two updates that would increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with from client side.

@nik9000
Copy link
Member Author

nik9000 commented Jul 15, 2014

I understand the point of not scheduling future merges if they are not necessary, but I'm not very happy with the part about tolerances. Eg if you have one update that increments a counter by 10 it could be rejected while two updates that would increment the same counter by 5 would be accepted. I think that can be confusing and should rather be dealt with from client side.

I could certainly document that adding tolerances to counters is unlikely to be a good idea. In my use case I don't have counters but instead recount on the fly. In that case the tolerances make this more interesting by allowing me to trade off a tiny bit of accuracy for performance.

@jpountz
Copy link
Contributor

jpountz commented Jul 16, 2014

I'd still rather have it implemented on client side, where you have all the flexibility to define what you want to use as a distance between documents to know whether an update is worth it. I'm worried about making the API complicated for something that doesn't bring much value.

@nik9000
Copy link
Member Author

nik9000 commented Jul 16, 2014

Is it still worth having the noop detection without the tolerances? I'm happy to roll them back. I can't use the noop detection without the tolerances but if its worth having its worth having.

I imagine I'll give scripted updates another shot in 1.3 when I can use groovy. They'll probably be much less brittle then MVEL was.

@jpountz
Copy link
Contributor

jpountz commented Jul 16, 2014

Detecting no-ops without tolerances sounds ok to me, but I'm wondering how common it is to submit a no-op update.

I was thinking about making it an automatic optimization rather than an option, but it would sometimes not work as expected. For example, it is allowed to update mappings to add a new multi field. So even an update that would re-submit the same document could result in a different inverted index. So if we decide to add such detection, I think we would need to make it an opt-in to avoid surprises.

My current feeling is that it has a high cost (in terms of development/maintenance) compared to the potential speedup but if no-op updates prove to be common I could change my opinion. @clintongormley What do you think?

@clintongormley
Copy link

@jpountz i agree with you about not liking the tolerances, although what would need to be done to make this work in the application is to retrieve the document, check the tolerances, and then decide whether to reindex or not.

I do hear of people wanting noop updates - I don't think it is a corner case, so I'd be inclined to accept the PR without tolerances, but keeping noop as an option for the reasons you cite. Another reason to make it opt-in: the user updates a synonyms file.

@nik9000
Copy link
Member Author

nik9000 commented Jul 17, 2014

Removed tolerances. I believe one side effect of the way the PR is implemented now is that updates that are literally empty objects ({}) are considered noops even with detection off. I imagine I can fix that pretty easily if its a problem.

Object old = source.get(changesEntry.getKey());
if (old instanceof Map && changesEntry.getValue() instanceof Map) {
// recursive merge maps
modified = update((Map<String, Object>) source.get(changesEntry.getKey()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be |= instead of =? Otherwise the last one would always win?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - I'll fix it.

@jpountz
Copy link
Contributor

jpountz commented Jul 17, 2014

Indeed I think that the behaviour when detect_noop is false should be the same as today, so to always perform an update and increase the version?

@nik9000
Copy link
Member Author

nik9000 commented Jul 17, 2014

Indeed I think that the behaviour when detect_noop is false should be the same as today, so to always perform an update and increase the version?

Can do.

1.  |= for recursive merge
2.  Do not consider empty updates noop without detect_noops
@nik9000
Copy link
Member Author

nik9000 commented Jul 18, 2014

Done with updates from this round of comments. Thanks for reviewing!

@@ -218,6 +218,11 @@ public void postDeleteByQuery(Engine.DeleteByQuery deleteByQuery) {
}
}

public void noopUpdate(String type) {
totalStats.noopUpdates.inc();
typeStats(type).noopUpdates.dec();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean .inc ?

@jpountz
Copy link
Contributor

jpountz commented Jul 22, 2014

Thanks Nik, I just merged this change. FYI I did a minor change while merging to replace dec with inc on the type stats in ShardIndexingService.

@jpountz jpountz closed this Jul 22, 2014
@jpountz jpountz removed the review label Jul 22, 2014
@nik9000
Copy link
Member Author

nik9000 commented Jul 22, 2014

Thanks! I saw the comment and, yeah, it agree it was supposed to be inc.

Thanks again!

Nik

On Tue, Jul 22, 2014 at 8:57 AM, Adrien Grand notifications@github.com
wrote:

Thanks Nik, I just merged this change. FYI I did a minor change while
merging to replace dec with inc on the type stats in ShardIndexingService.


Reply to this email directly or view it on GitHub
#6862 (comment)
.

@clintongormley clintongormley changed the title Detect noop updates when using doc in update API Update API: Detect noop updates when using doc Sep 10, 2014
@clintongormley clintongormley added the :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. label Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >feature release highlight v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update API: Detect noop updates
3 participants