enhance smoosh to clean up search indexes when ddocs change #4718
Conversation
Force-pushed from 7dac5cf to 8ec19b1.
@@ -342,6 +345,8 @@ find_channel(#state{} = State, [Channel | Rest], Object) ->
            find_channel(State, Rest, Object)
    end.

stale_enough({?INDEX_CLEANUP, _}) ->
    true;
ddoc_updated events are rare, unlike database updates, so choosing not to action one would leave unreferenced indexes on disk for a long time, perhaps indefinitely.
We're also triggering these during every shard compaction attempt, proportionally to the number of shards (so we'd roughly trigger once for each clustered db). That should be a low enough rate, but it may be worth checking what it looks like on a busy cluster.
I need to add tests to this and intend to; I just wanted to show the work so far.
Force-pushed from 8ec19b1 to bb020f8.
Overall this looks like what we would want to happen: as soon as we update the ddoc we can clean up the old index data. With the ddoc trigger, one thing I was worried about is the case where we get a ddoc_updated on one node first but not on the others yet: we make a clustered call to get all the ddocs and get the signatures; what if that's stale, since it happens concurrently with the …
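For illustration only, a toy sketch of why a stale signature set would matter (made-up module name and signatures, not this PR's code): the cleanup deletes every on-disk signature that is absent from the active set, so an active set computed from a not-yet-replicated view of the ddocs could delete an index that is still referenced.

```erlang
-module(stale_sig_example).
-export([to_delete/2]).

%% OnDisk and Active are lists of index signature binaries; cleanup would
%% delete every signature found on disk but missing from the active set.
to_delete(OnDisk, Active) ->
    [Sig || Sig <- OnDisk, not lists:member(Sig, Active)].

%% stale_sig_example:to_delete([<<"old">>, <<"new">>], [<<"old">>]).
%%   => [<<"new">>]  stale ddoc view: a still-referenced index gets deleted
%% stale_sig_example:to_delete([<<"old">>, <<"new">>], [<<"old">>, <<"new">>]).
%%   => []           once the ddoc update is visible everywhere
```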
That's an excellent point; we must ensure that cannot happen, at least not with any higher probability than the existing cleanup code for mrview.
cleanup_clouseau_indices(Dbs, ActiveSigs) ->
    Fun = fun(Db) -> clouseau_rpc:cleanup(Db, ActiveSigs) end,
    lists:foreach(Fun, Dbs).

cleanup_nouveau_indices(Dbs, ActiveSigs) ->
    Fun = fun(Db) -> nouveau_api:delete_path(nouveau_util:index_name(Db), ActiveSigs) end,
    lists:foreach(Fun, Dbs).
These shouldn't crash if either of those is not enabled, right? If there is a chance they could, maybe we should wrap them in a try ... catch?
clouseau_rpc:cleanup is a gen_server:cast, so that's fine even if the target node doesn't exist. All the nouveau_api functions have a send_if_enabled check and return {error, nouveau_not_enabled} when nouveau is not enabled. In either case I ignore the function result, but perhaps I should check that it's one of the two expected results?
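For illustration, a hedged sketch of what checking for the two expected results could look like (a sketch only, not the PR code; the combined helper name and the use of couch_log:warning/2 for the unexpected case are my assumptions):

```erlang
cleanup_search_indices(Dbs, ActiveSigs) ->
    Fun = fun(Db) ->
        %% clouseau_rpc:cleanup/2 is a gen_server:cast, so it returns ok even
        %% if the target clouseau node does not exist
        ok = clouseau_rpc:cleanup(Db, ActiveSigs),
        %% nouveau_api:delete_path/2 returns {error, nouveau_not_enabled} when
        %% nouveau is switched off; anything else unexpected is only logged, so
        %% one failure cannot stop the remaining databases from being cleaned up
        case nouveau_api:delete_path(nouveau_util:index_name(Db), ActiveSigs) of
            ok -> ok;
            {error, nouveau_not_enabled} -> ok;
            Else -> couch_log:warning("nouveau index cleanup failed: ~p", [Else])
        end
    end,
    lists:foreach(Fun, Dbs).
```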
Makes sense. The only worry was that we would throw an exception and prevent other indexes from getting cleaned up. Thanks for double-checking; I think it's fine as is, then.
We could punt on changing the trigger mechanism for now and just opt to run the nouveau and dreyfus cleanup as is, alongside the mrview index cleanup?
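If we took that route, a rough sketch of the combined per-database call might look like this (fabric:cleanup_index_files/1 and dreyfus_fabric_cleanup:go/1 are existing clustered cleanup entry points; the nouveau module name below is an assumption on my part):

```erlang
%% Run the mrview, dreyfus/clouseau, and nouveau cleanups together for one
%% clustered database, leaving the existing trigger mechanism untouched.
cleanup_all_indices(DbName) when is_binary(DbName) ->
    fabric:cleanup_index_files(DbName),   %% mrview index files
    dreyfus_fabric_cleanup:go(DbName),    %% clouseau/dreyfus indexes
    nouveau_fabric_cleanup:go(DbName),    %% nouveau indexes (assumed module name)
    ok.
```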
Overview
Automate cleaning up search indexes (clouseau and nouveau) when design documents are updated.
Testing recommendations
TBD
Related Issues or Pull Requests
N/A
Checklist
- [ ] Any new configurable parameters are documented in rel/overlay/etc/default.ini
- [ ] Documentation changes were made in the src/docs folder