You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, when indexing or deleting documents, they are hashed based on the type and id and routed to a shard based on the hash result. When searching, the search request is a broadcast request to all shards within an index.
Sometimes, it make sense to control this routing. For example, indexing blob posts for a specific user might be routed based on the user id. If routing is controlled, then when performing a search on posts for only that user, only the relevant shard that match the user id can be queried, resulting in much faster search and overhaul less load on the system.
The index and delete operation now allow for a routing parameter to be specified. When specified, that routing value will be used to control the shard placement.
The bulk operation allows to specify _routing for each item to control the routing for that index/delete/create operation.
The get operation allows to specify a routing parameter as well, to specify which shard the doc will be fetched from. Note, in our blog post partitioned by user example, doing a lookup just by the post id will not be enough and probably will not find anything, the user id will also need to be specified in the routing parameter.
The search and count operations accept a routing parameter as well, controlling which shards the search will be executed on. The routing parameter accepts a comma separated list of the routing values to use, and the all relevant shards will execute the query.
Note, even when specifying a search for a specific user blog posts using the routing parameter set to the user id, filtering only the user posts is still needed by, for example, adding a term filter with the user id.
The delete_by_query operation also accepts a routing parameter, which is a comma separated list of routing values of controlling which shards the delete query will be executed on.
One question that might arise is why not use indices to get the same behavior. For example, create an index per user. The reason is that indices have a much lower limit on how many of them can be created. A single machine can easily support millions of users with millions of posts. Creating an index per user will mean millions of indices, which is problematic, as even with a single shard per index, it does mean millions of lucene indices.
The text was updated successfully, but these errors were encountered:
Currently, when indexing or deleting documents, they are hashed based on the type and id and routed to a shard based on the hash result. When searching, the search request is a broadcast request to all shards within an index.
Sometimes, it make sense to control this routing. For example, indexing blob posts for a specific user might be routed based on the user id. If routing is controlled, then when performing a search on posts for only that user, only the relevant shard that match the user id can be queried, resulting in much faster search and overhaul less load on the system.
The
index
anddelete
operation now allow for arouting
parameter to be specified. When specified, that routing value will be used to control the shard placement.The
bulk
operation allows to specify_routing
for each item to control the routing for that index/delete/create operation.The
get
operation allows to specify arouting
parameter as well, to specify which shard the doc will be fetched from. Note, in our blog post partitioned by user example, doing a lookup just by the post id will not be enough and probably will not find anything, the user id will also need to be specified in therouting
parameter.The
search
andcount
operations accept arouting
parameter as well, controlling which shards the search will be executed on. Therouting
parameter accepts a comma separated list of the routing values to use, and the all relevant shards will execute the query.Note, even when specifying a search for a specific user blog posts using the
routing
parameter set to the user id, filtering only the user posts is still needed by, for example, adding aterm
filter with the user id.The
delete_by_query
operation also accepts arouting
parameter, which is a comma separated list of routing values of controlling which shards the delete query will be executed on.One question that might arise is why not use indices to get the same behavior. For example, create an index per user. The reason is that indices have a much lower limit on how many of them can be created. A single machine can easily support millions of users with millions of posts. Creating an index per user will mean millions of indices, which is problematic, as even with a single shard per index, it does mean millions of lucene indices.
The text was updated successfully, but these errors were encountered: