$in operator is in-efficient #3251

wildersachin · 2020-11-08T15:18:28Z

We are on version 2.2.0 (we run blockchain on Kubernetes using Docker images).
There have been multiple issues about this already, but we wanted to know what alternatives exist if the $in query is inefficient.

Our use case is an $in query for a particular set of documents, issued after caching which documents a user can see. It is critical that this stays efficient in the long run, even though our dataset isn't particularly large (around a few thousand a year), because it will still affect load times on our front end.

Given this background and dataset size, what do you recommend we do?

Desired Behaviour

Optimize data fetches relating to $in queries

wildersachin (Author) · 2020-11-08T15:19:47Z

In our development environment, we run the following query against the index defined below it:

{
   "use_index": "dataByDataIdsIndex",
   "selector": {
      "docType": "data",
      "dataId": {
         "$in": [
            "6cd2aa70-01c6-11eb-813a-03cb574c94c4"
         ]
      }
   }
}

{
    "index": {
        "fields": [ "docType", "dataId" ]
    },
    "ddoc": "dataByDataIdsIndex", 
    "type": "json"
}

kocolosk (Collaborator) · 2020-11-11T15:05:51Z

Hiya, the CouchDB query planner has room for improvement here. Take a look at the indexable_fields function:

https://github.com/apache/couchdb/blob/3.1.1/src/mango/src/mango_idx_view.erl#L252-L312

Currently it does not try to satisfy $in or $or operators using an index, so it's returning everything with docType: data and then filtering at query time. One could certainly imagine improving the planner so it performs multiple point lookups in the index, one for each element of your $in array, but that's not present today. Some possible workarounds:

- Submit multiple queries in parallel, one for each dataId, and then combine the results in your app.
- Define a view keyed on [ "docType", "dataId" ] and then use the POST version of the API to retrieve multiple individual rows in a single request: https://docs.couchdb.org/en/3.1.1/api/ddoc/views.html#post--db-_design-ddoc-_view-view
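
As a rough illustration of the two workarounds above (a sketch added for clarity, not code from this thread), the request bodies could be built along these lines in Python. The design document name `data_views`, the view name `by_type_and_id`, and all helper function names are hypothetical. The sketch only constructs payloads and merges responses, so sending them to CouchDB is left to whatever HTTP client the app already uses.

```python
# JavaScript map function for the view, stored as a string in the design
# document. The names "data_views" and "by_type_and_id" are hypothetical.
MAP_FUNCTION = """function (doc) {
  if (doc.docType && doc.dataId) {
    emit([doc.docType, doc.dataId], null);
  }
}"""


def design_doc():
    """Design document defining a view keyed on [docType, dataId]."""
    return {
        "_id": "_design/data_views",
        "views": {"by_type_and_id": {"map": MAP_FUNCTION}},
    }


def view_keys_body(doc_type, data_ids):
    """Body for POST /{db}/_design/data_views/_view/by_type_and_id.

    Each entry in "keys" is one exact-key lookup in the view index.
    """
    return {"keys": [[doc_type, data_id] for data_id in data_ids]}


def per_id_find_bodies(doc_type, data_ids):
    """One POST /{db}/_find body per dataId, using equality instead of $in."""
    return [
        {"selector": {"docType": doc_type, "dataId": data_id}}
        for data_id in data_ids
    ]


def merge_find_results(responses):
    """Combine the "docs" arrays of several _find responses, de-duplicated by _id."""
    merged = {}
    for response in responses:
        for doc in response.get("docs", []):
            merged[doc["_id"]] = doc
    return list(merged.values())
```

For the view variant, adding `include_docs=true` as a query parameter makes each returned row carry the full document. For the parallel variant, each body from `per_id_find_bodies` would be sent as its own `_find` request and the responses passed to `merge_find_results`.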

wildersachin (Author) · 2020-11-11T17:06:03Z

Thank you @kocolosk, we could not have asked for a better response. We are going to look into creating a view, and we are also looking into this concept for joins in the future. Luckily, we have more than enough time before our app starts crashing.
