
Feature generic query builder #71

Closed
wants to merge 21 commits into from


@danpaz (Owner) commented Oct 25, 2016

For experimentation, not meant to merge.

Linked to #65 #70. Following patterns established in #64 and #16 (thanks @nfantone and @johannes-scharlach!).

Introduces a generic QueryBuilder that:

  • Does not require a definition for every query that Elasticsearch supports; instead, it builds the query clause from the number and type of arguments passed to the query function.
  • This means all query types should be buildable by bodybuilder, and we can delete the entire src/queries directory.
  • Allows nesting queries inside the query clause (needed for nested, has_child and has_parent) by passing a function as the last argument to query.
  • This pattern could be extended to filters (Add filter aggregations #64 is already quite similar) and aggregations.
  • Can we use this pattern to compose filters and queries, or filters and aggregations as in Add filter aggregations #64?
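To make the argument-driven idea concrete, here is a minimal sketch of how a single query function might map its arguments onto an Elasticsearch clause. It is not the code in this PR: the name buildClause and its rules (a trailing function nests a sub-query under a query key, a trailing plain object is treated as options) are assumptions made for illustration.

```javascript
// Hypothetical sketch, not the PR's implementation: derive one query clause
// from the number and type of the arguments passed after the query type.
function buildClause(type, ...args) {
  // A trailing function means "nest a sub-query" (nested, has_child, ...).
  const nested = typeof args[args.length - 1] === 'function' ? args.pop() : null;

  // A trailing plain object (not an array) is treated as extra options.
  const last = args[args.length - 1];
  const opts =
    last !== null && typeof last === 'object' && !Array.isArray(last) ? args.pop() : {};

  const [field, value] = args;

  let body;
  if (field !== undefined && value !== undefined) {
    body = Object.assign({}, opts, { [field]: value }); // e.g. match: { title: 'eggs' }
  } else if (field !== undefined) {
    body = Object.assign({}, opts, { field }); // e.g. exists: { field: 'user' }
  } else {
    body = Object.assign({}, opts); // e.g. match_all: {}
  }

  if (nested) {
    // The callback receives the clause factory and returns the inner query.
    body.query = nested(buildClause);
  }
  return { [type]: body };
}

const nestedClause = buildClause('nested', 'path', 'comments', { score_mode: 'max' }, (q) =>
  q('match', 'comments.name', 'john')
);
console.log(JSON.stringify(nestedClause));
```

One consequence of a fully generic signature shows up immediately: a plain object in the value position (for example a match query carrying its own options) would be mistaken for options, so some convention is needed to disambiguate.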

See tests for usage. Run tests using babel-tape-runner test/query-builder.js | tap-spec.

🙇

@danpaz (Owner, Author) commented Oct 25, 2016

I've continued experimenting on this branch. The last commit may interest you, @johannes-scharlach, as it allows creating filter aggregations with this generic API. See the last test for an example: https://github.com/danpaz/bodybuilder/pull/71/files#diff-507f3cb6e87ded24d8ab794a830ded03R110.
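For reference, a filter aggregation like the one in that test (a red_products bucket containing an avg_price metric) corresponds to the Elasticsearch request shape built below. The helper filterAggregation is a hypothetical illustration of the target structure; it is neither bodybuilder's API nor the test's actual expected value.

```javascript
// Hypothetical helper: wrap a filter clause and optional sub-aggregations
// in the structure Elasticsearch expects for a named filter aggregation.
function filterAggregation(name, filterClause, subAggs) {
  const agg = { filter: filterClause };
  if (subAggs) {
    agg.aggs = subAggs; // sub-aggregations only see the filtered documents
  }
  return { aggs: { [name]: agg } };
}

const body = filterAggregation(
  'red_products',
  { term: { color: 'red' } },
  { avg_price: { avg: { field: 'price' } } }
);

console.log(JSON.stringify(body, null, 2));
```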

@johannes-scharlach (Collaborator) left a comment


I like the idea of being more generic and totally see the necessity for it to be able to keep up with the changes in the DSL.

However, I don't think this is the approach I would take:

To me there are three elements to this library: the QueryBuilder (only for filter and query), the AggregationBuilder, and the BodyBuilder (which brings everything together and is responsible for size, etc.). I'll see if I can come up with a concept MR, similar to this one, so that we can compare approaches :)
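One way to picture that three-part split, as a rough sketch only: the class names follow the comment, but every method signature below is an assumption made for illustration, and query/filter handling is deliberately minimal (single field/value pairs, bool.must and bool.filter only).

```javascript
// Hypothetical sketch of the three-part split; signatures are illustrative.
class QueryBuilder {
  constructor() {
    this.must = [];
    this.filters = [];
  }
  query(type, field, value) {
    this.must.push({ [type]: { [field]: value } });
    return this;
  }
  filter(type, field, value) {
    this.filters.push({ [type]: { [field]: value } });
    return this;
  }
  build() {
    const bool = {};
    if (this.must.length) bool.must = this.must;
    if (this.filters.length) bool.filter = this.filters;
    return { bool };
  }
}

class AggregationBuilder {
  constructor() {
    this.aggs = {};
  }
  aggregation(type, field, name) {
    this.aggs[name] = { [type]: { field } };
    return this;
  }
  build() {
    return this.aggs;
  }
}

// BodyBuilder only delegates and owns request-level options such as size.
class BodyBuilder {
  constructor() {
    this.queries = new QueryBuilder();
    this.aggregations = new AggregationBuilder();
    this.options = {};
  }
  query(...args) {
    this.queries.query(...args);
    return this;
  }
  filter(...args) {
    this.queries.filter(...args);
    return this;
  }
  aggregation(...args) {
    this.aggregations.aggregation(...args);
    return this;
  }
  size(n) {
    this.options.size = n;
    return this;
  }
  build() {
    const body = Object.assign({}, this.options, { query: this.queries.build() });
    const aggs = this.aggregations.build();
    if (Object.keys(aggs).length) body.aggs = aggs;
    return body;
  }
}

const body = new BodyBuilder()
  .size(10)
  .query('match', 'title', 'eggs')
  .filter('term', 'color', 'red')
  .aggregation('avg', 'price', 'avg_price')
  .build();

console.log(JSON.stringify(body, null, 2));
```

Keeping request-level options on BodyBuilder alone means the query and aggregation builders never need to know about each other.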

t.plan(1)

const result = new QueryBuilder().aggregation('filter', null, 'red_products', (b) => {
  return b.filter('term', 'color', 'red').aggregation('avg', 'price', 'avg_price')
})
@johannes-scharlach (Collaborator):


Hmm... b.filter makes sense for filters and filter aggregations, but not otherwise. Should it really be available by default?

@danpaz (Owner, Author):


My thought was to be as flexible as possible, so this approach allows combining queries, filters and aggregations arbitrarily. Here are a few more examples where this type of composition is needed:

Is your concern that this approach opens the possibility for someone to build an invalid query?

let newAggregation = this._buildAggregation(...args)

if (_.isFunction(nested)) {
  let clause = newAggregation[_.findKey(newAggregation)]
@johannes-scharlach (Collaborator):


By the way, this is the same as:

const clause = _.find(newAggregation)
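The equivalence holds because, with lodash's default identity predicate, _.findKey(obj) returns the first key whose value is truthy and _.find(obj) returns the first truthy value, so looking up the key and reading the value directly land on the same object. The two functions below are plain-JS stand-ins written for illustration (they are not lodash), so the check runs without dependencies; the sample newAggregation value is made up.

```javascript
// Plain-JS stand-ins for lodash's default (identity-predicate) behaviour.
// findKey: first own key whose value is truthy.
function findKey(obj) {
  return Object.keys(obj).find((k) => Boolean(obj[k]));
}
// find: first truthy own value.
function find(obj) {
  return Object.values(obj).find(Boolean);
}

const newAggregation = { red_products: { filter: { term: { color: 'red' } } } };

// Both expressions select the very same inner clause object.
const viaFindKey = newAggregation[findKey(newAggregation)];
const viaFind = find(newAggregation);

console.log(viaFindKey === viaFind); // true
```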


return {
  [name]: {
    [type]: (() => _.assign({field}, opts))()
@johannes-scharlach (Collaborator):


In this case field can be overwritten by opts. I would rather expect:

Object.assign({}, opts, {field})

(I also believe _.assign is deprecated in favour of Object.assign, but I might be mistaken.)

Contributor:


@johannes-scharlach (No, you are correct. It is.)
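On the precedence point above: the two forms differ only in argument order. Object.assign (like lodash's _.assign) copies sources left to right, so a later source overwrites keys set by an earlier one. A short illustration; the price/cost values and the missing option are made up for the example.

```javascript
const field = 'price';
const opts = { field: 'cost', missing: 0 };

// Same precedence as _.assign({field}, opts) in the diff:
// opts is copied last, so opts.field replaces field.
const fieldOverwritten = Object.assign({ field }, opts);

// Suggested form: field is copied last, so it always wins over opts.
const fieldPreserved = Object.assign({}, opts, { field });

console.log(fieldOverwritten); // { field: 'cost', missing: 0 }
console.log(fieldPreserved);   // { field: 'price', missing: 0 }
```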

let clause = {}

if (field && value && opts) {
  clause = _.merge({[field]: value}, opts)
@johannes-scharlach (Collaborator):


Should the Object.assign({}, opts, {[field]: value}) approach be used here as well?

@danpaz (Owner, Author) commented Oct 26, 2016

Really appreciate your feedback @johannes-scharlach, I think it's important to get this abstraction right. Looking forward to your PR!

@msanguineti (Contributor) commented Oct 27, 2016

I've found a problem when doing something like:

query('a', 'b', 'c', (q) => {
  return q
    .orQuery('h', 'k')
    .orQuery('s', 't')
    .orQuery('x', 'y', 'z', (q) => {
      return q.('w', 'v')
    })
})

The first two queries in the chain (.orQuery('h', 'k').orQuery('s', 't')) are discarded.

If I reverse the order so that I have the complex query at the top of the chain:

query('a', 'b', 'c', (q) => {
  return q
    .orQuery('x', 'y', 'z', (q) => {
      return q.('w', 'v')
    })
    .orQuery('h', 'k')
    .orQuery('s', 't')
})

all the queries are processed, but the logical grouping is wrong. Instead of something like:

{
  bool: {
    should: [
      { [object] },
      { [object] },
      { [object] }
    ]
  }
}

I get:

{
  bool: {
    should: [
      { [object] }
    ],
    must: [
      { [object] },
      { [object] }
    ]
  }
}

I believe the recursion needs a closer look when queries/filters are nested over multiple levels.

Here is a real-world (slightly redacted) example query that caused the problems:

body: {
  size: 10,
  from: 0,
  _source: ['a', 'b', 'c', 'd', 'e', 'created_by.date'],
  query: {
    constant_score: {
      filter: {
        bool: {
          should: [
            {
              term: {
                'created_by.user_id': 'abc'
              }
            }, {
              nested: {
                path: 'doc_meta',
                query: {
                  constant_score: {
                    filter: {
                      term: {
                        'doc_meta.user_id': 'abc'
                      }
                    }
                  }
                }
              }
            }, {
              nested: {
                path: 'tests',
                query: {
                  constant_score: {
                    filter: {
                      term: {
                        'tests.created_by.user_id': 'abc'
                      }
                    }
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

To build this, I would write (excluding _source, from and size):

query('constant_score', '', '', (f) => {
  return f
    .orFilter('term', 'created_by.user_id', 'abc')
    .orFilter('nested', 'path', 'doc_meta', (q) => {
      return q.query('constant_score', '', '', (f) => {
        return f.filter('term', 'doc_meta.user_id', 'abc')
      })
    })
    .orFilter('nested', 'path', 'tests', (q) => {
      return q.query('constant_score', '', '', (f) => {
        return f.filter('term', 'tests.created_by.user_id', 'abc')
      })
    })
})

When I run this, only the last clause (nested -> path: tests -> ...) is returned; the rest is discarded.

operator: deepEqual
expected: |-
  { constant_score: { filter: { bool: { should: [ [Object], [Object], [Object] ] } } } }
actual: |-
  { constant_score: { filter: { nested: { path: 'tests', query: { constant_score: [Object] } } } } }

Of course, maybe I am doing something wrong :)
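For comparison, the behaviour these examples expect can be sketched with a much smaller builder in which every chained call appends a clause and every nested callback receives its own fresh builder, so sibling clauses cannot overwrite each other and their order does not matter. MiniQueryBuilder and its signature (type, field, value, optional callback whose result goes under a query key) are hypothetical and not the PR's code; the single letters from the examples are slotted into placeholder term clauses.

```javascript
// Hypothetical minimal builder, not the PR's implementation.
class MiniQueryBuilder {
  constructor() {
    this.must = [];   // clauses added with query()
    this.should = []; // clauses added with orQuery()
  }

  // Build one clause; a callback gets a fresh builder for the inner query.
  _clause(type, field, value, nestedFn) {
    const body = { [field]: value };
    if (typeof nestedFn === 'function') {
      body.query = nestedFn(new MiniQueryBuilder()).build();
    }
    return { [type]: body };
  }

  query(type, field, value, nestedFn) {
    this.must.push(this._clause(type, field, value, nestedFn));
    return this;
  }

  orQuery(type, field, value, nestedFn) {
    this.should.push(this._clause(type, field, value, nestedFn));
    return this;
  }

  build() {
    // A lone must clause needs no bool wrapper.
    if (this.must.length === 1 && this.should.length === 0) {
      return this.must[0];
    }
    const bool = {};
    if (this.must.length) bool.must = this.must;
    if (this.should.length) bool.should = this.should;
    return { bool };
  }
}

// The nested clause comes first here, mirroring the reordered example.
const reordered = new MiniQueryBuilder()
  .orQuery('nested', 'path', 'x', (q) => q.query('term', 'w', 'v'))
  .orQuery('term', 'h', 'k')
  .orQuery('term', 's', 't')
  .build();

console.log(reordered.bool.should.length); // 3
```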

@msanguineti (Contributor):

Also, I tried to reproduce the example of querying nested objects from the official Elasticsearch site, and it fails.

Here is the test case, so you can paste it straight into test/query-builder.js:

test('QueryBuilder should make this chained nested query', (t) => {
  t.plan(1)

  const result = new QueryBuilder()
    .query('match', 'title', 'eggs')
    .query('nested', 'path', 'comments', {score_mode: 'max'}, (q) => {
      return q.query('match', 'comments.name', 'john').query('match', 'comments.age', 28)
    })

  t.deepEqual(result._queries, {
    bool: {
      must: [
        {
          match: {
            title: 'eggs'
          }
        },
        {
          nested: {
            path: 'comments',
            score_mode: 'max',
            query: {
              bool: {
                must: [
                  {
                    match: {
                      'comments.name': 'john'
                    }
                  },
                  {
                    match: {
                      'comments.age': 28
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  })
})

This is the output:

  QueryBuilder should make this chained nested query

    × should be equivalent
    -----------------------
      operator: deepEqual
      expected: |-
        { bool: { must: [ { match: { title: 'eggs' } }, { nested: { path: 'comments', query: [Object], score_mode: 'max' } } ] } }
      actual: |-
        { nested: { path: 'comments', query: { bool: { must: [ [Object], [Object] ] } }, score_mode: 'max' } }

…npaz/bodybuilder into feature-generic-query-builder

# Conflicts:
#	test/query-builder.js
@danpaz (Owner, Author) commented Oct 27, 2016

@msanguineti thanks for testing this out, looks like there's more work needed to get the nesting right.

@danpaz mentioned this pull request Oct 27, 2016
@danpaz (Owner, Author) commented Oct 27, 2016

Made some updates.

@danpaz (Owner, Author) commented Oct 27, 2016

Now I remember why I've been using _.assign instead of Object.assign: Object.assign is only available in Node >= 4 (see the failing tests). That's fine; I think we can drop support for Node 0.10 and 0.12 soon, since both are reaching the end of their maintenance periods.
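If Node 0.10/0.12 support had to be kept a little longer, one option would be a small fallback used only where Object.assign is missing. This is a simplified, hypothetical sketch in ES5 syntax (shallowAssign is not part of bodybuilder): unlike the native method, it copies only string-keyed own enumerable properties (no symbols) and does not throw on a null target.

```javascript
// Hypothetical ES5 fallback: copy own enumerable string-keyed properties
// from each source onto target, later sources overwriting earlier ones.
function shallowAssign(target) {
  for (var i = 1; i < arguments.length; i++) {
    var source = arguments[i];
    if (source == null) {
      continue; // Object.assign also skips null/undefined sources
    }
    for (var key in source) {
      // for...in also walks inherited keys, so keep own properties only.
      if (Object.prototype.hasOwnProperty.call(source, key)) {
        target[key] = source[key];
      }
    }
  }
  return target;
}

// Prefer the native implementation where it exists (Node >= 4).
var assign = Object.assign || shallowAssign;
```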

@danpaz closed this Nov 12, 2016
@danpaz deleted the feature-generic-query-builder branch August 20, 2021 17:10
4 participants