
Updated esquery.py file. #58

Merged: 2 commits merged into chaoss:master from cleanup-esquery on May 31, 2018

Conversation

aswanipranjal (Contributor)

Closes #57.

With this PR, all the string queries in the esquery.py file have been converted into elasticsearch_dsl objects such as A (aggregation), Q (query) and Search.
I've also tried to update the searches and queries to the latest conventions that Elasticsearch follows, using match_phrase rather than type: phrase for phrase searches.

get_agg returns a dictionary containing the search object with the necessary fields.
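
As a rough illustration of the change (a minimal sketch with made-up field names, not the exact code in esquery.py):

from elasticsearch_dsl import Search, Q, A

# Old style: the query was assembled by hand as a JSON string
query_str = '{"query": {"match": {"author": {"query": "jane doe", "type": "phrase"}}}}'

# New style: the same query built from elasticsearch_dsl objects
s = Search()
s = s.query(Q("match_phrase", author="jane doe"))          # match_phrase replaces type: phrase
s.aggs.bucket("1", A("terms", field="author", size=100))   # A() builds the aggregation
s.to_dict()                                                 # dict form of the DSL query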

Removed all the manually created string queries in all the functions
and replaced them with elasticsearch_dsl objects like A(aggregation),
Q(query) and Search.
Updated queries to use the current methods to search and aggregate.

Closes chaoss#57
@aswanipranjal (Contributor, Author)

@jgbarah, @valeriocos, @acs can you please review this PR?

@valeriocos (Member) left a comment:

LGTM, thank you @aswanipranjal. If possible, I would suggest adding some tests.

@jgbarah (Contributor) left a comment:

Thanks a lot. Very good job. Could you please try to address the comments? In addition, we need some tests. Could you write at least one test function per function? I think that producing some queries and checking that they are the same (by comparing the resulting DSL dicts, for example) would be good enough for a start.
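
For example, a test along these lines (just a sketch; the field names and the expected dict are made up, the real tests should exercise the ElasticQuery methods):

import unittest
from elasticsearch_dsl import Search, Q

class TestElasticQuery(unittest.TestCase):

    def test_query_basic_dict(self):
        # The expected DSL query as a plain dict (made-up example)
        expected = {'query': {'match_phrase': {'author': 'jane doe'}}}
        # The dict produced by the elasticsearch_dsl object
        produced = Search().query(Q('match_phrase', author='jane doe')).to_dict()
        self.assertDictEqual(produced, expected)

if __name__ == '__main__':
    unittest.main()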


        :param filters: dict with the filters to be applied
        :param inverse: if True include all the inverse filters (the ones starting with *)
        :return: a list of match object(dict)(s)
Contributor:

In all references to elasticsearch_dsl objects, I would explicitly mention the module. For example in this case, "list of elasticsearch_dsl match objects", or if you prefer, "es_dsl" for short. Please add this kind of reference in the other methods as well, so that it is crystal clear what they do.

Besides, what do you mean by (dict)(s)? It is better to describe in detail the data structure that the method returns, including, if convenient, an example. Remember that this message should include enough information for a trained developer to be able to use the method.
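
For instance, a docstring along these lines (an illustrative sketch, not the final wording):

    :param filters: dict with the fields and values to filter by,
                    e.g. {"author": "jane doe", "organization": "chaoss"}
    :param inverse: if True, return only the inverse filters (the ones starting with "*")
    :return: list of elasticsearch_dsl Q("match_phrase", ...) objects, one per filter,
             e.g. [Q("match_phrase", author="jane doe")]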

Contributor Author:

By dicts I meant that the filters are dictionaries. I'll update the comments to be clearer.

Contributor:

Thx


class ElasticQuery():
    """ Helper class for building Elastic queries """

    AGGREGATION_ID = 1  # min aggregation identifier
    AGG_SIZE = 100  # Default max number of buckets
    ES_PRECISION = 3000  # Default value for precision_threshold

    @classmethod
    def __get_query_filters(cls, filters=None, inverse=False):
Contributor:

I would use "filters=[]". Since it is expected to be a list, this way you don't need, in the implementation, to check for None, and it is even more explicit that you're expecting a list for that argument. I would only use None if you find a case where you need to make a difference between that "None" and an empty list as an argument.
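
A sketch of how that could look (keeping filters as the dict described in the docstring, so the empty default would be {} rather than []; the body is only illustrative):

from elasticsearch_dsl import Q

class ElasticQuery():

    @classmethod
    def __get_query_filters(cls, filters={}, inverse=False):
        # With an empty dict as default there is no None check needed;
        # the default just must never be mutated inside the method.
        query_filters = []
        for name, value in filters.items():
            # inverse filters are the ones whose field name starts with '*'
            if name.startswith('*') != inverse:
                continue
            field = name.lstrip('*')
            query_filters.append(Q('match_phrase', **{field: value}))
        return query_filters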


        return query_filters

    @classmethod
    def __get_query_range(cls, date_field, start=None, end=None):
        """
        Create a filter dict with date_field from start to end dates.

        :param date_field: field with the date value
Contributor:

For dates, and other "complex" data types, please specify the actual format, if possible. For example, in this case, is it "datetime.datetime"? (Besides: if it is not something such as datetime.datetime, I would rather consider changing the profile to really use something like that, so that dates are converted into a pythonic format before this method (and the others below) are called. We can build some auxiliary function for that, if needed.)
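
The auxiliary function mentioned here could be as small as this sketch (a hypothetical helper, with an assumed input format):

from datetime import datetime

def to_datetime(value):
    """Convert a date given as str or datetime into datetime.datetime (hypothetical helper)."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str):
        # assumed input format; adjust to whatever the callers actually pass
        return datetime.strptime(value, '%Y-%m-%d')
    raise TypeError('Unsupported date value: %r' % (value,))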

        query_basic = Search()

        # Not using this for now.
        # query_basic = query_basic.query("query_string", analyse_wildcard="true", query="*")
Contributor:

Any reason for keeping the comment?

Contributor Author:

No, nothing special. I was just going to experiment with the wildcard flag.

Contributor:

In that case, maybe remove it now, and produce a later PR with the extra functionality, if you find it convenient.

        if not agg_id:
            agg_id = cls.AGGREGATION_ID
        query_agg = A("terms", field=field, size=cls.AGG_SIZE, order={"_count": "desc"})
        """
Contributor:

This is much better in the description of the method, above.

        return bounds

    @classmethod
    def __get_query_agg_ts(cls, field, time_field, interval=None,
                           time_zone=None, start=None, end=None,
                           agg_type='count', offset=None):
        """
        Create an aggregation DSL object for getting the time series values for field.
Contributor:

Isn't it really an "es_dsl aggregation object" instead of an "aggregation DSL object"?

Contributor Author:

Yeah, my bad. Thanks!

""" % (query_basic)

# size=0 gives only the count and not the hits
query = query_basic.extra(size=0)
return query

    @classmethod
    def get_agg(cls, field=None, date_field=None, start=None, end=None,
Contributor:

As commented above, try to clean up the profile a bit, so that, for example, fields is a list, dates are datetime or similar, etc. Please remember to do so for all methods.

Contributor Author:

Okay

                                                  offset=offset)
        s.aggs.bucket(agg_id, query_agg)

        return json.dumps(s.to_dict())
Contributor:

I'm not sure why this is returning the query as a JSON string. Shouldn't it better return an es_dsl object, so that we get rid of ES DSL string queries completely?

Contributor Author:

@jgbarah, that is because in metrics.py the get_metrics_data function requests the data from Elasticsearch, and it requires the query to be in JSON format, otherwise we get a 500 error.
[screenshot of the 500 error response, 2018-05-23]

We could change the metrics.py file as well to use the Search().execute method to get the data, but I thought that we are probably going to change all the methods and replace them with better, more flexible ones anyway, hence I didn't change all the files.
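
Roughly, the two options being discussed (a sketch; the Elasticsearch client setup and index name are assumed):

import json
from elasticsearch_dsl import Search

s = Search().extra(size=0)

# Current behaviour: serialize the query so get_metrics_data in metrics.py
# can send it as the JSON body of the request
query_json = json.dumps(s.to_dict())

# Possible follow-up PR: return the Search object itself and let the caller
# execute it directly, e.g. s.using(es_client).index('some_index').execute()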

@aswanipranjal (Contributor Author) commented May 23, 2018:

@jgbarah, @valeriocos: Should I be looking at changing all the files so that we are only using elasticsearch_dsl module?

Contributor:

OK, I see. Not for now. Let's keep that for another PR, but let's finish this one letting it produce JSON as the result.

@aswanipranjal (Contributor Author)

I'll make another commit with the tests and cleaning up the comments a bit more.

Added details to the comments in esquery.py about the data structures
used and the format that the parameters should be in.
Added tests for every function in the esquery.py file.
@aswanipranjal (Contributor Author) commented May 26, 2018:

@jgbarah @valeriocos @acs please review. I've updated esquery.py with proper comments and added tests for all the functions.

I am ready to make another PR which completes the cleaning process. Once this PR has been merged, then I'll make the second PR.

Or, if you prefer, I can make that commit part of this PR instead.

        self.assertDictEqual(self.es._ElasticQuery__get_query_range(date_field=self.date_field,
                                                                     start=self.start, end=self.end), test_range_dict)

    def test_get_query_basic(self):
Member:

I would split this test in two: test_no_filters and test_with_filters.
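
For example (a sketch; the expected dicts and the exact keyword arguments are placeholders):

    def test_get_query_basic_no_filters(self):
        """Basic query without filters (sketch)."""
        query = self.es._ElasticQuery__get_query_basic(date_field=self.date_field,
                                                        start=self.start, end=self.end)
        self.assertDictEqual(query.to_dict(), expected_no_filters_dict)

    def test_get_query_basic_with_filters(self):
        """Basic query with filters applied (sketch)."""
        query = self.es._ElasticQuery__get_query_basic(date_field=self.date_field,
                                                        start=self.start, end=self.end,
                                                        filters={'author': 'jane doe'})
        self.assertDictEqual(query.to_dict(), expected_with_filters_dict)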

        }
        self.assertDictEqual(self.es._ElasticQuery__get_bounds(self.start, self.end), test_bounds_dict)

    def test_get_query_agg_ts(self):
Member:

I would split this test in two: tests that check exceptions and tests that do not.

        self.assertDictEqual(test_ts_dict2.to_dict(), test_avg_ts_dict)

    def test_get_agg(self):
        """Test the aggregation function"""
Member:

Please, add a blank line after the method description

@valeriocos (Member) left a comment:

Thank you @aswanipranjal for this PR. Overall it looks good to me. I have added just some minor comments.

@jgbarah (Contributor) left a comment:

OK, let's go with this version, so that we can build on top of it.

@aswanipranjal please take note of @valeriocos suggestion, and split the tests in a separate pr.

@jgbarah jgbarah merged commit 104b657 into chaoss:master May 31, 2018
@aswanipranjal aswanipranjal deleted the cleanup-esquery branch June 3, 2018 11:04
@aswanipranjal aswanipranjal restored the cleanup-esquery branch June 3, 2018 11:04
@aswanipranjal aswanipranjal deleted the cleanup-esquery branch July 8, 2018 07:29