
Some way to query "total" field from result set? #830

Closed
ledjon opened this issue Aug 23, 2016 · 1 comment


ledjon commented Aug 23, 2016

I can't find a way to have this connector provide access to the metadata of the response.

For example, a very efficient way to get the total count for an Elasticsearch query is to read the "total" field from the JSON body of the first page of results (assuming you are scrolling, etc.).

An example response from a scroll query:
```python
{
    u'_scroll_id': u'xxxx',
    u'_shards': {u'failed': 0, u'successful': 10, u'total': 10},
    u'hits': {u'hits': [], u'max_score': 0.0, u'total': 1593020},
    u'timed_out': False,
    u'took': 3643
}
```

Notice that it took about 3.6 seconds (`took` is 3643 ms) on a HUGE data set, and I can now return the "hits.total" value to the user.
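A minimal sketch of reading the count from a response like the one above, assuming the JSON body has already been decoded into a Python dict. The helper name `total_hits` is hypothetical and not part of ES-Hadoop. Note that `hits.total` (the matching documents) is different from `_shards.total` (the number of shards queried). On Elasticsearch 7.0 and later, `hits.total` is an object with `value` and `relation` fields, and it may be a lower bound (`"relation": "gte"`) unless the search sets `track_total_hits`.

```python
def total_hits(response):
    """Return hits.total from a decoded search or scroll response as an int."""
    total = response["hits"]["total"]
    # Elasticsearch 7.0+ returns {"value": N, "relation": "eq" | "gte"};
    # with "gte" the value is only a lower bound on the real count.
    if isinstance(total, dict):
        return total["value"]
    # Older versions (such as the response above) return a plain integer.
    return total


# The first page of the scroll response from this issue, as a Python dict.
first_page = {
    "_scroll_id": "xxxx",
    "_shards": {"failed": 0, "successful": 10, "total": 10},
    "hits": {"hits": [], "max_score": 0.0, "total": 1593020},
    "timed_out": False,
    "took": 3643,
}

print(total_hits(first_page))  # 1593020, not the shard total of 10
```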

Actually retrieving the documents still requires a full fetch, of course, but in many use cases you only need the count of the query first.

Ideas/thoughts?

Member

jbaiera commented Aug 23, 2016

@ledjon We use GitHub issues to track bugs and actionable features only. Please ask questions on the Discuss forum.

ES-Hadoop primarily focuses on bulk reading of documents from Elasticsearch into Hadoop for analysis, and bulk writing of documents from Hadoop into Elasticsearch. While there are some plans to eventually add the ability to retrieve things like counts and aggregations along with the original data, these initiatives are primarily limited by the API hooks available in Hadoop. If you need these aggregations for a job, my advice is to query them from Elasticsearch up front and then serialize them to the tasks for use.
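One way to follow this advice, sketched in Python under stated assumptions: the driver asks Elasticsearch's `_count` API (`POST /<index>/_count`, whose response carries a `count` field) for the total before the job starts, then places the number in the job configuration so every task can read it. The URL, index name, the helpers `fetch_count`/`to_job_conf`, and the key `myjob.es.total` are all hypothetical; ES-Hadoop does not define them, and a real Hadoop job would set the value on its own `Configuration` object instead of a dict.

```python
import json
import urllib.request


def count_request(es_url, index, query):
    """Build a POST request for the Elasticsearch _count API."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/{index}/_count",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_count(es_url, index, query, opener=urllib.request.urlopen):
    """Run the count once, in the driver, before launching the job."""
    with opener(count_request(es_url, index, query)) as resp:
        return json.load(resp)["count"]


def to_job_conf(conf, count, key="myjob.es.total"):
    """Return a copy of the job settings with the count added.

    The key is hypothetical; config values are usually strings.
    """
    updated = dict(conf)
    updated[key] = str(count)
    return updated
```

Usage, assuming a cluster at a hypothetical `http://localhost:9200`: `conf = to_job_conf({}, fetch_count("http://localhost:9200", "my-index", {"match_all": {}}))`. The `opener` parameter only exists so the HTTP call can be swapped out in tests; counting once in the driver avoids every task re-running the same query.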
