Terms facet results retrieve and pick certain number products for top users #6109

Closed
yao23 opened this issue May 9, 2014 · 2 comments

yao23 commented May 9, 2014

I have a collection of products that belong to a few users, for example:

[
{ id: 1, user_id: 1, description: "blabla...", ... },
{ id: 2, user_id: 2, description: "blabla...", ... },
{ id: 3, user_id: 2, description: "blabla...", ... },
{ id: 4, user_id: 3, description: "blabla...", ... },
{ id: 5, user_id: 4, description: "blabla...", ... },
{ id: 6, user_id: 2, description: "blabla...", ... },
{ id: 7, user_id: 3, description: "blabla...", ... },
{ id: 8, user_id: 4, description: "blabla...", ... },
{ id: 9, user_id: 2, description: "blabla...", ... },
{ id: 10, user_id: 3, description: "blabla...", ... },
{ id: 11, user_id: 4, description: "blabla...", ... },
...
]

(The real data has more fields; the important ones are the first three: product id, user id, and product description.)

I'd like to retrieve 2 products for each of the top 3 users whose products have the highest matching scores. The matching condition is that the description includes "fashion" and some other keywords; here I just use "fashion" as an example. The expected result looks like:

[
{ id: 2, user_id: '2', description: "blabla...", ..., _score: 100},
{ id: 3, user_id: '2', description: "blabla...", ..., _score: 95},
{ id: 4, user_id: '3', description: "blabla...", ..., _score: 90},
{ id: 5, user_id: '4', description: "blabla...", ..., _score: 80},
{ id: 7, user_id: '3', description: "blabla...", ..., _score: 70},
{ id: 8, user_id: '4', description: "blabla...", ..., _score: 65},
...
]

I have 3 possible ways to try:

(1) Use a terms facet in a nested query to get the unique user_id values, then use them as the user id range of an outer query that matches the description against keywords like "fashion".

I don't know how to implement this in ES (I'm stuck on iterating over the facet terms and on building the user_id range from a subquery with a facet). In SQL it would be something like:

select id, user_id, description
from product
where user_id in (
    select distinct user_id
    from product
    limit 3)
order by _score desc  /* highest matching score first */
limit 6
/* 6 = 2 * 3 */

But this cannot guarantee that the top 6 products come from 3 different users, 2 from each.

Also, according to the following two links, it seems that iterating over per-term information in a terms facet has not been implemented in ES so far:
http://elasticsearch-users.115913.n3.nabble.com/Terms-stats-facet-Additional-information-td4035199.html

#256

(2) Query the description field for keywords like "fashion", at the same time aggregate per user_id and limit each user's count to 2, then pick the top 6 products with the highest matching scores.

I still don't know how to implement this in ES.

(3) Use brute force: run multiple queries until the top 3 users are found, each with their 2 highest-scoring products.

That is, use a hash map where the key is user_id and the value is how many times it has appeared. Query with the matching keywords first, then iterate over the intermediate results and check the hash map: if the value is less than 2, add the product to the final result list; otherwise skip it.

Please let me know if you can figure out how to do it the 1st or 2nd way.

Thanks in advance.
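
For reference, the hash-map idea in (3) might look roughly like the sketch below. It assumes the hits have already been fetched and are sorted by _score descending; the function name `pick_top_products` and its parameters are made up for illustration and are not part of any ES client. It also takes "top 3 users" to mean the first 3 distinct users seen in score order, which is one possible reading.

```python
from collections import defaultdict


def pick_top_products(hits, max_users=3, per_user=2):
    """Keep at most `per_user` hits for each of the first `max_users` users.

    `hits` is assumed to be an iterable of dicts that each carry a
    'user_id' key, already sorted by matching score, highest first.
    """
    counts = defaultdict(int)  # user_id -> number of products kept so far
    picked = []
    for hit in hits:
        uid = hit["user_id"]
        # Once max_users distinct users are known, ignore any new user.
        if uid not in counts and len(counts) >= max_users:
            continue
        if counts[uid] < per_user:
            counts[uid] += 1
            picked.append(hit)
        # Stop as soon as every selected user has a full quota.
        if len(picked) == max_users * per_user:
            break
    return picked
```

If one page of results does not fill the quota, the function simply returns fewer products; the caller would have to fetch further pages and run the same loop over the combined, still score-ordered hits.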
Yao

@martijnvg
Member

I think the best way to achieve this is to execute follow-up requests after a main request that defines an aggregation. For each bucket whose top documents you need, execute a search request (using the bucket as a filter). You can bundle these search requests into a multi search request to optimize this a bit.
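
A rough sketch of that two-step flow, written as plain Python dicts so it does not depend on any particular client library. The helper names, the `products` index name used in the usage note, and the aggregation names `top_users` and `best_score` are assumptions for illustration. The request bodies use the query syntax of recent Elasticsearch releases (`bool` with a `filter` clause, `script` with `source`); the 1.x releases current when this issue was filed would need a `filtered` query and a plain script string instead.

```python
import json


def top_users_request(keyword, max_users=3):
    """Main request: bucket matches by user_id, best-scoring users first."""
    return {
        "size": 0,  # only the buckets are needed here, not the hits
        "query": {"match": {"description": keyword}},
        "aggs": {
            "top_users": {
                "terms": {
                    "field": "user_id",
                    "size": max_users,
                    "order": {"best_score": "desc"},
                },
                # Max _score per bucket, used above to order the buckets.
                "aggs": {"best_score": {"max": {"script": {"source": "_score"}}}},
            }
        },
    }


def per_user_msearch_body(index, keyword, user_ids, per_user=2):
    """Follow-up: one search per bucket, bundled as a multi search body."""
    lines = []
    for uid in user_ids:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps({                   # body line
            "size": per_user,
            "query": {
                "bool": {
                    "must": {"match": {"description": keyword}},
                    "filter": {"term": {"user_id": uid}},
                }
            },
        }))
    # The multi search body is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"
```

The bucket keys returned by the first request would be passed as `user_ids` to the second helper, for example `per_user_msearch_body("products", "fashion", [2, 3, 4])`, and the resulting string sent to the `_msearch` endpoint.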

There is a new top_hits aggregation (#6124) on its way that can group the top hits under each returned bucket.
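
As a sketch of how that could cover the original request in a single round trip, assuming top_hits behaves as it does in later Elasticsearch releases: a terms aggregation on user_id picks the 3 best-scoring users, and a nested top_hits keeps 2 products per user. The helper names and the aggregation names `top_users`, `best_score` and `top_products` are illustrative assumptions, and the `script` syntax shown is the recent one.

```python
def top_hits_request(keyword, max_users=3, per_user=2):
    """Single request: top users as buckets, top products inside each bucket."""
    return {
        "size": 0,
        "query": {"match": {"description": keyword}},
        "aggs": {
            "top_users": {
                "terms": {
                    "field": "user_id",
                    "size": max_users,
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    # Orders the user buckets by their best matching product.
                    "best_score": {"max": {"script": {"source": "_score"}}},
                    # Keeps the per_user highest-scoring hits in each bucket.
                    "top_products": {"top_hits": {"size": per_user}},
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn the bucketed response into one list, highest score first."""
    products = []
    for bucket in response["aggregations"]["top_users"]["buckets"]:
        for hit in bucket["top_products"]["hits"]["hits"]:
            products.append({
                "id": hit["_id"],
                "user_id": bucket["key"],
                "_score": hit["_score"],
            })
    products.sort(key=lambda p: p["_score"], reverse=True)
    return products
```

Sorting the flattened list by _score gives the interleaved ordering shown in the expected result of the original question; leaving it unsorted would keep the products grouped by user instead.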

@clintongormley

Closed in favour of #6124
