
Expand component #1

Open
joel-bernstein opened this issue Jan 10, 2014 · 15 comments

@joel-bernstein (Contributor)

This issue introduces a new search component called the Expand component. The Expand component implements group expansion for a single page of results collapsed by the CollapsingQParserPlugin

I'll be working this ticket initially in my fork of the Heliosearch project in a branch called "expand".

https://github.com/joelbernstein2013/heliosearch
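
As a rough sketch, a collapse-plus-expand request would look something like the following, using the parameter names that appear in the sample query later in this thread (expand, expand.field, expand.limit); the field name here is just a placeholder:

q=*:*
&fq={!collapse field=groupField}
&expand=true
&expand.field=groupField
&expand.limit=5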

@ghost assigned joel-bernstein Jan 10, 2014
@krantiparisa (Member)

+1 for this component.

How costly is it in terms of memory compared with grouping? Is it scalable with 100 groups, where each group has 3 sub-groups and each sub-group has 100 docs?

And can it run on top of an index with 10M docs?

@joel-bernstein (Contributor, Author)

The expand component works with a single page of collapsed results. So if your page has 100 groups, with 3 sub groups, with 100 docs each, the component will have to work with 30,000 documents.

Not an overwhelming number, but not a small one either.

The 10 million document set will be collapsed by the CollapsingQParserPlugin. How many distinct top level groups are in the index? It sounds like there might be around 33,333 distinct top level groups if each top level group has 300 docs in it. The CollapsingQParserPlugin will eat that for lunch, very little memory used.
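
Spelling that arithmetic out:

100 groups × 3 sub-groups × 100 docs = 30,000 docs on the expanded page
3 sub-groups × 100 docs = 300 docs per top-level group
10,000,000 docs ÷ 300 docs per group ≈ 33,333 distinct top-level groups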

@joel-bernstein (Contributor, Author)

Kranti,

I'll be putting the initial implementation up later today or over the weekend. It doesn't cover sub-grouping yet. So if you want to work on that, that would be excellent. We can collaborate on how to add this to the code.

Joel

@krantiparisa (Member)

How many distinct top level groups are in the index?

  • There could be 300,000 unique top-level groups (entity ids) overall in the index (index size: 5GB).
  • But considering the filters and queries, the available unique top-level groups for a given request could be at most 20,000.
  • Out of those 20,000 top-level groups, the page max could be 100.
  • Each group could have 3 sub-groups, and each sub-group might have at most 100 docs.

Can you help me roughly estimate the memory footprint and response time? Does this have any possibility of cache hits to get faster responses?

@krantiparisa (Member)

Sure, I can work with you on this. You might need to answer my stupid questions at times :)

@joel-bernstein (Contributor, Author)

The CollapsingQParserPlugin creates arrays based on the total number of unique values in the field. A rough estimate for 300,000 unique terms in the field would be 3-5 MB of transient memory per query.

I haven't measured the expanding of groups yet. With such a large page, part of the cost will be retrieving the stored values for all those documents, which can be very expensive.
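
As a back-of-envelope check on that 3-5 MB figure, assuming the collapse structures boil down to a couple of primitive arrays sized to the number of unique field values (an assumption about the implementation, not something spelled out in this thread):

300,000 unique values × 4 bytes (int ord array)     ≈ 1.2 MB
300,000 unique values × 4 bytes (float value array) ≈ 1.2 MB
≈ 2.4 MB for the arrays alone, landing in the quoted 3-5 MB range once lookup overhead is added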

@krantiparisa (Member)

If we just need docIds at the docList level, that means something like:

group1 => 1234567 (the value of the group field)
    subgroup1 => catalog1 (the value of the sub-group field)
        docList => list of doc ids
    subgroup2 => catalog2 (the value of the sub-group field)
        docList => list of doc ids
group2 => 6764237 (the value of the group field)
    subgroup1 => catalog1 (the value of the sub-group field)
        docList => list of doc ids
    subgroup2 => catalog2 (the value of the sub-group field)
        docList => list of doc ids

If we get TopGroups like the above, then the metadata can be based on whatever fields the user wants. I am trying to find out the memory and response times for the above structure from the API call.
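
Purely for illustration, that nested structure might be rendered in the XML response format used elsewhere in this thread along these lines (the element names, nesting, and counts are hypothetical, not actual ExpandComponent output):

<lst name="expanded">
  <lst name="1234567">
    <result name="catalog1" numFound="100" start="0">
      <doc>...</doc>
    </result>
    <result name="catalog2" numFound="100" start="0">
      <doc>...</doc>
    </result>
  </lst>
  <lst name="6764237">
    <result name="catalog1" numFound="100" start="0">
      <doc>...</doc>
    </result>
  </lst>
</lst>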

@krantiparisa (Member)

Joel,

Is it possible to share the ExpandComponent on Saturday (11 Jan)? I can spend some good time on Sunday and try to get the sub-groups working. I also want to run a few performance tests comparing traditional grouping with the new collapsing+expanding implementation in the use cases I was describing above.
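
For reference, the traditional grouping request for the same shape of result would look roughly like this (standard Solr grouping parameters; the field names match the sample query further down in this thread):

q=...&group=true&group.field=programId&group.limit=5&group.sort=windowStart asc&sort=windowStart asc&rows=100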

@joel-bernstein (Contributor, Author)

Just committed the initial implementation of the ExpandComponent to my heliosearch clone, in the expand branch:

https://github.com/joelbernstein2013/heliosearch/tree/expand

Initial patch compiles but has not been tested yet.

@VadimKirilchuk

I think it's worth pointing to the commit itself:
joel-bernstein@c6db5bc

@krantiparisa (Member)

Joel,

I deployed your branch code and started Solr with a pre-populated index having 5M+ documents.

Sample Query:

http://localhost:8983/solr/collection1/select?q=relatedAllIds:8118784557012618112 AND showingType:linear&wt=xml&fq={!collapse field=programId min=windowStart}&fl=programId,windowStart&expand=true&expand.field=showingId&expand.limit=5&expand.rows=1&start=0&rows=2&sort=windowStart asc

The idea is to get the distinct program ids (collapsing/grouping) and sort them by the windowStart field. Here is the response:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">28</int>
<lst name="params">
<str name="expand.rows">1</str>
<str name="sort">windowStart asc</str>
<str name="fl">programId,windowStart</str>
<str name="expand.limit">5</str>
<str name="start">0</str>
<str name="q">
relatedAllIds:8118784557012618112 AND showingType:linear
</str>
<str name="expand">true</str>
<str name="wt">xml</str>
<str name="fq">{!collapse field=programId min=windowStart}</str>
<str name="rows">2</str>
<str name="expand.field">showingId</str>
</lst>
</lst>
<result name="response" numFound="77" start="0">
<doc>
<long name="programId">8050846173392254112</long>
<long name="windowStart">1389375000000</long>
</doc>
<doc>
<long name="programId">8837586713084788112</long>
<long name="windowStart">1389382200000</long>
</doc>
</result>
<lst name="expanded"/>
</response>

Why is the expanded result empty? My expectation was that, from the collapsed result, we would get the top 5 showings for each programId, sorted by windowStart. How should the query be formed?
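
One thing worth double-checking when the expanded section comes back empty is whether the ExpandComponent is actually registered on the request handler. A minimal solrconfig.xml sketch, assuming the component class name from the branch (both the class name and whether this is the actual cause here are not confirmed in this thread):

<searchComponent name="expand" class="org.apache.solr.handler.component.ExpandComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>expand</str>
  </arr>
</requestHandler>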

yonik pushed a commit that referenced this issue Jan 15, 2014
As a fake commit, this also closes github pull requests #1 #2 #3 #6 #10

git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1555587 13f79535-47bb-0310-9956-ffa450edef68
@yonik closed this as completed in f60a042 Jan 15, 2014
@yonik (Member) commented Jan 15, 2014

Reopening - looks like my merge-up of trunk closed this accidentally.

@yonik reopened this Jan 15, 2014
@joel-bernstein (Contributor, Author)

Added initial test case:

joel-bernstein@2fb7278

@joel-bernstein (Contributor, Author)

Added a few more tests to cover the basic functionality.

joel-bernstein@a4b688a

My plan now is to add the distributed test cases and test it at scale; then I think this is nearing initial release condition.

Kranti has a few more features he'd like to add (group-level paging, subgroup support), and we can iterate further on these.

@joel-bernstein (Contributor, Author)

Added basic distributed test cases. joel-bernstein@a9e0b4e

Also a small formatting update: joel-bernstein@c7b61a9

Also did some performance testing at scale; the Expand component seems to perform at about the same speed as the CollapsingQParserPlugin, so performing a collapse and expand takes about twice as long as doing only the collapse.
