Prefer full titles shown in search results #2191
@garthg I see what you mean. A search for https://dataverse.harvard.edu/dataverse/antislaverypetitionsma?q=garrison shows "of William Lloyd Garrison" rather than "Senate Unpassed Legislation 1864, referred to next general court, SC1/series 231, Petition of William Lloyd Garrison". This is for https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/L8PGT Perhaps the fix will be as simple as increasing this value: https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize |
Thanks @pdurbin ! You're exactly right about what we'd expect to see there. |
Option 1: Always show the full title with no highlighting
The most straightforward way to make the full title always shown is to simply never show the version with highlights. It would look something like this: Note that the word "Garrison" in the title is no longer in bold. This would be a change to what's currently written in http://guides.dataverse.org/en/4.0.1/user/find-use-data.html which says:
For illustrative purposes, in the screenshot above I'm showing "Title: of William Lloyd Garrison" at the bottom of the card to show what it would look like if we placed the highlighted/matched title at the bottom with other fields that may match. We don't have to do this (we can continue to suppress "title" from being shown at the bottom) but it's an option. The code change would look something like this:
Option 2: Find a better "fragsize"
I also played a bit with setting the "fragsize" of the highlight snippets. For example, if you set the fragsize to zero (as in the code below), then all the characters in the field that matched are returned, which can be quite long in the case of descriptions. It's the difference between this... ... and this (when searching for "world"): So the question with the second approach is whether we could find a fragsize we're happy with. Perhaps this could be a configurable option so we could tweak it at runtime until we settle on a value we like. The default fragsize is 100 (the first screenshot above). Here's how it looks with a fragsize of 300:
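For reference, the "fragsize" discussed in Option 2 corresponds to Solr's `hl.fragsize` request parameter. The following is a minimal sketch of how a highlighting query could be assembled with different fragment sizes. The core URL, the `description` field name, and the helper names are assumptions for illustration and are not taken from the Dataverse code.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust for a real installation.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def highlight_params(query, field, fragsize=100):
    """Return Solr query parameters that request highlighted snippets.

    hl, hl.fl, hl.fragsize and wt are standard Solr parameters.
    A fragsize of 0 asks Solr to return the whole field value.
    """
    return {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": fragsize,
        "wt": "json",
    }


def select_url(base_url, params):
    """Build the URL of a Solr select request from a parameter dict."""
    return f"{base_url}/select?{urlencode(params)}"


# Compare the three sizes mentioned in the comment above: 0, 100 and 300.
for size in (0, 100, 300):
    print(select_url(SOLR_CORE_URL, highlight_params("world", "description", size)))
```

Keeping the size as a plain argument mirrors the suggestion of making it configurable, so the value can be changed without touching the query-building code.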
Thanks @pdurbin for this great breakdown. For my project, it's preferable to have the full title always (your Option 1), and even better if it also includes highlighting. However, I could also see our use case being satisfied with a larger cutoff for the title fragment such that the fragment usually included almost all of our titles. Our titles appear to be usually between 100-150 characters, so if that amount of title was shown (ideally with the matching text highlighted), that would be a fine solution for us as well. In any case, thank you for your responsiveness here! |
@garthg oh sure. We've been discussing this internally as well. Option 1 seems to be ahead, but I plan to deploy the branch to a test server so we can make sure we like it. For consistency, I'll remove highlighting from the names of dataverses and files as well.
@pdurbin Great! Thanks for the update. |
I spoke briefly with @eaquigley about how @scolapasta and I were planning on merging the "Option 1: Always show the full title with no highlighting" commit ( 40697f9 ) into the 4.0.2 branch but I'm going to wait until we've had a chance to talk more. |
That makes sense. Thanks for the update!
@eaquigley @mheppler and I met this morning to discuss this bug as well as #537 which is related. I took some notes in a Google doc: https://docs.google.com/document/d/1p8zXIbzlACxfFhumkZN0_niyOM7V5LR8j_x3cOxkboE/edit?usp=sharing We decided to try option 2 after all, after playing around with a "frag size" of 320. I made a build (number 21) and deployed it to dvn-build and dataverse-internal, where I also set the frag size to 320. I also documented Passing to QA. |
Hi @pdurbin and @eaquigley , I just wanted to say that I'm in favor of "option 2" as discussed, and I'm glad to hear that we're moving in that direction. Garth |
@garthg cool. Thanks. Meanwhile, I've been playing around with Solr, trying to understand some strange fragsize behavior I was demonstrating to @eaquigley and @mheppler yesterday. "The size, in characters, of the snippets (aka fragments) created by the highlighter" is what fragsize means according to https://wiki.apache.org/solr/HighlightingParameters#hl.fragsize . Setting fragsize to 0 should mean "the whole field value should be used with no fragmenting". This works fine. 100 is the default so we see
... and when I bump fragsize to 110 I get fewer characters, only 10 for Here are the curl commands I'm using:
I should probably report this on the Solr mailing list but using the example data that ships with Solr. |
Yeah, not hard to reproduce with the sample data from Solr. I just emailed the solr list about it: http://lucene.472066.n3.nabble.com/unexpected-hl-fragsize-behavior-td4216356.html |
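One way to make this kind of comparison concrete is to measure the snippet lengths Solr returns for each fragsize. The sketch below assumes the standard JSON response layout (a top-level `highlighting` object keyed by document id, then by field) and the default `<em>` highlight markup. The sample response is made up for illustration; it is not output from the commands mentioned above.

```python
import re


def snippet_lengths(response, field):
    """Map each document id to the lengths of its highlight snippets.

    Lengths are measured after removing the default <em>...</em> markup,
    so they can be compared against the requested hl.fragsize.
    """
    lengths = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        snippets = fields.get(field, [])
        lengths[doc_id] = [len(re.sub(r"</?em>", "", s)) for s in snippets]
    return lengths


# Hypothetical response fragment, shaped like Solr's JSON highlighting output.
sample = {
    "highlighting": {
        "doc1": {"title": ["Petition of William Lloyd <em>Garrison</em>"]},
    }
}

print(snippet_lengths(sample, "title"))
```

Running the same function over responses for fragsize 100 and 110 would show directly whether the larger setting really yields shorter snippets.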
My searches for titles are returning full titles in the results. |
I'm still seeing this bug in production if I scroll halfway down the page at https://dataverse.harvard.edu/dataverse/antislaverypetitionsma?q=garrison My guess is that we need to set the SearchHighlightFragmentSize per http://guides.dataverse.org/en/4.1/installation/installation-main.html#searchhighlightfragmentsize
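As a rough sketch of what setting that value could look like, the code below builds (but does not send) an HTTP PUT request carrying the fragment size of 320 tried earlier in this thread. The base URL and the `/api/admin/settings/` path are assumptions and may differ between Dataverse releases; the installation guide linked above is the authority on the exact endpoint.

```python
from urllib.request import Request


def fragment_size_request(base_url, size):
    """Build a PUT request that would set :SearchHighlightFragmentSize.

    The endpoint path is an assumption for illustration; check the
    installation guide for the path used by a given release.
    """
    url = f"{base_url}/api/admin/settings/:SearchHighlightFragmentSize"
    return Request(url, data=str(size).encode("utf-8"), method="PUT")


# Hypothetical local installation; 320 is the value tried on the test servers.
req = fragment_size_request("http://localhost:8080", 320)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it, which is left out here so the sketch has no side effects.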
OK, ran the update and full titles are now showing for this test case. Closing. |
Hi,
When searching in a Dataverse, the search results are shown in a list in the right-hand pane. It appears that each result is having its title truncated sometimes, which makes the results unclear. For us it would be preferable if the full title was always shown. See attached screenshot for an example.
Thanks,
Garth