In #4158 and pull request #4520 we are upgrading from Solr 4.6.0 to Solr 7.2.1 (the latest, as of this writing) and we're seeing some odd behavior in the Solr "highlighting" feature, which we use in Dataverse to show people which fields matched their query. For example, when searching for "brown bag" the results show "Filename Without Extension: Brown bag" in the search card:

If you add show_relevance=true to the Search API, you can see the matches there as well:

The example above is from Dataverse 4.8.4 running Solr 4.6.0.
As of a088d5e in the 4158-update-solr branch, which uses Solr 7.2.1, we're seeing some unexpected highlighting behavior. I was wondering if highlighting in Solr 7 is broken, deprecated, or completely different from Solr 4, so I took Dataverse out of the equation and used the "hello world" examples that ship with Solr to see whether highlighting works. Highlighting seems to work just fine in both Solr 4.6.0 and Solr 7.2.1 when I use their stock config and examples. Here are my results:
Solr 4.6.0
cd solr-4.6.0/example
java -jar start.jar &
cd exampledocs
java -jar post.jar mp500.xml
curl -s 'http://localhost:8983/solr/collection1/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=printer' | jq '.highlighting'
{
  "0579B002": {
    "features": [
      "Multifunction ink-jet color photo <em>printer</em>"
    ],
    "cat": [
      "<em>printer</em>"
    ],
    "name": [
      "Canon PIXMA MP500 All-In-One Photo <em>Printer</em>"
    ]
  }
}
Solr 7.2.1
cd solr-7.2.1
bin/solr -e techproducts
curl -s 'http://localhost:8983/solr/techproducts/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=video' | jq '.highlighting'
{
  "MA147LL/A": {
    "name": [
      "Apple 60 GB iPod with <em>Video</em> Playback Black"
    ],
    "features": [
      "Stores up to 15,000 songs, 25,000 photos, or 150 hours of <em>video</em>"
    ]
  },
  "EN7800GTX/2DHTV/256M": {
    "features": [
      "Dual DVI connectors, HDTV out, <em>video</em> input"
    ]
  },
  "100-435805": {
    "name": [
      "ATI Radeon X1900 XTX 512 MB PCIE <em>Video</em> Card"
    ]
  }
}
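The two curl commands above differ only in the core name and the query term, so the check can be wrapped in a couple of small helpers and pointed at any core, including the one Dataverse uses. This is only a sketch: the function names `hl_query_url` and `hl_fields` are made up for illustration, the base URL assumes a local Solr on port 8983, and the query term is not URL-encoded, so it only suits simple single-word queries.

```shell
# Build the same highlighting query URL used in the curl commands above.
# $1 = core name, $2 = query term, $3 = optional Solr base URL (assumed default).
hl_query_url() {
  printf '%s/%s/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=%s\n' \
    "${3:-http://localhost:8983/solr}" "$1" "$2"
}

# Read a Solr JSON response on stdin and print one line per document
# listing the fields that came back highlighted: "<id>: field1,field2".
# jq's `keys` returns the field names sorted alphabetically.
hl_fields() {
  jq -r '.highlighting | to_entries[] | "\(.key): \(.value | keys | join(","))"'
}
```

With Solr running, something like `curl -s "$(hl_query_url techproducts video)" | hl_fields` should give a compact per-document list of highlighted fields that is easy to compare between the stock examples and our own core.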
From here we need to decide on the next steps.
Do we care that highlighting isn't working as well from Dataverse in pull request #4520?
If we want highlighting to work as it has previously, what are the next steps? I would say we should figure out how our custom config differs from the Solr "techproducts" example above.
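One possible way to start comparing configs is to pull the field definitions from both cores through Solr's Schema API and diff them. Highlighting needs the field to be stored, so the `stored` flag is one of the things worth looking at. The sketch below is an assumption-heavy starting point rather than a finished tool: the helper names are invented, the base URL assumes a local Solr, and the core name for the Dataverse side (for example `collection1`) has to be filled in with whatever the branch actually uses. Fields that do not set `stored` or `indexed` explicitly show up as `null` here, because those values are inherited from the field type.

```shell
# Build the Schema API URL that lists a core's field definitions.
# $1 = core name, $2 = optional Solr base URL (assumed default).
schema_fields_url() {
  printf '%s/%s/schema/fields?wt=json\n' "${2:-http://localhost:8983/solr}" "$1"
}

# Read a Schema API "fields" response on stdin and print one sorted line
# per field, so that two cores can be compared with a plain diff.
field_summary() {
  jq -r '.fields[] | "\(.name) type=\(.type) stored=\(.stored) indexed=\(.indexed)"' | sort
}

# Diff the field definitions of two cores.
# $1 and $2 are core names; both are assumed to live on the same Solr instance.
compare_cores() {
  cmp_a=$(mktemp) && cmp_b=$(mktemp) || return 1
  curl -s "$(schema_fields_url "$1")" | field_summary > "$cmp_a"
  curl -s "$(schema_fields_url "$2")" | field_summary > "$cmp_b"
  diff -u "$cmp_a" "$cmp_b"
  cmp_status=$?
  rm -f "$cmp_a" "$cmp_b"
  return "$cmp_status"
}
```

For example, `compare_cores collection1 techproducts` would print a unified diff of the two field lists. This only covers the schema; differences in solrconfig.xml, such as highlighting defaults on the request handler, would still need a separate look.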