
Solr highlighting #4557

@pdurbin


In #4158 and pull request #4520 we are upgrading from Solr 4.6.0 to Solr 7.2.1 (the latest, as of this writing), and we're seeing some odd behavior in the Solr "highlighting" feature, which we use in Dataverse to show people which fields matched their query. For example, when searching for "brown bag", the results show "Filename Without Extension: Brown bag" in the search card:

[Screenshot, 2018-03-29: search card showing "Filename Without Extension: Brown bag"]

If you add show_relevance=true to the Search API, you can see the matches there as well:

[Screenshot, 2018-03-29: Search API output with show_relevance=true]

The example above is from Dataverse 4.8.4 running Solr 4.6.0.
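For anyone reproducing this, the Search API call with relevance details turned on can be sketched as below. This is a minimal sketch: the default base URL (`http://localhost:8080`) is an assumption about a local install, `search_url` is a hypothetical helper, and the query must already be URL-encoded (`brown+bag`).

```shell
# Assumed base URL of a local Dataverse install; override via DATAVERSE_URL.
DATAVERSE_URL="${DATAVERSE_URL:-http://localhost:8080}"

# Hypothetical helper: print a Search API URL with show_relevance=true.
# $1 is the query, already URL-encoded (e.g. brown+bag).
search_url() {
  printf '%s/api/search?q=%s&show_relevance=true\n' "$DATAVERSE_URL" "$1"
}

# Usage (requires a running Dataverse and jq):
# curl -s "$(search_url 'brown+bag')" | jq '.data.items'
```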

As of a088d5e in the 4158-update-solr branch, which uses Solr 7.2.1, we're seeing some unexpected highlighting behavior. I wondered whether highlighting in Solr 7 is broken, deprecated, or completely different from Solr 4, so I took Dataverse out of the equation and used the "hello world" examples that ship with Solr to see whether highlighting works. Highlighting seems to work just fine in both Solr 4.6.0 and Solr 7.2.1 when I use their stock config and examples. Here are my results:

Solr 4.6.0

cd solr-4.6.0/example
java -jar start.jar &
cd exampledocs
java -jar post.jar mp500.xml 

curl -s 'http://localhost:8983/solr/collection1/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=printer' | jq '.highlighting'

{
  "0579B002": {
    "features": [
      "Multifunction ink-jet color photo <em>printer</em>"
    ],
    "cat": [
      "<em>printer</em>"
    ],
    "name": [
      "Canon PIXMA MP500 All-In-One Photo <em>Printer</em>"
    ]
  }
}

Solr 7.2.1

cd solr-7.2.1
bin/solr -e techproducts

curl -s 'http://localhost:8983/solr/techproducts/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=video' | jq '.highlighting'

{
  "MA147LL/A": {
    "name": [
      "Apple 60 GB iPod with <em>Video</em> Playback Black"
    ],
    "features": [
      "Stores up to 15,000 songs, 25,000 photos, or 150 hours of <em>video</em>"
    ]
  },
  "EN7800GTX/2DHTV/256M": {
    "features": [
      "Dual DVI connectors, HDTV out, <em>video</em> input"
    ]
  },
  "100-435805": {
    "name": [
      "ATI Radeon X1900 XTX 512 MB PCIE <em>Video</em> Card"
    ]
  }
}
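To compare these results across versions (and later against Dataverse's own core), the two queries above could be wrapped in small shell helpers. This is a sketch under stated assumptions: `hl_url` and `hl_fields` are hypothetical helper names, the default `SOLR_URL` assumes the local port used in the examples, and `hl_fields` relies on jq, as the commands above already do.

```shell
# Assumed Solr base URL, matching the examples above; override via SOLR_URL.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: build the same select URL as above for a core ($1)
# and query ($2), asking Solr to highlight matches in every field (hl.fl=*).
hl_url() {
  printf '%s/%s/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=%s\n' \
    "$SOLR_URL" "$1" "$2"
}

# Hypothetical helper: read a Solr JSON response on stdin and print one
# "docId: field" line per highlighted field (field names sorted per doc).
hl_fields() {
  jq -r '.highlighting | to_entries[] | .key as $id | .value | keys[] | "\($id): \(.)"'
}

# Usage:
# curl -s "$(hl_url collection1 printer)" | hl_fields   # Solr 4.6.0
# curl -s "$(hl_url techproducts video)" | hl_fields    # Solr 7.2.1
```

Because `hl_fields` prints only document IDs and field names, output from the stock example and from Dataverse's core can be compared with `diff` without the snippet text getting in the way.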

From here we need to decide on the next steps.

Do we care that highlighting isn't working as well in Dataverse with pull request #4520?

If we want highlighting to work as it did previously, what are the next steps? I would say we should figure out how our custom config differs from the Solr "techproducts" example above.
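A rough first pass at that comparison might look like the following, assuming local copies of both configs. The stock configset path and the Dataverse config path are placeholders, and `unstored_fields` is a hypothetical helper. It leans on the fact that Solr's default highlighter builds snippets from stored field values, so a field declared with `stored="false"` cannot be highlighted.

```shell
# Placeholder paths (assumptions); point them at your own checkouts.
STOCK_CONF="${STOCK_CONF:-solr-7.2.1/server/solr/configsets/sample_techproducts_configs/conf}"
DV_CONF="${DV_CONF:-path/to/dataverse/solr/conf}"

# Hypothetical helper: list <field> definitions in schema file $1 that are
# explicitly not stored. Fields without a stored attribute inherit it from
# their fieldType, so they are not listed here.
unstored_fields() {
  grep -n '<field [^>]*stored="false"' "$1"
}

# Usage (file names are assumptions; the stock configset uses managed-schema):
# diff -u "$STOCK_CONF/solrconfig.xml" "$DV_CONF/solrconfig.xml"
# unstored_fields "$DV_CONF/schema.xml"
```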
