Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It'd be cool if the highlighter could combine fields #3750

Closed
nik9000 opened this issue Sep 20, 2013 · 12 comments
Closed

It'd be cool if the highlighter could combine fields #3750

nik9000 opened this issue Sep 20, 2013 · 12 comments

Comments

@nik9000
Copy link
Member

nik9000 commented Sep 20, 2013

So maybe I'm doing this wrong, but when I want to search across the same field with different analyzers run against it I use a multifield with the different analyzers setup in the mapping and then query the field who's analyzer I want to use. This ends up highlighting the field twice - once per subfield. It'd be cool if I could combine those highlight operations into one so the snippets would be combined together and sorted together.

This is an example of how I build the query: https://gist.github.com/nik9000/6638883

@javanna
Copy link
Member

javanna commented Sep 20, 2013

Interesting point @nik9000 ! Did you have the chance to play around with this already? How would we sort the snippets coming from different fields though? I guess score is the only option and depends on the specific scoring algo provided by the highlighter implementation. For instance, this wouldn't work using different highlighters per field (e.g. one field with plain and another one with plain). Also, I'd need to look deeper at how the snippets score is computed to make sure this would work cross-field.

@ghost ghost assigned javanna Sep 20, 2013
@nik9000
Copy link
Member Author

nik9000 commented Sep 20, 2013

Interesting point @nik9000 ! Did you have the chance to play around with this already?

Not really. I wanted to make sure that what I was doing in the query made sense. I'll have a look at some point soon though.

How would we sort the snippets coming from different fields though?

Probably score. I imagine you could use document order too. What is the default? I never use it.

this wouldn't work using different highlighters per field

Indeed. This would be an implementation feature strapped on to each highlighter. At least, that is how it makes sense to me.

I'd need to look deeper at how the snippets score is computed to make sure this would work cross-field.

Yeah.

@nik9000
Copy link
Member Author

nik9000 commented Sep 23, 2013

I've put together a proof of concept for this with the FVH which github seems to have referenced above. It is surprisingly well suited for this. I'm not sure about the plain highlighter though.

@nik9000
Copy link
Member Author

nik9000 commented Sep 23, 2013

For the FVH the POC works by adding matches from the child field during the matched term identification step then pretty much letting the FVH do its job from there on out.
For the Plain Highlighter the POC works by creating a merged token stream from the parent and child fields and feeding it to the highlighter to highlight. This currently only works with requireFieldMatch = false.

Both of these implementations look like they will work but I'm not 100% sure they don't cast too broad a net.

@nik9000
Copy link
Member Author

nik9000 commented Sep 30, 2013

I know it is a lot of work, but would it be possible for someone to have a look at the proof of concept? I'm happy spend some time implementing this but I'd love to know if I'm on the right track.

@jpountz
Copy link
Contributor

jpountz commented Oct 2, 2013

This feature makes a lot of sense to me, but I think the only way to do it correctly would be to fix the highlighters to support several index fields for a single stored field in Lucene. For example, the merged token stream trick can create broken token streams: token streams are supposed to have consistent positions and offsets (meaning that tokens that start at the same position must have the same start offset and tokens that end at the same position must have the same end offset) which won't be the case if the two fields don't use the same tokenizer.

@nik9000
Copy link
Member Author

nik9000 commented Oct 2, 2013

This feature makes a lot of sense to me, but I think the only way to do it correctly would be to fix the highlighters to support several index fields for a single stored field in Lucene. For example, the merged token stream trick can create broken token streams: token streams are supposed to have consistent positions and offsets (meaning that tokens that start at the same position must have the same start offset and tokens that end at the same position must have the same end offset) which won't be the case if the two fields don't use the same tokenizer.

That makes sense to me. The merged token stream thing felt more like a hack then the way the I had the FVH doing it. The FVH actually seems more suited to this then the plain highlighter given the way it separates its decision making.

I'll have another look at all this in a few hours and see what I can come up with.

For reference, is this worth doing if it is only possible to cleanly implement it in the FVH? It might turn out that the plain highlighter just isn't amenable to this without a more serious overhaul that might not be worth implementing.

@jpountz
Copy link
Contributor

jpountz commented Oct 3, 2013

For reference, is this worth doing if it is only possible to cleanly implement it in the FVH? It might turn out that the plain highlighter just isn't amenable to this without a more serious overhaul that might not be worth implementing.

I think it is ok if only fvh supports it.

@nik9000
Copy link
Member Author

nik9000 commented Oct 8, 2013

I had another look at this today. I was able to get the fvh working pretty well. I think the plain highlighter is a lost cause though - it just isn't written for this kind of thing.

I went ahead and plumbed a list of fields to combine. I called it child_fields because those fields ought to be children of the main highlighted field. I'm not in love with the name but that can be fixed.

I'm not sure exactly what to do when the user sets up a bogus child field - right now I don't check for it and let the fvh blindly try to do its thing which either leads to shard failures with string index out of bounds exceptions or funky highlighting. Not good.

Given that I'm likely going to give up on doing this in the plain highlighter I might have to revive #3757. I think I'll want this field combination in my multi-valued fields.

@ghost ghost assigned jpountz Oct 8, 2013
nik9000 added a commit to nik9000/elasticsearch that referenced this issue Nov 30, 2013
The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`.  This is most
intuitive for multifields that analyze the same string in different
ways.  Example:
{
    "query": {
        "query_string": {
            "query": "content.plain:running scissors",
            "fields": ["content"]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "content": {
                "matched_fields": ["content", "content.plain"],
                "type" : "fvh"
            }
        }
    }
}

Closes elastic#3750
@jpountz jpountz closed this as completed in 8e34057 Dec 3, 2013
jpountz pushed a commit that referenced this issue Dec 3, 2013
The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`.  This is most
intuitive for multifields that analyze the same string in different
ways.  Example:
{
    "query": {
        "query_string": {
            "query": "content.plain:running scissors",
            "fields": ["content"]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "content": {
                "matched_fields": ["content", "content.plain"],
                "type" : "fvh"
            }
        }
    }
}

Closes #3750
@clintongormley
Copy link

@nik9000 Did you have a look at how feasible it would be to support combining fields on the postings highlighter as well?

@nik9000
Copy link
Member Author

nik9000 commented Dec 17, 2013

@clintongormley I was thinking about it but I haven't had time to work on it. I'd like to, though.

@clintongormley
Copy link

i'd like you to as well :)

Go nik go!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`.  This is most
intuitive for multifields that analyze the same string in different
ways.  Example:
{
    "query": {
        "query_string": {
            "query": "content.plain:running scissors",
            "fields": ["content"]
        }
    },
    "highlight": {
        "order": "score",
        "fields": {
            "content": {
                "matched_fields": ["content", "content.plain"],
                "type" : "fvh"
            }
        }
    }
}

Closes elastic#3750
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants