Postings Highlighter does not highlight trailing wildcard matches #5127

Closed
roytmana opened this issue Feb 14, 2014 · 9 comments · Fixed by #5143

Comments

@roytmana

In 1.0 the postings highlighter does not highlight trailing wildcard matches. I tried both simple_query_string and query_string, and terms like photo* do not get highlighted.

@javanna
Member

javanna commented Feb 14, 2014

Hi @roytmana,
do you mean that highlighting was working with the same query in 0.90? Can you post a recreation, please?

@roytmana
Author

I have already migrated everything, including index metadata, to 1.0, so I can't confirm with 100% certainty that it was working in 0.90. But if you recall, you and I worked on exactly the same issue a while ago, and I believe it was fixed.

Before I start working on a recreation (I need to put it together from scratch): do you think it should NOT work by design?

@s1monw
Contributor

s1monw commented Feb 14, 2014

Wait, that has just been implemented in Lucene. I don't think we have support for multi term queries (MTQ) in the postings highlighter yet? This is coming with Lucene 4.7.

@javanna
Member

javanna commented Feb 14, 2014

@roytmana I'm asking because I remember working on this, and we didn't touch anything in 1.0, so I expect it to work on both 0.90 and 1.0. We also have tests for this, and they have always been green.

@s1monw We have our own custom postings highlighter, to which we added support for wildcards a while ago. Once Lucene 4.7 is released I'll have a look at this again, though ;)

@javanna javanna self-assigned this Feb 14, 2014
@roytmana
Author

@javanna Let me create a recreation and test it with the highlighter type specified explicitly in the request. Maybe something else has changed. I will post it shortly.

@roytmana
Author

It does not work. Here is a recreation. (Note: I could not test the actual curl commands, because curl on Windows does not take the JSON bodies as written, so I used different tools. Excuse me if the curl syntax is broken.)

curl  -XDELETE http://localhost:9200/test

curl  -XPOST http://localhost:9200/test -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "ht": {
      "dynamic": "strict",
      "properties": {
        "name": {
          "type": "string",
          "index_options": "offsets"
        }
      }
    }
  }
}'

curl  -XPOST "http://localhost:9200/test/ht?refresh=true" -d '{"name":"photo equipment"}'
curl  -XPOST "http://localhost:9200/test/ht?refresh=true" -d '{"name":"photography"}'

curl -XPOST "http://localhost:9200/test/ht/_search" -d'
{
   "query": {
      "bool": {
         "should": [
            {
               "simple_query_string": {
                  "fields": [
                     "_all"
                  ],
                  "query": "photo"
               }
            }
         ]
      }
   },
   "highlight": {
      "fields": {
         "name": {
            "type": "postings"
         }
      }
   }
}'

curl -XPOST "http://localhost:9200/test/ht/_search" -d'
{
   "query": {
      "bool": {
         "should": [
            {
               "simple_query_string": {
                  "fields": [
                     "_all"
                  ],
                  "query": "photo*"
               }
            }
         ]
      }
   },
   "highlight": {
      "fields": {
         "name": {
            "type": "postings"
         }
      }
   }
}'

The first query (no wildcard) returned a highlight:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.625,
      "hits": [
         {
            "_index": "test",
            "_type": "ht",
            "_id": "XU_c0rhUSBiu2KfVPjP-sg",
            "_score": 0.625,
            "_source": {
               "name": "photo equipment"
            },
            "highlight": {
               "name": [
                  "<em>photo</em> equipment"
               ]
            }
         }
      ]
   }
}

The second query (photo*) did not:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test",
            "_type": "ht",
            "_id": "XU_c0rhUSBiu2KfVPjP-sg",
            "_score": 1,
            "_source": {
               "name": "photo equipment"
            }
         },
         {
            "_index": "test",
            "_type": "ht",
            "_id": "tJOe5F7kQJiSMjOvTNPwpg",
            "_score": 1,
            "_source": {
               "name": "photography"
            }
         }
      ]
   }
}

@roytmana
Author

Another observation, not directly related to the wildcard issue.

When the query runs on the _all field while highlighting is requested on specific fields that contribute to _all, it works well when those fields are of string type. When the fields are numeric or date, there is no highlighting. However, highlighting does work when searching on those numeric/date fields individually.

I guess it can't be helped because the field type is lost in _all? But if it did work, it would be really great.

Please let me know whether I should create a ticket for it.

@javanna
Copy link
Member

javanna commented Feb 17, 2014

Hi @roytmana, thanks for the recreation, I'm looking into this.
The problem is the same on both 0.90 and 1.0: wildcards do work, but only when they are in the top-level query :) and not within compound queries. The fact that you query a specific type makes it a filtered query, which triggers this issue.
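To make the failure mode concrete, here is a minimal sketch using a hypothetical, simplified query model (`PrefixQuery`, `FilteredQuery`, `parse_search` and `top_level_multi_terms` are illustrative names, not the Lucene or Elasticsearch API). It assumes, as described above, that searching `/index/type/_search` wraps the user's query in a type filter, and that the pre-fix extraction only inspects the top-level query.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified query model (NOT the Lucene/Elasticsearch API).
@dataclass
class PrefixQuery:
    field: str
    prefix: str


@dataclass
class FilteredQuery:
    query: object
    type_filter: str


def parse_search(query, doc_type: Optional[str] = None):
    """Assumption: searching /index/type/_search wraps the query in a type filter."""
    return query if doc_type is None else FilteredQuery(query, doc_type)


def top_level_multi_terms(query):
    """Pre-fix behaviour as described: only a top-level multi term query is seen."""
    return [query] if isinstance(query, PrefixQuery) else []


photo = PrefixQuery("_all", "photo")
print(len(top_level_multi_terms(parse_search(photo))))        # 1: found, highlighted
print(len(top_level_multi_terms(parse_search(photo, "ht"))))  # 0: wrapped, missed
```

The same miss happens for a `bool` wrapper like the one in the recreation, since the check never looks below the top level.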

javanna added a commit that referenced this issue Feb 21, 2014
…level queries

In #4052 we added support for highlighting multi term queries using the postings highlighter. That only worked for top-level queries, though, not for multi term queries nested, for instance, within a bool query, a filtered query, or a constant score query.

This works by walking the query structure and temporarily overriding the rewrite method of each multi term query with one that allows the matching terms to be extracted.

Closes #5127
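The approach in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Elasticsearch code: the query classes, the `"constant_score"`/`"scoring_boolean"` rewrite-method names, and the `index_terms` dictionary standing in for the term dictionary are all hypothetical. The sketch assumes the default rewrite hides the expanded terms (as a filter would), so the walker swaps in a term-exposing rewrite for each nested prefix query and restores the original afterwards.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical, simplified query model (NOT the Lucene/Elasticsearch API).
@dataclass
class TermQuery:
    field: str
    term: str


@dataclass
class PrefixQuery:
    field: str
    prefix: str
    # Assumed default: rewrites to a filter, so the expanded terms are not visible.
    rewrite_method: str = "constant_score"


@dataclass
class BoolQuery:
    should: List["Query"]


@dataclass
class FilteredQuery:
    query: "Query"
    filter: object  # filters contribute no terms to highlight


@dataclass
class ConstantScoreQuery:
    query: "Query"


Query = Union[TermQuery, PrefixQuery, BoolQuery, FilteredQuery, ConstantScoreQuery]


def rewrite(q: PrefixQuery, index_terms) -> List[str]:
    """Expand a prefix against the indexed terms, honouring the rewrite method."""
    if q.rewrite_method == "constant_score":
        return []  # terms hidden inside a filter
    return sorted(t for t in index_terms.get(q.field, ()) if t.startswith(q.prefix))


def extract_terms(q, index_terms, out):
    """Walk the whole query tree, not just the top level, collecting terms."""
    if isinstance(q, TermQuery):
        out.append(q.term)
    elif isinstance(q, PrefixQuery):
        saved = q.rewrite_method
        q.rewrite_method = "scoring_boolean"  # temporary, term-exposing override
        try:
            out.extend(rewrite(q, index_terms))
        finally:
            q.rewrite_method = saved  # restore so query execution is unaffected
    elif isinstance(q, BoolQuery):
        for clause in q.should:
            extract_terms(clause, index_terms, out)
    elif isinstance(q, (FilteredQuery, ConstantScoreQuery)):
        extract_terms(q.query, index_terms, out)
    return out


index_terms = {"_all": {"photo", "equipment", "photography"}}
query = FilteredQuery(query=BoolQuery(should=[PrefixQuery("_all", "photo")]),
                      filter={"type": "ht"})
print(extract_terms(query, index_terms, []))  # ['photo', 'photography']
```

The `try`/`finally` restore reflects the "temporarily" in the commit message: the override only needs to last for term extraction, while the search itself keeps the query's original rewrite method.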
@javanna
Member

javanna commented Feb 21, 2014

This was solved in #5143.

@javanna javanna closed this as completed Feb 21, 2014