Highlighting Issue (v. 2.2.0) #12

Closed
dripp1 opened this issue Feb 1, 2016 · 17 comments

Comments

dripp1 commented Feb 1, 2016

Thank you for the plugin, but I have an issue with highlighting of matched terms. I use version 2.2.0.
When I run the following query:

{
  "query": {
    "bool": {
      "should": {
        "multi_match": {
          "query": "chữ",
          "fields": [
            "message",
            "user"
          ],
          "analyzer": "default_search"
        }
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "<b>"
    ],
    "post_tags": [
      "</b>"
    ],
    "fragment_size": 0,
    "number_of_fragments": 0,
    "require_field_match": false,
    "fields": {
      "message": {},
      "user": {}
    }
  }
}

I get the following result, where the highlighting offsets are wrong and they wrap the wrong words:

{
    "took" : 14,
    "timed_out" : false,
    "_shards" : {
        "total" : 200,
        "successful" : 200,
        "failed" : 0
    },
    "hits" : {
        "total" : 1,
        "max_score" : 0.030578919,
        "hits" : [{
                "_index" : "fts-vietnamese",
                "_type" : "Document",
                "_id" : "AVKcb6Xy0-uCokJzleqC",
                "_score" : 0.030578919,
                "_source" : {
                    "streamId" : 1,
                    "language" : "vietnamese",
                    "message" : "Có một vấn đề là khi sent text messages dùng tiếng Việt hoặc email qua người khác, chữ tiếng Việt bị mất dấu hoặc mất chữ. Chẳng hạn như chữ “ôm” thì thành",
                    "doc_id" : "VietnameseWords"
                },
                "highlight" : {
                    "message" : [
                        "Có một vấn đề là khi sent text messages dùng tiếng Việt hoặc email <b>qua</b> người khác, chữ tiếng V<b>iệt</b> bị mất dấu h<b>oặc</b> mất chữ. Chẳng hạn như chữ “ôm” thì thành"
                    ]
                }
            }
        ]
    }
}

Can you please advise whether this is a bug that you can fix or something that I can configure in my code?
Thank you!
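
To make the symptom concrete, here is a minimal sketch of how a highlighter that relies on character offsets wraps text. It is not Elasticsearch's highlighter: the function `wrap_highlights` and the 8-character drift are hypothetical, chosen only to show that correct tags at wrong offsets end up around the wrong words, as in the result above.

```python
import unicodedata


def wrap_highlights(text, spans, pre="<b>", post="</b>"):
    """Wrap each (start, end) character span of text in pre/post tags.

    Assumes the spans do not overlap.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        out.append(text[cursor:start])             # untouched text before the span
        out.append(pre + text[start:end] + post)   # the highlighted slice
        cursor = end
    out.append(text[cursor:])                      # remainder after the last span
    return "".join(out)


# Normalize to NFC so that each accented letter is a single code point.
text = unicodedata.normalize("NFC", "mất dấu hoặc mất chữ")
term = unicodedata.normalize("NFC", "chữ")

start = text.index(term)
good = [(start, start + len(term))]            # offsets that match the term
bad = [(start - 8, start - 8 + len(term))]     # hypothetical drift of 8 characters

print(wrap_highlights(text, good))
print(wrap_highlights(text, bad))
```

With the correct span the tags surround `chữ`; with the shifted span they land inside `hoặc`, even though the matched term and the tag strings are unchanged.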

Owner

duydo commented Feb 2, 2016

I'll have a look at it. Thanks for the report.

Author

dripp1 commented Feb 3, 2016

Thank you. Also, while you are at it, perhaps you could make a version compatible with Elasticsearch 2.2.0, as the current version only works with 2.1.1 and I get this error:

ERROR: Plugin [elasticsearch-analysis-vietnamese] is incompatible with Elasticsearch [2.2.0]. Was designed for version [2.1.1]

Thanks!

Owner

duydo commented Feb 17, 2016

@dripp1 The plugin now supports ES 2.2.0

@nguyenchiencong

👍

Author

dripp1 commented Feb 23, 2016

Thank you!

Author

dripp1 commented Apr 10, 2016

The plugin indeed now works on ES 2.2.0 and indexing is done, but the above highlighting issue still exists. It would be great if you could fix this, as it makes this important functionality unusable.
Thank you.

dripp1 changed the title from "Highlighting Issue (v. 2.1.1)" to "Highlighting Issue (v. 2.2.0)" on Apr 18, 2016
Author

dripp1 commented Apr 18, 2016

Hi @duydo - is there a chance that you will have time soon to look at this issue?
The plugin is really not of much use to us as long as this issue exists. Thanks in advance!

Owner

duydo commented Apr 19, 2016

@dripp1 I have not found the cause of this; it seems to come from Elasticsearch itself. I suspect the highlighter does not handle Vietnamese characters correctly. I will contact some people at Elastic this week.

Author

dripp1 commented Apr 19, 2016

Thank you very much, it will be great if a solution is found.

Author

dripp1 commented May 2, 2016

Hi @duydo - did the Elastic people provide a solution to this?

Author

dripp1 commented May 24, 2016

@duydo please see the answer from ES support to this question at https://discuss.elastic.co/t/highlighting-offset-issue-with-vietnamese/50421. They think the problem is in the analyzer/tokenizer, which produces wrong offsets. It would be great if you could fix this.
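
One way to check that diagnosis is to compare each token's reported offsets with the text they are supposed to cover. The sketch below assumes tokens shaped like the entries returned by Elasticsearch's `_analyze` API (`token`, `start_offset`, `end_offset`); the helper `find_bad_offsets` and the sample token list are hypothetical and hand-written for illustration, not output from the plugin. The comparison is only a heuristic, since an analyzer may legitimately change a token's text (for example by lowercasing or by joining words).

```python
import unicodedata


def _norm(s):
    """Normalize for comparison: NFC form, lower case."""
    return unicodedata.normalize("NFC", s).lower()


def find_bad_offsets(text, tokens):
    """Return (token, covered_text) pairs where the offsets do not cover the token."""
    bad = []
    for tok in tokens:
        covered = text[tok["start_offset"]:tok["end_offset"]]
        if _norm(covered) != _norm(tok["token"]):
            bad.append((tok["token"], covered))
    return bad


text = unicodedata.normalize("NFC", "mất chữ")

# Hand-written sample: the second token's offsets are shifted 2 characters left.
tokens = [
    {"token": "mất", "start_offset": 0, "end_offset": 3},
    {"token": "chữ", "start_offset": 2, "end_offset": 5},
]

for token, covered in find_bad_offsets(text, tokens):
    print(f"token {token!r} has offsets that cover {covered!r}")
```

If a check like this flags tokens for plain Vietnamese input, the offsets are already wrong before the highlighter uses them, which would point at the tokenizer rather than at the highlighter.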

Owner

duydo commented May 26, 2016

I'm working on it, hope it can be fixed soon. Thanks @dripp1

Author

dripp1 commented May 26, 2016

Awesome!

Author

dripp1 commented Jun 30, 2016

@duydo - Any news on this?

Author

dripp1 commented Jul 5, 2016

@duydo - any chance that you may have time to fix this issue? Without that - the plugin is pretty useless...

Owner

duydo commented Aug 22, 2016

@dripp1 The issue has been fixed.

duydo closed this as completed on Aug 22, 2016
Author

dripp1 commented Aug 23, 2016

Thank you
