Full-text search not working with diacritic characters #29

Closed
alexandernst opened this Issue May 25, 2012 · 7 comments

Owner

alexandernst commented May 25, 2012

The full-text search isn't working with diacritic characters (for example, searching for "Zaratan" finds no match in "Zaratán").
A possible solution is to replace those characters with the closest-looking ones and then search.
I already opened a feature request in underscore.string ( epeli/underscore.string#124 ).
@addyosmani What do you think about this one?

Owner

addyosmani commented May 25, 2012

How are you determining the closest-looking ones? We could do something like the approach at http://stackoverflow.com/questions/863800/replacing-diacritics-in-javascript but my worry is the overhead it would bring. If, however, we include it as an optional thing you can opt into (default is opted out), we could implement something like this for search for sure.
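To make the opt-in idea concrete, here is a rough sketch of what "default is opted out" could look like. `filterItems` and its `normalize` option are hypothetical names invented for illustration, not the paginator's actual API, and `stripCombiningMarks` relies on `String.prototype.normalize`, which only exists in ES2015+ environments.

```javascript
// Hypothetical helper: the name and the `normalize` option are assumptions,
// not part of any real library.
function filterItems(items, query, options) {
  options = options || {};
  // Opted out by default: the identity function leaves strings untouched,
  // so callers who never pass `normalize` pay no extra cost.
  var normalize = typeof options.normalize === 'function'
    ? options.normalize
    : function (s) { return s; };

  var needle = normalize(query).toLowerCase();
  var result = [];
  for (var i = 0; i < items.length; i++) {
    if (normalize(items[i]).toLowerCase().indexOf(needle) !== -1) {
      result.push(items[i]);
    }
  }
  return result;
}

// One possible normalizer (assumes an ES2015+ runtime): decompose each
// character, then drop the combining marks in the U+0300..U+036F range.
function stripCombiningMarks(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

var names = ['Zaratán', 'Madrid'];
var strict = filterItems(names, 'Zaratan');
var folded = filterItems(names, 'Zaratan', { normalize: stripCombiningMarks });
```

With this shape the folding cost is only paid when a caller explicitly supplies a normalizer, and any replacement-map implementation could be plugged in instead.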

Owner

alexandernst commented May 25, 2012

Hi, that's the link I posted in the feature request in underscore.string :p
Yeah, that should be optional, as I guess it's not that cheap a function.
"Closest-looking" means "ä" -> "a", "é" -> "e" and so on...
I know some characters don't sound anything alike with and without those marks over them, but that's how someone could search for them without having to type them.
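As a minimal sketch of that mapping (not the Stack Overflow implementation, and with a deliberately tiny map), something along these lines would do the replacement before comparing. The names `DIACRITICS`, `removeDiacritics` and `matches` are made up for illustration, and the map assumes precomposed characters.

```javascript
// Illustrative and far from complete: a real map would cover many more characters.
var DIACRITICS = {
  'à': 'a', 'á': 'a', 'â': 'a', 'ã': 'a', 'ä': 'a', 'å': 'a',
  'è': 'e', 'é': 'e', 'ê': 'e', 'ë': 'e',
  'ì': 'i', 'í': 'i', 'î': 'i', 'ï': 'i',
  'ò': 'o', 'ó': 'o', 'ô': 'o', 'õ': 'o', 'ö': 'o',
  'ù': 'u', 'ú': 'u', 'û': 'u', 'ü': 'u',
  'ñ': 'n', 'ç': 'c'
};

// Replace every mapped non-ASCII character with its base letter,
// preserving case; unmapped characters pass through unchanged.
function removeDiacritics(str) {
  return str.replace(/[^\u0000-\u007f]/g, function (ch) {
    var lower = ch.toLowerCase();
    if (!DIACRITICS.hasOwnProperty(lower)) {
      return ch;
    }
    var base = DIACRITICS[lower];
    return ch === lower ? base : base.toUpperCase();
  });
}

// Case-insensitive substring search on the folded strings.
function matches(haystack, needle) {
  return removeDiacritics(haystack).toLowerCase()
    .indexOf(removeDiacritics(needle).toLowerCase()) !== -1;
}

var found = matches('Zaratán', 'Zaratan');
```

Folding both the haystack and the query means a user who does type "Zaratán" still gets the match.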

The other option (or maybe a mix of the two) is to use some type of Levenshtein search (optional, of course).
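For reference, a Levenshtein (edit distance) comparison could be sketched roughly like this. The function names and the `maxDistance` threshold are assumptions for illustration, and nothing here has been tuned for performance.

```javascript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  var prev = [];
  var j;
  for (j = 0; j <= b.length; j++) {
    prev[j] = j; // distance from the empty prefix of `a`
  }

  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical fuzzy match: accept anything within `maxDistance` edits.
function fuzzyEquals(a, b, maxDistance) {
  return levenshtein(a, b) <= maxDistance;
}

var distance = levenshtein('Zaratán', 'Zaratan');
```

Here "Zaratán" and "Zaratan" differ by a single substitution, so a threshold of 1 would already match them, though the same threshold would also accept unrelated one-letter typos.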

Let's see if underscore string team wants to implement that, and if so, we'll add the library as optional requirement, so if diacritic search is enabled, the library should be provided too.
If they don't want to implement it, I'll see how/where should we put it.

Regards!

Owner

addyosmani commented May 27, 2012

That sounds reasonable. I think it might come down to running some jsPerf tests on the algorithm we opt for vs. Levenshtein (personally I've had good results with it before, but haven't tested it with special characters).

Owner

alexandernst commented Jun 1, 2012

Update: epeli/underscore.string#124
The underscore.string creator won't (I think) merge both branches any time soon, for the reasons he gives there.
Maybe we should think about either:

a) write our own diacritic-chars-replace map
b) close this request as "wont-fix"

What do you think?

Owner

addyosmani commented Jun 1, 2012

So my response to this is:

  1. If you have the time to write a patch that is opt-in, I'd be happy to merge
  2. Otherwise let's just make a note in the wiki and say that the implementation on SO is worth checking out if someone wants to add this capability to their version of the paginator.

Make sense? :)

Owner

alexandernst commented Jun 1, 2012

I like the first option more, so I'll write an opt-in plugin-thingy :p
I hope I'll have it this weekend :)

Owner

addyosmani commented Jun 1, 2012

Excellent. That's great. Thanks!

addyosmani added a commit that referenced this issue Jun 4, 2012

Merge pull request #39 from alexandernst/master
Implement a plugin for issue #29
da8d2e7