Full unicode support #4

richard-willis opened this Issue Oct 22, 2012 · 2 comments


We need to support Unicode character sets. Unfortunately, JavaScript doesn't make it easy to work with Unicode, so we should consider using XRegExp (http://xregexp.com/).

After some initial testing, the XRegExp library appears to fit the job, but there are decisions we need to make before properly integrating it.

The spellchecker plugin currently relies on sending a string of words to the back-end service(s). This means we need to strip punctuation from the text first. Determining what counts as punctuation is the difficult part.

The XRegExp library gives us \p{P} (the Unicode punctuation category), but we don't want to strip punctuation that forms part of a word, e.g. "Here's". I can't even begin to decide how to handle this for languages other than English.
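To make the trade-off concrete, here is a minimal sketch of one possible rule: split on whitespace and strip punctuation only from the edges of each token, so word-internal punctuation such as the apostrophe in "Here's" survives. It is not the plugin's implementation. The function name `extractWords` is hypothetical, and the sketch assumes a JavaScript engine with native Unicode property escapes (the `u` flag) standing in for XRegExp's \p{P}.

```javascript
// Hypothetical helper, not part of the plugin.
// Assumes native support for \p{P} via the `u` flag instead of XRegExp.
function extractWords(text) {
  return text
    .split(/\s+/u)                                      // break the text into whitespace-separated tokens
    .map(token => token.replace(/^\p{P}+|\p{P}+$/gu, '')) // trim punctuation from both ends only
    .filter(token => token.length > 0);                 // drop tokens that were pure punctuation
}

console.log(extractWords('"Here\'s a well-known test," she said.'));
// [ "Here's", 'a', 'well-known', 'test', 'she', 'said' ]

console.log(extractWords('«Привет, мир!»'));
// [ 'Привет', 'мир' ]
```

This rule is only a starting point: it relies on whitespace between words, so it would not work for languages written without spaces, and it leaves symbols outside \p{P} (for example currency signs) untouched.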

I suggest we have a look at existing spellcheckers to determine how others handle this to come to a decision. The solution has to be generic for all languages.

I'm making changes related to Unicode support in the unicode-support branch.

Any advice or suggestions would be greatly appreciated.

@badsyntax badsyntax was assigned Oct 22, 2012

I've had to make some changes to the findAndReplaceDOMText library to support Unicode find and replace:
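The changes themselves are not reproduced here. As a rough illustration of why Unicode find and replace needs special handling at all: JavaScript's `\b` word boundary only recognises ASCII word characters, so a whole-word search for a Cyrillic word fails. The sketch below builds a whole-word pattern from Unicode letter and number classes instead. The helper name `wholeWordRegex` is hypothetical, and the sketch assumes native `\p{…}` escapes and lookbehind support rather than XRegExp or the actual findAndReplaceDOMText code.

```javascript
// Hypothetical helper, not taken from findAndReplaceDOMText.
// Builds a regex that matches `word` only when it is not adjacent to
// another Unicode letter or number, replacing the ASCII-only \b.
function wholeWordRegex(word) {
  // Escape regex syntax characters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`(?<![\\p{L}\\p{N}])${escaped}(?![\\p{L}\\p{N}])`, 'gu');
}

const sample = 'Привет мир, приветствие';

console.log(/\bмир\b/.test(sample));                // false: \b does not treat Cyrillic letters as word characters
console.log(wholeWordRegex('мир').test(sample));    // true
console.log(wholeWordRegex('привет').test(sample)); // false: only occurs inside "приветствие"
```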

@badsyntax badsyntax added a commit that referenced this issue Oct 25, 2012
@badsyntax Added partial Unicode support using XRegExp. Lots to clean up, but things are looking good. Refs #4
@badsyntax badsyntax added a commit that referenced this issue Oct 25, 2012
@badsyntax Updated examples: added russian examples; fixed google driver for unicode character sets. Refs #4

I've merged the 'unicode-support' branch into develop, as this feature seems to be working pretty well now, and I have written some tests for it.

@badsyntax badsyntax closed this Nov 1, 2012