
Full unicode support #4

Closed
richard-willis opened this Issue · 2 comments

Richard Willis

We need to support Unicode character sets. Unfortunately, JavaScript doesn't make it easy to work with Unicode in regular expressions, so we should consider using XRegExp (http://xregexp.com/).

After some initial testing, it appears the XRegExp library fits the job, but there are certain decisions we need to make before properly integrating it.

The spellchecker plugin currently relies on sending a string of words to the back-end service(s). This means we need to strip punctuation from the text before sending it. Determining what counts as punctuation is the difficult part.

The XRegExp library gives us \p{P} (the Unicode punctuation category), but we don't want to strip punctuation that forms part of a word, e.g. "Here's". I can't even begin to decide how to handle this for languages other than English.
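
One possible way to approach this is sketched below: rather than stripping punctuation, extract the words and treat a punctuation character as part of a word only when it has letters on both sides. This is only a heuristic, not a decision for the plugin. `extractWords` and `WORD` are hypothetical names, and the sketch assumes an engine with native Unicode property escapes (the `u` flag); XRegExp with its Unicode addons accepts the same \p{L}, \p{M} and \p{P} tokens.

```javascript
// Sketch only: extractWords and WORD are hypothetical, not part of the plugin.
// Assumes native Unicode property escapes (regex `u` flag).

// A word is a run of letters/combining marks, optionally followed by groups of
// one punctuation character plus another run of letters/marks. Punctuation at
// the start or end of a word is therefore dropped, while punctuation between
// letters (as in "Here's") is kept.
const WORD = /[\p{L}\p{M}]+(?:\p{P}[\p{L}\p{M}]+)*/gu;

function extractWords(text) {
  // String.prototype.match returns null when nothing matches.
  return text.match(WORD) || [];
}

console.log(extractWords("Here's a test, really!"));
console.log(extractWords("Привет, мир!"));
```

Known limitations of this heuristic: digits are ignored, and text with no space after a comma ("foo,bar") is kept as a single word, so it would still need checking against how existing spellcheckers tokenise text.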

I suggest we have a look at existing spellcheckers to determine how others handle this to come to a decision. The solution has to be generic for all languages.

I'm making the changes related to Unicode support in the unicode-support branch.

Any advice or suggestions would be greatly appreciated.

Richard Willis
Owner

I've had to make some changes to the findAndReplaceDOMText library to support Unicode find and replace:

Richard Willis
Owner

I've merged the 'unicode-support' branch into develop, as this feature seems to be working pretty well now, and I have written some tests for it.
