Merged
2 changes: 1 addition & 1 deletion README.md
@@ -72,4 +72,4 @@ This can cause problems when attempting to process Strings in order to detect ho
## Java and Unicode
Unfortunately, [Java also has a Unicode problem](https://docs.oracle.com/javase/tutorial/i18n/text/unicode.html). When the language was designed, the Unicode standard used only 16 bits to encode each character, so the corresponding Java char data type was specified as 16 bits as well. The Unicode standard has since been extended with many more characters, and more than 16 bits are now required to represent them all. This means that we must [be careful when handling Strings that contain high-value characters](https://docs.oracle.com/javase/tutorial/i18n/text/design.html): we can't rely, for example, on the .length() method returning the correct number of characters in a String, because it counts 16-bit char units rather than characters.

-This [Java class provides a homoglyph-aware search function](https://github.com/codebox/homoglyph/blob/master/java/src/Homoglyph.java) that correctly handles high-value Unicode characters by using the int datatype to represent codepoint values.
+This [Java class provides a homoglyph-aware search function](https://github.com/codebox/homoglyph/blob/master/src/main/java/net/codebox/homoglyph/Homoglyph.java) that correctly handles high-value Unicode characters by using the int datatype to represent codepoint values.
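
For readers of this hunk, the point about .length() and int code points can be illustrated with a minimal sketch. It uses only standard `java.lang.String` and `Character` methods; the class name `CodePointDemo` and its helper methods are hypothetical and are not taken from the linked Homoglyph class.

```java
// Hypothetical demo class, not part of the homoglyph project.
class CodePointDemo {

    // Counts Unicode code points instead of 16-bit char units.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Returns each code point as an int, so characters above U+FFFF
    // are kept whole instead of being split into surrogate pairs.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // U+1D400 (MATHEMATICAL BOLD CAPITAL A) needs two char units in a String.
        String s = "a" + new String(Character.toChars(0x1D400)) + "b";

        System.out.println(s.length());         // prints 4 (char units)
        System.out.println(codePointLength(s)); // prints 3 (code points)

        for (int cp : codePointsOf(s)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

Working with `int` values throughout, as the README describes, avoids comparing half of a surrogate pair against a homoglyph table entry.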