Patchwork UTF-8
===============

The `Patchwork\Utf8` class implements nearly the complete set of PHP string
functions that need to be aware of UTF-8 grapheme clusters:

*strlen, substr, strpos, stripos, strrpos, strripos, strstr, stristr, strrchr,
strrichr, strtolower, strtoupper, htmlentities, htmlspecialchars, wordwrap, chr,
count_chars, ltrim, ord, rtrim, trim, html_entity_decode,
get_html_translation_table, str_ireplace, str_pad, str_shuffle, str_split,
str_word_count, strcmp, strnatcmp, strcasecmp, strnatcasecmp, strncasecmp,
strncmp, strcspn, strpbrk, strrev, strspn, strtr, substr_compare, substr_count,
substr_replace, ucfirst, lcfirst, ucwords*.
Missing are *printf*-family functions and *number_format*.

A few more functions are provided to help handle UTF-8 strings:

- *isUtf8()*: checks whether a string contains well-formed UTF-8
- *toASCII()*: generic UTF-8 to ASCII transliteration
- *bestFit()*: UTF-8 to code page conversion using best-fit mappings
- *strtocasefold()*: Unicode transformation for caseless matching
- *strtonatfold()*: generic case-sensitive transformation for collation matching
- *getGraphemeClusters()*: splits a string into an array of grapheme clusters

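To make "grapheme cluster awareness" concrete, the sketch below measures the same word in bytes, code points and grapheme clusters. It relies only on core PHP's PCRE support (PHP 7 or later for the `\u{...}` escapes) and is not the code used by `Patchwork\Utf8`; the helper names `splitGraphemeClusters` and `countCodePoints` are made up for this illustration.

```php
<?php
// Illustration only: uses core PHP's PCRE support, not Patchwork\Utf8 itself.

// Hypothetical helper: split a UTF-8 string into extended grapheme clusters.
function splitGraphemeClusters(string $s): array
{
    // \X matches one extended grapheme cluster; /u enables UTF-8 mode.
    preg_match_all('/\X/u', $s, $matches);
    return $matches[0] ?? [];
}

// Hypothetical helper: count Unicode code points.
function countCodePoints(string $s): int
{
    // With /u, "." matches one code point; /s lets it match newlines too.
    return (int) preg_match_all('/./us', $s);
}

// "déjà" written with combining accents (U+0301 and U+0300).
$word = "de\u{0301}ja\u{0300}";

echo strlen($word), "\n";                        // 8 bytes
echo countCodePoints($word), "\n";               // 6 code points
echo count(splitGraphemeClusters($word)), "\n";  // 4 grapheme clusters
```
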
These functions are all static methods of the `Patchwork\Utf8` class. The best
way to use them is to add `use Patchwork\Utf8 as u;` at the beginning of your
files, then prefix calls with `u::` wherever UTF-8 awareness is required. For
example, `echo strlen("déjà");` becomes `echo u::strlen("déjà");`.

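The aliasing pattern described above might look like the following sketch. It assumes the class is reachable through an autoloader that is not shown here; the `class_exists` guard is an addition for this example so that the script still runs when the library is absent. The value `4` for the second call is what the grapheme-aware behaviour described above implies, not a captured output.

```php
<?php
use Patchwork\Utf8 as u;

// "déjà" with precomposed accented letters (two bytes each in UTF-8).
$word = "d\u{00E9}j\u{00E0}";

// The native function counts bytes.
echo strlen($word), "\n"; // 6

// Assumption: an autoloader for Patchwork\Utf8 has been registered elsewhere.
if (class_exists(u::class)) {
    // Should report 4, one per visible character.
    echo u::strlen($word), "\n";
}
```
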
Portability
-----------

`Patchwork\Utf8` relies on the `mbstring`, `iconv` and `intl` PHP extensions.

When one or more of these extensions are missing, partial fallback
implementations written in PHP are provided:

- `mbstring`: *mb_convert_encoding, mb_decode_mimeheader, mb_encode_mimeheader,
mb_convert_case, mb_internal_encoding, mb_list_encodings, mb_strlen,
mb_strpos, mb_strrpos, mb_strtolower, mb_strtoupper, mb_substitute_character,
mb_substr, mb_stripos, mb_stristr, mb_strrchr, mb_strrichr, mb_strripos,
mb_strstr*.
- `iconv`: *iconv, iconv_mime_decode, iconv_mime_decode_headers,
iconv_get_encoding, iconv_set_encoding, iconv_mime_encode, ob_iconv_handler,
iconv_strlen, iconv_strpos, iconv_strrpos, iconv_substr*.
- `intl`: *Normalizer, grapheme_extract, grapheme_stripos, grapheme_stristr,
grapheme_strlen, grapheme_strpos, grapheme_strripos, grapheme_strrpos,
grapheme_strstr, grapheme_substr*.

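To see which of the three extensions a given PHP installation actually provides, and therefore where the fallbacks listed above would come into play, a quick check along these lines can help. The function name `missingExtensions` is hypothetical; only the standard `extension_loaded()` call is used.

```php
<?php
// Hypothetical helper: return the names of required extensions that are not loaded.
function missingExtensions(array $required = ['mbstring', 'iconv', 'intl']): array
{
    return array_values(array_filter($required, function ($ext) {
        return !extension_loaded($ext);
    }));
}

$missing = missingExtensions();

echo $missing
    ? 'Fallbacks would be needed for: ' . implode(', ', $missing)
    : 'mbstring, iconv and intl are all loaded natively', "\n";
```
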
Fallback implementations for `utf8_encode` and `utf8_decode` are also provided,
*enhanced to handle Windows-1252 instead of ISO-8859-1*.

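The practical difference between the two source encodings shows up in bytes 0x80 to 0x9F, which ISO-8859-1 maps to control characters and Windows-1252 maps to printable ones such as the euro sign. The sketch below illustrates that difference with two hypothetical functions and a deliberately partial lookup table; it is not the library's fallback code.

```php
<?php
// Illustrative subset of the Windows-1252 bytes that differ from ISO-8859-1.
const CP1252_EXTRAS = [
    "\x80" => "\u{20AC}", // euro sign
    "\x85" => "\u{2026}", // horizontal ellipsis
    "\x92" => "\u{2019}", // right single quotation mark
    "\x93" => "\u{201C}", // left double quotation mark
    "\x94" => "\u{201D}", // right double quotation mark
    "\x96" => "\u{2013}", // en dash
    "\x99" => "\u{2122}", // trade mark sign
];

// Hypothetical helper: strict ISO-8859-1 to UTF-8, where byte N is code point N.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $o = ord($byte);
        // Code points 0x80..0xFF need a two-byte UTF-8 sequence.
        $out .= $o < 0x80 ? $byte : chr(0xC0 | ($o >> 6)) . chr(0x80 | ($o & 0x3F));
    }
    return $out;
}

// Hypothetical helper: same conversion, but honouring the Windows-1252 extras above.
function cp1252ToUtf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $out .= CP1252_EXTRAS[$byte] ?? latin1ToUtf8($byte);
    }
    return $out;
}

echo bin2hex(latin1ToUtf8("\x80")), "\n"; // c280   (U+0080, a control character)
echo bin2hex(cp1252ToUtf8("\x80")), "\n"; // e282ac (U+20AC, the euro sign)
```
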
Usage
-----

Including the `bootup.utf8.php` file is the easiest way to enable these features
and configure PHP for a UTF-8 aware application.

This code is extracted from the [Patchwork](http://pa.tchwork.com/) framework.
It is released here as a standalone library in the hope that it can be used
successfully in other contexts.