<?php
/**
 * Lithium: the most rad php framework
 *
 * @copyright Copyright 2012, Union of RAD (http://union-of-rad.org)
 * @license http://opensource.org/licenses/bsd-license.php The BSD License
 */
namespace lithium\g11n;
use lithium\core\Libraries;

/**
 * The `Multibyte` class helps with operating on UTF-8 encoded strings. Here
 * multibyte is synonymous with UTF-8, which is probably the most widespread
 * multibyte encoding in recent web application development.
 *
 * Over time - as the importance of multibyte encoding support grew - a variety
 * of extensions appeared. While each achieves its goal somewhat differently
 * and one might be preferred over another, they all still do that one thing.
 *
 * What can a framework provide that those extensions don't? It can provide
 * abstractions that allow for portable code. While this might not be a
 * requirement for application code, it's a definite must for the framework's
 * core code.
 *
 * As previously mentioned, extensions appeared in a semi-evolutionary way. This
 * leaves us with extensions being heterogeneously spread out over
 * environments. There certainly is no clear winner, and we're left with
 * having to "support them all".
 *
 * Technically this class does very little in terms of abstraction. Its main
 * purpose is to allow adapting to changing environments: virtually creating
 * something you can rely on, something that's always there, while it actually
 * is there only in one way or another. And - yes - some convenience methods
 * are also on board.
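 *
 * A minimal configuration sketch. The adapter name `'Mbstring'` is an
 * assumption made for illustration; use whichever adapter is actually
 * available under `adapter.g11n.multibyte` in your environment:
 * ```
 * Multibyte::config(array(
 * 	'default' => array('adapter' => 'Mbstring') // hypothetical choice of adapter
 * ));
 * ```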
 */
class Multibyte extends \lithium\core\Adaptable {

	/**
	 * Contains adapter configurations for `Multibyte` adapters.
	 *
	 * @var array
	 */
	protected static $_configurations = array();

	/**
	 * `Libraries::locate()`-compatible path to adapters for this class.
	 *
	 * @see lithium\core\Libraries::locate()
	 * @var string Dot-delimited path.
	 */
	protected static $_adapters = 'adapter.g11n.multibyte';

	/**
	 * Checks if a given string is UTF-8 encoded and is valid UTF-8.
	 *
	 * In _quick_ mode it only checks whether the string contains non-ASCII
	 * characters, which merely indicates that some multibyte encoding is in
	 * use. Don't use quick mode for integrity validation of UTF-8 encoded
	 * strings.
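	 *
	 * A minimal usage sketch of both modes. The byte strings are illustrative
	 * inputs chosen for this example:
	 * ```
	 * Multibyte::is("\xC3\xA4bc"); // valid UTF-8 ("äbc"), strict mode: `true`
	 * Multibyte::is("\xC3\x28");   // broken 2-byte sequence, strict mode: `false`
	 * Multibyte::is("\xC3\x28", array('quick' => true)); // non-ASCII byte present: `true`
	 * Multibyte::is('abc', array('quick' => true));      // pure ASCII: `false`
	 * ```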
	 *
	 * @link http://www.w3.org/International/questions/qa-forms-utf-8.en
	 * @param string $string The string to analyze.
	 * @param array $options Allows toggling quick mode via the `'quick'` option, which
	 *        defaults to `false`.
	 * @return boolean Returns `true` if the string is valid UTF-8 or, in quick mode,
	 *         if it contains non-ASCII characters.
	 */
	public static function is($string, array $options = array()) {
		$defaults = array('quick' => false);
		$options += $defaults;

		if ($options['quick']) {
			$regex = '/[^\x09\x0A\x0D\x20-\x7E]/m';
		} else {
			$regex = '/\A(';
			$regex .= '[\x09\x0A\x0D\x20-\x7E]'; // ASCII
			$regex .= '|[\xC2-\xDF][\x80-\xBF]'; // non-overlong 2-byte
			$regex .= '|\xE0[\xA0-\xBF][\x80-\xBF]'; // excluding overlongs
			$regex .= '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'; // straight 3-byte
			$regex .= '|\xED[\x80-\x9F][\x80-\xBF]'; // excluding surrogates
			$regex .= '|\xF0[\x90-\xBF][\x80-\xBF]{2}'; // planes 1-3
			$regex .= '|[\xF1-\xF3][\x80-\xBF]{3}'; // planes 4-15
			$regex .= '|\xF4[\x80-\x8F][\x80-\xBF]{2}'; // plane 16
			$regex .= ')*\z/m';
		}
		return (boolean) preg_match($regex, $string);
	}

	/**
	 * Gets the string length. Multibyte enabled version of `strlen()`.
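	 *
	 * A minimal sketch, assuming a UTF-8 encoded source file and a `'default'`
	 * configuration whose adapter counts UTF-8 characters rather than bytes:
	 * ```
	 * strlen('Größe');            // 7, counts bytes
	 * Multibyte::strlen('Größe'); // 5, counts characters
	 * ```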
	 *
	 * @link http://php.net/manual/en/function.strlen.php
	 * @param string $string The string being measured for length.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return integer The length of the string on success.
	 */
	public static function strlen($string, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->strlen($string);
	}

	/**
	 * Finds the position of the _first_ occurrence of a string within a string.
	 * Multibyte enabled version of `strpos()`.
	 *
	 * Adapters are not required to interpret passed numeric values as the
	 * ordinal value of a character and apply them as such.
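	 *
	 * A minimal sketch, assuming a UTF-8 encoded source file and a UTF-8 aware
	 * `'default'` adapter; positions are counted in characters, not bytes:
	 * ```
	 * strpos('Größe', 'ß');            // 4, byte offset
	 * Multibyte::strpos('Größe', 'ß'); // 3, character offset
	 * ```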
	 *
	 * @link http://php.net/manual/en/function.strpos.php
	 * @param string $haystack The string being checked.
	 * @param string $needle The string to find in the haystack.
	 * @param integer $offset If specified, search will start this number of
	 *        characters counted from the beginning of the string. The
	 *        offset cannot be negative.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return integer|boolean Returns the numeric position of the first occurrence of
	 *         the needle in the haystack string. If the needle is not found,
	 *         it returns `false`.
	 */
	public static function strpos($haystack, $needle, $offset = 0, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->strpos($haystack, $needle, $offset);
	}

	/**
	 * Finds the position of the _last_ occurrence of a string within a string.
	 * Multibyte enabled version of `strrpos()`.
	 *
	 * Adapters are not required to interpret passed numeric values as the
	 * ordinal value of a character and apply them as such. The `Iconv` adapter
	 * doesn't support an offset the way `strpos()` does, so this method takes
	 * none: that is the lowest common denominator here.
	 *
	 * @link http://php.net/manual/en/function.strrpos.php
	 * @param string $haystack The string being checked.
	 * @param string $needle The string to find in the haystack.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return integer|boolean Returns the numeric position of the last occurrence of
	 *         the needle in the haystack string. If the needle is not found,
	 *         it returns `false`.
	 */
	public static function strrpos($haystack, $needle, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->strrpos($haystack, $needle);
	}

	/**
	 * Returns the portion of string specified by the start and length parameters.
	 * Multibyte enabled version of `substr()`.
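	 *
	 * A minimal sketch, assuming a UTF-8 encoded source file and a UTF-8 aware
	 * `'default'` adapter; `$start` and `$length` are counted in characters:
	 * ```
	 * Multibyte::substr('Größe', 2, 2); // 'öß'
	 * ```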
	 *
	 * @link http://php.net/manual/en/function.substr.php
	 * @param string $string The string to extract the substring from.
	 * @param integer $start Position of first character in string (offset).
	 * @param integer $length Maximum number of characters to use from string.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return string The substring extracted from given string.
	 */
	public static function substr($string, $start, $length = null, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->substr($string, $start, $length);
	}
}

?>