<?php
/**
 * Lithium: the most rad php framework
 *
 * @copyright Copyright 2012, Union of RAD (http://union-of-rad.org)
 * @license http://opensource.org/licenses/bsd-license.php The BSD License
 */
namespace lithium\g11n;
use lithium\core\Libraries;

/**
 * The `Multibyte` class helps with operating on UTF-8 encoded strings. Here
 * multibyte is synonymous with UTF-8, which is probably the most widespread
 * multibyte encoding in recent web application development.
 *
 * Over time, as the importance of multibyte encoding support grew, a variety
 * of extensions appeared. While each achieves its goal somewhat differently
 * and one might be preferred over another, they all still do that one thing.
 *
 * What can a framework provide that those extensions don't? It can provide
 * abstractions that allow portable code. While this might not be a requirement
 * for application code, it's a definite must for the framework's core code.
 *
 * As previously mentioned, extensions appeared in a semi-evolutionary way. This
 * leaves us with extensions being heterogeneously spread out over
 * environments. There certainly is no clear winner, and we're left with
 * "supporting them all".
 *
 * Technically this class does very little in terms of abstraction. Its main
 * purpose is to allow adapting to changing environments: virtually creating
 * something you can rely on, something that's always there, while it actually
 * is there only in one form or another. And, yes, some convenience methods
 * are also on board.
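 *
 * A minimal configuration sketch, assuming Lithium is bootstrapped and the
 * iconv extension is loaded. The `Iconv` adapter is the one named in the
 * `strrpos()` documentation of this class; the `'default'` configuration
 * name matches the default of the `'name'` option used by the adapter-backed
 * methods. Treat the setup as illustrative, not as the required one.
 *
 * ```php
 * use lithium\g11n\Multibyte;
 *
 * // Hypothetical setup: route all adapter-backed calls through iconv.
 * Multibyte::config(array('default' => array('adapter' => 'Iconv')));
 *
 * $string = "\xC3\xA4bc"; // "äbc", with "ä" taking two bytes in UTF-8
 * $bytes = strlen($string); // counts bytes
 * $characters = Multibyte::strlen($string); // counts characters
 * ```
 *
 * Here `$bytes` would be `4`, while `$characters` would be `3`.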
 */
class Multibyte extends \lithium\core\Adaptable {

	/**
	 * Contains adapter configurations for `Multibyte` adapters.
	 *
	 * @var array
	 */
	protected static $_configurations = array();

	/**
	 * `Libraries::locate()`-compatible path to adapters for this class.
	 *
	 * @see lithium\core\Libraries::locate()
	 * @var string Dot-delimited path.
	 */
	protected static $_adapters = 'adapter.g11n.multibyte';

	/**
	 * Checks if a given string is UTF-8 encoded and is valid UTF-8.
	 *
	 * In _quick_ mode it only checks whether any non-ASCII characters are
	 * used, which hints at some multibyte encoding. Don't use quick mode for
	 * integrity validation of UTF-8 encoded strings.
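	 *
	 * A short usage sketch, assuming Lithium is bootstrapped. The byte strings
	 * are illustrative inputs, and the commented results follow from the
	 * patterns used by this method.
	 *
	 * ```php
	 * use lithium\g11n\Multibyte;
	 *
	 * $valid = "\xC3\xA4"; // "ä" encoded as UTF-8
	 * $invalid = "\xE4";   // "ä" encoded as ISO-8859-1, not valid UTF-8
	 *
	 * $full = Multibyte::is($valid);       // true
	 * $rejected = Multibyte::is($invalid); // false
	 *
	 * // Quick mode only detects the presence of non-ASCII bytes.
	 * $quick = Multibyte::is($invalid, array('quick' => true)); // true
	 * $ascii = Multibyte::is('abc', array('quick' => true));    // false
	 * ```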
	 *
	 * @link http://www.w3.org/International/questions/qa-forms-utf-8.en
	 * @param string $string The string to analyze.
	 * @param array $options Allows toggling the mode via the `'quick'` option,
	 *        defaults to `false`.
	 * @return boolean Returns `true` if the string is valid UTF-8 or, in quick
	 *         mode, if it contains non-ASCII characters.
	 */
	public static function is($string, array $options = array()) {
		$defaults = array('quick' => false);
		$options += $defaults;

		if ($options['quick']) {
			$regex = '/[^\x09\x0A\x0D\x20-\x7E]/m';
		} else {
			$regex = '/\A(';
			$regex .= '[\x09\x0A\x0D\x20-\x7E]'; // ASCII
			$regex .= '|[\xC2-\xDF][\x80-\xBF]'; // non-overlong 2-byte
			$regex .= '|\xE0[\xA0-\xBF][\x80-\xBF]'; // excluding overlongs
			$regex .= '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'; // straight 3-byte
			$regex .= '|\xED[\x80-\x9F][\x80-\xBF]'; // excluding surrogates
			$regex .= '|\xF0[\x90-\xBF][\x80-\xBF]{2}'; // planes 1-3
			$regex .= '|[\xF1-\xF3][\x80-\xBF]{3}'; // planes 4-15
			$regex .= '|\xF4[\x80-\x8F][\x80-\xBF]{2}'; // plane 16
			$regex .= ')*\z/m';
		}
		return (boolean) preg_match($regex, $string);
	}

	/**
	 * Gets the string length. Multibyte enabled version of `strlen()`.
	 *
	 * @link http://php.net/manual/en/function.strlen.php
	 * @param string $string The string being measured for length.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return integer The length of the string on success.
	 */
	public static function strlen($string, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->strlen($string);
	}

	/**
	 * Finds the position of the _first_ occurrence of a string within a string.
	 * Multibyte enabled version of `strpos()`.
	 *
	 * Adapters are not required to support interpreting, and thus applying,
	 * a passed numeric value as the ordinal value of a character.
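	 *
	 * A usage sketch, assuming Lithium is bootstrapped and the iconv extension
	 * is loaded. Choosing the `Iconv` adapter for the `'default'` configuration
	 * is only an illustrative assumption.
	 *
	 * ```php
	 * use lithium\g11n\Multibyte;
	 *
	 * Multibyte::config(array('default' => array('adapter' => 'Iconv')));
	 *
	 * $haystack = "\xC3\xA4b\xC3\xA4b"; // "äbäb"
	 *
	 * $first = Multibyte::strpos($haystack, 'b');   // 1, counted in characters
	 * $next = Multibyte::strpos($haystack, 'b', 2); // 3, search starts at the third character
	 * $bytes = strpos($haystack, 'b');              // 2, counted in bytes
	 * ```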
	 *
	 * @link http://php.net/manual/en/function.strpos.php
	 * @param string $haystack The string being checked.
	 * @param string $needle The string to find in the haystack.
	 * @param integer $offset If specified, the search will start this number of
	 *        characters counted from the beginning of the string. The
	 *        offset cannot be negative.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return integer|boolean Returns the numeric position of the first
	 *         occurrence of the needle in the haystack string. If the needle
	 *         is not found, it returns `false`.
	 */
	public static function strpos($haystack, $needle, $offset = 0, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->strpos($haystack, $needle, $offset);
	}

	/**
	 * Finds the position of the _last_ occurrence of a string within a string.
	 * Multibyte enabled version of `strrpos()`.
	 *
	 * Adapters are not required to support interpreting, and thus applying,
	 * a passed numeric value as the ordinal value of a character. The `Iconv`
	 * adapter doesn't support an offset as `strpos()` does; going without one
	 * constitutes the lowest common denominator here.
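	 *
	 * A usage sketch, assuming Lithium is bootstrapped and the iconv extension
	 * is loaded. Choosing the `Iconv` adapter for the `'default'` configuration
	 * is only an illustrative assumption.
	 *
	 * ```php
	 * use lithium\g11n\Multibyte;
	 *
	 * Multibyte::config(array('default' => array('adapter' => 'Iconv')));
	 *
	 * $haystack = "\xC3\xA4b\xC3\xA4b"; // "äbäb"
	 *
	 * $last = Multibyte::strrpos($haystack, 'b'); // 3, counted in characters
	 * $bytes = strrpos($haystack, 'b');           // 5, counted in bytes
	 * ```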
	 *
	 * @link http://php.net/manual/en/function.strrpos.php
	 * @param string $haystack The string being checked.
	 * @param string $needle The string to find in the haystack.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return integer|boolean Returns the numeric position of the last
	 *         occurrence of the needle in the haystack string. If the needle
	 *         is not found, it returns `false`.
	 */
	public static function strrpos($haystack, $needle, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->strrpos($haystack, $needle);
	}

	/**
	 * Returns the portion of string specified by the start and length parameters.
	 * Multibyte enabled version of `substr()`.
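	 *
	 * A usage sketch, assuming Lithium is bootstrapped and the iconv extension
	 * is loaded. Choosing the `Iconv` adapter for the `'default'` configuration
	 * is only an illustrative assumption.
	 *
	 * ```php
	 * use lithium\g11n\Multibyte;
	 *
	 * Multibyte::config(array('default' => array('adapter' => 'Iconv')));
	 *
	 * $string = "\xC3\xA4bc"; // "äbc"
	 *
	 * $characters = Multibyte::substr($string, 0, 2); // "äb", two characters
	 * $bytes = substr($string, 0, 2); // "ä", the two bytes of the first character
	 * ```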
	 *
	 * @link http://php.net/manual/en/function.substr.php
	 * @param string $string The string to extract the substring from.
	 * @param integer $start Position of the first character in the string (offset).
	 * @param integer $length Maximum number of characters to use from the string.
	 * @param array $options Allows for selecting the adapter to use via the
	 *        `'name'` option. Will use the `'default'` adapter by default.
	 * @return string The substring extracted from the given string.
	 */
	public static function substr($string, $start, $length = null, array $options = array()) {
		$defaults = array('name' => 'default');
		$options += $defaults;
		return static::adapter($options['name'])->substr($string, $start, $length);
	}
}

?>