@title PHP Pitfalls
@group php
This document discusses difficult traps and pitfalls in PHP, and how to avoid,
work around, or at least understand them.
= `array_merge()` is Incredibly Slow When Merging a List of Arrays =
If you merge a list of arrays like this:
  COUNTEREXAMPLE, lang=php
  $result = array();
  foreach ($list_of_lists as $one_list) {
    $result = array_merge($result, $one_list);
  }
...your program now has a huge runtime because it generates a large number of
intermediate arrays and copies every element it has previously seen each time
you iterate.
In a libphutil environment, you can use @{function@arcanist:array_mergev}
instead.
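Outside libphutil, you can get a similar effect by passing every list to a single `array_merge()` call. This is a minimal sketch; the helper name `merge_all()` is hypothetical and is not part of PHP or libphutil:

```lang=php
// Hypothetical helper: merge a list of arrays with one array_merge() call
// instead of one call per list.
function merge_all(array $list_of_lists) {
  if (!$list_of_lists) {
    // Older PHP versions do not accept array_merge() with no arguments.
    return array();
  }
  // array_values() drops any string keys on the outer list, which newer PHP
  // versions would otherwise treat as named arguments.
  return call_user_func_array('array_merge', array_values($list_of_lists));
}

$merged = merge_all(array(array(1, 2), array(3), array(4, 5)));
// $merged is array(1, 2, 3, 4, 5)
```

Because each element is copied once rather than once per iteration, the work grows with the total number of elements instead of with its square.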
= `var_export()` Hates Baby Animals =
If you try to `var_export()` an object that contains recursive references, your
program will terminate. You have no chance to intercept or react to this or
otherwise stop it from happening. Avoid `var_export()` unless you are certain
you have only simple data. You can use `print_r()` or `var_dump()` to display
complex variables safely.
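As a small illustration of the safe alternative, this sketch builds an object that refers to itself and dumps it with `print_r()`. The variable names are only for illustration:

```lang=php
$node = new stdClass();
$node->name = 'root';
$node->self = $node; // Recursive reference.

// Passing true as the second argument returns the text instead of printing.
$dump = print_r($node, true);

// print_r() marks the cycle instead of following it.
$has_marker = (strpos($dump, '*RECURSION*') !== false);
```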
= `isset()`, `empty()` and Truthiness =
A value is "truthy" if it evaluates to true in an `if` clause:
  lang=php
  $value = something();
  if ($value) {
    // Value is truthy.
  }
If a value is not truthy, it is "falsey". These values are falsey in PHP:
  null      // null
  0         // integer
  0.0       // float
  "0"       // string
  ""        // empty string
  false     // boolean
  array()   // empty array
Disregarding some bizarre edge cases, all other values are truthy. Note that
because "0" is falsey, this sort of thing (intended to prevent users from making
empty comments) is wrong in PHP:
  COUNTEREXAMPLE
  if ($comment_text) {
    make_comment($comment_text);
  }
This is wrong because it prevents users from making the comment "0". //THIS
COMMENT IS TOTALLY AWESOME AND I MAKE IT ALL THE TIME SO YOU HAD BETTER NOT
BREAK IT!!!// A better test is probably `strlen()`.
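A length-based test might look like the sketch below. The helper name `is_comment_present()` is hypothetical, and the `is_string()` guard is an added assumption so that `null` is rejected without reaching `strlen()`:

```lang=php
// Hypothetical helper: decide whether the user actually typed something.
function is_comment_present($comment_text) {
  // Compare the length, not the truthiness, so that "0" is accepted.
  return is_string($comment_text) && strlen($comment_text) > 0;
}

$zero  = is_comment_present('0');   // true
$blank = is_comment_present('');    // false
$none  = is_comment_present(null);  // false
```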
In addition to truth tests with `if`, PHP has two special truthiness operators
which look like functions but aren't: `empty()` and `isset()`. These operators
help deal with undeclared variables.
In PHP, there are two major cases where you get undeclared variables -- either
you directly use a variable without declaring it:
  COUNTEREXAMPLE, lang=php
  function f() {
    if ($not_declared) {
      // ...
    }
  }
...or you index into an array with an index which may not exist:
  COUNTEREXAMPLE
  function f(array $mystery) {
    if ($mystery['stuff']) {
      // ...
    }
  }
When you do either of these, PHP issues a warning. Avoid these warnings by
using `empty()` and `isset()` to do tests that are safe to apply to undeclared
variables.
`empty()` evaluates truthiness exactly opposite of `if()`. `isset()` returns
`true` for everything except `null`. This is the truth table:
| Value | `if()` | `empty()` | `isset()` |
|-------|--------|-----------|-----------|
| `null` | `false` | `true` | `false` |
| `0` | `false` | `true` | `true` |
| `0.0` | `false` | `true` | `true` |
| `"0"` | `false` | `true` | `true` |
| `""` | `false` | `true` | `true` |
| `false` | `false` | `true` | `true` |
| `array()` | `false` | `true` | `true` |
| Everything else | `true` | `false` | `true` |
The value of these operators is that they accept undeclared variables and do
not issue a warning. Specifically, if you try to do this you get a warning:
```lang=php, COUNTEREXAMPLE
if ($not_previously_declared) { // PHP Notice: Undefined variable!
// ...
}
```
But these are fine:
```lang=php
if (empty($not_previously_declared)) { // No notice, returns true.
// ...
}
if (isset($not_previously_declared)) { // No notice, returns false.
// ...
}
```
So, `isset()` really means
`is_declared_and_is_set_to_something_other_than_null()`. `empty()` really means
`is_falsey_or_is_not_declared()`. Thus:
- If a variable is known to exist, test falsiness with `if (!$v)`, not
`empty()`. In particular, test for empty arrays with `if (!$array)`. There
is no reason to ever use `empty()` on a declared variable.
- When you use `isset()` on an array key, like `isset($array['key'])`, it
will evaluate to "false" if the key exists but has the value `null`! Test
for index existence with `array_key_exists()`.
Put another way, use `isset()` if you want to type `if ($value !== null)` but
are testing something that may not be declared. Use `empty()` if you want to
type `if (!$value)` but you are testing something that may not be declared.
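To make the array-key case concrete, here is a short sketch with an illustrative `$config` array that compares `isset()`, `empty()`, and `array_key_exists()`:

```lang=php
$config = array(
  'name'  => 'zebra',
  'color' => null,
  'count' => 0,
);

$a = isset($config['color']);             // false: key exists, value is null
$b = array_key_exists('color', $config);  // true
$c = isset($config['count']);             // true: 0 is not null
$d = empty($config['count']);             // true: 0 is falsey
$e = isset($config['missing']);           // false, and no warning
$f = empty($config['missing']);           // true, and no warning
```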
= `usort()`, `uksort()`, and `uasort()` are Slow =
This family of functions is often extremely slow for large datasets. You should
avoid them if at all possible. Instead, build an array which contains surrogate
keys that are naturally sortable with a function that uses native comparison
(e.g., `sort()`, `asort()`, `ksort()`, or `natcasesort()`). Sort this array
instead, and use it to reorder the original array.
In a libphutil environment, you can often do this easily with
@{function@arcanist:isort} or @{function@arcanist:msort}.
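Without libphutil, the surrogate-key approach can be sketched roughly as follows. The names `sort_by_surrogate_key()` and `get_priority()` are hypothetical, and the sketch assumes each item maps to a single scalar sort value:

```lang=php
// Hypothetical helper: sort $items by a scalar computed from each item,
// using asort() instead of a usort()-style comparison callback.
function sort_by_surrogate_key(array $items, $get_key) {
  // Build the surrogate array: original key => naturally sortable scalar.
  $surrogates = array();
  foreach ($items as $key => $item) {
    $surrogates[$key] = call_user_func($get_key, $item);
  }

  // Native comparison; asort() keeps the original keys attached.
  asort($surrogates);

  // Use the sorted surrogates to reorder the original array.
  $result = array();
  foreach ($surrogates as $key => $ignored) {
    $result[$key] = $items[$key];
  }
  return $result;
}

function get_priority(array $task) {
  return $task['priority'];
}

$tasks = array(
  'write' => array('priority' => 30),
  'test'  => array('priority' => 10),
  'ship'  => array('priority' => 20),
);
$sorted = sort_by_surrogate_key($tasks, 'get_priority');
// array_keys($sorted) is array('test', 'ship', 'write')
```

The callback runs once per item rather than once per comparison, which is where most of the savings come from.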
= `array_intersect()` and `array_diff()` are Also Slow =
These functions are much slower for even moderately large inputs than
`array_intersect_key()` and `array_diff_key()`, because they cannot make the
assumption that their inputs are unique scalars as the `key` varieties can.
Strongly prefer the `key` varieties.
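If your data is a list of values rather than keys, one way to use the `key` varieties is to turn the values into keys first. This sketch assumes the values are unique strings or integers, since those are the only legal array keys; the variable names are illustrative:

```lang=php
$mine   = array('apple', 'banana', 'cherry');
$theirs = array('banana', 'cherry', 'date');

// Turn the values into keys; the stored value (true) is irrelevant.
$mine_map   = array_fill_keys($mine, true);
$theirs_map = array_fill_keys($theirs, true);

$common    = array_keys(array_intersect_key($mine_map, $theirs_map));
$only_mine = array_keys(array_diff_key($mine_map, $theirs_map));
// $common is array('banana', 'cherry'); $only_mine is array('apple')
```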
= `array_uintersect()` and `array_udiff()` are Definitely Slow Too =
These functions have the problems of both the `usort()` family and the
`array_diff()` family. Avoid them.
= `foreach()` Does Not Create Scope =
Variables survive outside of the scope of `foreach()`. More problematically,
references survive outside of the scope of `foreach()`. This code mutates
`$array` because the reference leaks from the first loop to the second:
```lang=php, COUNTEREXAMPLE
$array = range(1, 3);
echo implode(',', $array); // Outputs '1,2,3'
foreach ($array as &$value) {}
echo implode(',', $array); // Outputs '1,2,3'
foreach ($array as $value) {}
echo implode(',', $array); // Outputs '1,2,2'
```
The easiest way to avoid this is to avoid using foreach-by-reference. If you do
use it, unset the reference after the loop:
```lang=php
foreach ($array as &$value) {
// ...
}
unset($value);
```
= `unserialize()` is Incredibly Slow on Large Datasets =
The performance of `unserialize()` is nonlinear in the number of zvals you
unserialize, roughly `O(N^2)`.
| zvals | Approximate time |
|-------|------------------|
| 10000 | 5ms |
| 100000 | 85ms |
| 1000000 | 8,000ms |
| 10000000 | 72 billion years |
= `call_user_func()` Breaks References =
If you use `call_user_func()` to invoke a function which takes parameters by
reference, the variables you pass in will have their references broken and will
emerge unmodified. That is, if you have a function that takes references:
```lang=php
function add_one(&$v) {
$v++;
}
```
...and you call it with `call_user_func()`:
```lang=php, COUNTEREXAMPLE
$x = 41;
call_user_func('add_one', $x);
```
...`$x` will not be modified. The solution is to use `call_user_func_array()`
and wrap the reference in an array:
```lang=php
$x = 41;
call_user_func_array(
'add_one',
array(&$x)); // Note '&$x'!
```
This will work as expected.
= You Can't Throw From `__toString()` =
In PHP versions before 7.4, if you throw from `__toString()`, your program will
terminate uselessly and you won't get the exception. PHP 7.4 and later allow
exceptions to propagate out of `__toString()`.
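One way to stay safe on PHP versions where throwing from `__toString()` is fatal is to catch everything inside the method and return a placeholder. This is only a sketch; `SafeLabel` and `failing_label()` are hypothetical names:

```lang=php
// Hypothetical class, for illustration only.
final class SafeLabel {
  private $callback;

  public function __construct($callback) {
    $this->callback = $callback;
  }

  public function __toString() {
    try {
      return (string)call_user_func($this->callback);
    } catch (Exception $ex) {
      // Never let the exception escape; return a placeholder instead.
      return '<error: '.$ex->getMessage().'>';
    }
  }
}

function failing_label() {
  throw new Exception('no label');
}

$text = (string)new SafeLabel('failing_label');
// $text is '<error: no label>'
```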
= An Object Can Have Any Scalar as a Property =
Object properties are not limited to legal variable names:
```lang=php
$property = '!@#$%^&*()';
$obj = new stdClass(); // Create the object before assigning to it.
$obj->$property = 'zebra';
echo $obj->$property; // Outputs 'zebra'.
```
So, don't make assumptions about property names.
= There is an `(object)` Cast =
You can cast a dictionary into an object.
```lang=php
$obj = (object)array('flavor' => 'coconut');
echo $obj->flavor; // Outputs 'coconut'.
echo get_class($obj); // Outputs 'stdClass'.
```
This is occasionally useful, mostly to force an object to become a JavaScript
dictionary (vs a list) when passed to `json_encode()`.
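For instance, this short sketch shows how the cast changes what `json_encode()` emits for an empty array and for a list-like array:

```lang=php
$list = json_encode(array());          // '[]'
$dict = json_encode((object)array());  // '{}'

$values = array(0 => 'a', 1 => 'b');
$as_list = json_encode($values);          // '["a","b"]'
$as_dict = json_encode((object)$values);  // '{"0":"a","1":"b"}'
```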
= Invoking `new` With an Argument Vector is Really Hard =
If you have some `$class_name` and some `$argv` of constructor arguments
and you want to do this:
```lang=php
new $class_name($argv[0], $argv[1], ...);
```
...you'll probably invent a very interesting, very novel solution that is very
wrong. In a libphutil environment, solve this problem with
@{function@arcanist:newv}. Elsewhere, copy `newv()`'s implementation.
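One possible approach uses PHP's Reflection API. The sketch below is not libphutil's `newv()`; the helper name `new_from_argv()` is hypothetical:

```lang=php
// Hypothetical helper: construct $class_name with the constructor
// arguments in $argv.
function new_from_argv($class_name, array $argv) {
  $reflector = new ReflectionClass($class_name);
  if (!$argv) {
    return $reflector->newInstance();
  }
  // array_values() discards string keys, which newer PHP versions would
  // otherwise interpret as named arguments.
  return $reflector->newInstanceArgs(array_values($argv));
}

$object = new_from_argv('ArrayObject', array(array('a', 'b', 'c')));
// count($object) is 3
```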
= Equality is not Transitive =
This isn't terribly surprising since equality isn't transitive in a lot of
languages, but the `==` operator is not transitive:
```lang=php
$a = ''; $b = 0; $c = '0a';
$a == $b; // true
$b == $c; // true
$c == $a; // false!
```
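In PHP 7 and earlier, when one operand is an integer and the other is a string,
the string is cast to an integer before comparison. PHP 8 compares integers
with non-numeric strings as strings, so this particular example no longer
holds there, but `==` still converts types in other cases. Avoid this and
similar pitfalls by using the `===` operator, which is transitive.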
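Here is a further illustration using `null`, `0`, and `'0'`, which should behave the same way across PHP versions:

```lang=php
$a = null; $b = 0; $c = '0';

$ab = ($a == $b);  // true
$bc = ($b == $c);  // true
$ca = ($c == $a);  // false!

// With ===, none of these values are equal to each other.
$strict = ($a === $b) || ($b === $c) || ($c === $a);  // false
```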
= All 676 Letters in the Alphabet =
This doesn't do what you'd expect it to do in C:
```lang=php
for ($c = 'a'; $c <= 'z'; $c++) {
// ...
}
```
This is because the successor to `z` is `aa`, which is "less than" `z`.
The loop will run for 676 iterations until `$c` reaches `za`, which is
"greater than" `z`, and terminates. That is, `$c` will take on these values:
```
a
b
...
y
z
aa // loop continues because 'aa' <= 'z'
ab
...
mf
mg
...
yx
yy
yz
za // loop now terminates because 'za' > 'z'
```
Instead, use this loop:
```lang=php
foreach (range('a', 'z') as $c) {
// ...
}
```
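To check the difference yourself, this sketch counts the iterations of both loops. The variable names are illustrative:

```lang=php
$count = 0;
$last = null;
for ($c = 'a'; $c <= 'z'; $c++) {
  $count++;
  $last = $c;
}
// $count is 676 and $last is 'yz'.

$letters = range('a', 'z');
// count($letters) is 26.
```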