Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit fcb75d0

Browse files
author
epriestley
committed
Fix an issue where prose diffing may fail after hitting the PCRE backtracking limit
Summary: Fixes T13554. For certain prose diff inputs and PCRE backtracking limits, this regular expression may back track too often and fail. A characteristic input is "x x x x ...", i.e. many sequences where `(.*?)\s*\z` looks like it may be able to match but actually can not. I think writing an expression which has all the behavior we'd like without this backtracking issue isn't trivial (at least, I don't think I know how to do it offhand); just use a strategy based on "trim()" insetad, which avoids any PCRE complexities here. Test Plan: Locally, this passes the "x x x ..." test which the previous code failed. I'm not including that test because it won't reproduce across values of "pcre.backtrac_limit", PCRE versions, etc. Maniphest Tasks: T13554 Differential Revision: https://secure.phabricator.com/D21422
1 parent 8f9ba48 commit fcb75d0

File tree

2 files changed

+68
-16
lines changed

2 files changed

+68
-16
lines changed

src/infrastructure/diff/prose/PhutilProseDifferenceEngine.php

Lines changed: 35 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -142,22 +142,9 @@ private function stitchPieces(array $pieces, $level) {
142142
}
143143

144144
if ($level < 2) {
145-
// Split pieces into separate text and whitespace sections: make one
146-
// piece out of all the whitespace at the beginning, one piece out of
147-
// all the actual text in the middle, and one piece out of all the
148-
// whitespace at the end.
149-
150-
$matches = null;
151-
preg_match('/^(\s*)(.*?)(\s*)\z/s', $result, $matches);
152-
153-
if (strlen($matches[1])) {
154-
$results[] = $matches[1];
155-
}
156-
if (strlen($matches[2])) {
157-
$results[] = $matches[2];
158-
}
159-
if (strlen($matches[3])) {
160-
$results[] = $matches[3];
145+
$trimmed_pieces = $this->trimApart($result);
146+
foreach ($trimmed_pieces as $trimmed_piece) {
147+
$results[] = $trimmed_piece;
161148
}
162149
} else {
163150
$results[] = $result;
@@ -272,4 +259,36 @@ private function newDocumentEngineBlocks(array $parts) {
272259
return $blocks;
273260
}
274261

262+
public static function trimApart($input) {
263+
// Split pieces into separate text and whitespace sections: make one
264+
// piece out of all the whitespace at the beginning, one piece out of
265+
// all the actual text in the middle, and one piece out of all the
266+
// whitespace at the end.
267+
268+
$parts = array();
269+
270+
$length = strlen($input);
271+
272+
$corpus = ltrim($input);
273+
$l_length = strlen($corpus);
274+
if ($l_length !== $length) {
275+
$parts[] = substr($input, 0, $length - $l_length);
276+
}
277+
278+
$corpus = rtrim($corpus);
279+
$lr_length = strlen($corpus);
280+
281+
if ($lr_length) {
282+
$parts[] = $corpus;
283+
}
284+
285+
if ($lr_length !== $l_length) {
286+
// NOTE: This will be a negative value; we're slicing from the end of
287+
// the input string.
288+
$parts[] = substr($input, $lr_length - $l_length);
289+
}
290+
291+
return $parts;
292+
}
293+
275294
}

src/infrastructure/diff/prose/__tests__/PhutilProseDiffTestCase.php

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,39 @@
33
final class PhutilProseDiffTestCase
44
extends PhabricatorTestCase {
55

6+
public function testTrimApart() {
7+
$map = array(
8+
'' => array(),
9+
'a' => array('a'),
10+
' a ' => array(
11+
' ',
12+
'a',
13+
' ',
14+
),
15+
' a' => array(
16+
' ',
17+
'a',
18+
),
19+
'a ' => array(
20+
'a',
21+
' ',
22+
),
23+
' a b ' => array(
24+
' ',
25+
'a b',
26+
' ',
27+
),
28+
);
29+
30+
foreach ($map as $input => $expect) {
31+
$actual = PhutilProseDifferenceEngine::trimApart($input);
32+
$this->assertEqual(
33+
$expect,
34+
$actual,
35+
pht('Trim Apart: %s', $input));
36+
}
37+
}
38+
639
public function testProseDiffsDistance() {
740
$this->assertProseParts(
841
'',

0 commit comments

Comments
 (0)