This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit 36075f6

Author: epriestley
Correct a prose diff behavior when prose pieces include newlines
Summary: See <https://discourse.phabricator-community.org/t/bad-regex-in-prose-diff-logic/3969>. The prose splitting rules normally guarantee that newlines appear only at the beginning or end of blocks. However, if a prose sentence ends with text like "...x\n.", we can end up with a newline inside a "sentence". If we do, the regular expression that breaks it into pieces will fail. Arguably, this is an error in how sentences are split apart (we might prefer to split this into two sentences, "x\n" and ".", rather than a single "x\n." sentence) but in the general case it's not unreasonable for blocks to contain newlines, so a simple fix is to make the pattern more robust.

Test Plan: Added a failing test which includes this behavior, made it pass.

Differential Revision: https://secure.phabricator.com/D21295
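The failure mode described in the summary can be reproduced in isolation. The snippet below is a minimal sketch rather than code from the repository: it applies the old and new patterns to a sample string modeled on the one in the test, and the variable names are illustrative. Without the `s` modifier, `.` does not match a newline, so the lazy middle group cannot cross the embedded "\n" and the whole match fails.

```php
<?php
// Sample input with a newline in the middle of a "sentence".
// (Illustrative value, modeled on the string used in the new test.)
$result = "yyy\n.zzz";

// Old pattern: "." stops at "\n", so nothing can reach "\z" and the match fails.
$old_matches = null;
$old_ok = preg_match('/^(\s*)(.*?)(\s*)\z/', $result, $old_matches);

// New pattern: the "s" (dot-all) modifier lets "." match "\n" as well.
$new_matches = null;
$new_ok = preg_match('/^(\s*)(.*?)(\s*)\z/s', $result, $new_matches);

var_dump($old_ok);          // int(0): no match, so $old_matches is empty
var_dump($new_ok);          // int(1)
var_dump($new_matches[2]);  // the full string, newline included
```

When the old pattern fails, `$old_matches` is an empty array, so any later read of `$matches[1]` refers to an index that does not exist.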
1 parent f686a0b commit 36075f6


2 files changed: +9 -1 lines changed


src/infrastructure/diff/prose/PhutilProseDifferenceEngine.php

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ private function stitchPieces(array $pieces, $level) {
 // whitespace at the end.
 
 $matches = null;
-preg_match('/^(\s*)(.*?)(\s*)\z/', $result, $matches);
+preg_match('/^(\s*)(.*?)(\s*)\z/s', $result, $matches);
 
 if (strlen($matches[1])) {
   $results[] = $matches[1];
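
For context, the three capture groups separate leading whitespace, the body, and trailing whitespace. The following is a rough sketch of that idea using a hypothetical standalone helper, `split_whitespace_edges()`, which is not part of `PhutilProseDifferenceEngine`. The hunk above only shows how group 1 is handled, so treating groups 2 and 3 the same way is an assumption made for illustration.

```php
<?php
// Hypothetical helper (not from the repository): split a string into
// leading whitespace, body, and trailing whitespace, dropping empty parts.
function split_whitespace_edges($text) {
  $matches = null;
  if (!preg_match('/^(\s*)(.*?)(\s*)\z/s', $text, $matches)) {
    // With the "s" modifier the pattern should match any string, so this
    // guard mainly covers PCRE errors (for example, a backtrack limit).
    return array($text);
  }

  $parts = array();
  foreach (array(1, 2, 3) as $index) {
    if (strlen($matches[$index])) {
      $parts[] = $matches[$index];
    }
  }
  return $parts;
}

// The body keeps its embedded newline; only the edges are split off.
$parts = split_whitespace_edges("  yyy\n.zzz \n");
var_dump($parts);
```

The explicit guard on the return value of `preg_match()` is a design choice in this sketch: it avoids reading `$matches` when the call fails or does not match.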

src/infrastructure/diff/prose/__tests__/PhutilProseDiffTestCase.php

Lines changed: 8 additions & 0 deletions
@@ -30,6 +30,14 @@ public function testProseDiffsDistance() {
   ),
   pht('Remove Paragraph'));
 
+$this->assertProseParts(
+  'xxx',
+  "xxxyyy\n.zzz",
+  array(
+    '= xxx',
+    "+ yyy\n.zzz",
+  ),
+  pht('Amend paragraph, and add paragraph starting with punctuation'));
 
 // Without smoothing, the alogorithm identifies that "shark" and "cat"
 // both contain the letter "a" and tries to express this as a very
