Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add FixerBlame class and one test #7865

Draft
wants to merge 53 commits into
base: master
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
54f56a1
Register hidden `WorkerCommand`
Wirone Jan 12, 2024
9410b9f
Configuration flow for parallel run
Wirone Jan 12, 2024
9319a3e
Extract factory method for linting aware file iterator
Wirone Jan 22, 2024
351f48c
Initial implementation of parallel analysis 🎉
Wirone Jan 25, 2024
a2bbbc2
Use sequential runner by default
Wirone Jan 27, 2024
3c91819
Display information about parallel runner being experimental feature
Wirone Jan 27, 2024
af14b86
Auto detection for parallel config
Wirone Jan 27, 2024
2769ad3
Use parallel runner with config auto-detection for the project itself
Wirone Jan 28, 2024
622e675
Pass analysis errors from worker to the main runner and report them t…
Wirone Jan 28, 2024
cc98df0
Use exact same PHP binary for worker process
Wirone Jan 29, 2024
d20f3aa
Remove quasi-parallel Composer script
Wirone Jan 29, 2024
aa27b5d
Document usage of parallel runner
Wirone Jan 29, 2024
be5a196
Allow only static factory for creating `Process`
Wirone Jan 30, 2024
0e60e18
Determine files count early from simple iterator
Wirone Jan 30, 2024
847a7b1
Revert BC break in `Runner` signature
Wirone Feb 1, 2024
d873e38
Improve type in `Error` constructor
Wirone Feb 1, 2024
396b5de
Fix test for fallback ParallelConfig
Wirone Feb 1, 2024
24c15b9
Fix CS
Wirone Feb 1, 2024
7e7e596
Explicitly require `react/stream`
Wirone Feb 7, 2024
c8b83a9
Introduce `DirectoryInterface::getAbsolutePath()`
Wirone Feb 7, 2024
60c72b1
Support cache in parallel analysis
Wirone Feb 7, 2024
dc2b8c1
Save cache info in parallel analysis only when it makes sense
Wirone Feb 7, 2024
7c8ecbe
Apply Julien's suggestions from code review
Wirone Feb 7, 2024
c889d18
Sync `--using-cache` description
Wirone Feb 7, 2024
a4fb45a
Use `InvalidArgumentException` for parallelisation misconfiguration +…
Wirone Feb 7, 2024
63e4dfc
Fix BC-break in `DirectoryInterface`
Wirone Feb 8, 2024
a566d41
Report file analysis result per file instead of per chunk
Wirone Feb 8, 2024
1f80310
Clean up parallel actions handling
Wirone Feb 8, 2024
f25abdc
Dirty workaround for running sequential runner in tests
Wirone Feb 12, 2024
0cf69be
Add most of the tests
Wirone Feb 13, 2024
7986cb8
Extract process creation to separate factory
Wirone Feb 14, 2024
cefdfae
Fix "$nextResult must not be accessed before initialization"
Wirone Feb 14, 2024
d54e66f
Fix tests on 7.4 with lowest deps
Wirone Feb 14, 2024
ddfb858
Test that `FixCommand` can be run with parallel runner
Wirone Feb 14, 2024
9ecdad4
Allow wider range of React packages
Wirone Feb 14, 2024
0ea17b6
Introduce `--sequential` option to `fix` command
Wirone Feb 15, 2024
f680ea2
More tests, more coverage
Wirone Feb 15, 2024
87172ca
Fix smoke tests
Wirone Feb 15, 2024
295c088
Fix parallelisation for PHAR usage
Wirone Feb 15, 2024
4a1aa7e
Add missing phpDoc for callable supported in `ProcessPool` constructor
Wirone Feb 15, 2024
e63922b
Add missing tests
Wirone Feb 15, 2024
424711c
Fix broken comment
Wirone Feb 16, 2024
88559e7
Add test for error manager in parallel run
Wirone Feb 16, 2024
c37cff3
Backward compatibility for `cs:fix:parallel` script
Wirone Feb 16, 2024
43e39e1
Test that callback is executed on server close
Wirone Feb 16, 2024
dff0884
Test server-worker communication
Wirone Feb 16, 2024
cac24b9
Add error handling
Wirone Mar 1, 2024
4e90bc3
Fix test file/class name
Wirone Mar 1, 2024
3dfe8b3
Add simple test for `PhpCsFixer\Runner\Parallel\Process` class
Wirone Mar 1, 2024
0764ef8
Remove superfluous assertion
Wirone Mar 1, 2024
0403f9b
Run `WorkerCommandTest::testWorkerCommunicatesWithTheServer` only on …
Wirone Mar 1, 2024
b2ebcce
Add various tests related to error handling
Wirone Mar 1, 2024
57bcf57
feat: add FixerBlame tool and one test
connorhu Mar 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 51 additions & 0 deletions src/FixerBlame/CodeChange.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
<?php
Copy link
Member

@keradus keradus Mar 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid having all commits of Wirone in your PR, as they are other PR and making diff for your PR crazy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's an outcome of my comment, but I only asked for taking the parallel runner into consideration (make sure it will work), not to rebase on my work 😅. It could work if we had branches in the same repo, and a proper target branch was selected, but not like this. Hard to tell what is the best approach here, maybe a draft with a branch based on master (like before), but waiting until my PR is merged?

Copy link
Author

@connorhu connorhu Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to align it with PR #7777 so I rebaseled it to see how much conflict there was (fortunately I only had to resolve 2-3 things). @Wirone's work is the more polished, more important and is expected to be in the master sooner. As mentioned in the PR description, once the parallel feature is merged this can be rebased back to the master and then the commits for the parallel run will be removed. There is no point in starting the review until there are implementation (and performance) issues with the PR anyway.


declare(strict_types=1);

/*
* This file is part of PHP CS Fixer.
*
* (c) Fabien Potencier <fabien@symfony.com>
* Dariusz Rumiński <dariusz.ruminski@gmail.com>
*
* This source file is subject to the MIT license that is bundled
* with this source code in the file LICENSE.
*/

namespace PhpCsFixer\FixerBlame;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please describe approach you want to take in RFC style before implementing it. it will help reviewers to recognise what and how you want to do, and help you to align on direction before spending time on implementing it.

right now, I'm happy with udiff and don't seeing blame-named class is very confusing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the problems with the implementation at the moment is that, because of the way udiff works, I have to do a lot of manual work to track exactly which lines, which fixer, made what changes (which lines were removed and which lines were inserted). One sign of this is that I keep converting the token array back to text and running diff on it (and then trying to identify the separate patches in the diff output). The next logical step would be to see how much work it would be to modify udiff to be able to do its job on any data structure (in this case a Token[]). If this is a solvable problem, then 1) I would use less resources 2) I wouldn't have to deal with the problem of diffing the token array.

The theory works anyway, because it can accurately track multi-step transformations, but it's also obvious that in its current form it doesn't just do what it's actually supposed to do.

That's why I can't describe in any style what happens, because it doesn't happen the way I want to see it happen and 1000% I'll have to rewrite the blame class if udiff knows the comparison.

Copy link
Member

@keradus keradus Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I failed to understand the comment.

Please describe the problem you want to solve and describe the concept of solution, before you go deep into blame class.
Without that I fail to understand why you even want to tech udiff to understand Token[]

Copy link
Author

@connorhu connorhu Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem?
The problem is that for various outputs, the violations indicate incorrect line and character positions.

What is a possible solution to this problem?
a) remove the outputs or not report the line and character numbers
b) the fixers let the Tokens class know what they have done
c) keep track from fixer to fixer which fixer changes what

What is the problem with which approach?

a) while I'm all for something either working well or not existing at all, this is obviously not the answer
b) touching all fixers, modifying existing interfaces is a lot of work and an unwanted BC break
c) this might work if we can somehow track what happened between two fixer states, because then we only have to track the change within the Runner and don't have to go deeper, but it is inaccurate

What tools do we currently have to monitor changes?
Currently Sebastian's library Diff is available, which he basically created for texts or arrays of texts, and he told me that he didn't want to change that, which is acceptable.
With the implementation I have made, I currently only have the ability to track line level changes and in addition I always have to convert Tokens back to text to diff two states together.
This back-conversion currently means unwanted memory usage and unnecessary iterations, so it's not ideal yet. This is where I am at the moment:

  1. Sebastian's tool is given and cannot be changed
  2. It is not practical to import a new diff library just to use it in this one case
  3. Implementing my own diff just to keep track of token changes is a lot of work and logic that is only present because of one function.

The idea works as long as you don't put two violations in a line:

if ($variable !== false && $variable2 !== false)

That's two yoda style violations, but due to the limitations of diff I see this as one. So far I am not happy with the solution for this. As long as I am not satisfied with what my solution can do, there is no point in talking about how to integrate it nicely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I'm now wondering if just changing the Tokens class is enough to keep track of the changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imagine the case:

<?php
class A {
    public method zzz() {} ## line 2

    public method aaa() {
        var_dump("a" == "b"); ## line 5
    }
}
  • 1st fixer moves method zzz from top of file to bottom of file
  • 2nd fixer change content of method (eg == => ===)

which lines should be reported as modified?
imho 1st fixer: line 2 and 2nd fixer: line 5.
But when 2nd fixer is applied, line 5 is actually line 3.

  • i do not want to touch Tokens
  • i do not want to have own logic for tracking modified lines (Eg PatchInfo)

I suggest a new command/BC break that is not applying one fixer on top of another (like we need for fix), but one after another , separately, always against original file variant - that way, you can determine modified line (with current udiff approach) for original code.
That would work only for check and not for fix (for when we actually want to have those reports) and will make check blind for changes that are based on top of changes done by other rule (so check may detect less things that fix would do, because fix will apply changes on modified file variant)

Copy link
Author

@connorhu connorhu Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I achieved quite nice results on complex code with the solution I made ( https://github.com/connorhu/PHP-CS-Fixer/blob/57bcf57cde0cc5844070eb62ed38a5320085eb35/tests/Fixtures/Integration/misc/blame_test.test#L46-L56 ). The function will only work correctly if it finds the original position where the violation occurs, and now the code can do it (mostly). I'm between two extremes at the moment: in the Runner class I lose some details, and in the Tokens class I get too much detail about the changes. Anyway, the potential of the Tokens class is very promising, but the setCode method makes the changes untraceable (fortunately only 1 fixer uses it as far as I can see).

In the beginning I did as you wrote. I made a "complex" code, set several fixers and saved each step in a separate file, then diffed them. I wrote down what I wanted to see (what the logical end result would be) and iterated until I got what the logical end result is.

I suggest a new command...

I'll sleep on it and think about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-03-18 at 22 59 53

pssst, this one already exists - check (just it's checking one after another and not independently vs original code - the line 5 vs line 3 topic I mentioned)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pssst, this one already exists - check (just it's checking one after another and not independently vs original code - the line 5 vs line 3 topic I mentioned)

I know, so I don't understand why the code you wrote is an issue. Fixing only fixes, checking only checks.
The "think about it" was to see what options are available if you don't want to modify the Tokens class and don't want to work on the diff result. For now it feels like a step backwards, but I have to think about it.

Your example code results in two violations:

  • method order at line 2 char 0
  • triple equal sign comparison operator at line 5 at char 0

In both cases, char 0 is still an incorrect result, but this is a side effect of the diff.

For me it's already an issue:

        var_dump("a" == "b", "c" == "b");

and

        var_dump("a" == "b");
        var_dump("c" == "b");

My solution takes these as one violation. Which is not ok.

I can see that you don't want to deal with such things, which you have the possibility to do anyway, by removing the output formats (or outsourcing them from the core code to a plugin system).


final class CodeChange
{
private string $content;
private int $change;
private ?int $newLineNumber;
private ?int $oldLineNumber;

public function __construct(string $content, int $change, ?int $newLineNumber = null, ?int $oldLineNumber = null)
{
$this->content = $content;
$this->change = $change;
$this->newLineNumber = $newLineNumber;
$this->oldLineNumber = $oldLineNumber;
}

public function getContent(): string
{
return $this->content;
}

public function getChange(): int
{
return $this->change;
}

public function getNewLineNumber(): ?int
{
return $this->newLineNumber;
}

public function getOldLineNumber(): ?int
{
return $this->oldLineNumber;
}
}
297 changes: 297 additions & 0 deletions src/FixerBlame/FixerBlame.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,297 @@
<?php

declare(strict_types=1);

/*
* This file is part of PHP CS Fixer.
*
* (c) Fabien Potencier <fabien@symfony.com>
* Dariusz Rumiński <dariusz.ruminski@gmail.com>
*
* This source file is subject to the MIT license that is bundled
* with this source code in the file LICENSE.
*/

namespace PhpCsFixer\FixerBlame;

use PhpCsFixer\Fixer\FixerInterface;
use PhpCsFixer\Tokenizer\Tokens;
use SebastianBergmann\Diff\Differ;
use SebastianBergmann\Diff\Output\StrictUnifiedDiffOutputBuilder;

final class FixerBlame
{
private Differ $differ;

/**
* @var array<array{
* fixerName: string,
* source: string
* }>
*/
private array $changeStack = [];

public function __construct()
{
$this->differ = new Differ(new StrictUnifiedDiffOutputBuilder([
'collapseRanges' => true,
'commonLineThreshold' => 1,
'contextLines' => 1,
'fromFile' => 'Original',
'toFile' => 'New',
]));
}

/**
* @param string|Tokens $code
*/
public function originalCode($code): void
{
if ($code instanceof Tokens) {
$code = $code->generateCode();
}

$this->changeStack = [
[
'fixerName' => '__initial__',
'source' => $code,
],
];
}

public function snapshotCode(string $fixerName, string $source): void
{
$this->changeStack[] = [
'fixerName' => $fixerName,
'source' => $source,
];
}

public function snapshotTokens(FixerInterface $fixer, Tokens $tokens): void
{
$this->changeStack[] = [
'fixerName' => $fixer->getName(),
'source' => $tokens->generateCode(),
];
}

/**
* @return list<FixerChange>
*/
public function calculateChanges(): array
{
$changes = [];

foreach ($this->changeStack as $changeIndex => $change) {
if (0 === $changeIndex) {
continue;
}

$oldChangeContent = $this->changeStack[$changeIndex - 1]['source'];
$newChangeContent = $change['source'];

$fixerName = $change['fixerName'];

$diffResults = $this->diff($oldChangeContent, $newChangeContent);
$patches = $this->findPatches($diffResults);

foreach ($patches as $patchInfo) {
$patchContent = $patchInfo->getPatchContent($diffResults);

$numberOfChanges = \count($patchContent);

// simple remove
if (1 === $numberOfChanges && Differ::REMOVED === $patchContent[0]->getChange()) {
$changes[] = [
'fixerName' => $fixerName,
'start' => $patchContent[0]->getOldLineNumber(),
'changedSum' => $patchInfo->getChangeSum(),
'changedAt' => 0,
];

continue;
}

// line changed
if (2 === $numberOfChanges && Differ::REMOVED === $patchContent[0]->getChange() && Differ::ADDED === $patchContent[1]->getChange()) {
$addedLine = $patchContent[1]->getContent();
$removedLine = $patchContent[0]->getContent();

$changedAt = null;

for ($i = 0; $i < min(\strlen($addedLine), \strlen($removedLine)); ++$i) {
if ($addedLine[$i] !== $removedLine[$i]) {
$changedAt = $i + 1;

break;
}
}

$changes[] = [
'fixerName' => $fixerName,
'start' => $patchContent[0]->getOldLineNumber(),
'changedSum' => $patchInfo->getChangeSum(),
'changedAt' => $changedAt ?? \strlen($removedLine) + 1,
];

continue;
}

$onlyRemove = 0x1;
$onlyAdd = 0x1;

foreach ($patchContent as $patchRow) {
if (Differ::ADDED === $patchRow->getChange()) {
$onlyAdd &= 0x1;
} else {
$onlyAdd &= 0;
}

if (Differ::REMOVED === $patchRow->getChange()) {
$onlyRemove &= 0x1;
} else {
$onlyRemove &= 0;
}
}

if (1 === $onlyAdd xor 1 === $onlyRemove) {
if (1 === $onlyAdd) {
$lineNumber = $patchContent[0]->getNewLineNumber();
} else {
$lineNumber = $patchContent[0]->getOldLineNumber();
}

$changes[] = [
'fixerName' => $fixerName,
'start' => $lineNumber,
'changedSum' => $patchInfo->getChangeSum(),
'changedAt' => 0,
];

continue;
}
if (Differ::ADDED === $patchContent[0]->getChange()) {
throw new \RuntimeException('added lines first?');
}

$changes[] = [
'fixerName' => $fixerName,
'start' => $patchContent[0]->getOldLineNumber(),
'changedSum' => $patchInfo->getChangeSum(),
'changedAt' => 0,
];
}
}

$changeSet = [];
foreach ($changes as $index => $change) {
$lineChanges = 0;
for ($i = $index - 1; $i >= 0; --$i) {
if ($changes[$i]['start'] >= $change['start']) {
continue;
}

$lineChanges -= $changes[$i]['changedSum'];
}

$changeSet[] = new FixerChange($change['fixerName'], $change['start'] + $lineChanges, $change['changedAt']);
}

return $changeSet;
}

/**
* @return array<CodeChange>
*/
private function diff(string $oldCode, string $newCode): array
{
$diffResults = $this->differ->diffToArray($oldCode, $newCode);

$linePointerInOldContent = 1;
$linePointerInNewContent = 1;

$buffer = [];
foreach ($diffResults as $diffResult) {
if (Differ::ADDED === $diffResult[1]) {
$buffer[] = new CodeChange($diffResult[0], Differ::ADDED, $linePointerInNewContent++);

continue;
}

if (Differ::REMOVED === $diffResult[1]) {
$buffer[] = new CodeChange($diffResult[0], Differ::REMOVED, null, $linePointerInOldContent++);

continue;
}

$buffer[] = new CodeChange($diffResult[0], Differ::OLD, $linePointerInNewContent++, $linePointerInOldContent++);
}

return $buffer;
}

/**
* @param array<CodeChange> $diffs
*
* @return array<PatchInfo>
*/
private function findPatches(array $diffs): array
{
/** @var array<PatchInfo> $patches */
$patches = [];
$patchInfo = null;
$state = 'file_start';

foreach ($diffs as $key => $diffResult) {
if ('file_start' === $state) {
if (Differ::OLD === $diffResult->getChange()) {
$state = 'between_patch';

continue;
}

if (Differ::ADDED === $diffResult->getChange() || Differ::REMOVED === $diffResult->getChange()) {
$patchInfo = new PatchInfo();
$patchInfo->setStartKey($key);
$patchInfo->countChange($diffResult->getChange());

$state = 'in_patch';

continue;
}
}

if ('between_patch' === $state && (Differ::ADDED === $diffResult->getChange() || Differ::REMOVED === $diffResult->getChange())) {
$patchInfo = new PatchInfo();
$patchInfo->setStartKey($key);
$patchInfo->countChange($diffResult->getChange());

$state = 'in_patch';

continue;
}

if ('in_patch' === $state && Differ::OLD === $diffResult->getChange()) {
$state = 'between_patch';

$patchInfo->setEndKey($key);
$patches[] = $patchInfo;
$patchInfo = null;

continue;
}

if ('in_patch' === $state) {
$patchInfo->countChange($diffResult->getChange());
}
}

if ('in_patch' === $state) {
$patchInfo->setEndKey(\count($diffs) - 1);
$patches[] = $patchInfo;
$patchInfo = null;
}

return $patches;
}
}
Loading
Loading