
cloc --git-diff-rel does not respond consistently when dealing with copy/rename #780

Closed
EnricoPicci opened this issue Oct 25, 2023 · 2 comments

Comments

@EnricoPicci

Describe the bug
The command cloc --git-diff-rel --csv --by-file commit-sha^1 commit-sha does not always return the same number of records, i.e. the same number of differences, when run repeatedly on the same pair of commits.
In particular, differences that git diff --numstat marks as copy/rename are sometimes skipped by cloc --git-diff-rel and sometimes listed as 2 separate differences.

cloc; OS; OS version

  • cloc version: 1.98
  • If running the cloc source, Perl version:
  • OS: Linux
  • OS version: 22.04

To Reproduce
I have tried to reproduce the issue on some public repos, but so far I have not been able to.

Expected result
Every run of cloc --git-diff-rel --csv --by-file commit-sha^1 commit-sha should return the same number of records.

Additional context
Let's say that git diff --numstat returns something like:
1 1 src/{old-subdir => new-subdir}/my-file.java
This states that my-file.java has been moved from old-subdir to new-subdir, with one line added and one line removed.
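To make the notation concrete, here is a minimal sketch of how such a numstat line could be expanded into its old and new paths. The function names expandRenamePath and parseNumstatLine are hypothetical and are not part of cloc or git; the sketch assumes tab-separated numstat fields and does not handle file names that themselves contain " => ".

```javascript
// Expand the path field of a `git diff --numstat` line into old and new paths.
// Handles the brace form `prefix/{old => new}/suffix` and the plain `old => new` form.
// (Hypothetical helper, for illustration only.)
function expandRenamePath(pathField) {
    var brace = pathField.match(/^(.*)\{(.*) => (.*)\}(.*)$/);
    if (brace) {
        // An empty side (e.g. `{ => sub}`) would leave a double slash, so collapse it.
        var oldPath = (brace[1] + brace[2] + brace[4]).replace(/\/\//g, '/');
        var newPath = (brace[1] + brace[3] + brace[4]).replace(/\/\//g, '/');
        return { oldPath: oldPath, newPath: newPath };
    }
    var plain = pathField.split(' => ');
    if (plain.length === 2) {
        return { oldPath: plain[0], newPath: plain[1] };
    }
    // Not a rename or copy: the file kept its path.
    return { oldPath: pathField, newPath: pathField };
}

// Parse one numstat line of the form "<added>\t<removed>\t<path>".
// Binary files report "-" for both counts, which becomes NaN here.
function parseNumstatLine(line) {
    var fields = line.split('\t');
    var paths = expandRenamePath(fields.slice(2).join('\t'));
    return {
        added: Number(fields[0]),
        removed: Number(fields[1]),
        oldPath: paths.oldPath,
        newPath: paths.newPath
    };
}

console.log(parseNumstatLine('1\t1\tsrc/{old-subdir => new-subdir}/my-file.java'));
```

With both paths available, a single rename record from git can be matched against either one or two records in the cloc output.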

If I run the same comparison with cloc --git-diff-rel --csv --by-file, I get one of these two responses:

  • src/new-subdir/my-file.java has one line removed and one line changed, with no mention of the file src/old-subdir/my-file.java
  • src/new-subdir/my-file.java has 100 lines added and src/old-subdir/my-file.java has 100 lines removed (assuming that the file contains 100 lines)

Which of the 2 responses I get appears to be random.

@AlDanial
Owner

This problem will be difficult (impossible?) to resolve without a way for me to reproduce it. For now I will close this issue but please reopen it if you find a public repo that can demonstrate the inconsistent output.

In the long term, I have a goal to implement in cloc the switch --git-diff-simindex which will attempt to follow renames via git's --find-renames option.

@EnricoPicci
Author

EnricoPicci commented Nov 1, 2023

I think I have found a way to reproduce the problem I have described in the issue.
Clone the repo https://github.com/EnricoPicci/git-metrics.
Then, from inside the cloned repo, repeatedly run the following command:
cloc --git-diff-rel --csv --by-file --timeout=10 --quiet 6fb8624bad8d62ee14da5c7a527c786b301f7529^1 6fb8624bad8d62ee14da5c7a527c786b301f7529

What I get on my machine (running Ubuntu 22.04.2) is either a list of 11 differences or (less often) a list of 16 differences.

I have written this Node.js script to run the command repeatedly. The script exits as soon as the output of one run has a different number of lines from the output of the previous run, or after 100 runs with identical line counts.

var child_process = require("child_process");
var fs = require("fs");

var previousLength = 0;          // line count of the previous run (0 = no run yet)
var i = 0;                       // number of completed runs
var maxNumberOfIterations = 100;

// execSync blocks until cloc finishes, so runs never overlap.
setInterval(function () {
    console.log('iteration ' + i);
    console.log('previousLength ' + previousLength);
    // Run cloc on the commit and its first parent, writing the CSV to out.csv.
    child_process.execSync('cloc --git-diff-rel --csv --by-file --timeout=10 --quiet 6fb8624bad8d62ee14da5c7a527c786b301f7529^1 6fb8624bad8d62ee14da5c7a527c786b301f7529 > out.csv');
    var lastLength = fs.readFileSync('out.csv', 'utf8').split('\n').length;
    // Stop at the first run whose line count differs from the previous one.
    if (previousLength && lastLength !== previousLength) {
        console.log('the command has not returned the same number of lines: once it was ' + previousLength + ' and now it is ' + lastLength);
        process.exit(1);
    }
    i++;
    if (i === maxNumberOfIterations) {
        console.log('the command has returned the same number of lines every time after ' + i + ' iterations');
        process.exit(0);
    }
    previousLength = lastLength;
}, 100);
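
To put a number on "less often", a variant of the script could tally how many times each line count occurs instead of stopping at the first mismatch. The sketch below is only an illustration: the names tallyLineCounts and runCloc and the --run argument are made up for this example, and it assumes cloc is on the PATH and that the script is started from inside the cloned repo.

```javascript
var child_process = require("child_process");

// Call `run` `iterations` times and count how often each line count occurs.
// `run` is any function that returns the command output as a string.
function tallyLineCounts(run, iterations) {
    var tally = {};
    for (var i = 0; i < iterations; i++) {
        var lineCount = run().split('\n').length;
        tally[lineCount] = (tally[lineCount] || 0) + 1;
    }
    return tally;
}

// Runs the cloc command from this comment and returns its stdout.
function runCloc() {
    return child_process.execSync(
        'cloc --git-diff-rel --csv --by-file --timeout=10 --quiet ' +
        '6fb8624bad8d62ee14da5c7a527c786b301f7529^1 6fb8624bad8d62ee14da5c7a527c786b301f7529',
        { encoding: 'utf8' }
    );
}

// Only invoke cloc when the (hypothetical) --run argument is passed,
// e.g. `node tally.js --run`.
if (process.argv.indexOf('--run') !== -1) {
    console.log(tallyLineCounts(runCloc, 100));
}
```

The result is an object keyed by line count, so two different keys in the output would confirm the inconsistency and show how frequent each outcome is.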

Attached below are both results I get: the one with 11 records and the one with 16 records.

out_11.csv
out_16.csv
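
To see which records differ between the two attachments, a small comparison along these lines might help. It is a sketch under assumptions: it treats the first CSV column as the file path and the first row as a header, splits on commas naively (so paths containing commas would be misread), and the names firstColumnSet, onlyIn and compareCsv are hypothetical.

```javascript
var fs = require("fs");

// Collect the first CSV column (assumed to be the file path) of every data row.
function firstColumnSet(csvText) {
    var rows = csvText.split('\n').slice(1); // drop the assumed header row
    var paths = {};
    rows.forEach(function (row) {
        var firstField = row.split(',')[0].trim();
        if (firstField) {
            paths[firstField] = true;
        }
    });
    return paths;
}

// Return the entries present in `a` but missing from `b`, sorted.
function onlyIn(a, b) {
    return Object.keys(a).filter(function (p) {
        return !Object.prototype.hasOwnProperty.call(b, p);
    }).sort();
}

// Compare two CSV texts by their first column.
function compareCsv(textA, textB) {
    var a = firstColumnSet(textA);
    var b = firstColumnSet(textB);
    return { onlyInA: onlyIn(a, b), onlyInB: onlyIn(b, a) };
}

// Compare the two attachments if they are in the current directory.
if (fs.existsSync('out_11.csv') && fs.existsSync('out_16.csv')) {
    console.log(compareCsv(
        fs.readFileSync('out_11.csv', 'utf8'),
        fs.readFileSync('out_16.csv', 'utf8')
    ));
}
```

If the extra records in the 16-record output turn out to be old/new path pairs of moved files, that would match the copy/rename behaviour described at the top of this issue.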
