Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git archive error when filenames have a hyphen followed by a space #320

Closed
vigoku opened this issue Aug 10, 2018 · 16 comments
Closed

git archive error when filenames have a hyphen followed by a space #320

vigoku opened this issue Aug 10, 2018 · 16 comments

Comments

@vigoku
Copy link

vigoku commented Aug 10, 2018

Just sharing an issue seen in v1.76 and also in master cloc file

A file that is named - "PE01 - Points Earned.js" (without double quotes) tends to fail when doing cloc with --git --diff due to the fix for -# backslash whitespace within file names (#257).

The intermediate command that gets generated is ::
git archive -o x.tar PE01\ -\ Points Earned.js

Unfortunately, git archive seems to interpret anything with hyphen as an option leading to the error..

error: unknown switch `'

and then eventually

Failed to create tarfile of files from git. at script/cloc-1.76.pl line 4179.

what works is --
git archive -o x.tar 'PE01 - Points Earned.js'

So, i fiddled with the cloc-1.76.pl, at line number 4126 and 4128 and changed
map {$ =~ s/(\s)/\$1/g; $}
to
map {$ =~ s/(^.(\s).$)/'$1'/g; $}

and things seemed to work.

Also I observed this issue in the latest version too .. but not too sure about the fix

@AlDanial
Copy link
Owner

Thanks for doing the troubleshooting, that saved me a lot of time. I updated tests/make_git_repo.bash to create more problematic filenames, then rewrote cloc's logic to handle them all. At the moment I don't have an automated way to do the git diff testing so if you want to use that script, run it in /tmp or something, then invoke a few cloc --git --diff runs using the first & last commit hashes.

In any event, let me know if the code update works for you.

@vigoku
Copy link
Author

vigoku commented Aug 13, 2018

Thank you for the quick fix!! I was blocked at the point of git archive, it has now moved beyond it. So the reported issue is surely fixed. Also i did confirm that a non-diff based count works fine.

Post the archival I am getting some new error messages. I've enabled verbose logs but still not figure out. Is there any further detailed level of logging that i could enable to debug further?

git ls-tree --name-only -r b1fe5887dc9c1baceb7883ea27a850192b39aa03
git ls-tree --name-only -r b3c9f77cbe0a648153d585a7cc4d206daf5e8c84
git diff-tree -r --no-commit-id --name-only b1fe5887dc9c1baceb7883ea27a850192b39aa03 b3c9f77cbe0a648153d585a7cc4d206daf5e8c84
git archive -o /tmp/hrH6E7XcbF.tar b1fe5887dc9c1baceb7883ea27a850192b39aa03
git archive -o /tmp/jqcY6re_CG.tar b3c9f77cbe0a648153d585a7cc4d206daf5e8c84
mkdir temp/1
cd temp/1
tar xf '/tmp/hrH6E7XcbF.tar'
mkdir temp/2
cd temp/2
tar xf '/tmp/jqcY6re_CG.tar'
247 text files.
266 text files.
Use of uninitialized value $file in scalar chomp at ..\cloc line 4728, line 248.
Use of uninitialized value $file in hash element at ..\cloc line 4729, line 248.
Argument "cancel_selfService-spec.js\n" isn't numeric in sort at ..\cloc line 4741, line 263. <vgk note :: could it be the addition of \n by the program is causing some wierdness?>
Use of uninitialized value in hash element at ..\cloc line 4747, line 263.
Use of uninitialized value $file in scalar chomp at ..\cloc line 4728, line 248.
Use of uninitialized value $file in hash element at ..\cloc line 4729, line 248.
Argument "cancel_selfService-spec.js\n" isn't numeric in sort at ..\cloc line 4741, line 275.
Use of uninitialized value in hash element at ..\cloc line 4747, line 275.
Using temp dir [temp/3] to install Regexp::Common
Unable to read temp/2/test/mockedResponses/fetchProfiles/singleMakedResponses/fetchPromotions/0promo.json <vgk note (edited) :: Verified that file does exist. is it too many characters for windows?>
Unable to read temp/2/test/mockedResponses/fetchProfiles/singleMakedResponses/fetchPromotions/0promo.json
Unable to read temp/2/test/mockedResponses/fetchProfiles/singleMakedResponses/fetchPromotions/0promo.json
Use of uninitialized value $Lang_L in hash element at ..\cloc line 2158.
Use of uninitialized value $Lang_R in hash element at ..\cloc line 2158.
Use of uninitialized value $Lang_L in string eq at ..\cloc line 2167.
Use of uninitialized value $Lang_R in string eq at ..\cloc line 2167.
Use of uninitialized value $Lang_L in hash element at ..\cloc line 2169.
Can't use an undefined value as an ARRAY reference at ..\cloc line 2167.

@AlDanial
Copy link
Owner

Looks like possibly another issue with an unusually named file. There's no built-in debugging statement that will help trap this so the easiest thing to do is insert an extra print statement before the problem appears. Here's the before and after code:

before

 4725     while (<$fh>) {
 4726         ++$n;
 4727         my ($size_in_bytes, $language, $file) = split(/,/, $_, 3);
 4728         chomp($file);
 4729         $rh_Language->{$file} = $language;

after

 4725     while (<$fh>) {
 4726         ++$n;
 4727     print "remove_duplicate_files:$_";     # debug print statement <--
 4728         my ($size_in_bytes, $language, $file) = split(/,/, $_, 3);
 4729         chomp($file);
 4730         $rh_Language->{$file} = $language;

in other words, add the line after 4726. The complaint in your output is that $file wasn't defined when it tried to split the comma-separated line. I'm curious to see what the original input line, $_, contains before it prints the error.

@AlDanial AlDanial mentioned this issue Aug 21, 2018
@nijikon
Copy link

nijikon commented Aug 21, 2018

@vigoku any reply to what @AlDanial wrote? Asking because this is blocking the release 😃

@vigoku
Copy link
Author

vigoku commented Aug 24, 2018

@nijikon Apologies, i was stuck in some stuff, will update ASAP.

@vigoku
Copy link
Author

vigoku commented Aug 24, 2018

Updating as per suggestion from @AlDanial ::::

Debug-AfterAdditionOfPrintStatement.txt
debug_beforeLineAddition.txt

@vigoku
Copy link
Author

vigoku commented Aug 24, 2018

Also i noticed an error at
Use of uninitialized value $Lang_L in hash element at ..\cloc line 2158.
Use of uninitialized value $Lang_R in hash element at ..\cloc line 2158.
Use of uninitialized value $Lang_L in string eq at ..\cloc line 2167.
Use of uninitialized value $Lang_R in string eq at ..\cloc line 2167.
Use of uninitialized value $Lang_L in hash element at ..\cloc line 2169.
Can't use an undefined value as an ARRAY reference at ..\cloc line 2167.

So, i uncommented the print -- 2138 - -print "main step 6 file_L=$file_L file_R=$file_R\n";

I got the below additional logs

Debug-AdditionalLogs-mssingfile_L.txt

@AlDanial
Copy link
Owner

Thanks for the debug files. One of two things is happening: 1/a temporary file cloc writes of file names it is considering counting is getting corrupted in the middle of the run, or 2/you have a file name with an embedded newline or other character movement control (newline, line feed, vspace, etc). The first one seems unlikely, and I've never before run into the situation suggested by the second one.

I'd like to see the contents of the two temp files. The three lines denoted by Debug La, Lb, and Lc will create them (one for the diff left set and one for the right):

 1485 foreach my $FH (@fh) {  # loop over each pair of file sets
 1486     ++$n_set;
 1487 seek($FH, 0, 0); # rewind file handle              # Debug La
 1488 my @all_lines = <$FH>;                             # Debug Lb
 1489 write_file("debug_set_${n_set}.txt", @all_lines);  # Debug Lc
 1490     remove_duplicate_files($FH,

Please add the lines to your cloc file, run with the updates, then post the files debug_set_1.txt and debug_set_2.txt.

@vigoku
Copy link
Author

vigoku commented Aug 27, 2018

Thank you for the debug instructions.

Here is the output : :
debug_set_1.txt
debug_set_2.txt

Also the logs generated -
debug-27-Aug.txt

@AlDanial
Copy link
Owner

Line 133 in both debug_set_1.txt and debug_set_2.txt files is messed up:

132 2012,JSON,temp/2/test/mockedResponses/fetchProfiles/singleMatchNotPinProtected.json
133 1993,JSON,temp/2/test/mockedResponses/fetchProfiles/singleMa5,JSON,temp/1/test/mockedResponses/fetchTripsHistory/5stay.json
134 1914,JSON,temp/1/test/mockedResponses/updateAddress/joe.json

It looks like there's a file that starts with temp/2/test/mockedResponses/fetchProfiles/singleMa that has weird characters after that last a. Can you check if that's the case?

For now I'll add foolproofing logic to the cloc code that ingests these temp files and just skip the problem lines.

@AlDanial
Copy link
Owner

Please grab the code from master (commit dc7cf04) and test it with your inputs. It skips inputs it can't understand. This isn't ideal (the ideal would be to properly treat whatever weird characters are in the file names) but should be enough for a new stable release.

@vigoku
Copy link
Author

vigoku commented Aug 28, 2018

@AlDanial That took care of the issue. Thank you for fixing this !!

Just to explore a bit, I just did a comparison between 1.76 version vs 1.77. There are some new languages supported, but just looking at the xml, JS, JSON code count, I think (due to skipping?) the modified line count seems to be always 0.

CodeCountv1.76.txt
CodeCount-v1.77.txt

@AlDanial
Copy link
Owner

Thanks for testing 1.76 v. 1.77 on your code base -- I did refine the git diff logic in 1.77 but it retrospect it isn't clear which logic is better. I'm thinking I'll need to support both, perhaps with a new switch, --diff2.

Here's the difference in logic: 1.76 only diff'ed files that changed between the two git commits. 1.77 diffs all files in commit 1 against all files in commit 2. If few files changed, 1.77 will show lots of zero modifications/additions/deletions.

Since I don't have access to your code, let's work with a repo we both can reach: the Python 'requests' module here on github. The two git hashes I chose are about a year apart:

git clone https://github.com/requests/requests.git
cd requests/
cloc         -v --git --diff ad1aff52a0 9cfd292da3 | tee diff.177
cloc-1.76.pl -v --git --diff ad1aff52a0 9cfd292da3 | tee diff.176

The summary section from the two diff.* files is

# diff.177
-------------------------------------------------------------------------------
SUM:
 same                           23            291           1922           6366
 modified                       27              0             22            125
 added                           3            102             74            405
 removed                         0              5              4             37
-------------------------------------------------------------------------------

and

# diff.176
-------------------------------------------------------------------------------
SUM:
 same                            0              0           1803           5146
 modified                       27              0             22            125
 added                           3            102             74            405
 removed                         0              5              4             37
-------------------------------------------------------------------------------

The results match except for the same counts. It strikes me that the 1.76 logic might be more desirable, so I should relegate the 1.77 logic to --diff2.

The summary block in your two CodeCount*.txt files tells a slightly different story though. Since I can't check the file alignment and what happened exactly, not much for me to do except guess.

Perhaps the requests repo is too simple and doesn't capture complexity your repo has. If you can find another publicly available repo that exhibits results similar to yours we should be able to make progress.

@AlDanial
Copy link
Owner

AlDanial commented Sep 3, 2018

I fixed the git diff logic to match that in previous releases. My plan is to release v 1.78 based on the current git master by Sept. 8 so please confirm that it works the way you desire.

@vigoku
Copy link
Author

vigoku commented Sep 5, 2018

Hey @AlDanial, i could not get time to test things as per your previous comment. But i verified the latest one and its working similar to 1.76.

Thanks for the release plan!

I will try to find a public repo for the "modified 0" line issue and probably raise a separate issue?

@AlDanial
Copy link
Owner

AlDanial commented Sep 7, 2018

The fix appears in the just-released v 1.78.

@AlDanial AlDanial closed this as completed Sep 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants