git archive error when filenames have a hyphen followed by a space #320
Comments
Thanks for doing the troubleshooting, that saved me a lot of time. I updated In any event, let me know if the code update works for you.
Thank you for the quick fix! I was blocked at the git archive step, and the run now gets past it, so the reported issue is fixed. I also confirmed that a non-diff count works fine. After the archive step I am getting some new error messages. I've enabled verbose logs but still can't figure them out. Is there a more detailed level of logging that I could enable to debug further? git ls-tree --name-only -r b1fe5887dc9c1baceb7883ea27a850192b39aa03
Looks like possibly another issue with an unusually named file. There's no built-in debugging statement that will help trap this, so the easiest thing to do is insert an extra print statement before the problem appears. Here's the before and after code:

before

    4725 while (<$fh>) {
    4726     ++$n;
    4727     my ($size_in_bytes, $language, $file) = split(/,/, $_, 3);
    4728     chomp($file);
    4729     $rh_Language->{$file} = $language;

after

    4725 while (<$fh>) {
    4726     ++$n;
    4727     print "remove_duplicate_files:$_"; # debug print statement <--
    4728     my ($size_in_bytes, $language, $file) = split(/,/, $_, 3);
    4729     chomp($file);
    4730     $rh_Language->{$file} = $language;

In other words, add the line after 4726. The complaint in your output is that
@nijikon Apologies, I was tied up with other things; I will update ASAP.
Updating as per the suggestion from @AlDanial: Debug-AfterAdditionOfPrintStatement.txt
Also, I noticed an error at So, I uncommented the print -- 2138 - -print "main step 6 file_L=$file_L file_R=$file_R\n"; I got the additional logs below.
Thanks for the debug files. One of two things is happening: 1) a temporary file that cloc writes, listing the file names it is considering counting, is getting corrupted in the middle of the run, or 2) you have a file name with an embedded newline or other cursor-movement control character (newline, line feed, vspace, etc.). The first one seems unlikely, and I've never before run into the situation suggested by the second one. I'd like to see the contents of the two temp files. The three lines denoted by Debug La, Lb, and Lc will create them (one for the diff left set and one for the right):
Please add the lines to your cloc file, run with the updates, then post the files.
Thank you for the debug instructions. Here is the output: Also the logs generated:
Line 133 in both
It looks like there's a file that starts with For now I'll add foolproofing logic to the cloc code that ingests these temp files and just skip the problem lines.
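For anyone who wants to check their own repo for the kind of file name suspected above, here is a minimal sketch in Python (not part of cloc). It assumes you feed it the NUL-delimited output of `git ls-tree -r --name-only -z <commit>`; the `-z` flag keeps git from splitting or quoting names that contain newlines. The function name `find_suspicious_names` is made up for this example.

```python
import unicodedata


def find_suspicious_names(ls_tree_output: bytes) -> list[str]:
    """Return file names that contain control characters (newline, tab, ...).

    Assumes `ls_tree_output` is NUL-delimited, as produced by
    `git ls-tree -r --name-only -z <commit>`.
    """
    names = [
        chunk.decode("utf-8", errors="replace")
        for chunk in ls_tree_output.split(b"\0")
        if chunk  # the output ends with a NUL, so drop the empty tail
    ]
    # Unicode category "Cc" covers the C0/C1 control characters.
    return [
        name
        for name in names
        if any(unicodedata.category(ch) == "Cc" for ch in name)
    ]
```

A name that shows up here would be split across two lines in any newline-delimited temp file, which would match the symptom in the debug output.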
Please grab the code from master (commit dc7cf04) and test it with your inputs. It skips inputs it can't understand. This isn't ideal (the ideal would be to properly treat whatever weird characters are in the file names) but should be enough for a new stable release.
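To illustrate what "skip inputs it can't understand" can look like, here is a rough Python sketch. It is not the code in commit dc7cf04; it only assumes the `size_in_bytes,language,file` record layout visible in the Perl `split(/,/, $_, 3)` call quoted earlier in this thread. The name `parse_records` and the validity checks are assumptions for the example.

```python
def parse_records(lines):
    """Parse 'size,language,file' lines; collect the ones that do not fit.

    Returns (language_of, skipped) where language_of maps file -> language
    and skipped is a list of (line_number, raw_line) tuples.
    """
    language_of = {}
    skipped = []
    for n, line in enumerate(lines, start=1):
        # Split on the first two commas only, so commas in the file name survive.
        parts = line.rstrip("\n").split(",", 2)
        if len(parts) != 3 or not parts[0].isdigit() or not parts[2]:
            skipped.append((n, line))  # e.g. the tail of a name split by a newline
            continue
        _size, language, path = parts
        language_of[path] = language
    return language_of, skipped
```

Returning the skipped lines instead of dropping them silently makes it easier to report which file names were ignored.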
@AlDanial That took care of the issue. Thank you for fixing this! Just to explore a bit, I compared version 1.76 against 1.77. There are some new languages supported, but looking only at the XML, JS, and JSON code counts, I think (due to the skipping?) the modified line count always seems to be 0.
Thanks for testing 1.76 v. 1.77 on your code base -- I did refine the git diff logic in 1.77 but in retrospect it isn't clear which logic is better. I'm thinking I'll need to support both, perhaps with a new switch. Here's the difference in logic: 1.76 only diff'ed files that changed between the two git commits. 1.77 diffs all files in commit 1 against all files in commit 2. If few files changed, 1.77 will show lots of zero modifications/additions/deletions. Since I don't have access to your code, let's work with a repo we both can reach: the Python 'requests' module here on GitHub. The two git hashes I chose are about a year apart:
The summary section from the two
and
The results match except for the The summary block in your two Perhaps the requests repo is too simple and doesn't capture complexity your repo has. If you can find another publicly available repo that exhibits results similar to yours we should be able to make progress.
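The difference between the 1.76 and 1.77 file selection described above can be modeled with a small Python sketch. This is a simplified illustration, not cloc's implementation: each commit is represented as a hypothetical dict mapping a path to its blob id, and both function names are invented for the example.

```python
def files_changed_only(left, right):
    """1.76-style selection: paths that were added, removed, or modified."""
    names = sorted(set(left) | set(right))
    # A path missing on one side yields None, which never equals a blob id.
    return [name for name in names if left.get(name) != right.get(name)]


def files_all(left, right):
    """1.77-style selection: every path present in either commit."""
    return sorted(set(left) | set(right))
```

With only a handful of changes between the commits, the second selection includes many identical files, which is where the rows of zero modifications, additions, and deletions come from.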
I fixed the git diff logic to match that in previous releases. My plan is to release v 1.78 based on the current git master by Sept. 8 so please confirm that it works the way you desire.
Hey @AlDanial, I couldn't find time to test things as per your previous comment, but I verified the latest one and it's working like 1.76. Thanks for the release plan! I will try to find a public repo for the "modified 0" line issue and probably raise a separate issue?
The fix appears in the just-released v 1.78.
Just sharing an issue seen in v1.76 and also in the cloc file on master.
A file named "PE01 - Points Earned.js" (without the double quotes) fails when running cloc with --git --diff, due to the fix that backslash-escapes whitespace within file names (#257).
The intermediate command that gets generated is:
git archive -o x.tar PE01\ -\ Points Earned.js
Unfortunately, git archive seems to interpret anything with a hyphen as an option, leading to the error:
error: unknown switch `'
and then eventually
Failed to create tarfile of files from git. at script/cloc-1.76.pl line 4179.
What works is:
git archive -o x.tar 'PE01 - Points Earned.js'
So I fiddled with cloc-1.76.pl at lines 4126 and 4128 and changed
map {$_ =~ s/(\s)/\\$1/g; $_}
to
map {$_ =~ s/(^.*(\s).*$)/'$1'/g; $_}
and things seemed to work.
I also observed this issue in the latest version, but I'm not too sure about the fix.
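For comparison, here is how the same quoting idea looks in Python. This is only a sketch of the technique (single-quote each argument, or skip the shell altogether), not a patch for cloc's Perl. The `tree_ish` parameter and both function names are assumptions for the example; the command in this report does not show which commit was being archived.

```python
import shlex


def archive_argv(tree_ish, paths, out="x.tar"):
    """Build an argument list for git archive.

    Passing a list like this to subprocess.run() bypasses the shell,
    so spaces and hyphens in file names need no escaping at all.
    """
    return ["git", "archive", "-o", out, tree_ish, *paths]


def archive_shell_command(tree_ish, paths, out="x.tar"):
    """Render the same command as one shell-safe string."""
    # shlex.quote wraps any argument containing unsafe characters in single quotes.
    return " ".join(shlex.quote(arg) for arg in archive_argv(tree_ish, paths, out))
```

For the file in this report, `archive_shell_command("HEAD", ["PE01 - Points Earned.js"])` produces the single-quoted form that worked by hand.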