Some specific Specman e file causes cloc to hang #206

Closed
mojotooth opened this issue Jul 18, 2017 · 13 comments

Comments

@mojotooth

Hey Al. It might be touchy to solve this one. I'll need to heavily obfuscate the code before I can send a snippet. Perhaps you can help me get some debug help here and maybe it won't be necessary.

I have one particular Specman e file that causes cloc to hang. That is, 18 hours later it hasn't made forward progress on that file as far as I can tell.

I have done two things to get some debug information. First, I ran with verbosity set to some high number. Secondly, I ran with the perl debugger to see what exactly was hanging.

The high-verbosity printout follows:

-> no_autogen(0)
<- no_autogen()
-> uncompress_archive_cmd(/nfs/pdx/disks/ngc.design.2/sross/specman/testcase.e)
<- uncompress_archive_cmd
-> make_file_list(testcase.e)
Using temp file list [/tmp/M9Js3ZSkOb]
       1 text file.
classifying testcase.e
-> classify_file(testcase.e)
testcase.e extension=[e]
<- make_file_list()
-> remove_duplicate_files
<- remove_duplicate_files
       1 unique file.                              
-> call_counter(testcase.e, Specman e)
-> read_file(testcase.e)
<- read_file
-> rm_blanks(language=Specman e)
-> remove_matches(pattern=^\s*$)
<- remove_matches
<- rm_blanks(language=Specman e)
-> rm_comments(file=testcase.e)
rm_comments file=testcase.e sub=pre_post_fix
-> pre_post_fix with '>, <'
<- pre_post_fix
-> remove_matches(pattern=^\s*$)
<- remove_matches
rm_comments file=testcase.e sub=remove_between_general
-> remove_between_general(start=^'>, end=^<')
<- remove_between_general
-> remove_matches(pattern=^\s*$)
<- remove_matches
rm_comments file=testcase.e sub=call_regexp_common
-> call_regexp_common for C++
Using temp dir [/tmp/pkNRrGnLJG] to install Regexp::Common
^C
65001.970u 10.180s 18:07:28.96 99.6%    0+0k 0+64io 0pf+0w

The ^C at the bottom is me coming back to the terminal and finally killing the process, 18 hours later.

I ran cloc under perl debugger and I found that this is the specific line of cloc that is hanging.

    if ($all_lines =~ $RE{comment}{$language}) {
        # Suppress "Use of uninitialized value in regexp compilation" that
        # pops up when $1 is undefined--happens if there's a bug in the $RE
        # This Pascal comment will trigger it:
        #         (* This is { another } test. **)
        # Curiously, testing for "defined $1" breaks the substitution.
        no warnings;
        # remove   comments
        $all_lines =~ s/$1//g;             # <<<<<<<<<<<<<<<<<<< THIS LINE
    }

Stepping into that marked line above causes the hang. If I re-execute the debugger and do a "print $1" before stepping on that line, the debugger prints out an empty line. I don't know if there are whitespace tidbits on the blank line, or what. Maybe it's literally a blank string. I'm surprised that perl will hang without warning if it executes a substitution wherein it is looking for an empty string. I don't know how to make perl tell me what's going on so that I can figure out why that substitution is hanging. I will continue to try to narrow down my testcase to see if I can isolate the specific line of data that causes the hang.
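A note on the empty `$1`: Perl documents that when a pattern interpolates to the empty string, the match or substitution reuses the last successfully matched regular expression. If `$1` really is empty here, `s/$1//g` would not search for an empty string; it would rerun the comment regex from the `if` condition across the whole file. That is a possible reading, not something confirmed in this thread. The sketch below shows the behavior in isolation; `$comment` is a hypothetical stand-in for `$RE{comment}{$language}`, not the actual Regexp::Common pattern.

```perl
use strict;
use warnings;

my $text = "a = 1; /* one */ b = 2; /* two */\n";

# Hypothetical stand-in for $RE{comment}{$language}; not the real pattern.
my $comment = qr{/\*.*?\*/}s;

if ($text =~ $comment) {
    my $empty = '';          # plays the role of an empty $1
    # The interpolated pattern is empty, so Perl reuses the last
    # successfully matched regex ($comment) and removes every comment.
    $text =~ s/$empty//g;
}

print $text;
```

If this is what happens inside cloc, the time is spent in the comment regex itself, which would fit the later observation about an unmatched `/*`.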

Do you have any steps you want me to take?

Thanks for your time.

@AlDanial
Owner

First off, I appreciate the debug efforts you've put in so far, saves me a lot of work. The line where things hang looks innocuous enough. My gut feel is the C++ regex from Regexp::Common freaked out on the input (which is $all_lines btw, would love to see the contents of that before it gets inside the if statement) and caused an infinite loop.

From past experience, problematic C and C++ input happens when people decorate their code with things like

   /*///////////////////////////////*/
   /***************////////******//

and so on, in other words, inadvertently mix /*,*/ and // comment markers.
Also problematic are C/C++ comment markers within text strings--cloc doesn't recognize the strings and goes straight for the comment markers. Things break down if it sees a /* but not its closing */ for example.

If you could bisect the code and still reproduce the hang (anything longer than 5 seconds for one file and you might as well kill it as it is hung) that would be most helpful. Ideally you'll be able to trim it down to just a few lines of code and if the cause of the problem isn't obvious then, then at least the obfuscation work will be easier and you could send it.
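The bisection suggested above could be scripted roughly as follows. This is only a sketch under stated assumptions: slowness is monotonic (once a prefix of the file is slow, every longer prefix is slow too), GNU coreutils `timeout` is available, and the helper names and the scratch file name `bisect_candidate.e` are made up for illustration.

```perl
use strict;
use warnings;

# Binary-search for the shortest prefix of @$lines that $is_slow flags.
# Assumes that once a prefix is slow, every longer prefix is slow too.
sub shortest_slow_prefix {
    my ($lines, $is_slow) = @_;
    return unless @$lines && $is_slow->($lines);   # whole file is fine
    my ($lo, $hi) = (1, scalar @$lines);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($is_slow->([ @{$lines}[0 .. $mid - 1] ])) {
            $hi = $mid;          # problem is within the first $mid lines
        } else {
            $lo = $mid + 1;      # need more lines to reproduce
        }
    }
    return $lo;                  # number of leading lines needed
}

# Hypothetical predicate: write the candidate lines to a scratch file and
# run cloc under GNU coreutils `timeout`, which exits with status 124
# when the 5-second limit is hit.
sub cloc_is_slow {
    my ($lines) = @_;
    my $path = 'bisect_candidate.e';   # hypothetical scratch file name
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} @$lines;
    close $fh or die "cannot close $path: $!";
    system('timeout', '5', 'cloc', $path);
    return ($? >> 8) == 124;
}

# Example use (not run here):
#   open my $in, '<', 'testcase.e' or die $!;
#   my @lines = <$in>;
#   my $n = shortest_slow_prefix(\@lines, \&cloc_is_slow);
#   print "first $n lines reproduce the slowdown\n" if defined $n;
```

The last line of the reported prefix is then the first place to look for the offending construct.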

@mojotooth
Author

I've figured out that it's not a "hang" per se, but rather a slowdown that grows steeply (it looks exponential) with the number of lines in the file. I have found the exact character that I can delete to make the problem go away. You're right: it's a case where a multi-line comment token "/*" is accidentally embedded inside a quoted string, with no matching closing token. If I shorten the testcase while keeping this problem in the file, cloc will eventually finish the analysis. With the problem present and the file cut to about 1000 lines, it reports a rate of about 30 lines per second. Normally the file is about 5000 lines long, and that apparently translates to "not finishing in 18 hours."

Is this an intelligence that you think you can grant to cloc? I don't own the analyzed code in question, so getting this particular landmine removed will be problematic for me. But I realize that cloc isn't trying to be a universal parser, either.

Thanks!

@AlDanial
Owner

Being able to understand what makes a string is difficult. The real solution is to correctly parse each language according to its syntax rules but I don't have an easy avenue to that.
It's kind of a pathetic work-around, but I'll add a timeout to this section of code so that at least you won't have to wait hours. When it hits a problematic file, it will issue a warning then move to the next file.
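For reference, a timeout of this kind is usually built on `alarm`. One caveat from perlipc is worth keeping in mind: since Perl 5.8, ordinary `%SIG` handlers are deferred until the current opcode finishes, so a single long-running regex operation may not be interrupted by them. The sketch below therefore installs the handler through `POSIX::sigaction`, which runs it immediately by default. It is a minimal illustration, not the code in cloc; `with_timeout` and the sample input are hypothetical, and dying from an immediate handler in the middle of a regex carries the usual risks of unsafe signals.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Run $code->() with a wall-clock limit of $seconds.
# Returns the code's result, or undef if the limit was hit.
sub with_timeout {
    my ($seconds, $code) = @_;

    # A handler installed via POSIX::sigaction is not deferred by default,
    # so it can fire in the middle of a long-running opcode.
    # It stays installed after this sub returns.
    POSIX::sigaction(SIGALRM, POSIX::SigAction->new(sub { die "timeout\n" }))
        or die "cannot install SIGALRM handler: $!\n";

    my $result = eval {
        alarm $seconds;
        my $value = $code->();
        alarm 0;
        $value;
    };
    my $error = $@;
    alarm 0;                          # make sure no alarm is left pending
    return $result if $error eq '';
    return         if $error eq "timeout\n";
    die $error;                       # re-throw anything that is not our timeout
}

my $all_lines = "x = 1; /* note */ y = 2;\n";   # sample input for illustration
my $stripped = with_timeout(5, sub {
    (my $copy = $all_lines) =~ s{/\*.*?\*/}{}gs;
    return $copy;
});
warn "Line count, exceeded timeout\n" unless defined $stripped;
```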

@AlDanial
Owner

Please give 2d19bf8 a try with your original file set. It won't catch the unmatched /*, but it should take a lot less time and complain about the problem file.

@mojotooth
Author

I tried this version and it still hangs on the file. I do see examples where the timeout is exceeded, but not for this particular file. :(

Line count, exceeded timeout: foo/bar/bzz.e

What if you just analyzed the string that the regexp is about to use, and determined that the string was pathological, likely to cause a hang, and just skipped it instead? Probably with a warning.
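A very rough version of such a pre-check might simply compare the number of `/*` and `*/` tokens, without trying to tell strings from comments. This is a heuristic sketch only; `block_comment_imbalance` is a hypothetical helper, not something cloc provides, and it can misfire on decorations such as `/*/`.

```perl
use strict;
use warnings;

# Count block-comment openers and closers without parsing strings.
# A non-zero difference hints that the comment regex may struggle.
sub block_comment_imbalance {
    my ($text) = @_;
    my $opens  = () = $text =~ m{/\*}g;   # number of "/*" tokens
    my $closes = () = $text =~ m{\*/}g;   # number of "*/" tokens
    return $opens - $closes;
}

# Sample input: one "/*" inside a quoted string, one real comment.
my $sample = qq{print("path is /* here");\n/* a real comment */\nx = 1;\n};
my $diff   = block_comment_imbalance($sample);
warn "possible unmatched /* (imbalance $diff), consider skipping\n" if $diff != 0;
```

A balanced count does not prove the file is safe, and an unbalanced one does not prove it will be slow, so a check like this could at most drive a warning.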

@AlDanial
Owner

Nuts. It is going to be really tough to make progress on this without being able to duplicate the problem on my own. As far as analyzing the string, the string in question is the contents of the entire file. Figuring out what makes up a string which encompasses an unmatched /* is difficult because I'd need to separate code string from commented strings--a chicken and egg problem if I can't use the regex to remove comments first.

@mojotooth
Author

Ok. Will look at ways to mitigate the time sink of obfuscating the code.

@mojotooth
Author

Please find attached the obfuscated code. In order to paste it I had to change the extension from .e to .txt so you'll wanna change that back.
obfuscate_this.txt

@AlDanial
Owner

cloc runs without issue for me on this file:

> mv obfuscate_this.txt issue206_specman.e
> md5sum issue206_specman.e 
645f890dc1d3fb441352831b69c8bb9f  issue206_specman.e
> cloc issue206_specman.e 
       1 text file.
       1 unique file.                              
       0 files ignored.

github.com/AlDanial/cloc v 1.73  T=0.09 s (10.9 files/s, 43749.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Specman e                        1            542            908           2575
-------------------------------------------------------------------------------

@mojotooth
Author

Argh. What version of Perl?

@AlDanial
Owner

Perl v5.22.1

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:        16.04
Codename:       xenial
Ubuntu 16.04 
Linux vex 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

@mojotooth
Author

Ok. I have replicated the problem on several versions of perl up to 5.8.7.

Perl 5.12.1 and later appear not to exhibit the problem, although the problematic line apparently still results in an incorrect code line count for that file.

Thanks for your help.

@AlDanial
Owner

OK, will chalk this up as a Perl version issue then.
