Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C# line count is broken if verbatim string with backslash is present #71

Closed
ptupitsyn opened this issue Apr 30, 2019 · 9 comments · Fixed by #76
Closed

C# line count is broken if verbatim string with backslash is present #71

ptupitsyn opened this issue Apr 30, 2019 · 9 comments · Fixed by #76
Assignees
Labels
bug Something isn't working

Comments

@ptupitsyn
Copy link

ptupitsyn commented Apr 30, 2019

Describe the bug
Count is broken when there is a line in C# file with verbatim string in it like this:
private const string BasePath = @"a:\";.

To Reproduce

  1. Download file SccTokeiFailure.zip
  2. Extract
  3. scc

Actual behavior

Language                 Files     Lines     Code  Comments   Blanks Complexity
C#                           1        20       20         0        0          0

Expected behavior

Language                 Files     Lines     Code  Comments   Blanks Complexity
C#                           1        20       14        3        3          0

Tested on

  • Debian 9.7
  • Windows 10
@boyter boyter added the bug Something isn't working label Apr 30, 2019
@boyter boyter self-assigned this Apr 30, 2019
@boyter
Copy link
Owner

boyter commented Apr 30, 2019

Thank you for the detailed bug report. I had forgotten that in C# you can prefix the @ which should disable the \ prefix check that exists inside scc.

Looks like tokei is also affected by this, although it counts the blanks still oddly...

This actually ties into #62 somewhat as it means specific rules need to be able to added to some string types. Docstrings for example should be counted as comments and in this case the escape should be ignored.

Annoyingly this is going to require a modification to the languages.json to accommodate this change.

Strings are currently handled via

"quotes": [
  [
    "@\"",
    "\""
  ],
  [
    "\"",
    "\""
  ]
]

I am thinking this needs to be modified to something like the below which allows these conditions to be taken into account and would allow for expansion for future edge cases,

C# (subset)
"quotes": [
  {
    "startEnd": [
      "@\"",
      "\""
    ],
    "ignoreEscape": true
  },
  [
    "\"",
    "\""
  ]
]

Python (subset)
"quotes": [
  {
    "startEnd": [
      "\"\"\"",
      "\"\"\""
    ],
    "isComment": false
  }
]

@ptupitsyn
Copy link
Author

Yep, Tokei is also affected in a slightly different way, filed the bug for them too #330.

@boyter
Copy link
Owner

boyter commented May 1, 2019

Its actually rather interesting because you can break all the parsers in interesting ways. For example,

@"C:\" /*
a */

Should be 1 line of code and 1 comment, but is 2 lines of code for all parsers.

If you add support for @" and don't skip over the @" portion then you get something like the below,

@"\ /*"
a

Which then does enter the comment state because it enters it on the @ and leaves on the ". Certainly a very edge case bug, and if it were not for your example of @"a:\" which is actually something likely to be in a lot of C# code bases I wouldn't bother fixing it.

As it is I can fix this and add Python DocStrings at the same time so I may as well since I was planning on doing that anyway.

@ptupitsyn
Copy link
Author

That would be awesome!

Some context if it matters: I scan a lot of code bases with Cloc for code analysis purposes. Cloc has it's own issues (like timeouts on seemingly simple files, giving flaky results, and limited comment handling). So I'm looking for alternatives, but so far Cloc was the most accurate (even though a lot slower).

@boyter
Copy link
Owner

boyter commented May 1, 2019

Oh thats interesting. I have not seen any situations where cloc would be more accurate than scc or tokei as it is unaware on how to count strings, see https://github.com/AlDanial/cloc#Limitations for details.

I also remembered why tokei is giving odd results, its because it counts a line as blank, even if it is inside a string so the following python,

t = '''some

thing'''

Would be counted as having a blank line. I don't agree with this behavior as it is still technically code and treated as such by the parser or compiler hence the difference there.

I have started work on the following branch https://github.com/boyter/scc/tree/issue71 to resolve both this and #62 but I will probably start with this issue as it is slightly easier to resolve.

@ptupitsyn
Copy link
Author

unaware on how to count strings, see https://github.com/AlDanial/cloc#Limitations for details

Yep, I almost know this page by heart at this point 😄

I have not seen any situations where cloc would be more accurate than scc or tokei

Comment markers within strings are rare, while stuff like @"c:\" is not rare at all and has bigger impact.

And Tokei is completely unusable, unfortunately, due to this XAMPPRocky/tokei#301. Fails on some files and does not count them.

@boyter
Copy link
Owner

boyter commented May 2, 2019

That last bug is odd... Why is it even trying to decode the file at all? scc just works on the bytes, since most programming files are ASCII and I can live with the slight loss of precision if someone is embedding odd things to try and throw it off. It also avoids those issues entirely.

Fair enough that cloc is working for you because of that specific issue. Ill get around to resolving it and then put it in the README as a reason to use it over other counters.

@boyter
Copy link
Owner

boyter commented May 6, 2019

A fix is sitting on #76 if you want to build from that branch and report back. It fixes the cases you have mentioned above, and a few others I invented to help catch it.

$ go run main.go examples/tricky2.cs
───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines     Code  Comments   Blanks Complexity
───────────────────────────────────────────────────────────────────────────────
C#                           1        20       14         3        3          0
───────────────────────────────────────────────────────────────────────────────
Total                        1        20       14         3        3          0
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop $305
Estimated Schedule Effort 0.708275 months
Estimated People Required 0.051095
───────────────────────────────────────────────────────────────────────────────

NB the change on that branch still needs development work. It passes all tests but is not ready to be merged so the final fix is still some time away.

@boyter
Copy link
Owner

boyter commented Jun 16, 2019

This is merged in now. Build from master for the latest.

Still a few things to implement before a release but its getting close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants